An information-theoretic approach to interactions
in images
GIUSEPPE BOCCIGNONE 1 and MARIO FERRARO2,*
1 Dipartimento di Ingegneria dell'Informazione e Ingegneria Elettrica, Università di Salerno and INFM, Unità di Salerno, via Ponte Don Melillo, 1, 84084 Fisciano (SA), Italy
2 Dipartimento di Fisica Sperimentale, Università di Torino and INFM, Unità di Torino, via Giuria 1, 10125 Turin, Italy
Received 23 July 1998; accepted 23 February 1999
Abstract - In this paper it will be argued that the notion of interactions in images is closely related to that of the entropy associated with an image, and it will be shown that interactions make processing of the information coming from the retina computationally less expensive. A procedure will be presented, based on the evolution of joint entropy across different scales, to gauge the contributions of different types of interactions to the structure of images.
INTRODUCTION
The human visual system is very good at capturing and processing visual information. Visual patterns appear, in general, to have a well-defined structure and
information is commonly associated with the amount of form/structure present in a
pattern. The rationale behind this paper is that structure naturally emerges by considering
an act of perception as a physical measurement - an experiment - upon a given
pattern. Briefly, the pattern is modified by some physical transformation (which
represents the observer), and the pattern, together with the chosen transformation, defines a dynamic system whose evolution in time can be characterized in terms of
some suitable physical parameters. In this framework, the structure is naturally a result of the way pattern elements interact with each other along the transformation; by interaction we mean the constraints by which intensity values at a given pixel q depend on the values the intensity takes in some neighborhood of q.
*To whom correspondence should be addressed. E-mail: ferraro@ph.unito.it
Thus, it is clear that in the image per se, considered as a distribution of light
intensity over a set of pixels, there are no interactions, the latter being the result of
an operation performed by the visual system on the image. However, the emergence of interactions depends on the way light intensity is distributed in the original image and on the existence of dependencies/correlations between parts of the image. For
instance, consider a pixel with a given level of intensity; if all other pixels in a
certain region have the same intensity, it can be said that there exists a long-range correlation among pixels. In this case, of course, the dependency is not related
to a particular orientation, i.e. it is isotropic. However, there exist long-range correlations that are not isotropic: for example, all pixels forming a curve are
linked by a long-range dependency, occurring only in the direction of the curve.
By contrast, short-range correlations have effect only in small regions, where image
intensity varies sharply over relatively short distances.
In general, short-range interactions can only account for local features of the image, whereas global features depend on long-range interactions. Thus the dichotomy between long- and short-range interactions seems to mirror the distinction between
local and global features. The work of Gestalt psychologists on grouping shows the
relevance of non-local constraints (Koffka, 1935; Wertheimer, 1938), that would
correspond, in the framework of this paper, to long-range isotropic interactions.
Note also that long-range interactions can be the result of propagation of short-
range ones, provided that some constraints are imposed. For instance, vector fields generated by local intensity gradients produce curves via integration, but to do so
they must have the property of holonomy; in other words, local vectors must be
aligned 'head to tail' rather than scattered across the visual field (Hoffman, 1970). Local edges can be thought of as generated by short-range interactions, whereas the
curve itself arises as a result of the constraint of holonomy, that links the elements of
the tangent field in an orderly way, establishing a long-range interaction. Similarly, contour perception involves a form of grouping or ordering of some elements of the
visual input to connect them in a line, that is, interactions among simple elements
of the contour (Caelli et al., 1978); in this case interactions are non-local as well as
anisotropic. The visual system is certainly optimal in computing dependencies or correlations
occurring at different ranges. If the visual system could only compute local
functionals, as with feature detectors, its performance would be rather limited in
evaluating long-range interactions. This problem is well known in computer vision
where the action of local-edge operators produces a disconnected set of local edges and further processing is needed to link these elements in curves that correspond to
meaningful boundaries in the scene (see, for instance, Ballard and Brown, 1982).
Clearly, if the system can adapt to compute non-local functionals of image intensity, no such drop of performance would be observed. This idea has suggested models
and experiments in which the overall performance is a function of the parameters which control the relevant correlation lengths. The same rationale lies behind many
image models used in computer vision, such as the weak membrane or the thin plate
(Grimson, 1981; Terzopoulos, 1983; Blake and Zisserman, 1987), or others based
on Markov random fields (Geman and Geman, 1984; Marroquin, 1985; Geiger and
Yuille, 1991), where energy functions are used to model interactions among pixels.
Recently, some generalization has been provided by Zhu and Mumford (1997).
However, all the above mentioned approaches, while providing models useful
for different kinds of processing capabilities, do not answer a very fundamental
question: in which sense is a dynamical system, set up according to any of those
models, an information processing system?
A first contribution of this paper is to show that the notion of interaction is not just useful to produce models of images, but has deeper implications concerning the processing of information by the visual system. In particular, interactions established along any kind of transformation induce a conditioning between pixels that reduces the entropy, i.e. the uncertainty, of the image. Intuitively, interactions can be considered as signals which encode the exchangeability of pixels, so reducing image entropy and making the processing of the information coming from the retina computationally less expensive.
In the next section we will discuss how such interactions constrain the complexity of early visual processing through an entropy reduction mechanism, and how
interactions can be driven by suitable laws.
A second important point we address is how multi-range interactions relate to
the fact that the visual system analyzes images at different resolutions (Wandell,
1995). In general, research on the human visual system proceeds on the assumption that visual information is processed in parallel by a number of spatial-frequency-tuned channels and that, in the visual pathway, filters of different sizes operate at the same location. The relationship between interactions and resolution can be precisely stated as follows.
Let t, a non-negative parameter, denote the resolution, which is assumed to decrease as t increases. The intensity I is defined as a function from D × R⁺ to R, I: ((x, y), t) ↦ I(x, y, t), D ⊂ R² being the domain of the image. In general, the image intensity at scale t can be derived from the original image I(x, y, 0) as

I(x, y, t) = \int_D k(x, y; x', y'; t)\, I(x', y', 0)\, dx'\, dy',    (1)

where k is the Green function, which measures the influence of the pixel q' = (x', y') on the pixel q = (x, y) at a given scale, so that the intensity at q = (x, y) depends on the intensity on some neighborhood of q.
The kernel k can be defined in a way to model interactions in the image at different scales. If the domain of integration is infinite, then the integration simplifies to a convolution,

I(\cdot, \cdot, t) = k(\cdot, \cdot, t) * I(\cdot, \cdot, 0).    (2)
If, further, k is a Gaussian with variance σ² = 2t (for a discussion of this equality, see Lindeberg, 1994), equation (2) is the solution of the diffusion equation

\frac{\partial I}{\partial t} = \nabla^2 I.    (3)
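As an illustration of how a diffusion equation of this kind can be discretized, the following is a minimal Python sketch (not the authors' code; the grid size, time step and test image are arbitrary illustrative choices):

```python
# Toy sketch (not from the paper): one explicit finite-difference step of the
# diffusion equation dI/dt = Laplacian(I), iterated to blur a tiny image.

def diffuse(img, steps, dt=0.2):
    """Explicit Euler scheme: I <- I + dt * Laplacian(I), clamped borders."""
    h, w = len(img), len(img[0])
    cur = [row[:] for row in img]
    for _ in range(steps):
        nxt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                up    = cur[max(y - 1, 0)][x]
                down  = cur[min(y + 1, h - 1)][x]
                left  = cur[y][max(x - 1, 0)]
                right = cur[y][min(x + 1, w - 1)]
                lap = up + down + left + right - 4 * cur[y][x]
                nxt[y][x] = cur[y][x] + dt * lap
        cur = nxt
    return cur

# A single bright pixel spreads out, approaching a Gaussian profile.
img = [[0.0] * 9 for _ in range(9)]
img[4][4] = 1.0
blurred = diffuse(img, steps=20)
print(blurred[4][4])  # the central value decreases as mass diffuses outward
```

With dt ≤ 0.25 the explicit scheme is stable, and the clamped borders act as reflecting (Neumann) boundary conditions, so the total intensity is conserved while the profile spreads.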
Equation (3) has been widely used in computer vision to determine the behavior of
images at different scales (Witkin, 1983; Koenderink, 1984; Lindeberg, 1994). It is worth noting that kernels of different shapes can be related to different
types of interactions. For instance, the emergence of anisotropic interactions can
be obtained by suitably deforming the shape of the Gaussian kernel; Nitzberg and
Shiota (1992) show that an elongated Gaussian kernel appears as the fundamental
solution of an anisotropic diffusion equation, such as the one proposed by Perona
and Malik (1990). Analogously, the 'Mexican hat' filter, well known in the literature, can be used to produce local edge signatures. More generally, kernels apt to
reproduce reaction-diffusion processes could be designed (Zhu and Mumford,
1997). In the following section, 'Interactions across scales', we further investigate the
above relationship between interactions and resolution. We also provide an operative procedure and simulations to visualize this connection. In this paper we will limit our investigation to linear diffusion, since this process is more suitable
to represent early, context-free, visual information processes; however, an example
relating to non-linear diffusion will be presented.
Finally, in the Conclusions, we discuss the overall results so far achieved and
highlight some links to biological vision issues.
INTERACTIONS AND INFORMATION
In this section, we show, from an information-theoretic standpoint, how, during the
very first stage of a visual processing system, correlations or dependencies among
parts of the image serve as an overall complexity-reduction mechanism.
To this purpose, it is worth recalling that dependencies find their origins in the regularity of states of the world. Thus, for instance, matter coheres to form objects because of interactions at the molecular or atomic level, and the nature of intermolecular forces constrains the shapes objects can take, so that the shapes of natural objects are not arbitrary. The same kinds of forces also determine the physical properties of surfaces, e.g. albedo or roughness. Also, the configurations or relative positions of objects in a scene obey physical laws and, to some extent, it can be said that the enormous amount of structure of the physical world determines the regularities of images. Finally, object surfaces project light onto the retina, and form images, according to the laws of optics, such as reflection, refraction, and diffraction.
To perform the visual perception process, the visual system must be able to analyze the intensity distribution on the retina and extract properties of the image. Such
a process has been described as a process of unconscious inference (Helmholtz,
1925).
Clearly, the task of analyzing the intensity distribution on the retina and extracting properties of the image would be of enormous complexity if dependencies between different parts of the image were not taken into account.
Consider the domain D to be a lattice of N sites (pixels in our case) and let the intensity at each pixel s be given by the number m_s of photons reaching s (Frieden, 1972); thus the image is determined by their distribution. To each pixel s is then associated a number of photons m_s whose distribution can be seen as the realization of a random field F. More formally, let F = {F_s, s ∈ D} be a family of random variables indexed by s ∈ D. Suppose, for simplicity, that the variables F_s = F_s(t) take values in some finite set J, depending on the realization of events t_0, t_1, t_2, ..., each event being the outcome of an experiment. A possible sample f of F is denoted by

f = \{f_s, s \in D\}    (4)

and it is called a configuration of the field.
In our case, then, f_s is the random number m_s of photons impinging on pixel s.
If no interactions are taken into account, we can assume that the m photons are randomly allocated one at a time among the N sites, with uniform spatial probability (this model is often referred to as the 'monkey model'). Thus, the total number of photons is m = \sum_{s \in D} m_s. For any given m the number of possible configurations is (Feller, 1968):

\binom{N + m - 1}{m}.    (5)

Suppose now that m ∈ J = {0, 1, ..., M}, where M is the maximum number of photons that can reach the retina. Then, the number Π of possible configurations is:

\Pi = \sum_{m=0}^{M} \binom{N + m - 1}{m} = \binom{N + M}{M}    (6)
(Feller, 1968). Define the entropy associated to the field F as

H(F) = -\sum_{f} p(f) \log p(f),    (7)

where p(f) is the probability that the configuration f occurs, and the sum runs over all possible configurations f. Note that if all configurations have the same probability then

H(F) = \log \Pi.    (8)
Denote by p(f_s) = Pr{F_s = f_s} the probability distribution of photons at site s. Then, the entropy of the random variable F_s is given by

H(F_s) = -\sum_{f_s} p(f_s) \log p(f_s),

and it is known from information theory that

H(F) \le \sum_{s \in D} H(F_s)    (9)

(Cover and Thomas, 1991). In the inequality (9), the equality holds if, and only if, the random variables F_s are independent, that is, p(f) = \prod_{s \in D} p(f_s). If p(f_s) is constant for every f_s, then p(f_s) = 1/M and H(F_s) = \log M, for all s. As a consequence, if there are no interactions in the image, all F_s are independent and hence the entropy of the image is

H(F) = \sum_{s \in D} H(F_s) = N \log M.
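For independent pixels uniform over M gray levels, the image entropy is thus the sum of the per-pixel entropies, N log M. A minimal numerical check (N and M below are arbitrary toy values, not taken from the paper):

```python
import math

# Toy check: for N independent pixels, each uniform over M gray levels,
# the image entropy is the sum of the per-pixel entropies, N * log M.
N, M = 16, 256                    # illustrative lattice size and level count
p = 1.0 / M                       # uniform per-pixel distribution
H_pixel = -sum(p * math.log(p) for _ in range(M))
H_image = N * H_pixel             # independence: entropies add
print(H_image, N * math.log(M))   # the two values agree
```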
It has been stressed in the Introduction that in the image per se there are no
interactions among pixels, and that the latter arise as a consequence of the processing of visual information. Then H_0 = N log M can be considered the entropy of the image before any processing. In general, the operations performed on the original image result in
the emergence of interactions and hence in the decrease of the entropy.
The previous, informal, definition of an interaction as the constraint by which intensity values at a given pixel q depend on the values the intensity takes in some neighbourhood of q, along a transformation, can now be stated more precisely. Given the sites of the image s_α and s_β, α ≠ β, an interaction is a processing relation which induces a conditioning

p(f_{s_α} | f_{s_β}) = Pr\{F_{s_α}(t) = f_{s_α} | F_{s_β}(t) = f_{s_β}\}

between the random variables F_{s_α}(t) and F_{s_β}(t) over the succession of events t_0, t_1, t_2, ....
The entropy reduction effect of such conditioning can be described in detail as follows.
Suppose that there is no interaction between pixels in the image. Given pixels s_α and s_β, the probability p(f_{s_α}) is independent from the probability p(f_{s_β}) and hence the joint probability density p(f_{s_α}, f_{s_β}) is given by

p(f_{s_α}, f_{s_β}) = p(f_{s_α})\, p(f_{s_β}).    (10)

From (10) it follows

\log p(f_{s_α}, f_{s_β}) = \log p(f_{s_α}) + \log p(f_{s_β}).    (11)
Taking the expectations of both sides of equation (11) and using the fact that \sum_{f_s} p(f_s) = 1, it is straightforward to see that

H(F_{s_α}, F_{s_β}) = H(F_{s_α}) + H(F_{s_β})    (12)

(Cover and Thomas, 1991), where H(F_{s_α}, F_{s_β}) is the joint entropy

H(F_{s_α}, F_{s_β}) = -\sum_{f_{s_α}, f_{s_β}} p(f_{s_α}, f_{s_β}) \log p(f_{s_α}, f_{s_β}).    (13)
If interactions occur among points of the image, they make the probabilities dependent upon each other. For instance, if the random variable F_{s_α} depends on F_{s_β}, then equation (10) becomes

p(f_{s_α}, f_{s_β}) = p(f_{s_α} | f_{s_β})\, p(f_{s_β}).    (14)

Following the same line of reasoning as above we find:

H(F_{s_α}, F_{s_β}) = H(F_{s_α} | F_{s_β}) + H(F_{s_β}),    (15)

where H(F_{s_α} | F_{s_β}) is the conditional entropy

H(F_{s_α} | F_{s_β}) = -\sum_{f_{s_α}, f_{s_β}} p(f_{s_α}, f_{s_β}) \log p(f_{s_α} | f_{s_β}).    (16)
Now, it is well known that conditioning reduces entropy, i.e. H(F_{s_α} | F_{s_β}) ≤ H(F_{s_α}) (Cover and Thomas, 1991). Therefore, the joint entropy of equation (15) is reduced with respect to that defined in equation (12). It is worth noting, at this point, that the definition of interaction as a reciprocal conditioning of this kind is general. The definition does not depend on the specific realization of the interaction, in other terms, on the visual operation performed.
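This entropy reduction is easy to verify on a toy example (the joint distribution below is an arbitrary illustrative choice): for a dependent pair of binary variables, the joint entropy falls strictly below the independent-case sum of the marginal entropies.

```python
import math

# Toy demonstration: conditioning reduces entropy, so a dependent pair has
# H(X, Y) = H(X | Y) + H(Y) < H(X) + H(Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # dependent X, Y

def H(dist):
    """Shannon entropy (nats) of a distribution given as value -> prob."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

print(H(joint))        # joint entropy H(X, Y)
print(H(px) + H(py))   # sum of marginal entropies, strictly larger here
```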
One such operation is the choice of appropriate a priori models of the image. For instance, one can model the image, namely the random field F, as a Markov random field. In this case, the conditional probability of a chosen variable having a particular value, given the values of the rest of the variables, is identical to the conditional probability given the values of the field F in a small set of pixels. The latter is usually called the neighborhood of the given pixel (see for instance Marroquin, 1985). By choosing a system of neighborhoods on the lattice of sites, we can define a clique C as either a single pixel, or a set of pixels such that all pixels belonging to C are in the neighborhood of each other. It is possible to show that the probability distribution of the configurations f has the form of the Gibbs distribution

p(f) = \frac{1}{Z} \exp(-\beta E(f)),    (17)
where Z is a normalizing constant and β is a control parameter. The 'energy function' E is of the form

E(f) = \sum_{C} V_C(f),    (18)

where C ranges over the cliques associated with the given neighborhood system and the potentials V_C are functions supported on them. The function V_C defines the interactions among pixels and can be chosen in a way to generate the appropriate probability distribution of the configurations f. The relation between V_C and the conditional probabilities is given by

p(f_s | f_r, r \in N_s) = \frac{1}{Z'} \exp\Big(-\beta \sum_{C: s \in C} V_C(f)\Big),    (19)

where Z' is a normalization factor and N_s denotes the neighborhood of s (see Marroquin, 1985).
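As a concrete, deliberately simplified sketch of such a local conditional law, consider an Ising-like prior with pair cliques; the potential V_C, the coupling J, and the binary labels below are our own illustrative choices, not a model from the paper:

```python
import math

# Hedged sketch of a Gibbs conditional: pair cliques with potential
# V_C = -J if the two pixels agree, +J otherwise. The conditional law of a
# pixel given its neighbourhood is exp(-sum of clique potentials) / Z'.
J = 1.0
LABELS = (0, 1)

def V(a, b):
    return -J if a == b else J

def conditional(neighbors):
    """p(f_s = v | neighbours) for each label v, from the local cliques."""
    energy = {v: sum(V(v, n) for n in neighbors) for v in LABELS}
    Zp = sum(math.exp(-energy[v]) for v in LABELS)   # local normalizer Z'
    return {v: math.exp(-energy[v]) / Zp for v in LABELS}

p = conditional([1, 1, 1, 0])   # three agreeing neighbours out of four
print(p[1])                     # label 1 is strongly favoured
```

The interaction is visible directly: the neighbourhood values condition the distribution at s, sharpening it and hence lowering its entropy relative to the uniform case.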
Equation (19) shows that the interactions, defined by Vc, make the random
variables, defined at every pixel, dependent on the random variables defined in a
neighborhood, and hence interactions decrease the entropy of the image. The use of energy functions is not the only way to define interactions in images.
As discussed in the Introduction, interactions of increasing range can be modeled via the diffusion equation. The two formulations can be linked by noting that, instead of defining a whole distribution with E(f), it is possible to use the energy to set up a variational problem. In particular, it is possible to minimize E(f) by gradient descent. It is well known that solutions I(·, t) of parabolic partial differential equations of the form (3), or in the non-linear form as in Perona and Malik (1990) and Zhu and Mumford (1997), can be classically associated to steady states of quasi-linear functionals as t → ∞, and all such equations can be seen as a descent method for quasi-linear elliptic variational problems (Morse and Feshbach, 1953). This property will be exploited in the next section to investigate the relationship between interaction range and resolution.
In order to compute the effect of interactions on the entropy, one needs to know the distributions p(f_{s_α}), p(f_{s_α}, f_{s_β}), and generally this knowledge is not available. However, an estimate of these distributions can be obtained by considering the empirical marginal distribution (histogram) of the image (see Zhu and Mumford, 1997, for a discussion).
Let p(i) = n_i/N, where n_i is the number of pixels of intensity i and N is the total number of pixels. We assume that, for the pixel s_α, p(i) is an estimate of p(f_{s_α}); of course, this implies that the probabilities p(f_s) are assumed to be the same for each pixel. It must be recalled that p(f_{s_α}, f_{s_β}) is the a priori probability with which two pixels s_α and s_β, with gray-levels i and j respectively, occur in the image. To estimate p(f_{s_α}, f_{s_β}) one can use p(i, j; d), the relative frequency with which two pixels at a distance d, measured in some metric, have gray-levels i and j respectively. This way all local information is lost, in that the indices i, j now
refer to gray-levels, which can be supposed to be integers, and are not pixel coordinates. In the following it will be assumed that d = 1, measured in the maximum-value metric (Duda and Hart, 1973); that is, for every pixel we consider an 8-neighbourhood, and we shall write p(i, j) instead of p(i, j; 1).
The relative frequencies p(i, j), a measure of the dependence between gray-levels i and j, define a symmetric non-negative matrix G = [p_{i,j}]. H(F_{s_α}, F_{s_β}) is now replaced by

H_g = -\sum_{i,j} p(i, j) \log p(i, j).    (20)
It should be noted that the matrix G defined here is basically the average over a neighbourhood of each element of the co-occurrence matrix (Haralick and Shapiro, 1992) used to classify textures.
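The construction of G and H_g can be sketched as follows (a toy Python version of the procedure just described; the test images are arbitrary choices, not the paper's data):

```python
import math

# Sketch: build the matrix G of relative frequencies p(i, j) over
# 8-neighbourhoods (d = 1 in the maximum-value metric) and compute H_g.
def cooccurrence_entropy(img):
    h, w = len(img), len(img[0])
    counts, total = {}, 0
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        pair = (img[y][x], img[ny][nx])
                        counts[pair] = counts.get(pair, 0) + 1
                        total += 1
    G = {pair: c / total for pair, c in counts.items()}
    Hg = -sum(p * math.log(p) for p in G.values())
    return G, Hg

flat = [[7] * 4 for _ in range(4)]        # uniform image
_, Hg_flat = cooccurrence_entropy(flat)
print(Hg_flat)                            # zero: G collapses to one element

edge = [[0, 0, 9, 9]] * 4                 # vertical step edge
_, Hg_edge = cooccurrence_entropy(edge)
print(Hg_edge > 0)                        # off-diagonal mass raises H_g
```

The uniform image reproduces the limiting case discussed below, where all non-zero mass of G collapses onto a single element and H_g vanishes.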
Images with different statistics give rise to specific matrices G. Consider for
instance a random image; all intensity levels appear with the same frequency and the
occurrence of a gray-level, say i, at pixel s_α is independent from the occurrence of gray-level j at a pixel s_β in a neighbourhood of s_α. Then p(i, j) = p(i) p(j), with p(i) = p(j), and all elements of the matrix should have the same value. Correspondingly, H_g must be relatively high. In practice, what must be expected is a
sparse matrix, with relatively high values of p(i, j) occurring randomly at locations
(i, j ). An example is shown in Fig. 1.
In the opposite case, when the image has uniform intensity, say i_0, p(i_0, j) = δ_{i_0,j}; hence, all matrix elements with values different from zero collapse into just one point (i_0, i_0); obviously, in this case, H_g = 0. Consider a pixel s_α: if the grey-level difference between s_α and its neighbors s_ν is small (as in the case of a smooth edge or a
Figure 1. (a) Random image. (b) Matrix G. For visualization purposes the elements p(i, j) of the matrix have been quantized by using uniform quantization within the display gray-level interval [0, 255]. Here the highest values of p(i, j) are represented by dark points.
uniform patch), the (s_α, s_ν) pairs contribute to diagonal or almost-diagonal elements. On the contrary, if s_α bears a high edge value, at least some of its pairs contribute to elements g_{ij} that lie far from the diagonal of G.
In the following, the analysis of more complex images will be presented. We
will see in simulation examples that 'structured' information appears at the onset of
long-range anisotropic correlations, which will be reflected within the matrix G as a reinforcement of the near-diagonal elements.
INTERACTIONS ACROSS SCALES
In the previous section we observed that interactions evolve across scales of
decreasing resolution. Here, this evolution will be simulated by looking at interactions along different scales generated by a diffusion equation.
Diffusion tends to smooth out the image so that, in the limit t → ∞, the image is just uniform over the domain D. Note that we are dealing with linear, isotropic diffusion that induces a smoothing everywhere in the image, and in particular at the
edges. However, edges corresponding to boundaries between parts of the surface last
longer, as t increases, than edges due to noise or texture (Witkin, 1983). Therefore, it can be said that diffusion produces isotropic long-range interactions, by enlarging
regions of almost constant intensity. If t is not too large, anisotropic long-range interactions along boundaries are preserved, whereas in the limit t → ∞ only the basic long-range interaction, corresponding to the average intensity, is preserved.
What is the effect of this evolution on the joint probabilities p(i, j)? It is clear that it increases the conditioning; in the limit t → ∞, p(i, j) = p(j | i) p(i) = p(i) and also p(i) = δ_{i,i_0}. Thus, we must expect the matrix G to have just one non-zero element and correspondingly H_g to go to zero:

\lim_{t \to \infty} H_g(t) = 0.    (21)
It is of interest to follow the evolution of G in some detail, since new features can
be revealed.
Two examples are shown here, the 'rabbit behind the fence' and the 'rock over a
paper'; they present all types of interaction we have considered so far. Numerical
simulations have been performed by discretizing the diffusion equation (3) according to classical difference schemes (centered differences for the spatial derivatives and forward differences for the time derivative). At each iteration, the matrix G and H_g have been computed. In the following, to stress the dependence on the level of resolution, we shall write G(t) and H_g(t).
Consider Fig. 2. Here, it can be seen that the diffusion process deletes short-range anisotropic interactions (the texture regions corresponding to the newspaper columns), while preserving, for a certain interval of scales, long-range anisotropic ones (edges defining the silhouette of the stone). Such a perceptual effect is reflected
Figure 2. (a) 'Rock over a paper' image at different scales of resolution, represented by iterations of the diffusion process. From top to bottom each row corresponds respectively to t = 0, 50, and 200 iterations. (b) Matrix G at the same scales.
Figure 3. (a) 'Rabbit behind the fence' image at different scales of resolution, represented by iterations of the diffusion process. From top to bottom each row corresponds respectively to t = 0, 50, and 200 iterations. (b) Matrix G at the same scales.
in the behavior of the matrix G. In the top picture representing G at t = 0, the vertical stripes account for non-uniform regions, while near-diagonal elements
represent uniform parts. Stronger edges are indicated by off-diagonal top right non-zero elements. In the central image, t = 50, the vertical stripes have been
blurred and the near-diagonal elements enhanced, while elements at the edge still
persist. It is worth remarking that the sharp partition of the diagonal elements
corresponds to the visually noticeable object/background 'segmentation effect', due
to the coarse graining process. Such an overall effect is also given in the bottom
picture representing G(200). Similar effects can be observed in Fig. 3.
The original image used in this example is particularly significant for describing the perceptual effects of short-range interactions vs long-range anisotropic interac-
tions. The former occur in the regularly textured fence, the latter in the rabbit's and
cage's shapes. In the original image the rabbit's outline is hidden by the fence. This
can be noted considering G(0), which appears as a sparse matrix with a 'shadow'
diagonal. After 50 iterations (middle row) the fence is almost completely blurred, and G(50) is much more diagonalized with respect to G(0). The bottom row amplifies this effect, but the diagonal elements of G(200) are fewer than those of G(50), clearly indicating that isotropic long-range interactions are becoming prevalent. One could summarize the perceptual effect due to the various types of interactions
by comparing the overall behavior of the 'rock over a paper' image against the 'rabbit behind the fence', under the diffusion process. This has been done by computing H_g as a function of t; as expected, H_g decreases as t increases (see Fig. 4).
This procedure can obviously be applied to processes different from isotropic,
linear diffusion. For instance, if one applies anisotropic diffusion to Fig. 1, the result is the formation of a more ordered structure, where longer-range interactions emerge.
Correspondingly, the non-zero elements of the matrix G tend to be aligned along the main diagonal, as represented in Fig. 5.
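The whole procedure, diffusion plus the entropy trace H_g(t), can be prototyped in a few lines (a toy reconstruction, not the authors' code; image size, quantization levels and iteration count are arbitrary choices):

```python
import math, random

# End-to-end toy experiment: diffuse a noisy image and track the joint
# entropy H_g(t), which decreases as the resolution coarsens.
def diffuse_step(img, dt=0.2):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lap = (img[max(y - 1, 0)][x] + img[min(y + 1, h - 1)][x]
                   + img[y][max(x - 1, 0)] + img[y][min(x + 1, w - 1)]
                   - 4 * img[y][x])
            out[y][x] = img[y][x] + dt * lap
    return out

def Hg(img, levels=8):
    """Quantize, build 8-neighbourhood pair frequencies, return entropy."""
    q = [[min(int(v * levels), levels - 1) for v in row] for row in img]
    h, w = len(q), len(q[0])
    counts, total = {}, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)
                           if (a, b) != (0, 0)]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    pair = (q[y][x], q[ny][nx])
                    counts[pair] = counts.get(pair, 0) + 1
                    total += 1
    return -sum((c / total) * math.log(c / total) for c in counts.values())

random.seed(0)
img = [[random.random() for _ in range(12)] for _ in range(12)]
h0 = Hg(img)                 # high entropy: pairs are spread over G
for _ in range(60):
    img = diffuse_step(img)
hT = Hg(img)                 # lower entropy: G concentrates near the mean
print(h0, hT)
```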
Figure 4. Joint entropies as a function of iterations of the diffusion process for the 'rock over a paper' (thick line) and 'rabbit behind the fence' (thin line), respectively. The range of iterations is [0, 200].
Figure 5. (a) Result of anisotropic diffusion applied to the image of Fig. 1, after 100 iterations. (b) Corresponding G matrix.
Clearly, a similar analysis can be applied to patterns formed through generalized
anisotropic diffusion processes such as the Gibbs reaction-diffusion proposed by Zhu and Mumford (1997).
CONCLUSIONS
In this paper we have presented a framework to characterize interactions in images as related to the processes of image encoding by the visual system. As remarked in
the Introduction, the key issue of the work concerns the way interactions are related
to image entropy. The main results can be summarized in the following points.
(1) The notion of interaction as a processing relation has been addressed. Under
the assumption that the image together with the chosen transformation (the
observer) define a dynamic system, any interaction is related to the operations
performed upon the image along the transformation. Namely, any interaction
induces a conditioning among the random variables constituting the intensity distribution field, over the succession of events generated by the transformation.
(2) It has been shown that interactions reduce the entropy of the images. From an
information-theoretic perspective, they reduce the uncertainty the visual system must overcome to generate a representation of the external world via the image. This is a general result which does not depend on the actual law realizing the
conditioning. On the other hand, when the physical law is shaped in the form
of a suitable a priori model of the image, then interactions can be expressed in terms of a very large class of dynamical systems, for instance diffusion
processes.
(3) By choosing a diffusion process, the relation between multi-range interactions
and image resolution has been stated. A computational procedure has been
proposed to make explicit the link between the notion of interaction, resolution, and that of entropy associated to the image.
Regarding this latter point, it is worth making some comments. The structure of the matrix G provides information on the relevance of interactions of different ranges, at various scales of resolution. Its evolution clarifies the increasing preponderance of long-range interactions when the resolution is decreased. Thus, the procedure presented here allows one to gauge the various contributions on the basis of their persistence across scales. Such persistence is reflected in the evolution of G, whereas H_g provides a global measure of entropy. Results of the simulations
show that the structure of the images depends on interactions of different ranges, each corresponding to some feature of the image. It must be noted that some
interactions are not readily apparent at a given scale. For instance, the total (or
average) intensity of the image can be considered as corresponding to the basic
long-distance interaction, since the constraint it imposes affects all pixels; however, it can be revealed only when the resolution is coarse enough, that is when t is large. In general the transition across different scales of resolution causes interactions of
a certain range to disappear while revealing new interactions of a longer range.
Correspondingly, the matrix G changes its structure. In this case, the emergence of long-range interactions is represented by the fact that most p(i, j) different from
zero are distributed along the main diagonal; if only the basic long-range interaction
is present, the main diagonal collapses in a single value different from zero.
The symmetric relationship between multi-resolution and multi-range interactions
suggests also that, since the visual system operates over multiple scales, all
interactions are relevant in the processing of visual information.
The diffusion process is actually effective at modelling the coupling or spread of signals between neighboring retinal cells and at generating Gaussian-weighted receptive fields. Further, it is efficient, since it requires information only from
immediately neighboring cells, and thus is relatively simple to implement. Due to
such characteristics, diffusion has been recently adopted (Shah and Levine, 1996) to model different kinds of retinal processes such as cone coupling within the fovea
(Tsukamoto et al., 1992), generation of horizontal cell receptive fields through cone output diffusion (Naka, 1982), and integration of horizontal cell outputs
performed by bipolar cells (Rodieck, 1973). In this paper, isotropic diffusion was adopted since we are representing an
uncommitted first stage of processing, aimed at handling a large class of different
situations in which no or very little prior knowledge is available. The principle here is to distinguish bare information used at the very early perceptual level from later stages, when additional information becomes available to be used for tuning the front-end processing to the more specific tasks at hand. Clearly, it is possible to
invoke more refined processing which can provide such structure-oriented tuning.
In this respect, Zhu and Mumford (1997) have recently proposed an elegant and generalized form of the energy described in equation (18). Namely,

E(f) = \sum_{k=1}^{K} \sum_{s \in D} \phi^{(k)}\big((F^{(k)} * f)(s)\big),

where each F^{(k)} is an element of a set of K suitable filters or transformations. This generalization does not change the essence of the problem from the standpoint of the work described here. In fact, given the random variable F_{s_α}, the entropy of a function of F_{s_α}, say g(F_{s_α}), is less than or equal to the entropy of F_{s_α} (Cover and Thomas, 1991), namely

H(g(F_{s_α})) \le H(F_{s_α}).    (22)
Therefore, this model can also be interpreted theoretically as providing a dynamical system based on entropy reduction through conditioning, and investigated experimentally as was done for anisotropic diffusion at the end of the previous section (Fig. 5).
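The entropy inequality H(g(F_\sigma)) \le H(F_\sigma) can be checked numerically on any discrete sample; the following illustrative Python sketch (variable names are ours) applies a many-to-one function to a random variable and compares entropies:

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of a discrete sample."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=10_000)   # discrete random variable X
g = x // 4                             # g(X): a many-to-one function of X
# H(g(X)) <= H(X): a deterministic function cannot create entropy.
print(entropy(g) <= entropy(x))        # -> True
```

Here the 16 intensity levels are coarsened to 4, so roughly two bits of entropy are discarded, mirroring the entropy reduction through conditioning discussed above.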
Notwithstanding the context-free nature of the proposed approach, it has been
possible to characterize interactions that depend on range and isotropy, which give rise to the three basic types of image parts: smooth regions, textures and continuous
curves over a relatively long range. It is interesting to note that the visual system seems to be tuned to process these types of interactions, in stages subsequent to
retinal processes. Experimental results reported by von der Heydt et al. (1991, 1992) show the existence of two types of neurons in areas V1 and V2 of the visual cortex
of monkeys, which they named grating cells and bar cells. Grating cells respond to a grating of bars of appropriate orientation, position and periodicity, while being inhibited with respect to single bars. On the contrary, bar cells respond strongly to single bars, but their response decreases when further bars are added. Thus, bar
and grating cells seem to play complementary roles; bar cells are selective only for
form information as present in contours, while grating cells only respond to texture
information. Of course, far-reaching conclusions should not be drawn from such qualitative similarities, and the remaining questions concerning biological plausibility are left to the reader's speculation and to further research. However, as a final and
general comment, it is worth stressing the many insights that can be achieved when
basic aspects of visual processing are handled within an information-theoretical
framework.
NOTE
1. It should be noted that in the case of a limited domain the solution of the diffusion equation is not given by equation (2) (Morse and Feshbach, 1953); however, numerical experiments show that the difference is very small (Niessen et al., 1994).
Acknowledgements
The authors are grateful to the referees, whose enlightening and valuable comments have greatly improved the quality and clarity of an earlier version of this paper.
REFERENCES
Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Prentice Hall, Englewood Cliffs, NJ.
Blake, A. and Zisserman, A. (1987). Visual Reconstruction. MIT Press, Cambridge, MA.
Caelli, T. M., Preston, G. A. N. and Howell, E. R. (1978). Implication of spatial summation models for processes of contour perception: a geometric perspective, Vision Research 18, 723-734.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York, NY.
Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York, NY.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I. Wiley, New York, NY.
Frieden, B. R. (1972). Restoring with maximum likelihood and maximum entropy, J. Optical Soc. America 62, 511-518.
Geiger, D. and Yuille, A. (1991). A common framework for image segmentation, Int. J. Computer Vision 6, 401-412.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. and Machine Intell. 6, 721-741.
Grimson, W. E. L. (1981). From Images to Surfaces. MIT Press, Cambridge, MA.
Haralick, R. M. and Shapiro, L. G. (1979). Computer and Robot Vision, Vol. 1. Addison-Wesley, Reading, MA.
Helmholtz, H. (1925). Physiological Optics, Vol. III: The Perception of Vision. Optical Society of America, Rochester, NY (1st edn, in German, 1910).
Hoffman, W. C. (1970). Higher visual perception as prolongation of the basic Lie transformation group, Math. Biosci. 6, 437-471.
Koenderink, J. J. (1984). The structure of images, Biol. Cybern. 50, 363-370.
Koffka, K. (1935). Principles of Gestalt Psychology. Harcourt Brace, New York, NY.
Lindeberg, T. (1994). Scale-Space Theory in Computer Vision. Kluwer Academic Press, Dordrecht, The Netherlands.
Marroquin, J. L. (1985). Probabilistic solutions of inverse problems. PhD Thesis, MIT.
Morse, P. M. and Feshbach, H. (1953). Methods of Theoretical Physics. McGraw-Hill, Toronto.
Naka, K. (1982). The cells horizontal cells talk to, Vision Research 22, 653-661.
Niessen, W., ter Haar-Romeny, B. M. and Viergever, M. (1994). Numerical analysis of geometry-driven diffusion equations, in: Geometry-Driven Diffusion in Computer Vision, B. M. ter Haar-Romeny (Ed.), pp. 393-410. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Nitzberg, M. and Shiota, T. (1992). Nonlinear image filtering with edge and corner enhancement, IEEE Trans. Pattern Anal. and Machine Intell. 14, 826-833.
Perona, P. and Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. and Machine Intell. 12, 629-639.
Rodieck, R. (1973). The Vertebrate Retina. W. H. Freeman, San Francisco, CA.
Shah, S. and Levine, M. D. (1996). Visual information processing in primate cone pathways - Part I: A model, IEEE Trans. System Man and Cybern. 26, 259-274.
Terzopoulos, D. (1983). Multilevel computational processes for visual surface reconstruction, Comp. Vis., Graphics and Image Process. 24, 52-96.
Tsukamoto, Y., Masarachia, P., Schein, S. and Sterling, P. (1992). Gap junctions between the pedicles of Macaque foveal cones, Vision Research 32, 1809-1815.
von der Heydt, R., Peterhans, E. and Dursteler, M. R. (1991). Grating cells in monkey visual cortex: coding texture, in: Channels in the Visual Nervous System: Neurophysiology, Psychophysics and Models, B. Blum (Ed.), pp. 53-73. Freund, London.
von der Heydt, R., Peterhans, E. and Dursteler, M. R. (1992). Periodic-pattern-selective cells in monkey visual cortex, J. Neurosci. 12, 1416-1434.
Wandell, B. A. (1995). Foundations of Vision. Sinauer Associates, Sunderland, MA.
Wertheimer, M. (1938). Laws of organization in perceptual form, in: A Source Book of Gestalt Psychology, W. D. Ellis (Ed.), pp. 71-88. Harcourt Brace, New York, NY.
Witkin, A. P. (1983). Scale space filtering, in: Proc. Intern. Joint Conf. Artif. Intell., Karlsruhe, Germany, pp. 1019-1023.
Zhu, S. C. and Mumford, D. (1997). Prior learning and Gibbs reaction-diffusion, IEEE Trans. Pattern Anal. and Machine Intell. 19, 1236-1250.