
An information-theoretic approach to interactions

in images

GIUSEPPE BOCCIGNONE 1 and MARIO FERRARO2,*

1 Dipartimento di Ingegneria dell'Informazione e Ingegneria Elettrica, Universitá di Salerno and INFM, Unità di Salerno, via Ponte Don Melillo, 1, 84084 Fisciano (SA), Italy

2 Dipartimento di Fisica Sperimentale, Universitá di Torino and INFM, Unità di Torino, via Giuria 1, 10125 Turin, Italy

Received 23 July 1998; accepted 23 February 1999

Abstract-In this paper it will be argued that the notion of interactions in images is closely related to that of entropy associated with an image, and it will be shown that interactions make processing of the information coming from the retina computationally less expensive. A procedure will be presented, based on the evolution of joint entropy across different scales, to gauge the contributions of different

types of interactions to the structure of the images.

INTRODUCTION

The human visual system is very good at capturing and processing visual information. Visual patterns appear, in general, to have a well-defined structure and

information is commonly associated with the amount of form/structure present in a

pattern. The rationale behind this paper is that structure naturally emerges by considering

an act of perception as a physical measurement - an experiment - upon a given

pattern. Briefly, the pattern is modified by some physical transformation (which

represents the observer), and the pattern, together with the chosen transformation, defines a dynamic system whose evolution in time can be characterized in terms of

some suitable physical parameters. In this framework, the structure is naturally a result of the way pattern elements interact with each other along the transformation; by interaction we mean the constraints by which intensity values at a given pixel q depend on the values the intensity takes in some neighborhood of q.

*To whom correspondence should be addressed. E-mail: [email protected]


Thus, it is clear that in the image per se, considered as a distribution of light

intensity over a set of pixels, there are no interactions, the latter being the result of

an operation performed by the visual system on the image. However, the emergence of interactions depends on the way light intensity is distributed in the original image and on the existence of dependencies/correlations between parts of the image. For

instance, consider a pixel with a given level of intensity; if all other pixels in a

certain region have the same intensity, it can be said that there exists a long-range correlation among pixels. In this case, of course, the dependency is not related

to a particular orientation, i.e. it is isotropic. However, there exist long-range correlations that are not isotropic: for example, all pixels forming a curve are

linked by a long-range dependency, occurring only in the direction of the curve.

By contrast, short-range correlations have effect only in small regions, where image

intensity varies sharply over relatively short distances.

In general, short-range interactions can only account for local features of the image, whereas global features depend on long-range interactions. Thus the dichotomy between long- and short-range interactions seems to mirror the distinction between

local and global features. The work of Gestalt psychologists on grouping shows the

relevance of non-local constraints (Koffka, 1935; Wertheimer, 1938), that would

correspond, in the framework of this paper, to long-range isotropic interactions.

Note also that long-range interactions can be the result of propagation of short-range ones, provided that some constraints are imposed. For instance, vector fields

generated by local intensity gradients produce curves via integration, but to do so

they must have the property of holonomy; in other words, local vectors must be

aligned 'head to tail' rather than scattered across the visual field (Hoffman, 1970). Local edges can be thought of as generated by short-range interactions, whereas the

curve itself arises as a result of the constraint of holonomy, that links the elements of

the tangent field in an orderly way, establishing a long-range interaction. Similarly, contour perception involves a form of grouping or ordering of some elements of the

visual input to connect them in a line, that is interactions among simple elements

of the contour (Caelli et al., 1978); in this case interactions are non-local as well as

anisotropic. The visual system is certainly optimal in computing dependencies or correlations

occurring at different ranges. If the visual system could only compute local

functionals, as with feature detectors, its performance would be rather limited in

evaluating long-range interactions. This problem is well known in computer vision

where the action of local-edge operators produces a disconnected set of local edges and further processing is needed to link these elements in curves that correspond to

meaningful boundaries in the scene (see, for instance, Ballard and Brown, 1982).

Clearly, if the system can adapt to compute non-local functionals of image intensity, no such drop of performance would be observed. This idea has suggested models

and experiments in which the overall performance is a function of the parameters which control the relevant correlation lengths. The same rationale lies behind many

image models used in computer vision, such as the weak membrane or the thin plate


(Grimson, 1981; Terzopoulos, 1983; Blake and Zisserman, 1987), or others based

on Markov random fields (Geman and Geman, 1984; Marroquin, 1985; Geiger and

Yuille, 1991), where energy functions are used to model interactions among pixels.

Recently, some generalization has been provided by Zhu and Mumford (1997).

However, all the above mentioned approaches, while providing models useful

for different kinds of processing capabilities, do not answer a very fundamental

question: in which sense is a dynamical system, set up according to any of those

models, an information processing system? A first contribution of this paper is to show how the notion of interaction is not

just useful to produce models of images, but has deeper implications concerning the processing of information by the visual system. In particular, interactions

established along any kind of transformation, induce a conditioning between pixels that reduces the entropy, i.e. the uncertainty, of the image. Intuitively, interactions

can be considered as signals which encode the exchangeability of pixels, reducing image entropy and making the processing of the information coming from the retina computationally less expensive.

In the next section we will discuss how such interactions constrain the complexity of early visual processing through an entropy reduction mechanism, and how

interactions can be driven by suitable laws.

A second important point we address is how multi-range interactions relate to

the fact that the visual system analyzes images at different resolutions (Wandell,

1995). In general, the research on the human visual system works on the basis

that visual information is processed in parallel by a number of spatial-frequency-tuned channels and that, in the visual pathway, filters of different sizes operate at the

same location. The relationship between interactions and resolution can be precisely stated as follows.

Let t, a non-negative parameter, denote the resolution, which is assumed to decrease as t increases. The intensity I is defined as a function on D × R⁺, I: ((x, y), t) ↦ I(x, y, t), D ⊂ R² being the domain of the image. In general, the image intensity at scale t can be derived from the original image I(x, y, 0) as

I(x, y, t) = ∫_D k(x, y; x', y'; t) I(x', y', 0) dx' dy',     (1)

where k is the Green function which measures the influence of pixels q' = (x', y') on the pixel q = (x, y) at a given scale, so that the intensity at q = (x, y) depends on the intensity in some neighborhood of q.

The kernel k can be defined in a way to model interactions in the image at different scales. If the domain of integration is infinite, then the integration simplifies to a convolution,

I(·, ·, t) = k(·, ·; t) * I(·, ·, 0).     (2)


If, further, k is a Gaussian with variance σ² = 2t (for a discussion of this equality, see Lindeberg, 1994), equation (2) is the solution of the diffusion equation

∂I/∂t = ∇²I.     (3)
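The behaviour of equation (3) can be checked with a small numerical sketch (a minimal pure-Python illustration, not the authors' code), using the discretization adopted later for the simulations: centered differences for the spatial derivatives and forward differences for the time derivative. Diffusion blurs a step edge while conserving the average intensity:

```python
def diffuse(img, iters, dt=0.2):
    """Explicit scheme for dI/dt = Laplacian(I): centered differences in
    space, forward differences in time, reflecting (Neumann) boundaries."""
    h, w = len(img), len(img[0])
    for _ in range(iters):
        nxt = [row[:] for row in img]
        for y in range(h):
            for x in range(w):
                lap = (img[max(y - 1, 0)][x] + img[min(y + 1, h - 1)][x]
                       + img[y][max(x - 1, 0)] + img[y][min(x + 1, w - 1)]
                       - 4 * img[y][x])
                nxt[y][x] = img[y][x] + dt * lap
        img = nxt
    return img

# A step edge: diffusion blurs the discontinuity but, with reflecting
# boundaries, conserves the average intensity.
img0 = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
img1 = diffuse(img0, 25)

mean0 = sum(map(sum, img0)) / 64
mean1 = sum(map(sum, img1)) / 64
contrast0 = max(map(max, img0)) - min(map(min, img0))
contrast1 = max(map(max, img1)) - min(map(min, img1))
print(contrast1 < contrast0)  # → True: the edge has been smoothed
```

The time step dt = 0.2 keeps the explicit scheme stable (it requires dt ≤ 0.25 on a unit grid).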

Equation (3) has been widely used in computer vision to determine the behavior of

images at different scales (Witkin, 1983; Koenderink, 1984; Lindeberg, 1994). It is worth noting that kernels of different shapes can be related to different

types of interactions. For instance, the emergence of anisotropic interactions can

be obtained by suitably deforming the shape of the Gaussian kernel; Nitzberg and

Shiota (1992) show that an elongated Gaussian kernel appears as the fundamental

solution of an anisotropic diffusion equation, such as the one proposed by Perona

and Malik (1990). Analogously, the 'Mexican hat' filter, well known in the literature, can be used to produce local edge signatures. More generally, kernels apt to

reproduce reaction-diffusion processes could be designed (Zhu and Mumford,

1997). In the following section, 'Interactions across scales', we further investigate the

above relationship between interactions and resolution. We also provide some

operative procedure and simulations to visualize such a connection. In this paper we will limit our investigation to linear diffusion, since this process is more suitable

to represent early, context-free, visual information processes; however, an example

relating to non-linear diffusion will be presented.

Finally, in the Concluding remarks, we discuss overall results so far achieved and

highlight some links to biological vision issues.

INTERACTIONS AND INFORMATION

In this section, we show, from an information-theoretic standpoint, how, during the

very first stage of a visual processing system, correlations or dependencies among

parts of the image serve for an overall complexity reduction mechanism.

To this purpose, it is worth recalling that dependencies find their origins in the

regularity of states of the world. Thus, for instance, matter coheres to form

objects because of interactions at the molecular or atomic level, and the nature

of intermolecular forces constrains the shapes objects can take, so that shapes of

natural objects are not arbitrary. The same kind of forces determine also the physical

properties of the surfaces, e.g. albedo or roughness. Also, configurations or relative

position of objects in a scene obey physical laws and, to some extent, it can be

said that the enormous amount of structure of the physical world determines the

regularities of images. Finally, object surfaces project light onto the retina, and

form images, according to the laws of optics, such as reflection, refraction, and

diffraction.

To perform the visual perception process, the visual system must be able to analyze the intensity distribution on the retina and extract properties of the image. Such


a process has been described as a process of unconscious inference (Helmholtz,

1925).

Clearly, the task of analyzing the intensity distribution on the retina and extracting

properties of the image would be of enormous complexity if dependencies between

different parts of the image were not to be taken into account.

Consider the domain D to be a lattice of N sites (pixels in our case) and let the intensity at each pixel s be given by the number m_s of photons reaching s (Frieden, 1972); thus the image is determined by their distribution. To each pixel s is then associated a number of photons m_s whose distribution can be seen as the realization of a random field Φ. More formally, let Φ = {F_s, s ∈ D} be a family of random variables indexed by s ∈ D. Suppose, for simplicity, that the variables F_s = F_s(t) take values on some finite set J, depending on the realization of events t_0, t_1, t_2, ..., each event being the outcome of an experiment. A possible sample f of Φ is denoted by

f = (f_{s_1}, f_{s_2}, ..., f_{s_N}),

and it is called a configuration of the field.

In our case, then, f_s is the random number m_s of photons impinging on pixel s.

If no interactions are taken into account, we can assume that the m photons are randomly allocated one at a time among the N sites, with uniform spatial probability (this model is often referred to as the 'monkey model'). Thus, the total number of photons is m = Σ_s m_s. For any given m, the number of possible configurations is (Feller, 1968):

C(N + m − 1, m),

C(·, ·) denoting the binomial coefficient.

Suppose now that m ∈ J = {0, 1, ..., M}, where M is the maximum number of photons that can reach the retina. Then, the number Π of possible configurations is:

Π = Σ_{m=0}^{M} C(N + m − 1, m)

(Feller, 1968). Define the entropy associated to the field Φ as

H(Φ) = − Σ_f p(f) log p(f),

where p(f) is the probability that the configuration f occurs, and the sum runs over all possible configurations f. Note that if all configurations have the same probability, p(f) = 1/Π, then

H(Φ) = log Π.
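The counting above can be made concrete with a short sketch (pure Python; the occupancy formula C(N + m − 1, m), counting configurations of m indistinguishable photons over N pixels, is our reading of the Feller reference, and the tiny values of N and M are illustrative):

```python
import math

def n_configs(N, m):
    """Number of distinguishable configurations of m indistinguishable
    photons over N pixels (classical occupancy count)."""
    return math.comb(N + m - 1, m)

def total_configs(N, M):
    """Total count Pi when the number of photons ranges over 0..M."""
    return sum(n_configs(N, m) for m in range(M + 1))

# With all Pi configurations equally probable, the field entropy
# H = -sum_f p(f) log p(f) reduces to log Pi.
N, M = 4, 3  # illustrative, tiny values
Pi = total_configs(N, M)
H_uniform = -sum((1.0 / Pi) * math.log(1.0 / Pi) for _ in range(Pi))
print(Pi, abs(H_uniform - math.log(Pi)) < 1e-9)  # → 35 True
```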


Denote by p(f_s) = Pr{F_s = f_s} the probability distribution of photons at site s. Then, the entropy of the random variable F_s is given by

H(F_s) = − Σ_{f_s} p(f_s) log p(f_s),

and it is known from information theory that

H(Φ) ≤ Σ_{s ∈ D} H(F_s)     (9)

(Cover and Thomas, 1991). In the inequality (9), the equality holds if, and only if, the random variables F_s are independent, that is

p(f) = Π_{s ∈ D} p(f_s).

If p(f_s) is constant for every f_s, then p(f_s) = 1/M and H(F_s) = log M, for all s. As a consequence, if there are no interactions in the image, all F_s are independent and hence the entropy of the image is

H_0 = N log M.

It has been stressed in the Introduction that in the image per se there are no

interactions among pixels, and that the latter arise as a consequence of processing of visual information. Then, H_0 can be considered the entropy of the image before

any processing. In general, the operations performed on the original image result in

the emergence of interactions and hence in the decrease of the entropy. The previous, informal, definition of an interaction as the constraint by which

intensity values at a given pixel q depend on the values the intensity takes in some

neighbourhood of q, along a transformation, can now be stated more precisely. Given the sites of the image s_α and s_β, α ≠ β, an interaction is a processing relation which induces a conditioning p(f_{s_α} | f_{s_β}) = Pr{F_{s_α}(t) = f_{s_α} | F_{s_β}(t) = f_{s_β}} between the random variables F_{s_α}(t) and F_{s_β}(t) over the succession of events t_0, t_1, t_2, .... The entropy reduction effect of such conditioning can be described in detail as

follows.

Suppose that there is no interaction between pixels in the image. Given pixels s_α and s_β, the probability p(f_{s_α}) is independent from the probability p(f_{s_β}) and hence the joint probability p(f_{s_α}, f_{s_β}) is given by

p(f_{s_α}, f_{s_β}) = p(f_{s_α}) p(f_{s_β}).     (10)

From (10) it follows

log p(f_{s_α}, f_{s_β}) = log p(f_{s_α}) + log p(f_{s_β}).     (11)


Taking the expectations of both sides of equation (11) and using the fact that Σ_{f_s} p(f_s) = 1, it is straightforward to see that

H(F_{s_α}, F_{s_β}) = H(F_{s_α}) + H(F_{s_β})     (12)

(Cover and Thomas, 1991), where H(F_{s_α}, F_{s_β}) is the joint entropy

H(F_{s_α}, F_{s_β}) = − Σ_{f_{s_α}, f_{s_β}} p(f_{s_α}, f_{s_β}) log p(f_{s_α}, f_{s_β}).

If interactions occur among points of the image, they make the probabilities

dependent upon each other. For instance, if the random variable F_{s_α} depends on F_{s_β}, then equation (10) becomes

p(f_{s_α}, f_{s_β}) = p(f_{s_α} | f_{s_β}) p(f_{s_β}).

Following the same line of reasoning as above we find:

H(F_{s_α}, F_{s_β}) = H(F_{s_α} | F_{s_β}) + H(F_{s_β}),     (15)

where H(F_{s_α} | F_{s_β}) is the conditional entropy

H(F_{s_α} | F_{s_β}) = − Σ_{f_{s_α}, f_{s_β}} p(f_{s_α}, f_{s_β}) log p(f_{s_α} | f_{s_β}).

Now, it is well known that conditioning reduces entropy, i.e. H(F_{s_α} | F_{s_β}) ≤ H(F_{s_α}) (Cover and Thomas, 1991). Therefore, the joint entropy of equation (15) is reduced

with respect to that defined in equation (12). It is worth noting, at this point, that

the definition of interaction as a reciprocal conditioning of the kind p(f_{s_α} | f_{s_β}) is general. The definition does not depend on the specific realization of the interaction, that is, on the visual operation performed.
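The entropy-reduction argument can be verified on a toy joint distribution (an illustrative sketch; the numerical values are arbitrary): the chain rule gives the conditional entropy, and conditioning makes the joint entropy smaller than the sum of the marginal entropies:

```python
import math

def H(dist):
    """Shannon entropy of a probability mass function given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Toy joint distribution for the gray levels at two neighbouring pixels,
# with a strong dependence (the values are arbitrary, for illustration).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_a = {i: sum(p for (a, _), p in joint.items() if a == i) for i in (0, 1)}
p_b = {j: sum(p for (_, b), p in joint.items() if b == j) for j in (0, 1)}

H_joint = H(joint)
H_a, H_b = H(p_a), H(p_b)
H_a_given_b = H_joint - H_b  # chain rule: H(A, B) = H(A | B) + H(B)

# Conditioning reduces entropy, so the joint entropy falls short of the
# independent-pixels value H(A) + H(B).
print(H_a_given_b < H_a, H_joint < H_a + H_b)  # → True True
```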

One such operation is the choice of appropriate a priori models of the image. For

instance, one can model the image, namely the random field Φ, as a Markov

random field. In this case, the conditional probability of a chosen variable having a particular value, given the values of the rest of the variables, is identical to the

conditional probability, given the values of the field 0 in a small set of pixels. The latter is usually called the neighborhood of the given pixel (see for instance

Marroquin, 1985). By choosing a system of neighborhoods on the lattice of sites, we can define a clique C as either a single pixel, or a set of pixels so that all

pixels belonging to C are in the neighborhood of each other. It is possible to show

that the probability distribution of the configurations f has the form of the Gibbs distribution

p(f) = Z^{-1} exp(−β E(f)),     (17)


where Z is a normalizing constant and β is a control parameter. The 'energy function' E is of the form

E(f) = Σ_C V_C(f),     (18)

where C ranges over the cliques associated with the given neighborhood system and the potentials V_C are functions supported on them. The function V_C defines the interactions among pixels and can be chosen in a way to generate the appropriate probability distribution of the configuration f. The relation between V_C and the conditional probabilities is given by

p(f_s | f_r, r ∈ N_s) = Z'^{-1} exp(−β Σ_{C: s ∈ C} V_C(f)),     (19)

where N_s denotes the neighborhood of the pixel s and Z' is a normalization factor (see Marroquin, 1985).
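A minimal sketch of such a Gibbs model (illustrative assumptions: a 2 × 2 binary lattice, pair cliques only, and a smoothness potential V_C = (f_s − f_r)²) shows how the potentials make smooth configurations more probable than rough ones:

```python
import itertools
import math

BETA = 1.0  # illustrative value of the control parameter

SITES = [(0, 0), (0, 1), (1, 0), (1, 1)]          # 2 x 2 lattice
CLIQUES = [((0, 0), (0, 1)), ((1, 0), (1, 1)),    # horizontal pairs
           ((0, 0), (1, 0)), ((0, 1), (1, 1))]    # vertical pairs

def energy(f):
    """E(f) = sum over cliques of V_C(f), with V_C = (f_s - f_r)^2."""
    return sum((f[s] - f[r]) ** 2 for s, r in CLIQUES)

# Enumerate every binary configuration and build the Gibbs distribution.
configs = [dict(zip(SITES, values))
           for values in itertools.product((0, 1), repeat=len(SITES))]
Z = sum(math.exp(-BETA * energy(f)) for f in configs)  # normalizing constant

def prob(f):
    return math.exp(-BETA * energy(f)) / Z

p_uniform = prob({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0})  # E = 0
p_checker = prob({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})  # E = 4
print(p_uniform > p_checker)  # → True: smoother fields are more probable
```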

Equation (19) shows that the interactions, defined by V_C, make the random

variables, defined at every pixel, dependent on the random variables defined in a

neighborhood, and hence interactions decrease the entropy of the image. The use of energy functions is not the only way to define interactions in images.

As discussed in the Introduction, the interactions of increasing range can be modeled via the diffusion equation. The two formulations can be linked up by noting that, instead of defining a whole distribution with E( f ), it is possible to use the energy to set up a variational problem. In particular, it is possible to minimize

E(f) by gradient descent. It is well known that solutions I(·, t) of parabolic partial differential equations of the form (3), or in the non-linear form as in Perona and Malik (1990) and Zhu and Mumford (1997), can be classically associated to steady states of quasi-linear functionals as t → ∞, and all such equations can be seen as a descent method for quasi-linear elliptic variational problems (Morse and

Feshbach, 1953). This property will be exploited in the next section to investigate the relationship between interaction range and resolution.
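The descent interpretation can be illustrated with a sketch (assuming, for illustration, the discrete quadratic 'membrane' energy; this is not the authors' functional): one explicit diffusion step lowers the energy, since the discrete Laplacian is minus its gradient:

```python
def membrane_energy(img):
    """Discrete quadratic 'membrane' energy: half the sum of squared
    differences between 4-connected neighbours."""
    h, w = len(img), len(img[0])
    e = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                e += 0.5 * (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                e += 0.5 * (img[y + 1][x] - img[y][x]) ** 2
    return e

def diffusion_step(img, dt=0.2):
    """I <- I + dt * Laplacian(I). With reflecting boundaries the discrete
    Laplacian equals minus the gradient of the membrane energy, so this
    is one gradient-descent step on that energy."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            lap = (img[max(y - 1, 0)][x] + img[min(y + 1, h - 1)][x]
                   + img[y][max(x - 1, 0)] + img[y][min(x + 1, w - 1)]
                   - 4 * img[y][x])
            out[y][x] = img[y][x] + dt * lap
    return out

img = [[float((x + y) % 3) for x in range(6)] for y in range(6)]
e0 = membrane_energy(img)
e1 = membrane_energy(diffusion_step(img))
print(e1 < e0)  # → True: each step lowers the energy
```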

In order to compute the effect of interactions on the entropy, one needs to know the distributions p(f_{s_α}) and p(f_{s_β} | f_{s_α}), and generally this knowledge is not available. However, an estimate of these distributions can be obtained by considering the empirical marginal distribution (histogram) of the image (see Zhu and Mumford, 1997, for a discussion).

Let p(i) = n_i/N, where n_i is the number of pixels of intensity i and N is the total number of pixels. We assume that, for the pixel s_α, p(i) is an estimate of p(f_{s_α}); of course, this implies that the probabilities p(f_s) are assumed to be the same for each pixel. It must be recalled that p(f_{s_α}, f_{s_β}) is the a priori probability with which two pixels s_α and s_β, with gray-levels i and j respectively, occur in the image. To estimate p(f_{s_α}, f_{s_β}) one can use p(i, j; d), the relative frequency with which

two pixels at a distance d, measured in some metric, have gray-levels i and j

respectively. This way all local information is lost, in that the indices i, j now


refer to gray-levels, which can be supposed to be integers, and are not pixel coordinates. In the following it will be assumed that d = 1, measured in the

maximum value metric (Duda and Hart, 1973), that is for every pixel we consider

an 8-neighbourhood, and we shall write p(i, j) instead of p(i, j; 1). The relative frequencies p(i, j), a measure of the dependence between gray-levels

i and j, define a symmetric non-negative matrix G = [p(i, j)]. H(F_{s_α}, F_{s_β}) is now replaced by

H_g = − Σ_{i,j} p(i, j) log p(i, j).     (20)

It should be noted that the matrix G defined here is basically the average over a neighbourhood of each element of the co-occurrence matrix (Haralick and Shapiro, 1992) used to classify textures.
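A sketch of the construction of G and of the entropy computed from it (pure Python; the gray-level ranges and test images are illustrative):

```python
import math
from collections import Counter

def cooccurrence(img, levels):
    """Relative frequencies p(i, j) of gray levels at distance d = 1 in
    the maximum-value metric, i.e. over the 8-neighbourhood of every
    pixel; G is symmetric because every pair is counted both ways."""
    h, w = len(img), len(img[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) != (0, 0) and 0 <= ny < h and 0 <= nx < w:
                        counts[(img[y][x], img[ny][nx])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)]
            for i in range(levels)]

def joint_entropy(G):
    return -sum(p * math.log(p) for row in G for p in row if p > 0)

uniform = [[5] * 8 for _ in range(8)]                      # constant image
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]

H_uniform = joint_entropy(cooccurrence(uniform, 8))
H_checker = joint_entropy(cooccurrence(checker, 2))
print(H_uniform == 0.0, H_checker > 0.0)  # → True True
```

As in the text, a uniform image collapses G to a single entry and gives H_g = 0, while any variation spreads mass over several entries and gives H_g > 0.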

Images with different statistics give rise to specific matrices G. Consider for

instance a random image; all intensity levels appear with the same frequency and the

occurrence of a gray-level, say i, at pixel Sa is independent from the occurrence of

gray-level j at a pixel s/3 in a neighbourhood of Sex. Then, p(i, j ) = p(i ) p( j ), with p(i ) = p( j ), and all elements of the matrix must have the same value.

Correspondingly, H_g must be relatively high. In practice what must be expected is a

sparse matrix, with relatively high values of p(i, j) occurring randomly at locations

(i, j ). An example is shown in Fig. 1.

In the opposite case, when the image has uniform intensity, say i_0, p(i_0, j) = δ_{i_0, j}: hence, all matrix elements with values different from zero collapse in just one point (i_0, i_0); obviously, in this case, H_g = 0. Consider a pixel s_α: if the grey-level difference between s_α and its neighbors s_ν is small (as in the case of a smooth edge or a

Figure 1. (a) Random image. (b) Matrix G. For visualization purposes the elements p(i, j) of the matrix have been quantized by using uniform quantization within the display gray-level interval [0, 255]. Here the highest values of p(i, j) are represented by dark points.


uniform patch), the (s_α, s_ν) pairs contribute to diagonal or almost-diagonal elements. On the contrary, if s_α bears a high edge value, at least some of its pairs contribute to elements g_{ij} that lie far from the diagonal of G.

In the following, the analysis of more complex images will be presented. We

will see in simulation examples that 'structured' information appears at the onset of

long-range anisotropic correlations, which will be reflected within the matrix G as

a reinforcement of the near-diagonal elements.

INTERACTIONS ACROSS SCALES

In the previous section we observed that interactions evolve across scales of

decreasing resolution. Here, this evolution will be simulated by looking at interactions along different scales generated by a diffusion equation. Diffusion tends to smooth out the image so that, in the limit t → ∞, the image

is just uniform over the domain D. Note that we are dealing with linear, isotropic diffusion that induces a smoothing everywhere in the image, and in particular at the

edges. However, edges corresponding to boundaries between parts of the surface last

longer, as t increases, than edges due to noise or texture (Witkin, 1983). Therefore, it can be said that diffusion produces isotropic long-range interactions, by enlarging

regions of almost constant intensity. If t is not too large, anisotropic long-range interactions along boundaries are preserved, whereas in the limit t → ∞ only the

basic long-range interaction, corresponding to the average intensity, is preserved. What is the effect of this evolution on the joint probabilities p(i, j)? It is clear

that it increases the conditioning; in the limit t → ∞, p(i, j) = p(j | i) p(i) = p(i) and also p(i) = δ_{i,i_0}. Thus, we must expect the matrix G to have just one non-zero element and correspondingly H_g to go to zero:

lim_{t→∞} H_g(t) = 0.

It is of interest to follow the evolution of G in some detail, since new features can

be revealed.

Two examples are shown here, the 'rabbit behind the fence' and the 'rock over a

paper'; they present all types of interaction we have considered so far. Numerical

simulations have been performed by discretizing diffusion equation (3) according to classical difference schemes (centered differences for spatial derivatives and

forward differences for the time derivative). For each iteration, the matrix G and H_g have been computed. In the following, to stress the dependence on the level of

resolution, we shall write G(t) and H_g(t). Consider Fig. 2. Here, it can be seen that the diffusion process deletes short-range anisotropic interactions (the texture regions corresponding to the newspaper columns), while preserving, for a certain interval of scales, long-range anisotropic ones (edges defining the silhouette of the stone). Such a perceptual effect is reflected


Figure 2. (a) 'Rock over a paper' image at different scales of resolution, represented by iterations of the diffusion process. From top to bottom each row corresponds respectively to t = 0, 50, and 200 iterations. (b) Matrix G at the same scales.


Figure 3. (a) 'Rabbit behind the fence' image at different scales of resolution, represented by iterations of the diffusion process. From top to bottom each row corresponds respectively to t = 0, 50, and 200 iterations. (b) Matrix G at the same scales.


in the behavior of the matrix G. In the top picture representing G at t = 0, the vertical stripes account for non-uniform regions, while near-diagonal elements

represent uniform parts. Stronger edges are indicated by off-diagonal top right non-zero elements. In the central image, t = 50, the vertical stripes have been

blurred and the near-diagonal elements enhanced, while elements at the edge still

persist. It is worth remarking that the sharp partition of the diagonal elements

corresponds to the visually noticeable object/background 'segmentation effect', due

to the coarse graining process. Such an overall effect is also given in the bottom

picture representing G(200). Similar effects can be observed in Fig. 3.

The original image used in this example is particularly significant for describing the perceptual effects of short-range interactions vs long-range anisotropic interactions. The former occur in the regularly textured fence, the latter in the rabbit's and

cage's shapes. In the original image the rabbit's outline is hidden by the fence. This

can be noted considering G(0), which appears as a sparse matrix with a 'shadow'

diagonal. After 50 iterations (middle row) the fence is almost completely blurred, and G(50) is much more diagonalized with respect to G(0). The bottom row am-

plifies this effect, but the diagonal elements of G(200) are less than those of G(50),

clearly indicating that isotropic long-range interactions are becoming prevalent. One could summarize the perceptual effect due to the various types of interactions

by comparing the overall behavior of the 'rock over a paper' image against the

'rabbit behind a fence', under the diffusion process. This has been done by

computing H_g as a function of t; as expected, H_g decreases as t increases (see Fig. 4). This procedure can obviously be applied to processes different from isotropic,

linear diffusion. For instance, if one applies anisotropic diffusion to Fig. 1, the result

is the formation of a more ordered structure, where longer-range interactions emerge.

Correspondingly, the non-zero elements of matrix G tend to be aligned

along the main diagonal, as represented in Fig. 5.

Figure 4. Joint entropies as a function of iterations of the diffusion process for the 'rock over a paper' (thick line) and 'rabbit behind the fence' (thin line), respectively. The range of iterations is [0, 200].
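The whole procedure behind Fig. 4 can be sketched as follows (a minimal pure-Python stand-in for the authors' simulations; the pseudo-random test image, grid size, quantization to 8 gray levels, and sampling every 50 iterations are illustrative assumptions):

```python
import math
from collections import Counter

def diffusion_step(img, dt=0.2):
    """One explicit step of linear isotropic diffusion (centered spatial
    differences, forward time difference, reflecting boundaries)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            lap = (img[max(y - 1, 0)][x] + img[min(y + 1, h - 1)][x]
                   + img[y][max(x - 1, 0)] + img[y][min(x + 1, w - 1)]
                   - 4 * img[y][x])
            out[y][x] = img[y][x] + dt * lap
    return out

def Hg(img, levels=8):
    """Joint entropy of the d = 1, 8-neighbourhood co-occurrence
    frequencies, after quantizing intensities in [0, 1) to `levels`
    gray levels."""
    h, w = len(img), len(img[0])
    q = [[min(int(v * levels), levels - 1) for v in row] for row in img]
    counts = Counter()
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) != (0, 0) and 0 <= ny < h and 0 <= nx < w:
                        counts[(q[y][x], q[ny][nx])] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Deterministic pseudo-random test image with values in [0, 1).
img = [[((x * 7 + y * 13 + x * y) % 11) / 11 for x in range(16)]
       for y in range(16)]

entropies = []
for it in range(201):
    if it % 50 == 0:
        entropies.append(Hg(img))
    img = diffusion_step(img)

print(entropies[0] > entropies[-1])  # → True: H_g decreases with scale
```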


Figure 5. (a) Result of anisotropic diffusion applied to the image of Fig. 1, after 100 iterations. (b) Corresponding G matrix.

Clearly, a similar analysis can be applied to patterns formed through generalized

anisotropic diffusion processes such as the Gibbs reaction-diffusion proposed by Zhu and Mumford (1997).

CONCLUSIONS

In this paper we have presented a framework to characterize interactions in images as related to the processes of image encoding by the visual system. As remarked in

the Introduction, the key issue of the work concerns the way interactions are related

to image entropy. Main results can be summarized in the following points.

(1) The notion of interaction as a processing relation has been addressed. Under

the assumption that the image together with the chosen transformation (the

observer) define a dynamic system, any interaction is related to the operations

performed upon the image along the transformation. Namely, any interaction

induces a conditioning among the random variables constituting the intensity distribution field, over the succession of events generated by the transformation.

(2) It has been shown that interactions reduce the entropy of the images. From an

information-theoretic perspective, they reduce the uncertainty the visual system must overcome to generate a representation of the external world via the image. This is a general result which does not depend on the actual law realizing the

conditioning. On the other hand, when the physical law is shaped in the form

of a suitable a priori model of the image, then interactions can be expressed in terms of a very large class of dynamical systems, for instance diffusion

processes.


(3) By choosing a diffusion process, the relation between multi-range interactions

and image resolution has been stated. A computational procedure has been

proposed to make explicit the link between the notion of interaction, resolution, and that of entropy associated to the image.

About this latter point, it is worth making some comments. The structure of the

matrix G provides information on the relevance of interactions of different range, at various scales of resolution. Its evolution clarifies the increasing preponderance of long-range interaction when the resolution is decreased. Thus, the procedure

presented here allows one to gauge the various contributions on the basis of their

persistence across scales. Such persistence is reflected in the evolution of G, whereas H_g provides a global measure of entropy. Results of the simulations

show that the structure of the images depends on interactions of different ranges, each corresponding to some feature of the image. It must be noted that some

interactions are not readily apparent at a given scale. For instance, the total (or

average) intensity of the image can be considered as corresponding to the basic

long-distance interaction, since the constraint it imposes affects all pixels; however, it can be revealed only when the resolution is coarse enough, that is when t is large. In general the transition across different scales of resolution causes interactions of

a certain range to disappear while revealing new interactions of a longer range.

Correspondingly, the matrix G changes its structure. In this case, the emergence of long-range interactions is represented by the fact that most p(i, j) different from

zero are distributed along the main diagonal; if only the basic long-range interaction

is present, the main diagonal collapses in a single value different from zero.
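Assuming G is estimated as the joint grey-level histogram p(i, j) of the image at two nearby scales (a plausible reading of the procedure, sketched here on a synthetic image rather than the paper's data), the collapse toward the diagonal at coarse resolution can be reproduced:

```python
import numpy as np

def joint_hist(a, b, bins=16):
    """Joint grey-level distribution p(i, j) of two registered images."""
    g, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                             bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    return g / g.sum()

def joint_entropy(g):
    """Joint entropy (bits) of a joint distribution matrix."""
    p = g[g > 0]
    return float(-np.sum(p * np.log2(p)))

def blur(img, steps, dt=0.2):
    """Isotropic diffusion by explicit Euler steps (periodic boundaries)."""
    u = img.astype(float).copy()
    for _ in range(steps):
        u += dt * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                   + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    return u

rng = np.random.default_rng(1)
img = rng.random((64, 64))

# G at a fine scale: the image against a slightly smoothed copy;
# many cells of p(i, j) are occupied.
g_fine = joint_hist(img, blur(img, 1))

# G at a very coarse scale: both copies are close to the mean intensity,
# so the mass collapses toward a few diagonal entries.
coarse = blur(img, 500)
g_coarse = joint_hist(coarse, blur(coarse, 1))

n_fine = int(np.count_nonzero(g_fine))
n_coarse = int(np.count_nonzero(g_coarse))
hg_fine = joint_entropy(g_fine)
hg_coarse = joint_entropy(g_coarse)
```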

The symmetric relationship between multi-resolution and multi-range interactions also suggests that, since the visual system operates over multiple scales, all interactions are relevant in the processing of visual information.

The diffusion process is actually effective at modelling the coupling or spread of signals between neighboring retinal cells and at generating Gaussian-weighted receptive fields. Further, it is efficient, since it requires information only from immediately neighboring cells, and thus is relatively simple to implement. Owing to these characteristics, diffusion has recently been adopted (Shah and Levine, 1996) to model different kinds of retinal processes, such as cone coupling within the fovea (Tsukamoto et al., 1992), generation of horizontal-cell receptive fields through cone output diffusion (Naka, 1982), and integration of horizontal cell outputs performed by bipolar cells (Rodieck, 1973).

In this paper, isotropic diffusion was adopted since we are representing an uncommitted first stage of processing, aimed at handling a large class of different situations in which no or very little prior knowledge is available. The principle here is to distinguish the bare information used at the very early perceptual level from later stages, when additional information becomes available to be used for tuning the front-end processing to the more specific tasks at hand. Clearly, it is possible to invoke more refined processing which can provide such structure-oriented tuning.


In this respect, Zhu and Mumford (1997) have recently proposed an elegant and generalized form of the energy described in equation (18), namely E(f) = Σα Σq φα(F(α) * f(q)), where F(α) is an element of a set of K suitable filters or transformations. This generalization does not change the essence of the problem from the standpoint of the work described here. In fact, given the random variable F(α) * f, the entropy of a function of F(α) * f, say φα(F(α) * f), is less than or equal to the entropy of F(α) * f (Cover and Thomas, 1991), namely

H(φα(F(α) * f)) <= H(F(α) * f).

Therefore, this model can also be theoretically interpreted as providing a dynamical system based on entropy reduction through conditioning, and experimentally investigated as was done for anisotropic diffusion at the end of the previous section (Fig. 5).
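The inequality underlying this argument, that the entropy of a function of a random variable never exceeds the entropy of the variable itself, can be checked on a toy discrete variable (the values below are chosen arbitrarily for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector, ignoring zeros."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# X takes 6 equiprobable values; phi maps them (non-injectively) onto
# 3 values, standing in for a filter/transformation of the intensity field.
px = np.full(6, 1 / 6)
phi = np.array([0, 0, 1, 1, 2, 2])     # a many-to-one function of X

# Distribution of Y = phi(X): sum the probabilities that phi merges.
py = np.bincount(phi, weights=px)

h_x = entropy(px)                      # H(X) = log2(6)
h_phi_x = entropy(py)                  # H(phi(X)) = log2(3)

# H(phi(X)) <= H(X), with equality only if phi is injective.
```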

Notwithstanding the context-free nature of the proposed approach, it has been possible to characterize interactions that depend on range and isotropy, which give rise to the three basic types of image parts: smooth regions, textures, and continuous curves over a relatively long range. It is interesting to note that the visual system seems to be tuned to process these types of interactions in stages subsequent to retinal processes. Experimental results reported by von der Heydt et al. (1991, 1992) show the existence of two types of neurons in areas V1 and V2 of the visual cortex of monkeys, which they named grating cells and bar cells. Grating cells respond to a grating of bars of appropriate orientation, position and periodicity, while being inhibited by single bars. Conversely, bar cells respond strongly to single bars, but their response decreases when further bars are added. Thus, bar and grating cells seem to play complementary roles: bar cells are selective only for form information as present in contours, while grating cells respond only to texture information. Of course, far-reaching conclusions should not be drawn from such qualitative similarities, and the remaining questions concerning biological plausibility are left to further research. However, as a final and general comment, it is worth stressing the many insights that can be achieved when basic aspects of visual processing are handled within an information-theoretic framework.

NOTE

1. It should be noted that in the case of a limited domain the solution of the diffusion equation is not given by equation (2) (Morse and Feshbach, 1953); however, numerical experiments show that the difference is very small (Niessen et al., 1994).

Acknowledgements

The authors are grateful to the referees, whose enlightening and valuable comments have greatly improved the quality and clarity of an earlier version of this paper.


REFERENCES

Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Prentice Hall, Englewood Cliffs, NJ.

Blake, A. and Zisserman, A. (1987). Visual Reconstruction. MIT Press, Cambridge, MA.

Caelli, T. M., Preston, G. A. N. and Howell, E. R. (1978). Implication of spatial summation models for processes of contour perception: a geometric perspective, Vision Research 18, 723-734.

Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York, NY.

Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York, NY.

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I. Wiley, New York, NY.

Frieden, B. R. (1972). Restoring with maximum likelihood and maximum entropy, J. Optical Soc. America 62, 511-518.

Geiger, D. and Yuille, A. (1991). A common framework for image segmentation, Int. J. Computer Vision 6, 401-412.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. and Machine Intell. 6, 721-741.

Grimson, W. E. L. (1981). From Images to Surfaces. MIT Press, Cambridge, MA.

Haralick, R. M. and Shapiro, L. G. (1979). Computer and Robot Vision, Vol. 1. Addison-Wesley, Reading, MA.

Helmholtz, H. (1925). Physiological Optics, Vol. III: The Perception of Vision. Optical Society of America, Rochester, NY (1st edn, in German, 1910).

Hoffman, W. C. (1970). Higher visual perception as prolongation of the basic Lie transformation group, Math. Biosci. 6, 437-471.

Koenderink, J. J. (1984). The structure of images, Biol. Cybern. 50, 363-370.

Koffka, K. (1935). Principles of Gestalt Psychology. Harcourt Brace, New York, NY.

Lindeberg, T. (1994). Scale-Space Theory in Computer Vision. Kluwer Academic Press, Dordrecht, The Netherlands.

Marroquin, J. L. (1985). Probabilistic solutions of inverse problems. PhD Thesis, MIT.

Morse, P. M. and Feshbach, H. (1953). Methods of Theoretical Physics. McGraw-Hill, Toronto.

Naka, K. (1982). The cells horizontal cells talk to, Vision Research 22, 653-661.

Niessen, W., ter Haar Romeny, B. M. and Viergever, M. (1994). Numerical analysis of geometry-driven diffusion equations, in: Geometry-Driven Diffusion in Computer Vision, B. M. ter Haar Romeny (Ed.), pp. 393-410. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Nitzberg, M. and Shiota, T. (1992). Nonlinear image filtering with edge and corner enhancement, IEEE Trans. Pattern Anal. and Machine Intell. 14, 826-833.

Perona, P. and Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. and Machine Intell. 12, 629-639.

Rodieck, R. (1973). The Vertebrate Retina. W. H. Freeman, San Francisco, CA.

Shah, S. and Levine, M. D. (1996). Visual information processing in primate cone pathways - Part I: A model, IEEE Trans. System Man and Cybern. 26, 259-274.

Terzopoulos, D. (1983). Multilevel computational processes for visual surface reconstruction, Comp. Vis., Graphics and Image Process. 24, 52-96.

Tsukamoto, Y., Masarachia, P., Schein, S. and Sterling, P. (1992). Gap junctions between the pedicles of Macaque foveal cones, Vision Research 32, 1809-1815.

von der Heydt, R., Peterhans, E. and Dursteler, M. R. (1991). Grating cells in monkey visual cortex: coding texture, in: Channels in the Visual Nervous System: Neurophysiology, Psychophysics and Models, B. Blum (Ed.), pp. 53-73. Freund, London.

von der Heydt, R., Peterhans, E. and Dursteler, M. R. (1992). Periodic-pattern-selective cells in monkey visual cortex, J. Neurosci. 12, 1416-1434.

Wandell, B. A. (1995). Foundations of Vision. Sinauer Associates, Sunderland, MA.


Wertheimer, M. (1938). Laws of organization in perceptual form, in: A Source Book of Gestalt Psychology, W. D. Ellis (Ed.), pp. 71-88. Harcourt Brace, New York, NY.

Witkin, A. P. (1983). Scale space filtering, in: Proc. Intern. Joint Conf. Artif. Intell., Karlsruhe, Germany, pp. 1019-1023.

Zhu, S. C. and Mumford, D. (1997). Prior learning and Gibbs reaction-diffusion, IEEE Trans. Pattern Anal. and Machine Intell. 19, 1236-1250.