A fuzzy video content representation for video summarization and content-based retrieval


Signal Processing 80 (2000) 1049–1067

A fuzzy video content representation for video summarization and content-based retrieval

Anastasios D. Doulamis*, Nikolaos D. Doulamis, Stefanos D. Kollias

Department of Electrical and Computer Engineering, National Technical University of Athens, 9, Heroon Polytechniou str., Zografou 157 73, Greece

Received 30 April 1999; accepted 15 November 1999

Abstract

In this paper, a fuzzy representation of visual content is proposed, which is useful for the new emerging multimedia applications, such as content-based image indexing and retrieval, video browsing and summarization. In particular, a multidimensional fuzzy histogram is constructed for each video frame based on a collection of appropriate features, extracted using video sequence analysis techniques. This approach is then applied both for video summarization, in the context of a content-based sampling algorithm, and for content-based indexing and retrieval. In the first case, video summarization is accomplished by discarding shots or frames of similar visual content so that only a small but meaningful amount of information is retained (key-frames). In the second case, a content-based retrieval scheme is investigated, so that the most similar images to a query are extracted. Experimental results and comparison with other known methods are presented to indicate the good performance of the proposed scheme on real-life video recordings. © 2000 Elsevier Science B.V. All rights reserved.


* Corresponding author. E-mail address: adoulam@image.ntua.gr (A.D. Doulamis).

0165-1684/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0165-1684(00)00019-0


Keywords: Video summarization; Content-based retrieval; Fuzzy logic; Genetic algorithms

1. Introduction

The increasing amount of digital image and video data has stimulated new technologies for efficiently searching, indexing, content-based retrieving and managing multimedia databases. The traditional approach of keyword annotation for accessing image or video information has the drawback that, apart from the large amount of effort for developing annotations, it cannot efficiently characterize the rich visual content using only text. For this reason, content-based retrieval algorithms have recently been proposed and have attracted great research interest in the image processing community [10,19]. Examples of content-based retrieval systems, either academic or in the first stage of commercial exploitation, include the QBIC [8], Virage [11] and VisualSeek [21] prototypes. In this framework, the Moving Picture Experts Group (MPEG) is currently defining the new MPEG-7 standard [17], to specify a set of descriptors for an efficient interface to multimedia information.

The aforementioned systems are mainly restricted to still images and cannot easily be applied to video databases [4]. This is due to the fact that the standard representation of video as a sequence of consecutive frames results in significant temporal redundancy of visual content, and thus it is very inefficient and time consuming to perform queries on every video frame. Furthermore, most video databases are often located on distributed platforms and impose both large storage and transmission bandwidth requirements, even if they are compressed. Such a linear representation of video sequences is also not adequate for the new emerging multimedia applications, such as video browsing, content-based indexing and retrieval. For this reason, a content-based sampling algorithm is usually applied to video data for extracting a small but "meaningful" amount of the video information [3,13]. This results in a video summarization scheme similar to that used in document search engines, where a brief text summary corresponds to one or multiple documents.

However, efficient implementation of content-based retrieval algorithms and video summarization schemes requires a more meaningful representation of visual content than the traditional pixel-based one. This is due to the fact that there is a lack of semantic information at the pixel level. For this reason, several works have been presented in the literature towards a more efficient image/video representation. A hidden Markov model has been investigated in [15] for color image retrieval, while in [5] an approach to image retrieval based on user sketches has been reported. A hierarchical color clustering method has been presented in [22]. For video summarization, construction of a compact image map or image mosaics has been described in [13], while a pictorial summary of video sequences based on story units has been presented in [24].

In the context of this paper, a fuzzy representation of visual content is proposed for both video summarization and content-based indexing and retrieval. This representation increases the flexibility of content-based retrieval systems, since it provides an interpretation closer to human perception [14]. It also results in a more robust description of visual content, since possible instabilities of the segmentation used for describing the visual content are reduced. In particular, the adopted fuzzy representation is applied for both video summarization and content-based retrieval. In the first case, a small set of key-frames is extracted which provides an efficient description of visual content. This is performed by minimizing a cross-correlation criterion among the video frames by means of a genetic algorithm. The correlation is computed using several features extracted by a color/motion segmentation on a fuzzy feature vector formulation basis. In the second case, the user provides queries in the form of images or sketches, which are analyzed in the same way as video frames in the video summarization scheme. A metric distance or similarity measure is then used to find the set of frames that best match the user's query.

This paper is organized as follows: In Section 2, the video sequences are analyzed by applying a color/motion segmentation algorithm. The extracted features for each color or motion segment are fuzzy classified, as presented in Section 3. Application of the proposed fuzzy representation scheme to video summarization is discussed in Section 4, while the application to content-based retrieval is discussed in Section 5. Furthermore, several practical implementation issues, such as selected parameters and numerical values, are also mentioned in these sections. Experimental results on a large image/video database are presented in Section 6, along with comparisons with other known techniques to show the good performance of the proposed scheme. Finally, Section 7 concludes the paper.

2. Video sequence analysis

Semantic segmentation, i.e., extraction of meaningful entities, is essential in a content-based retrieval environment. However, this remains one of the most difficult problems in the image analysis community, especially if no constraints are imposed on the kind of video sequences examined [6,7,9]. For this reason, a color/motion segmentation algorithm is applied in this paper for visual content description.

A multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm, called M-RSST, is adopted for both color and motion segmentation. The M-RSST recursively applies the RSST to images of increasing resolution. In particular, a truncated image pyramid is created, each layer of which contains a quarter of the pixels of the layer below. In the following steps, an iteration begins so that images of higher resolution are taken into consideration until the highest resolution level is reached.

The use of the RSST algorithm is based on the fact that it is considered one of the most powerful tools for image segmentation compared to other techniques, such as pyramidal region growing, morphological watershed or color clustering [20]. It has been shown by the COST211ter simulation subgroup, using several experiments on a set of generic video test sequences, that the RSST presents the best performance compared to the other methods [1,18]. In particular, it delineates the true content edges while giving as few arbitrary boundaries as possible. As far as the computational cost is concerned, the RSST is also the fastest algorithm among all the examined ones [1].

However, the complexity of the RSST still remains very high, especially for images of large size. Instead, the proposed M-RSST approach yields much faster execution times, while simultaneously keeping the same performance. A comparison of the computational load of the RSST and M-RSST at different image sizes is depicted in Table 1, using a C implementation on a Sun Ultra 10 (333 MHz) workstation. Another benefit of the M-RSST is that it eliminates regions of small segments, which, in general, are not preferable in the context of content-based indexing and retrieval. This is due to the fact that the algorithm begins with a low image resolution, while no segments are created or destroyed at higher resolution levels.

The initial image resolution level is selected to be 3 (downsampling by 8×8 pixels) so that information directly available in the MPEG stream is exploited [6]. This selection results in a reduction of computational complexity, since no decoding of the images at the initial resolution level is required. At higher resolution levels, only the "boundary" blocks are decoded. Since these blocks compose a small percentage of the image blocks, as shown in Fig. 1(b), the overall complexity for segmentation is decreased.

Table 1. Execution times of color segmentation, fuzzy representation and color histogram.

Image size        RSST (s)   M-RSST (ms)   Fuzzy scheme (Q = 3, triangular) (ms)   Color histogram (ms)
176×144 (QCIF)    5.65       132.01        1.32                                    8.67
352×288 (CIF)     44.21      382.34        1.32                                    28.35
720×576 (PAL)     534.22     1360.90       1.32                                    103.45

Fig. 1. Color segmentation results, using the M-RSST algorithm: (a) original frame; (b) segment splitting at the initial resolution level; (c) final segmentation result.

Fig. 2. Motion segmentation results for frame #29 of the Monitor sequence: (a) motion vectors derived from the MPEG sequence; (b) motion vectors after post-processing; (c) final segmentation.

Another parameter which affects the segmentation efficiency is the selection of the threshold used for terminating the algorithm. In our case, an adaptive procedure is used for threshold estimation. Particularly, at each iteration, the Euclidean distance of the color or motion intensities between two neighboring segments, weighted by the harmonic mean of their areas, is first calculated, and then the distance histogram is created. Half of the maximum histogram value is considered as the appropriate threshold for the given iteration. Finally, the segmentation is terminated if no segments are merged from one step to another. Fig. 1(c) depicts the segmentation results for the image of Fig. 1(a). Fig. 1(b) presents the boundary blocks, which are split into four new segments, at the initial resolution of the algorithm. Fig. 2(c) shows the motion segmentation results using the image of Fig. 2(a), while Fig. 2(b) presents the motion vectors after being post-processed with a median filter so that possible noise effects are removed.
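The adaptive threshold procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the harmonic-mean weighting is taken directly from the text, but the phrase "half of the maximum histogram value" admits several readings, so the half-of-peak interpretation and all function names below are assumptions.

```python
import numpy as np

def link_weight(color_a, color_b, area_a, area_b):
    """Distance between two neighboring segments: Euclidean distance of
    their mean color (or motion) intensities, weighted by the harmonic
    mean of the two segment areas, as described in Section 2."""
    dist = np.linalg.norm(np.asarray(color_a, float) - np.asarray(color_b, float))
    harmonic_mean = 2.0 * area_a * area_b / (area_a + area_b)
    return harmonic_mean * dist

def adaptive_threshold(weighted_distances, bins=64):
    """One plausible reading of the paper's rule: build the histogram of
    weighted link distances and take the distance at which the histogram
    first drops to half of its maximum count."""
    counts, edges = np.histogram(weighted_distances, bins=bins)
    peak = counts.argmax()
    for b in range(peak, len(counts)):
        if counts[b] <= counts[peak] / 2.0:
            return 0.5 * (edges[b] + edges[b + 1])   # bin centre
    return edges[-1]
```

Segments whose link weight falls below the returned threshold would be merged in the given iteration; the process stops once no merge occurs.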

3. Fuzzy visual content representation

The size, location and average color components of all color segments are used as color properties. In a similar way, motion properties include the size, location and average motion vectors of all motion segments. Since the segment number is not constant for each video frame, the aforementioned properties cannot be directly included in a feature vector, because the size of this vector would not be constant. Thus, direct comparison between vectors of different frames is practically impossible. For example, a frame consisting of 10 segments results in a 10×1 feature vector, while a frame of five segments results in a 5×1 feature vector. To overcome this problem, we classify the color/motion properties into pre-determined classes. In this framework, each element of the feature vector corresponds to a specific class.

To avoid the possibility of classifying two similar segments into different classes, causing erroneous comparisons, a degree of membership is allocated to each class, resulting in a fuzzy classification formulation [14]. As a result, each sample, i.e., segment in our case, can belong to several or all classes, but with different degrees of membership. Then, a fuzzy multidimensional histogram is created. In a conventional classification, each sample (segment property) belongs to only one class, so that it is possible for two samples (segments) of similar properties to be assigned to different classes if they are located near the class boundary. Instead, in a fuzzy classification approach, such samples would differ only slightly in their degrees of membership to the adjacent classes.

To clearly explain the proposed fuzzy classification approach, we first consider the simple case in which only one property, say s, is used for each segment. We further assume, without loss of generality, that s takes values in [0,1] and that it is classified into Q classes (partitions) using Q membership functions μ_n(s), n = 1, 2, …, Q. The functions μ_n(s) denote the degree of membership of s in the nth class and take values in the range [0,1]. Values of μ_n(s) near unity (zero) indicate a high (low) degree of membership of feature s in the nth class.

The exact type and shape of the membership functions μ_n(s) can vary greatly [14] and in general depend on the specific problem. Several functions have been proposed in the literature as membership functions. Among the simplest are the triangular ones, which are illustrated in Fig. 3 for Q = 3 partitions. Higher-order functions have also been proposed; they are depicted in Fig. 4(a) along with the triangular ones for comparison purposes. As the order increases, the membership functions provide "harder" classification, closer to binary logic. Functions of trapezoid shape are illustrated in Fig. 4(b) for different line slopes b. As expected, the higher the slope, the "harder" the classification becomes. In all the above cases, "symmetric" functions have been used, since there is no reason to give more importance to a specific class.

Let us also assume that the examined video frame consists of K segments or samples. First, for each feature s_i, i = 1, …, K, of the ith segment, the degree of membership of feature s_i in all Q classes is evaluated. Then, the degree of membership of all K segments of the respective frame to the nth class is calculated through the fuzzy histogram, say H(n), which is defined as follows:

H(n) = (1/K) Σ_{i=1}^{K} μ_n(s_i),   n = 1, 2, …, Q.   (1)

An arithmetic example is presented in the following for clarification purposes. In particular, we assume that K = 2 segments have been extracted, while their average luminance values, 120 and 238, respectively, are considered as the only segment property. If Q = 2 and triangular functions are used, then the degrees of membership of each segment to the two available classes are: μ_1(120/255) = 0.53, μ_2(120/255) = 0.47 for the first segment, and μ_1(238/255) = 0.07, μ_2(238/255) = 0.93 for the second segment. (The division by 255 is used for normalization to the interval [0,1].) Then, from (1), the 2×1 fuzzy histogram vector is [H(1) H(2)]^T = [(0.53+0.07)/2 (0.47+0.93)/2]^T = [0.3 0.7]^T. Instead, if binary classification were used, the histogram vector would be [1 1]^T and thus the two classes would be of equal importance.

Fig. 3. Triangular membership functions in the case of Q = 3 partitions.

Fig. 4. Different membership functions: (a) quadratic (solid line), triangular (dotted line) and functions of order 10 (dashed line); (b) trapezoid membership functions, slope b = 3 (solid line) and b = 7 (dashed line).
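The one-dimensional example above can be reproduced with a short sketch. This assumes, as the example implies, symmetric triangular memberships with peaks at q/(Q−1); for Q = 2 these reduce to μ_1(s) = 1 − s and μ_2(s) = s. The function names are illustrative, not the paper's.

```python
def triangular_memberships(s, Q):
    """Degrees of membership of a normalized feature s in [0, 1] to Q
    classes, using symmetric triangular functions peaked at q/(Q-1),
    q = 0..Q-1 (the shape shown in Fig. 3 for Q = 3)."""
    centers = [q / (Q - 1) for q in range(Q)]
    width = 1.0 / (Q - 1)
    return [max(0.0, 1.0 - abs(s - c) / width) for c in centers]

def fuzzy_histogram(samples, Q):
    """Eq. (1): H(n) = (1/K) * sum_i mu_n(s_i)."""
    K = len(samples)
    H = [0.0] * Q
    for s in samples:
        for n, mu in enumerate(triangular_memberships(s, Q)):
            H[n] += mu / K
    return H

# The paper's example: K = 2 segments with luminances 120 and 238,
# normalized by 255, classified into Q = 2 classes.
H = fuzzy_histogram([120 / 255, 238 / 255], Q=2)
# H is approximately [0.30, 0.70], matching [H(1) H(2)]^T in the text.
```

Note that the entries of H always sum to one, since each segment's memberships sum to one across the Q classes.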

The more general case of more than one segment property is considered next. In particular, let us assume that K^c color and K^m motion segments have been extracted. Then, for each color segment S_i^c, i = 1, …, K^c, an L^c×1 vector s_i^c is formed, while for each motion segment S_i^m, i = 1, 2, …, K^m, an L^m×1 vector s_i^m is formed, as follows:

s_i^c = [c^T(S_i^c) l^T(S_i^c) a(S_i^c)]^T,   (2a)
s_i^m = [v^T(S_i^m) l^T(S_i^m) a(S_i^m)]^T,   (2b)

where a denotes the size of the color or motion segment, and l is a 2×1 vector indicating the horizontal and vertical locations of the segment center; the 3×1 vector c includes the average values of the three color components of the respective color segment, while the 2×1 vector v includes the average motion vector of the motion segment. Thus, L^c = 6 for color segments and L^m = 5 for motion segments. For the sake of notational simplicity, the superscripts c and m will be omitted in the sequel; each color or motion segment will be denoted as S_i and will be described by the L×1 vector s_i, where L = 5 or L = 6 depending on the segment type. Based on the above, s_i = [s_{i,1} s_{i,2} … s_{i,L}]^T is a vector containing all properties extracted from the ith segment, S_i. For example, s_{i,1} corresponds to the average value of the first color component of segment S_i. Each element s_{i,j}, j = 1, 2, …, L, of vector s_i is then partitioned into Q regions by means of Q membership functions μ_{n_j}(s_{i,j}), n_j = 1, 2, …, Q. As in the previous case, μ_{n_j}(s_{i,j}) denotes the degree of membership of s_{i,j} to the n_j-th class. Then, the product of μ_{n_j}(s_{i,j}) over all elements s_{i,j} of s_i defines the degree of membership of vector s_i to the L-dimensional class n = [n_1 n_2 … n_L]^T, the elements of which express the classes to which the elements of s_i belong:

μ_n(s_i) = Π_{j=1}^{L} μ_{n_j}(s_{i,j}).   (3)


The product operator in the previous equation is one of the most commonly used T-norms in fuzzy logic [14].

Gathering all segments of a frame, a multidimensional fuzzy histogram is created exactly as in the previous case [Eq. (1)]:

H(n) = (1/K) Σ_{i=1}^{K} μ_n(s_i) = (1/K) Σ_{i=1}^{K} Π_{j=1}^{L} μ_{n_j}(s_{i,j}).   (4)

H(n) can thus be viewed as a degree of membership of a whole frame to class n. A frame feature vector f is then formed by gathering the values of H(n) for all classes n, i.e., for all Q^L combinations of indices, resulting in a vector of Q^L elements: f = [f_1 f_2 … f_{Q^L}]^T. In particular, the vector f is constructed from H(n) using an index function z(n), which maps the class n to an integer between 1 and Q^L:

z(n) = 1 + Σ_{j=1}^{L} (n_j − 1) Q^{L−j}.   (5)

Then, the elements f_i, i = 1, …, Q^L, of f are calculated as f_{z(n)} = H(n) for all classes n. In fact, since the above analysis was based on the features s_i^c and s_i^m of the color segments S_i^c and motion segments S_i^m, respectively, two feature vectors are calculated: a color feature vector f^c for the color segments and a motion feature vector f^m for the motion segments. Thus, the total feature vector of an image is formed as follows:

f = [(f^c)^T (f^m)^T]^T.   (6)
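The index function z(n) of Eq. (5) is easy to verify in code. The sketch below (the function name is ours, not the paper's) checks that it maps the Q^L classes bijectively onto the positions 1..Q^L of the feature vector f:

```python
from itertools import product

def class_index(n, Q):
    """Eq. (5): map an L-dimensional class n = (n_1, ..., n_L), with each
    n_j in 1..Q, to a position z(n) in 1..Q^L (1-based, as in the paper)."""
    L = len(n)
    return 1 + sum((nj - 1) * Q ** (L - j) for j, nj in enumerate(n, start=1))

# Enumerating all classes for Q = 2, L = 3 shows that z(n) visits every
# position 1..8 exactly once, so f_{z(n)} = H(n) fills the vector f.
Q, L = 2, 3
indices = sorted(class_index(n, Q) for n in product(range(1, Q + 1), repeat=L))
# indices == [1, 2, 3, 4, 5, 6, 7, 8]
```

In effect, z(n) treats (n_1 − 1, …, n_L − 1) as the digits of a base-Q number, with n_1 as the most significant digit.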

To further explain the proposed fuzzy video content representation, a simple example is presented in the following. We first assume that K = 2 segments have been extracted and that only the three average color components (RGB) are used as segment properties. In particular, we consider that the values of the vectors s_1 and s_2, which describe the segment properties [Eqs. (2a) and (2b)], are s_1 = [53 145 21]^T and s_2 = [200 2 144]^T. If Q = 2 and triangular membership functions are used, then the three-dimensional class n = [n_1 n_2 n_3]^T has 2^3 = 8 different combinations, since n_1, n_2, n_3 take the values 1 or 2. The value n_1 = 1 indicates a low R color component, whereas n_1 = 2 corresponds to a high R color component. Similarly, low G (B) is represented by n_2 = 1 (n_3 = 1), while high G (B) by n_2 = 2 (n_3 = 2). As a result, the class n = [1 1 1]^T indicates that all color components (R, G, B) take "low" values, while n = [1 1 2]^T means that the R and G components are low while the B is high. Eq. (3) is used to estimate the degree of membership of vector s_1 or s_2 to a given class n. For instance, for n = [1 1 1]^T, μ_{[1 1 1]^T}(s_1) = 0.314, while for s_2, μ_{[1 1 1]^T}(s_2) = 0.09. Hence, H([1 1 1]^T) = 0.202 [Eq. (4)].

Let us denote by c the computational complexity of estimating μ_n(s_i) [Eq. (3)] for a given combination n. Then, the total complexity for all K extracted segments and for all combinations (classes) of n is K·Q^L·c. As can be seen, the computational load increases exponentially with respect to the partition number Q, and proportionally to the number of segments K and the membership function complexity c. Thus, the complexity load is mainly affected by the partition number Q. Furthermore, large partition numbers Q also increase the storage requirements, since a Q^L×1 feature vector is assigned to each video frame.
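The RGB example can be checked numerically. The sketch below assumes the same Q = 2 triangular memberships as the earlier luminance example, μ_1(s) = 1 − s and μ_2(s) = s on [0, 1]; function names are illustrative.

```python
from itertools import product

def memberships(s):            # s normalized to [0, 1], Q = 2
    return (1.0 - s, s)

def mu_class(seg, n):
    """Eq. (3): product of per-component memberships over the class index n."""
    result = 1.0
    for s_j, n_j in zip(seg, n):
        result *= memberships(s_j)[n_j - 1]
    return result

segments = [[53 / 255, 145 / 255, 21 / 255],     # s_1
            [200 / 255, 2 / 255, 144 / 255]]     # s_2

# Eq. (4): average over the K segments, for every class n in {1, 2}^3.
H = {n: sum(mu_class(seg, n) for seg in segments) / len(segments)
     for n in product((1, 2), repeat=3)}
# H[(1, 1, 1)] is about 0.20, matching the value 0.202 reported in the
# text up to rounding; the 2**3 = 8 entries of H sum to 1.
```

The enumeration over all 2^3 classes also makes the K·Q^L·c complexity of the preceding paragraph concrete: the two nested loops run K times and Q^L times, respectively.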

However, apart from the computational complexity, the partition number Q, as well as the shape of the membership functions, affects the performance of the proposed fuzzy representation method in the problem of content-based retrieval. Intuitively, a small number of partitions is not able to describe the visual content with high efficiency. On the contrary, a large number of classes leads to "noisy" classification. The appropriate selection of membership functions and number of partitions is described in Section 5.1 of this paper, where we also present the effect of different types of membership functions and partition numbers on the computational complexity. The application of the proposed fuzzy visual content representation to the problem of video summarization is described in Section 4, while Section 5 presents the application to the problem of content-based retrieval.

4. Video summarization

Fig. 5 depicts the block diagram of the proposed video summarization scheme. Since a video sequence is a collection of different shots, each of which corresponds to a continuous action of a single camera operation, a shot cut detection algorithm is first applied to identify video frames of similar visual content. The algorithm proposed in [22] has been adopted for this purpose, since it presents high accuracy and low computational complexity compared to other techniques [3]. Then, analyzing the video sequence as presented in Section 2, color and motion properties are extracted for each frame and then classified using the fuzzy formulation. Finally, a content-based sampling algorithm is applied so as to discard frames which present similar visual content.

Fig. 5. A block diagram of the proposed scheme for video summarization.

4.1. Extraction of key-frames

Key-frames are extracted by minimizing a cross-correlation criterion, so that the selected frames are not similar to each other.

Let us denote by f_k the feature vector of the kth frame of a shot, with k ∈ V = {1, 2, …, N_F}, where N_F is the total number of frames of the given shot. Let us also denote by K_F the number of key-frames that should be selected. In order to define a measure of correlation among K_F feature vectors, an index vector is first defined, x = (x_1, …, x_{K_F}) ∈ W ⊂ V^{K_F}, where W = {(x_1, …, x_{K_F}) ∈ V^{K_F} : x_1 < … < x_{K_F}} is the subset of V^{K_F} containing all sorted index vectors x, i.e., the frame numbers or time indices of the candidate key-frames. Then, the correlation measure among the K_F feature vectors is given by the following equation:

R(x) = R(x_1, …, x_{K_F}) = (2 / (K_F (K_F − 1))) Σ_{i=1}^{K_F−1} Σ_{j=i+1}^{K_F} ρ²_{x_i, x_j},   (7)

where ρ_{x_i, x_j} denotes the correlation coefficient between the feature vectors f_{x_i} and f_{x_j}, which correspond to the frames with numbers x_i and x_j. The function R(x) takes values in the interval [0,1]. Values close to zero mean that the K_F feature vectors are uncorrelated, while values close to one indicate that the K_F feature vectors are strongly correlated.

Based on the above definition, it is clear that searching for a set of K_F minimally correlated feature vectors is equivalent to searching for an index vector x that minimizes R(x). The search is limited to the subset W, since the correlation measure of the K_F features is independent of the feature arrangement. Consequently, the set of the K_F least-correlated feature vectors is found by

x̂ = (x̂_1, …, x̂_{K_F}) = arg min_{x ∈ W} R(x).   (8)

Unfortunately, the complexity of an exhaustive search for obtaining the minimum value of R(x) is such that a direct implementation of the method is practically unfeasible. For example, about 264 million combinations of frames should be considered (each of which requires several computations for the estimation of R(x)) if we wish to select 5 representative frames out of a shot consisting of 128 frames. For this purpose, a logarithmic search algorithm has been proposed in [3] for an efficient implementation of the optimization procedure. Although this approach provides very fast convergence, it is highly probable for the solution to be trapped in a local minimum, resulting in a sub-optimal solution. This drawback is alleviated by the use of a guided random search procedure implemented by a genetic algorithm (GA) [16].
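For illustration, Eq. (7) and the quoted exhaustive-search cost can be sketched as follows. The paper does not spell out which correlation coefficient it uses, so the Pearson coefficient below is an assumption, and the function names are ours:

```python
import math
from itertools import combinations

def pearson(f, g):
    """Pearson correlation coefficient between two feature vectors
    (an assumption; the paper only says 'correlation coefficient')."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    num = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    den = math.sqrt(sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g))
    return num / den if den else 0.0

def correlation_measure(features, x):
    """Eq. (7): average squared correlation coefficient over all pairs of
    candidate key-frames indexed by the sorted tuple x (1-based indices)."""
    KF = len(x)
    total = 0.0
    for i, j in combinations(range(KF), 2):
        total += pearson(features[x[i] - 1], features[x[j] - 1]) ** 2
    return 2.0 * total / (KF * (KF - 1))

# The exhaustive-search cost quoted in the text: choosing 5 key-frames
# out of a 128-frame shot.
print(math.comb(128, 5))   # 264566400, i.e. about 264 million combinations
```

Since ρ² lies in [0,1] and Eq. (7) averages over the K_F(K_F − 1)/2 pairs, R(x) indeed stays in [0,1], as stated above.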

4.1.1. The genetic approachIn the GA approach, the index vector x"

(x1, 2, x

KF) is considered as a chromosome, while

1056 A.D. Doulamis et al. / Signal Processing 80 (2000) 1049}1067

Fig. 6. A graphical representation of the three examined cross-over mechanisms: (a) 1-random crossover operator; (b) 2-ran-dom crossover operator; (c) uniform crossover operator.

the elements x_1, ..., x_{K_F} of x as the genetic material of the respective chromosome. For instance, in case we are interested in extracting four (K_F = 4) key-frames from a shot comprising N_F = 100 frames, a chromosome of the form (5, 10, 52, 90) indicates that the frames with numbers 5, 10, 52 and 90 are possible key-frames. Initially, a population of m chromosomes is created, say P(0), consisting of m randomly selected index vectors x. That is, P(0) = {x_1(0), ..., x_m(0)}, where x_i(0), i = 1, ..., m, corresponds to the ith chromosome, i.e., a K_F x 1 index vector of population P(0).

As can be seen, the codification of the examined problem presents similarities to the traveling salesman problem (TSP). In particular, in the TSP the chromosomes of a population are represented by vectors which express a salesman's tour, while the vector elements (genetic material) indicate the cities that the salesman visits in this tour (see page 218 of [16]). The object of the TSP is to find the chromosome which yields the minimum travelling cost. Similarly, in our approach, chromosomes, which are represented by index vectors, express candidate key-frames (like a tour of the TSP), while their elements express the frame indices (like cities of the TSP) of the candidate key-frames. Different combinations of frame indices result in different correlation measures, just as in the TSP different permutations of cities give different travelling costs (see p. 211 of [16]).
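The chromosome encoding described above can be sketched as follows; the population size m and the `init_population` helper are illustrative choices, not details from the paper:

```python
import random

def init_population(m, n_frames, k_frames, seed=0):
    """Create an initial population P(0) of m chromosomes; each chromosome
    is an index vector of k_frames distinct frame numbers of the shot."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_frames), k_frames)) for _ in range(m)]

# E.g. for K_F = 4 key-frames out of N_F = 100 frames, a chromosome such
# as [5, 10, 52, 90] proposes frames 5, 10, 52 and 90 as key-frames.
population = init_population(m=5, n_frames=100, k_frames=4)
print(population[0])
```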

The correlation measure of (7) is used to evaluate the performance of all chromosomes in the population P(n) of the nth iteration. However, the lower the correlation measure of a chromosome, the higher is its performance. For this reason, the fitness function, which is responsible for measuring the chromosome quality (proportional to the chromosome performance), is inversely related to the correlation function [16]. Particularly, for the ith chromosome of the nth population, x_i(n), the fitness function is given by

F(x_i(n)) = D - R(x_i(n)),   (9)

where the constant D is selected such that negative values of the fitness function are avoided. In our case D = 1, since the correlation measure is normalized in the interval [0, 1].

Based on the fitness values F(x_i(n)), i = 1, ..., m, for all chromosomes of the current population, appropriate 'parents' are selected so that a fitter chromosome gives a higher number of offspring and thus has a higher chance of survival in the next generation. In particular, in our case, a probability is assigned to each chromosome, equal to F(x_i(n)) / Σ_{j=1}^{m} F(x_j(n)), and then p_0 < m chromosomes are randomly selected based on their assigned probabilities as candidate parents (roulette wheel selection procedure [16]).
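The fitness of Eq. (9) and the roulette wheel selection can be sketched as below; the correlation values are hypothetical stand-ins, since computing R(x) requires the fuzzy feature vectors of Section 3:

```python
import random

def fitness(r_value, D=1.0):
    """Eq. (9): F = D - R; with R normalized to [0, 1], D = 1 keeps F non-negative."""
    return D - r_value

def roulette_select(population, r_values, p0, seed=0):
    """Pick p0 candidate parents with probability proportional to fitness,
    so chromosomes with lower correlation R are chosen more often."""
    rng = random.Random(seed)
    weights = [fitness(r) for r in r_values]
    return rng.choices(population, weights=weights, k=p0)

# Toy population of three chromosomes with made-up correlation values.
pop = [[5, 10, 52, 90], [1, 2, 3, 4], [20, 40, 60, 80]]
parents = roulette_select(pop, r_values=[0.2, 0.9, 0.4], p0=2)
print(parents)
```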

A set of new chromosomes (offspring) is then produced by mating the genetic material of the parents using a crossover operator. A simple crossover mechanism is illustrated in Fig. 6(a) and produces a chromosome whose genetic material, until a random position (crossover point), comes from the first parent, while all the rest comes from the second parent. This is called the 1-random crossover mechanism in the following. Another method is to use more than one, usually two, crossover points (2-random crossover), as explained in Fig. 6(b). In a more general case, the selected material of the two parents is randomly mated, resulting in a uniform crossover operator as in Fig. 6(c).
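The three crossover mechanisms of Fig. 6 can be sketched as follows (a minimal illustration; the paper does not prescribe these exact function signatures):

```python
import random

def crossover_1point(a, b, rng):
    """1-random crossover: genes up to a random crossover point come from
    the first parent, the rest from the second (Fig. 6(a))."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def crossover_2point(a, b, rng):
    """2-random crossover: the segment between two random points comes
    from the second parent (Fig. 6(b))."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]

def crossover_uniform(a, b, rng):
    """Uniform crossover: each gene is drawn from either parent at random
    (Fig. 6(c))."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

rng = random.Random(0)
p1, p2 = [5, 10, 52, 90], [3, 30, 60, 95]
print(crossover_1point(p1, p2, rng))
print(crossover_uniform(p1, p2, rng))
```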

The next step of the algorithm is to apply mutation to the newly created chromosomes, introducing random gene variations that are useful for restoring lost genetic material, or for producing new material that corresponds to new search areas. In our case, a gene of a chromosome is mutated to a randomly selected frame number (index) of the examined shot with a probability p_m; otherwise, the gene remains intact. After that, the next-generation population P(n+1) is created by inserting the new chromosomes and deleting the older ones. Several GA cycles take place by repeating the procedures of fitness evaluation, parent selection, crossover and mutation, until the population converges to an optimal solution. The GA terminates when the best chromosome fitness remains constant for a large number of generations, indicating that further optimization is unlikely.
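The mutation step can be sketched as below; the function name and argument layout are illustrative:

```python
import random

def mutate(chromosome, n_frames, p_m, rng):
    """With probability p_m, replace each gene by a randomly selected frame
    index of the examined shot; otherwise the gene remains intact."""
    return [rng.randrange(n_frames) if rng.random() < p_m else g
            for g in chromosome]

rng = random.Random(0)
print(mutate([5, 10, 52, 90], n_frames=100, p_m=0.06, rng=rng))
```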

4.2. Implementation issues

As can be seen from the previous subsection, several parameters are involved in the GA implementation. In this section, we discuss the impact of these parameters on the performance of the GA on the problem of video summarization.

In order to tune the most appropriate parameters of the GA, an experiment was carried out by examining about 170 shots of a video database. The number of key-frames has been selected to be K_F = 6. The average correlation measure over all shots is used as the objective function in this case. Furthermore, triangular membership functions and Q = 3 partitions are used to classify the color and motion features, since these parameters provide a better representation of the visual content, as explained in Section 5.1. Fig. 7 illustrates the convergence of the GA for the three different crossover mechanisms depicted in Fig. 6. In particular, Fig. 7(a) presents the results obtained when no mutation is used, while Fig. 7(b) and (c) show the results for mutation probabilities p_m = 0.04 and 0.1, respectively. In all cases, the population consists of 80 chromosomes, while, at each iteration, 30 parents are selected. It is observed that, for low mutation rates, the uniform crossover operator outperforms the other two examined crossover methods, since it converges faster to the optimal solution. However, as p_m increases, the effect of the three crossover operators on the GA performance becomes minimal, though the uniform one still yields slightly better results on average. The same conclusions are drawn from Fig. 8(a), which presents the average correlation measure versus p_m for the three aforementioned crossover mechanisms. In this case, the number p_0 of chromosomes used as parents has been selected to be 10, to show the effect of the crossover operators for a different value of p_0, while 100 GA cycles have been used to terminate the iteration process. Consequently, the uniform crossover operator seems to provide the best results and is selected in the following.

The effect of the mutation probability p_m on the GA performance is also depicted in Fig. 8(a). It seems that the minimum correlation is obtained in the range [0.04, 0.06] for all crossover mechanisms. A small mutation probability may trap the solution in a local minimum. Instead, a large probability leads to random search, which deteriorates the GA performance. Fig. 8(b) presents the mutation impact on the GA performance for different values of p_0 in the case of the uniform crossover operator and 100 GA cycles. Mutation probabilities around [0.04, 0.06] provide better results in this case too. The effect of the mutation probability on the average computational load is shown in Table 2. In this case the results have been obtained on a Sun Ultra 10, using the uniform crossover operator and p_0 = 10. It should be mentioned that the execution times refer to the total GA procedure, but with different mutation probabilities. Particularly, Table 2 presents the average load per 100 GA cycles, the average number of iterations required to achieve a correlation measure less than 0.278 over all the examined 170 shots, and the total load. In conclusion, a mutation probability around 0.06 provides the best results.

Fig. 7. The genetic algorithm convergence for different crossover mechanisms in the case of p_0 = 30: (a) no mutation; (b) mutation probability 0.04; (c) mutation probability 0.1.

Then, the effect of the number of selected parents p_0 is evaluated. Fig. 9 shows the average correlation measure versus GA cycles for four different values of p_0 (10, 30, 50 and 60, respectively), with p_m = 0.06 and the uniform crossover operator as selected by the previous analysis. It can be seen that as the value of p_0 increases, the algorithm reaches the optimal solution in fewer GA cycles. However, the number p_0 significantly affects the computational complexity of the GA. Table 3 presents the average computational load per 100 GA cycles for different values of p_0 over all the 170 shots, along with the average number of iterations required to achieve a correlation less than 0.26. It can be seen that p_0 = 30 provides the fastest overall convergence, although it requires a greater number of genetic cycles than larger p_0 values. As a result, the most appropriate GA parameters are: uniform crossover operator, p_m = 0.06 and p_0 = 30.

5. Content-based retrieval

The problem of content-based retrieval from image and video databases is discussed in this section. Particularly, for content-based video retrieval the aforementioned video summarization scheme is applied, so that all the redundant temporal video information is discarded. At this point, the problem of content-based retrieval from a video database


Fig. 8. The correlation measure versus mutation probability for 100 GA cycles: (a) different crossover operators and p_0 = 10; (b) different values of p_0 and uniform crossover.

Table 2
Average execution of the genetic algorithm for different mutation probabilities in the case of the uniform crossover mechanism and p_0 = 10

Mutation rate (%)   Cost/(100 GA cycles) (ms)   Average iterations   Total cost (ms)
 1                   79.40                      334.4                265.51
 2                   82.82                      295.6                244.82
 4                   89.10                      130.2                116.01
 6                   93.92                       98.2                 92.22
 8                   97.22                      210.2                204.36
10                  101.12                      395.6                400.03

Fig. 9. The genetic algorithm convergence for different values of p_0 in the case of mutation probability 0.06 and the uniform crossover operator.

has actually been reduced to still image retrieval, since video queries are applied on the selected key-frames. The proposed fuzzy representation scheme is then employed by assigning a fuzzy feature vector to each still image or key-frame of the database.

The user's queries can be submitted to the system in the form of images (query by example) or sketches (query by sketch). The goal in all cases is to retrieve the best M images from the database, whose visual content is closest to the user's query. The submitted image or sketch is first analyzed similarly to the images of the database, by applying the M-RSST segmentation algorithm, and then the extracted segment properties are classified using the proposed fuzzy representation scheme. A distance or

similarity measure is used to find the set of images that best match the user's query. Let us denote as f_q the feature vector of the user's query and as f_i the respective vector of the ith image in the database. Then, a weighted Euclidean distance is adopted as the similarity measure between the vector f_q and all the vectors f_i in the database,

d(f_q, f_i) = Σ_{j=1}^{Q^L} w_j (f_{q,j} - f_{i,j})^2,   (10)


where f_{q,j} and f_{i,j} are the jth elements of the vectors f_q and f_i, respectively. The parameter w_j regulates the importance of each feature element in the similarity distance. The set of weights can be assigned either by the user, according to his information needs, or can be adjusted automatically through a relevance feedback mechanism [2]. In this paper, all the weights are selected to be equal to one, w_j = 1 for all j, meaning that we impose the same importance on all feature elements. The set of M images in the database with minimum distances d(f_q, f_i) is returned to the user as the images best matching the query.

Table 3
Average execution of the genetic algorithm for different values of p_0 in the case of the uniform crossover mechanism and mutation probability 0.06

p_0   Cost/(100 GA cycles) (ms)   Average iterations   Total cost (ms)
10     89.10                      699.8                623.52
20    181.24                      325.2                589.39
30    275.40                      146.0                402.08
40    366.24                      122.1                447.18
50    461.04                      108.0                497.92
60    579.12                      102.4                593.01
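The retrieval step of Eq. (10) can be sketched as follows; the three-bin feature vectors are toy values, not histograms from the paper, and equal weights reproduce the choice w_j = 1:

```python
def weighted_distance(f_q, f_i, w=None):
    """Eq. (10): weighted squared Euclidean distance between feature
    vectors; with all weights equal to one, every element carries the
    same importance."""
    if w is None:
        w = [1.0] * len(f_q)
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, f_q, f_i))

def retrieve(f_q, database, M):
    """Return the indices of the M database vectors closest to the query."""
    ranked = sorted(range(len(database)),
                    key=lambda i: weighted_distance(f_q, database[i]))
    return ranked[:M]

# Toy 3-bin fuzzy histograms (hypothetical values):
db = [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3], [0.5, 0.5, 0.0]]
print(retrieve([0.85, 0.15, 0.0], db, M=2))  # → [0, 2]
```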

5.1. Implementation issues

The performance of the content-based retrieval mechanism depends on the selected parameters of the fuzzy representation: the shape of the membership functions and the number of partitions Q. In this section, we present the practical details of the proposed fuzzy video content representation scheme, i.e., the values of the fuzzy parameters used and the reason why these values give an appropriate representation of the visual content.

The following experiment is carried out to estimate the parameters of the fuzzy representation. Particularly, 15 image queries are submitted to a database consisting of 1250 images and key-frames about space. Each time, the 10 most appropriate images are returned. To evaluate the performance of the content-based retrieval system the following procedure is used. First, the similarity measure [Eq. (10)] is normalized in the interval [0, 1] as follows:

d_nrm(f_q, f_i) = d(f_q, f_i) / (d_max - d_min),   (11)

where d_nrm(f_q, f_i) denotes the normalized distance and d_min = min_i {d(f_q, f_i)}, d_max = max_i {d(f_q, f_i)} are the minimum and maximum distances over all images of the database for a given query.

Normalization is performed to allow comparisons between different users' queries, which in general give different distance values. Then, for an image query, a similarity degree, say t_i, is assigned to all images of the database, which indicates how similar the content of the ith image is to the query. In our case, three similarity degrees are used: zero, meaning that the image is quite similar to the user's query, 1 for irrelevant images, and 0.5 for somewhat relevant images. The absolute difference E_A between the normalized distance and the similarity degree over all the best M retrieved images is used to evaluate the system performance for the user's query,

E_A = (1/|S_M|) Σ_{i in S_M} |d_nrm(f_q, f_i) - t_i|,   (12a)

where S_M is the set containing the best M retrieved images for a given user's query and |S_M| its cardinality. The difference E_A expresses how close the best M retrieved images are to the user's query. Another approach is to examine the performance of the system over all images relevant to the user's query, i.e., images of similarity degree t_i = 0:

E_B = (1/|S_t|) Σ_{i in S_t} d_nrm(f_q, f_i),   (12b)

where S_t = {i : t_i = 0} and |S_t| is the cardinality of the set S_t.
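The two evaluation measures can be sketched as below; the distances and similarity degrees are toy values, and the normalization follows Eq. (11) as given:

```python
def evaluate(dists, t, top_m):
    """E_A (Eq. 12a): mean |d_nrm - t_i| over the best M retrieved images.
       E_B (Eq. 12b): mean d_nrm over all relevant images (t_i == 0)."""
    d_min, d_max = min(dists), max(dists)
    d_nrm = [d / (d_max - d_min) for d in dists]          # Eq. (11)
    ranked = sorted(range(len(dists)), key=lambda i: dists[i])
    s_m = ranked[:top_m]                                  # best M retrieved
    e_a = sum(abs(d_nrm[i] - t[i]) for i in s_m) / len(s_m)
    relevant = [i for i in range(len(t)) if t[i] == 0]    # S_t = {i : t_i = 0}
    e_b = sum(d_nrm[i] for i in relevant) / len(relevant)
    return e_a, e_b

# Toy distances and similarity degrees (0: similar, 0.5: somewhat, 1: irrelevant)
e_a, e_b = evaluate([0.1, 0.9, 0.3, 0.5], t=[0, 1, 0, 0.5], top_m=2)
print(round(e_a, 3), round(e_b, 3))  # prints: 0.25 0.25
```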

Fig. 10(a) illustrates the average error E_A over all the 15 examined image queries versus the number of partitions Q for triangular, trapezoidal (with line slope b = 3) and quadratic membership functions. In the same figure, the results obtained using binary classification are also depicted. It is observed that a partition number Q = 3 yields the best performance for all membership functions. We also observe that the triangular functions give better results compared to the other examined functions for most


Fig. 10. Average performance error over 15 queries for different partition numbers and membership functions.

Table 4
Execution times of the fuzzy representation for different membership functions and partition numbers

Partition number Q   Triangular (ms)   Trapezoid (ms)   Quadratic (ms)   Binary (ms)
2                     0.11              0.10             0.13             0.10
3                     1.32              1.23             1.98             1.02
4                     5.45              5.21             7.23             4.98
5                    20.67             20.02            26.65            19.86

of the partition numbers. Similar results are presented in Fig. 10(b), where the average error E_B over all 15 queries is shown. Consequently, triangular membership functions and a partition number of Q = 3 provide the best performance.
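A Q = 3 triangular partition of a normalized feature value can be sketched as below; the evenly spaced centres are an assumed layout, since the paper does not give the exact partition bounds:

```python
def triangular_memberships(v, q=3):
    """Degrees of membership of a feature value v in [0, 1] to q
    triangular partitions with evenly spaced centres (an assumed layout)."""
    centres = [k / (q - 1) for k in range(q)]   # 0.0, 0.5, 1.0 for q = 3
    width = 1.0 / (q - 1)
    return [max(0.0, 1.0 - abs(v - c) / width) for c in centres]

# A value midway between two centres belongs equally to both partitions,
# which is what makes the fuzzy histogram robust to small feature changes.
print(triangular_memberships(0.25))  # → [0.5, 0.5, 0.0]
```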

Table 4 shows the execution times for different partitions Q and membership functions, to indicate the complexity of the proposed fuzzy representation scheme. The results have been obtained on a Sun Ultra 10 system using L_c = 6 color and L_m = 5 motion properties. As expected, the computational load increases exponentially with respect to the partition number. Instead, the times remain almost the same for the different membership functions, with functions of high order presenting the highest load. This is due to the fact that the implementation involves additional costs which are the same for all function types, such as comparisons, procedure calls and so on. However, the execution times remain very small, just a few ms, even for large partition numbers (Q = 5). A comparison of the execution times for the fuzzy representation (in the case of Q = 3 and triangular membership functions) with the times required for color segmentation is presented in Table 1 for different image sizes. It is seen that the time required for segmentation is much greater than the time for fuzzy representation, especially for images of large size. Moreover, a large Q increases the storage requirements exponentially.

6. Experimental results

The proposed fuzzy representation of visual content has been evaluated both for video summarization and for content-based indexing and retrieval, using a large database consisting of MPEG coded video sequences and several images compressed in JPEG format. The Optibase Fusion MPEG encoder at a bit-rate of 2 Mbit/s has been used to encode the video sequences.


Fig. 11. A video shot about space consisting of 334 frames, shown with one frame out of every 10.

Fig. 12. The six key-frames selected using the genetic algorithm, the logarithmic approach and the method of [23].

Fig. 11 illustrates a shot used to demonstrate the performance of the key-frame extraction algorithm. The shot comes from an educational series about space and consists of N_F = 324 frames. One out of every 10 frames is depicted, resulting in 32 frame thumbnails. Fig. 12 presents the six extracted key-frames of this shot, obtained from the minimization of the cross-correlation criterion of (7) using the GA approach. The parameters selected in Section 4.2 are used in this case, with K_F = 6. The number of key-frames extracted is estimated using the MDL criterion described in [12]. Key-frame selection is also compared with the logarithmic method presented in [3]. The respective results are presented in Fig. 12 for the same shot. It is observed that the GA approach provides a much better description of the visual content compared to the logarithmic method. In particular, the first three and the last


Fig. 14. An image query: (a) the submitted image; (b) the best five retrieved images.

Fig. 13. Histogram of the correlation measure R(x) together with the minimum value obtained for the genetic algorithm (solid line) and the logarithmic method (dotted line).

two frames present similar visual content, while there is no key-frame representing the last action of the shot. Finally, the method of [23] is used for comparison with the proposed video summarization scheme. In this case, key-frames are extracted at time instances when the accumulated differences of the DC images exceed a pre-determined threshold [23]. This method depends highly on the selection of the threshold value, which is, in general, an ad hoc process. In our case, the threshold has been tuned so that the average number of key-frames extracted for the whole sequence is the same as that of the proposed method. However, using this threshold, only three key-frames are extracted from the previous shot, which are not adequate for describing the visual content. Particularly, two similar frames have been extracted, while there is no key-frame for the last action of the sequence.

In order to estimate the efficiency of the algorithm in terms of the obtained correlation measure R(x), a test of 100,000 random index vectors is performed, and a histogram of R(x) is constructed, as depicted in Fig. 13. The optimal value of R(x) obtained through the genetic algorithm (vertical solid line) and the logarithmic approach (vertical dotted line) is presented in this figure. As observed, the minimum value obtained through the genetic algorithm is lower than those of the random test and the logarithmic approach. It should be mentioned that the random test requires about 100 times more computational time, while the logarithmic approach is 3.55 times slower [3].

Fig. 15. Comparison of the proposed method with other techniques for the best five retrieved images of the query of Fig. 14: (a) the proposed fuzzy classification; (b) binary classification; (c) color histogram.

Content-based retrieval is examined next. In Fig. 14(a) an image of a space shuttle is submitted as the user's query. The retrieval results are displayed in Fig. 14(b), using the fuzzy parameters selected in Section 5.1. Fig. 15 presents a comparison of the proposed method with two other methods: binary classification and the traditional color histogram method [21] (Fig. 15(b) and (c)). The comparison is also performed quantitatively, using both the errors E_A and E_B over all the 15 image queries of the experiment carried out in Section 5.1. The results for binary classification are illustrated in Fig. 10, where it can be seen that the average performance error is higher in all cases compared to the fuzzy approach. For the color histogram method we have measured average errors of E_A = 0.52 and E_B = 0.24. As can be seen by comparing these values with those presented in Fig. 10, the color histogram performance is worse for any partition number and membership function, since only the global image characteristics are taken into consideration. The computational cost for binary classification is very small and its total cost is mainly affected by the segmentation load, resulting in a cost similar to that of the fuzzy approach (Tables 1 and 4). Instead, the color histogram method demands a smaller computational load than segmentation, as Table 1 shows. However, the additional cost of the proposed method is justified by its better performance.

7. Conclusions

A new approach for efficient visual content representation has been presented in this paper. In particular, in the proposed framework, the traditional pixel-based representation of visual content is transformed to a fuzzy feature-based one, which is more suitable for the new emerging multimedia applications, such as video browsing, content-based image indexing and retrieval, and video summarization. First, an analysis of video sequences is performed by applying a color/motion segmentation algorithm. Then, the extracted segment properties are classified using a fuzzy formulation. This representation is applied to both video summarization and content-based retrieval. In the first application, a genetic algorithm has been proposed to efficiently extract a limited but meaningful set of key-frames. Experimental results indicate that this approach outperforms the other methods in both accuracy and computational efficiency. In the


second case, the problem of content-based retrieval is investigated. The results indicate that the fuzzy representation provides a more compact description of the visual content, resulting in better retrieval performance compared to traditional techniques. Furthermore, the proposed method provides a more natural interpretation of the visual content, giving users new capabilities for performing their queries. For example, we can seek large objects or rapidly moving segments, in contrast to traditional methods where only the global color or motion information is taken into consideration.

Another issue for further investigation is the development of a fuzzy adaptive mechanism for estimating the distance weights. This approach increases the system accuracy and simultaneously leads to solutions closer to the user's needs. This can be accomplished by interpreting the subjective quality of an image given by a human observer as fuzzy densities and then constructing a fuzzy measure for weight estimation.

Acknowledgements

The authors would like to thank Georgios Akrivas for providing them with an efficient implementation of the key-frame selection technique presented in [23].

References

[1] A. Alatan, L. Onural, M. Wollborn, R. Mech, E. Tuncel, T. Sikora, Image sequence analysis for emerging interactive multimedia services - the European COST 211 framework, IEEE Trans. Circuits Systems Video Technol. 8 (7) (November 1998) 802-813.

[2] Y. Avrithis, A. Doulamis, N.D. Doulamis, S. Kollias, An adaptive approach to video indexing and retrieval using fuzzy classification, in: Proceedings of the Workshop on Very Low Bit Rate Video Coding (VLBV), Urbana-Champaign, IL, October 1998.

[3] Y. Avrithis, A. Doulamis, N. Doulamis, S. Kollias, A stochastic framework for optimal key frame extraction from MPEG video databases, Comput. Vision Image Understanding 75 (1) (July 1999) 3-24.

[4] S.-F. Chang, W. Chen, H.J. Meng, H. Sundaram, D. Zhong, A fully automated content-based video search engine supporting spatiotemporal queries, IEEE Trans. Circuits Systems Video Technol. 8 (5) (September 1998) 602-615.

[5] A. Del Bimbo, P. Pala, Visual image retrieval by elastic matching of user sketches, IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 19 (2) (February 1997) 121-132.

[6] N. Doulamis, A. Doulamis, D. Kalogeras, S. Kollias, Very low bit-rate coding of image sequences using adaptive regions of interest, IEEE Trans. Circuits Systems Video Technol. 8 (8) (December 1998) 928-934.

[7] A. Doulamis, N. Doulamis, S. Kollias, On-line retrainable neural networks: improving the performance of neural networks in image analysis problems, IEEE Trans. Neural Networks 11 (1) (January 2000) 137-155.

[8] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by image and video content: the QBIC system, IEEE Comput. Mag. (September 1995) 23-32.

[9] L. Garrido, F. Marques, M. Pardas, P. Salembier, V. Vilaplana, A hierarchical technique for image sequence analysis, in: Proceedings of the Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Louvain-la-Neuve, Belgium, June 1997, pp. 13-20.

[10] V.N. Gudivada, J.V. Raghavan (Eds.), Special Issue on Content-Based Image Retrieval Systems, IEEE Comput. Mag. 28 (9) (1995).

[11] A. Hamrapur, A. Gupta, B. Horowitz, C.F. Shu, C. Fuller, J. Bach, M. Gorkani, R. Jain, Virage video engine, in: SPIE Proceedings of Storage and Retrieval for Video and Image Databases V, San Jose, CA, February 1997, pp. 188-197.

[12] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1996.

[13] M. Irani, P. Anandan, Video indexing based on mosaic representation, Proc. IEEE 86 (5) (May 1998) 805-921.

[14] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1992.

[15] H.-C. Lin, L.-L. Wang, S.-N. Yang, Color image retrieval based on hidden Markov models, IEEE Trans. Image Process. 6 (2) (February 1997) 332-339.

[16] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, Berlin, 1992.

[17] MPEG-7: Context and Objectives (v.5), ISO/IEC JTC1/SC29/WG11 N1920, MPEG-7, October 1997.

[18] P.J. Mulroy, Video content extraction: review of current automatic segmentation algorithms, in: Proceedings of the Workshop on Image Analysis and Multimedia Interactive Systems (WIAMIS), Louvain-la-Neuve, Belgium, June 1997.

[19] K.N. Ngan, S. Panchanathan, T. Sikora, M.-T. Sun (Guest Eds.), Special Issue on Segmentation, Description and Retrieval of Video Content, IEEE Trans. Circuits Systems Video Technol. 8 (5) (September 1998).

[20] P. Salembier, M. Pardas, Hierarchical morphological segmentation for image sequence coding, IEEE Trans. Image Process. 3 (5) (September 1994) 639-651.


[21] J.R. Smith, S.F. Chang, VisualSEEk: a fully automated content-based image query system, in: Proceedings of the ACM Multimedia Conference, Boston, MA, November 1996, pp. 87-98.

[22] X. Wan, C.-C.J. Kuo, A new approach to image retrieval with hierarchical color clustering, IEEE Trans. Circuits Systems Video Technol. 8 (5) (September 1998) 628-643.

[23] B.L. Yeo, B. Liu, Rapid scene analysis on compressed videos, IEEE Trans. Circuits Systems Video Technol. 5 (December 1995) 533-544.

[24] M.M. Yeung, B.-L. Yeo, Video visualization for compact presentation and fast browsing of pictorial content, IEEE Trans. Circuits Systems Video Technol. 7 (5) (October 1997) 771-785.
