Modélisation et traitement de requêtes images complexes



N° d’ordre 2003ISAL0032 Année 2003

Thèse

Modélisation et traitement de requêtes images complexes

présentée devant L’Institut National des Sciences Appliquées de Lyon

pour obtenir

Le Grade de Docteur

École doctorale : Informatique et Information pour la société (EDIIS) Spécialité : Documents Multimédia, Images et Systèmes D’Information Communicants (DISIC)

Par

SOLOMON ATNAFU BESUFEKAD (DEA en Informatique)

Soutenue le 15 juillet 2003 devant la Commission d’examen

Jury :
Christine Collet, Professeur (ENSIMAG, INP Grenoble), Rapporteur
Kokou Yetongnon, Professeur (Université de Bourgogne), Rapporteur
Jacques Kouloumdjian, Professeur émérite (INSA de Lyon), Examinateur
Lionel Brunie, Professeur (INSA de Lyon), Directeur de thèse
Harald Kosch, Professeur (Université de Klagenfurt, Autriche), Examinateur
Michel Simonet, Chargé de recherche et habilité (lab. TIMC de Grenoble), Examinateur

A mes parents,

A mes frères et sœurs.

To my parents,

To my brothers and sisters.

MAI 2003

INSTITUT NATIONAL DES SCIENCES APPLIQUEES DE LYON

Directeur : STORCK A.

Professeurs : AUDISIO S. PHYSICOCHIMIE INDUSTRIELLE BABOT D. CONT. NON DESTR. PAR RAYONNEMENTS IONISANTS BABOUX J.C. GEMPPM*** BALLAND B. PHYSIQUE DE LA MATIERE BAPTISTE P. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS BARBIER D. PHYSIQUE DE LA MATIERE BASTIDE J.P. LAEPSI**** BAYADA G. MECANIQUE DES CONTACTS BENADDA B. LAEPSI**** BETEMPS M. AUTOMATIQUE INDUSTRIELLE BIENNIER F. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS BLANCHARD J.M. LAEPSI**** BOISSON C. VIBRATIONS-ACOUSTIQUE BOIVIN M. (Prof. émérite) MECANIQUE DES SOLIDES BOTTA H. UNITE DE RECHERCHE EN GENIE CIVIL - Développement Urbain BOTTA-ZIMMERMANN M. (Mme) UNITE DE RECHERCHE EN GENIE CIVIL - Développement Urbain BOULAYE G. (Prof. émérite) INFORMATIQUE BOYER J.C. MECANIQUE DES SOLIDES BRAU J. CENTRE DE THERMIQUE DE LYON - Thermique du bâtiment BREMOND G. PHYSIQUE DE LA MATIERE BRISSAUD M. GENIE ELECTRIQUE ET FERROELECTRICITE BRUNET M. MECANIQUE DES SOLIDES BRUNIE L. INGENIERIE DES SYSTEMES D’INFORMATION BUREAU J.C. CEGELY* CAVAILLE J.Y. GEMPPM*** CHANTE J.P. CEGELY*- Composants de puissance et applications CHOCAT B. UNITE DE RECHERCHE EN GENIE CIVIL - Hydrologie urbaine COMBESCURE A. MECANIQUE DES CONTACTS COUSIN M. UNITE DE RECHERCHE EN GENIE CIVIL - Structures DAUMAS F. (Mme) CENTRE DE THERMIQUE DE LYON - Energétique et Thermique DOUTHEAU A. CHIMIE ORGANIQUE DUFOUR R. MECANIQUE DES STRUCTURES DUPUY J.C. PHYSIQUE DE LA MATIERE EMPTOZ H. RECONNAISSANCE DE FORMES ET VISION ESNOUF C. GEMPPM*** EYRAUD L. (Prof. émérite) GENIE ELECTRIQUE ET FERROELECTRICITE FANTOZZI G. GEMPPM*** FAVREL J. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS FAYARD J.M. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS FAYET M. MECANIQUE DES SOLIDES FERRARIS-BESSO G. MECANIQUE DES STRUCTURES FLAMAND L. MECANIQUE DES CONTACTS FLORY A. INGENIERIE DES SYSTEMES D’INFORMATIONS FOUGERES R. GEMPPM*** FOUQUET F. GEMPPM*** FRECON L. REGROUPEMENT DES ENSEIGNANTS CHERCHEURS ISOLES GERARD J.F. INGENIERIE DES MATERIAUX POLYMERES GERMAIN P. LAEPSI**** GIMENEZ G. CREATIS** GOBIN P.F. (Prof. émérite) GEMPPM*** GONNARD P. GENIE ELECTRIQUE ET FERROELECTRICITE GONTRAND M. PHYSIQUE DE LA MATIERE GOUTTE R. (Prof. émérite) CREATIS** GOUJON L. GEMPPM*** GOURDON R. LAEPSI**** GRANGE G. GENIE ELECTRIQUE ET FERROELECTRICITE GUENIN G. GEMPPM***

GUICHARDANT M. BIOCHIMIE ET PHARMACOLOGIE GUILLOT G. PHYSIQUE DE LA MATIERE GUINET A. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIER GUYADER J.L. VIBRATIONS-ACOUSTIQUE GUYOMAR D. GENIE ELECTRIQUE ET FERROELECTRICITE HEIBIG A. MATHEMATIQUE APPLIQUEES DE LYON JACQUET-RICHARDET G. MECANIQUE DES STRUCTURES JAYET Y. GEMPPM*** JOLION J.M. RECONNAISSANCE DE FORMES ET VISION JULLIEN J.F. UNITE DE RECHERCHE EN GENIE CIVIL - Structures JUTARD A. (Prof. émérite) AUTOMATIQUE INDUSTRIELLE KASTNER R. UNITE DE RECHERCHE EN GENIE CIVIL - Géotechnique KOULOUMDJIAN J. INGENIERIE DES SYSTEMES D’INFORMATION LAGARDE M. BIOCHIMIE ET PHARMACOLOGIE LALANNE M. (Prof. émérite) MECANIQUE DES STRUCTURES LALLEMAND A. CENTRE DE THERMIQUE DE LYON - Energétique et thermique LALLEMAND M. (Mme) CENTRE DE THERMIQUE DE LYON - Energétique et thermique LAUGIER A. PHYSIQUE DE LA MATIERE LAUGIER C. BIOCHIMIE ET PHARMACOLOGIE LAURINI R. INFORMATIQUE EN IMAGE ET SYSTEMES D’INFORMATION LEJEUNE P. UNITE MICROBIOLOGIE ET GENETIQUE LUBRECHT A. MECANIQUE DES CONTACTS MASSARD N. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE MAZILLE H. PHYSICOCHIMIE INDUSTRIELLE MERLE P. GEMPPM*** MERLIN J. GEMPPM*** MIGNOTTE A. (Mle) INGENIERIE, INFORMATIQUE INDUSTRIELLE MILLET J.P. PHYSICOCHIMIE INDUSTRIELLE MIRAMOND M. UNITE DE RECHERCHE EN GENIE CIVIL - Hydrologie urbaine MOREL R. MECANIQUE DES FLUIDES ET D’ACOUSTIQUES MOSZKOWICZ P. LAEPSI**** NARDON P. (Prof. émérite) BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS NIEL E. AUTOMATIQUE INDUSTRIELLE NORTIER P. DREP ODET C. CREATIS** OTTERBEIN M. (Prof. émérite) LAEPSI**** PARIZET E. VIBRATIONS-ACOUSTIQUE PASCAULT J.P. INGENIERIE DES MATERIAUX POLYMERES PAVIC G. VIBRATIONS-ACOUSTIQUE PELLETIER J.M. GEMPPM*** PERA J. UNITE DE RECHERCHE EN GENIE CIVIL - Matériaux PERRIAT P. GEMPPM*** PERRIN J. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE PINARD P. (Prof. émérite) PHYSIQUE DE LA MATIERE PINON J.M. INGENIERIE DES SYSTEMES D’INFORMATION PONCET A. PHYSIQUE DE LA MATIERE POUSIN J. MODELISATION MATHEMATIQUE ET CALCUL SCIENTIFIQUE PREVOT P. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE PROST R. CREATIS** RAYNAUD M. CENTRE DE THERMIQUE DE LYON - Transferts Interfaces et Matériaux REDARCE H. AUTOMATIQUE INDUSTRIELLE RETIF J-M. CEGELY* REYNOUARD J.M. UNITE DE RECHERCHE EN GENIE CIVIL - Structures RIGAL J.F. MECANIQUE DES SOLIDES RIEUTORD E. (Prof. émérite) MECANIQUE DES FLUIDES ROBERT-BAUDOUY J. (Mme) (Prof. émérite) GENETIQUE MOLECULAIRE DES MICROORGANISMES ROUBY D. GEMPPM*** ROUX J.J. CENTRE DE THERMIQUE DE LYON – Thermique de l’Habitat RUBEL P. INGENIERIE DES SYSTEMES D’INFORMATION SACADURA J.F. CENTRE DE THERMIQUE DE LYON - Transferts Interfaces et Matériaux SAUTEREAU H. INGENIERIE DES MATERIAUX POLYMERES SCAVARDA S. AUTOMATIQUE INDUSTRIELLE SOUIFI A. PHYSIQUE DE LA MATIERE SOUROUILLE J.L. INGENIERIE INFORMATIQUE INDUSTRIELLE THOMASSET D. AUTOMATIQUE INDUSTRIELLE THUDEROZ C. ESCHIL – Equipe Sciences Humaines de l’Insa de Lyon UBEDA S. CENTRE D’INNOV. EN TELECOM ET INTEGRATION DE SERVICES VELEX P. MECANIQUE DES CONTACTS VIGIER G. GEMPPM*** VINCENT A. GEMPPM***

VRAY D. CREATIS** VUILLERMOZ P.L. (Prof. émérite) PHYSIQUE DE LA MATIERE

Directeurs de recherche C.N.R.S. : BAIETTO-CARNEIRO M-C. (Mme) MECANIQUE DES CONTACTS ET DES SOLIDES BERTHIER Y. MECANIQUE DES CONTACTS CONDEMINE G. UNITE MICROBIOLOGIE ET GENETIQUE COTTE-PATAT N. (Mme) UNITE MICROBIOLOGIE ET GENETIQUE ESCUDIE D. (Mme) CENTRE DE THERMIQUE DE LYON FRANCIOSI P. GEMPPM*** MANDRAND M.A. (Mme) UNITE MICROBIOLOGIE ET GENETIQUE POUSIN G. BIOLOGIE ET PHARMACOLOGIE ROCHE A. INGENIERIE DES MATERIAUX POLYMERES SEGUELA A. GEMPPM***

Directeurs de recherche I.N.R.A. : FEBVAY G. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS GRENIER S. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS RAHBE Y. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

Directeurs de recherche I.N.S.E.R.M. : PRIGENT A.F. (Mme) BIOLOGIE ET PHARMACOLOGIE MAGNIN I. (Mme) CREATIS**

* CEGELY : CENTRE DE GENIE ELECTRIQUE DE LYON
** CREATIS : CENTRE DE RECHERCHE ET D'APPLICATIONS EN TRAITEMENT DE L'IMAGE ET DU SIGNAL
*** GEMPPM : GROUPE D'ETUDE METALLURGIE PHYSIQUE ET PHYSIQUE DES MATERIAUX
**** LAEPSI : LABORATOIRE D'ANALYSE ENVIRONNEMENTALE DES PROCEDES ET SYSTEMES INDUSTRIELS

INSA DE LYON DEPARTEMENT DES ETUDES DOCTORALES ET RELATIONS INTERNATIONALES SCIENTIFIQUES OCTOBRE 2001

Ecoles Doctorales et Diplômes d'Etudes Approfondies
HABILITES POUR LA PERIODE 1999-2003

CHIMIE DE LYON (Chimie, Procédés, Environnement), EDA206
Responsable principal : M. D. SINOU, UCBL1, 04.72.44.62.63, Sec 04.72.44.62.64, Fax 04.72.44.81.60
Correspondant INSA : M. P. MOSZKOWICZ, 83.45, Sec 84.30, Fax 87.17
DEA INSA :
- Chimie Inorganique (910643) : M. J.F. QUINSON, Tél 83.51, Fax 85.28
- Sciences et Stratégies Analytiques (910634)
- Sciences et Techniques du Déchet (910675) : M. P. MOSZKOWICZ, Tél 83.45, Fax 87.17

ECONOMIE, ESPACE ET MODELISATION DES COMPORTEMENTS (E2MC), EDA417
Responsable principal : M. A. BONNAFOUS, LYON 2, 04.72.72.64.38, Sec 04.72.72.64.03, Fax 04.72.72.64.48
Correspondant INSA : Mme M. ZIMMERMANN, 84.71, Fax 87.96
DEA INSA :
- Villes et Sociétés (911218) : Mme M. ZIMMERMANN, Tél 84.71, Fax 87.96
- Dimensions Cognitives et Modélisation (992678) : M. L. FRECON, Tél 82.39, Fax 85.18

ELECTRONIQUE, ELECTROTECHNIQUE, AUTOMATIQUE (E.E.A.), EDA160
Responsable principal : M. G. GIMENEZ, INSA DE LYON, 83.32, Fax 85.26
DEA INSA :
- Automatique Industrielle (910676) : M. M. BETEMPS, Tél 85.59, Fax 85.35
- Dispositifs de l'Electronique Intégrée (910696) : M. D. BARBIER, Tél 85.47, Fax 60.81
- Génie Electrique de Lyon (910065) : M. J.P. CHANTE, Tél 87.26, Fax 85.30
- Images et Systèmes (992254) : Mme I. MAGNIN, Tél 85.63, Fax 85.26

EVOLUTION, ECOSYSTEME, MICROBIOLOGIE, MODELISATION (E2M2), EDA403
Responsable principal : M. J.P. FLANDROIS, UCBL1, 04.78.86.31.50, Sec 04.78.86.31.52, Fax 04.78.86.31.49
Correspondant INSA : M. S. GRENIER, 79.88, Fax 85.34
DEA INSA :
- Analyse et Modélisation des Systèmes Biologiques (910509) : M. S. GRENIER, Tél 79.88, Fax 85.34

INFORMATIQUE ET INFORMATION POUR LA SOCIETE (EDIIS), EDA407
Responsable principal : M. J.M. JOLION, INSA DE LYON, 87.59, Fax 80.97
DEA INSA :
- Documents Multimédia, Images et Systèmes d'Information Communicants (992774) : M. A. FLORY, Tél 84.66, Fax 85.97
- Extraction des Connaissances à partir des Données (992099) : M. J.F. BOULICAUT, Tél 89.05, Fax 87.13
- Informatique et Systèmes Coopératifs pour l'Entreprise (950131) : M. A. GUINET, Tél 85.94, Fax 85.38

INTERDISCIPLINAIRE SCIENCES-SANTE (EDISS), EDA205
Responsable principal : M. A.J. COZZONE, UCBL1, 04.72.72.26.72, Sec 04.72.72.26.75, Fax 04.72.72.26.01
Correspondant INSA : M. M. LAGARDE, 82.40, Fax 85.24
DEA INSA :
- Biochimie (930032) : M. M. LAGARDE, Tél 82.40, Fax 85.24

MATERIAUX DE LYON (UNIVERSITE LYON 1), EDA034
Responsable principal : M. J. JOSEPH, ECL, 04.72.18.62.44, Sec 04.72.18.62.51, Fax 04.72.18.60.90
Correspondant INSA : M. J.M. PELLETIER, 83.18, Fax 84.29
DEA INSA :
- Génie des Matériaux : Microstructure, Comportement Mécanique, Durabilité (910527) : M. J.M. PELLETIER, Tél 83.18, Fax 85.28
- Matériaux Polymères et Composites (910607) : M. H. SAUTEREAU, Tél 81.78, Fax 85.2
- Matière Condensée, Surfaces et Interfaces (910577) : M. G. GUILLOT, Tél 81.61, Fax 85.31

MATHEMATIQUES ET INFORMATIQUE FONDAMENTALE (Math IF), EDA409
Responsable principal : M. NICOLAS, UCBL1, 04.72.44.83.11, Fax 04.72.43.00.35
Correspondant INSA : M. J. POUSIN, 88.36, Fax 85.29
DEA INSA :
- Analyse Numérique, Equations aux dérivées partielles et Calcul Scientifique (910281) : M. G. BAYADA, Tél 83.12, Fax 85.29

MECANIQUE, ENERGETIQUE, GENIE CIVIL, ACOUSTIQUE (MEGA), EDA162
Responsable principal : M. J. BATAILLE, ECL, 04.72.18.61.56, Sec 04.72.18.61.60, Fax 04.78.64.71.45
Correspondant INSA : M. G. DALMAZ, 83.03, Fax 04.72.89.09.80
DEA INSA :
- Acoustique (910016) : M. J.L. GUYADER, Tél 80.80, Fax 87.12
- Génie Civil (992610) : M. J.J. ROUX, Tél 84.60, Fax 85.22
- Génie Mécanique (992111) : M. G. DALMAZ, Tél 83.03, Fax 04.78.89.09.80
- Thermique et Energétique (910018) : Mme M. LALLEMAND, Tél 81.54, Fax 60.10

En grisé (dans l'original) : les Ecoles doctorales et DEA dont l'INSA est établissement principal.

Remerciements

Tout d'abord, ma profonde gratitude et mes très sincères remerciements vont au Prof. Lionel Brunie pour avoir accepté d'être mon directeur de thèse, pour son excellent encadrement académique et scientifique, pour son humanité, et pour sa volonté de m'assister à tout moment. Un grand merci également à sa famille pour sa magnifique hospitalité.

Je remercie sincèrement la coopération franco-éthiopienne pour m'avoir offert la chance et le financement de faire mes études de doctorat en France.

Je remercie Prof. Christine Collet (ENSIMAG, INP Grenoble) et Prof. Kokou Yetongnon (Université de Bourgogne) pour avoir bien voulu être rapporteurs de ma thèse et Prof. Jacques Kouloumdjian (INSA de Lyon), Prof. Harald Kosch (Université de Klagenfurt, Autriche), et Dr. (habilité) Michel Simonet (lab. TIMC de Grenoble) de m’avoir fait l’honneur de participer à mon jury de thèse.

Pendant toute la durée de ma thèse, j’ai eu l’occasion de travailler avec un bon nombre de personnes. Donc, je remercie Prof. Harald Kosch, Dr. Richard Chbeir, et David Coquil pour nos utiles discussions et pour leur soutien à mes études.

Ma profonde sympathie va à tous les membres du LIRIS qui m’ont soutenu dans ma vie quotidienne au labo et à tous les chercheurs de l’équipe Brunie (Jean-Marc Pierson, David, Zouheir, Fernando, Girma, Ludwig, Sonia, Amine), ainsi qu’à Rami, Nathalie Chbeir, Badr, Ahmed, Abraham, et Régine Tribollet.

Merci aux membres du département d'informatique et du département de mathématiques de l'Université d'Addis Ababa qui ont facilité mes études. Merci à Mulugeta Libsie pour ses encouragements et pour ses commentaires utiles sur le rapport final de ma thèse, ainsi qu'à Tesfa Biset pour ses conseils utiles et ses encouragements.

Ma gratitude et ma profonde sympathie vont aussi à la communauté éthiopienne de Lyon et des environs pour m’avoir apporté tout le soutien dont j’avais besoin.

Un grand merci aussi à mon frère Afework Atnafu qui malgré la distance entre nous a toujours été avec moi, avec son inlassable soutien à tous points de vue et ses encouragements. Merci à ma sœur Bereket Atnafu et à sa famille, et à mon oncle Ketema Besufekad et à sa famille pour leurs encouragements et leur soutien.

Mes derniers mais non moins importants remerciements vont à mes parents et aux autres membres de ma famille, qui ont toujours été à mes côtés et qui m'ont encouragé à atteindre le but que je m'étais fixé.

Acknowledgements

First of all, my deep gratitude and sincere thanks go to Prof. Lionel Brunie, who agreed to be my advisor, for his excellent academic and scientific advice, for his humane personality, and for his goodwill to assist me in all respects. My thanks go also to his family for their hospitality; thanks to them, I had the pleasure of knowing such a good family.

My sincere thanks to the Franco-Ethiopian cooperation for having offered me the opportunity and the funding for pursuing my Ph.D. studies in France.

I thank Prof. Christine Collet (ENSIMAG, INP Grenoble) and Prof. Kokou Yetongnon (University of Burgundy) for having agreed to be rapporteurs of my thesis, and Prof. Jacques Kouloumdjian (INSA de Lyon), Prof. Harald Kosch (University of Klagenfurt, Austria), and Dr. (habilité) Michel Simonet (lab. TIMC of Grenoble) for having done me the honor of serving on the Ph.D. examination board.

Throughout my thesis, I had the occasion to work with a number of people. Thus, I thank Prof. Harald Kosch, Dr. Richard Chbeir, and David Coquil for the useful discussions we had and for their contributions to my study.

My thanks to those members of the Department of Mathematics and the Department of Computer Science of Addis Ababa University who facilitated my Ph.D. studies. I sincerely thank Mulugeta Libsie for his encouragement and his useful comments on the final manuscript of my thesis. I thank Tesfa Biset for his useful advice and his encouragement.

My deep sympathy goes to all the members of LIRIS who in one way or another have been good to me, to all the researchers of Brunie's team (Jean-Marc Pierson, David, Zouheir, Fernando, Girma, Ludwig, Sonia, Amine), and to all the others (Rami, Nathalie Chbeir, Badr, Ahmed, Abraham, and Régine Tribollet) whose friendship was invaluable to me.

My gratitude and my deep sympathy also go to the Ethiopian community in Lyon and its environs for having offered me all the support that I needed during my stay in France.

I also would like to thank my brother Afework Atnafu who, in spite of the long distance between us, supported me in many respects and was always there to assist me. Thanks to my uncle Ketema Besufekad and his family, my sister Bereket Atnafu and her family, and the rest of my brothers and sisters, who did all their best in many respects so that I could attain my goals.

My last but not least thanks go to my parents, who gave me all they could and gave me the chances that they never had for themselves.

Résumé

La recherche d'images par le contenu est d'une importance croissante dans de nombreux domaines d'application tels que l'informatique médicale, la cartographie, la météorologie, etc. Pour cette raison, le besoin de systèmes capables d'indexer, de stocker et de retrouver efficacement des images en fonction de critères bien précis se fait de plus en plus fortement sentir. Ainsi, de nombreux travaux ont été effectués au cours des vingt dernières années dans le but d'accroître l'efficacité des fonctions de recherche d'images et d'intégrer les images dans les environnements standards de traitement de données. Ces travaux ont souvent été menés séparément par les communautés bases de données et reconnaissance de formes et vision, conduisant à deux grandes approches.

Dans une première approche, le contenu des images est représenté par des méta-données et interrogé par des requêtes relationnelles classiques. Si ce type de représentation permet de bénéficier de l’expérience acquise en bases de données relationnelles, il impose cependant un processus d’indexation lent, subjectif et qui ne peut représenter complètement le contenu d’une image.

La communauté reconnaissance de formes, quant à elle, applique aux images des méthodes d’extraction de caractéristiques, souvent spécifiques à un domaine d’application, qui permettent de représenter les images par leurs attributs physiques tels que la couleur, la texture, la forme, etc. La recherche d’images similaires s’effectue alors à l’aide d’algorithmes de comparaison de caractéristiques permettant des mises en correspondance non exactes entre images. Cette approche a permis d’obtenir des résultats prometteurs qui ont donné une grande importance à ce domaine de recherche. Elle souffre cependant de deux défauts importants : d’une part, les informations sémantiques et contextuelles associées aux images ne sont souvent pas prises en compte ; d’autre part, la problématique de la gestion globale des images est généralement laissée en suspens : au pire, les systèmes implémentés utilisent de simples répertoires ; au mieux, ils sont associés comme composants externes (plug-in) à des systèmes de gestion de bases de données sans véritable intégration formelle ni logicielle.

La plupart des systèmes existants ont, en particulier, le défaut de n’offrir aucun cadre formel pour les requêtes dites hybrides. Or, disposer d’un cadre formel bien fondé est non seulement important sur un plan théorique mais aussi indispensable si l’on entend traiter efficacement de telles requêtes. Définir un tel cadre formel impose de concevoir et de mettre en œuvre un modèle de données adapté aux images, un modèle de stockage de données images efficace, une algèbre permettant de décrire des opérateurs portant sur le contenu des images et/ou sur le contenu des méta-données, des heuristiques d’optimisation de requêtes et, enfin, d’intégrer l’ensemble de ces composants dans un schéma global cohérent. L’étude de cette problématique et la conception d’un tel cadre formel pour le traitement de requêtes hybrides se situent au cœur de cette thèse.

Dans un premier temps, nous présenterons un modèle de représentation de données images à même de permettre de formuler des requêtes multicritères. Fondés sur ce modèle de représentation, des opérateurs algébriques portant à la fois sur les données images et les méta-données seront présentés et leurs propriétés démontrées. Des heuristiques d’optimisation de requêtes seront ensuite étudiées. Afin de valider expérimentalement la faisabilité et la pertinence des concepts et techniques introduits, nous avons implémenté un prototype de système de gestion d’images médicales appelé EMIMS.

Mots clés : base de données d'images, modèle de données d'images, requêtes multicritères d'images, opérations fondées sur le contenu, algèbre de similarité, EMIMS.

Abstract

Image retrieval is of increasing need in many application areas such as medicine, surveillance, cartography, meteorology, etc. Hence, the need for systems that can effectively index, store, and retrieve images according to certain criteria is becoming a key issue. In this regard, many research works were carried out during the last two decades with the aim of increasing the effectiveness of image retrieval systems and integrating images into standard data processing environments. These works were often undertaken separately by the database community and by the pattern recognition and computer vision community, leading to two different approaches.

In the first approach, images are described by textual metadata-based annotations, and queries on images are treated using the traditional relational approach. This type of representation makes it possible to directly apply the experience of relational databases. However, this type of image description is subjective, cannot be exhaustive, and is time-consuming. The pattern recognition community, on the other hand, applies methods of automatic extraction of the visual features of images such as color, texture, shape, etc. Image retrieval is then carried out using similarity-based comparison algorithms based on these features. Thanks to the efforts in this domain, this approach has become a promising method of content-based image retrieval. However, the semantic and contextual information associated with the images is often not taken into account. In addition, the issues of effective and formal image data management are usually not considered.

The existing systems do not offer any formal framework for multi-criteria queries that integrate relational and similarity-based queries on images. Defining such a formal framework requires an essential theoretical study. Indeed, it requires designing a convenient data model adapted to images, an effective model for the storage of images and their associated data, an algebraic language that defines operators on the visual contents of the images and/or on the traditional metadata-based descriptions of the images, effective query optimization heuristics, and coherent methods that can integrate all these components. The study of these problems and the design of such a formal framework for the treatment of multi-criteria queries on image data are the objectives of this thesis.

Thus, as a result of the work conducted in this thesis, we present an image data repository model that permits the formulation of multi-criteria queries. Based on this model, we define similarity-based algebraic operators, study the properties of these operators, propose methods to integrate these similarity-based operators with the relational operators, present useful methods for similarity-based query optimization, and report experimental tests that show the soundness of the methods. Finally, in order to validate our proposals with a practical application, we developed a prototype called EMIMS (Extended Medical Image Management System) with which we showed the practicability of our proposals.

Keywords: image database, image data repository model, similarity-based operators on images, multi-criteria queries on image data, similarity-based algebra, EMIMS.

Table of Contents

Résumé en Français 23
1. Introduction 23
2. Etat de l'art 26
3. Modèle de Représentation de données Images 27
   3.1 Modèle 27
   3.2 Objets d'Intérêt 28
4. Opérateurs basés sur la similarité 29
   4.1 L'Opérateur de sélection basée sur la similarité 30
   4.2 L'Opérateur de jointure basée sur la similarité 30
   4.3 L'Opérateur de jointure symétrique basée sur la similarité 33
   4.4 L'Opérateur Mine 34
   4.5 Autres Opérations 37
5. Le prototype EMIMS 37
   5.1 Structure d'EMIMS 37
   5.2 L'interface de requête 40
6. Conclusion 44

1. Introduction 45
2. Motivation and Problem Identification 53
   2.1 Motivation 53
   2.2 Problem Identification 56
   2.3 Summary 61
3. Related Work 63
   3.1 Image Data Models 64
   3.2 Image Analysis for Content-based Image Retrieval 68
      3.2.1 Feature Extraction 68
         3.2.1.1 Color 69
         3.2.1.2 Texture 70
         3.2.1.3 Shape 71
         3.2.1.4 Segmentation 72
      3.2.2 Supporting Salient Objects in CBIR 73
      3.2.3 Multi-dimensional Indexing 74
   3.3 Content-Based Image Retrieval Methods 74
   3.4 Content-Based Image Retrieval Systems 76
      3.4.1 QBIC 77
      3.4.2 The VIR Image Search Engine 78
      3.4.3 Photobook 79
      3.4.4 MARS 81
      3.4.5 Surfimage 83
   3.5 Standards Relevant to Image Data Management 84
      3.5.1 Image Compression 84
      3.5.2 The MPEG Standards 85
      3.5.3 Query Language Standards 87
      3.5.4 The DICOM Standard 88
   3.6 Database Management Systems that Support CBIR 90
   3.7 DBMS-Oriented Algebra 92
      3.7.1 The Relational Algebra and Query Optimization 93
      3.7.2 Similarity-Based Algebra 94
   3.8 Summary 95
4. Data Repository Model 99
   4.1 An Image Data Repository Model 100
   4.2 Supporting Salient Objects in the Model 104
   4.3 Object Types to the Image Repository Models: Example Scenario 106
   4.4 The Image Data Repository Models in Relation to the Image Model 113
   4.5 Summary 116
5. Similarity-Based Algebra 117
   5.1 Introduction 117
   5.2 The Similarity-Based Selection Operator 120
   5.3 The Similarity-Based Join Operator 120
   5.4 The Symmetric Similarity-Based Join Operator 126
   5.5 More Operators Associated to the Similarity-Based Join 129
      5.5.1 The Extract Operator 129
      5.5.2 The Mine Operator 129
   5.6 Other Operators on Image Tables 133
   5.7 Relational Operators on Image Tables and on Complex Expressions 136
      5.7.1 Relational Selection on an Image Table 136
      5.7.2 Relational Join on Image Tables 136
      5.7.3 Relational Join Between Image and Relational Tables 138
   5.8 Query Expressions Involving both Similarity-Based and Relational Operators 138
   5.9 Summary 142
6. Similarity-Based Query Optimization 143
   6.1 General Architecture of a Query Optimizer 144
   6.2 Transformation Rules and Methods of Similarity-Based Query Optimization 146
      6.2.1 Properties of Selection-Based Algebraic Expressions 147
      6.2.2 Properties of Similarity-Based Join Algebraic Expressions 150
      6.2.3 Methods of Local Optimization 150
   6.3 Performance Evaluation Experiment 152
   6.4 Summary 155
7. EMIMS (Extended Medical Image Management System) 157
   7.1 The Oracle interMedia Module 158
      7.1.1 The Similarity-Based Comparison 159
      7.1.2 How the Comparison Works 160
   7.2 General Architecture of EMIMS under a DBMS 161
   7.3 The Structure of EMIMS 162
   7.4 The Sample Database Used in EMIMS 166
   7.5 The Visual User Interfaces of EMIMS 168
      7.5.1 The Medical Data Entry Interface 168
         7.5.1.1 The Context/Domain-Oriented Data Entry Panel 168
         7.5.1.2 The Image Content Oriented Data Entry Panel 169
         7.5.1.3 The Medical Exam Data Entry Panel 170
      7.5.2 The Query Interfaces of EMIMS 171
         7.5.2.1 The Iconic Query Interface 171
         7.5.2.2 The Query by Example (QBE) Interface 174
         7.5.2.3 The Multi-Criteria Query Panel 177
   7.6 Summary 181
8. Discussion 185
   8.1 The Data Repository Model and the P component 186
   8.2 The Similarity-Based Algebra and the similarity-based operators 186
   8.3 The Methods of Query Optimization on the Similarity-Based Operators 187
   8.4 Practical Demonstrations 188
9. Conclusion and Prospectives 191
Annex: Cost Model 195
References 201

List of Figures

Figure 2.1: An RGB color histogram representation of a chest medical image. 54
Figure 2.2: Brain MRI, taken on 05/11/1999, that shows a metastasis at the parietal left lobe. 58
Figure 2.3: Similarity-based join operation requirement on the contents of medical images. 59
Figure 3.1: The VIMSYS Model. 64
Figure 3.2: Semantic schema of the Model. 65
Figure 3.3: An image data model in UML notation. 66
Figure 3.4: Color histogram of an image represented as a 64-dimensional feature vector. 69
Figure 3.5: Sample images where texture features could be more effective than color features in a CBIR. 71
Figure 3.6: Sample cases for shape description requirements. 72
Figure 3.7: Example of an image (left) and its segmented form (right). 73
Figure 3.8: QBIC's color search palette (left) and color layout search palette (right) for user sketch-based query. 78
Figure 3.9: Results of a query from a web demo of Photobook's facial image search engine. The result is a search for the image at the top left. 81
Figure 3.10: An initial query result (left) and a refined query result (right) using WebMARS. 82
Figure 3.11: Retrieval for the top left face. Retrieved images are from top left to bottom right in order of best match. 84
Figure 3.13: A sample DICOM image data file and its components. 90
Figure 4.1: Managing image related data in a table. 104
Figure 4.2: Managing Salient Objects in association with their source images. 106
Figure 5.3: The symmetric similarity-based join: M1 ⊕ε M2. 127
Figure 5.4: The symmetric similarity-based join: M1 ⊕ε M2 ⊕ε M3. 128
Figure 5.5: Illustration for the Mine operator. 131
Figure 6.1: The Flow of Query Processing Steps. 144
Figure 6.2: Architecture of a Query Optimizer. 145
Figure 6.3: Time for similarity-based join with varying image table size. 153
Figure 6.4: Symmetric similarity-based join, M1 ⊕ε M2, with two different strategies. 155
Figure 7.1: The Architecture of EMIMS under a general DBMS. 162
Figure 7.2: The class diagram of EMIMS in UML representation. 163
Figure 7.3: Screenshot of the Context/Domain-Oriented Data Entry Interface. 169
Figure 7.4: Screenshot of the Image Content Oriented Panel in the Data Entry Interface. 170
Figure 7.5: Screenshot of the Medical Exam Data Entry Interface. 171
Figure 7.6: Screenshots of the Iconic Query Panel of the MIMS user interface to formulate the above query. 173
Figure 7.7: Screenshot of the QBE Visual Interface of EMIMS. 175
Figure 7.8: Screenshot of the Medical Exam Details of the patient with Medical Exam Code-661020. 176
Figure 7.9: Screenshot of the Similarity-based Join Query Interface. 178
Figure 7.10: Screenshot of the pop-up window displayed by a click to the button "view image 3" in the result table of the similarity-based join. 179
Figure 7.11: The content of P for the instance with "id = 6" in the result table of the join M1⊗εM2 of Fig. 7.9. 180
Figure 7.12: Screenshot of the Symmetric Similarity-based Join with P component displayed. 180
Figure 9.1: A System with a Multi-Criteria query of images. 192

List of Tables

Table 1.1: Original Data On Film Annually Worldwide. 47
Table 4.1: Sample structure of the P component (right). 103
Table 5.1: Example Result table for the above query (Query 2 of Chapter 2). 123
Table 6.1: Time (in seconds) for similarity-based join with varying image table size. 152
Table 6.2: Symmetric similarity-based join, M1 ⊕ε M2, with two different strategies. 154

RESUME EN FRANÇAIS

1. Introduction

La recherche d’images par le contenu est d’une importance croissante dans de nombreux domaines d’application tels que l’informatique médicale, la cartographie, la météorologie etc. Pour cette raison, le besoin de systèmes capables d’indexer, de stocker et de retrouver efficacement des images en fonction de critères bien précis se fait de plus en plus sentir. Ainsi, de nombreux travaux ont été effectués au cours des deux dernières décennies dans le but d’accroître l’efficacité des fonctions de recherche d’images et d’intégrer les images dans les environnements standards de traitement de données.

Au cours de la dernière décennie, des recherches concernant l'interrogation d'images par le contenu ont été menées séparément par les communautés bases de données et reconnaissance de formes, conduisant à deux grandes approches. Dans une première approche, le contenu des images est représenté par des méta-données et interrogé par des requêtes relationnelles classiques. Cependant, la définition de ce type de représentation est un processus lent, subjectif et qui ne peut représenter complètement le contenu d'une image ([12, 21]). La communauté reconnaissance de formes applique aux images des méthodes d'extraction de caractéristiques qui permettent de les représenter par leurs attributs physiques tels que la couleur, la texture, la forme, etc. ([2, 4, 18]). Ces représentations de bas niveau sont obtenues en appliquant aux images des algorithmes d'extraction de caractéristiques souvent spécifiques à un domaine d'application. La recherche d'images similaires s'effectue alors à l'aide d'algorithmes de comparaison de caractéristiques. Ces algorithmes ont l'inconvénient de produire des mises en correspondance non exactes entre les images. De plus, les informations sémantiques et contextuelles associées aux images ne sont pas prises en compte par ces méthodes. Cependant, les chercheurs en reconnaissance de formes ont obtenu des résultats prometteurs qui ont donné une grande importance à ce domaine.


L'expression « base de données d'images » a été utilisée dans le titre de nombreux articles de la littérature ([85]). Mais, dans la pratique, la plupart des chercheurs en bases de données n'ont proposé que des prototypes qui s'appuient uniquement sur une description purement textuelle des images. Quant aux systèmes proposés par la communauté reconnaissance de formes, ils ne s'appliquent qu'à des répertoires contenant des milliers d'images, et non à des images stockées dans des bases de données ; le problème important de la gestion des données n'est donc pas réellement traité. A notre connaissance, bien que des efforts pour concevoir de véritables systèmes de gestion de bases de données d'images existent [12, 18, 85], aucun système opérationnel n'est disponible. La construction d'un tel système nécessite un véritable travail de recherche interdisciplinaire.

Ainsi, la plupart des systèmes existants ont le défaut de n'offrir aucun cadre formel qui permette de traiter de véritables requêtes hybrides¹. Ce problème recouvre à la fois l'absence d'un modèle de données adapté aux images, d'un modèle de stockage des données images efficace et d'une algèbre formelle permettant de décrire des opérations basées sur le contenu des images.

Considérons par exemple une application dans le domaine médical. Un médecin ou un chercheur pourrait vouloir formuler une requête du type suivant :

Requête-1) Rechercher toutes les IRM du cerveau, prises entre le 01/01/2000 et le 31/12/2000, dans lesquelles on trouve une « anomalie » (« objet » opaque) similaire à la tumeur présente dans l'image donnée en référence et d'une surface supérieure à 30 mm².

Une telle requête illustre la nécessité de pouvoir formuler des requêtes qui combinent une recherche par le contenu image avec une sélection relationnelle classique.

D’autres types de requêtes nécessitent des opérations de jointure par le contenu. Prenons l’exemple d’un système de vidéosurveillance. Soient SI et EMP deux tables d’images, définies comme suit :

SI(Photo, Fv, Heure, Date) est une table qui contient des photos prises par une caméra de surveillance à intervalles de temps réguliers,

EMP(Photo, Fv, Nom, Fonction, Adresse) est une table d'employés qui contient notamment leurs photographies.

¹ Une requête hybride est une requête qui porte à la fois sur les méta-données et sur le contenu des images.


Dans ces deux tables, Fv est un vecteur de caractéristiques physiques des images ("feature vector").

Dans cette application, un enquêteur peut souhaiter formuler une requête de la forme suivante :

Requête-2) Pour les images de personnes présentes dans SI qui ont été prises le 10 août 2002 entre 14 et 16 heures, retrouver les images d’employés les plus ressemblants ainsi que le nom, fonction et adresse correspondants.

Une telle requête nécessite à la fois une sélection relationnelle sur la table SI et une « jointure basée sur la similarité » entre les tables SI et EMP. Une telle opération de jointure basée sur la similarité n’est proposée par aucun système à l’heure actuelle. En effet, ceux-ci ne permettent que de formuler des requêtes par l’exemple, ou se contentent de proposer la jointure de listes d’images [25, 82, 81] sans tenir compte des contraintes des systèmes de gestion de bases de données.
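À titre d'illustration (esquisse seulement, en anticipant sur les opérateurs définis dans la Section 4, et où σ et π désignent la sélection et la projection relationnelles classiques), la Requête-2 pourrait s'exprimer ainsi :

π Nom, Fonction, Adresse ( σ Date='10/08/2002' ∧ 14h ≤ Heure ≤ 16h (SI) ⊗ε EMP )

La table SI, référence de la jointure, figure bien à gauche de l'opérateur ⊗ε.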

Afin de pouvoir formuler ce type de requêtes, il est nécessaire de :

- Définir un modèle de description d'images qui intègre à la fois des méta-données et des descripteurs obtenus par l'extraction des caractéristiques physiques des images et de leurs objets d'intérêt.

- Définir des méthodes de formulation et d'exécution de requêtes comportant une combinaison d'opérateurs relationnels et d'opérateurs basés sur le contenu. Les opérateurs relationnels, appliqués aux attributs alphanumériques, ont été formalisés depuis longtemps. En revanche, les opérateurs basés sur le contenu n'ont pas encore été formalisés mathématiquement de manière satisfaisante.

L'objet de cette thèse est d'étudier un cadre formel global répondant à cette problématique et d'en valider la faisabilité et la pertinence par le développement d'un prototype expérimental. Ce résumé est organisé en six sections.

En premier lieu, dans la Section 2, nous présenterons un état de l'art du domaine cible. Dans la Section 3, nous définissons un modèle de représentation de données d'images générique qui permet la représentation des images en utilisant à la fois des méta-données et des descripteurs physiques. Ce modèle permet de formuler des requêtes multicritères (requêtes contextuelles, sémantiques et basées sur le contenu) et d'intégrer des opérateurs hétérogènes dans un SGBD. La Section 4 présente une formalisation mathématique des opérations basées sur la similarité (en particulier les opérations de jointure et de sélection basées sur la similarité) ainsi qu'une étude de leurs propriétés. Dans la Section 5, nous décrirons la conception et les fonctionnalités du prototype EMIMS (Extended Medical Image Management System) et ferons une démonstration de l'utilisation des opérateurs basés sur la similarité. Enfin, ce résumé se termine par la présentation d'une conclusion et des perspectives de ces travaux en Section 6.

Ces travaux se sont déroulés dans le cadre des projets régionaux Rhône-Alpes Santé et Haute Performance de Calcul et de Cognition (1998-2000), Système d'Informations Communicants pour la Médecine (SiCOM, 2001-2003) et MEDIGRID (2002-2004).

2. Etat de l’art

L’analyse, la représentation et l’interrogation par le contenu des images ont fait l’objet de nombreux travaux de recherche depuis plus d’une décennie. Ces travaux ont produit des résultats intéressants en ce qui concerne la représentation et la sélection des images par leurs caractéristiques physiques telles que les histogrammes de couleur, la texture, la forme etc. ([18, 42, 17]). Photobook, QBIC, Virage, Netra, Surfimage, VisualSeek, CAFIIR sont parmi les plus importants prototypes de systèmes de recherche d’images basés sur le contenu [12, 21, 50, 104]. Cependant, ces systèmes n’accordent que peu d’importance aux informations sémantiques et contextuelles associées aux images et représentées par des méta données. Jusqu’ici, des opérations complexes basées sur la similarité et la formulation de requêtes hybrides ne sont donc pas supportées. Dans le domaine de la reconnaissance de formes, les travaux de recherche se focalisent essentiellement sur l’optimisation des algorithmes d’extraction et de comparaison des caractéristiques physiques des images [82, 81]. Ainsi, [81] présente un algorithme de jointure par la similarité de deux ensembles d’images et étudie son efficacité, mais ne traite pas de son intégration dans un système de gestion de bases de données ni de l’utilisation complémentaire de méta données pour la recherche d’images multicritères.

Un certain nombre de méthodes pour l'intégration de la recherche d'images basée sur le contenu dans les SGBD standards ont été proposées, à la fois dans le cadre de travaux universitaires et industriels (QBIC dans DB2 [50], le moteur VIRAGE d'Oracle [58], le module Excalibur Image DataBlade d'Informix [15]). Cependant, aucun de ces systèmes ne supporte les opérations nécessaires pour la formulation de requêtes multicritères. Ainsi, des opérateurs importants tels que la jointure par la similarité ne sont pas disponibles dans ces systèmes.

En outre, la plupart des systèmes d'interrogation d'images par le contenu existants se concentrent essentiellement sur leurs fonctions d'entrée-sortie, sans tenir compte du problème de l'optimisation des requêtes basées sur le contenu. Comme démontré par S. Adali et al. [8], l'un des plus importants problèmes est l'absence d'une algèbre pour la représentation de requêtes par le contenu. Cependant, l'algèbre proposée par ces auteurs reste à un niveau d'abstraction élevé afin d'intégrer des mesures de similarité d'images issues d'implémentations hétérogènes. Elle ne permet donc pas de traiter les problèmes de la modélisation, de l'exécution et de l'optimisation de requêtes contenant des opérations basées sur la similarité, pas plus que d'une combinaison d'opérations relationnelles et d'opérations basées sur la similarité.

Dans cette thèse, nous étudions comment la formalisation algébrique d’opérateurs de similarité crée un espace algébrique dans lequel il est possible de définir des règles de base pour l’optimisation de requêtes basées sur la similarité et l'utilisation de ces opérateurs en combinaison avec les opérateurs relationnels classiques, dans le cadre de requêtes multicritères.

3. Modèle de Représentation de données Images

La définition d'un modèle de représentation de données d'images est nécessaire pour stocker et interroger des images et les intégrer dans les SGBD classiques. Nous présentons donc dans cette thèse un modèle qui permet de stocker des représentations du contenu des images à la fois sous la forme de méta-données textuelles et de caractéristiques physiques, dans le cadre global du paradigme relationnel-objet.

3.1 Modèle

Une table d’images est définie comme une table présentant la structure M(Id, O, F, A, P) suivante :

id est l'identificateur unique d’une instance de M.

O est une référence à l’objet image lui-même, stocké soit directement dans la table sous la forme d’un BLOB, soit comme une référence à un BFILE extérieur.

F est un vecteur de caractéristiques qui représente l’objet O. F contient des caractéristiques physiques de bas niveau tels que sa couleur, sa texture, sa forme, sa position. Le format de ce vecteur est compatible avec le standard actuel de description de contenu MPEG-7.


A est un ensemble d’attributs décrivant l’image sous forme d’annotation textuelles. A peut également contenir des clés externes qui associent l’image à d’autres tables de la base de données.

P est utilisé pour stocker des pointeurs vers des instances d'autres tables qui ont été associées à l'image par une opération binaire. Sa valeur est nulle pour les tables de base, mais non nulle pour les tables intermédiaires issues d'opérations binaires. L'introduction de P permet, on le verra, à la fois de représenter la notion de similarité entre images et de garantir la fermeture des opérations de similarité.
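Pour fixer les idées, voici une esquisse minimale et hypothétique (les noms de classes et de champs sont choisis pour l'illustration ; la thèse ne prescrit pas cette implémentation) du modèle M(id, O, F, A, P) en Java, langage du prototype EMIMS :

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Un couple (id2, ||o1 - o2||) stocké dans le composant P.
record Match(long id, double simScore) {}

// Un élément de P : la table associée et les instances jugées similaires.
record PEntry(String table, List<Match> matches) {}

// Une instance de la table d'images M(id, O, F, A, P).
class ImageInstance {
    long id;                                  // identificateur unique
    byte[] o;                                 // l'objet image (BLOB, ou référence vers un BFILE)
    double[] f;                               // vecteur de caractéristiques physiques (format compatible MPEG-7)
    Map<String, String> a = new HashMap<>();  // annotations textuelles et clés externes
    List<PEntry> p = new ArrayList<>();       // vide (nul) pour les tables de base
}
```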

3.2 Objets d'Intérêt

De nombreuses applications nécessitent de considérer les objets d'intérêt contenus dans une image pour l'interrogation par le contenu. Ceci nous conduit à introduire le concept de table d'objets d'intérêt, notée S(ids, Fs, As), où :

ids est l'identificateur unique de l’objet d’intérêt.

Fs est le vecteur de caractéristiques physiques qui représente l’objet d’intérêt. Fs est utilisé dans les opérations basées sur la similarité qui concernent l’objet d’intérêt.

As est un ensemble de métadonnées qui décrivent les caractéristiques sémantiques de l’objet d’intérêt. As est utilisé pour les opérations relationnelles qui concernent l’objet d’intérêt.

La Figure 1 illustre l’utilisation du modèle dans le cas d’une image M (scanner du cerveau) comportant un objet d’intérêt S (tumeur).

Ce modèle permet de stocker les informations correspondant aux objets d’intérêt d’une image dans une structure distincte tout en conservant un lien avec l’image. Ainsi, il est possible d’effectuer des opérations relationnelles et basées sur le contenu sur l’image ainsi que sur les objets d’intérêt qu’elle contient.
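Dans le même esprit, une esquisse hypothétique de la table d'objets d'intérêt S(ids, Fs, As), où le champ de lien vers l'image source est un nom choisi pour l'illustration :

```java
import java.util.Map;

// Un objet d'intérêt S(ids, Fs, As), lié à son image source.
record SalientObject(
    long ids,               // identificateur unique de l'objet d'intérêt
    long sourceImageId,     // lien vers l'instance de M contenant l'objet (nom hypothétique)
    double[] fs,            // vecteur de caractéristiques physiques (opérations de similarité)
    Map<String, String> as  // méta-données sémantiques (opérations relationnelles)
) {}
```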


Figure 1 : Modèle de représentation associant image et objets d'intérêt. (Schéma : une instance M(id, O, F, A, P), portant les données sémantiques, contextuelles et spatiales, liée à ses objets d'intérêt S(ids, Fs, As) et au vecteur de caractéristiques Fv = (V1, V2, …, Vn).)

4. Opérateurs basés sur la similarité

Les exemples de requêtes présentés dans la Section 1 démontrent clairement l’intérêt d’opérateurs basés sur la similarité et d’une méthodologie permettant de les combiner avec les opérateurs relationnels classiques. Dans cette section, plusieurs opérateurs basés sur la similarité sont introduits. En premier lieu, chacun de ces nouveaux opérateurs est défini formellement. Leurs propriétés sont ensuite étudiées, en particulier dans l’optique de les utiliser en conjonction avec les opérateurs relationnels classiques.

Ces nouveaux opérateurs sont définis en utilisant la notion de voisinage fondée sur une mesure de la « distance » ou similarité entre images (« range query ») [81]. Chaque « objet-image » est représenté dans l'espace des caractéristiques image par son vecteur F. Ainsi, une requête de voisinage basée sur le contenu sur une table d'images S renvoie les objets images situés dans une boule ouverte de rayon ε centrée sur l'objet de la requête q dans l'espace des caractéristiques. On verra plus loin que, pour les bases de données d'images, les propriétés des requêtes de voisinage sont intéressantes par rapport aux techniques de recherche des k plus proches voisins (k-Nearest Neighbor ou k-NN [4]), en particulier dans une optique d'optimisation de requêtes ; en outre, la méthode k-NN renvoie toujours les k images les plus similaires, quelle que soit leur similarité, ce qui ne correspond pas à l'attente des utilisateurs.

4.1 L’Opérateur de sélection basée sur la similarité

L’opérateur de sélection basée sur la similarité est un opérateur unaire qui s’applique à une table d’images. Il est défini comme suit :

Définition 4.1 (Sélection Basée sur la Similarité)

Soient un objet image q, une table d'images M et un nombre réel positif ε ; l'opérateur de sélection basée sur la similarité, noté δεq(M), sélectionne toutes les instances de M dont les objets images sont similaires à q (i.e. à une distance < ε dans l'espace des caractéristiques).

Formellement :

δεq(M) = {(id, o, f, a, p) ∈ M | o ∈ Rε(M, q)}, où Rε(M, q) désigne l'ensemble des éléments de M situés à une distance de q inférieure à ε dans l'espace des caractéristiques.
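Une esquisse en Java de cette sélection (hypothèses d'illustration : instances réduites à (id, f) et distance euclidienne sur l'espace des caractéristiques ; le choix de la distance n'est pas imposé par la définition) :

```java
import java.util.List;
import java.util.stream.Collectors;

class SimilaritySelection {
    record Row(long id, double[] f) {}  // instance simplifiée d'une table d'images

    // Distance euclidienne entre deux vecteurs de caractéristiques.
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - y[i]; s += d * d; }
        return Math.sqrt(s);
    }

    // δεq(M) : instances de m à distance strictement inférieure à epsilon de q.
    static List<Row> select(List<Row> m, double[] q, double epsilon) {
        return m.stream()
                .filter(r -> dist(r.f(), q) < epsilon)
                .collect(Collectors.toList());
    }
}
```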

4.2 L’Opérateur de jointure basée sur la similarité

Un opérateur de jointure basée sur la similarité est un opérateur binaire applicable à des tables d’images. La définition formelle de cet opérateur est la suivante :

Définition 4.2 (Opérateur de Jointure Basée sur la Similarité)

Soient M1 et M2 deux tables d'images, et ε un nombre réel positif. La jointure basée sur la similarité de M1 et M2 associe chaque objet O de M1 à un ensemble d'objets de M2 considérés comme similaires à O en fonction des attributs F des tables M1 et M2. La table résultant de cette opération contient les instances de M1 avec un attribut P modifié contenant un pointeur vers l'ensemble des instances similaires de M2. La définition formelle de cet opérateur est la suivante :

M1 ⊗ε M2 = {(id1, o1, f1, a1, p'1) | (id1, o1, f1, a1, p1) ∈ M1 et p'1 = p1 ∪ (M2, {(id2, ║o1 − o2║)}) et p'1 ≠ Null}, où :

- (id2, o2, f2, a2, p2) ∈ δεo1(M2) (i.e., les instances de M2 retenues par l'opérateur de sélection basée sur la similarité δεo1(M2)), et

- ║o1 − o2║ est la distance entre o1 et o2 dans l'espace des caractéristiques (aussi notée sim_Score(o1, o2)).

La Figure 2 montre la représentation graphique d’une jointure basée sur la similarité.

Figure 2 : La jointure basée sur la similarité M1 ⊗ε M2. (Schéma : la table M1(id, O, F, A, P), dont le composant P de chaque instance pointe vers (M2, {(id2, sim_Score)}).)

Il convient de noter que si p'1 est nul, l'instance correspondante de M1 ne sera pas incluse dans la table résultat (on ne conserve que les images ayant un « correspondant » dans la table M2). Les opérateurs basés sur la similarité qui viennent d'être définis dépendent de mesures relatives. Pour cette raison, leurs propriétés algébriques diffèrent de celles des opérateurs relationnels. Par exemple, contrairement à l'opérateur de jointure relationnel, l'opérateur de jointure basée sur la similarité n'est pas commutatif. L'ordre des opérandes est important pour une opération de jointure basée sur la similarité. En effet, la table d'images présente à gauche d'une jointure basée sur la similarité est prise comme référence pour l'opération. Ainsi, dans l'exemple de la requête 2 présentée dans la Section 1, la table SI doit être la table de référence et donc apparaître à gauche de la jointure basée sur la similarité.
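Esquisse de cette jointure en Java (mêmes hypothèses simplificatrices que plus haut : P restreint à la seule table de droite, distance euclidienne) :

```java
import java.util.ArrayList;
import java.util.List;

class SimilarityJoin {
    record Row(long id, double[] f) {}
    record Match(long id2, double simScore) {}
    record JoinedRow(Row left, List<Match> p) {}  // instance de M1 enrichie de son composant P

    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - y[i]; s += d * d; }
        return Math.sqrt(s);
    }

    // M1 ⊗ε M2 : M1 (opérande gauche) sert de référence ; l'opérateur n'est pas commutatif.
    static List<JoinedRow> join(List<Row> m1, List<Row> m2, double eps) {
        List<JoinedRow> result = new ArrayList<>();
        for (Row r1 : m1) {
            List<Match> p = new ArrayList<>();
            for (Row r2 : m2) {
                double d = dist(r1.f(), r2.f());
                if (d < eps) p.add(new Match(r2.id(), d));
            }
            if (!p.isEmpty()) result.add(new JoinedRow(r1, p));  // p'1 nul => instance écartée
        }
        return result;
    }
}
```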

Conformément à la définition 4.2 et en considérant les propriétés du calcul de similarité pour les images, il est possible d’introduire un opérateur de jointure multiple basée sur la similarité. La définition de cet opérateur nécessite la définition préalable d’un opérateur de base, l’union additive.


Définition 4.3 (Union Additive)

Soient M1 et M2 deux tables d'images. L'union additive de M1 et M2, notée M1 ∪+ M2, est une table d'images qui contient toutes les instances présentes soit dans M1, soit dans M2, sans en exclure aucune.

Définition 4.4 (Jointure multiple basée sur la similarité)

Soient M1, M2, …, Mn n tables d'images. La jointure multiple basée sur la similarité, notée M1 ⊗ε M2 ⊗ε … ⊗ε Mn, est définie par :

M1 ⊗ε M2 ⊗ε … ⊗ε Mn = ∪+ (2 ≤ i ≤ n) (M1 ⊗ε Mi)

où ∪+ (2 ≤ i ≤ n) désigne l'union additive des jointures M1 ⊗ε Mi pour i allant de 2 à n.

La Figure 3 montre comment une jointure multiple basée sur la similarité est calculée.

L'opérateur d'union additive (∪+) sert simplement à concaténer les tables résultat de chaque jointure simple. En raison de la non-transitivité des opérations basées sur la similarité pour les tables d'images, nous n'utilisons pas de tables intermédiaires comme opérandes. Il est important de noter qu'un opérateur qui ne respecte pas les propriétés de commutativité et d'associativité peut difficilement être utilisé dans une optique d'optimisation de requêtes. Dans la section suivante, nous allons donc montrer comment étendre l'opérateur de jointure basée sur la similarité afin de lui conférer ces propriétés.
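Par exemple, en appliquant directement la Définition 4.4 avec n = 3 :

M1 ⊗ε M2 ⊗ε M3 = (M1 ⊗ε M2) ∪+ (M1 ⊗ε M3)

M1 reste la table de référence des deux jointures dont les résultats sont concaténés.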


Figure 3 : Jointure basée sur la similarité M1 ⊗ε M2 ⊗ε M3. (Schéma : la table M1, dont le composant P de chaque instance pointe à la fois vers (M2, {(id2, sim_Score)}) et vers (M3, {(id3, sim_Score)}).)

4.3 L’Opérateur de jointure symétrique basée sur la similarité

La propriété de symétrie est très importante pour permettre l’optimisation de requêtes complexes. L’opérateur de jointure symétrique basée sur la similarité utilise les opérateurs d’union additive et de jointure simple basée sur la similarité. Il est défini comme suit :

Définition 4.5. (Opérateur de jointure symétrique basée sur la similarité)

Soient M1 et M2 deux tables d’images, la jointure symétrique basée sur la similarité de M1 et M2, notée M1 ⊕ε M2, est définie formellement comme :

M1 ⊕ε M2 = (M1 ⊗ε M2) ∪+ (M2 ⊗ε M1)

De manière triviale,

Propriété-1 : L’opérateur de jointure symétrique basée sur la similarité est commutatif.

i.e. M1 ⊕ε M2 = M2 ⊕ε M1

Cette propriété découle directement de la commutativité de l'union additive M1 ∪+ M2.

La figure 4 montre comment la jointure symétrique basée sur la similarité de deux tables est calculée.


Figure 4 : Jointure symétrique basée sur la similarité M1 ⊕ε M2. (Schéma : les instances de M1 et de M2 réunies dans la table résultat, le composant P des premières pointant vers (M2, {(id2, sim_Score)}) et celui des secondes vers (M1, {(id1, sim_Score)}).)

La jointure symétrique basée sur la similarité peut être généralisée à n tables d’images afin de définir la jointure multiple symétrique basée sur la similarité. La définition proposée reflète les caractéristiques des opérations basées sur la similarité tout en maintenant des propriétés utiles pour l’optimisation de requêtes (Chapitre 6).

4.4 L’Opérateur Mine

D'après les définitions précédentes, une jointure symétrique basée sur la similarité équivaut à l'union additive de deux jointures non-symétriques basées sur la similarité. En pratique, l'une des deux jointures peut être facilement calculée à partir du résultat de l'autre, en évitant d'effectuer de nouvelles opérations basées sur la similarité. La méthode consiste à utiliser le contenu du composant P de M1 ⊗ε M2 pour obtenir les instances de M2 ⊗ε M1. Une telle opération est rendue possible par la propriété de symétrie de l'opérateur de voisinage (« range query ») [25], qui est l'une des raisons de privilégier cette opération plutôt que la méthode k-NN.
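Autrement dit, en anticipant sur l'opérateur Mine (Définition 4.7, pour lequel Mine(M1 ⊗ε M2) = M2 ⊗ε M1), on a :

M1 ⊕ε M2 = (M1 ⊗ε M2) ∪+ Mine(M1 ⊗ε M2)

ce qui évite de recalculer les distances pour la seconde jointure.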

L'opérateur permettant d'extraire les instances de M2 ⊗ε M1 à partir de celles de M1 ⊗ε M2 est appelé Mine. Le principal avantage de l'opérateur Mine est que son coût en temps d'exécution est négligeable par rapport à celui de la jointure basée sur la similarité. Il peut donc être utilisé pour l'optimisation de requêtes comportant des opérations de jointures symétriques ou non-symétriques basées sur la similarité. Définissons dans un premier temps l'opérateur Mine dans le cas d'une jointure simple basée sur la similarité. Nous verrons ensuite comment le généraliser à toute expression complexe de jointure basée sur la similarité.

Définition 4.7 (L'opérateur Mine)

Soit la jointure basée sur la similarité M1 ⊗ε M2. L'opérateur Mine appliqué à M1 ⊗ε M2, noté Mine(M1 ⊗ε M2), utilise l'attribut P de la table résultat de M1 ⊗ε M2 pour construire la table M2 ⊗ε M1. Réciproquement, Mine(M2 ⊗ε M1) utilise l'attribut P de la table résultat de M2 ⊗ε M1 pour construire la table M1 ⊗ε M2 :

Mine(M1 ⊗ε M2) = M2 ⊗ε M1

Mine(M2 ⊗ε M1) = M1 ⊗ε M2

Soient M1(id1, O1, F1, A1, P1) et M2(id2, O2, F2, A2, P2) deux tables d'images. Supposons que nous voulions calculer M1 ⊕ε M2. Dans un premier temps, l'opération M1 ⊗ε M2 est effectuée. La Figure 5 montre la projection des instances de M1 et M2 sur un plan¹. On y voit que, pour un objet o1^i de M1 et une valeur donnée de ε, la requête de voisinage basée sur le contenu recherche tous les objets o2 de M2 dont la distance à o1^i est inférieure à ε. Les instances sélectionnées sont donc o2^t, o2^u, o2^v et o2^w. En raison de la propriété de symétrie de la distance, o1^i de M1 sera réciproquement considéré comme similaire, pour le même seuil ε, aux objets o2^t, o2^u, o2^v et o2^w de M2. Cette information étant contenue dans l'attribut P de l'instance de M1 ⊗ε M2 qui contient o1^i, elle peut en être extraite. La table M2 ⊗ε M1 peut donc être dérivée en répétant ce processus sur les attributs P de toutes les instances de M1 ⊗ε M2.

¹ Pour des raisons de simplicité, et sans perte de généralité, nous considérons ici un espace de caractéristiques Fv de dimension deux.


Figure 5 : Illustration de l'opérateur Mine. (Schéma : projection des objets de M1 et de M2 dans le plan des caractéristiques, avec la boule de rayon ε centrée sur o1^i englobant o2^t, o2^u, o2^v et o2^w.)

L'Algorithme 1 donne une version simplifiée de l'algorithme Mine.

Algorithm 1: Algorithm of the Mine operator

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables, and let M'(id1', O1', F1', A1', P1') = M1 ⊗ε M2.

    Create table T = T(id, O, F, A, P)
    Foreach instance inst of M' Do
        Foreach element id2 of inst.P1' Do
            If id2 is not in T Then
                Add get_instance(M2, id2) to T
            End If
            Update P of T.id2 with (M1, (M'.id1, sim_score))
        End Do
    End Do
    Return(T)    /* T = M2 ⊗ε M1 */
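To make Algorithm 1 concrete, the following is a minimal, self-contained sketch in Python. The dict-based table representation and all helper names are illustrative assumptions of this summary, not the EMIMS implementation; instances follow the (id, O, F, A, P) structure of Section 3, with P holding (id2, sim_score) pairs.

    def mine(m_prime, m2):
        """Given M' = M1 (x)e M2, derive T = M2 (x)e M1 using only the
        P attribute of M', i.e., without any new similarity computation."""
        m2_by_id = {inst["id"]: inst for inst in m2}
        t = {}  # result table T, keyed by image identifier
        for inst in m_prime:
            for id2, sim_score in inst["P"]:
                if id2 not in t:
                    # add the matching M2 instance with an empty P component
                    src = m2_by_id[id2]
                    t[id2] = {"id": src["id"], "O": src["O"], "F": src["F"],
                              "A": src["A"], "P": []}
                # record the reciprocal association towards M1
                t[id2]["P"].append(("M1", inst["id"], sim_score))
        return list(t.values())  # T = M2 (x)e M1

By the symmetry of the range query, the similarity scores can be copied unchanged, which is why no feature comparison is needed.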

In practice, the query optimizer can, whenever this proves relevant, replace the expression M1 ⊗ε M2 in the algebraic tree representing the query being processed by Mine(M2 ⊗ε M1). Our experiments (Chapter 7) demonstrated the considerable practical interest of this operator, the savings in execution time being potentially very large.


The Mine operator, defined above for a simple join (i.e., one involving only two tables), can be generalized to complex similarity-based join expressions. The principle is that for each non-symmetric join appearing in the complex expression, the symmetric counterpart can be computed using Mine.

4.5 Other Operations

The body of this thesis (Chapter 5) describes the much larger set of similarity-based algebraic operators (union, intersection, multi-join, etc.). In addition, we study the processing of hybrid queries, i.e., queries involving both similarity-based and relational operators.

5. The EMIMS Prototype

The EMIMS prototype (Extended Medical Image Management System) aims to demonstrate the feasibility and relevance of the data model and of the similarity-based operators defined above. The prototype relies on a client-server architecture. It provides graphical interfaces for storing and retrieving images through queries that combine both high-level (semantic) and low-level (physical feature) descriptors.

5.1 Structure of EMIMS

EMIMS was developed in Java to ensure its portability across platforms. EMIMS sits on top of an Oracle 9i server. Using the JDBC implementation provided by Oracle, the program interacts with OrdImage components1 as well as with the standard query execution routines of a relational database. Figure 7 presents the class diagram of EMIMS in UML notation.

1 OrdImage is a data type provided by Oracle for storing an image file as a BLOB in a database. Likewise, the OrdImageSignature type stores the physical feature vector corresponding to an OrdImage object.


Figure 7: UML class diagram of EMIMS. (The diagram shows the Query Interface and the Medical Data Entry Interface; a Query Manager holding the image-table parameters (String Id, String O, String F, String[] Metadata, private Connection connection) with the methods Insert(image, table), SimJoin(left_table, right_tables[], threshold, feature vectors), QBE(image, table), and Mine(left_table, right_table); and a Connection class (String url, user, password, database driver) with GetConnection() and Close(), linked through the Oracle JDBC driver to the Oracle 9i database.)

When the program starts, an instance of the QueryManager class is created. It connects to the database by creating an instance of the Connection class. During the session, every user action that requires database access is translated by the QueryManager object into SQL queries executable by the DBMS. Here is a brief description of each class of the application:

• Connection: a class that creates a connection to the Oracle DBMS and maintains it until the user quits the program.

• Medical Data Entry Interface: a Swing applet for entering the various types of medical data into the database. This interface comprises three graphical panels:

− Context/Domain-Oriented Data Entry Panel: for entering data about patients, physicians, and hospitals.

− Image-Oriented Data Entry Panel (Figure 8): for inserting medical images into the database.


− Medical Exam Data Entry Panel: for inserting data related to a medical examination, such as the analysis of an image by a radiologist, the physician's diagnosis, etc.

Figure 8: The image-oriented data entry panel.

• Query Manager: a class that performs various operations on an image table. An object of this type can act on any image table that follows the (id, O, F, A, P) structure described in Section 3. Concretely, it receives information from the graphical interface (actions selected by the user) and translates it into SQL queries executable by the underlying DBMS engine.

The main methods of the QueryManager class are:

• SimJoin: performs a similarity-based join between a reference image table and one or more other image tables. This operation creates a new table with an identical structure, containing all instances of the reference table whose image attribute was identified as similar to at least one image contained in one of the other tables. The P field contains pointers to the similar images of the other tables (see the sketch after this list).


• Query by Example (QBE): searches a table for the images similar to a reference image. A new table is created, containing all instances of the source table whose image attribute was identified as similar to the reference image. The interface also offers further operations:

− Insert: creates a new tuple (id, O, F, A, P) from an image file and its descriptive metadata. The file is converted to the Oracle OrdImage type, and a feature extraction algorithm is applied to it to determine the F attribute of the new tuple.

− Mine: from the result of a join M1 ⊗ε M2, Mine extracts M2 ⊗ε M1 using the content of the P attribute of the tuples of M1 ⊗ε M2.
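As an illustration of the semantics of these methods, here is a minimal Python sketch over in-memory tables of (id, O, F, A, P) dicts. The Euclidean distance on the feature vectors F is an illustrative stand-in for the feature comparison, which in EMIMS is presumably handled through Oracle's OrdImageSignature type; all names are assumptions of this sketch.

    import math

    def distance(f1, f2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

    def sim_join(ref_table, other_tables, threshold):
        """Keep the instances of ref_table having at least one similar image
        in one of other_tables (a dict: table name -> table); P collects
        (table name, id, score) pointers to the similar images."""
        result = []
        for inst in ref_table:
            p = []
            for name, table in other_tables.items():
                for other in table:
                    d = distance(inst["F"], other["F"])
                    if d <= threshold:
                        p.append((name, other["id"], d))
            if p:
                result.append({**inst, "P": p})
        return result

    def qbe(query_f, table, threshold):
        """Query by example: a similarity-based selection on one table."""
        return [inst for inst in table if distance(query_f, inst["F"]) <= threshold]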

All similarity-based operations accept two parameters:

• Threshold: the maximum distance between two similar images. It is a value normalized between 0 and 100, computed by an algorithm comparing the physical features of two images contained in the F attribute of the image tables, or defined by the user.

• Feature coefficients: the weight given to each physical feature (color, texture, shape, location) in the similarity-based comparison. Each coefficient is a real number between 0 and 1 (see the sketch below).
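The following sketch shows one plausible way the two parameters could combine, assuming one sub-score per physical feature, each normalized to [0, 100]; the weighted average is an illustrative choice, not EMIMS's exact formula.

    def weighted_score(sub_scores, coefficients):
        """sub_scores and coefficients are dicts keyed by feature name
        (color, texture, shape, location); coefficients lie in [0, 1]."""
        total = sum(coefficients.values()) or 1.0  # guard against all-zero weights
        return sum(coefficients[f] * sub_scores[f] for f in sub_scores) / total

Two images are then considered similar when weighted_score(...) is at most the chosen threshold.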

5.2 The Query Interface

The query interface is composed of three graphical panels:

• Iconic query panel: for formulating relational queries using icons. This panel uses icons to define the type of objects searched for, their position (e.g., in the right part of the left lobe), and their temporal evolution, and to graphically build complex queries on the metadata (Chapter 7, Fig. 7.6).

• Query-by-example panel: for formulating a query-by-example operation, i.e., a similarity-based selection (Figure 9). This function is common in current prototypes; the query-by-example function of EMIMS, however, offers several interesting features absent from other systems:

− It creates a new table containing all the instances selected by the operation, thereby satisfying a closure property (as relational operations do).

− It displays the content of the result table. The P column is rendered as a combo box which, for each similar image, contains the name of its source table and its similarity score with respect to the reference image.

− It also displays all the images contained in the result table with their associated metadata, along with two buttons for retrieving more related information through a relational selection on other tables of the database (the patient and diagnosis tables).

− It displays the generated SQL query, which can be modified and executed again.

− The result table can be saved in the database for future use.


Figure 9: The query-by-example interface of EMIMS.

• Multi-criteria query panel: for performing operations such as the simple similarity-based join, Mine, or the multiple similarity-based join (Figure 10).

To perform a similarity-based operation, the user first chooses the reference table, sets the similarity threshold, and then selects one or more tables on which the join is to be performed. Once the computation is finished, a result table is created. This table contains the instances of the reference table for which at least one similar image was found in the other table(s) of the join. The “P” column of this table is rendered as a combo box. A click on this graphical object shows a list of vectors, each vector corresponding to a similar image. The components of these vectors are:

− The name of the table,

− The identifier of the image,


− The similarity score between the image to which the vector corresponds and the image of the reference table to which the vector is attached.

The “O” column of the result table is rendered as a button. A click on this button opens a window displaying the selected image together with all the images similar to it. As in the query-by-example panel, each image comes with two buttons for retrieving related information in the patient and diagnosis tables.

Once a similarity-based join, say M1 ⊗ε M2, has been performed, the Mine operator can be used to obtain the inverse join. The table resulting from this operation is displayed below the one corresponding to the initial join.

All result tables can be saved in the database (without maintenance of the validity of the P components as the joined tables evolve). Finally, for each similarity-based join, the corresponding SQL query is displayed. The user can modify this query and run it again.

Figure 10: The similarity-based join panel.


6. Conclusion

The need for efficient image data management systems is becoming ever more important in many application domains. In this thesis, we present an image data representation model that makes it possible to formulate complex multi-criteria queries. The P component of this model keeps track of the associations between instances of two or more image tables identified by binary similarity-based operations. Based on this model, a prototype medical image management system, EMIMS, has been developed. EMIMS offers a high-level visual interface for formulating queries that combine classical relational operations with similarity-based operations. A series of experiments allowed us to evaluate the efficiency and applicability of the different concepts introduced.

Perspectives of this work include the development of heuristics for query optimization, the extension of this approach to other media types such as video, the addition of an interface allowing different feature comparison/extraction functions to be used, the integration of similarity-based operations applicable to the objects of interest contained in a given image, and the adaptation to a biomedical grid context.

CHAPTER 1

INTRODUCTION

… "So God made people in his own image. " …

GENESIS 1:27 Old Testament

First there was an image, the image of God. Then, God created people in his own image; that is, God created people in a way that they “look like” his own image. The concept of an image and its association with another image for “likeness”, “identicality”, or “similarity” goes back to the earliest existence of mankind. Since then, people have talked of similarity between two “images” in their day-to-day lives. But measuring this identicalness or similarity, and the process involved in it, is a complex task. Until recent years, this was a process performed only by the human mind. But it is difficult to expect very high precision from decisions of the human mind. For instance, in our daily communication, we can say that a child resembles one of its parents, but we can never say that a child is exactly the same as one of its parents. So, we talk of “similarity”. The natural law of human and animal reproduction preserves many characteristics of parents in their children. Thus, we can say that the law of life also preserves “similarity”, not exact duplication or reproduction. It is important to note that, in such a comparison, we are only referring to physical appearance, without dealing with personality and psychological aspects.


History tells us that even our early ancestors, who lived in caves, described their appearance and their way of life on the walls of their caves by means of paintings and drawings. These paintings (images) served as a means of ritual invocation and later as artistic creation. More recently in history, biologists have worked to classify animals and plants by kingdom, class, species, etc., considering their similarity or identicalness in biological features such as appearance, size, shape, and habitat. It is not yet clear whether the recent discoveries in the identification of human, animal, and plant genetic makeup and DNA molecule codes will enable us to describe and compare the “bio-similarity”1 of humans, animals, and plants in addition to their physical appearance. But this is another dimension for the comparison of similarity.

It was long after the use of images and paintings that text and numerals were found to be simpler means of conveying information and of serving mankind for recording and describing concepts and observations. But practice has shown that text is never a complete means of describing every aspect of our day-to-day activity.

Photography has been regarded as one of the important discoveries since the beginning of the 19th century [61, 75, 77]. Photographs or images of objects have since been among the most effective means to convey much information in a compact manner, and often in a way that cannot be described by mere text. The expression that “an image can speak more than a thousand words” depicts the importance of images in conveying much information at a time in a compact format. Hence, images have long played an important role in recordings and communications. However, it was not before the twentieth century that the growth in number, availability, and importance of images in all walks of life was witnessed. Images nowadays play an important role in different fields such as medicine, security, education, journalism, and design. The invention of the computer and of digital imaging devices has been the real engine of the imaging revolution [75]. The computer brought with it a range of techniques for digital image capturing, processing, storage, and transmission. The involvement of computers in imaging can be dated back to 1965, with the effort towards computerized creation, storage, and manipulation of images, though the high cost of hardware limited their use until the mid-1980s [75]. After computerized imaging became affordable in the 1980s, it rapidly found many useful applications in areas such as engineering, architecture, and medicine. Art galleries, photograph libraries, and museums, too, began to see the advantages of making

1 The term “bio-similarity” is used here to mean similarity not only by physical appearance and size, but also by biological characteristics.


their collections available in electronic form. The emergence of the World Wide Web in the early 1990s, enabling users to access data in a variety of media from anywhere on the planet, has provided a further massive stimulus to the exploitation of digital images. The billions of images available on the Web in every subject, size, and color (that are either in static or moving form) are huge resources that need to be addressed with proper techniques of image data management. Images are of prime importance among the various media types, not only because images are the most widely used media types besides text, but also because images are the basic components for video data [78]. Table 1.1 shows a recent study that estimated the amount of image and video production annually [159]. Only sample figures and storage space requirements are shown in this table.

Table 1.1: Original Data On Film Annually Worldwide.

                      Units             Digital Conversion     Total Petabytes
    Photography       82,000,000,000    5 MB per photo         410
    Motion Pictures   4,000             4 GB per movie         0.016
    X-Rays            2,160,000,000     8 MB per radiograph    17.2
    Total                                                      427.216

The production of films and the conversion rate of the figures in the above table are explained by P. Lyman et al. [159] as presented below:

“There are over 2700 photographs taken every second around the world, adding up to well over 80 billion new images a year taken on over 3 billion rolls of film, according to estimates published by the United States Department of Commerce. …

Kodak reports that the typical photograph can be digitized in its format in 5 megabytes without loss of picture quality. Utilizing this conversion factor, then, the world's 82 billion photos store 410 petabytes of data every year in photographs. …

…The Motion Picture Association of America reports that for the year 1998, its members released 221 movies …

It takes approximately 2 gigabytes to store an hour of motion picture images in digital form using the MPEG-2 compression standard. If the images in 4500 full-length movies were converted into bits, the world's annual original cinematic production would, therefore, consume about 16 terabytes. …

… for medical, dental and industrial purposes, approximately 2 billion radiographs are taken around the world each year, including chest x-rays, mammograms, CT scans, and so on. …


When x-ray films are converted to digital format, it is important that there is no important clinical information lost. The University of Pittsburgh Clinical Multimedia Laboratory suggests that an average conversion of a chest x-ray to digital storage with lossless compression will require 8 megabytes. To store all the world's x-rays to a computer file of this size would, therefore, require 17 petabytes each year.“ [159, p.17-18]

Considering the size of the image collections already accumulated over several decades, the annual rate of production shown in the above table, and the expected yearly increase in this rate as technology advances, it is clear that there is a critical and growing demand for efficient image and image-related data management systems. This indicates a very high need for adequate research on image modeling, indexing, storage, processing, retrieval, and presentation. Unless these issues are addressed based on the requirements of each application domain, image-related data, which are a useful, compact, and essential means of description and communication, will be wasted.

Initiated by this crucial need, image retrieval and management has been a very active research area since the 1980s [1, 2, 12, 21, 26, 38, 75]. In this regard, there are three areas of active research interest: the identification of an image (or an image object), the need for its description as a convenient means of representation, and the use of this description as a basis for image similarity comparison and image retrieval. Thus, a research activity to develop a mathematical formalism of these concepts for better information management is of high value.

The two major research fields related to image retrieval are database management and computer vision, and they have approached image retrieval from different angles. Until the beginning of the last decade, the former used a text-based approach, while the latter has always considered image retrieval using visual features. In database management, the traditional approach is to first annotate the content of the image using keywords (metadata) and then to use these keywords to identify and retrieve the required images. However, the increasingly large collections of images and the high rate of production of digital images for different purposes have made the manual annotation of images with metadata acutely difficult. The major drawbacks associated with manual annotation are that:

• it is time consuming and tedious for large collections of images,

• annotation is subjective in that people may view the same image differently,


• it is difficult or even impossible to describe images fully or adequately with the help of metadata, which hinders effective image retrieval.

To overcome these difficulties, an alternative method called Content-Based Image Retrieval (CBIR), involving the techniques of image processing1, has been proposed and practiced since the early 1990s [12, 21, 22, 57, 75] in the field of computer vision. One of the key issues with this kind of retrieval is the need to extract useful information from the raw image data before any kind of reasoning about the image’s contents is possible [79]. In CBIR, features such as color, texture, shape, and layout are used to compare images for their similarity. CBIR differs from classical information retrieval in that digitized images consist purely of arrays of pixel values, with no inherent semantic meaning. Research issues in CBIR cover a wide range of topics such as image analysis, image feature description, and the development of efficient comparison algorithms for image similarity. Many of these topics share techniques with mainstream image processing, and others with traditional data management and retrieval. In both approaches, a large body of existing work shows the usefulness of CBIR [12, 21]. However, there are still many issues that need to be addressed. Among the key challenges that remain to be investigated to make these technologies pervasive and useful are the:

• identification of suitable models for effective ways of describing image content,

• introduction of compact and suitable storage methods for large image collections,

• matching of the query and stored images in a way that reflects human similarity judgments,

• development of effective feature extraction for mathematical representations of images, and their salient features that are useful for content-based image retrieval,

• integration of content-based retrieval with metadata-based retrieval and pragmatic annotation methods (i.e., extending existing query methods for a multi-criteria query on image data),

• extensibility of the existing image retrieval system architectures to support the inclusion of new features and new matching/similarity measures. Some applications, for example, require new features such as a face-matching module in addition to their existing content-based retrieval capabilities,

• design of methods or common standards that permit interoperability of systems under different circumstances,

1 Image processing covers a wide field including image compression, transmission, analysis, object recognition, etc.


• design of user interfaces that let users easily select content-based properties, allow users to compose queries involving both content features and text data, and let users reformulate queries in a desired manner, and

• development and integration of matching methods for effective content indexing, clustering, and filtering to improve system performance as image collections grow.

In these respects, many research works have been conducted and many other initiatives exist to address the challenges in these different subjects [21, 43, 18, 76, 104]. Consequently, “image database” has been a central topic of many research works in both the database and the computer vision communities [85]. However, in reality, most database community systems use text-based keywords to represent images and do not deal with the images themselves. On the other hand, most computer vision systems use a directory containing thousands of images and deal with content-based image analysis techniques for retrieval. But they do not work on database systems, since most fundamental database issues, such as the data model, query algebra, indexing, and query optimization, are not addressed at all. To our knowledge, even though there are continuing research efforts to build true image databases [15, 16, 21, 58, 85], these systems still lack many useful features. A successful image database system requires an interdisciplinary research effort. Although the focus of traditional database research was text-based document retrieval, many of its retrieval models and techniques can be useful to content-based image retrieval in a DBMS environment. Thus, research in each of the above two areas is expanding in scope, and in many cases the two are seen merging and overlapping for the purpose of efficient image and video data management.

The subject of this thesis thus lies in modeling the techniques of CBIR in such a way that image data management can be integrated into the existing popular database management systems for the purpose of efficiency; a step to bring the two disciplines closer. In particular, we will deal with: the modeling of the image data repository, the definition and implementation of useful database-oriented similarity-based operators, a similarity-based algebra for image databases, and the study of the algebraic properties of similarity-based operators for query optimization. These issues are not well addressed by other research efforts, as will be discussed in the following chapters.

Application areas that require the use of CBIR are numerous and diverse. Among the many application areas of CBIR are [48, 49]: scientific database management such as medical image management, security, criminal investigation, art gallery and museum management, remote sensing and management of earth resources, geographic information systems, weather forecasting, and trademark and copyright database management. Recently, CBIR has attracted the attention of researchers across several disciplines. Medical applications are among the most widely researched areas. Modern medicine increasingly relies on computerized systems for diagnostic techniques such as computerized tomography, radiology, and histopathology. As a result, there is a high need for digital medical image management in hospitals. Whereas the primary need in medical application systems is to be able to display images related to a patient's case, there is an increasing interest in the use of CBIR techniques to aid diagnosis by identifying similar past cases. Most medical image information system development work concentrates on:

• providing basic functionality for ensuring that medical images can be successfully digitized, stored and transmitted over local area networks without loss of quality, and

• providing user-centered interfaces and integrating image storage and retrieval with wider aspects of patient record management.

This thesis is part of the Région Rhône-Alpes “Santé et MPC”1 and SiCOM2 projects, which aim to provide high-level generic tools for health network and telemedicine applications. Indeed, modern medicine makes intensive use of images (MRI, CT scans, PET, etc.). However, the Picture Archiving and Communication Systems (PACS) that store medical images do not provide clinicians with high-level content-based retrieval facilities. In practice, current image retrieval systems use only keywords (such as the name of the patient or of the doctor, the date of the scan, etc.). Thus, within these two projects, we were in charge of defining a global framework and implementing a demonstration of content-based image retrieval for medical images.

Furthermore, this work is also part of the MEDIGRID project. MEDIGRID is a GRID-related project financed by the French ministry of research (program “Action Concertée Incitative Globalisation des Ressources Informatiques”). GRIDs have raised large interest in the computer science community in the past decade, and it is believed that the GRID can provide the necessary infrastructure for these computation requirements. We feel this technology is now ready to turn towards specific applications by taking their needs into account. The MEDIGRID project is designed to explore the use of GRID technologies for tackling the processing of the huge medical image databases available in hospitals today. Hence, it will address the problems of medical applications for which GRIDs can bring valuable support: complex image processing algorithms that require parallel implementations to become tractable in terms of computing resources and memory requirements, and design tools for handling and processing complete databases of medical images (e.g., medical image content-based queries). Within MEDIGRID, a distributed version of our prototype is planned to be tested in the years to come in order to allow users to access medical image resources over the GRID. Note that MEDIGRID is associated with the European DATAGRID project.

1 “Santé et MPC”: Santé et Hautes Performances de Calcul et de Cognition (1998-2000)
2 SiCOM: Systèmes d’Information Communicants pour la Médecine (2001-2003)

Though our proposals in this thesis are generic for any content-based image management application, the above reasons made us choose medical image management for our prototype implementation. The types of applications stated in Queries 3 and 4 of Chapter 2 demonstrate the important practical applications of our proposals in medical systems.

The remaining part of this thesis is organized as follows. In Chapter 2, we identify the problem and demonstrate, using sample queries, what is really missing in the existing systems. In Chapter 3, we review related work. Our image data repository model is presented in Chapter 4. In Chapter 5, we present our similarity-based algebra and the model for managing image data in an ORDBMS. We present our proposed techniques for algebraic similarity-based query optimization in Chapter 6. In Chapter 7, we present our prototype EMIMS (Extended Medical Image Management System), which we used to demonstrate our proposals in an application in the medical domain. Chapter 8 is devoted to discussion. Finally, we give conclusions and perspectives of our work in Chapter 9.

CHAPTER 2

MOTIVATION AND PROBLEM IDENTIFICATION

"A problem understood is half solved."

Anonymous.

2.1 Motivation

The limitations of the traditional data management approach for image data are discussed in many reviews and research works [14, 18, 21]. To overcome these limitations, many proposals have been made [2, 7, 12, 15, 16, 18]. In this respect, the use of image-analysis technology to extract content from an image based on its different features, such as color, texture, shape, and layout, has given momentum to image description capabilities. Once the image’s content is extracted, it is used to represent most of the features that the user needs in order to organize, search, and locate the necessary visual information, and to compare images for similarity. Figure 2.1, for example, shows how the color features of an image are extracted and represented by an RGB (Red Green Blue) color histogram, and how this histogram can then be transformed into a mathematical representation called a feature vector. Feature vectors can be as small as 1 KB or 2 KB, regardless of the original image size [76]. This is a manageable size that can be stored in a database table.


Figure 2.1: An RGB color histogram representation of a chest medical image.

This concept of automatic content extraction alleviates several technological problems. The foremost benefit is that it gives a user the power to retrieve visual information by formulating a query like “Show me the pictures that look like this one”. The system satisfies the query by comparing the content of the query picture with that of all target pictures in a database or in a directory. Such a query is called Query By Image Example, and is a simple form of content-based image retrieval, a relatively new paradigm in database management systems. Thus, an important concept in content-based image retrieval is to determine how similar two pictures are to one another. To determine the similarity of two images, the n-component feature vector representations of the images are first mapped into an n-dimensional feature space, and similarity is then interpreted as a distance between these feature vector representations. So, the closer the distance, the more similar the two images are [1, 2, 18].
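As an illustration of this pipeline, the sketch below quantizes pixels into a coarse RGB histogram used as a feature vector and interprets similarity as a Euclidean distance in feature space. The 2-bins-per-channel quantization and all names are assumptions made to keep the example small.

    import math

    def rgb_histogram(pixels, bins=2):
        """pixels: iterable of (r, g, b) values in 0..255. Returns a
        normalized feature vector with bins**3 components."""
        step = 256 // bins
        hist = [0] * bins ** 3
        n = 0
        for r, g, b in pixels:
            hist[((r // step) * bins + g // step) * bins + b // step] += 1
            n += 1
        return [h / n for h in hist] if n else hist

    def feature_distance(f1, f2):
        # the closer the distance, the more similar the images
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))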

The notion of similarity, as opposed to the exact matching of traditional database systems, is an important issue that has attracted the attention of many researchers. Though similarity is a subjective measure, it is an important achievement that the similarity of images can be determined computationally, and thus deterministically. This capability of computationally judging the similarity of two images has been improved through the many research works conducted over the last decade. Much of this work was done in the computer vision domain [1, 2, 4, 12], based on which many prototype and commercial content-based image retrieval systems have been developed [21, 56]. In the paradigm of content-based image retrieval, pictures are not simply matched, but are ranked in order of their similarity to the query image. Content extraction is performed based on generic or application-oriented needs. A useful advantage of this visual-feature-based content representation is that it results in very high information compression. This information can thus be used for all subsequent similarity-based database operations; the original image does not need to be accessed except for display.

Relational Database Management Systems (RDBMSs) have been very effective in dealing with the management of alphanumeric data for the last three decades. But recent years have shown that there is a need to deal with the management of multimedia data (consisting of text, images, video, audio, etc.) as well. The question is then: can the management of multimedia data be handled with the traditional approaches? Or do we need to develop separate multimedia database management systems? Or is it worth integrating the traditional approaches with the new techniques of managing multimedia data (i.e., extending the traditional approach with new techniques that are effective in handling non-text data such as images, video, and audio)?

Many researchers agree that the relational system alone is not adequate to manage the newly emerging types of multimedia data [14, 23, 44, 85]. This is justified by its inadequacy in handling the more complex data structures needed for storing multimedia data, in defining non-standard application-specific operations, etc. On the other hand, the idea of developing a completely new system to manage multimedia data has little support in the real application world, since it ignores the rich experience at hand and the already available efficient systems. Even if this approach could be justified, its adoption is a process that can take a very long time [14]. In this regard, though the Object-Oriented (OO) data model improved on the relational model by offering more tools to manage complex data structures, practice has shown that this model could not penetrate the marketplace as its predecessor did, and it is not as widely accepted as the RDBMS. What is actually going on in both the business and academic sectors is that existing relational systems are being extended so that they can also handle the new requirements of multimedia data management, as justified by M. Stonebraker et al. [14].

We also adopt this last approach to deal with image databases. It enables us to use a combination of content-based and traditional queries on image data. While this approach can apply to other media types such as video and audio, in this thesis we deal in particular with image and image-related data management. Even though we believe that this is the right system choice, there are many problems that need to be tackled in order to make it effective for image management.

2.2 Problem Identification

To make some of the major problems clearer, we present below a few scenarios that require similarity-based operators and the formulation of complex queries combining similarity-based operations on image contents with the traditional operators. To show this:

Let an image table consist of the attribute components: image, the feature vector representation of the image, and attributes that contain alphanumeric information about the image. Consider that EMP and SI are two image tables with the schemas:

EMP(id, Photo, Fv, Name, Occupation, Address)

SI(id, Photo, Fv, Time, Date)

That is, EMP is a table of employees of a company that contains the following attribute components: the employee’s identifier, the photo of the employee, the feature vector representation of the photo, the name of the employee, his/her occupation, and his/her address. SI contains images of individuals who appeared in front of a gate of the company where a surveillance camera is mounted. The components of SI are: the identifier of the instance, the images taken by the surveillance camera, their corresponding feature vector representations, and the date and time at which each of the images was scanned or taken. Thanks to the currently available intelligent surveillance cameras, the contents of SI can be generated automatically. Note that Fv is the feature vector representation of a photo in each of the two tables.

Suppose now that there is an investigation scenario for an event that is associated with the images in SI. Evidently, SI alone cannot give complete information about a person. It is therefore necessary to perform some operations on the two tables. Consider, for example, the possible query below:

Query 1: For pictures of individuals in SI that were scanned on December 31, 1999 from 4 to 6PM, find their most similar images in EMP, with their detail information.


Processing this query requires a relational selection on SI and then a "similarity-based join"1 on SI and EMP. There is also a need to process a combination of relational and similarity-based operations on image tables.
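To make the required processing concrete, here is a hedged sketch of such an evaluation plan: a relational selection on SI followed by a similarity-based range join with EMP on the feature vectors Fv (the range threshold eps approximates "most similar"). Tables are lists of dicts following the schemas above; the distance function and all helper names are assumptions of the sketch.

    import math
    from datetime import date, time

    def dist(f1, f2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

    def query_1(si, emp, eps):
        # 1. Relational selection: scanned on December 31, 1999, 4 to 6 PM
        selected = [s for s in si
                    if s["Date"] == date(1999, 12, 31)
                    and time(16, 0) <= s["Time"] <= time(18, 0)]
        # 2. Similarity-based join: pair each selected SI picture with the
        #    EMP rows whose photo feature vector lies within distance eps
        return [(s, e) for s in selected for e in emp
                if dist(s["Fv"], e["Fv"]) <= eps]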

Problem: An operation such as "similarity-based join" is considered by none of the existing systems. Most systems tend to perform only a form of “Query-by-Example” that can be associated to a “similarity-based selection” operation for a given query image. Furthermore, processing a combination of relational and similarity-based operations on image tables is not practiced.

Suppose now that the surveillance system is organized in such a way that one camera is mounted at the external gate of the company's compound and another at the internal gate of a particular department. Let SIE be the image table created from the information obtained by the camera at the exterior gate and SII be the image table created from the information obtained by the camera at the interior gate. The schemas of these two tables are given below.

SIE(id, photo, Fv, Time, Date)

SII(id, photo, Fv, Time, Date)

In addition to the two tables above, the database also contains the table EMP as given earlier. Then, the following is a possible query for a particular investigation scenario:

Query 2: Get the images and the names of employees who entered the gate of the department within two hours after they entered the main gate of the compound on December 31, 1999.

This query requires a similarity-based join on the three image tables SIE, SII, and EMP, in addition to the relational operations involved.

Problem: Such a complex similarity-based join operation is not practiced in the existing DBMSs.

These two query examples demonstrate the need to introduce novel operators such as the similarity-based join, and to create a mechanism for using the traditional operators in combination with the similarity-based operators. Furthermore, a data repository model that enables us to perform these operations effectively is also very important.

1 Similarity-based join is a join on the low-level image representation attributes of image tables. A formal definition will be given later in Chapter 5.


Let us now see some more examples applicable in the medical domain. Developing an efficient medical image retrieval system is very important for understanding and treating a particular malady. In this regard, content-based medical image retrieval can be used to assist training, enhanced image interpretation, clinical decision-making, automated archiving, etc. It is important to note that, however good the content-based approach is, we cannot ignore the necessity of using metadata-based annotations as a supplementary means of describing medical cases and of using this information for retrieval purposes. The trend is therefore towards systems that use both metadata- and content-based image retrieval. There are a number of efforts towards such mixed systems of query [7, 111, 92]. A combined use of the two techniques can produce a means of data retrieval that satisfies the query needs of different application areas. In this thesis, we refer to this mixed system of querying as “multi-criteria query”. A multi-criteria query is a query that considers different levels of abstraction of images, such as high-level metadata descriptions and low-level physical descriptions (color, shape, texture, size, salient object features, etc.).

Figure 2.2: Brain MRI, taken on 05/11/1999, showing a metastasis (labeled “Tumor” in the figure) at the parietal left lobe.

Consider the query below:

Query 3: Retrieve from a database all brain MRIs taken between 01/01/2000 and 31/12/2000 where an anomaly is positioned like the tumor in Fig. 2.2 (upper-left part of the left lobe), that is identified as a hypervascularized tumor, with a dark gray dominant color and an area greater than 30 square mm.

This query requires considering operations on salient objects of images and a similarity-based selection operation in combination with relational and spatial queries.


Problem: Existing DBMSs do not support such a scenario, where a similarity-based operation is performed on salient objects and, moreover, in combination with relational and spatial operations.

To make the problem clearer, let us see another important type of query in the medical domain that requires an image-oriented join operation. Figure 2.3 shows medical image database tables, where the “Patient” table contains personal data of patients (say, for example, that it has no metadata description of the medical images) and the “Pathology” table contains medical images with their full annotation of all anomalies and referenced cases. If physicians or researchers need to get more information on patients having similar anomalies on the medical images in the Pathology table, they need to perform a similarity-based join on the image content representation attributes.

For example, consider the following query:

Query 4: Retrieve all personal data of patients from the Patient table and the corresponding treatments in the Pathology image table, where the lung X-ray images in the two tables have similar anomalies.

Figure 2.3: Similarity-based join operation requirement on the contents of medical images. (The figure shows a medical image database with a Patient table (Name, Gender, Age, Address, Image, …) and a Pathology table (Anomaly, Treatment, Image, …).)

This query involves a combination of relational operators and “similarity-based operators” on the tables Patient and Pathology.

Actually, one of the key issues in such types of queries is the absence of an algebra as stated by Adali et al. [8] and quoted below:

" . . . to date, there has been no work on developing an algebra (similar to the relational algebra) for similarity based retrieval. Given the fact that most feature extraction and identification algorithms in media data are very expensive, the need for query optimization to such databases is critical – yet, it

60 Motivation and Problem Identification

is impossible to create principled query optimization algorithms without an initial underlying algebra within which similarity based retrieval operations are executed."

However, even the “multi-similarity algebra” that these authors propose remains at a high abstraction level. That is, it does not address the definition of an “operational” similarity-based algebra usable for processing and optimizing queries on image contents in a DBMS environment.

Therefore, to alleviate this main problem in the process of developing efficient image data management systems, a study on “similarity-based algebra” is of very high importance.

This is, then, one of the main subjects of this thesis. We will therefore propose a similarity-based algebra that is useful for content-based image retrieval in a DBMS environment. Furthermore, we will address issues concerning the visual content descriptions of images, the image data repository model, the system of similarity-based query optimization, and the visual query interface.

Content description deals with the semantic, context, and physical descriptions of visual data. The semantics and context of visual data can best be described using the traditional textual methods, whereas the physical description is best done with the methods of visual analysis. Thus, a combination of the two methods, text- and content-based management, can be more effective.

Image data repository is another important issue in image management. Thanks to the new features in Object Relational Database Management Systems (ORDBMS), there are now possibilities to handle various types of non-alphanumeric data. Hence, the management of visual data with their content and textual descriptions can be facilitated in an Object Relational (OR) paradigm [14]. Since the description is usually both in text and visual content, the visual query interfaces can be developed on top of ORDBMSs. Thus, extending the OR query system to fit the requirements of visual query can address many of the demands in various applications.

The extensive research conducted on CBIR has resulted in algorithms capable of comparing, say, two images for similarity [7, 15, 16, 45, 50]. These works show that such algorithms can be integrated into existing DBMSs as plug-ins or as cartridges. Hence, similarity-based operations are on the verge of being made part of ORDBMSs. However, there is still a way to go in order to integrate the new requirements of visual data management effectively with the existing systems.


Therefore, the main objective of this thesis is to contribute to the process of this integration for the sake of developing more efficient visual data management systems. Considering what has been done in both DBMS and computer vision in previous years, it is our conviction that an interdisciplinary research effort is required to alleviate the constraints in image data management. Though there are some efforts in this respect, most of the existing systems lack a formal framework to adequately exploit hybrid1 image management and query features. This includes the lack of an adequate image data model, the inexistence of an effective image data repository model, the absence of a formal algebra for similarity-based operators on image contents, the lack of adequate query interfaces, and the lack of precision of the existing image processing and retrieval techniques.

2.3 Summary

Relational Database Management Systems have been very effective in dealing with the management of alphanumeric data for the last three decades. But recent years have shown that there is a need to deal with the management of multimedia data (consisting of text, image, video, audio, etc.) as well. The limitations of the relational approach for managing image data have been discussed in many reviews and research works. Proposals have been made, on the one hand, to extend the capabilities of traditional RDBMSs to effectively handle visual data and, on the other hand, to improve the techniques of automatic content extraction and their use for similarity-based visual data retrieval. What is actually going on in both the industrial and academic sectors is that existing systems are being extended so that they can also handle the new requirements of visual data management, as justified by M. Stonebraker et al. [14]. Though there exist some efforts in this respect, most of the existing systems lack a formal framework to adequately exploit the advantages of hybrid query features for image data management.

In this Chapter, we have shown what is really missing in the existing systems to fulfill content-based image retrieval requirements. We supported our argument with example queries. We further raised issues that need to be solved in order to address such requirements. Therefore, in the following Chapters we will address issues that include: the development of a convenient image data repository model that considers the constraints of dealing with large and complex data and that enables us to deal with salient-object-related data, the definition of the necessary similarity-based operators and their properties, the development of similarity-based query optimization techniques, and the development of a prototype to show that our proposals are applicable.

1 A hybrid query is a query that contains a combination of both metadata- and content-based image retrieval expressions.

CHAPTER 3

RELATED WORK

"Art is I; Science is we. " Claude Bernard [88]

The current capabilities of image retrieval systems are the cumulative result of a large number of researchers’ works. In both the areas of DBMSs and Computer Vision, as we cited in Chapter 2, a large number of research works have been conducted [1, 2, 7, 12, 14, 15, 16, 18, 21, 56]. In this chapter, we will review the different works that are relevant to the subject of this thesis. First, the main existing image data models are presented. Then, we review: the fundamental issues involved in image analysis for content-based image retrieval, the CBIR methods, the major CBIR systems that are available either as research prototypes or as commercial systems, the current major standards relevant to image data management, the multi-dimensional indexing techniques developed to facilitate more efficient retrieval, typical CBIR applications, the efforts to integrate CBIR modules into standard DBMSs, and the efforts made to develop a DBMS-oriented similarity-based algebra. Finally, a short summary of what these systems lack, and of why our work is useful in alleviating some of the shortcomings in the area, is presented.


3.1 Image Data Models

An Image Data Model is an abstraction used to provide a conceptual image data representation. It is a set of concepts that can be used to describe the structure and content of an image. Images are richer in information than text, and are more complex because images can be interpreted differently depending on the individual’s perception, the application domain, and the context in which they are used. There are different proposals of image data models for representing and capturing the semantic and content richness of image data [19, 59, 91, 92, 93]. While most of them are more or less dependent on the image application domain, some generic models have also been proposed. In the following paragraphs, we briefly present some of the main image model proposals in the literature.

In the early 90s, Virage proposed a model for visual information, called the Visual Information Management System (VIMSYS) (see Fig. 3.1) [91]. Unlike traditional systems of this period, this model recognizes that most users prefer to search image and video information by what the image or video actually contains, rather than by keywords or descriptions associated with the visual information.

Figure 3.1: The VIMSYS Model.

The VIMSYS data model for visual information proposes four layers of information abstraction: the raw image (the Image Representation Layer), the processed image (the Image Object Layer), the user’s features of interest (the Domain Object Layer), and the user’s events of interest for images (the Domain Event Layer). The top three layers form the content of the image. Though this model clearly identified the need to use image objects in retrieval, it placed less emphasis on the semantic and context representation of images.


The image data model presented by Grosky et al. in [19] views an image at two levels: the physical and the logical representation levels (see Fig. 3.2). The logical level contains the global description and content-based sublevels. The global description sublevel consists of the meta and semantic attributes of the image. The content-based sublevel contains the object features, consisting of color, texture, shape, and spatial object characteristics. The physical level contains the image header and the image matrix. However, this model tends to represent low-level features merely as logical features, and considers neither the physical representations of salient objects nor the relationships between salient objects. Furthermore, no option is given to describe relevant information that is external to the image itself.

Figure 3.2: Semantic schema of the Model.

The image data model proposed by R. Chbeir in [59] considers not only the content of the image, but also the information contextually and semantically associated with the image. It can also capture the semantic and contextual information of salient objects beyond their low-level contents. Furthermore, it considers several types of relations (temporal, spatial, semantic, etc.) between salient objects. This model describes image data at several levels of abstraction (see Fig. 3.3). It has two main spaces: the external space and the content space.


Figure 3.3: An image data model in UML notation. (The diagram shows the Image Model composed of an External Space, with its context-oriented, domain-oriented, and image-oriented subspaces, and a Content Space, with physical, spatial, and semantic features and their relations; each feature possesses a representation model with descriptors, values, and significances/descriptions.)

The external space captures the general information associated to the image data that are external to the content of the image. The data in the external space are all alphanumeric. The external space has three subspaces:

• The context-oriented subspace: contains application-oriented data that are completely independent of the image content and have no impact on the image description. For example, in a medical application, it contains information such as the hospital’s name, the physician’s identity, and the patient’s name.

• The domain-oriented subspace: consists of data that are directly or indirectly related to the image. This subspace is very important in that it allows one to highlight several associated issues. For example, in medical image domain, it contains information like the physician’s general observations, previous associated diseases, etc. The domain-oriented subspace can also assist in identifying associated medical anomalies.

• The image-oriented subspace: corresponds to the information that is directly associated with the image creation, storage, and type. For example, in the medical domain, we need to distinguish the image compression type, the image acquisition modality (radiography, scanner, MRI, etc.), the incidence (sagittal, coronal, axial, etc.), the scene, the study (for instance, thoracic traumatism due to a cyclist accident), the series, the image acquisition date, etc. These data can significantly aid the description of the image content.

The content space describes the content of the image not only through content-based representation but also through metadata description. It consists of physical, spatial, and semantic features. The same representation is inherited by the salient object descriptions, and the content space maintains relations between the salient objects and the image. The content space has three kinds of features:

• The Physical Features: describe the image (or the salient object) using its low-level features such as color, texture, etc. The color feature, for instance, can be described via several descriptors such as color distribution, histogram, dominant color, etc. The use of physical features makes it possible to answer CBIR queries in medical systems such as: “Find lung x-rays that contain objects similar (by color) to a salient object SOi”.

• The Semantic Features: integrate high-level descriptions of an image (or of salient objects) using application-domain-oriented keywords. In the medical domain, for example, terms such as organ or case names (lung, trachea, tumor, etc.), states (inflated, exhausted, dangerous, etc.), and semantic relations (invade, attack, compress, etc.) are used to describe medical image content. Semantic features are important for answering traditional queries in medical systems such as: “Find lung x-rays where a hypervascularized tumor is invading the left lung”.

• The Spatial Features: are intermediate (middle-level) features that concern geometric and topological aspects of images (or salient objects) such as shape and position. Each spatial feature can have several representation forms, such as the MBR (Minimum Bounding Rectangle), bounding circle, surface, volume, etc. Spatial features are used to identify the relations between salient objects, such as metrical (near, far, etc.), directional (right, left, above, front, etc.), and topological (touch, disjoint, overlap, equal, etc.) relations. They allow one to answer queries in medical systems such as: “Find lung x-rays where an object SO1 is above an object SO2 and their contours are disjoint”.

The practical importance of this model is demonstrated by a medical image data management application, though the model is generic and can be applied to other areas of image data management.
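To make the structure of the model concrete, the following minimal sketch organizes its spaces and subspaces as Python classes. The class and attribute names are our own illustrative choices, not part of the model's specification, and the sketch ignores the UML multiplicities of Fig. 3.3.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Feature:
    # A physical, spatial, or semantic feature with its descriptors,
    # e.g. Feature("physical", {"dominant_color": (200, 30, 30)}).
    kind: str                         # "physical" | "spatial" | "semantic"
    descriptors: Dict[str, object] = field(default_factory=dict)

@dataclass
class SalientObject:
    # Salient objects inherit the same feature representation as the image.
    name: str
    features: List[Feature] = field(default_factory=list)

@dataclass
class ImageData:
    # External space: alphanumeric data external to the image content.
    context_oriented: Dict[str, str] = field(default_factory=dict)  # hospital, physician, patient
    domain_oriented: Dict[str, str] = field(default_factory=dict)   # observations, prior diseases
    image_oriented: Dict[str, str] = field(default_factory=dict)    # format, incidence, study, date
    # Content space: features of the image, its salient objects, and
    # the (spatial, semantic, temporal, ...) relations between objects.
    features: List[Feature] = field(default_factory=list)
    salient_objects: List[SalientObject] = field(default_factory=list)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (SO1, "above", SO2)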

3.2 Image Analysis for Content-based Image Retrieval

Image Analysis is a branch of image processing that deals with the understanding and description of digital images through their low-level features of color, texture, shape, etc. Its first phase is feature extraction, which represents the image in the form of a histogram or a feature vector. It also extends to grayer areas such as object recognition by means of feature analysis [75]. Image analysis is thus an important phase in the process of content-based image retrieval. Other fundamental issues in the development of a CBIR system include the development of a content-based retrieval engine that uses the extracted features for similarity comparison, and the design of a user interface. In the following Sections, we briefly discuss the issues involved in the feature extraction of images and in the identification of objects of interest in an image.

3.2.1 Feature Extraction

The features of an image are the characteristics that describe its content. Image features, in general, may include text-based features (or metadata obtained through an annotation process), commonly referred to as high-level features, and visual features such as color, texture, and shape, referred to as low-level features. The fully automatic generation of high-level feature descriptions of images is not possible at the current stage. Thus, we refer here only to the visual (low-level) features of images, which can be generated fully automatically. The extraction of visual features of images and their representation is the basis, or first step, of content-based image retrieval. Feature extraction is the analysis of the pixels that form a digital image in order to generate an equivalent mathematical representation [98]. Commonly, these features are represented by means of a histogram or a vector. For any given feature, multiple representations exist that characterize the feature from different perspectives [42, 98]. For certain applications, only the texture features are important; for others, it could be only the color features, or an averaged combination of the color and texture features (see the next paragraphs for more details on these features). The type of feature that needs to be extracted and used depends on the particular application requirements. Hence, certain applications use only some of the features, while others are more effective when an average or a specified ratio of each of the features is used. The mathematical representations of the extracted features are used to computationally determine whether two images are similar, based on a certain measure. The extracted features are represented in a feature vector that can be mapped onto an n-dimensional space, and similarity is determined on the basis of distance in that space. Figure 3.4 [18] shows a color histogram representation and its features in a 64-dimensional vector.

Figure 3.4: Color histogram of an image represented as a 64-dimensional feature vector.

The diagrams at the center of Fig. 3.4 show the histograms of the three color components R (Red), G (Green), and B (Blue) in the RGB color space. The feature vector is shown at the bottom of the figure, where vi is the number of pixels having a given quantized RGB value. In this example the vector has 64 entries, but the dimension of a feature vector can vary depending on the application domain. Of course, a higher dimension requires more computing time and may slow both indexing and retrieval.
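As a minimal sketch of such a representation, the following computes a 64-dimensional color histogram (4 × 4 × 4 bins over the R, G, and B channels, as in Fig. 3.4) and compares two vectors by their Euclidean distance. The input format, a flat list of RGB triples, is an assumption made for brevity.

import math

def color_histogram_64(pixels):
    # pixels: iterable of (r, g, b) triples with channel values in 0..255.
    # Each channel is quantized into 4 bins, giving 4*4*4 = 64 cells.
    hist = [0] * 64
    for r, g, b in pixels:
        hist[(r // 64) * 16 + (g // 64) * 4 + (b // 64)] += 1
    return hist

def euclidean_distance(v1, v2):
    # Distance between two feature vectors in the 64-dimensional space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

Two images are then considered similar when the distance between their histogram vectors is small; in practice the histograms are usually normalized by the number of pixels so that images of different sizes remain comparable.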

In the following Sections, the features of color, texture, and shape are described. Furthermore, techniques of image segmentation and the use of salient objects in CBIR are discussed.

3.2.1.1 Color

In image content description, color is one of the most frequently used visual features, and it is thus widely used for content-based image retrieval. Several methods of describing images on the basis of color have been studied in the literature [66, 67]. For this representation, each image is analyzed to compute a color histogram, which gives the number of pixels of each color component (for example, R, G, and B in the RGB color space) within the image. There are models of color representation other than the RGB color space. Another frequently used color space is HSV, which defines a color in terms of three constituent components: Hue, Saturation, and Value. The Hue represents the color type, such as red, blue, or yellow, and is measured in values of 0-360 by the central tendency of its wavelength. The Saturation represents the 'intensity' of the color (or how much grayness is present) and is measured in values of 0-100% by the amplitude of the wavelength. The Value represents the brightness of the color and is measured in values of 0-100% by the spread of the wavelength. More details are given in [75]. Other color spaces include HMMD, YCrCb, linear color transformations of RGB, and monochrome, most of which are supported by the MPEG-7 standard, an international ISO standard for multimedia content description (see Section 3.5.2 for details) [24]. Transformation algorithms from one color space representation to another also exist [67]. The choice of a color representation model depends on the particular application requirements.
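Conversion between such color spaces is mechanical. As a sketch, Python's standard colorsys module maps an RGB triple to the H (0-360), S (0-100%), and V (0-100%) ranges described above:

import colorsys

def rgb_to_hsv_scaled(r, g, b):
    # r, g, b in 0..255; returns hue in degrees, saturation and value in %.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 100.0, v * 100.0

# A pure red pixel: hue 0 degrees, full saturation, full value.
print(rgb_to_hsv_scaled(255, 0, 0))    # (0.0, 100.0, 100.0)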

In most CBIR systems, during image retrieval by color features, the user can either specify the desired proportion of each color (for example, 50% green, 25% blue, and 25% red), or submit an example image from which a color histogram is calculated. Beyond the color spaces themselves, MPEG-7 also supports color-based descriptors such as color layout, dominant color, color quantization, etc. [24]. Color layout, for example, considers both the color feature and the spatial relations within an image. Many research results suggest that color layout is a better solution for color-based image retrieval [69]. Color layout extends the concept of a global color feature to a local one by dividing the whole image into sub-blocks and extracting the color features from each block separately; a sketch of this idea is given below. An example of such an approach is the quad-tree based color layout, in which the image is split into a quad-tree structure and a color feature is extracted for each branch block of the tree [68]. A more precise but sophisticated approach is to use image segmentation techniques and extract the color features for each of the salient objects [70].
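The sub-block idea can be sketched as follows: the image is divided into a fixed grid and a histogram is extracted from each block separately (a 2 × 2 split applied recursively on each block would give the quad-tree variant of [68]). The grid size and the reuse of color_histogram_64 from the earlier sketch are illustrative choices.

def color_layout(pixels, width, height, nx=4, ny=4):
    # pixels: row-major list of (r, g, b) triples for a width x height image.
    # Returns one 64-bin histogram per block, preserving coarse spatial layout.
    blocks = [[] for _ in range(nx * ny)]
    for y in range(height):
        for x in range(width):
            bx = min(x * nx // width, nx - 1)
            by = min(y * ny // height, ny - 1)
            blocks[by * nx + bx].append(pixels[y * width + x])
    return [color_histogram_64(block) for block in blocks]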

3.2.1.2 Texture

The texture feature deals with the visual patterns of an image, expressing properties of homogeneity that do not result from the presence of a single color or intensity [90]. It describes the structural arrangement of the surface properties of an image and their relationship to the surrounding environment. Figure 3.5 shows sample images where the texture feature describes and represents the images better than other features. The identification of specific textures in an image is achieved by modeling texture as a two-dimensional gray-level variation. The brightness of pairs of pixels, for example, is computed in such a way that their relative degree of contrast, regularity, coarseness, and directionality may be estimated [89]. Here, the main issue is the process of identifying the patterns of co-pixel variation and associating them with particular classes of textures such as “silky” or “rough”. Due to the importance and usefulness of texture features, there is a substantial body of work on texture-based image retrieval [64, 65, 90]. Among the techniques used for measuring texture similarity is a method known as second-order statistics, which calculates the relative brightness of selected pairs of pixels of an image in order to derive measures of image texture [62, 63]; a sketch is given after Fig. 3.5. Other methods of texture analysis for image retrieval include the use of Gabor filters [64] and fractals [65].

Figure 3.5: Sample images where texture features could be more effective than color features in CBIR.
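A minimal sketch of such second-order statistics: build a gray-level co-occurrence matrix for a fixed pixel offset and derive a contrast measure from it. The offset, the number of gray levels, and the contrast formula are standard but illustrative choices, not the exact measures of [62, 63].

def cooccurrence_contrast(gray, width, height, levels=8, dx=1, dy=0):
    # gray: row-major list of gray values in 0..255 for a width x height image.
    # Count how often quantized level i occurs at offset (dx, dy) from level j.
    q = [g * levels // 256 for g in gray]
    matrix = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height - dy):
        for x in range(width - dx):
            i = q[y * width + x]
            j = q[(y + dy) * width + (x + dx)]
            matrix[i][j] += 1
            total += 1
    # Contrast is large when co-occurring pixels differ strongly in
    # brightness, one way to quantify coarseness and regularity.
    return sum(matrix[i][j] * (i - j) ** 2
               for i in range(levels) for j in range(levels)) / max(total, 1)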

3.2.1.3 Shape

The shape feature of an image deals with the geometric appearance and characteristics of the objects in an image. It is a common experience that most natural objects are primarily recognized by their shape. In most cases, the shape feature is invariant to translation, rotation, and scaling. Shape-based representations of 2D images can be categorized into two groups: boundary-based and region-based [71]. Region-based shape description can represent objects that consist of a single region or a set of regions, possibly with holes in the object, as illustrated in Fig. 3.6 [24]. Since the region-based shape descriptor makes use of all pixels constituting the shape within a frame, it can describe any shape: not only a simple shape with a single connected region, as in Fig. 3.6 (a) and (b), but also a complex shape that contains holes or several disjoint regions, as illustrated in Fig. 3.6 (c), (d), and (e). The advantages of region-based shape descriptors are that they can describe diverse types of shapes efficiently in a single descriptor and that they are robust to minor deformations along the boundary of the object.

Figure 3.6: Sample cases for shape description requirements.

Boundary-based shape description captures the shape features of an object or region based on its contour. The boundary-based shape descriptor builds on the Curvature Scale Space representation of the contour. This type of description captures perceptually meaningful features: it reflects properties of the human visual system's perception, it captures well-defined characteristic features of the shape, and it is robust to non-rigid motion. This offers a means for effective similarity-based retrieval.

Among the shape descriptors that MPEG-7 supports are the Contour Shape and Region Shape descriptors [24]. The Contour Shape descriptor captures characteristic shape features of an object or region based on its contour or boundary, using the Curvature Scale Space representation, which captures perceptually meaningful features of the shape. A simplified sketch of region-based shape features is given below.
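As a sketch of the region-based idea, the following computes classical shape features (area, perimeter, circularity) from a binary object mask. This is a simplification for illustration, not the MPEG-7 Region Shape descriptor itself.

import math

def region_shape_features(mask, width, height):
    # mask: row-major list of 0/1 values, 1 marking the object's pixels.
    # Area counts all object pixels, so disjoint regions and holes are handled.
    area = sum(mask)
    # Approximate perimeter: object pixels with a background 4-neighbour.
    perimeter = 0
    for y in range(height):
        for x in range(width):
            if not mask[y * width + x]:
                continue
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            if any(not (0 <= i < width and 0 <= j < height)
                   or not mask[j * width + i] for i, j in neighbours):
                perimeter += 1
    # Circularity is 1 for a perfect disc and smaller for elongated shapes;
    # like area, it is invariant to translation and rotation.
    circularity = 4 * math.pi * area / (perimeter ** 2) if perimeter else 0.0
    return area, perimeter, circularity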

3.2.1.4 Segmentation

Segmentation distinguishes or separates meaningful data entities or regions-of-interest in an image. It is an important technique for effective image retrieval. Researchers in both the computer vision and the image retrieval fields have proposed different content-based image segmentation techniques. Some of the popular segmentation approaches are edge-based methods, region-based techniques, and threshold techniques [53, 72, 73, 74, 102]. Edge-based methods center around contour detection; their weakness in connecting broken contour lines makes them prone to failure in the presence of blurring. Region 1 of Fig. 3.7 [75] (right) is an example of this case. In region-based segmentation, the image is partitioned into connected regions by grouping neighboring pixels of similar intensity levels; adjacent regions are then merged under some criterion involving, for instance, homogeneity or sharpness of region boundaries. Threshold techniques make decisions based on local pixel information, as sketched below. They are effective when the intensity levels of the objects fall squarely outside the range of levels in the background; however, since spatial information is ignored, blurred region boundaries can create confusion.
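A minimal sketch of the threshold technique, classifying each pixel from its intensity alone; the fallback to the mean intensity as a threshold is a naive illustrative choice (methods such as Otsu's compute a better one).

def threshold_segment(gray, threshold=None):
    # gray: flat list of pixel intensities; returns a binary mask where
    # 1 marks candidate object pixels.
    if threshold is None:
        threshold = sum(gray) / len(gray)   # naive default
    # Only local pixel information is used: spatial context is ignored,
    # which is why blurred region boundaries can create confusion.
    return [1 if g > threshold else 0 for g in gray]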

As discussed above, many algorithms can automatically segment regions, but not objects that carry a useful semantic meaning. However, the identification of these objects, with their interpretation, is highly desirable in image retrieval. Thus, semi-automatic segmentation with human assistance is conducted in many image retrieval systems. In [101], for example, a computer-assisted boundary extraction approach is presented, in which manual input from the user and edges generated by the computer are applied in combination. A segmentation algorithm based on clustering and grouping in the spatial, color, and texture spaces is proposed in [102]. In this case, the objects of interest are defined by the user, and the algorithm then groups them into meaningful regions.

Figure 3.7: Example of an image (left) and its segmented form (right).

3.2.2 Supporting Salient Objects in CBIR

An image is a complex item that may contain different objects of interest. The particularly meaningful objects of interest in an image are referred to as Salient Objects in the literature [39]. In content-based retrieval, we may be interested only in a particular object in an image; hence, image retrieval systems should enable us to operate on these particular objects. Current CBIR practice has shown the effectiveness of using salient objects for image similarity-based comparisons. Similarity-based search engines, such as facial image retrieval engines and medical image retrieval engines, use segmentation methods to automatically or semi-automatically identify salient objects and use them in the retrieval process [42, 53, 72, 102, 104]. The integration of salient objects into similarity-based image retrieval operations is thus an important issue.

3.2.3 Multi-dimensional Indexing

Data structures used in traditional database systems, such as B-trees, rely on a one-dimensional ordering of key values, which does not work in image database systems, as it cannot describe the complex properties of an image. Thus, there is a strong need to explore the characteristics of content-based image indexing and the requirements of indexing structures in image retrieval systems. Multi-dimensional indexing techniques have been studied since the mid-1970s, when indexing methods such as the quad-tree and the k-d tree were first introduced. However, their performance was far from satisfactory as the need for, and domains of application of, image retrieval increased. The three major research communities contributing to the area of multi-dimensional indexing are computational geometry, database management, and pattern recognition [43, 132, 133, 134]. Initiated and developed by research in these areas, the existing popular multi-dimensional indexing techniques include the bucketing algorithm, the k-d tree, the priority k-d tree [134], the quad-tree, the K-D-B tree, the hB-tree, and the R-tree with its variants R+-tree and R*-tree [28, 131, 132, 133]. A detailed review and comparison of various indexing techniques for image retrieval can be found in [10, 30, 31, 32, 33, 130, 134, 135].

When CBIR systems handle only hundreds or at most a few thousand images, a sequential scan of all the images may not seriously degrade the system's performance. However, as image collections increase in size, retrieval speed becomes a bottleneck; this is where efficient multi-dimensional indexing is needed. Although some progress has been made in indexing techniques for large collections of image data, effective high-dimensional indexing techniques are still in urgent need of exploration. Multi-dimensional indexing is not within the scope of this thesis, but useful works can be found in the literature cited above.
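As an illustration of how such an index is used, the sketch below builds a k-d tree over feature vectors with SciPy and answers a nearest-neighbour query without a sequential scan; SciPy is used here only as a convenient reference implementation, and the data are random.

import numpy as np
from scipy.spatial import cKDTree

features = np.random.rand(10000, 64)   # 10,000 images, 64-dim feature vectors
tree = cKDTree(features)               # build the multi-dimensional index once

query = np.random.rand(64)             # feature vector of the query image
dists, idxs = tree.query(query, k=5)   # 5 nearest neighbours, no full scan

Note that tree-based indexes of this family tend to degrade toward a sequential scan as the dimension grows, which is precisely why effective high-dimensional indexing remains an open problem.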

3.3 Content-Based Image Retrieval Methods

For a given query image, a CBIR algorithm searches a collection of images or an image database for an image, or a set of images, similar to the query image, based on a certain similarity measure. For this, the feature vector representations of the images, the measure of similarity, and the similarity search algorithm to be used must be determined first. The feature vector representations of the query image and of the set of images to be searched must be formed with identical or compatible feature measures. In CBIR, the similarity of two images is defined as the proximity of their feature vector representations in a feature space. The feature vector representation of the images is usually of high dimension; hence, the similarity-based computation becomes highly complex. To alleviate this computational complexity, one suggested method is to first reduce the dimension of the feature vector to a reasonable size using dimension-reduction functions. This reduction, however, must be done in such a way that the recognition rate is not significantly degraded. Research and experimental results on dimension reduction are discussed in [47, 95, 96]. The other suggested way to alleviate the complexity of similarity search is to use parallel processing facilities; parallel processing methods for similarity searching are discussed in [4].
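As a minimal sketch of such a dimension reduction, the following applies principal component analysis via NumPy's singular value decomposition; PCA is one standard choice, not necessarily the method used in the works cited above.

import numpy as np

def pca_reduce(vectors, k):
    # vectors: (n_images, dim) matrix of feature vectors.
    # Project onto the k directions of largest variance so that distances
    # in the reduced space approximate distances in the original space.
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T          # (n_images, k) reduced vectors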

The most common methods used for CBIR are the k-NN (k-Nearest Neighbor) search [9, 11, 34, 35, 97] and the Range Query search [13, 25, 80, 81, 82, 99]. The first method searches for the k most similar images, where k is a positive integer (less than the number of images in the collection). The latter searches for all images that are within a given radius ε in the feature space. This method returns an indeterminate number of images (from zero to some number n) that are similar to the query image; the number of returned images depends on the local density of the image representations in the feature space and on the value of ε. So, the value of ε must be selected carefully, considering both the user's needs and the data point density in the feature space. Often the value of ε is selected by the user.

In the remaining part of this thesis, we will refer to the "distance" between two images as the distance between the data points representing the images in the n-dimensional feature space. Since our similarity-based operators (defined in Chapter 5) build on similarity-based image search methods, a brief description of these methods will help in understanding some fundamental concepts. Hence, for the purposes of reference and comparison, we give the definitions of both the k-NN and the range query search methods below.

Definition 3.1: k-Nearest Neighbor Search (k-NN)

Given a set of images S, a query image q, and a positive integer k, the k-Nearest Neighbors of the query image q, denoted NNk(S, q), are the k images of S that are closer to q in the feature space than any other image in S. More formally:

NNk(S, q) = {o'1, o'2, ..., o'k | o'i ∈ S for i = 1, ..., k, and ||o'i − q|| ≤ ||o − q|| ∀o ∈ S − {o'1, o'2, ..., o'k}},

where the notation ||a-b|| stands for the distance between the image objects a and b in the feature space.

The value of k is assumed to be less than the cardinality of S, and when more than one image lies at the k-th closest distance to q, the k-NN search algorithm selects one of them arbitrarily. For k = 1, it searches for the single most similar object and is referred to simply as the NN search.

In a similarity-based range query, all the images that are within a given distance ε of a given query image are searched for. A formal definition of the similarity-based range query is given below.

Definition 3.2: Similarity-Based Range Query

Let S be a set of images, q a query image, and ε a positive real distance value. A similarity-based range query of q with respect to S and ε, denoted Rε(S, q), returns all image data points of S that are within distance ε of the query image q. Formally:

Rε(S, q) = {o'∈S | ||o'-q|| ≤ ε}, where ||o'-q|| denotes the distance between o' and q.

We will use the range query method to define our similarity-based operators. Compared to the k-NN search, the range query method possesses properties that can be exploited for similarity-based query optimization. With k-NN, we have no control over the similarity of the resulting images: it simply returns the k closest images, independent of their degree of similarity. With a range query, however, the user can control the returned images by increasing or decreasing the value of ε. In a range query, it is also possible that a query image has no similar image in the set; this is coherent with the practical usage of similarity-based queries. The impact of this useful property is discussed in detail, together with the properties of our similarity-based operators, in Chapter 5.
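Definitions 3.1 and 3.2 translate directly into the following naive sketch, which uses a linear scan for clarity; a real system would use the indexing techniques of Section 3.2.3, and the Euclidean distance stands in for whatever feature-space distance ||·|| the application adopts.

import math

def distance(a, b):
    # ||a - b||: Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(S, q, k):
    # Definition 3.1: the k images of S closest to q. Ties at the k-th
    # distance are broken arbitrarily, as noted above.
    return sorted(S, key=lambda o: distance(o, q))[:k]

def range_query(S, q, eps):
    # Definition 3.2: all images of S within distance eps of q; unlike
    # knn, the result may be empty.
    return [o for o in S if distance(o, q) <= eps]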

3.4 Content-Based Image Retrieval Systems

A large number of CBIR systems have been developed since the early 1990s [12, 21, 36, 41, 46, 87]. These systems are research prototypes, some of which were later turned into commercial systems. Most of them support one or more of the following search options: by example, by sketch, or by keyword. Each of these systems has made its contribution to the development of CBIR, but none can satisfy all the requirements of the different application areas, mainly because content-based image retrieval applications are mostly domain dependent. Since CBIR systems are the foundation for a system of similarity-based operations in a DBMS environment, we present below some of the frequently mentioned CBIR systems with their distinct characteristics. Our intention in this thesis is not to develop a new type of CBIR; it is rather to use available or future CBIR techniques in a DBMS environment for the purpose of efficient image retrieval. Thus, our presentation in this section does not compare the CBIR systems with our own work, but surveys the systems at hand and the features they support. This will permit us to propose our models and system of operations based on the requirements of the field of image management.

3.4.1 QBIC

QBIC (Query By Image Content) is a content-based image retrieval system developed by the IBM Almaden Research Center, San Jose, CA [54, 55, 76]. It is one of the best-known content-based image retrieval systems, released in the early 1990s. QBIC allows querying collections of images by their content; queries by content complement traditional queries that use image file names or keyword descriptions.

QBIC offers retrieval by any combination of color, texture, or shape, as well as by text keyword. The system extracts color, shape, and texture features from each image and stores them in a database. For color-based feature extraction, the average color feature vector of an object, or of the whole image (in one of the color space models supported by QBIC), is extracted automatically from the image. The texture features used in QBIC are modified versions of the coarseness, contrast, and directionality features proposed in [62]. The shape features consist of shape area, circularity, eccentricity, major axis orientation, and a set of algebraic moment invariants [55]. These features are extracted automatically and/or semi-automatically for all the object contours and stored during database population. QBIC also implements a method of retrieving images based on a rough user sketch (see Fig. 3.8).

In QBIC, image queries can be formulated by selecting color samples from a palette, specifying an example query image, or sketching a desired shape on the screen. At search time, the system matches the appropriate features from the query and the stored images, calculates a similarity score between the query and each stored image, and displays the most similar images on the screen as thumbnails. QBIC also allows queries based on selected color and texture patterns; in this case, the user chooses colors or textures from a sampler. Figure 3.8 [160] shows an interface of the color and color layout palettes for user sketch-based queries. For example, one can search for images that are predominantly red, or that have a blue region at the top left corner and a red region of a certain shape at the bottom right corner. The percentage of a desired color in an image is adjusted by moving the sliders. As a result of a query, the best matches are presented in decreasing order of similarity, with an optional matching score alongside.

QBIC uses R*-tree based indexing methods to improve search efficiency [107]. The latest version of the system incorporates more efficient indexing techniques, an improved user interface, the ability to search gray-level images, and a video storyboarding facility [106]. QBIC is available either as part of other IBM products such as the DB2 Image Extender and Digital Library or in standalone form.

Figure 3.8: QBIC’s color search palette (left) and color layout search palette (right) for user sketch-based query.

3.4.2 The VIR Image Search Engine

The VIR image search engine is a CBIR system developed by Virage Inc. It is a well-known image retrieval engine that is available commercially, either standalone or as a plug-in to other systems [36, 45, 55, 161]. The VIR Image Engine has been designed with the goal of being applicable in different application domains and system scenarios; thus, it makes no presupposition about the application or the system components into which it is integrated.

The VIR Image Engine performs two main functions: image analysis and image comparison. The result of the image analysis is a feature vector, a compact and semantically rich mathematical representation of the visual content of the image. The image comparison engine takes as input two feature vectors and a set of weights for the visual primitives, and compares them for similarity. As a result of the comparison, it produces a quantified similarity "score" that measures the visual "distance" between the two images represented by the feature vectors. Scores are normalized in the range [0 .. 100] and are independent of the population of images being compared. A value of 0 indicates a perfect similarity match.
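The exact normalization used by the VIR engine is not described in the cited sources; the following sketch shows one plausible way to obtain such a population-independent score in [0 .. 100].

def similarity_score(dist, max_dist):
    # Map a raw feature-space distance onto [0 .. 100]; 0 = perfect match.
    # max_dist is the largest distance the feature space and weights allow,
    # a property of the features, not of the image population.
    return min(100.0, 100.0 * dist / max_dist)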

The VIR Image Engine extracts and uses a set of general primitives such as global color, local color, texture, and shape. In addition, various domain-specific primitives can be created when developing an application; when defining such a primitive, the developer supplies a function for computing the primitive's feature data from the raw image.

When the VIR Image Engine is implemented as an add-on to a system, the presentation of the results and the storage of the feature vector data are handled by the host system; the original prototype provides no indexing mechanism [45]. The VIR Image Engine was integrated into databases such as Sybase, Object Design, and Objectivity, and added as a component to Oracle 8 and its later versions. Another example of an application of the Virage technology is the Alta Vista Photo Finder (see http://image.altavista.com/cgi-bin/avncgi), which allows web surfers to search for images by their content. Virage technology has also been extended to many video management systems [45]. Since the VIR Image Engine is part of the Oracle 9i DBMS, it is the engine we used in our prototype (see Chapter 7).

3.4.3 Photobook

Photobook is a content-based image retrieval system developed by the Vision and Modeling group of the Massachusetts Institute of Technology (MIT) [104]. Photobook was one of the most influential CBIR systems of the early 1990s. It includes an interactive learning agent called “FourEyes”, which selects and combines models based on examples from the user. This feature distinguishes Photobook from systems like QBIC and Virage, which support search on various features but offer little assistance in actually choosing one for a given task. For example, in Photobook the user can formulate the query "show me the images which look like this one" in terms of overall content, or "show me the images whose Fourier transform looks like this one". The user can choose among a large repertoire of algorithms for such queries, including algorithms that incorporate feedback into the other algorithms and allow them to "learn" the user's concept of image similarity. FourEyes thus permits users to directly express their retrieval need. This has been successfully applied in a number of applications involving the retrieval of images by texture, shape, and human face, each using features based on a different model of the image. More recent versions of the system allow users to select the most appropriate feature type for the retrieval problem at hand from a wide range of alternatives [104] and include an interactive scene segmentation and annotation facility.

To perform a query, the user selects some images from the grid of still images displayed and/or enters an annotation filter. As shown in Fig. 3.9 [143], the user can select another query image from the images displayed and reiterate the search. As a result of the query, the returned images are sorted in descending order of similarity. For any image in the database, its distance to the average of the prototypes is computed and stored for future database searches. An example application is Viisage Technology (see http://www.viisage.com/), which uses the face recognition techniques of Photobook in its “FaceID” package, a system currently in use in several US police departments for criminal investigation purposes [21].

Figure 3.9: Results of a query from a web demo of Photobook's facial image search engine. The results are for a search on the image at the top left.

3.4.4 MARS

MARS (Multimedia Analysis and Retrieval System) is a CBIR system originally developed at the Department of Computer Science of the University of Illinois at Urbana-Champaign [105] and further upgraded at the Department of Information and Computer Science of the University of California at Irvine [43]. The system characterizes each object within an image by a variety of features and uses a range of different similarity measures to compare a query image with a set of stored images. In MARS, user feedback is used to adjust feature weights and, if necessary, to invoke different similarity measures [12, 43].

The MARS system supports queries on combinations of low-level features and textual descriptions. Color is represented using a 2D histogram over the HS coordinates of the HSV color space. Texture is represented by coarseness and directionality histograms and a scalar defining the contrast. In order to extract the color/texture layout, the image is divided into a grid of 5 by 5 sub-images. For each sub-image, a color histogram is computed; for its texture, a vector based on wavelet coefficients is extracted. The objects in an image are segmented in two phases: first, a k-means clustering method in the color/texture space is applied; then, the regions detected are grouped by an attraction-based method. To compute the texture similarity between two corresponding sub-images, MARS uses the Euclidean distance between their vector representations. A weighted sum of the 25 color/texture similarities (obtained from the 5 by 5 grid of sub-images) gives the color/texture layout distance between two images. As a result of a query, images are listed in order of decreasing similarity. MARS uses relevance feedback to support individual human similarity judgments.

A more recent version of the MARS system, “WebMARS”, has also been developed [162]. WebMARS extends MARS by exploiting the vast amount of web resources to obtain documents containing both textual and visual objects. Figure 3.10 shows initial and refined query results using WebMARS (a demo is available at http://www-db.ics.uci.edu/pages/demos/index.shtml).

Figure 3.10: An initial query result (left) and a refined query result (right) using WebMARS.

3.4.5 Surfimage

Surfimage was developed by a research group at INRIA Rocquencourt, France [103]. Surfimage and MARS have many similarities: both support multiple types of image features that can be combined in different ways, and both offer relevance feedback to facilitate more precise retrieval. Surfimage uses advanced features that make it more flexible than the other image retrieval systems; these include the combination of signatures and the use of relevance feedback based on density estimation.

Among the various low-level features supported by Surfimage are the RGB color histogram, an edge-orientation histogram computed after applying a Canny edge detector, a texture signature derived from the gray-level co-occurrence matrix, the Fourier transform, and the wavelet transform. When more than one feature is chosen for a query, the corresponding similarity measures are combined in a final weighted measure, in which case some of the distance measures are normalized so that they take values in the range [0, 1].

The visual interface of Surfimage permits the user to select image features and appropriate similarity measures from a displayed list. A user can also specify the weights that define the importance of each feature in the retrieval. Figure 3.11 shows the visual interface of Surfimage with an example query on a face database (see http://www-rocq.inria.fr/imedia/Articles/MM98/node9.html for more descriptions and figures).

In conclusion, these different prototypes show that many efforts have been made to develop effective CBIR techniques, and each prototype has contributed useful methods and approaches. However, none of them directly answers the requirements of managing image data under a DBMS, and none responds to the question of a formal algebra for similarity-based operations in queries involving images.

Figure 3.11: Retrieval results for the top left face. Retrieved images are ordered from top left to bottom right by best match.

3.5 Standards Relevant to Image Data Management

There are different standards that can directly or indirectly affect the manner in which content-based image retrieval and image data management systems are used [21, 94, 100, 108, 113, 114]. Therefore, a review of some of the basic aspects of these standards will help us to see how the proposals in this thesis are coherent with most of them. Hence, among the many existing and emerging standards related to image management, we present only the most relevant ones in the sequel.

3.5.1 Image Compression

Since high-resolution images can occupy large amounts of storage space and impose many constraints on image transmission, image compression is of increasing importance. Different methods of image compression have thus been proposed, and most are in use today. Some of these methods have become standards or de facto standards in many application domains. Among the many image compression standards are the Joint Photographic Experts Group (JPEG) format, the Graphics Interchange Format (GIF, de facto), the Joint Bi-level Image Group (JBIG) format, and the Portable Network Graphics (PNG, de facto) format.

It is important to note that image compression should not unduly degrade the quality of feature extraction or of the image representation. The higher the resolution of the image, the more faithfully it can be represented. One approach used to eliminate the possibility of degrading feature extraction is to extract the features before compression. There are also systems that normalize all images to a standard resolution so that feature extraction is more representative.

It is thus evident that image representation and content-based image retrieval systems need to take the different compression methods into consideration. In this respect, most CBIR systems such as QBIC, Surfimage, and Photobook support the different compression methods. The VIR Image Engine that we used in our prototype also supports many of these image compression formats.

3.5.2 The MPEG Standards

MPEG, which stands for Moving Picture Experts Group, is the name of a working group of ISO/IEC in charge of the development of standards for the coded representation of digital audio-visual data [24, 109, 110]. The MPEG group has produced several standards: MPEG-1, MPEG-2, MPEG-4, and MPEG-7. MPEG-1, MPEG-2, and MPEG-4 are standards for coding audio-visual content. More specifically, MPEG-1 is the standard on which products such as Video CD and MP3 are based, and MPEG-2 is the standard behind products such as digital television set-top boxes and DVD. MPEG-4 covers a wide range of applications and allows object-based coding, as opposed to the frame-based coding used in MPEG-1 and MPEG-2. MPEG-7 is a standard for multimedia content description. Work on the newer standard, MPEG-21 "Multimedia Framework", has been ongoing since June 2000 [108]. MPEG-21 will create an open framework for multimedia delivery and consumption, with both the content creator and the content consumer as focal points. Currently, many elements (standards) exist for the delivery and consumption of multimedia content, but there is no standardized approach to describing how these elements relate to each other. MPEG-21 will therefore allow existing components to be used together and will increase interoperability.

Since MPEG-7 is the standard relevant to content description and image representation, we present here an overview of it. MPEG-7 is one of the standards expected to have an important impact on the CBIR field [24]. It addresses the issues of multimedia content description, aiming to set a standard framework for describing all aspects of a multimedia item's content, including low-level and high-level descriptions of multimedia objects. It covers neither the methods by which those features are extracted nor the way in which search engines make use of the features for retrieval. Each individual object in a video scene, in a still image, or within an audio stream can be described using MPEG-7 description tools. MPEG-7 visual descriptors include low-level image features such as color, texture, shape, and localization, each consisting of elementary and sophisticated descriptors. Thus, an MPEG-7 based description of an image describes the image adequately at all levels. This makes the process of searching for a desired image easier, more generic, and more effective.

Figure 3.12 [109] shows an abstract block diagram depicting the scope and processing chain of MPEG-7 [24, 100]. MPEG-7 standardizes only how content must be described; description production and description consumption are outside its scope. The description production can be obtained, for example, by an automatic method of feature extraction (analysis), while the description consumption part can be designed according to particular application requirements. Hence, to fully exploit the possibilities of MPEG-7 descriptions, automatic extraction of features will be extremely useful. MPEG-7 uses XML schemas for the representation of content descriptions and to allow the extensibility of description tools. The use of XML will facilitate interoperability in the future, since XML is becoming a popular and standard tool. The main elements of MPEG-7's content description are: Descriptors (D), which define the syntax and the semantics of each feature representation; Description Schemes (DS), which specify the structure and semantics of the relationships between their components; and a Description Definition Language (DDL), which allows the creation of new Description Schemes and Descriptors and the extension and modification of existing Description Schemes.

Figure 3.12: Scope and processing chain of the MPEG-7 Standard (source: http://www.mpeg-industry.com). The figure shows feature extraction (content analysis, feature extraction, annotation and authoring tools) feeding MPEG-7 descriptions, whose normative scope covers the Description Schemes (DSs), Descriptors (Ds), and the DDL, and a search engine performing searching and filtering, classification, manipulation, summarization, and indexing.
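To give the flavour of such XML-based descriptions, the sketch below emits a drastically simplified descriptor document combining one low-level and one high-level view of an image; the element names are our own shorthand, not the actual MPEG-7 schema defined through the DDL.

import xml.etree.ElementTree as ET

def describe_image(image_id, dominant_color, keywords):
    # Toy content description: a dominant-color descriptor plus keywords.
    root = ET.Element("ImageDescription", id=image_id)
    color = ET.SubElement(root, "DominantColor")
    color.text = " ".join(str(c) for c in dominant_color)
    for kw in keywords:
        ET.SubElement(root, "Keyword").text = kw
    return ET.tostring(root, encoding="unicode")

print(describe_image("img042", (200, 30, 30), ["lung", "tumor"]))
# <ImageDescription id="img042"><DominantColor>200 30 30</DominantColor>...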

Since it can enable all search engines to use potentially compatible features, the MPEG-7 standard will enormously enhance system interoperability. This is most likely to have a major impact on image retrieval and image searching on the Web, which will become a far more efficient process once a significant number of images enhanced with MPEG-7 metadata becomes available. Since it supports a rich set of feature types and representation methods, MPEG-7 will increase interoperability between CBIR systems in the huge image management marketplace, and it is therefore likely to be widely adopted in the development of the next generation of image retrieval systems. Currently, it has the support of academic and industrial institutions, including system designers such as IBM and Virage, and large-scale video users such as CNN, the BBC, and INA (the French national film and TV archive). This support virtually guarantees its widespread adoption over the next few years.

In summary, it is reasonable to conclude that emerging image retrieval systems need to consider the compatibility of their feature representations with the MPEG-7 standard of image description. Our proposals in this thesis take the format of MPEG-7 descriptors into consideration. Since MPEG-7 descriptions are in XML, an important task to consider is the transformation of image description data from an XML format to a database table format and vice versa. This will make it possible to exploit the existing efficient data management facilities of DBMSs, in addition to the XML tools, for image retrieval. In this regard, there are a number of initiatives and works in many application domains to develop conversion tools [115, 116, 117], which shows that the interoperability between XML and DBMSs is an important domain that will gain more interest in the future.

3.5.3 Query Language Standards

SQL is a standard that has existed since the 1970s. SQL has passed through a number of revisions and updates, including SQL 92 and SQL 99 [119]. All of the core functionalities of SQL, such as inserting, reading, and modifying data, are basically the same throughout all its versions.

There are many initiatives for further enhancement of the SQL standard. These include its improvement for accessing data on the Internet, particularly to meet the needs of the emerging XML standard [164]. There is also an effort to integrate SQL with Java, either through Sun's Java Database Connectivity (JDBC) mechanism or through internal implementations [40]. The group that established the last version of the SQL standard is also considering how to integrate object-based programming models. The latest standard query language, SQL 99 (also known as SQL 3), was developed to address such advanced issues in SQL [118, 119]. The new features of SQL 99 include:

• the ability to store either raw images or large character documents,

• the ability to have groups of “sub-columns” within a column. Each may be an array or a RefType,

• the ability to completely define a non-traditional data type,

• a complete library or set of data structures and routines that support still image data types and operations on those data types,

• a complete set of data structures, special full-text operations and SQL routines that support the loading, accessing, and maintenance of other multimedia data such as text, video, and audio, and

• routines and facilities in support of the management of types and classes of data that exist outside the domain of SQL.

Thus, it is due to these features of SQL that queries on new data types, such as images and other multimedia data, become possible. Since the acceptance of SQL 99, DBMS vendors have been working hard to support its various aspects and features in their systems [16]. It is the open architecture and the object-oriented properties of this last version of SQL that enabled us, in our prototype, to query images by their content. We have thus effectively utilized SQL to manage queries involving content-based retrieval in a DBMS environment.

The Object Query Language (OQL) is a query language standard that supports the Object Database Management Group's (ODMG) data model [24, 120]. OQL is very close to SQL 92; its extensions encompass object-oriented notions like complex objects, object identity, path expressions, polymorphism, and operation invocation. With these extensions, OQL provides high-level primitives to deal with sets of objects without being restricted to a single collection construct, and it also provides primitives to deal with structures, lists, and arrays, treating such constructs with the same efficiency. However, OQL was initially designed for OODBMSs and its implementation is associated with systems that use this data model. A formal and complete definition of this language is given in [94].

3.5.4 The DICOM Standard

Because we have chosen to implement our prototype on a medical image data management application, it is useful to briefly discuss an important standard in the medical domain called DICOM (Digital Imaging and Communications in Medicine), developed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA). The emergence of digital medical image sources in the 1970s, and the use of computers in processing these images after their acquisition, led to the formation of a joint committee to create the DICOM standard. The primary objective of this standard is to aid the distribution, transmission, and viewing of medical images, such as CT scans, MRIs, and ultrasound, together with their associated information [112, 113]. The standard thus describes a file format for the distribution of images. Based on this standard, a single DICOM file contains both a header (which stores information such as the patient's name, the type of scan, and the image dimensions) and the image data itself (which can contain information in two or three dimensions). This differs from other popular formats, which store the image data in one file (*.img) and the header data in another file (*.hdr). Another advantage of a DICOM image data file is that the image can be compressed to reduce its size using lossy or lossless variants of the JPEG format, as well as a lossless Run-Length Encoding format (which is identical to the packed-bits compression found in some TIFF images).

Figure 3.13 [113] shows a DICOM file header and the medical image associated with it. In this sample DICOM image file, the first bytes are used for the DICOM format header, which describes the image dimensions and retains other textual information about the scan. The size of this header varies depending on how much header information is stored, and the image data follows the header. A more detailed view of the DICOM header is shown on the right, as displayed by a DICOM image file viewer. Different tools exist to visualize the components (the header and the image) of DICOM data files and to perform various useful operations on them [121, 122].

Figure 3.13: A sample DICOM image data file and its components.
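The file layout described above can be checked directly: a DICOM (Part 10) file starts with a 128-byte preamble followed by the four magic bytes 'DICM', after which the header data elements and the pixel data follow. A minimal sketch:

def is_dicom_file(path):
    # Read the 128-byte preamble and the 'DICM' marker that every
    # DICOM Part 10 file carries before its header data elements.
    with open(path, "rb") as f:
        preamble = f.read(128)
        magic = f.read(4)
    return len(preamble) == 128 and magic == b"DICM"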

3.6 Database Management Systems that Support CBIR

Commercial database management systems have started to integrate content-based retrieval modules to support image data retrieval. Among the widely known DBMSs that have demonstrated the importance of integrating CBIR modules into their systems are DB2, Oracle, and Informix. These systems take the biggest share of corporate DBMS implementations, and what they do has an important impact on the marketplace. Version 5.2 and later releases of DB2 Universal Database integrate the modules of QBIC to enable image retrieval by low-level features. This component of DB2, which enables content-based image retrieval, is referred to as the Image Extender. The DB2 Image Extender offers retrieval by any combination of color, texture, or shape, as well as by text keyword [6]. However, with the DB2 Image Extender, content-based operations on image contents are not processed in the traditional manner, and operations such as the "similarity-based join" are not supported. It has no algebraic rules that permit formulating content-based queries, or combinations of content-based and textual queries, for image retrieval. Image querying is still limited to query by example and by sketch, and its retrieval capability is limited to the capabilities of QBIC. Other modules, such as the Text Extender for large text retrieval and the Video Extender for video data retrieval, are also incorporated into DB2 Universal Database.

The other widely influential DBMS that increasingly supports content-based image retrieval is Oracle. In version 8 and later releases, Oracle integrated the VIR Image Engine of Virage Inc. The VIR Image Engine provides the fundamental capabilities for analyzing and comparing images for similarity, but it has no facilities for persistent storage, user interfaces, or query processing and optimization schemes [21]. In its newer versions, Oracle has defined a module called "interMedia". The Oracle interMedia module uses automated image feature extraction, object recognition, and similarity-based image comparison techniques to manage images by their content in addition to their textual descriptions. With Oracle's interMedia module, we can combine both approaches when designing a table to support images: traditional text columns describe the semantic significance of the image, while the module's ORDImage type permits content-based queries based on feature attributes of the image (for example, how closely its color and shape match a given image) [16, 58, 156, 157]. The recent version of Oracle, Oracle9i, is an object-relational database management system. In addition to its traditionally efficient relational data management capability, it provides support for the definition of object types and includes the necessary operations (methods) to manipulate the objects. It supports BLOBs as the basis for inserting complex objects, such as digitized audio, image, and video, into Oracle9i databases. Among the common operations of the Oracle9i interMedia module are: adding ORDImage types to a new or existing table, inserting an image in a table, generating the signature (feature vector representation) of images, storing the signature in a table under the ORDSignature object type, importing an image from an external file into the database, retrieving images similar to a query image, copying an image, converting an image to different formats, etc. Any content-based query on images deals solely with the signature of the image, an abstraction of the image itself; thus, every image inserted into a table is first analyzed, and its signature is stored in the table. More about Oracle's way of managing image content is discussed in Section 7.1.

Though Oracle, in its newer versions, has recognized the importance of managing images with their visual features and introduced many useful tools accredited to the VIR Image Engine, a number of requirements remain unmet with respect to managing images by their content in a DBMS. Among these are the absence of a convenient image repository model that facilitates complex operations on tables containing images, the fact that no formal algebra is used or developed for content-based operations on images or image tables, and the absence of query optimization concepts for content-based operations on images.

Informix has also acknowledged the importance of content-based image management in a DBMS environment. Informix incorporated the DataBlade module of Excalibur Technologies to manage images by their content in a DBMS environment [15]. It provides the basis on which new or specialized image types and image processing technologies can be managed. Using Excalibur's Adaptive Pattern Recognition Processing (APRP) techniques, Informix can perform content-based searching of images by color, shape, or texture, and it integrates a number of tools that permit storing, retrieving, and transforming images. However, it fails to treat similarity-based image operations as database operations, does not follow a formal algebra, and has no query optimization features.

The above three DBMSs use content-based image management modules developed by different companies; hence, their capabilities for content-based image management depend on the add-on modules integrated into their systems. A common characteristic of these add-on modules is that, given a query image, they search a database for its most similar images. That is, the attempts made so far do not go beyond one-to-many content-based image retrieval operations, and images are not yet managed the way traditional data are. A positive result of most of these works is that one can use SQL-based statements to store and retrieve images. However, as discussed earlier, these systems are strongly limited in terms of supporting complex similarity-based operations and mixed queries involving both relational and similarity-based operations.

3.7 DBMS-Oriented Algebra

A relational database management system is composed of tables organized according to the relational model. The rows in a table consist of values for all the attributes labeling the columns of the table. The rows are called tuples, and the set of all tuples in the table is called a relation. An RDBMS is intended for storing and accessing data effectively. It typically provides many services such as:

• Consistency: the system ensures that the updates to data do not violate a specified set of constraints,

• Concurrency Control: several users can access and update data simultaneously,

• Access Control: users can have various access rights and the system prevents unauthorized access to the data,

• Reliability and recovery: very little data is lost in case of, say, a program error that abnormally halts the execution of the program.


Moreover, in an RDBMS, the operations on tables are governed by a relational algebra. The relational algebra is the basic mathematical tool that made data retrieval in RDBMSs efficient and consistent. A further description of this algebra is given in the section below.

3.7.1 The Relational Algebra and Query Optimization

The relational algebra is a formalism for expressing relational queries. It contains a set of basic operations for manipulating data given in tabular form. In addition, several derived operations (or operations that can be expressed as a sequence of basic operations) are also used. Among the many operations in relational algebra are the set operations, projection, selection, and join operations. Since relations are defined as sets, operations in relational algebra are set-oriented. Conventional set operations are thus useful for manipulating relations. The union, intersection, and difference operations are the basic set-oriented operations in relational algebra. It must be noted that the result of these operations is a relation again. That is, the closure property is satisfied.
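
Because every operator returns a relation, the operators compose freely; this is exactly what the closure property provides. For instance, in standard relational algebra notation, the expression

\[
\pi_{A}\bigl(\sigma_{c}(R \cup S)\bigr)
\]

is well formed: R ∪ S is a relation, so the selection σ_c applies to it, and the result is again a relation to which the projection π_A applies.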

The relational operators projection, selection, and join, together with the set operators, are well-established concepts defined in all books on RDBMSs [14, 51, 52, 123, 124, 136]. Hence, we will not attempt to provide their definitions here. However, these concepts will be used in the chapters that follow.

The relational algebra and the query languages based on it have been the subject of intense research for the last three decades, research to which DBMSs owe their current efficiency. Among the many by-products of these research activities is the use of algebraic properties in query optimization. Query optimization is the process of choosing a suitable execution strategy for processing a query. The optimization of a relational query can be divided into two parts: transformation of the query, at the level of the query language, into a simpler and more efficiently implementable form, and choice of the implementation method for each basic operation in the query. An optimization that uses the rules and properties of the relational algebra is referred to as algebraic optimization.
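
A classic instance of such an algebraic rewrite, given here purely as an illustration, pushes a selection below a join when the predicate c refers only to attributes of R:

\[
\sigma_{c}(R \bowtie S) \;\equiv\; \sigma_{c}(R) \bowtie S
\]

The right-hand form is usually cheaper to evaluate, because the selection reduces the size of one join input before the join is computed.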

Relational query optimization is still an active research issue after more than twenty years of research and experience in the field [3, 9, 35, 83]. When it comes to image database systems, query optimization is a fresh research area. Indeed, most existing similarity-based retrieval systems focus mainly on their content-based retrieval engine and input/output capabilities and give less emphasis to similarity-based query optimization. In this thesis, we extend the relational algebra under an OR paradigm and introduce a novel similarity-based algebra.

3.7.2 Similarity-Based Algebra

Some work has been done to explore the algebraic space required for a similarity-based algebra. Indeed, one of the key issues, as stated by Adali et al. [8], is that there has been no work on developing a formal algebra for similarity-based database query operations. Even the "multi-similarity algebra" that these authors propose remains at a high abstraction level. That is, it does not address the definition of an "operational" similarity-based algebra usable for modeling, optimizing, and processing similarity-based queries on the visual features of images at the implementation level.

Ciaccia et al. [125] extend the work of Adali et al. with their development of the SAMEW algebra. This algebra introduces further user preferences such as weights, and also captures imprecision in feature representations. However, the implementation issues of the complex operations introduced are not addressed.

Other relevant works come from the domain of spatial databases. Typical "spatial join queries" involve two or more sets of spatial data and combine pairs (or tuples) of spatial objects that satisfy a given predicate [126]. For instance, one might be interested in finding all the pairs of objects that intersect each other (intersection join). Brinkhoff et al. present in [127] a very detailed analysis of different implementation strategies for an intersection join. They show that a spatial sort-merge join based on a plane-sweep technique outperforms a nested-loop one. However, the plane-sweep technique proposed for an intersection join cannot be applied for the purpose of our similarity-based join, since the two joins have different definitions and deal with different data sets (the dimension of the feature vector of image data is in general higher than that of spatial data).

Recent papers in spatial databases attempt to combine spatial join queries with an NN-search. For example, the works in [128] and [129] deal with spatial datasets. Hjaltason et al. [128] define a "distance join" between two input sets, which computes the K closest pairs of the two input spatial object sets, ordered by the distance between the objects in each pair. In the same paper, the authors propose a "distance semi-join", which groups the results of the distance join by the objects of the outer table, retaining the pairs of objects with the closest distance. Corral et al. [129] reconsider this problem, called the K-Closest Pair Query (K-CPQ), and improve the implementation proposed in [128]. The ACOI Algebra [84] presents an algebraic framework for expressing queries on images, pixels, regions, segments, and objects in CBIR systems, such as query by image and query by sketch over the spatial relations among objects within images. However, it is not an algebra designed for use in a DBMS environment, where data is organized in the form of tables.

The AQUA data model and algebra discussed in [139] describe an object-oriented model and query algebra built as part of a project at Brown University for supporting bulk types such as trees, sets, bags, etc. AQUA, however, does not address DBMS-oriented image management or the corresponding algebra.

It is important to stress that the definitions of the distance join and distance semi-join differ from the definition of our similarity-based join. The distance join produces ordered pairs from the input tables, whereas we employ a range query search for each object of the outer table. Moreover, for reasons of effectiveness and its better algebraic properties, we defined our similarity-based operators based on the range query, not on the k-NN.

3.8 Summary

Images are complex entities that are richer in information than text. They are complex because they can be interpreted differently depending on individual human perception, the application domain, and the context in which they are used. Thus, there is a strong need for an image model that captures an adequate amount of information from images and represents it in a convenient manner, so that image data retrieval can be effective. A good image data model should, therefore:

− have the expressive power to represent the structure and content of images,
− be able to capture the relationships among component objects,
− be extensible, and
− enable efficient and effective retrieval and navigation of images in an image database management system.

In relation to this, there have been different proposals for an image data model designed to represent the semantic and content richness of image data. While most of them are more or less dependent on the application domain, some generic models have also been proposed. In this chapter, we presented three different proposals for an image model available in the literature. Of these models, we adopted the one by R. Chbeir [59] for our work in this thesis. We found that this model is comprehensive and satisfies the above requirements. Hence, we made our proposals coherent with this image data model.


In order to facilitate content-based retrieval, images need to be analysed and represented in a compact mathematical form. Image analysis is the field that deals with the understanding and representation of digital images by their low-level features of color, texture, shape, etc., in the form of a histogram or a feature vector. It also encompasses the identification of salient objects using the methods of feature analysis. It is thus an important phase in the process of content-based image retrieval. In this chapter, we presented some details of the techniques of image analysis in order to explain the CBIR process. We also stated the importance of using the features of salient objects for more effective content-based image retrieval. In a system that supports image retrieval by salient objects, similarity-based operations are performed with reference to each of the objects.

Content-based image retrieval uses the notion of non-exact matching. What is commonly practiced, then, is that one searches for a certain number of similar objects and browses interactively to identify the best matches. The most common methods of CBIR are the k-NN (k-Nearest Neighbor) search and the range query search. The first method searches until k similar images are found, where k is a positive integer less than the number of images in the collection. The latter method retrieves all images that lie within a given radius ε in the feature space; it returns an unspecified number of images (from zero to some number n) that are similar to the query image. For the purpose of reference and comparison, we presented the definitions of both methods of retrieval, though what we intend to use is the range query. Our choice of the range query is due to its useful properties.
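
In the SQL-like notation used later in this thesis (≈ε denoting similarity with respect to ε), and with a hypothetical distance function over feature vectors, the two search methods could be contrasted as follows; neither expression is standard SQL:

-- Range query: every image within distance ε of the query image q
SELECT * FROM Photos p WHERE p.Fv ≈ε q.Fv;

-- k-NN: the k images closest to q, however far away they may be
SELECT * FROM Photos p
ORDER BY distance(p.Fv, q.Fv)
FETCH FIRST k ROWS ONLY;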

We also discussed some representative examples of the many CBIR systems, namely QBIC, VIR, Photobook, MARS, and Surfimage. A comparative evaluation of the success of these CBIR systems in terms of their effectiveness, efficiency, and flexibility is a difficult task, because each system uses its own approach in terms of search methods and support for image features. Some focus on low-level features while others also consider high-level features. Some consider objects while others compare images globally. Some are interactive while others are not. Some are useful only in certain application domains while others aim to be as generic as possible. The target applications of these prototypes strongly influence the techniques used: a CBIR system developed for facial images and one developed for landscapes do not use the same retrieval techniques. These varied approaches show the complexity and the diverse requirements of CBIR systems in general.

Most research works in computer vision demonstrate the varied capabilities of CBIR systems mainly through the use of low-level features. However, systems that use only low-level features cannot be adequately effective by themselves. The incorporation of high-level semantics of images in retrieval systems is of great importance, as demonstrated by the different prototypes. It is our conviction that the next generation of image retrieval systems will combine the two approaches to retrieval. Our work in this thesis has the goals of integrating the use of low-level and high-level image features for efficient image retrieval and of enhancing the use of currently available ORDBMS techniques for increased image management efficiency.

The manner in which content-based image retrieval and image data management systems operate is directly or indirectly affected by existing and emerging standards related to multimedia data. We thus made a short review of some of the most relevant standards: image compression standards, the MPEG standards, the query language standards, and the DICOM standard. The overview of these standards helps to show how our proposals in this thesis are coherent with most of them. For example, since our proposals mainly concern managing image content and its associated textual data in an integrated manner, they can well support the DICOM style of medical image data representation. Beyond distribution, transmission, and viewing, which are the primary objectives of DICOM, our approach can make DICOM-compliant images queryable by both their low-level and high-level features. The fact that the two components of DICOM files (the header and the image) can be viewed or extracted separately will facilitate DBMS-based medical image management. Hence, our work has highly promising prospects for medical image data management.

An effective content-based image database management system must be able to integrate the capabilities of traditional database management systems with the requirements of content-based image retrieval. The trend, then, is to use similarity-based operations in addition to keyword matching. We therefore studied the possibility of using similarity-based operations in conjunction with the relational operators under an Object-Relational (OR) paradigm. We recognized that content-based image management requires more complex data structures, new data types for storing image data, and the ability to define non-standard and application-specific operations. Hence, the use of an OR paradigm enables us to exploit the techniques and capabilities of object-oriented concepts in managing image data. In this regard, we investigated the features of three popular commercial DBMSs, DB2 Universal Database, Informix, and Oracle9i, with respect to their facilities for supporting content-based image retrieval. These DBMSs use content-based image management modules developed by different companies. Hence, their capabilities for content-based image management depend on the add-on modules integrated in their systems. Furthermore, their attempts so far have not gone beyond one-to-many content-based image retrieval operations. Thus, images are not yet managed based on a formal database-oriented algebra. These systems are strongly limited in terms of supporting complex similarity-based operations and mixed queries involving both relational and similarity-based operations and their optimization.

Based on the requirements for effectively managing image data in a DBMS environment, developing a similarity-based algebra is one of the main subjects of this thesis. Thus, we first investigated the role of a formal algebra in the success of traditional DBMSs. Furthermore, we reviewed the useful contribution of relational query optimization methods, mostly based on such an algebra, to relational query efficiency. We also investigated the few research works available in the literature on similarity-based algebras for CBIR. However, to the best of our knowledge, no work has been done so far on a DBMS-oriented similarity-based algebra on image content (i.e., on the visual features of images). Hence, there are no algebraic query optimization proposals, and none of the existing DBMSs support such features. In this thesis, we will show that the system of similarity-based operators to be introduced creates a similarity-based algebraic space with which we can formulate basic rules and introduce a system of similarity-based query optimization.

CHAPTER 4

DATA REPOSITORY MODEL

“A good model can advance fashion by ten years.” Yves Saint Laurent, French couturier.

A data repository model is a conceptual representation of a repository that deals with the way data is stored in a DBMS. The relational model has been the most efficient and widely used model for managing alphanumeric data since the 1970s [14, 124, 147]. This model is widely used because it is relatively easy to learn, easy to use, and has high expressive power. In this model, data is organized in the form of relations, defined over sets of attributes. The relational model is based on set theory and predicate logic. The use of set theory allows data to be structured in a series of tables, whose columns represent the attributes and whose rows contain specific instances of data.

In the 1980s and 1990s, a new model called the object-oriented data model was proposed [60, 94, 137, 138, 141, 146, 148]. The Object-Oriented (OO) data model improved on the relational model by offering complex structures, object identity, inheritance between classes, and extensibility. This introduced a new generation of DBMS, the OODBMS. An OODBMS usually includes basic database facilities such as a simple query language, access techniques such as hashing and clustering, transaction management, concurrency control, and recovery. Attributes can be simple values, complex values, references to other objects, or methods. The main advantage of OODBMSs is their seamless integration with their corresponding OO programming language.


During the last decade or so, it became obvious to the database industry and its customers that the relational model was not designed for, and cannot cope effectively with, new types of data such as image, video, and audio, as well as user-defined types. The newer model, the OODBMS, could not penetrate the marketplace as its predecessor did and is not as widely accepted as the RDBMS. This is where the much newer Object-Relational (OR) model comes in [14]. The OR model allows organizations to continue using their existing relational systems, without having to make major changes, and at the same time allows them to start using object-oriented facilities in parallel. M. Stonebraker et al. [14] put it this way: "the next wave of database management systems is under the OR paradigm". In particular, the OR paradigm is a model that conveniently supports image data in a DBMS. Hence, in this thesis we limit our discussion to this framework.

Though many data models do not deal with the details of how data is stored, nor with the storage and retrieval requirements of image data, the complexity of image data prompts us to deal with the design of a convenient repository model for a DBMS environment. We thus propose in this thesis an image data repository model. Although our proposed model, and the prototype we developed based on it, are for image data management, the same repository model can be extended to other media types such as video, audio, and large text. In our model, information is stored as a persistent object, not as a row in a table. This makes it more efficient in terms of storage requirements and ensures that users can use the results of previous queries as part of new queries. It also reduces the disk space needed for queries and query results.

In the following sections, we present our image data and salient object repository models. We also describe their contents and the way they are used in an ORDBMS environment. Furthermore, we show the relationship of these models to the image model in [59].

4.1 An Image Data Repository Model

Each attribute in a relational table holds alphanumeric information and is an instance of an entity. Most operations on such tables rely on attribute-value-based matching or comparison of numeric values or text strings. A table that contains an image as an attribute is quite different from traditional relational tables. It requires supplementary attributes that describe the image characteristics, either using keywords or low-level features. Attributes are thus used not only to describe the instance but also to describe the image characteristics, since images possess much information that needs to be described in greater detail. Furthermore, when a table contains content-based representations of images (such as color, texture, and shape features), the database system requires techniques to store, describe, and manipulate them that differ from the techniques used in relational systems. Hence, an image data repository should be able to capture the low-level feature descriptions of image data that the new techniques of content-based image retrieval require. In addition, it must capture the metadata representation of high-level semantic features of images. An image data repository model must therefore address at least the following additional requirements compared to relational tables:

a) A table with image data has a much larger storage space requirement, which has a direct impact on data management. Thus, an image data repository model should consider the need for large storage space not only at the initial definition of the table, but also during intermediate query operations and query result presentation.

b) Since images are complex objects in terms of their information content richness, image data tables must be able to hold further low- and high-level feature descriptions of images.

c) These different levels of descriptions of images require the introduction of new and complex data structures for the components (or the attributes) of the tables to be defined.

d) The introduction of these new data structures requires the introduction of new types of operators (for example, similarity-based operators). Furthermore, the high storage space requirement of images, the low- and high-level feature descriptions of the images, and the image data query techniques require new methods of data management.

e) An image data repository model should facilitate the visualization of images at the time of query formulation and query result presentation.

Therefore, in this thesis, we propose a novel image data repository model that facilitates the management of image data with all its new characteristics, under an OR paradigm.

As a composite of the relational and object-oriented models, an OR model permits us to freely define objects or repositories based on particular requirements. In this section, we present our image repository model as a meta-model. By a meta-model we mean a general description of the model, independent of any specific implementation of its various components (O, F, A, P) and without any formalism. In Section 4.3, however, we present an example scenario in which we formally define the model. It is important to note that all details of database design can be applied based on the relational or object-oriented data model specifications. This model is presented below and will be referred to as the "image data repository model", or simply as an "image table", throughout our discussion in this thesis.

An image data repository model is a schema of five components M(id, O, F, A, P), under an object-relational model, where:

id is a unique identifier of an instance of M,

O is a reference to the image object itself, which can be stored internally in the table as a BLOB or referenced as an external BFILE (binary file),

F is a feature vector representation of the object O,

A is an attribute component that may be used to describe the object O using textual data or keyword-like annotations, and

P is a data structure that is used to capture pointer links to instances of other image tables as a result of a binary operation.

Note that in an image table M, the item of primary interest is the image itself; the remaining attributes are associated with the image because of the new requirements discussed above. The three components O, F, and A can be used to capture sufficient information related to the image data. O is the image object itself. F stores the feature vectors representing all or part of the color, texture, shape, and layout content extracted from the image O using image analysis tools. Based on the particular application requirements and the capability of the low-level feature extraction tools, F is a mathematical representation that can later be used to perform all similarity-based operations on the image. The A component of M captures the semantic representation of O. The component A may be declared as an object, a set of objects, a table, or a set of attributes linked to other relational tables. This permits the model to be flexible and to be associated with other relational tables or with previously available data in a relational DBMS or ORDBMS. P can be considered a column whose content is a data structure that can store links to instances of other tables during binary operations such as the similarity-based join¹. P has the value Null in base tables; it has a non-null value in intermediate tables produced by binary operations. More formally, when it contains a value, P can be expressed as a set of the form (table, {(referred_id_table, corresponding_similarity_score)}), where the component table denotes a table associated by a binary operation, referred_id_table is a referred id element of table, and corresponding_similarity_score is its corresponding score of similarity in the binary similarity-based operation.

To further elaborate the data structure P, consider an image table M' that is the result of a similarity-based binary operation, such as a similarity-based join of two image tables M1 and M2, and let the P component of the resulting image table M' hold links to table M2. Then, for each instance of M', P will contain elements of the form (M2, {(id_i^2, sim_Score_i)}), for i = 1 to h, where sim_Score_i is the similarity score associated with the object of M2 whose identifier is id_i^2, and h is the number of image objects of M2 found to be similar to the given image object of M1 by the similarity-based binary operation. The P component can be expressed in terms of a table as shown in Table 4.1, where h has the value 5.

The introduction of the P component in our model plays a useful role. It is the use of the P component that allows the similarity-based operators defined in Chapter 5 and the similarity-based query optimization methods introduced in Chapter 6 to acquire the properties they have. More descriptions on the contents of P will be given later in Chapters 5, 6 and 7.

Table 4.1: A resulting image table M' (left) and a sample structure of the P component of one of its instances (right).

M' (left):

id       O       F       A       P
id_1^1   o_1^1   f_1^1   a_1^1   p_1^1
id_2^1   o_2^1   f_2^1   a_2^1   p_2^1
id_3^1   o_3^1   f_3^1   a_3^1   p_3^1
…        …       …       …       …
id_n^1   o_n^1   f_n^1   a_n^1   p_n^1

Content of one P cell (right):

Referred id   Sim_Score
id_3^2        5.68
id_5^2        13.05
id_9^2        29.56
id_13^2       46.67
id_25^2       49.90

¹ A similarity-based join is a join on the low-level image representation attributes of image tables. A formal definition will be given in Chapter 5.


With the help of this image table model, we can fulfill the necessary requirements for effectively integrating content-based image data management into an ORDBMS. Figure 4.1 shows how image-related data can be stored in each component of M.

[Figure: the components of the model M(id, O, F, A, P), annotated with the kind of data each holds: the image object (O), its feature vector (F), image context and semantic related data (A), and links to other tables (P).]

Figure 4.1: Managing image related data in a table.

4.2 Supporting Salient Objects in the Model

Salient objects are not separate images; rather, they are parts of an image that are of particular interest. In many applications, supporting salient objects in similarity-based image retrieval is very important for efficiency and precision [18, 19]. Hence, a convenient repository model that can capture the salient-object-related data contained in an image is essential. Such a model can be deduced from the general structure of the image repository model. For a salient object repository, we do not need the components O and P of M; we rather create a link to the main image (found in an image table) from which the salient objects are extracted. We present below a salient object repository model; in this thesis we also refer to this model simply as a "salient image table".

A salient object repository model is described as a schema of three components S(ids, Fs, As), where:

ids is an identifier of a salient object,

Fs is the feature vector extracted to represent the low-level features of the salient object, and

As is an attribute component that is used to capture all semantic descriptions of the salient object using textual data or keyword-like annotations.

It is the value in Fs that is used for similarity-based operations on salient objects. Moreover, the Fs component can capture the representation (e.g., contour, region, etc.) of the salient object, which enables us to trace back the identified object. Relational operations can be performed on the As component for any required query. The liaison between the images in M and the salient objects in S can be tracked using their id components in a separate relational table, as sketched below. The link can also be maintained, for example, by a foreign key that keeps the liaison between an image table and an associated salient image table; this foreign key can be part of the A component of M. Note that an image may have more than one salient object, each identified by a unique identifier.
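
As a minimal sketch of the first option, a separate link table could keep the liaison between M and S; the table and column names here are hypothetical:

CREATE TABLE image_salient_link (
    image_id   INTEGER,  -- id of the source image in M
    salient_id INTEGER,  -- ids of the salient object in S
    PRIMARY KEY (image_id, salient_id)
);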

Figure 4.2 shows the content of S and its liaison with M for a brain image. Only the feature vector representations of the salient objects are stored in Fs. In this case, a tumor is identified in the lower right corner of the image as an object of interest. This part of the image, which can be identified by a domain specialist or by automatic means, is considered an object of interest, and its feature vector is extracted and stored in Fs. The technique of extracting salient objects from an image belongs purely to the province of image analysis and is not considered in this work.


[Figure: an image table M(id, O, F, A, P) holding semantic, context, and spatial data, linked to a salient object table S(ids, Fs, As) holding semantic data; the feature vector Fv = (V1, V2, …, Vn) of a salient object (here, a tumor region of a brain image) is stored in Fs.]

Figure 4.2: Managing salient objects in association with their source images.

4.3 Object Types to the Image Repository Models: Example Scenario

Without being specific to a particular application scenario or dependent on a specific system, we give below a formal definition of the types for the objects of our image repository models M(id, O, F, A, P) and S(ids, Fs, As). For the sake of precision, we use the PL/SQL language, which is compliant with the latest SQL3 standard, to define the types. Note that, based on application requirements, the types or domains specified in this example scenario can be redefined, modified, or extended. The image table M and the salient-object table S can be created using the following SQL statements:

CREATE TABLE M (
    id INTEGER,
    O  Otype,
    F  Ftype,
    A  Atype,
    P  Ptype
);

CREATE TABLE S (
    ids INTEGER,
    Fs  FsType,
    As  AsType
);

(Note that, in an actual Oracle schema, the nested table column P would also require a NESTED TABLE ... STORE AS clause, and identifiers such as As, which collide with reserved words, would need quoting.)

Object Types

We present here the object types Osource, Otype, Ftype, Atype, Ptype, FsType, and AsType, which illustrate how a variety of image data sources can be accessed.

Otype Object Type

The Otype object type, which supports the storage and management of image data, is defined as follows.

CREATE OR REPLACE TYPE Otype AS OBJECT (
    ------ TYPE ATTRIBUTES ---------------------------
    source            Osource,
    height            INTEGER,
    width             INTEGER,
    contentLength     INTEGER,
    fileFormat        VARCHAR2(4000),
    contentFormat     VARCHAR2(4000),
    compressionFormat VARCHAR2(4000),
    ------ METHOD DECLARATIONS -----------------------
    -- Only some of the useful methods associated with the object are listed below
    MEMBER FUNCTION getHeight            RETURN INTEGER,
    MEMBER FUNCTION getWidth             RETURN INTEGER,
    MEMBER FUNCTION getContentLength     RETURN INTEGER,
    MEMBER FUNCTION getFileFormat        RETURN VARCHAR2,
    MEMBER FUNCTION getCompressionFormat RETURN VARCHAR2,
    MEMBER FUNCTION getUpdateTime        RETURN DATE,
    MEMBER FUNCTION getMimeType          RETURN VARCHAR2,
    MEMBER FUNCTION getSource            RETURN VARCHAR2,
    MEMBER FUNCTION getSourceType        RETURN VARCHAR2,
    MEMBER FUNCTION getSourceLocation    RETURN VARCHAR2,
    MEMBER FUNCTION getSourceName        RETURN VARCHAR2,
    …
);

where:
• source: the source of the stored image data.
• height: the height of the image in pixels.
• width: the width of the image in pixels.
• contentLength: the size of the on-disk image file in bytes.
• fileFormat: the file type or format in which the image data is stored (TIFF, JFIF, etc.).
• contentFormat: the type of image (monochrome and so forth).
• compressionFormat: the compression algorithm used on the image data.

Osource Object Type

The Osource object type supports access to image data stored locally in a table as a BLOB, externally on a local file system, or at a URL on an HTTP server. This object type is defined as follows:

CREATE OR REPLACE TYPE Osource AS OBJECT (
    ------ ATTRIBUTES --------------------------------
    localData   BLOB,
    srcType     VARCHAR2(4000),
    srcLocation VARCHAR2(4000),
    srcName     VARCHAR2(4000),
    updateTime  DATE,
    ------ METHODS -----------------------------------
    MEMBER FUNCTION  getUpdateTime        RETURN DATE,
    MEMBER PROCEDURE setUpdateTime(current_time DATE),
    MEMBER FUNCTION  getSourceInformation RETURN VARCHAR2,
    MEMBER FUNCTION  getSourceType        RETURN VARCHAR2,
    MEMBER FUNCTION  getSourceLocation    RETURN VARCHAR2,
    MEMBER FUNCTION  getSourceName        RETURN VARCHAR2,
    …
);

where:
• localData: contains the locally stored image data of type BLOB.
• srcType: identifies the data source type. For example, srcType could be "file" (a BFILE on a local file system) or "HTTP" (an HTTP server).
• srcLocation: identifies the place where the data can be found, interpreted according to the srcType value. For example, srcLocation could be a file location, a URL, etc.
• srcName: identifies the data object name. For example, srcName could be a file name, the name of an object at a URL, etc.
• updateTime: the time at which the data was last updated.

Ftype Object Type

The Ftype object type supports content-based retrieval. It holds the feature vector representation, or signature, of the image, which describes the color, texture, and shape features of the image. This data can be stored in a BLOB. Thus, without losing generality, this object type can be defined as follows.

CREATE OR REPLACE TYPE Ftype AS OBJECT (
    F BLOB,
    --------------------------------------------------
    -- METHOD DECLARATIONS
    -- Only some of the useful methods associated with the object are listed below:
    --------------------------------------------------
    STATIC FUNCTION init RETURN Ftype,
    STATIC FUNCTION evaluateScore(F1 Ftype, F2 Ftype, score VARCHAR2) RETURN FLOAT,
    STATIC FUNCTION isSimilar(F1 Ftype, F2 Ftype, score FLOAT, threshold FLOAT) RETURN INTEGER,
    MEMBER PROCEDURE generateF(image Otype)
);

where:
• F: holds the feature vector representation of the stored image data.

Atype Object Type

Atype can be defined as an object type that captures the semantic and contextual description of the image using textual or keyword-like specifications. Its attribute components can be standard data types. It is the part most strongly influenced by the particular application in which it is used. Since semantic and contextual descriptions of images are mainly textual or alphanumeric data, they can be defined in a separate relational table and associated with the image table M using a foreign key component in A. As a sample scenario, we can define Atype as follows.

CREATE OR REPLACE TYPE Atype AS OBJECT (
    id       INTEGER,
    category VARCHAR2(100),
    -- "case" and "f-key" are not valid Oracle identifiers ("case" is a
    -- reserved word and "-" is illegal); they are written here as
    -- img_case and f_key.
    img_case VARCHAR2(100),
    f_key    INTEGER
    --------------------------------------------------
    -- METHODS: as needed by the application
);

where:
• id: an identifier of the object,
• category: the category of the image, for example, x-ray, MRI, etc.
• img_case: the case in which the image is to be used, for example, a suspected type of symptom in a medical application.
• f_key: a foreign key element that can be used to associate the image table with an external relational table.

Ptype Object Type

Ptype is the type used to capture links to records of another image table associated by a similarity-based operation. It can be declared as follows:

CREATE OR REPLACE TYPE Ptr AS OBJECT (
    -- "table" is a reserved word and "id-table" is not a legal Oracle
    -- identifier; they are written here as table_name and id_table.
    table_name VARCHAR2(50),
    id_table   INTEGER,
    score      FLOAT,
    --------------------------------------------------
    -- METHOD DECLARATIONS
    -- Only some of the useful methods associated with the object are listed below
    MEMBER FUNCTION get_id_table   RETURN INTEGER,
    MEMBER FUNCTION get_score      RETURN FLOAT,
    MEMBER FUNCTION get_table_name RETURN VARCHAR2,
    …
);

CREATE OR REPLACE TYPE Ptype AS TABLE OF Ptr;
-- (Ptr must be created before Ptype, hence this order of declarations.)

where:
• table_name: the name of the table associated by a similarity-based operator,
• id_table: the identifier of the record of that table associated by the similarity-based operator,
• score: the similarity score associated with the referring image in the operation.
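
Since P is a nested table of Ptr objects, its contents can be unnested with Oracle's TABLE() collection expression. The following sketch, assuming a result table named Mresult and the renamed Ptr attributes above, lists the referred ids and scores for each referring image:

SELECT m.id, p.table_name, p.id_table, p.score
FROM   Mresult m, TABLE(m.P) p
ORDER  BY m.id, p.score;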


FsType Object Type

FsType is an object type that describes the feature vector of a salient object. In addition, this object needs to describe the shape and spatial position of the salient object within the image. FsType can be defined as follows.

CREATE OR REPLACE TYPE FsType AS OBJECT (
    id         INTEGER,
    F_Salient  Ftype,
    S_shape    BLOB,
    S_location BLOB,
    --------------------------------------------------
    -- METHOD DECLARATIONS
    -- Only some of the useful methods associated with the object are listed below
    MEMBER FUNCTION get_object_id RETURN INTEGER,
    …
);

where:
• id: an identifier of the object,
• F_Salient: the feature vector representation of the salient object,
• S_shape: the shape descriptor of the particular salient object. With the methods of pattern recognition, the shape can be described by a mathematical representation that can be stored in the BLOB type.
• S_location: the spatial position description of the salient object. With the methods of pattern recognition, the spatial position of a salient object can be described by a mathematical representation that can be stored in the BLOB type. Spatial position can also be described using descriptive spatial operators [59].

AsType Object Type

AsType can be defined as an object type that captures the semantic and contextual description of the salient object using textual or keyword-like specifications. It can be defined like Atype, or a different object type may be used based on the needs of the particular application.
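
To illustrate how these types fit together, the following hypothetical insertion populates one row of M; all literal values are invented for the example, the Atype attributes follow the renamed identifiers above, and P is left empty since P is Null in base tables:

INSERT INTO M (id, O, F, A, P)
VALUES (
    1,
    Otype(Osource(EMPTY_BLOB(), 'file', '/images', 'brain_017.jpg', SYSDATE),
          512, 512, 262144, 'JPEG', 'monochrome', 'JPEG'),
    Ftype(EMPTY_BLOB()),              -- signature to be generated afterwards
    Atype(1, 'MRI', 'suspected tumor', NULL),
    Ptype()                           -- empty: P is Null in base tables
);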

4.4 The Image Data Repository Models in Relation to the Image Model

Our image data repository models can effectively capture image-related data based on the image model in [59]. To demonstrate this, we follow a general schematic structure for an image database.

Consider the model M(id, O, F, A, P), where the contents of the components F and A of M are described as follows:

F(Descriptor, Model, Value), where

Descriptor: is the type of descriptor (such as color feature, texture feature, shape feature, etc.),

Model: is the description format (such as RGB, HSV for color features, coarseness, contrast for texture features, etc.), and

Value: is the content descriptor. This is the mathematical feature vector representation of the image in terms of the descriptors and the model chosen. It can also be a generalized representation of all possible features where a choice of a model or a descriptor can be made based on a particular need during an operation.

In some implementations, Descriptors are used to select the types of features to be used in the similarity-based comparison. They can be expressed as a set of weights. For example, in our implementation, as will be presented in Chapter 7, this is expressed as "color = 1, texture = 1, shape = 1, location = 1" to indicate that all four features are to be considered during a similarity-based comparison. If, for example, we want to consider only the color and location features in the comparison, we set the descriptor as "color = 1, texture = 0, shape = 0, location = 1".
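
Assuming the VARCHAR2 parameter of Ftype.evaluateScore (Section 4.3) carries this descriptor string, which is our reading of the type declaration rather than a documented fact, a comparison restricted to color and location could be scored as follows:

SELECT m1.id, m2.id,
       Ftype.evaluateScore(m1.F, m2.F,
           'color = 1, texture = 0, shape = 0, location = 1') AS sim_score
FROM   M1 m1, M2 m2;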

The Model component to be used for the comparison can be set to a certain choice based on the application’s requirements.

The way that the Descriptor, Model, and Value are expressed depends on the particular implementation. In our implementation, for example, the Descriptor and Model are expressed separately, and it is the feature vector that is kept in the F component as the Value. The Descriptor is set as a string in the query expression, and the Model is fixed internally.

A(ES, Sem_F, R), where

ES: is the External Space description (consisting of the Context-Oriented, Domain-Oriented, and Image-Oriented sub-spaces) as indicated in the image data model in [59] and described further in Section 3.1. This component represents a huge amount of data, and hence it can be designed or organized according to the particular implementation. For example, in our implementation, this includes all data on the hospital, the medical doctor, the patient, etc.,

Sem_F: is the semantic feature of the Content Space of the image model that indicates the significance and interpretation (keywords, legend, etc.) of the image, and

R: is a component that captures the relations between either two salient objects or a salient object and the image.

The contents of these components can be further described as follows:

Sem_F(Type, Description), where

Type: defines the type of the semantic feature (keyword, scene, etc.). For example, in a medical application, Type could be tumor type or cancer type of a known case, and

Description: is a textual representation of Type. This could be, for example, a description of a diagnosis, of a treatment, a medical history, etc.

R(ids, id, Relation), where

ids: identifies a salient object,

id: is the identifier of either an image or a salient object, and

Relation: represents the spatial (directional, metrical, topological) or semantic relations between them.

For the salient object repository model S(ids, Fs, As), the contents of the components Fs and As are described below.

Fs(Descriptor, Model, Value), where


Descriptor: is the type of descriptor (such as color feature, texture feature, shape feature, etc.),

Model: is the description format (such as RGB for color feature, coarseness for texture feature, etc.), and

Value: is the content descriptor. This component represents the mathematical feature vector of the salient object.

As(Type, Description), where

Type: defines the type of the semantic feature (name, state, etc.), and

Description: is a textual description of Type as discussed above for Sem_F.

With this schema, we can support both relational and similarity-based operations. All relational query operations can operate on the relational attributes of component A of M and/or As of S. A similarity-based operation, however, is performed on the Value component of F of M and/or on the Value component of Fs of S. The object O of M is mostly required as the resource from which the salient objects and some annotations are extracted, and as an image object that can be displayed as the result of a query.

Another type of operation commonly used in image management is the spatial operation. Spatial operations are often performed using spatial relations that exist either between two salient objects or between a salient object and an image. In the literature [29, 59, 69, 111, 126], the following three types of spatial relations are considered:

Metric relations: determined on the basis of proximity, expressing the closeness of objects, such as near, far, etc.

Directional Relations: generally determined on the basis of the positional direction between objects such as right, left, north, east, etc.

Topological relations: determined based on the position of the salient objects relative to each other such as disjoint, touch, overlap, etc.

In our model, all spatial relations are captured in the R component of A. Thus, these operations can be treated in the traditional manner.

These image data repository models can adequately capture all the information associated with a whole image and with its salient objects, in the form of both high-level and low-level descriptions. Hence, they form the basis for an effective image management system.


4.5 Summary

The image data repository models we propose in this thesis enable us to capture the image itself, its low-level representation (in terms of its color, texture, shape, etc.), and all alphanumeric data associated with the image. The OR model that we have chosen permits us to manage these image-related data in a DBMS. This allows us to fully exploit the data management facilities of an ORDBMS. The novel component P introduced in our repository model M permits us to conveniently manage intermediate image tables and query result tables in an optimized manner, in terms of storage, the requirements of similarity-based operations, and image data management in general. This P component is also a key element behind the good properties of the similarity-based algebra defined in the next chapter. The novel salient object repository model permits us to capture all relevant information associated with salient objects, to keep track of spatial relationships between salient objects and/or between a salient object and the image that contains it, and to facilitate both traditional and similarity-based operations on salient objects. We verified how effectively these repository models can be used in an ORDBMS with our prototype EMIMS, as presented in Chapter 7.

CHAPTER 5

SIMILARITY-BASED ALGEBRA

5.1 Introduction

The term "similarity" is widely used in different disciplines and contexts. Hence, it is difficult to give a generalized, uniform definition of the term. The Cambridge Dictionary of English defines the term "similar" (adj.) as: looking or being almost, but not exactly, the same, and illustrates it with examples such as: My father and I have similar views on politics, or Paul is very similar in appearance to his brother. Webster's 1828 Dictionary defines the term "similarity" (noun) as: likeness, resemblance, congruency, as a similarity of features; the state of being not different or other, as the sameness of an unchangeable being; near resemblance, correspondence, as a sameness of manner, etc. Its usage and precise definition depend on the area in which it is applied and the context in which it is used. For example, in mathematics there is a clear definition of the similarity of geometric figures, based on which we can say whether or not two triangles are similar. In other fields, it is difficult to give such a precise definition, and the term is used in its general conception. Similarity between texts, for example, is widely used by web search engines; similarity in this case is expressed as a percentage. In our daily practice, the judgment of being similar is sometimes determined by consensus. A child may look similar to his father in the judgment of some people but not of others. Such comparisons mostly consider the physical appearance or the facial images of two or more individuals. Nowadays, however, this type of similarity comparison has been extended to other types of images, particularly digital images, in many disciplines.


The process of determining image similarity has been the subject of many research works in computer vision. Using the techniques of image analysis developed in this field, it is possible to find all the images similar to a given reference or query image in a set of images, for a given set of features, as discussed in Chapter 3. The main subject of this thesis is to integrate the notion of image similarity as an operator in image database management systems. This, however, requires the introduction of similarity-based operators governed by a formal database-oriented algebra. Furthermore, the complex nature of images requires us to choose a convenient data model.

The concept of algebra is a central issue in mathematical systems. An algebra is formally defined as a pair (S, Θ), where S is a (possibly infinite) set of objects and Θ is a (possibly infinite) set of operators, each of which is closed with respect to S [149]. This definition also applies to database-oriented algebra where S is the set of tables or relations and Θ is the set of operators that are closed with respect to S. These operators will have certain mathematical properties that make the rearrangement of algebraic expressions possible to obtain an equivalent but different expression. This is what makes algebras so desirable, particularly for query optimization in DBMSs.

The fundamental feature of a DBMS is its data model, the formal abstraction that it uses to map real-world entities onto (logical) database entities. Many different DBMSs may implement the same abstract data model (e.g., the relational model). As presented in Chapter 4, data models are becoming increasingly complex, and with this complexity comes the need to ensure correct and efficient execution of queries on data with increasingly complex structure. The algebra follows or is based on the data model chosen. Thus, it is important to consider the appropriate data model.

The relational data model and the relational algebra have been so widely used and implemented over the last three decades, and have been the subject of so many books and publications, that it is needless to repeat the details of this model here. An object-oriented data model supports scalar values and the tuple and set type constructors, as well as inheritance among tuple types. One important extension, though, is that sets need not be homogeneous. Everything in an object-oriented database is an object with its own unique identifier; thus everything in a field of a tuple and every element of a set (homogeneous or not) has an object identifier. Tests for both identity and equality of two objects are allowed. Equality here means "deep equality", i.e., the entire hierarchical structure is traversed, and all parts of one object must be equal to the corresponding parts of the other. An object-oriented data model and algebra are described in [137, 138, 141, 153].


Based on the extensive research and literature in this area [14, 17, 27, 37, 140], we have identified the object-relational model as the one that can conveniently handle the requirements of image data management in different application areas, using content-, semantic-, and context-based descriptions. This model is also the one capable of managing the complexity of the formulation and execution of queries.

In an image database, operations are not always based on exact matching. Hence, an algebra that bases itself on equality does not apply to image databases. The example queries in Chapter 2 demonstrate the need for a new type of algebra based on the notion of image similarity. In this thesis, we propose such a novel algebra.

In the following sections, we first define the different novel operators required in a content-based image database management system. Then, we study their properties and the way they can be used in conjunction with the traditional database operators. To define these novel similarity-based operators on image tables, we use the method of the content-based range query (see Definition 3.2). A content-based range query on a set of images S returns those image objects that are within a distance¹ ε of the query image q. In the next chapter, we will see how the properties of the range query are advantageous over those of the k-Nearest Neighbor method (see Definition 3.1), particularly for similarity-based query optimization in image database management systems.

¹ Here, distance is defined as the distance between the feature vectors representing the images in a feature space.


5.2 The Similarity-Based Selection Operator

The similarity-based selection operator is a unary operator on an image table M(id, O, F, A, P) performed on the component F as defined below.

Definition 5.1 (Similarity-Based Selection)

Given a query image o with its feature vector representation, an image table M(id, O, F, A, P), and a positive real number ε, the similarity-based selection operator, denoted by δεo(M), selects all the instances of M whose image objects are similar to the query image o according to the range query method.

Formally it is given as:

δεo(M) = {(id, o', f, a, p) ∈ M | o' ∈ Rε(M, o)}, where Rε(M, o) denotes the range query with respect to ε for the query image o over the set of images in the image table M.

This operator is similar to the relational selection operator, except that the operation is similarity-based (non-exact matching) and it operates on one single component, F, of the image table. We see here that the result of a similarity-based selection operation is again an image table. Hence, this operator satisfies the closure property required of an algebra. Furthermore, the similarity-based selection operator can be combined with the relational operators on an image table (see Section 5.8).

The similarity-based selection operator first uses the range query search method to select the image objects most similar to o from the objects in M (which is expressed by the notation Rε(M, o)). Then, it identifies the underlying instances of M whose image objects are similar to o. This form of operation is quite different from that of the many content-based image retrieval systems (see Section 3.4) that return just images. Beyond satisfying the closure property, our similarity-based selection operator keeps the data associated with the selected images in the returned table.
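
Using the object types of Section 4.3, a similarity-based selection could be written along the following lines. This is only a sketch: :qF stands for the feature vector of the query image o, :epsilon for ε, and we assume the VARCHAR2 parameter of evaluateScore carries the feature descriptor string:

SELECT m.id, m.O, m.F, m.A, m.P
FROM   M m
WHERE  Ftype.isSimilar(m.F, :qF,
           Ftype.evaluateScore(m.F, :qF,
               'color = 1, texture = 1, shape = 1, location = 1'),
           :epsilon) = 1;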

5.3 The Similarity-Based Join Operator

A similarity-based join is a binary operator on image tables, performed on their feature vector components. To perform a similarity-based binary operation on two image tables, we assume that their feature vector components Fi are extracted identically, in such a way that a meaningful range query computation is possible.


Definition 5.2 (Similarity-Based Join)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables and let ε be a positive real number. The similarity-based join operator on M1 and M2, denoted by M1⊗εM2, associates each object O1 of M1 with a set of similar objects in M2, with respect to the F components of M1 and M2. The resulting table consists of the referring instances of M1 (the table on the left), where P is modified by inserting a pointer to the ids of the associated instances of M2 (the table on the right side of the operation) together with the corresponding similarity scores.

More formally it is given as:

M1⊗εM2 = {(id1, o1, f1, a1, p'1) | (id1, o1, f1, a1, p1) ∈ M1 and p'1 = p1 ∪ (M2, {(id2, ║o1 – o2║)}) and p'1 ≠ Null}, where:

- (id2, o2, f2, a2, p2) ∈ δεo1(M2) (i.e., the instances of M2 selected by the similarity-based selection δεo1(M2)), and

- ║o1 – o2║ is the distance between o1 and o2 in the feature space, also called the similarity score of o2 and o1 (denoted in the remainder of this chapter as sim_Score(o1, o2)).

Note that, if the content of p1 for an instance of M1 is Null (not pointing to M2), then the instance containing o1 will not be included in the resulting table of the similarity-based join. This property is inherited from the range query and is important in our system of similarity-based operations. That is, depending on the value of ε, the range query allows the possibility that an image object has no similar image in a given set of images. In contrast, the k-NN search always returns k similar images, no matter what the degree of similarity is. It is our conviction that the range query better reflects many practical application needs: if a query image has no similar image in a given set, for a given threshold, then the system should say so instead of returning images that are not actually similar. Figure 5.1 gives a pictorial illustration of the resulting table of a similarity-based join M1⊗εM2.

From this definition, we also see that the result of a similarity-based join operation is an image table. Hence, this operator also satisfies the closure property. Each tuple of an image table and that of a resulting table of a similarity-based join is identified by its unique identifier, id.
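The following Python sketch mirrors Definition 5.2 with a naive nested-loop strategy, reusing the in-memory representation introduced in the sketch above (P is modeled as a dict mapping a table name to a list of (id, score) pairs); it is an illustration of the semantics, not the prototype's implementation.

    import math
    from collections import namedtuple

    Instance = namedtuple("Instance", ["id", "o", "f", "a", "p"])

    def distance(f1, f2):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

    def sim_join(m1, m2, eps, m2_name="M2"):
        """Nested-loop sketch of M1 (x)eps M2: each referring instance of M1
        is kept, with its P component extended by the ids and similarity
        scores of the similar instances found in M2."""
        result = []
        for t1 in m1:
            refs = [(t2.id, distance(t1.f, t2.f))
                    for t2 in m2 if distance(t1.f, t2.f) <= eps]
            if refs:  # non-referring instances are suppressed
                p_new = dict(t1.p)
                p_new[m2_name] = refs
                result.append(Instance(t1.id, t1.o, t1.f, t1.a, p_new))
        return result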


Figure 5.1: The similarity-based join M1 ⊗ε M2. [The figure depicts the resulting image table: the referring instances (id1^j, o1^j, f1^j, a1^j, p1^j) of M1, where each P component holds an entry of the form (M2, {(id2^i, sim_Scorei)}).]

In the component p1^j, "id2^i" (for each i) denotes the ids of M2 referred to by the similarity-based join, and "sim_Scorei" (for each i) is the similarity score for the referred object with id = id2^i.

Query 2 of Chapter 2, for example, requires an operator of this type. We demonstrate below the use of this novel operator in combination with the traditional operators. First, let us recall the query:

Show me the details of employees who entered the gate of the department within two hours after they entered the main gate of the compound on a particular date.

Let us first reformulate the tables in Chapter 2 based on our image repository model:

SIE(id, photo, Fv, A1(Time, Date), P1),

SII(id, photo, Fv, A2(Time, Date), P2), and

EMP(id, Photo, Fv, A3(Name, Occupation, Address), P3).


Then, the following is an SQL-like expression for the above query:

SELECT *
FROM EMP e, SIE s, SII t
WHERE s.A1.Date = t.A2.Date
AND (t.A2.Time.Hour − s.A1.Time.Hour) ≤ 2
AND e.Fv ≈ε s.Fv
AND s.Fv ≈ε t.Fv

Note that the symbol ≈ε stands for the similarity of the images with respect to ε. We use the term "SQL-like expression" because similarity-based operations are not supported in the SQL standard.

The above query expression may, for example, have a result as presented in Table 5.1 (see Chapter 7 for more examples).

Table 5.1: Example result table for the above query (Query 2 of Chapter 2).

id    Photo       A3 (Name, Occupation, Address)                     P3
103   Image_103   Stéphane LAPORTE, Accountant, 2 Av. A. Einstein    Ptr_103
109   Image_109   Olivier ROSSARD, IT-Manager, 30 R. Antonins        Ptr_109
206   Image_206   Bertrand DAILLE, Director, 5 R. C. Heneru          Ptr_206
520   Image_520   François PETIT, Accountant, 9 P. J. Mace           Ptr_520

The definition of the concepts of referred and referring instance records in a similarity-based join is given in Definition 5.3. Note that tuples of M1 that do not refer to M2 by the similarity-based join operation will be suppressed from the resulting table. Similarly, the id's of the tuples of M2 that are not referred by the similarity-based join operation will not appear in any of the p′1 components of the resulting table.

Definition 5.3 (Referring and Referred Instances)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. Consider the similarity-based join operation M1⊗εM2.

− The referring instances of M1 (in M1⊗εM2) are the instance records of M1 whose image objects O1 have a similar image object in M2. If the object of an instance of M1 has no similar image in M2 by the similarity-based join, we say that it is a non-referring instance.

− The instances of M2 that are identified by the similarity-based join for any of the objects O1 of M1 (in M1⊗εM2) are called referred instances. If an instance of M2 is not associated or referred by the similarity-based join operation, we say that it is a non-referred instance of M2.

Similarity-based operations depend on non-exact and "relative" measures. As a result, the similarity-based operators defined in this thesis possess different algebraic properties than their relational counterparts. For example, unlike the relational join operator, the similarity-based join operator ⊗ε is clearly not commutative. This property shows that in a similarity-based join operation the order of the image tables is important: if an image table appears on the left side of a similarity-based join, then its objects are taken as the reference for the similarity operation. In the example query above, the image table SII should be our reference table for the join, and hence should appear on the left (i.e., SII ⊗ε SIE), if we are interested in knowing about the individuals who entered the interior gate of the department.

Similar to Definition 5.2 and considering the properties of similarity-based comparison between images, we define below the way we compute a multi similarity-based join. To facilitate the definition of this join operation, let us first define a basic operator, called the Additive Union.

Definition 5.4 (Additive Union)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. The Additive Union of M1 and M2, denoted by M1 ∪⁺ M2, is an image table that contains all the records that are either in M1 or in M2.

Formally:

M1 ∪⁺ M2 = {(id, o, f, a, p) | (id, o, f, a, p) ∈ M1 ∨ (id, o, f, a, p) ∈ M2}

The additive union contains all instances that are either in M1 or in M2, without excluding any of the instances of M1 or M2. Here also, we observe that the resulting table is an image table and hence satisfies the closure property. With M1 and M2 having the same structure on their id, O, F, and P components, keeping the result of the additive union in a table of the same structure is possible. However, M1 and M2 may differ in their A components, in which case we use the union of the two A's as the data structure for the A component of the resulting image table of the additive union. It is important to note that the additive union is commutative; this follows from the property of the disjunction used in its definition.
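Under the same illustrative in-memory representation as the earlier sketches, the additive union can be sketched as follows; the assumption that ids are unique across the two operands is ours.

    def additive_union(m1, m2):
        """Additive union M1 U+ M2 (Definition 5.4): all instances that are
        in M1 or in M2. Instances are assumed to carry table-wide unique ids,
        so a record present in both operands appears once in the result."""
        seen, result = set(), []
        for inst in list(m1) + list(m2):
            if inst.id not in seen:
                seen.add(inst.id)
                result.append(inst)
        return result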

Definition 5.5 (Multi Similarity-Based Join)

Let M1(id1, O1, F1, A1, P1), M2(id2, O2, F2, A2, P2), ..., Mn(idn, On, Fn, An, Pn) be n image tables. The multi similarity-based join, denoted by M1⊗εM2⊗ε . . . ⊗ε Mn, is defined as:

M1 ⊗ε M2 ⊗ε ... ⊗ε Mn = ∪⁺_{2 ≤ i ≤ n} (M1 ⊗ε Mi)

Note that the order of the operand tables is important and has a semantic interpretation: only the leftmost table is considered as a reference table. The additive union operator (∪⁺) is used only for the purpose of merging the resulting tables of each of the pair-wise similarity-based joins. Figure 5.2 gives an illustration of a multi similarity-based join of three image tables, M1, M2, and M3. "id2^i" in p1 denotes the ids of M2 referred to by the similarity-based join, and "sim_Scorei" (for each i) is the similarity score for the corresponding referred object. Similarly, "id3^k" denotes the ids of M3 referred to by the similarity-based join M1⊗εM3, while "sim_Scorek" is the similarity score for each corresponding referred object.

Figure 5.2: The similarity-based join M1 ⊗ε M2 ⊗ε M3. [The figure depicts the resulting image table: the referring instances of M1, where each P component holds entries of the form (M2, {(id2^i, sim_Scorei)}) and (M3, {(id3^k, sim_Scorek)}).]


It is important to note that an operator without the properties of commutativity and associativity can hardly be exploited for algebraic query optimization. We thus need to explore ways of extending this operator so that it possesses useful properties for query optimization. In the next section, we investigate the similarity-based join operator for some useful properties.

5.4 The Symmetric Similarity-Based Join Operator

In view of the current practices of content-based query systems (where, for a given query image object, we search for its closely similar objects in a database of images), the similarity-based join operator defined above may be what is needed for many applications. However, it does not possess useful properties such as symmetry, which deprives it of one of the major levers for query optimization. To make the similarity-based join operator suitable for similarity-based query optimization, we extend it to a Symmetric Similarity-Based Join operator, defined below in terms of the similarity-based join operator and the additive union operator.

Definition 5.6 (Symmetric Similarity-Based Join)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. The symmetric similarity-based join of M1 and M2, denoted by M1 ⊕ε M2, is defined as:

M1 ⊕ε M2 = (M1 ⊗ε M2) ∪⁺ (M2 ⊗ε M1)

From the definition above we see that the symmetric similarity-based join operator possesses the following property.

Property: The symmetric similarity-based join operator ⊕ε is commutative.

i.e., M1⊕ε M2 = M2 ⊕ε M1.
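In terms of the sketches given earlier in this chapter (sim_join and additive_union), Definition 5.6 translates directly into the following illustrative Python function:

    def symmetric_sim_join(m1, m2, eps):
        """Sketch of Definition 5.6: M1 (+)eps M2 is the additive union of
        the two one-sided similarity-based joins; commutativity follows from
        the commutativity of the additive union."""
        return additive_union(sim_join(m1, m2, eps, "M2"),
                              sim_join(m2, m1, eps, "M1"))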

This follows directly from the commutative property of the additive union. Figure 5.3 illustrates a symmetric similarity-based join of two image tables. As in the previous figures, "id2^i" in P denotes the ids of M2 referred to by the similarity-based join M1⊗εM2, where "sim_Scorei" is the similarity score between the image object identified by id1^j of M1 and the referred image object identified by "id2^i". Similarly, "id1^k" denotes the ids of M1 referred to by the similarity-based join M2⊗εM1, where "sim_Scorek" is the similarity score between the image object identified by id2^t of M2 and the referred image object identified by "id1^k".


Figure 5.3: The symmetric similarity-based join M1 ⊕ε M2. [The figure depicts the two parts of the resulting table: the referring instances of M1, whose P components hold entries (M2, {(id2^i, sim_Scorei)}), and the referring instances of M2, whose P components hold entries (M1, {(id1^k, sim_Scorek)}).]

We can now generalize the symmetric similarity-based join on more than two image tables and define a Symmetric Multi Similarity-Based Join. This definition reflects the characteristics of similarity-based operators and maintains useful properties for similarity-based query optimization.

Definition 5.7 (Symmetric Multi Similarity-Based Join)

Let M1(id1, O1, F1, A1, P1), M2(id2, O2, F2, A2, P2), ..., Mn(idn, On, Fn, An, Pn) be n image tables. The Symmetric Multi Similarity-Based Join, denoted by M1 ⊕ε M2 ⊕ε ... ⊕ε Mn, is defined as:

M1 ⊕ε M2 ⊕ε ... ⊕ε Mn = ∪⁺_{1 ≤ i ≤ n, 1 ≤ j ≤ n, i < j} (Mi ⊕ε Mj)

The way the symmetric multi similarity-based join operates is not identical to that of the multi-way join known in relational database management systems: here, it does not consider intermediate resulting tables as operands. Furthermore, the order of the operands is not relevant to the result. This property can be useful when considering the properties of content-based image retrieval. Though the order is not relevant for the result, the order of each pair of operands can effectively be exploited for query optimization purposes, as is the case in relational join operations. This enables us to formulate new rules for query optimization (see Chapter 6).

Figure 5.4 illustrates how the symmetric multi similarity-based join operates on three image tables. The contents of P are as illustrated in the previous similarity-based join figures.

Figure 5.4: The symmetric similarity-based join M1 ⊕ε M2 ⊕ε M3. [The figure depicts, for each of the three tables, the referring instances whose P components point to the other two tables, e.g., entries (M2, {(id2^i, sim_Scorei)}) and (M3, {(id3^v, sim_Scorev)}) for the instances of M1.]


In the following sections, we investigate the properties possessed by the similarity-based operators discussed above.

5.5 More Operators Associated to the Similarity-Based Join

The similarity-based join operator and its extensions possess various useful properties. A proper investigation of these properties helps to formulate algebraic rules that can be useful to build equivalent algebraic expressions, which in turn can be exploited for the purpose of query optimization. We present some useful operators and their properties below.

5.5.1 The Extract Operator

The non-symmetric similarity-based join given in Definition 5.2 can be extracted from the symmetric similarity-based join by the use of an operator called Extract. Hence, we have the following definition.

Definition 5.8 (The Extract Operators)

ExtractM1(M1 ⊕ε M2) extracts the instances of M1 from M1⊕ε M2 keeping the modified P component of M1. Similarly, ExtractM2(M1 ⊕ε M2) extracts the instances of M2 from M1⊕εM2 keeping the modified P component of M2.

Formally:

ExtractM1(M1 ⊕ε M2) = M1 ⊗ε M2

ExtractM2(M1 ⊕ε M2) = M2 ⊗ε M1

The Extract operator can also be applied on Symmetric multi similarity-based joins to get any of the non-symmetric multi similarity-based joins. For example, ExtractM1(M1⊕εM2⊕εM3) = M1⊗εM2⊗εM3.

5.5.2 The Mine Operator

We stated above that the symmetric similarity-based join could be interpreted as the additive union of the two non-symmetric similarity-based joins.

Once we have the resulting table of one of the non-symmetric joins, we can obtain the other without applying any similarity-based operation. If, for example, we have the resulting table of M1⊗εM2 with its new P1 component, we can use the contents of P1 to get the instances of M2⊗εM1 by direct statistics collection. This follows from the fact that, for the same value of ε, if an object o2 of M2 is within a distance ε of an object o1 of M1, then the converse is also true, by the symmetry property of the distance. This is a property that we get from the use of the range query (i.e., we would not have this advantage if we used, for example, the k-NN search method). Hence, we can compute the symmetric similarity-based join M1⊕εM2 by performing only one of the non-symmetric joins and then using an operator called the Mine operator for the other. The major advantage of this approach is that the Mine operator can be much less expensive in terms of computational time than the corresponding similarity-based join. Thus, this property can be exploited for the purpose of query optimization in both the symmetric and non-symmetric similarity-based join operations (see Sections 6.2.2 and 6.3 for details). To demonstrate this concept, let us first define the Mine operator on a simple similarity-based join, M1⊗εM2.

Definition 5.9 (The Mine operator)

Consider the similarity-based join M1⊗εM2.

Mine(M1⊗εM2) = M2⊗εM1

Mine(M2⊗εM1) = M1⊗εM2

The Mine operator on M1⊗εM2, denoted by Mine(M1⊗εM2), uses the component p1 of the resulting table of M1⊗εM2 and builds the table M2⊗εM1. Conversely, Mine(M2⊗εM1) uses the component p2 of the resulting table of M2⊗εM1 and builds the table M1⊗εM2.

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. Suppose we want to compute M1⊕εM2, and let us first consider computing M1⊗εM2. Figure 5.5 shows the projection of the image data points of M1 and M2 on a 2D plane for illustration purposes. Let all the o1 objects in the figure belong to M1 and all the o2 objects belong to M2. As shown in the figure, for the image object o1^i of M1 and a given value of ε, the range query looks for all objects o2 of M2 that are within a distance of ε. Thus, it selects the image data points o2^t, o2^u, o2^v, and o2^w of M2. Then, the similarity-based join M1⊗εM2 identifies the id's of the instances of M2 containing these objects and stores them in the corresponding p component of the resulting table. From this example we clearly see that, when we later want to compute M2⊗εM1, o1^i of M1 will be returned as an image object within a range of ε for each of the objects o2^t, o2^u, o2^v, and o2^w of M2, by the symmetry property of the distance. Since this information is already contained in the p1 component of the instance of M1⊗εM2 that contains the object o1^i, we can get it from M1⊗εM2. This can then be repeated on all elements of p1 of M1⊗εM2 to get all the instances of M2⊗εM1.

Figure 5.5: Illustration for the Mine operator. [The figure projects the data points of M1 (the o1's) and M2 (the o2's) on a 2D plane; a circle of radius ε around o1^i encloses the points o2^t, o2^u, o2^v, and o2^w of M2.]

To demonstrate the use of Mine further, we consider Definition 5.3 on the use of the terms referring, non-referring, referred, and non-referred instances of the operand image tables of a similarity-based join, say M1⊗εM2.

For the similarity-based join M1⊗εM2, suppose M1 has a set of referring instances M′1 and a set of non-referring instances M″1. Note that M′1 and M″1 are disjoint and that M1 = M′1 ∪⁺ M″1. Suppose also that M2 has a set of referred instances M′2 and a set of non-referred instances M″2. Note also that M′2 and M″2 are disjoint and that M2 = M′2 ∪⁺ M″2. Then, the symmetric similarity-based join operation M1 ⊕ε M2 = (M′1 ∪⁺ M″1) ⊕ε (M′2 ∪⁺ M″2) will finally be reduced to M1 ⊕ε M2 = (M′1 ⊗ε M′2) ∪⁺ (M′2 ⊗ε M′1).

Demonstrations:

M1 ⊗ε M2 = (M′1 ∪⁺ M″1) ⊗ε (M′2 ∪⁺ M″2)
         = M′1 ⊗ε (M′2 ∪⁺ M″2), since the instances of M″1 are non-referring,
         = M′1 ⊗ε M′2, since no object in M″2 is referred.

Thus,

M1 ⊕ε M2 = (M1 ⊗ε M2) ∪⁺ (M2 ⊗ε M1)
         = (M′1 ⊗ε M′2) ∪⁺ (M′2 ⊗ε M′1)
         = M′1 ⊕ε M′2, by definition.

Thus, if M'1⊗ε M'2 is computed using the similarity-based join operation of Definition 5.2, then the other (i.e., M'2⊗ε M'1) can be generated using the Mine operator or vice-versa. Algorithm 5.1 demonstrates how the Mine operator computes M'2 ⊗ε M'1, from the resulting table of M'1 ⊗ε M'2. Note that if an instance of M1 is not referring in the join M1⊗ε M2, then it cannot be re-referred by M2⊗ε M1. Furthermore, dropping the non-referring and non-referred instances from the resulting image table will not affect the commutative property of the symmetric similarity-based join.

To present the algorithm for the Mine operator, we require a function that allows us to retrieve the instances of an image table M by their unique identifier components. So, we define get_instance(M, query_id) = (query_id, o, f, a, p), such that (query_id, o, f, a, p) ∈ M. Actually, the get_instance function is a relational selection operation on the id component of M.

Algorithm 5.1: Algorithm used by Mine to compute M′2 ⊗ε M′1.

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables, and let M′(id′1, O′1, F′1, A′1, P′1) = M1 ⊗ε M2.

Create table T = T(id, O, F, A, P)
Foreach instance inst of M′ Do
    Foreach element id2 of inst.P′1 Do
        If id2 is not in T Then
            Append get_instance(M2, id2) to T
        End If
        Update P of T.id2 with (M1, (inst.id1, sim_score))
    End Do
End Do
Return(T)    /* T = M2 ⊗ε M1 */
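The following Python sketch mirrors Algorithm 5.1 under the same in-memory representation as the earlier sketches (P as a dict from table name to (id, score) pairs); the table names "M1"/"M2" and the helper names are illustrative assumptions, not the prototype's code.

    from collections import namedtuple

    Instance = namedtuple("Instance", ["id", "o", "f", "a", "p"])

    def get_instance(table, query_id):
        """Relational selection on the id component of an image table."""
        for inst in table:
            if inst.id == query_id:
                return inst
        return None

    def mine(join_result, m2, ref="M2", inv="M1"):
        """Build M2 (x)eps M1 from the resulting table of M1 (x)eps M2 with
        no similarity computation, exploiting the symmetry of the distance."""
        t = {}  # id2 -> mined Instance, keyed to avoid duplicate appends
        for inst in join_result:
            for id2, score in inst.p.get(ref, []):
                if id2 not in t:
                    r = get_instance(m2, id2)
                    t[id2] = Instance(r.id, r.o, r.f, r.a, {inv: []})
                # ||o1 - o2|| = ||o2 - o1||: the stored score is reused as is
                t[id2].p[inv].append((inst.id, score))
        return list(t.values())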


The Mine operator given above for a simple similarity-based join (i.e., a similarity-based join between two image tables) can be generalized to more complex similarity-based join expressions. The general principle is that, for each one-sided similarity-based join that is computed, the opposite-side join can be obtained with the use of the Mine operator.

For a short description, let us consider the symmetric similarity-based join of three image tables M1, M2, and M3. To compute M1⊕εM2⊕εM3, the system needs to process six similarity-based joins, M1⊗εM2, M2⊗εM1, M1⊗εM3, M3⊗εM1, M2⊗εM3, and M3⊗εM2, based on Definition 5.2. However, we can compute only three of these similarity-based joins; the remaining three can be generated using the Mine operator, as sketched below. This can be generalized to a symmetric multi similarity-based join with any number of tables n: to compute a symmetric multi similarity-based join with n operand tables, we need n(n−1) non-symmetric similarity-based joins, out of which n(n−1)/2 can be generated using the Mine operator. Then, the results can be merged to form the symmetric multi similarity-based join based on Definition 5.7.
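As a sketch of this strategy, the following illustrative Python function (building on the sim_join, mine, and Instance sketches above; merge_tables is our own helper, interpreting the additive union of partial results as a merge of P components) computes one similarity-based join per unordered pair and derives the opposite-side join with Mine:

    import itertools

    def merge_tables(m1, m2):
        """Merge two partial results: instances with the same id are unified
        and their P components combined (our reading of the additive union
        used in Definition 5.7)."""
        byid = {inst.id: inst for inst in m1}
        for inst in m2:
            if inst.id in byid:
                p = dict(byid[inst.id].p)
                p.update(inst.p)
                byid[inst.id] = byid[inst.id]._replace(p=p)
            else:
                byid[inst.id] = inst
        return list(byid.values())

    def symmetric_multi_sim_join(tables, eps):
        """n(n-1)/2 similarity-based joins plus n(n-1)/2 Mine operations."""
        result = []
        for i, j in itertools.combinations(range(len(tables)), 2):
            ni, nj = "M%d" % (i + 1), "M%d" % (j + 1)
            forward = sim_join(tables[i], tables[j], eps, nj)    # Mi (x)eps Mj
            backward = mine(forward, tables[j], ref=nj, inv=ni)  # Mj (x)eps Mi
            result = merge_tables(merge_tables(result, forward), backward)
        return result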

In the following section, we present other useful operators on image tables that can be used for a wider range of applications.

5.6 Other Operators on Image Tables

In relational algebra, the Intersection, Union, and Difference operators are among the most useful operators. These operators, however, are based on exact matching and cannot be used in the same manner on image tables of the schema M(id, O, F, A, P), where the operations are similarity-based. Thus, there is a need to redefine these operators under the circumstances of similarity-based comparisons on image representations. Below, we redefine these operators in a way that they can be used on image tables.

Definition 5.10 (Asymmetric Similarity-Based Intersection between two Image Tables)

The asymmetric similarity-based Intersection of two image tables M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2), denoted by M1∩ε M2, is defined as:

M1 ∩ε M2 = {(id1, o1, f1, a1, p1) | (id1, o1, f1, a1, p1) ∈ M1 ∧ ∃(id2, o2, f2, a2, p2) ∈ M2 : ||o1 − o2|| ≤ ε ∧ a1 = a2}


With the asymmetric similarity-based intersection operator, we are searching for all objects (id1, o1, f1, a1, p1) of M1 that are similar to (at least) one object (id2, o2, f2, a2, p2) of M2 (i.e. ||o1-o2|| ≤ ε) and such that a1 = a2.

A general assumption is that the F components of M1 and M2 consist of identical feature measures that facilitate the similarity-based comparison. Then, if two F values are within a range of ε, we say that they are similar and hence they are considered in the intersection, provided the other conditions are satisfied. The O components are only compared through their representation F. Comparison on the A components is based on the relational rules.

This operator is widely applicable and useful. It can serve cases where the relational intersection operator cannot produce the desired result. Consider, for example, two different employee tables, and suppose that there are two different persons with the same records. A relational intersection on these two tables will include this record in the result. However, if the two employee tables were image tables and the asymmetric similarity-based intersection were applied, the said record would not be in the result of the intersection.

Note that, contrary to the relational intersection, the asymmetric similarity-based intersection is not commutative. We can, however, extend this operator to obtain a commutative (or symmetric) operator by considering the additive union of the two asymmetric similarity-based intersections: (M1 ∩ε M2) ∪⁺ (M2 ∩ε M1).

Definition 5.11 (Asymmetric similarity-based Union between two Image Tables)

The Asymmetric Similarity-Based Union of two image tables M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2), denoted by M1∪ε M2, is defined as:

M1 ∪ε M2 = M1 ∪⁺ {(id2, o2, f2, a2, p2) ∈ M2 | ∀(id1, o1, f1, a1, p1) ∈ M1 : ||o1 − o2|| > ε ∧ a1 ≠ a2}

That is, we search for all images of M2 that are "different" from all images in M1 (i.e., their A component is different and their image objects are not similar). Then we add the records containing these images to M1.


Definition 5.12 (Similarity-Based Difference between two Image Tables)

The Similarity-based Difference of two image tables M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2), denoted by M1-ε M2, is defined as:

M1 −ε M2 = {(id1, o1, f1, a1, p1) ∈ M1 | ∀(id2, o2, f2, a2, p2) ∈ M2 : ||o1 − o2|| > ε ∧ a1 ≠ a2}

Comparison on the relational components is based on the relational rules. From the above definitions, we can deduce the following theorems:

Theorem 1: M1 −ε M2 = M1 \ (M1 ∩ε M2)

Theorem 2: M1 ∪ε M2 = M1 ∪⁺ (M2 −ε M1)
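Read literally, Definitions 5.10 and 5.12 translate into the following Python sketch (same illustrative in-memory representation as in the earlier sketches):

    import math
    from collections import namedtuple

    Instance = namedtuple("Instance", ["id", "o", "f", "a", "p"])

    def distance(f1, f2):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

    def sim_intersection(m1, m2, eps):
        """Asymmetric similarity-based intersection (Definition 5.10): keep
        t1 if some t2 in M2 is similar to it and equal on the A component."""
        return [t1 for t1 in m1
                if any(distance(t1.f, t2.f) <= eps and t1.a == t2.a
                       for t2 in m2)]

    def sim_difference(m1, m2, eps):
        """Similarity-based difference (Definition 5.12): keep t1 if every
        t2 in M2 is dissimilar to it and different on the A component."""
        return [t1 for t1 in m1
                if all(distance(t1.f, t2.f) > eps and t1.a != t2.a
                       for t2 in m2)]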

As a sample case where these operators can be useful, consider two tables: a table of the students that are members of the university's football club and a table of the students that are members of the university's basketball club. Suppose that we want to search for all students that are members of both the football club and the basketball club.

− A relational intersection operation on the two tables will give us all students in both clubs, including different students that happen to have the same name and record. However, an asymmetric similarity-based intersection (considering that we are using image tables) will not include those different students with the same name and record.

− A relational union operator on the two tables will suppress one of the students with the same name and record. However, an asymmetric similarity-based union will keep both students with the same name and record (supposing that we used image tables and that the two students differ in their photos).

− The relational difference operation will keep only one of the different students with the same record, whereas the similarity-based difference operator will keep both students with the same name and record.

Furthermore, we can as well define the Cartesian product of two image tables as follows.


Definition 5.13 (Cartesian Product of two image tables)

The Cartesian product of two image tables M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2), denoted by M1 × M2, is defined as:

M1 × M2 = {(id1, o1, f1, a1, p′1) | (id1, o1, f1, a1, p1) ∈ M1 and p′1 = p1 ∪ (M2, {(id2, sim_Score(o1, o2)) | (id2, o2, f2, a2, p2) ∈ M2})},

where sim_Score(o1, o2) is the similarity score between o1 and each of the image objects of M2.

Note that, for each image object in M1, the Cartesian product refers to all instances in M2.

The operators presented in this Chapter permit us to work on an extended system of algebra composed of relational and content-based operations on the OR model. This system permits us to formulate multi-criteria queries that are based on similarity-based comparisons and relational operations on image databases.

5.7 Relational Operators on Image Tables and on Complex Expressions

A relational operation on the relational attributes of image tables is performed based on the rules of relational algebra.

5.7.1 Relational Selection on an Image Table

Since similarity-based operations are expensive, it is usually preferable if a relational selection precedes a similarity-based operation. A relational selection on an image table can be used to simplify query execution, particularly when there is a complex expression composed of relational and similarity-based operations. A relational selection operation on an image table follows the rules of relational algebra and can only be performed on the relational attributes.

5.7.2 Relational Join on Image Tables

There is also a practical requirement for a relational join on two image tables (for example, on the elements of the A components). One of our goals, other than the semantic and syntactic correctness of the similarity-based operations on image databases, is to avoid having two image components in a resulting table. To realize this same goal in a relational join of image tables, we can use the P component of one table to store pointer structures to the joined instance(s) of the other table(s), in the same manner as for the similarity-based join. However, in the case of a relational join on image tables, the operation is based on the non-image components. We can, therefore, keep the joined components in one of the image tables and maintain a pointer to the instances of the other within the P component. This leads us to the choice of performing either a left Relational Join (lR-Join) or a right Relational Join (rR-Join) operation on two image tables. The choice could, for example, be based on the requirement of a subsequent operation. An lR-Join on two image tables performs a relational join based on a join condition jc and retains the record instances of the table on the left that satisfy jc, while modifying its P component by inserting a pointer to the joined record instances of the table on the right. Similarly, an rR-Join retains the record instances of the table on the right that satisfy jc, while modifying its P component to include a pointer to the joined record instances of the table on the left. The following two definitions formally present these operations.

Definition 5.14 (left Relational Join)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. The left relational join of M1 to M2 under the join condition jc, denoted by M1 ⋈ljc M2, is defined as:

M1 ⋈ljc M2 = {(id1, o1, f1, a1, p′1) | t = (id1, o1, f1, a1, p1) ∈ M1, p′1 = p1 ∪ (M2, {(id2, Null)})}, where R(t) = {t′.id | t′ ∈ M2, jc(t, t′)}, id2 ∈ R(t), and R(t) ≠ ∅.

For some t ∈ M1 and t′ ∈ M2, jc(t, t′) is a predicate which holds true when the join condition jc is satisfied. Since this is a relational operation, the similarity-score component of P in the result table is not defined and hence is kept Null.

Definition 5.15 (right Relational Join)

Let M1(id1, O1, F1, A1, P1) and M2(id2, O2, F2, A2, P2) be two image tables. The right relational join of M1 to M2 under the join condition jc, denoted by M1 ⋈rjc M2, is defined as:

M1 ⋈rjc M2 = {(id2, o2, f2, a2, p′2) | t = (id2, o2, f2, a2, p2) ∈ M2, p′2 = p2 ∪ (M1, {(id1, Null)})}, where L(t) = {t′.id | t′ ∈ M1, jc(t′, t)}, id1 ∈ L(t), and L(t) ≠ ∅.

For some t′ ∈ M1 and t ∈ M2, jc(t′, t) is a predicate which holds true when the join condition jc is satisfied. Since this is a relational operation, the similarity-score component of P in the result table is kept Null.

5.7.3 Relational Join Between Image and Relational Tables

A join between an image table and a relational table may also be required in many practical applications. Thus, we define a relational join operator between an image table and a relational table as follows.

Definition 5.16 (A Join between Image and Relational Tables)

Let R be a relational table and M(id, O, F, A, P) be an image table. A relational join between M and R will result in a modified image table M' which integrates R in the A component of M based on the join condition jc.

Formally:

M′ = R ⋈jc M = M ⋈jc R = {(id, o, f, a′, p) | t = (id, o, f, a, p) ∈ M, a′ = (a, t′), t′ ∈ R, jc(t, t′)}.

For some t ∈ M and t' ∈ R, jc(t,t') is a predicate which holds true when the join condition jc is satisfied. Since this is a relational operation, the other components of the resulting table M’ are not affected. Hence, in the case of a join between an image table and a relational table, there will be no modification on the P component of M. The Cartesian product between a relational table R and an image table M can also be defined similarly.

5.8 Query Expressions Involving both Similarity-Based and Relational Operators

In this section, we present query expressions in order to demonstrate the way some of the queries given in Chapter 2 can be managed. Since our similarity-based operators are novel as database operators, none of the existing query languages support them as basic operators. In our implementation, these operators are introduced as plug-ins associated to the ORDImage module of the Oracle 9i DBMS.

As presented earlier in this chapter, the similarity-based operators use the parameter ε for each range query operation. We suppose that the user gives this value when the query is first stated. We present here only an abstraction of SQL-like query expressions, without considering the query results. More practical expressions with their results are given in Chapter 7, which presents our prototype.

Consider the example queries in Chapter 2. For Queries 1 and 2, we first formulate the image tables in a way that is compatible with our image data repository model as follows:

SI(id, Surv_Photo, F, A', P),

where A' is an object component given by A' = {Date, Time}¹ and Surv_Photo is the photo taken by the surveillance camera.

EMP(id, Photo, F, A*, P),

where the A* component is composed of a schema A*(ename, eaddress, Dept, Occupation) and Photo is the photo of the employee.

Then, the query statement for Query 1 can be stated using an SQL-like query language as:

SELECT *
FROM SI s, EMP e
WHERE s.F ≈ε e.F²
AND s.A'.Date = '31-12-1999'
AND s.A'.Time BETWEEN 4:00:00 PM AND 6:00:00 PM

The image tables can be re-formulated for Query 2 of Chapter 2 as follows.

SIE(id, O, F', Ae, P),

¹ Some database systems store both Date and Time under the same data structure. We, however, state them here separately for the purpose of clarity.

² The symbol "≈ε" is associated with the similarity-based join operator ⊗ε discussed earlier in this chapter. In our prototype, we used the operator name "SimJoin" for the similarity-based join operator.


SII(id, O, F*, Ai, P),

where Ae and Ai contain the date and time, and the surveillance images are taken at the exterior and interior gates, respectively. The EMP table given above can be considered as it is defined for Query 1.

Thus, we can state the SQL-like statement as:

SELECT *
FROM SIE s, SII t, EMP e
WHERE t.F* ≈ε s.F' AND t.F* ≈ε e.F
AND t.Ai.Date = '31-12-1999'
AND s.Ae.Date = '31-12-1999'
AND (t.Ai.Time.Hour − s.Ae.Time.Hour) ≤ 2

Let us now consider the case of an application in bio-medicine. Suppose a specialist is interested in analyzing medical images collected from different sources in comparison with an existing medical image database of Pathology. Due to the limitations of describing an image using keywords, supporting medical applications with content-based image description and retrieval techniques is of high importance. Consider, for example, that the collected images are lung X-ray images of patients and that they are organized in an image table X-ray. The X-ray table of patients does not contain adequate information on treatments, and no interpretations are given for the anomalies. Let the Pathology table contain a large collection of medical images with identified anomalies, the corresponding diagnoses, and the treatments given. The only means to get information for identifying the anomalies on the images in the X-ray table, for possible diagnosis and for reusing the experience in treating such cases, is a similarity-based join on the representations of the images or salient objects in the two tables, X-ray and Pathology. Since anomalies are parts of images and hence are identified as salient objects, we assume here that the image tables are structured based on our image data repository model. Hence, the Sx and Sp tables given below contain all the information related to the identified salient objects in the X-ray and Pathology tables. The image tables involved in this problem can be given as:


X-ray(id, O, F, Ap, P),

Sx(ids, Fs, As),

where Sx is a table that contains the salient objects of the images in X-ray table.

Pathology(id, O, F, At, P),

Sp(ids,Fs,As),

where Sp is a table that contains the salient objects of the images in Pathology table.

The A components of the above tables can be stated as below. These can as well be kept in separate relational tables, where a common key is used to maintain the link. Remember that these are non-image tables and can be handled with the traditional approaches.

Ap(Name, Gender, Age, Address, Image_Type, Sx-id),

At(Image_Type, Date, Physician, diagnosis, Treatment, Sp-id),

where Sx-id and Sp-id contain all corresponding id's of Sx and Sp whose salient objects belong to the images of X-ray.id and Pathology.id.

The A component of Sp is given as:

As(anomaly_type, Anomaly_description, Sp-id)

The major difference between X-ray and Pathology is that, in the Pathology and Sp tables, anomalies are well identified and the complete information on the diagnosis and treatment of each case is well documented. The specialist is, therefore, interested in exploiting the expertise recorded in Pathology for identifying and classifying the similar cases among the images in the X-ray table. Thus, for this purpose, the specialist may for example state a query such as:

"Retrieve all relevant information including the diagnosis and treatments of the Pathology table to lung related medical cases for the X-ray images of the patients in the X-ray table ".

Processing this query will require at least a ⊗ε between the Pathology and X-ray tables, or even a relational join between X-ray and Sx, plus a relational join between Pathology and Sp, plus a similarity-based join between Sx and Sp (plus relational selections and projections), as sketched below.
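One possible evaluation plan, sketched under the assumption that jc1 and jc2 stand for join conditions on the keys Sx-id and Sp-id described above, is:

    T1 = X-ray ⋈ljc1 Sx (attach the salient-object data to the X-ray images)
    T2 = Pathology ⋈ljc2 Sp (attach the salient-object data to the Pathology images)
    T3 = T1 ⊗ε T2 (similarity-based join on the feature components)
    Result = relational selection (lung-related cases) and projection (diagnosis, treatment) on T3

Other orderings, such as performing Sx ⊗ε Sp first and joining back to the image tables, are equally valid plans; choosing among them is precisely the task of the query optimizer discussed in Chapter 6.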


5.9 Summary

The relational algebra and the query languages based on it have been the subject of many research activities over the last three decades. As a result, the relational model has so far served a large domain of applications with reasonable efficiency. However, more and more new applications involving digital images with content-based retrieval requirements are emerging in various domains. The relational algebra is thus becoming insufficient to handle the requirements of similarity-based queries on the visual features of images. It is, therefore, becoming very important to devise an algebra that can support similarity-based operations on images, and hence manage multi-criteria queries on complex features of images. In this respect, there are only a few efforts on the development of a formal algebra. Moreover, the few existing works do not address the definition of an "operational" similarity-based algebra usable for modeling, optimizing, and processing similarity-based queries and for combining similarity-based and relational operators.

In this Chapter, we addressed this important problem and proposed several concepts for a novel system of algebra - the similarity-based algebra under an OR model. We particularly:

• defined several operators based on similarity-based comparisons,
• described the usage of the operators using sample queries,
• described the usage of the novel component of an image table, called P,
• investigated and stated properties of the operators introduced,
• investigated possible techniques of query optimization based on the algebraic properties of the similarity-based operators,
• introduced an operator called Mine that can effectively be used for query optimization,
• defined the necessary basic set-based operators on image tables,
• studied the use of the similarity-based operators in combination with the relational operators, and
• gave demonstrations of complex (multi-criteria) queries involving both relational and similarity-based operators.

CHAPTER 6

SIMILARITY-BASED QUERY OPTIMIZATION

In a relational query system, there are many access plans that the DBMS can follow to process a query. These plans are equivalent in terms of their final output but vary in their cost, for example, in the amount of time they need to run. This cost difference can be of several orders of magnitude and can affect system performance [3, 154]. Thus, DBMSs have a module, called the query optimizer, which examines the possible alternatives and chooses the plan with the least cost. Query optimization in the relational model is a large area that has been surveyed extensively [3, 126, 154, 155]. Since descriptive query languages for commercial OODBMSs are relatively new, most of the query optimization proposals in this model can only be found as research prototypes [20, 142, 145, 153].

Among the many query optimization strategies are the Algebraic Rewriting and Physical Optimization strategies [152, 153]. In the Algebraic Rewriting strategy, the system first transforms a query into an algebraic expression. Then, equivalent algebraic expressions governed by algebraic transformation rules are generated. The optimizer then chooses the one with the lowest cost, depending on a rule-based, cost-based, or other strategy. Algebraic transformation rules for the relational model are given in [123]. Physical Optimization deals with the implementation of indexes on collections of objects, the use of inverse relationships, replication, clustering, etc. These methods can be used to create a better execution plan using knowledge of the storage organization.


In the following section, an abstraction of a query optimization process in a DBMS and a brief description of its major components are presented. This section is in particular intended to position our contribution in relation to a query optimization process.

6.1 General Architecture of a Query Optimizer

For a given query expression, there exist several execution plans that produce the same final result but with possibly different execution costs. In a query optimization process, most of the alternatives need to be considered so that the most cost-effective plan is chosen. Calculating the expected costs of the different possible plans reveals the level of efficiency of each plan. Without a query optimizer, a system may choose a costly plan to execute a query. Query optimizers examine the possible alternatives so that the best way of executing a query can be chosen [126, 154, 158].

The path that a query traverses through a DBMS until its result is generated is shown in Figure 6.1. In this process, a Query Parser checks the validity of the query and then translates it into an internal form, usually an algebraic expression or something equivalent. The Query Optimizer examines the most effective algebraic expressions that are equivalent to the given query and chooses the one that is estimated to be the cheapest. The Code Generator transforms the access plan generated by the optimizer into calls to the query processor. Finally, the Query Processor does the actual query execution. Though these steps are widely used in many traditional DBMSs, the same process can be applied to ORDBMSs and to query processing that involves similarity-based operations on image databases.

Figure 6.1: The flow of query processing steps. [Query → Query Parser → Query Optimizer → Code Generator → Query Processor.]


An abstraction of an entire query optimization process for generating and testing the different alternatives is shown in Figure 6.2 as a modular architecture of a query optimizer [158]. In this modular architecture, the process can be seen as having two stages: the rewriting and the planning.

Figure 6.2: Architecture of a query optimizer. [The Rewriter feeds the Planner, which draws on the Algebraic Space module, the Method-Structure module, the Cost Model, and the Size-Distribution Estimator.]

The Rewriter module applies transformation rules to a given query and produces equivalent queries that are hopefully more efficient. The transformations performed by the Rewriter depend only on the characteristics of queries and do not take into account the actual query costs for the specific DBMS concerned. If the rewriting is known or assumed to always be beneficial, the original query is discarded; otherwise, it is sent to the next stage as well.

The Planner is the main module of the optimization process. It examines different possible execution plans for each query produced in the previous stage and selects the overall cheapest one to be used to generate the query result. It employs the Algebraic Space Module and the Method-Structure Module of the optimizer to examine the execution plans in order to determine the search strategy and the cost. The execution plans examined by the Planner are compared based on estimates of their cost so that the cheapest may be chosen. These costs are derived using the Cost Model and the Size-Distribution Estimator modules of the optimizer.

The Algebraic Space module determines the execution orders that are to be considered by the Planner for each query sent to it. The execution orders are usually represented by the properties of the operators and the transformation rules of the relevant algebra. For queries that involve similarity-based operators, this is where our transformation rules and the properties of the similarity-based operators can be applied.

The Method-Structure module determines the implementation choices that exist for the execution of each series of actions specified by the Algebraic Space module. This choice is related to the available methods for each operator. For example, for a relational join, nested loops, sort merge, and hash join methods can be considered. This choice is also related to the available indices for accessing each relation, which is determined by the physical schema of each database stored in its catalogs. Given an algebraic formula or tree from the Algebraic Space, this module determines the execution plans, which specify the implementation of each algebraic operator and the use of any indices.

The Cost Model module specifies the arithmetic formulas that are used to estimate the cost of execution plans. For every different method of executing an operator, for every different index type access, and in general for every distinct kind of step that can be found in an execution plan, this module contains a formula that gives its cost. Given the complexity of many of these steps, most of these formulas are simple approximations of what the system actually does and are based on certain assumptions regarding issues like buffer management, disk-CPU overlap, sequential vs. random I/O, etc. In a relational system, the important input parameters to a cost estimation formula are the size of the buffer pool used, the sizes of relations or indices accessed, and possible various distributions of values in these relations [151, 158]. The size of the buffer for each query is determined by the DBMS, whereas the size of the relations or indices accessed are estimated by the Size-Distribution Estimator.

The Size-Distribution Estimator module specifies how the sizes (and possibly frequency distributions of attribute values) of database tables and indices as well as (sub)query results are estimated. The specific estimation approach adopted in this module also determines the form of statistics that need to be maintained in the catalogs of the database.

6.2 Transformation Rules and Methods of Similarity-Based Query Optimization

Though a lot has been done on relational and object-oriented query optimization strategies, there is practically little work on similarity-based query optimization. One of the main reasons for this is that there has been no work on a similarity-based algebra [8], which is the basis for the process of query optimization. The similarity-based algebra that we proposed in this thesis enables us to exploit the properties of the operators for the purpose of similarity-based query optimization. Due to the properties of the similarity-based operators, a query in a content-based image database management system is quite different from a query in a relational DBMS. Considering the fact that most feature extraction and content-based retrieval algorithms are very expensive compared to the matching operations in relational systems, the need for query optimization strategies in image databases is even more critical. Thus, based on the definitions of the similarity-based operators presented in the previous chapter, we present here rules that are useful for query optimization in image database management systems. Furthermore, we investigate the possibilities of query optimization for complex query expressions containing both relational and similarity-based operations. The rules on similarity-based operations, combined with the traditional optimization strategies, enable us to deal with more complex query expressions on image databases. We use here the notations of our definitions of an object-relational image table in Chapter 4 and of the similarity-based operators in Chapter 5. Although many query optimization engines are mainly designed for relational systems, it is our conviction that the similarity-based algebra and the transformation rules proposed in this work will contribute to the development of future "similarity-based query optimization"¹ engines.

6.2.1 Properties of Selection-Based Algebraic Expressions

We consider two types of selection operators: a relational selection (on a relational attribute within the A component of an image table) and a similarity-based selection (on the F component of an image table). Let us denote the relational selection by δa(M), where "a" is an attribute value of M, and the similarity-based selection by δεq(M), where q is a query image object.

Then, the following properties hold true:

1. δεq(M1 ∪⁺ M2) = δεq(M1) ∪⁺ δεq(M2), where "q" stands for the query image object (pushing a similarity-based selection into an additive union);

2. δεq(M1 ⊗ε M2) = δεq(M1) ⊗ε M2 (pushing a similarity-based selection into a similarity-based join); and

3. δεq(M1 ⊕ε M2) = (δεq(M1) ⊗ε M2) ∪⁺ (δεq(M2) ⊗ε M1) (pushing a similarity-based selection into a symmetric similarity-based join, which follows from the above two).

¹ By a "similarity-based query optimization" engine we mean a query optimizer that can also handle queries involving similarity-based operators in their expressions, in addition to the traditional operators.

Proofs:

To show that these algebraic rules hold true, we proceed as follows:

1. Let m = (id, o, f, a, p) ∈ δεq(M1 ∪⁺ M2).
By definition of δ, m ∈ Rε(M1 ∪⁺ M2, q),
i.e., m ∈ (M1 ∪⁺ M2) and ||o − q|| ≤ ε.
Suppose, without loss of generality, that m ∈ M1.
Then we have m ∈ M1 and ||o − q|| ≤ ε, i.e., m ∈ δεq(M1).
Thus, δεq(M1 ∪⁺ M2) ⊆ δεq(M1) ∪⁺ δεq(M2).
Reciprocally, let m ∈ δεq(M1) ∪⁺ δεq(M2).
Suppose, without loss of generality, that m ∈ δεq(M1).
Then, by definition of δ, m ∈ M1 and ||o − q|| ≤ ε.
Thus, m ∈ (M1 ∪⁺ M2) and ||o − q|| ≤ ε,
i.e., m ∈ δεq(M1 ∪⁺ M2).
Thus, δεq(M1) ∪⁺ δεq(M2) ⊆ δεq(M1 ∪⁺ M2).
Therefore, δεq(M1 ∪⁺ M2) = δεq(M1) ∪⁺ δεq(M2).

2. Let m = (id, o, f, a, p) ∈ δεq(M1 ⊗ε M2).
By definition of δ, m ∈ (M1 ⊗ε M2) and ||o − q|| ≤ ε,
i.e., m is a referring instance of M1 (Def. 5.3) that is similar to q.
As a referring instance of M1, m ∈ M1,
and since ||o − q|| ≤ ε, m ∈ δεq(M1).
Thus, m ∈ δεq(M1) and m is a referring instance to M2,
so m is a referring instance of δεq(M1) ⊗ε M2,
i.e., m ∈ δεq(M1) ⊗ε M2.
Thus, δεq(M1 ⊗ε M2) ⊆ δεq(M1) ⊗ε M2.
Conversely, let m = (id, o, f, a, p) ∈ δεq(M1) ⊗ε M2.
Then ||o − q|| ≤ ε, m ∈ M1, and m is a referring instance to M2,
i.e., m ∈ (M1 ⊗ε M2).
So, m is a referring instance of M1 ⊗ε M2 that is similar to q,
i.e., m ∈ δεq(M1 ⊗ε M2).
Hence, δεq(M1) ⊗ε M2 ⊆ δεq(M1 ⊗ε M2).
Therefore, δεq(M1) ⊗ε M2 = δεq(M1 ⊗ε M2).

3. δεq(M1 ⊕ε M2) = δεq((M1 ⊗ε M2) ∪⁺ (M2 ⊗ε M1))
= δεq(M1 ⊗ε M2) ∪⁺ δεq(M2 ⊗ε M1), by property 1,
= (δεq(M1) ⊗ε M2) ∪⁺ (δεq(M2) ⊗ε M1), by property 2.

These properties also hold true for the relational selection operator δa(M):

• δa(M1 ∪⁺ M2) = δa(M1) ∪⁺ δa(M2) (pushing a relational selection into an additive union);
• δa(M1 ⊗ε M2) = δa(M1) ⊗ε M2 (pushing a relational selection into a similarity-based join); and
• δa(M1 ⊕ε M2) = δa(M1) ⊕ε δa(M2) (pushing a relational selection into a symmetric similarity-based join).
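These rules can also be checked mechanically on small random instances. The following self-contained Python sketch, using the same illustrative representation as the sketches of Chapter 5, tests both sides of rule 1 for the similarity-based selection:

    import math, random
    from collections import namedtuple

    Instance = namedtuple("Instance", ["id", "o", "f", "a", "p"])

    def distance(f1, f2):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))

    def sim_select(m, qf, eps):
        return [t for t in m if distance(t.f, qf) <= eps]

    def additive_union(m1, m2):
        seen, out = set(), []
        for t in list(m1) + list(m2):
            if t.id not in seen:
                seen.add(t.id)
                out.append(t)
        return out

    def rand_table(prefix, n):
        return [Instance("%s%d" % (prefix, i), None,
                         (random.random(), random.random()), None, {})
                for i in range(n)]

    m1, m2 = rand_table("a", 50), rand_table("b", 50)
    qf, eps = (0.5, 0.5), 0.2
    lhs = sim_select(additive_union(m1, m2), qf, eps)
    rhs = additive_union(sim_select(m1, qf, eps), sim_select(m2, qf, eps))
    assert {t.id for t in lhs} == {t.id for t in rhs}  # rule 1 holds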


6.2.2 Properties of Similarity-Based Join Algebraic Expressions

Based on the properties of the similarity-based operators defined and discussed in the previous chapter, the following rules are valid. These rules can effectively be used for query optimization purposes.

• (M1 ⊕ε M2) = (M2 ⊕ε M1), the commutative property of ⊕ε;

Whenever the symmetric similarity-based join ⊕ε is applied, its commutative property can be used. Since the symmetric similarity-based join is defined in terms of the similarity-based join, this is applied only when reordering the component joins can yield a benefit in terms of query optimization.

• M1⊗εM2 → Mine(M2⊗εM1), the exchange order property using Mine;

Even with the non-symmetric similarity-based join operator, we can use the Mine operator to optimize query processing. The factors for exchanging the order of the operand tables could be: the number of record instances in the tables, the density of the objects in the feature space, the value of ε, the I/O and memory requirements, etc. A query optimizer must, therefore, carefully compare the benefits of transforming an expression against its computational cost.

Furthermore, an optimal similarity-based join algorithm (or the best available similarity-based join method) that produces the same result table can be chosen.

• M1 ⊗ε^method1 M2 → M1 ⊗ε^method2 M2, the join method choice;

Here, method1 and method2 are presented as two possible choices. If there are different methods or tools for computing the similarity-based comparisons, the best one can be chosen.

6.2.3 Methods of Local Optimization

We refer to local optimization as the inner table choice in a non-symmetric similarity-based join. For instance, it could be less costly to compute M2⊗εM1 than M1⊗εM2, or vice-versa. Based on the exchange order property using Mine stated above, the additional price for the exchange of order is the computational cost of the Mine operator. That is, instead of computing M1⊗εM2, we first compute M2⊗εM1 and then apply the Mine operator on the result. It is the task of the query optimizer to decide which one-sided similarity-based join to perform first and then use the Mine operator for the other. The decision depends on the cost model for the similarity-based join.

Obviously, the cost of a similarity-based join depends linearly on the number of data points in M1. However, the dependency on the number of data points in M2 is far less obvious. This means that, compared to relational query optimization, no simple decision rule can be established for the inner table choice within a similarity-based join. Therefore, we need to compute the cost of each alternative separately, and then decide which strategy to follow.

Thus, the transformation rule: M1⊗εM2 → Mine(M2⊗εM1) is worthwhile only if cost(M1⊗εM2) > cost(M2⊗εM1) + cost(Mine(M2⊗εM1)). Compared to the respective cost of computing the non-symmetric join operation, the cost of Mine is negligible. Therefore, a test of the following form can be applied.

if cost(M1⊗ε M2) > cost(M2⊗εM1) then perform Mine(M2⊗εM1);
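In code, this test amounts to a simple guard. In the sketch below, cost_join and cost_mine are placeholders for whatever cost model the system provides; they are assumptions of this illustration, not functions of our prototype.

    def choose_join_plan(m1, m2, eps, cost_join, cost_mine):
        """Pick between computing M1 (x)eps M2 directly and computing
        M2 (x)eps M1 followed by Mine, whichever the cost model favors."""
        direct = cost_join(m1, m2, eps)
        swapped = cost_join(m2, m1, eps) + cost_mine(m2, m1, eps)
        if direct > swapped:
            return "mine(sim_join(M2, M1))"  # exchange-order plan
        return "sim_join(M1, M2)"            # direct plan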

A cost model for a simple nested-loop similarity-based join on spatial data, based on a k-NN method, is presented in [34]. This cost model builds on previously developed cost models [34, 35] for content-based image retrieval (see Annex for details). Based on this model:

cost(M1 ⊗ε M2) = N1 · ( 1 + Σ_{j∈S} (N2 / Ceff) · (1 − 2^(1−j/d'))^(d'−j) ),

where N1 and N2 are the numbers of data points in M1 and M2 respectively, Ceff is the effective capacity of a leaf partition, and d' is computed as d' = log2(N2/Ceff). A uniform object distribution is considered in this model.

This model, however, is not universal, since it is a cost estimation for a particular algorithm. In order to benefit from query optimization, future image DBMSs that involve similarity-based operations will need cost models for their own algorithms, in association with the different indexing methods they support.

In the environment where we developed our prototype, no query optimization strategies are used for the similarity-based operators. Since we used a proprietary, non-open image-processing engine developed by a commercial company, we had no option to develop a cost model for it. We have, however, evaluated the cost of processing similarity-based join operators in association with the Mine operator. These experimental results are presented in Section 6.4.


6.3 Performance Evaluation Experiment

In this section, we describe the experiments we performed to evaluate the performance of our similarity-based operators defined in Chapter 5 and implemented in the EMIMS prototype (Chapter 7). In particular, we conducted experiments that evaluated the performance of a similarity-based join with and without the use of the operator Mine.

There are different factors that can affect the performance of a query process among which are the size of the buffer pool used, the disk-CPU overlap, the sequential vs. random I/O, the sizes of the tables or the indices accessed, etc. Moreover, the value of ε and the efficiency of the similarity-based search engine are also important factors that affect query execution efficiency. We performed this experiment on a Pentium III 500Mhz machine with 256 M memory.

Since considering all these factors in the performance evaluation of the Mine operator in a similarity-based join would be difficult, we kept some of them constant in the experiment we conducted.

For our experiments, we created image tables containing up to 1000 image records. For the sake of convenience and practicality, we fixed the similarity threshold value ε at 5.0, so that only the few most similar images are selected.

Table 6.1: Time (in seconds) for similarity-based join with varying image table size.

Size of M1 Size of M2 M1 ⊗ε M2 M2 ⊗ε M1 Mine(M1 ⊗ε M2) Mine(M2 ⊗ε M1)
10 100 25.2 11.3 3.1 3.3
10 200 42.2 19.3 4.0 3.6
10 300 64.3 28.7 7.2 3.2
10 400 75.5 35.8 6.2 3.5
10 500 82.6 46.0 9.1 5.2
10 600 108.6 55.6 15.2 5.6
10 700 152.5 65.0 18.4 6.1
10 779 170.0 72.5 22.9 6.8
10 900 177.2 82.1 27.3 7.3
10 1000 215.9 90.3 26.9 6.9

We performed similarity-based joins on two tables: M1 (with a fixed size of 10 images) and M2 (with a size varying from 100 to 1000 images). As shown in Table 6.1 and graphically in Figure 6.3, we compare the computation time with respect to the size of M2 for the operations M1 ⊗ε M2, M2 ⊗ε M1, Mine(M1 ⊗ε M2), and Mine(M2 ⊗ε M1).

Furthermore, we applied the Mine operator and evaluated its performance in optimizing the similarity-based join. In our experiments, we executed 400 similarity-based join and Mine operations. To reduce uncertainty in the measurements due to external factors such as varying operating system load, we performed every operation 10 times and took the average execution time.

In this test, we see that the execution time for the similarity-based join M2⊗εM1 is linear with respect to the size of the image table M2. This is logical, since the join algorithm performs a loop over each instance of the image tables. Since the size of one of the tables is fixed, the execution time is proportional to the size of the other table.

[Figure: execution time in seconds versus the number of images in M2, with four curves: M1⊗εM2, M2⊗εM1, Mine(M1⊗εM2), and Mine(M2⊗εM1).]

Figure 6.3: Time for similarity-based join with varying image table size.

We also observe that M1 ⊗ε M2 is always more expensive than M2 ⊗ε M1, and the difference increases almost linearly with the number of records in M2. Since M1 ⊗ε M2 and M2 ⊗ε M1 perform the same number of signature comparisons, the difference in cost cannot be explained by this operation. However, while computing M1⊗εM2, the operation takes a signature S1 from M1, loads it in memory, and then loads the signatures of M2 in memory to perform the comparison. Since M2 contains a large number of records, not all of its signatures can be loaded in memory at once. The server therefore loads one bucket of signatures of M2 at a time. S1 is compared to this bucket of signatures; the server then unloads this bucket and loads the next one, until S1 has been compared to all signatures of M2. This swapping effect increases the cost of M1⊗εM2. On the contrary, for M2⊗εM1, all signatures of M1 can be loaded in memory at once, since M1 has a small number of records.

Another observation is that the execution time of the Mine operator is small in comparison to that of the similarity-based join operator, particularly for large tables. This is because the Mine operator does not perform a similarity-based operation. We also note that the cost of Mine(M1⊗εM2) is greater than that of Mine(M2⊗εM1), with the difference increasing with the size of M2. Mine(M1⊗εM2) actually creates M2⊗εM1, which is a table with a large number of records compared to M1⊗εM2. The more records in a table, the more insert operations need to be performed. This makes Mine(M1⊗εM2) more expensive than Mine(M2⊗εM1). From these observations, we can deduce a good strategy for choosing the better query plan, which can effectively be used in query optimization heuristics.

Table 6.2: Symmetric similarity-based join, M1 ⊕ε M2, with two different strategies.

Size of M1 Size of M2 M1 ⊕ε M2 M1⊗εM2 ∪+ Mine(M1⊗εM2)
10 100 36.5 14.6
10 200 61.5 22.9
10 300 93.0 31.9
10 400 111.3 39.3
10 500 128.6 51.2
10 600 164.2 61.2
10 700 217.5 71.1
10 779 242.5 79.3
10 900 259.3 89.4
10 1000 306.2 97.2

When a symmetric similarity-based join operation is computed, the advantage of applying the Mine operator can clearly be seen in Table 6.2 and graphically in Figure 6.4.
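Under the same illustrative abstractions as the Java sketch given earlier, the cheaper strategy of Table 6.2 can be expressed as follows; union is one more assumed helper standing for the ∪+ operator, not an EMIMS method.

// Two further members of the JoinOrderChooser sketch above:
abstract ResultTable union(ResultTable left, ResultTable right);    // stands for ∪+

ResultTable symmetricViaMine(String m1, String m2, double epsilon) {
    ResultTable oneSided = similarityJoin(m1, m2, epsilon);  // M1 ⊗ε M2, computed once
    // Mine derives the converse join M2 ⊗ε M1 from the P component of the result,
    // so the expensive signature comparisons are not repeated.
    return union(oneSided, mine(oneSided));                  // M1⊗εM2 ∪+ Mine(M1⊗εM2)
}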


[Figure: execution time in seconds versus the number of images in M2, with two curves: M1⊗εM2 ∪+ M2⊗εM1 and M1⊗εM2 ∪+ Mine(M1⊗εM2).]

Figure 6.4: Symmetric similarity-based join, M1 ⊕ε M2, with two different strategies.

6.4 Summary

Query optimization in the relational model is a wide area that has been studied extensively. As a result, most relational DBMSs have a module, called the query optimizer, which examines the possible alternative plans of query execution and chooses the one with the least cost in order to enhance query execution efficiency. These plans are equivalent in terms of their final output but vary in their cost, particularly in the amount of time they need to run. Most of the query optimization strategies in a relational system are based on the properties of the algebraic operators.

The similarity-based operators that we defined in Chapter 5 possess useful properties that can be exploited for query optimization. In this Chapter, we first reviewed the general structure of a query optimization process. Then, we investigated how similarity-based query optimization can be designed. We particularly considered algebraic query optimization, based on which we proposed useful transformation rules for the similarity-based operators. These rules can be used to formulate alternative plans for executing the same algebraic expression. Out of these alternative plans, the one with the least cost is chosen.

Furthermore, we conducted experiments and evaluated the performance of the different formulations of the similarity-based joins. We also evaluated the use of the Mine operator. The Mine operator can thus be used to reduce the costs of the similarity-based join and the symmetric similarity-based join.

This work defines the basis for developing an optimizer dedicated to composing hybrid queries. However, formalizing and implementing a fully operational optimizer is a very complex task that constitutes a research subject by itself and is beyond the scope of this PhD work. Our goal in this Chapter was to make a first investigation of the optimization possibilities associated with our algebraic operators.

CHAPTER 7

EMIMS (EXTENDED MEDICAL IMAGE MANAGEMENT SYSTEM)

A number of commercial DBMSs have been upgraded to Object-Relational DBMSs in order to possess fundamental characteristics such as base type extension, support for complex objects, inheritance, etc. [14]. ORDBMSs permit systems to integrate tools for managing multimedia data such as images, video, audio, or heterogeneous types of data, without losing their capability to manage traditional alphanumeric data. Hence, this data model can handle our proposals in this thesis. To show the validity and practicability of all our proposals, we developed a prototype called EMIMS (Extended Medical Image Management System). To realize this prototype, we considered and compared three ORDBMSs: Oracle, Informix, and DB2 Universal Database. We finally chose the Oracle DBMS, because of the high-quality tools it integrated in its recent versions to support the management of multimedia data, its popularity, and the availability of sufficient support material on the Web1. We used the latest version, Oracle 9i, for its better support for managing multimedia data compared to its predecessor, Oracle 8i. Starting with version 8, Oracle has provided tools that can be used to manage multimedia data in a database environment [156].

1 Oracle Technology Network (OTN) website at: http://otn.oracle.com/


In a previous research work at LISI, INSA de Lyon, a prototype called Medical Image Management System (MIMS) was developed by R. Chbeir [59]. MIMS was designed to manage medical images under a relational DBMS as a means to validate the image data model, the image indexing methods, an iconic language, and the spatial relationships between image objects proposed by the author. To enhance this system so that it can support similarity-based operations on medical images, we developed novel modules that realize our proposals and incorporated them into MIMS, hence creating EMIMS (Extended MIMS).

EMIMS is developed in Java under Oracle 9i using the JDBC interface. It uses the image data repository model proposed in this thesis to store image data, and it implements the similarity-based operators defined in this thesis. EMIMS provides an interface that can store and retrieve medical images using multi-criteria (i.e., using both similarity-based and metadata-based features of image data) query formulations. In this Chapter, we present the components and functionalities of EMIMS. First, the content-based image management facilities offered by the interMedia component of Oracle 9i are described. Then, we present a general architecture that shows the position of EMIMS in an ORDBMS, in particular in the structure of Oracle 9i. Finally, we describe the image management components of the prototype and the different interfaces that we developed in EMIMS.

7.1 The Oracle interMedia Module

The interMedia module is an image management tool developed by Virage Co.1 and later adopted by Oracle in order to enhance Oracle’s traditional safe and efficient data management features with multimedia data management capabilities. Oracle interMedia of Oracle 9i makes it possible to manage multimedia data such as images, audio, video, or other heterogeneous media data in an integrated manner with other enterprise information. Oracle 9i provides support for the definition of object types, including the data associated with these objects and the operations (methods) that can be performed on them. Database applications and plug-in modules written in Java, C++, or in traditional 3GLs can interact with interMedia through modern class library interfaces, or through PL/SQL and the Oracle Call Interface (OCI). The Oracle 9i interMedia module provides the means to insert multimedia data such as image, audio, video, or other heterogeneous media columns or objects into new or existing tables, and it also facilitates the update and retrieval of such multimedia data. These new features make it possible to extend existing database applications with multimedia data or to build new end-user multimedia database applications. It is also designed in such a way that it can be extended to create a new object type or a new composite object type based on the provided multimedia object types. It supports specialized plug-ins to manage external sources of image, audio, video, or other heterogeneous media data that are not supported in its current version.

1 For more information visit http://www.virage.com

Within Oracle interMedia, image data is handled through an object-relational type known as ORDImage. interMedia also includes other object-relational types, ORDAudio, ORDVideo, and ORDDoc, to manage audio, video, and heterogeneous documents respectively [157]. ORDImage supports two-dimensional, static, digitized raster images stored as binary representations of real-world objects or scenes. The image data can have varying depths (bits per pixel) depending on how the image was captured and coded, and can be organized in various ways. This organization of the image data is known as the data format. ORDImage can store and retrieve image data in many data formats.

Though it possesses many image management tools, the system lacks support for useful operators that would enhance the efficient management of images and other multimedia data. For example, for content-based image management, it has no support for database-table-oriented similarity-based operators such as the similarity-based join on two or more image tables. Our major contribution in this prototype is, therefore, the incorporation of our proposals for database-table-oriented similarity-based and other relevant operations. With this prototype, we show that our proposals in Chapters 4, 5, and 6 work correctly in a generic DBMS environment. The next subsections describe how these operations work internally.

7.1.1 The Similarity-Based Comparison

Before a similarity-based comparison, every image inserted into a database table is analyzed, and a compact representation of its content is stored in a feature vector, or signature, column. The signature contains the global color, texture, and shape information, along with object-based location information, to represent the visual attributes of the entire image. Thus, any query operation deals solely with this abstraction rather than with the image itself. Images are thus compared based on their color, texture, and shape attributes. The positions of these visual attributes in the image are represented by the location. Location by itself is not a meaningful search parameter; but in conjunction with one of the three visual attributes, it represents a search scheme where the visual attribute and its location within the image are both important. More precisely, the signature generated by the ORDImage image analysis engine contains information about the following visual attributes:

• Color: represents the distribution of colors within the entire image. This distribution includes the amounts of each color, but not their locations.

• Texture: represents the low-level patterns and textures within the image, such as graininess or smoothness. Unlike shape, texture is very sensitive to features that appear with great frequency in the image.

• Shape: represents the shapes that appear in the image, as determined by color-based segmentation techniques. A region of uniform color characterizes a shape.

• Location: represents the positions of the shape, color, and texture components.

The feature vector data for all these visual attributes is stored in a signature, whose size typically ranges from 3 to 4 kilobytes. This feature data is thus small compared to the image data and easily manageable.

Images in a database table can be retrieved by similarity-based matching with a comparison image. The comparison image can be any image inside or outside the current database, a sketch, or an algorithmically generated image. Images are seldom identical, and therefore matching is based on a similarity-measuring function on the visual attributes and a set of weights for each attribute. The score is the relative distance between two images being compared. The score for each attribute is used to determine the degree of similarity when images are compared, with a smaller distance reflecting a closer match.

7.1.2 How the Comparison Works

When images are compared for similarity, an importance measure or weight is assigned by the user to each of the visual attributes and interMedia calculates a similarity measure for each visual attribute. Each weight value reflects how sensitive the matching process for a given attribute should be to the degree of similarity or dissimilarity between two images. For example, if we want color to be completely ignored in the matching, we assign a weight of 0.0 to color. In this case, any similarity or difference between the colors of the two images is totally irrelevant for the matching. On the other hand, if color is extremely important, we assign it a weight greater than any of the other attributes. This will cause any similarity or dissimilarity between the two images with respect to color to contribute greatly to whether or not the two images match.


Weight values can be between 0.0 and 1.0. During processing, the values are normalized such that they total 1.0, still maintaining the ratios supplied. The weight of at least one of the color, texture, or shape attributes must be set to a value greater than zero.

The similarity measure for each visual attribute is calculated as the score or distance between the two images with respect to that attribute. The score can range from 0.0 (no difference) to 100.0 (maximum possible difference). Thus, the more similar two images are with respect to a visual attribute, the smaller the score will be for that attribute. In reality, when images are compared for similarity, the degree of similarity depends on a weighted sum reflecting the weight and distance of all three of the visual attributes.

When images are compared, a threshold value is assigned. If the weighted sum of the distances for the visual attributes is less than or equal to the threshold, the images match; otherwise, they do not. The threshold value ranges from 0 to 100. We reflected the concept of a threshold value with the parameter ε in our similarity-based algebra in Chapter 5.
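The matching rule just described can be summarized by a small Java sketch (not interMedia’s actual code): the supplied weights are normalized so that they total 1.0, and the weighted sum of the attribute distances is compared to the threshold.

class SimilarityMatch {
    static boolean matches(double[] weights, double[] distances, double threshold) {
        // Normalize the weights so that they total 1.0 while keeping their ratios;
        // at least one weight is required to be greater than zero.
        double total = 0.0;
        for (double w : weights) total += w;
        double weightedSum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += (weights[i] / total) * distances[i];  // distances in [0, 100]
        }
        return weightedSum <= threshold;  // match iff weighted distance <= threshold
    }
}

For example, with weights {0.6, 0.2, 0.2} for color, texture, and shape, and attribute distances {10, 50, 80}, the weighted sum is 0.6·10 + 0.2·50 + 0.2·80 = 32, so the two images match for any threshold of 32 or more.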

7.2 General Architecture of EMIMS under a DBMS

In order to obtain a platform and database-independent application, EMIMS was implemented in Java. EMIMS is designed to run either as an applet accessible on the Web or as a standalone application. As discussed above, EMIMS is developed on top of the Oracle 9i DBMS. Figure 7.1 presents a simplified general architecture of EMIMS. This presentation also shows how a content-based retrieval module can be integrated with the existing image data management systems.

EMIMS interacts with an Oracle 9i database through JDBC. It has two main components: the visual interface and the query manager. The visual interface includes the data-entry and the query interfaces. The details of these interfaces are given in the next Section.
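As an illustration of this JDBC interaction, the Connection class described in Section 7.3 might be set up along the following lines. This is an assumed sketch, not the actual EMIMS code, although the attributes and methods follow the class diagram of Figure 7.2.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class DatabaseConnection {
    private Connection connection;

    Connection getConnection(String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        if (connection == null) {
            Class.forName("oracle.jdbc.driver.OracleDriver");  // the Oracle JDBC driver
            connection = DriverManager.getConnection(url, user, password);
        }
        return connection;  // kept open until the user quits the program
    }

    void close() throws SQLException {
        if (connection != null) connection.close();
    }
}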

Oracle 9i stores rich content in tables along with traditional data through interMedia. A server-side media parser and image processor is supported through the Oracle 9i Java Virtual Machine (JVM). The media parser has object-oriented and relational interfaces, supports format and application metadata parsing, and includes a registry for new formats and extensions. The image processor includes Java Advanced Imaging (JAI) classes and provides image-processing tools for converting, matching, and indexing images.


The interMedia Java classes make it possible for JDBC result sets to include both traditional relational data and interMedia media objects. This support enables applications to easily select and operate on a result set that contains interMedia columns plus other relational data. These classes also enable access to interMedia object attributes and invocation of interMedia object methods.

[Figure: layered architecture with EMIMS (Visual User Interface, Query Manager) on top of Oracle 9i, whose JVM hosts the interMedia Java classes, the PL/SQL cartridge, the media parser, and the JAI image processor, above the Oracle 9i query processors and data repositories for traditional and media data.]

Figure 7.1: The Architecture of EMIMS under a general DBMS.

7.3 The Structure of EMIMS

As noted above, EMIMS is implemented in Java to obtain a platform- and database-independent application. Figure 7.2 shows the different components and classes of EMIMS. Each component of the structure is briefly described below.


• The EMIMS Data Entry Visual Interface: a Swing JApplet designed to enter the different types of medical data (see Section 7.5.1 for more details).

• The Query Interface: a Swing JApplet interface designed to formulate multi-criteria queries (see Section 7.5.2 for more details).

• Connection: a class that creates a connection to the Oracle 9i database server and maintains it until the user quits the program. It uses the URL of the database server, the username, the password, and the database driver path as attributes.

[Class diagram: the Medical Data Entry Interface and the Query Interface are each connected (1..1) to the Query Manager, which holds the image table parameters (String Id, String O, String F, String[] Metadata, private Connection connection) and the methods Insert(image, table), SimJoin(left_table, right_tables[], threshold, feature vectors), QBE(image, table), and Mine(left_table, right_table). The Query Manager reaches the Oracle 9i database through the Oracle JDBC driver, via the Connection class (String url, String user, String password, String database driver; GetConnection(), Close()).]

Figure 7.2: The class diagram of EMIMS in UML representation.

• The Query Manager: performs all of the query operations, including the similarity-based operations, through Oracle’s JDBC driver. The Query Manager uses the schematic structure M(id, O, F, A, P) for all image tables, with respect to our repository model. We described the details of this image repository model in Chapter 3. However, to present it in association with the implementation environment, we give a brief review of the components of this schema below:

• id: a unique identifier of the image record that serves as the primary key of the image table. Depending on the requirements, the object identifier (OID) of Oracle can be used as well. More discussion on OIDs is available in [16, 157].

• O: used to store the raw image data as an ORDImage data type of Oracle 9i. ORDImage is an extension of a BLOB data type. It has many relevant components and useful associated methods [157].

• F: the feature vector representation of the image. It uses the ORDImageSignature data type of Oracle 9i. It includes the descriptions of the image in terms of color, shape, and texture features and their location characteristics.

• A: an alphanumeric metadata structure that contains semantic data related to the image or a key that associates the image with other tables.

• P: a table of vectors that keeps track of the association between image tables based on a binary similarity-based operator. It is composed of the following three components:

• id: the identifier of the record whose image is found to be similar during a similarity-based join operation.

• table: the name of the associated image table by the similarity-based operation.

• score: the similarity distance between the two compared images. It is a real number between 0 and 100. A zero score means that the two images have the highest similarity.

In the class diagram, these components are presented as string types. This is because the Query Manager considers the attribute names of the image tables for the purpose of associating the tables for binary operations. Other forms of organization are also possible in this particular case.

The Query Manager class implements a part of the Image Query Processor. The Connection class implements the Driver Manager component of the architecture. When EMIMS is launched, an instance of the Query Manager class is created. Then, the new object establishes the connection with the database server by creating a Connection object. During the session, each user action that requires access to the database is translated to the DBMS using the methods of the Query Manager object. Through Oracle’s JDBC Driver Manager, EMIMS interacts with an Oracle 9i database. The main methods of the Query Manager class are:

• SimJoin: a method that performs a similarity-based join operation between a reference image table and one or more other image tables. This operation creates a new table with identical schematic structure of the reference image table. This table contains all the records of the reference image table whose image components are found to be similar with image components of the associated tables (see Section 7.5.2.3 for more details).

• Query by Example (QBE): a method that, for a given example image, searches all similar images in a specified image table. The result of this query is an image table with all pertinent records of the specified table. Furthermore, it displays all the images that are found similar. Each similar image is displayed together with a button which, when clicked, shows information associated to the displayed image (see Section 7.5.2.2 for more details).

• Insert: a method that inserts an image into an image table chosen by the user. It converts the image to an ORDImage data type, generates its signature, and updates the O and F components of the table where the image is inserted (see Section 7.5.1.2 for details).

• Mine: a method that extracts the table M2⊗εM1 from the resulting table of a similarity-based join, M1⊗εM2, using the contents of its P component. It is important to recall here that the similarity-based join, M1⊗εM2, is not symmetric and hence the above two tables are different. The Mine operator can best be exploited for the purpose of query optimization. That is, based on the costs of computing M1⊗εM2 and that of M2⊗εM1, it is possible to first compute the one with the least cost and then apply the Mine operator, if it pays to do so.
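The following sketch illustrates why Mine is cheap: it only inverts the (id, table, score) vectors stored in the P component, without performing any similarity computation. The types are illustrative and do not reproduce the EMIMS code.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MineSketch {
    static class PEntry {                          // one vector of the P component
        final int id; final String table; final double score;
        PEntry(int id, String table, double score) {
            this.id = id; this.table = table; this.score = score;
        }
    }

    // Input: the P vectors of M1 ⊗ε M2, keyed by the id of each M1-side record.
    // Output: the P vectors of M2 ⊗ε M1, keyed by the id of each M2-side record.
    static Map<Integer, List<PEntry>> mine(Map<Integer, List<PEntry>> m1JoinM2,
                                           String m1Name) {
        Map<Integer, List<PEntry>> converse = new HashMap<Integer, List<PEntry>>();
        for (Map.Entry<Integer, List<PEntry>> row : m1JoinM2.entrySet()) {
            for (PEntry e : row.getValue()) {
                List<PEntry> p = converse.get(e.id);
                if (p == null) { p = new ArrayList<PEntry>(); converse.put(e.id, p); }
                // The similarity score is symmetric, so it is simply carried over.
                p.add(new PEntry(row.getKey(), m1Name, e.score));
            }
        }
        return converse;
    }
}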

For all similarity-based operations, two parameters must be specified: the threshold and the feature coefficients. The feature coefficients specify the weights of each of the visual feature attributes (color, texture, shape, and location) to be used for the similarity-based comparison. The value of each coefficient is a real number between 0 and 1.


An image similarity engine, integrated into the database management system as a plug-in, performs the comparison between two images. This engine returns the similarity score (a real value in the interval [0, 100]) as a result of comparing the signatures representing the images, weighted by the feature coefficients. If this score is lower than the specified threshold, the two images are considered similar.

7.4 The sample database used in EMIMS

Our proposals for an image data repository model, a similarity-based algebra, and transformation rules for similarity-based query optimization are generic enough to be integrated into a standard DBMS. Thus, they can be implemented for image management in any application domain. As a sample database for our prototype implementation, we use a medical application1.

In medicine, a large number of images of various imaging modalities (e.g., computer tomography, magnetic resonance, etc.) are produced daily and used to support decision-making. Efficient image management capabilities can be extended to provide valuable teaching, training, and enhanced image interpretation support by developing techniques for the automated archiving and retrieval of images by content. For example, before making a diagnosis, a clinician could retrieve similar cases from a medical archive and use the result for better decision-making. Such content-based retrievals would yield not only cases of patients with similar image examinations, but also cases of patients with similar image examinations and different diagnoses. In many cases, image content descriptions are used to search an image database and to determine which images satisfy a certain query criterion.

It is important to note here that the effectiveness of an image database management system ultimately depends on the type of image queries allowed, on the types and correctness of the image content representations used, and on the effectiveness of the search techniques implemented. Query formulation in an image database must thus be more flexible and convenient than the query formulations allowed in traditional command-line oriented expressions. Because we also deal with visual data in image databases, the design of visual query interfaces is indispensable. In this respect, our prototype incorporates the necessary tools required for content-based image management in addition to the traditional facilities.

1 Since this research work is part of the MEDIGRID project, we chose to build our prototype on a medical application.

Our sample medical database addresses a standard medical image data management requirement. It keeps medical records of patients, personal information of medical doctors, information related to the hospital, and medical images related to a medical examination. Below, we give the schematic structure of the tables of this database. The tables that contain the medical images are organized based on our image data repository model.

− Doctor(DSN, FirstName, LastName, Specialization, P_history); information related to a medical doctor that is uniquely identified by the Doctor’s Security Number (DSN).

− Hospital(H_Code, Name, Address, Sections); information related to the hospital that is uniquely identified by the Hospital Code(H_Code).

− Medical_Exam(SSN, DSN, H_Code, ME_Code, DateOfExam, Clinical_P, Case, M_History, Findings, Diagnosis, M_Image); a detailed medical examination record identified by the primary key, Medical Exam Code (ME_Code).

− Image_Description(ME_Code, DSN, I_Specified, DeviceUsed, I_Analysis); a textual description of the image and the device used.

− Patient(SSN, FirstName, LastName, DateOfBirth, R_Address, R_History, M_History); identified by the Social Security Number (SSN) field.

All image tables in the database have the following structure:

− Mi (ID, O, F, ME_Code, Image_Path, P), where i is an index of the form 1, 2, 3, …

The ME_Code field makes it possible to retrieve information related to an image in Mi from the other relational tables. Since images need to be of the same category for similarity-based comparison, the Mi tables are populated only with a collection of lung X-ray images. We obtained these images from a collection of lung X-ray images prepared for teaching purposes and available on the Web1.
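For illustration, an image table Mi of this structure could be created with DDL of the following kind, issued through JDBC. This is an assumed sketch: in particular, P_TABLE_TYPE stands for a previously created nested-table type of (id, table name, score) objects, and the column sizes are arbitrary.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class ImageTableDDL {
    static void createImageTable(Connection connection, String name) throws SQLException {
        Statement stmt = connection.createStatement();
        stmt.executeUpdate(
            "CREATE TABLE " + name + " (" +
            " ID NUMBER PRIMARY KEY," +
            " O ORDSYS.ORDIMAGE," +               // the raw image data
            " F ORDSYS.ORDIMAGESIGNATURE," +      // the image signature
            " ME_CODE NUMBER," +                  // link to the Medical_Exam table
            " IMAGE_PATH VARCHAR2(256)," +
            " P P_TABLE_TYPE)" +                  // vectors (id, table, score)
            " NESTED TABLE P STORE AS " + name + "_P");
        stmt.close();
    }
}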

With this database, we demonstrated how multi-criteria queries can effectively be formulated using our proposals. Hence, queries that combine similarity-based operators on image contents with the traditional methods can be formulated as presented in Section 7.5.2.

1 http://www.radiology.co.uk/xrayfile/xray/cases/4/skel.htm

7.5 The Visual User Interfaces of EMIMS

The visual user interfaces of EMIMS provide the mechanisms by which the user interacts with the system. The current implementation is written as a Java applet and can run standalone or be made accessible through the Internet. The user interface allows users to graphically construct content-based as well as traditional text-based queries. We present here the two groups of visual interfaces of EMIMS: the medical data entry interfaces and the query interfaces.

7.5.1 The Medical Data Entry Interface

The medical data entry interface of EMIMS is composed of three panels: the Context/Domain-Oriented data entry panel, the Image Content-Oriented data entry panel, and the Medical Exam data entry panel. This organization of the Medical Data Entry interface is based on the image data model we used (see Section 3.8.1, Figure 3.14). It permits capturing all relevant information related to a medical image. Each of these data entry interfaces is described below.

7.5.1.1 The Context/Domain-Oriented Data Entry Panel

The Context/Domain-Oriented Data Entry Panel is a data entry interface for entering the alphanumeric data related to the patient, the medical doctor, and the hospital (see Figure 7.3). With the help of this interface, we can insert or update the personal data of the patient, the profession-related data of the medical doctor who did the follow-up, and relevant information about the hospital where the treatment of the patient takes place.


Figure 7.3: Screenshot of the Context/Domain-Oriented Data Entry Interface.

7.5.1.2 The Image Content Oriented Data Entry Panel

The Image Content-Oriented Data Entry Panel is a visual image data entry interface used to select an image and insert it into an image table (see Figure 7.4). The “Browse Image” button on this panel allows the user to select an externally stored image and visualize it in a panel before inserting it into an image table. The ‘Select Image Table’ combo-box at the left is used to select the image table into which we want to insert the selected image. This combo-box offers a choice of all available image tables. Then, the patient’s Social Security Number (SSN), the medical Doctor’s Security Number (DSN), and the Medical Exam Code (MExam_Code) associated with the chosen image are entered. The “Insert Image” button performs two tasks, as sketched below. A click on this button inserts the selected image into the specified image table. Furthermore, it calls the image processing method that generates the signature of the image and updates the F column of the image table. The insertion of the signature in the F column makes it possible to perform any similarity-based operation on this particular image record. There is a message box that displays textual messages concerning the success or failure of the data entry process. For our sample database, this interface adequately fulfills all image data entry requirements.
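The second task can be sketched as follows: once the row exists with its O column loaded and its F column initialized, an anonymous PL/SQL block invokes the generateSignature method of ORDImageSignature server-side. The code is an assumed approximation, not the actual EMIMS Insert method.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

class SignatureGenerator {
    static void updateSignature(Connection conn, String table, int id) throws SQLException {
        CallableStatement cs = conn.prepareCall(
            "DECLARE" +
            "  img ORDSYS.ORDIMAGE;" +
            "  sig ORDSYS.ORDIMAGESIGNATURE;" +
            "BEGIN" +
            "  SELECT O, F INTO img, sig FROM " + table + " WHERE ID = ? FOR UPDATE;" +
            "  sig.generateSignature(img);" +   // analyze the image, fill the signature
            "  UPDATE " + table + " SET F = sig WHERE ID = ?;" +
            "END;");
        cs.setInt(1, id);
        cs.setInt(2, id);
        cs.execute();
        cs.close();
    }
}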


Figure 7.4: Screenshot of the Image Content Oriented Panel in the Data Entry Interface.

7.5.1.3 The Medical Exam Data Entry Panel

The Medical Exam Data Entry Panel is intended to be used by medical doctors to enter the data related to the medical exam of a patient, including all the cases, findings, and diagnoses associated with a medical examination (see Figure 7.5). First, the security codes of the patient and the medical doctor, the hospital code, the date of the medical examination, and the medical exam code are filled in. Then, the doctor can enter his investigation of the Clinical Presentation, the Case, his Findings, and the Diagnosis. If the doctor recommends that a medical image (say, an X-ray image) of the patient be taken, he specifies this in the lower part of the panel. This panel also contains tools that the doctor can use to browse the data entered.


Figure 7.5: Screenshot of the Medical Exam Data Entry Interface.

7.5.2 The Query Interfaces of EMIMS

EMIMS has three query interfaces: the Iconic Query Interface, the Query by Example panel, and the Multi-Criteria Query Panel. These panels permit the formulation of different types of queries. The interfaces are organized so that the user can choose the type of interface that best suits his query requirements. Each of these query interfaces is described below.

7.5.2.1 The Iconic Query Interface

The Iconic Query Interface is a panel used to formulate queries using an iconic interface, as shown in Figure 7.6. It is migrated from the prototype MIMS [59]. It uses iconic and hypermedia-based interfaces to store and retrieve medical images. Using this interface, the user can formulate queries on the external, physical, spatial, and semantic features of medical image data. Typically, it provides support for spatial operations on salient objects based on the models introduced in [59]. The three types of spatial relations considered are:

• Metric relations: determined in terms of the proximity between salient objects, expressing the closeness of the objects, such as: near, far, etc.


• Directional Relations: generally determined on the basis of the direction between objects such as: right, left, north, east, etc.

• Topological relations: describe the intersection and the incidence between objects such as: disjoint, touch, overlap, etc.

Compared to our model, the spatial relations are captured in the component A of the image data repository model or in a relational table associated to it by a foreign key.

Hence, operations are treated in the traditional manner. For example, consider the query below:

Retrieve all brain X-rays taken between 01/01/2000 and 12/31/2000 where an anomaly is positioned as the image on the screen (upper-left part of the left lobe), identified as hypervascularized tumor, with a dark gray dominant color, and an area greater than 30 mm square.

To formulate a query, the user starts by specifying the external space data (image type, acquisition date, patient name, etc.). Then, the user chooses a medical organ with an appropriate incidence so that it corresponds to the desired X-ray images. The medical organ pattern then appears, with or without its regions, depending on the user’s choice. Clicking on a Medical Region launches a process (thread) that determines compatible medical anomalies, which appear as icons in a new panel. When the mouse passes over an icon, its meaning is displayed. The user can then easily locate the desired medical anomaly. The user can also type the anomaly’s name (or code) in a text field, whereupon the corresponding icon is automatically displayed.

Positioning the medical anomaly icon launches another process that finds and creates the appropriate spatial relations with its Medical Region. Moreover, it searches for other icons (in the region) and calculates the spatial relations between them. By choosing an icon, the user can at any moment add any related content-based data (physical, spatial, and semantic) or relation, and change any calculated relation.

To formulate the above query, we use our image data repository model M(id, O, F, A, P) considering that the schema of the components is given as: F(Descriptor, Model, Value), A(ES, Sem_F, R), where Sem_F and R are given as Sem_F(Type, Description) and R(ids, id, Relation). For the salient objects, we use our model S(ids, Fs, As), where Fs(Descriptor, Model, Value), As(Type, Description), and ids are as described in Section 4.3.


Figure 7.6: Screenshots of the Iconic Query Panel of the MIMS user interface to formulate the above query.

The step-by-step formulation of the query illustrated visually in Figure 7.6 can be stated using the traditional SQL language as follows, where the notation IOS stands for Image-Oriented Subspace.

S10: SELECT *
     FROM S
     WHERE (S.As.Type = “name”) AND (S.As.Description = “tumor”)

S11: SELECT *
     FROM S10
     WHERE (S10.As.Type = “State”) AND (S10.As.Description = “Hypervascularized”)

S12: SELECT *
     FROM S11
     WHERE (S11.Fs LIKE (“Dominant Color”, “RGB”, [5, 20, 135]))

S’:  SELECT *
     FROM S12
     WHERE (S12.Fs >= (“Volume”, *, 30))

SELECT M.O
FROM M, S’
WHERE (M.A.ES.IOS.Type = “X-ray”)
  AND (M.A.ES.IOS.Date BETWEEN [01/01/00, 12/31/00])
  AND (M.A.Sem_F.Type = “Organ”)
  AND (M.A.Sem_F.Description = “Brain”)
  AND (M.A.R.ids = S’.ids)
  AND (M.A.R.id = M.id)
  AND (M.A.R.Relation = “up-left”)

7.5.2.2 The Query by Example (QBE) Interface

The Query by Example (QBE) Interface is a visual panel that is mainly used to formulate a similarity-based selection query by giving an image as a query sample, as shown in Figure 7.7. The “Browse” button on this panel allows the user to select an externally stored example image and visualize it in the scroll panel before executing the QBE operation. The “Threshold” text box is used to enter the value of the threshold. This value is the ε parameter that we used when we defined our similarity-based operators in Chapter 5. The combo-box at its right is used to select the image table on which we want to perform the similarity-based selection query. This combo-box offers a choice of all available image tables. The “Search Similar Images” button performs multiple tasks. A click on this button searches for all similar images in the selected table and displays them in the scroll panel, creates and shows the result table in another scroll panel, and displays the SQL-like1 query expression that generated this result in an editable text box. This last SQL query expression can be modified and used to reformulate the query in any other possible expression. Thus, our QBE has many additional interesting features that show a real integration of such systems with a standard DBMS, contrary to many other content-based Query by Example systems in the literature [55, 86, 103, 104, 105].

1 We say SQL-like because the command includes similarity-based operators that are not included in the SQL standard.

Figure 7.7: Screenshot of the QBE Visual Interface of EMIMS.

As a result of a query, the image table that the QBE of EMIMS produces contains all the instances of the table whose image objects are found to be similar by the similarity-based operation. Other QBE systems display only the similar images on a visual interface. The importance of the created image table is that all information associated with the selected images is retrieved and presented in the form of an image table. This new table can be saved in the database so that it can be used for a subsequent operation or for a relevant future query. To save the result image table, the interface contains a text box to enter a table name and a button to execute the operation. Note that the creation of this table makes this selection operation satisfy a closure property like the other relational operations.

When a result table is displayed, the P column is viewed as a combo box component that contains the name of the searched table, the ids of the instances that contain the images found, and the similarity score of each image. The QBE panel also displays some important information associated with the images found as similar in the database. To facilitate this, there are two buttons that allow the user to retrieve further information on an image by performing relevant relational query operations on the “Patient” and “Medical_Exam” tables of the database. A click on the “Medical Details” button pops up a window with all the medical information associated with the displayed image. Figure 7.8 is an example result of clicking this button; it displays the medical exam details of the patient (Medical Exam Code, Date of Exam, Clinical Presentation, Case, Diagnosis, and Findings) with Medical Exam Code 661020. With the “Patient Details” button, we can display the personal information associated with the patient. Since the personal information of the patient must be secured, it is displayed only to authorized personnel.

Figure 7.8: Screenshot of the Medical Exam Details of the patient with Medical Exam Code-661020.

The SQL-like command that created the result table is also displayed in the QBE panel. This command can be modified for re-execution. The SQL-like query resulting from this operation is:


SELECT ORDSYS.ORDIMAGESIGNATURE.evaluateScore(
         (SELECT QBE_TEMP.F FROM QBE_TEMP WHERE ID = 1),
         M1.F, 'color=1 texture=1 shape=1 location=1') AS SCORE,
       ID, O, F, ME_CODE, IMAGE_PATH
FROM M1
WHERE ORDSYS.ORDIMAGESIGNATURE.isSimilar(
        (SELECT QBE_TEMP.F FROM QBE_TEMP WHERE ID = 1),
        M1.F, 'color=1 texture=1 shape=1 location=1', 40.0) = 1
ORDER BY SCORE
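For completeness, the way the Query Manager might submit such a command through JDBC and read back the scores can be sketched as follows (illustrative code; the actual EMIMS method may differ).

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class QbeExecutor {
    static void execute(Connection conn, String sqlFromTextBox) throws SQLException {
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(sqlFromTextBox);
        while (rs.next()) {
            int id = rs.getInt("ID");
            double score = rs.getDouble("SCORE");  // 0.0 means identical signatures
            System.out.println("image " + id + " matched with score " + score);
        }
        rs.close();
        stmt.close();
    }
}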

The button “Clear” in Figure 7.7 can be used to clear the SQL command displayed so that another command can be composed. The “Execute” button is used to execute or re-execute the command in the SQL command text box.

7.5.2.3 The Multi-Criteria Query Panel

The Multi-Criteria Query Panel is used to perform operations such as the similarity-based join, the multi similarity-based join, Mine, etc. (see Fig. 7.9). For a similarity-based join operation, we first specify the threshold value in the “Threshold” text box. We then choose the reference image table, and select the type of operator and the right image table with which the join is to be performed. To execute the join operation, we click on the “Go” button. As a result of this join, an image table is created. This image table contains the image records of the reference image table for which similar image(s) were found in the right table.


Figure 7.9: Screenshot of the Similarity-based Join Query Interface.

The “ID” column of the result table displays the ids of the instances of the reference table for whose image objects a similar image was found in the right table, based on the specified threshold value.

The “O” column of the result table displays an array of buttons whose indexes are the ids of the selected instances. A click on one of these buttons opens a pop-up window that displays the selected image of the reference table and all its similar images in the right join table(s) (see Fig. 7.10). As in the QBE panel, each displayed image is associated with two buttons (“Patient Details” and “Medical Details”) that allow retrieving further information associated with the displayed images. Figure 7.8 shows the pop-up window that is displayed by a click on the “Medical Details” button.


Figure 7.10: Screenshot of the pop-up window displayed by a click to the button “view image 3” in the result table of the similarity-based join.

The “P” column of this table is displayed as a combo box. Clicking on the combo box shows a list of vectors: identifier, table name, and similarity score (see Fig. 7.11). The components of this vector are:

• identifier: the identifier of the image record in the right join table that contains the image object selected by the similarity-based operation,

• table name: the name of the right join table, and

• similarity score: the similarity score between the two compared images, determined by the search engine that is integrated as a plug-in in the database system.

Each record on the list corresponds to an instance of a similar image in the table indicated as a result of the similarity-based join.


Figure 7.11: The content of P for the instance with “id = 6” in the result table of the join M1⊗εM2 of Fig. 7.9.

Once a similarity-based join, for example M1⊗εM2, has been performed, the Mine operator can be applied to extract the converse of the join, M2⊗εM1. The result table of the Mine operation is extracted from the P component of M1⊗εM2. The rationale for applying the Mine operator is discussed in Chapters 5 and 6. This result table is displayed in a panel below the result table of its corresponding join (see Fig. 7.12). The upper part of the table in Figure 7.12 is generated by the similarity-based join, whereas the lower part is generated by the Mine operator.

Figure 7.12: Screenshot of the Symmetric Similarity-based Join with P component displayed.


The displayed result tables can be saved and made part of the database for further operations. The panel contains a text box to enter a table name and a button to perform the creation of the tables (Fig. 7.15).

Furthermore, as in the QBE panel, for each similarity-based join query, the corresponding SQL statement is displayed in an SQL command text box (Fig. 7.15). This SQL statement can be edited and re-executed for another query. The SQL query for the result table displayed in Fig. 7.12, M1⊗20M2, is:

SELECT ORDSYS.ORDIMAGESIGNATURE.evaluateScore(M1.F, M2.F,
         'color=1 texture=1 shape=1 location=1') AS SCORE,
       M1.ID, M1.O, M1.F, M1.ME_CODE, M1.IMAGE_PATH, M2.ID
FROM M1, M2
WHERE ORDSYS.ORDIMAGESIGNATURE.SimJoin(M1.F, M2.F,
        'color=1 texture=1 shape=1 location=1', 20.0) = 1
ORDER BY M1.ID, SCORE

The interface also contains a message box. The progress of the query and possible error messages are displayed inside this box.

7.6 Summary

In this Chapter, we presented our prototype EMIMS, developed on top of the Oracle 9i DBMS. We first explained the techniques involved in similarity-based image retrieval, particularly the way it is handled by the interMedia component of Oracle 9i. We then presented a simplified general architecture of EMIMS in association with the structure of Oracle 9i. Through JDBC, EMIMS interacts with an Oracle 9i database to perform its image management tasks. EMIMS has two main components: the Query Manager and the Visual Interface.

The Query Manager is responsible for linking the visual interface and the database components. It handles similarity-based and other relevant operators on any image table of structure M(id, O, F, A, P). These operators include SimJoin (the similarity-based join operator), Query by Example (QbE), the insert operator (to insert an image into an image table), and the Mine operator. For all similarity-based operators, it uses two parameters: the threshold (which defines the maximum similarity distance between two similar image signatures) and the feature coefficients (which specify the weight given to each visual feature attribute: color, texture, shape, and location).

The Visual Interface includes the Data-Entry and the Query interface. The Data Entry Interface is composed of three panels: the Context/Domain-Oriented panel (used to enter data related to an image), the Image-Oriented panel (used to insert an image from a file into a table), and the Medical Exam panel (used to enter the medical exam data related to the patient). The EMIMS Query Interface contains three panels: the Iconic Query Interface (used to formulate a query using the iconic interface of MIMS), the Query by Example (QbE) panel (used to perform similarity-based selection operations), and the Multi-Criteria Query panel (used to formulate complex queries involving visual content, semantic, and context of images).

Our QbE operates as a database-oriented selection operation and has many new features compared to other CBIR systems and traditional image database systems. A QbE query:

• produces an image table that contains all the image records selected by the similarity-based selection operation. The creation of this table makes this selection operation satisfy a closure property like other relational operations,

• displays the result table, where the P column is viewed as a combo box that contains the name of the searched table and the similarity score of the image found,

• displays all the images of the result table with associated or relevant alphanumeric data and two buttons that allow the user to retrieve further information on the image by performing a relational selection, and

• displays the SQL-like command associated to the query formulated. The command can then be edited and re-executed.

Furthermore, the result table of a QBE query can be saved (i.e., a new table with the content of the result can be created) and be made part of the database for further operations.

The Multi-Criteria Query panel of EMIMS facilitates the formulation of a similarity-based join operation. It also includes many new features resulting from the characteristics of our algebra. With the help of this panel:

• we can formulate a query expression that is composed of similarity-based and relational operators. This is a new feature of our system and a result of the proper formulation of our similarity-based algebra,

• the Mine operator can be applied on a result table of a join to compute the symmetric similarity-based join with lesser cost, and


• we can formulate a query with a multi similarity-based join operation, which, to the best of our knowledge, is not considered by other systems.

Moreover, as in the QbE panel, in the Multi-Criteria Query Panel:

• the result of a query can be presented in an image table with a P component, where the P column of this table is displayed as a combo box. Clicking on the combo box shows a list of vectors (table name, identifier, and similarity score), where each vector corresponds to one similar image. This is a very compact form of presenting a result, in addition to its convenience for further operations.

• the result table of the query can be saved (i.e., a new table with the content of the result can be created) and be made part of the database for further operations (without dynamic maintenance of the P component when the query table is modified), and

• the SQL-like command that created the result is displayed and hence, the query can be re-formulated and re-executed.

In conclusion, with the help of the EMIMS prototype, we sought to demonstrate the feasibility and usefulness of the image data repository model, the similarity-based algebra we developed, and the similarity-based query optimization methods we proposed in this thesis, with a practical application in the domain of medical image data management.

Furthermore, this prototype is part of the generic toolbox developed in the SiCOM project for managing medical data repositories. A research and development action is presently underway to implement a distributed image retrieval system for medical grids.


CHAPTER 8

DISCUSSION

For the last two decades, there have been many efforts to manage images with traditional methods in different application domains. However, the complex nature of images has always limited the adequacy of traditional tools for managing them. The relational data model, algebra, and query language were not designed to handle the complex features of images adequately. In this regard, we demonstrated the limitations of the relational system for operations on the visual features of images with example queries in Chapter 2.

The other domain actively involved in the treatment and management of images is computer vision. However, managing images using the visual features alone cannot tell enough about an image, since semantic descriptions and contextual interpretations are always attached to an image. Hence, these systems also cannot fully respond to the requirements of image data management.

Therefore, in this thesis, we proposed a system with hybrid means of managing image content and image-related data. Among the important issues we addressed in order to realize an image management system that can support multi-criteria queries are: the design of a convenient image data repository model, the development of a novel similarity-based algebra, the study of the properties of the operators in our algebra, and the proposals for algebraic similarity-based query optimization. Moreover, our prototype EMIMS clearly demonstrated the practicability of our proposals and the effectiveness of our query optimization proposals.


8.1 The Data Repository Model and the P component

To manage images and image-related data under a DBMS environment, a suitable repository model is required. Such a model should not serve just as a repository; it also needs to be convenient for performing the necessary operations based on the features of the data it contains. Earlier methods of handling images under a DBMS treat images in the traditional manner, using relational tables and keeping images as BLOBs or as external files, and hence do not support non-relational operations such as operations on the visual features of images. Even the recent moves made by the popular DBMSs to keep images as components only store images as binary objects and do not address the issue of performing similarity-based operations on the visual features of images. Thus, complex operations on image content and image-related data were not possible or were very limited. Moreover, these systems do not consider the requirements of managing salient objects.

The image data repository models we proposed in this thesis enable us to capture the image, its low-level features, and its high-level descriptions. The object-relational model that we have chosen allows us to adequately exploit the data management facilities of ORDBMSs. The novel component P introduced in our image repository model permits us to conveniently manage intermediate image tables and query result tables in a manner optimized for both storage requirements and the requirements of the similarity-based operations. Our salient object repository model permits us to adequately capture the relevant information associated with salient objects. Moreover, we can keep track of spatial relationships between salient objects and/or between a salient object and the image that contains it. It can then facilitate both traditional and similarity-based operations on salient objects. Using our prototype, EMIMS, we verified how effectively these repository models can be used.

8.2 The Similarity-Based Algebra and the Similarity-Based Operators

Research on image retrieval algebras has so far been limited. The most common image retrieval systems support only query by example or by sketch and do not provide a functional or algebraic abstraction that lets the user formulate a specific request. The few works done in this area are far from offering a concise algebraic framework comparable to the relational algebra, which was the basis for the great success of RDBMSs. As more and more applications involving digital images and requiring content-based retrieval emerge in various domains, the need for an algebraic framework for similarity-based queries on the visual features of images is becoming critical. It is, therefore, important to define an algebra on which the similarity-based operators can be founded. A good design of such an algebra permits the formulation of multi-criteria queries on the complex features of images. Furthermore, it facilitates the adequate usage of the model, effective query optimization, and a convenient formulation of multi-criteria queries on images.

We therefore addressed this important problem and proposed a novel algebra, the similarity-based algebra. Our algebra takes into account the features of the image search engines currently available, either stand-alone or integrated as add-on modules in most DBMSs. It extends the one-to-many image query facilities developed in many prototypes and commercial systems to full database operators. The operators are defined in such a way that they can be used in association with the relational operators in an object-relational data model. This permits the formulation of multi-criteria queries on images and their salient objects based on their low- and high-level features (see the sketch below). We demonstrated, using the EMIMS prototype, possible formulations of complex (multi-criteria) queries involving both relational and similarity-based operators. However, further work remains. For example, due to the limitations of the image processing tool we used, image segmentation and extraction of salient objects were not possible; as a result, operations on salient objects were not separately tested, and the properties of the operators on such objects remain to be explored.
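As an illustration of how such compound formulations read in practice, here is a minimal sketch of a multi-criteria query combining a relational predicate with a similarity-based selection. The function names, the Euclidean distance, and the sample data are assumptions for illustration only; they are not the algebra's formal definitions.

```python
# A minimal sketch, under assumed data structures; not the formal operators.
import math
from typing import Callable, List

def distance(f1: List[float], f2: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def similarity_select(table, query_features, eps, rel_pred: Callable):
    """Keep tuples that satisfy the relational predicate AND whose feature
    vector lies within distance eps of the query image's features."""
    return [t for t in table
            if rel_pred(t) and distance(t["features"], query_features) <= eps]

# Example: "images of patient 1234 that are similar to this query image"
table = [
    {"patient_id": 1234, "features": [0.1, 0.9]},
    {"patient_id": 1234, "features": [0.8, 0.2]},
    {"patient_id": 9999, "features": [0.1, 0.9]},
]
result = similarity_select(table, [0.1, 0.85], eps=0.2,
                           rel_pred=lambda t: t["patient_id"] == 1234)
print(result)  # only the first tuple satisfies both criteria
```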

8.3 The Methods of Query Optimization on the Similarity-Based Operators

Most of the popular RDBMSs incorporate a module designed to handle query optimization. This module is usually built around a particular query optimization strategy, and most query optimization strategies in relational systems are based on the properties of the algebraic operators. Thanks to this module, queries are treated in an optimal manner and query results are obtained with the minimum possible execution time. Such a module is important in many cases, and we therefore have to consider optimization strategies as well.

Queries involving similarity-based operators are more computation intensive than query operations in relational systems, because operations in relational systems involve only exact matching. Queries involving similarity-based operators, on the other hand, deal with non-exact matching, and their results depend on the threshold value ε. Therefore, there is a need for query optimization in a system based on the similarity-based algebra.


We defined the similarity-based operators in such a way that they possess useful properties that can be exploited for query optimization. We also introduced operators, such as the Mine operator, that particularly serve the query optimization process. We specifically considered algebraic query optimization methods, on the basis of which we proposed useful transformation rules for the similarity-based operators (a toy illustration follows). The experiments we conducted show good performance on queries involving similarity-based join operators, and they clearly demonstrate the advantage of the Mine operator in enhancing the performance of queries involving the similarity-based join and the symmetric similarity-based join. It is important to note that, though our work is useful for similarity-based optimization, formalizing and implementing a fully operational optimizer is a very complex task that constitutes a research subject by itself. Our goal in this chapter was to make a first investigation of the optimization possibilities associated with our algebraic operators. Further work in this regard includes the development of a cost model using range queries, the development of an optimizer, and the choice or design of a good indexing scheme.
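The following toy computation illustrates the kind of gain an algebraic transformation rule can bring: pushing a selective relational predicate below an expensive similarity-based join shrinks the number of distance computations. The table sizes and the selectivity are arbitrary assumptions; this is not our optimizer, only an order-of-magnitude illustration.

```python
# A toy illustration, assuming a nested-loop evaluation; not the optimizer.
def join_cost(n1: int, n2: int) -> int:
    """Distance computations of a nested-loop similarity join of sizes n1 x n2."""
    return n1 * n2

n1, n2 = 10_000, 10_000
selectivity = 0.01  # assume the relational predicate keeps 1% of M1

cost_select_after_join = join_cost(n1, n2)                      # 100,000,000
cost_select_before_join = join_cost(int(n1 * selectivity), n2)  #   1,000,000
print(cost_select_after_join, cost_select_before_join)
```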

8.4 Practical Demonstrations

There are many prototypes designed for content-based image retrieval. As discussed in Chapter 3, though each of these systems has its own characteristics, they also share some features: they perform a query-by-example form of retrieval and, as the result of a query, they display the images found to be similar. However, none of these prototypes presents the result of a query as a table containing all the relevant information, and none of them can join two or more tables to obtain a table as a result.

With the help of our prototype, we showed that operations on image tables are possible and that the result of such an operation can itself be an image table. With this we preserve the closure property of our similarity-based operators, satisfying a basic rule of algebra. It is important to mention that this is a consequence of the good features of our image data repository model and of the design of the similarity-based algebra. We showed the effective usage of the component P of our model. The introduction of P has multiple advantages: most importantly, it permits us to optimize the memory and storage requirements of result tables, it facilitates the use of the Mine operator, and it enables us to keep data in a table that can be used for further operations.

Moreover, using the EMIMS prototype, we showed how effectively such a system with similarity-based operators can fulfill the practical requirements of applications in the medical domain. Though we chose this domain of application, our model and the similarity-based algebra are generic enough to be applied to other application domains.

It is important to mention that, though our prototype demonstrates the practicability of our different proposals very well, much work can still be done in this regard. This includes the design of techniques for incorporating user preferences regarding image processing tools, the design of user interfaces that adapt to user preferences, and techniques for security and integrity in content-based image management.


CHAPTER 9

CONCLUSION AND PERSPECTIVES

The need for an efficient image data management system is an important issue in many application areas; we assessed the increasing importance of images in our daily life in Chapter 1. In this regard, the experience gained in managing alphanumeric data over the last three decades is highly relevant to the critical requirement of managing image data. We believe that developing a new and separate system for content-based image data management would not be effective, since image data management has much in common with existing textual and alphanumeric data management techniques. Nevertheless, traditional systems fail to address the issues of complex data and content-based retrieval and thus cannot fully satisfy the needs of image data management.

In this regard, we studied and used the existing methods of both domains (computer vision and DBMS) and proposed several novel concepts for a hybrid system of managing image data under an OR paradigm.

[Figure 9.1 contrasts the two source domains and the proposed system:

(A) RDBMS: metadata to describe/retrieve images (practiced for about two decades); operations based on keyword matching (relational), but: incomplete representation of images, subjective descriptions, time consuming.

(B) Computer Vision: promising CBIR methods (practiced for about a decade); automatic extraction of content, with descriptions by color, texture, shape, etc. features, but: similarity-based (non-exact matching), and the current approach is one-to-many.

(C) A System with Multi-Criteria Query on Images: uses the OR model; convenient image data repository model; based on a novel similarity-based algebra; query optimization possible.]

Figure 9.1: A system with multi-criteria queries on images.

As shown in Fig. 9.1, we brought together the good features of the two domains - RDBMSs (A of Fig. 9.1) and Computer Vision (B of Fig. 9.1) - for the purpose of effectively managing images and image related data.

In this thesis, we conducted the research work needed to address these deficiencies. As a result, the major contributions of our work are the following:

• We identified the important components of image data that need to be considered to fully represent and describe an image and its related data. We proposed a novel schematic data repository model that is convenient for the management of images in the context of an OR paradigm. This paves the way for efficient support of image data in existing DBMSs, so that image data management requirements can be served with the techniques already available for alphanumeric data management. Thanks to the P component of our model, it is possible to track associations between two or more image tables during binary and more complex similarity-based operations. With this model, a multi-criteria query that considers the different features (content, semantic, and context-oriented) of images can be formulated for a wide range of applications.

• We defined novel similarity-based and other subordinate operators that enable us to perform similarity-based operations on the visual features of images, which are not supported by traditional systems that use only exact matching. In an effort to make these operators effectively usable, we developed an algebra, the similarity-based algebra. Furthermore, we studied the algebraic properties of these operators.

• We investigated the way the new similarity-based operators can be used in association with the existing relational algebra and we defined convenient rules for compound operations under an OR model.

• We identified algebraic rules and implementation heuristics that can be useful for query optimization.

• We introduced a novel operator called “Mine” that facilitates query optimization. We further evaluated the effectiveness of our similarity-based operators and the efficiency of the Mine operator for query optimization.

• We developed a prototype called EMIMS with convenient visual user interfaces in order to demonstrate the validity of our proposals. With EMIMS, we developed a visual multi-criteria query interface that enables the user to formulate combinations of the traditional query operators and the similarity-based operators. EMIMS has practically demonstrated the functionality of our similarity-based operators, and

• We demonstrated the effectiveness of our proposals with a practical application in the area of medical image data management.

Future work in this domain of research includes:

• the design and study of methodologies for an open architecture to integrate application-oriented image processing tools in a way that supports our proposals for effective complex queries on image data,

• the design and development of heuristics for similarity-based query optimization and a proposal on how they can be incorporated into the existing popular DBMSs,

• the integration of similarity-based operations on salient objects and the extension of this approach to generic multi-criteria queries in image database applications,

• the extension of our approach to other media data, such as audio and video, and


• the development of appropriate indexing techniques for visual data that can be integrated into existing DBMSs, since traditional indexing approaches are quite different from what content-based visual indexing requires.

ANNEX: COST MODEL

The most common content-based image retrieval operation to date is of the type: given an image, find its similar images in a set or database of images. The complexity of content-based image representations has motivated the use of multidimensional indexing schemes [30]. A multidimensional index structure makes it possible to access fewer index pages and thereby increases the efficiency of content-based image retrieval. Existing index structures for high-dimensional feature spaces can be classified into Data Partitioning (DP)-based and Space Partitioning (SP)-based structures [28].

Based on a simple and straightforward nested-loop implementation of the similarity-based join using range search, we develop a cost model that works with DP-based, hierarchical index structures. The straightforward method for performing a similarity-based join is to apply a range query algorithm for each object of the left input table M1 as a query object, looking for its similar objects in the right input table M2. The metric used is the number of disk accesses required to perform the similarity-based join, a metric commonly used in related works [34, 35].
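A minimal sketch of this straightforward nested-loop evaluation is given below. In a real system the range search would be answered by a DP-based index rather than the linear scan used here; all names and the Euclidean metric are illustrative assumptions.

```python
# A minimal sketch of the nested-loop similarity join via range search;
# the linear scan stands in for an index-supported range query.
import math
from typing import List, Tuple

Vector = List[float]

def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_search(m2: List[Vector], q: Vector, eps: float) -> List[int]:
    """Indices of all objects of M2 within distance eps of q."""
    return [i for i, o in enumerate(m2) if dist(q, o) <= eps]

def similarity_join(m1: List[Vector], m2: List[Vector], eps: float) -> List[Tuple[int, int]]:
    pairs = []
    for k, q in enumerate(m1):  # one range query per object of M1
        pairs.extend((k, i) for i in range_search(m2, q, eps))
    return pairs

print(similarity_join([[0.0, 0.0], [1.0, 1.0]], [[0.1, 0.0], [5.0, 5.0]], eps=0.5))
# [(0, 0)]
```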

We assume that the objects of the two operand image tables $M_1$ and $M_2$ follow the same object distribution function over a normalized object space $[0,1)^d$ ($d$ is the dimension of the feature vectors) and that a DP-based index structure is used. The impact of a concrete distribution function is discussed after the general cost considerations. The notations used here are shown in the table below.


Symbols : Notations
$M_1, M_2$ : operand image tables
$d$ : dimension of the feature vectors in $M_1$ and $M_2$
$N_1, N_2$ : number of data points in $M_1$ and $M_2$, respectively
$m_1, m_2$ : number of leaf nodes in the index structures of $M_1$ and $M_2$, respectively
$o_1^k$ : $k$-th object in the index structure of $M_1$ ($1 \le k \le N_1$)
$x_2^i$ : length vector of the $i$-th leaf node $L_i$ of $M_2$, i.e. $x_2^i = (x_2^{i,1}, \ldots, x_2^{i,d})$
$C_{eff}$ : average number of data points in one leaf page of $M_2$
$V_\varepsilon^j$ : hyper-volume of the hypersphere $S_\varepsilon^j$ with radius $\varepsilon$ for a dimension $j$, $0 \le j \le d$
$DA(o_1^k)$ : expected number of disk accesses for the range search of an object $o_1^k$ in $M_2$
$DA_{total}$ : expected total number of disk accesses

The expected number of disk accesses for a similarity-based join is the sum of the expected disk accesses for the range searches of all objects $o_1^k$ of $M_1$ ($1 \le k \le N_1$) in $M_2$. The number of disk accesses for one object $o_1^k$ of $M_1$ is the number of leaf partition nodes of $M_2$ that intersect the hypersphere $S_\varepsilon^d$ containing the neighbors of $o_1^k$ within distance $\varepsilon$.

In order to compute the number of leaf partition nodes of $M_2$ that intersect $S_\varepsilon^d$, the Minkowski sum of $S_\varepsilon^d$ and the leaf nodes is determined. The Minkowski sum corresponds graphically to the volume of the area that results from moving the center of $S_\varepsilon^d$ over the surface of the bounding box of one leaf node. Summing up the Minkowski sum over all leaf nodes yields the expected number of disk accesses.


Figure C-1: Minkowski Sum in a 2-dimensional space

For instance, consider Figure C-1 for the Minkowski sum in a 2-dimensional space. One leaf node $L_i$ of $M_2$ is a rectangle with length vector $x_2^i = (x_2^{i,1}, x_2^{i,2})$. The query circle $S_\varepsilon^2$ with radius $\varepsilon$ is the region that includes the neighbors of the query object. The number of disk accesses for the query object is the sum of the probabilities of intersection with all leaf partitions. The probability that $S_\varepsilon^2$ intersects one leaf partition $L_i$ corresponds graphically (see Figure C-1) to the area that results from moving the circle $S_\varepsilon^2$ around the leaf partition $L_i$.

The volume of this area, i.e. the probability that $S_\varepsilon^2$ intersects $L_i$, is:

$$\mathrm{volume}(L_i) + \mathrm{perimeter}(L_i)\cdot\varepsilon + V_\varepsilon^2 .$$

Summing this over the index structure with $m_2$ leaf nodes, we obtain the expected number of disk accesses $DA(o_1^k)$ for the range search of an object $o_1^k$, which in the 2-dimensional case is:

$$DA(o_1^k) = \sum_{i=1}^{m_2}\left[\mathrm{volume}(L_i) + \mathrm{perimeter}(L_i)\cdot\varepsilon + \mathrm{volume}(S_\varepsilon^2)\right] = \sum_{i=1}^{m_2}\left[x_2^{i,1}\, x_2^{i,2} + 2\,(x_2^{i,1} + x_2^{i,2})\,\varepsilon + \pi\varepsilon^2\right].$$


Expressing the last line in terms of hyperspheres $S_\varepsilon^j$ of dimensions lower than $d$, we obtain a slightly modified formula that corresponds exactly to the Minkowski sum of $S_\varepsilon^d$:

$$\sum_{i=1}^{m_2}\left[x_2^{i,1}\, x_2^{i,2}\, S_\varepsilon^0 + x_2^{i,1}\, S_\varepsilon^1 + x_2^{i,2}\, S_\varepsilon^1 + S_\varepsilon^2\right],$$

with $S_\varepsilon^0 = 1$, $S_\varepsilon^1 = 2\varepsilon$, and $S_\varepsilon^2 = \pi\varepsilon^2$.

In the general case, $DA(o_1^k)$ is computed as:

$$(1)\qquad DA(o_1^k) = \sum_{i=1}^{m_2}\;\sum_{j=0}^{d}\;\sum_{\{y_1,\ldots,y_j\}\,\in\,\mathrm{PowerSet}(\{x_2^{i,1},\ldots,x_2^{i,d}\})} \mathrm{volume}(\{y_1,\ldots,y_j\})\cdot S_\varepsilon^{d-j},$$

where

$$(a)\qquad S_\varepsilon^n = \frac{\pi^{n/2}}{\Gamma(n/2+1)}\cdot\varepsilon^n,$$

$$(b)\qquad \mathrm{volume}(\{y_1,\ldots,y_j\}) = \prod_{k=1}^{j} y_k .$$
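As a quick sanity check, formula (a) can be evaluated numerically against the low-dimensional volumes $S_\varepsilon^0 = 1$, $S_\varepsilon^1 = 2\varepsilon$, and $S_\varepsilon^2 = \pi\varepsilon^2$ used in the 2-dimensional expansion above; the small sketch below assumes nothing beyond the formula itself.

```python
# A quick numeric check of formula (a), the n-dimensional hypersphere volume.
import math

def hypersphere_volume(n: int, eps: float) -> float:
    """Volume of an n-dimensional hypersphere of radius eps (formula (a))."""
    return (math.pi ** (n / 2) / math.gamma(n / 2 + 1)) * eps ** n

eps = 0.1
print(hypersphere_volume(0, eps))                       # 1.0
print(hypersphere_volume(1, eps), 2 * eps)              # both 0.2
print(hypersphere_volume(2, eps), math.pi * eps ** 2)   # both ~0.0314
```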

In order to obtain a cost model for high-dimensional spaces, we have to consider the so-called boundary effects, i.e. the bounding box of one leaf node of $M_2$ may lie partially outside the data space $[0,1)^d$ and consequently does not contribute to the probability that this leaf node intersects the query object [34]. The main idea for dealing with this problem is to attribute significance to the dimensions: a dimension is significant if its part in the Minkowski sum contributes to the overall probability. It has been shown by Berchtold et al. in [34] that only the first $d'$ dimensions are significant, with

$$d' = \left\lceil \log_2\!\left(\frac{N_2}{C_{eff}}\right) \right\rceil .$$

Consequently, equation (1) may be rewritten as follows in order to account for boundary effects:

$$(2)\qquad DA(o_1^k) = \sum_{i=1}^{m_2}\;\sum_{j=0}^{d'}\;\sum_{\{y_1,\ldots,y_j\}\,\in\,\mathrm{PowerSet}(\{x_2^{i,1},\ldots,x_2^{i,d'}\})} \mathrm{volume}(\{y_1,\ldots,y_j\})\cdot S_\varepsilon^{d'-j}.$$


We can now express the expected total number of disk accesses, $DA_{total}$. For each object $o_1^k$ of $M_1$ we compute the neighbors within distance $\varepsilon$ in $M_2$. We can obviously assume that for two objects located in the same leaf node of $M_1$, the leaf node is still in the cache. Therefore $DA_{total}$ is computed as:

$$(3)\qquad DA_{total} = m_1 + \sum_{k=1}^{N_1} DA(o_1^k).$$

It is common practice in relational databases to assume a uniform object distribution, for instance for attribute values, in order to compute the join selectivity. We adopt this principle here. However, we are aware that this gives only a rough estimate, as many applications do not follow a uniform distribution [34, 35]. Uniform object distribution means that the object points follow a random distribution in the normalized object space $[0,1)^d$.

The number of leaf nodes in the index structure of $M_2$, $m_2$, is computed by fixing an index page size of $B$ bytes (typically 4096 bytes). This implies an effective capacity of $C_{eff} = \frac{B}{d\cdot V}$ data objects per data page, supposing that the value in each dimension is stored in $V$ bytes. Consequently,

$$m_2 = \frac{d\cdot V\cdot N_2}{B} \qquad\text{and}\qquad d' = \left\lceil \log_2\!\left(\frac{N_2\cdot d\cdot V}{B}\right) \right\rceil .$$

Finally, the length vector of every leaf node in $M_2$ is $(1/2, \ldots, 1/2)$, i.e. $\forall i,\ \forall k\ (1 \le k \le d):\ x_2^{i,k} = \tfrac{1}{2}$. Respectively, $m_1$ is computed as $m_1 = \frac{d\cdot V\cdot N_1}{B}$, supposing feature vectors of the same dimension represented in the same kind of index structure. Equation (3) can then be reduced to a simplified version (using $N_1$ and $N_2$) for a uniform object distribution as:

$$(4)\qquad DA_{total} = \frac{N_1}{C_{eff}} + N_1\cdot\frac{N_2}{C_{eff}}\cdot\sum_{j=0}^{d'} \binom{d'}{j}\left(\frac{1}{2}\right)^{j} S_\varepsilon^{d'-j}.$$
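The following sketch evaluates equation (4) as reconstructed above, under the uniform-distribution assumptions of this annex. The concrete parameter values ($N_1$, $N_2$, $C_{eff}$, $\varepsilon$) are arbitrary illustrations.

```python
# A minimal sketch evaluating equation (4); parameter values are assumptions.
import math

def sphere(n: int, eps: float) -> float:
    """n-dimensional hypersphere volume of radius eps (formula (a))."""
    return (math.pi ** (n / 2) / math.gamma(n / 2 + 1)) * eps ** n

def da_total(n1: int, n2: int, c_eff: float, eps: float) -> float:
    d_prime = math.ceil(math.log2(n2 / c_eff))   # significant dimensions d'
    m1 = n1 / c_eff                              # leaf nodes of M1
    s = sum(math.comb(d_prime, j) * 0.5 ** j * sphere(d_prime - j, eps)
            for j in range(d_prime + 1))
    return m1 + n1 * (n2 / c_eff) * s

print(da_total(n1=10_000, n2=10_000, c_eff=128.0, eps=0.05))
```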


REFERENCES

[1] V. S. Subrahmanian. Principles of Multimedia Database Systems. San Francisco, California: Morgan Kaufmann Publishers Inc., 1998, 442 p., ISBN 1-55860-466-9.

[2] J.K.Wu, A.D. Narasimhalu, B.M. Mehtre, C.P. Lam and Y.J. Gao. CORE: A Content-Based Retrieval Engine for Multimedia Information Systems, Multimedia Systems, 1995, vol.3, p. 25-41.

[3] H. Kosch, L. Brunie and W. Wohner. From the Modeling of Parallel Relational Query Processing to Query Optimization and Simulation, Parallel Processing Letters, 1998, vol. 8, no. 1, p. 2-14.

[4] S. Berchtold, C. Boehm, B. Braunmueller, D. A. Keim and H. P. Kriegel. Fast Parallel Similarity Search in Multimedia Databases, SIGMOD, AZ,USA, 1997, p. 1-12.

[5] J.M. Hellerstein and M. Stonebraker. Predicate Migration: Optimizing Queries with Expensive Predicates, SIGMOD, Washington DC, USA, 1993, p. 267-276.

[6] A. Yoshitaka and T. Ichikawa. A Survey on Content-Based Retrieval for Multimedia Databases, IEEE Transactions on Knowledge and Data Engineering, 1999, vol. 11, no. 1, p. 81-93.

[7] V. Oria, M.T. Özsu, L. Liu, X. Li, J.Z. Li, Y. Niu and P.J. Iglinski. Modeling Images for Content-Based Queries: The DISIMA Approach, VIS'97, San Diego, 1997, p. 339-346.

[8] S. Adali, P. Bonatti, M. L. Sapino and V. S. Subrahmanian. A Multi-Similarity Algebra, SIGMOD, Seattle, Washington, USA, 1998, p. 402-413.

[9] S. Berchtold, C. Boehm, B. Braunmueller, D. A. Keim and H. P. Kriegel. A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space, ACM PODS, Arizona, 1997, p. 78-86.

[10] Jian-Kang Wu. Content-Based Indexing of Multimedia Databases, IEEE TKDE, 1997, vol. 9, no. 6, p. 978-989.


[11] T. Seidl, H.P. Kriegel. Optimal Multi-Step k-Nearest Neighbor Search, SIGMOD, WA, USA, 1998, p. 154-165.

[12] Y. Rui, T.S. Huang and S.F. Chang. Image Retrieval: Past, Present, and Future, Journal of Visual Communication and Image Representation, 1999, vol. 10, p. 1-23.

[13] G.R. Hjaltason and H. Samet. Ranking in Spatial Databases, Springer-Verlag, Berlin, 1995, Proceedings of the 4th International Symposium on Large Spatial Databases - SSD'95, p. 83-95.

[14] M. Stonebraker and P. Brown. Object-Relational DBMSs: tracking the next great wave. San Francisco: Morgan Kaufmann Pub. Inc., 1999, 287 p., ISBN 1-55860-452-9.

[15] Excalibur Image Datablade Module User's Guide, Informix Press, March 1999, Ver. 1.2, P. No. 000-5356.

[16] Oracle8i Visual Information Retrieval Users Guide and Reference, Oracle Press, 1999, Release 8.1.5, A67293-01.

[17] S. Atnafu, L. Brunie and H. Kosch. Similarity-Based Algebra for Multimedia Database Systems, 12th Australian Database Conference (ADC-2001), IEEE Computer Society Press, Gold Coast, Australia, Jan. 29 – Feb. 1, 2001, p. 115-122.

[18] William I. Grosky. Managing Multimedia Information in Database Systems, Communications of the ACM, 1997, vol. 40, no. 12, p. 72-80.

[19] William I. Grosky and Peter L. Stanchev. An Image Data Model, Advances in Visual Information Systems, Visual 2000, LNCS 1929, Springer, 4th International Conference, Lyon, France, 2000, p. 14-25.

[20] Chaudhuri S. and Gravano L. Optimizing Queries over Multimedia Repositories, SIGMOD'96, Canada, ACM Press, p. 91-102.

[21] John P. Eakins and Margaret E. Graham. Content-Based Image Retrieval: A report to the JISC Technology Applications Programme, Inst. for Image Data Research, Univ. of Northumbria at Newcastle, January 1999.

[22] T. Seidl and H.P. Kriegel. Efficient User-Adaptable Similarity Search in Large Multimedia Databases, VLDB'97, Athens, Greece, p. 506-515.

[23] A.W.M. Smeulders, T. Gevers and M.L. Kersten. Crossing the Divide Between Computer Vision and Databases in Search of Image Databases, Visual Database Systems Conf., Italy, 1998, p. 223-239.


[24] José M. Martinez. MPEG-7 Overview, Approved Version 8.0, ISO/IEC JTC1/SC29/WG11, Klagenfurt, July 2002.

[25] Nick Koudas and Kenneth C. Sevcik. High Dimensional Similarity Joins: Algorithms and Performance Evaluation, IEEE Transactions on Knowledge and Data Engineering, 2000, vol. 12, no. 1, p. 3-18.

[26] Amit Sheth and Wolfgang Klas. Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media. San Francisco: McGraw-Hill, 1998, 384 p.

[27] S. Atnafu, L. Brunie and H. Kosch. Similarity-Based Operators in Image Database Systems, 2nd International Conference on Advances in Web-Age Information Management, WAIM'2001, Xi'an, China, p.14-25.

[28] K. Chakrabarti and S. Mehrotra. The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces, ICDE 1999, Sydney, Australia, p. 440-447.

[29] Antonin Guttman. R-trees: A dynamic index structure for spatial searching, ACM SIGMOD, Boston, MA, USA, June 1984, p. 47-57.

[30] S. Berchtold, Daniel A. Keim and Hans-Peter Kriegel. The X-tree: An Indexing Structure for High-Dimensional Data, VLDB, Bombay, India, Sept. 1996, p. 28-39.

[31] R. Kurniawati, J.S. Jin and J.A. Shepherd. The SS+-Tree: An Improved Index Structure for Similarity Searches in a High-Dimensional Feature Space, Proceedings of the SPIE storage and Retrieval of Image and Video Databases, San Jose, CA, Feb. 1997, p. 110-120.

[32] P. Ciaccia, M. Patella and P. Zezula. M-Tree: An Efficient Access Method for Similarity Search in Metric Spaces, VLDB, Athens, Greece, August 1997, p. 426-435.

[33] G. Evangelidis, D. Lomet and B. Salzberg. The hBp tree: A multi-attribute Index supporting Concurrency, Recovery and Node Consolidation, VLDB Journal, 1997, vol. 6, no. 1, p. 1-25.

[34] S. Berchtold, C. Bohm, D. A. Keim and H.P. Kriegel. A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space, Proceedings of the ACM PODS Conference, Tucson, Arizona, May 1997, p. 78-86.

[35] J.H. Lee, G.H.Cha and C.-W. Chung. A Model for k-Nearest Neighbor Query Processing Cost in Multidimensional Data Space, Journal of Information Processing Letters, 1999, vol. 69, no. 2, p. 69-75.


[36] Remco C. Veltkamp, Mirela Tanase. Content-Based Image Retrieval Systems: A Survey, Technical Report UU-CS-2000-34, Department of Computer Science, Utrecht University, October 2000.

[37] S. Atnafu, L. Brunie and H. Kosch. Similarity-Based Operators and Query Optimization for Multimedia Database Systems, Proceedings of the International Database Engineering & Applications Symposium (IDEAS), IEEE Computer Society, Grenoble, France, July 2001, p. 346-355.

[38] Alberto Del Bimbo. Visual Information Retrieval, San Fransisco, California: Morgan Kaufmann Publishers Inc., 1999, 270 p, ISBN 1-55860-624-6.

[39] V. Oria, M.T. Özsu, P. Iglinski, B. Xu and L.I. Cheng. DISIMA: An Object Oriented Approach to Developing an Image Database System, Proceedings of the 16th Int. Conf. on Data Engineering (ICDE), San Diego, California, Feb. 2000, p. 672-673.

[40] Sun Microsystems, Inc., JDBC Data Access API, 2002, Available on line: http://java.sun.com/products/jdbc/ (Consulted on April 8, 2003).

[41] Venkat N. Gudivada, Vijay V. Raghavan. Modeling and retrieving images by content. Information Processing and Management, 1997, vol.33, no.4, p.427-452.

[42] J. Duncan and N. Ayache. Medical Image Analysis: Progress over two decades and the challenges ahead. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, vol.22, no.1, p.85-106.

[43] Yong Rui. Efficient Indexing, Browsing and Retrieval of Image/Video Content, PhD Thesis, University of Illinois at Urbana-Champaign, 1999.

[44] Harald Kosch and Solomon Atnafu. A Multimedia Join by the Method of Nearest Neighbor Search; Information Processing Letters (IPL), Elsevier Press, June 15 2002, vol.82, Issue 5, p. 269-276.

[45] J.R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Jain, and C.F. Shu. The Virage image search engine: An open framework for image management. In Proceedings of SPIE, Storage and Retrieval for Still Image and Video Databases IV, San Jose, CA, USA, February 1996, p. 76-87.

[46] W.Y.Ma and B.S.Manjunath. NeTra: a toolbox for navigating large image databases, Multimedia Systems, Springer-Verlag, Berlin, Germany, May 1999. vol.7, no.3, p.184-98.


[47] P. Wu, B. S. Manjunath, and H. D. Shin. Dimensionality Reduction for Image Search and Retrieval, Conference on Image Processing International Conference on Image Processing (ICIP 2000), Vancouver, Canada, September 2000, p 726-729.

[48] Euripides G.M. Petrakis. Content-Based Retrieval of Medical Images, International Journal of Computer Research, Number 2, 2002, vol. 11, p. 171-182.

[49] Euripides G.M. Petrakis and Christos Faloutsos. Similarity Searching in Medical Image Databases, IEEE Transactions on Knowledge and Data Engineering, May/June 1997, vol. 9, no. 3, p. 435-447.

[50] Myron Flickner, Harpreet Sawhney, Wayne Niblack, et al. Query by Image and Video Content: The QBIC System, IEEE Computer, September 1995, vol. 28, no. 9, p. 23-32.

[51] Priti Mishra and Margaret H. Eich. Join Processing in Relational Databases, ACM Computing Surveys, March 1992, vol. 24, no. 1, p. 63-113.

[52] Yannis E. Ioannidis and Younkyung Cha Kang. Randomized Algorithms for Optimizing Large Join Queries, Proceedings of the ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, May 23-25, 1990, p. 312-321.

[53] S. Belongie, C. Carson, H. Greenspan, and J. Malik. Color- and texture-based image segmentation using the expectation-maximization algorithm and its application to content-based image retrieval. In Proc. International Conference on Computer Vision, 1998, p. 675–682.

[54] J. Ashley, R. Barber, M. Flickner, et al. Automatic and semi-automatic methods for image annotation and retrieval in QBIC. Proceedings of SPIE Storage and Retrieval for Image and Video Databases III, 1995, vol. 2420, p. 24-35.

[55] W. Niblack, R. Barber, W. Equitz, et al. The QBIC project: Querying images by content using color, texture, and shape. In Storage and Retrieval for Image and Video Databases, volume SPIE, February 1993, vol. 1908, p. 173-187.

[56] Remco C. Veltkamp, Mirela Tanase. Content-Based Image Retrieval Systems: A Survey; Technical Report UU-CS-2000-34, October 2000.

[57] Eakins, J P. Automatic image content retrieval - are we getting anywhere? in Proceedings of Third International Conference on Electronic Library and Visual Information Research (ELVIRA3), De Montfort University, Milton Keynes, May 1996, p. 123-135.


[58] Database Plug-in for Oracle Administrator’s Guide. San Mateo, CA: Virage, Inc., 1999, 94402-3121.

[59] Richard Chbeir. Modèle de Description d’Images : Application au Domaine Médical, Thèse de Doctorant, INSA de Lyon, France, Décembre 2001.

[60] C. Traina Jr., A.J.M. Traina, R.A. Ribeiro, E.Y. Senzako. Content-based Medical Images Retrieval in Object Oriented Database, Proceedings of the 10th IEEE Symposium on Computer-Based Medical Systems - Part II, Maribor, Slovenia, June 1997, p. 67-72.

[61] Robert Leggat. A History of Photography: From its beginnings till the 1920s, 1992. Available on : http://www.rleggat.com/photohistory/index.html (Consulted on Sept. 12, 2002).

[62] H. Tamura, S. Mori, and T. Yamawaki. Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics, 1978, vol. 8, no.6, p. 460-472.

[63] Liu, F and Picard, R W. Periodicity, directionality and randomness: Wold features for image modelling and retrieval IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, vol.18, no.7, p.722-733.

[64] Manjunath, B S and Ma, W Y. Texture features for browsing and retrieval of large image data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, vol.18, p.837-842.

[65] Kaplan, L. M. and Murenzi, R. Fast texture database retrieval using extended fractal features, in Storage and Retrieval for Image and Video Databases VI (Proc. SPIE), San Jose, CA, United States, 1998, p. 162-173.

[66] J. Wang, W. J. Yang, and R. Acharya. Color clustering techniques for color-content-based image retrieval from image databases, Proceedings of the 1997 IEEE International Conference on Multimedia Computing and Systems, ICMCS, Ottawa, Ont, Can, 1997, p. 442-449.

[67] M. Miyahara. Mathematical transform of (RGB) color data to Munsell (HSV) color data, in Proc. SPIE Visual Communications and Image Processing, 1988, vol. 1001, p. 650-657.

[68] H. Lu, B. Ooi, and K. Tan. Efficient image retrieval by color contents, Proceedings of the 1st International Conference on Applications of Databases (ADB-94), Vadstena, Sweden, 1994, p. 95.

[69] T. S. Chua, K. L. Tan, and B. C. Ooi. Fast signature-based color-spatial image retrieval, Proceedings of the 1997 IEEE International Conference on Multimedia Computing and Systems, ICMCS, Ottawa, Ontario, Canada, 1997, p. 362-369.

[70] J. R. Smith and S. F. Chang. Single color extraction and image query, Proceedings of the 1995 IEEE International Conference on Image Processing. Part 3 (of 3), Washington, DC, USA, 1995, p. 528-531.

[71] Biederman, I. (1987). Recognition-by-components: a theory of human image understanding, Psychological Review, vol. 94, no. 2, p. 115-147.

[72] T. Gevers and V. K. Kajcovski. Image segmentation by directed region subdivision, Proceedings of the 12th IAPR International Conference on Pattern Recognition. Part 1 (of 3), Jerusalem, Israel, 1994, p. 342-346.

[73] M. Hansen and W. Higgins. Watershed-driven relaxation labeling for image segmentation, Proceedings of the 1994 1st IEEE International Conference on Image Processing. Part 3 (of 3), Austin, TX, USA, 1994, p. 460-464.

[74] X. Q. Li, Z. W. Zhao, H. D. Cheng, C. M. Huang, and R. W. Harris. Fuzzy logic approach to image segmentation, Proceedings of the 12th IAPR International Conference on Pattern Recognition. Part 1 (of 3), Jerusalem, Israel, 1994, p. 337-341.

[75] Rafael C. Gonzalez, Richard E. Woods. Digital Image Processing, 2nd Edition: Upper Saddle River (NJ), Prentice Hall Inc. 2002, 793 p, ISBN: 0-201-18075-8.

[76] Query By Image Content – QBIC, Version 3.0, Programmer’s Guide, IBM Co. 1998.

[77] The American Museum of Photography. Available on the site: http://www.photographymuseum.com/. (Consulted on Sept. 12, 2002).

[78] H. Wactlar. Extracting and Visualizing Knowledge from Video and Film Archives, Journal of Universal Computer Science, Volume 8, Issue 6 (http://www.jucs.org), July 2002, Springer Co. Pub. Also appeared in the Proceedings of the I-KNOW'02 Conference, Graz, Austria, July 11-12, 2002. Available on http://www.know-center.at/en/conference/i-know02/program.htm (Consulted on Nov. 24, 2002).

[79] Santini, S and Jain, R C. The graphical specification of similarity queries, Journal of Visual Languages and Computing, 1997, vol. 7, p. 403-421.


[80] Christian Böhm and Hans-Peter Kriegel. A Cost Model and Index Architecture for the Similarity Join, Proc. 17th IEEE Int. Conf. on Data Engineering (ICDE), Heidelberg, Germany, 2001, p. 411-420.

[81] Christian Böhm, Bernhard Braunmüller, Florian Krebs, and Hans-Peter Kriegel. Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data, Proc. ACM SIGMOD Int. Conf. on Management of Data, Santa Barbara, CA, 2001, p. 379-388.

[82] Christian Böhm. A Cost Model for Query Processing in High-Dimensional Data Spaces, ACM Transactions on Database Systems, 2000.

[83] Danilo Montesi, Alberto Trombetta. Similarity Search through Fuzzy Relational Algebra, 10th International Workshop on Database & Expert Systems Applications, IEEE Press, Florence, Italy, Sept. 1999, p. 235.

[84] N.J. Nes and M.L. Kersten. The ACOI algebra: A query algebra for image retrieval systems, Lecture notes in computer science, 1998, No. 1405, p. 77

[85] Simone Santini and Ramesh Jain. Image Databases are not Databases with Images, Proc. of the 9th International Conference on Image Analysis and Processing, Florence, Italy, vol.2, p.38-45.

[86] Jean-Michel Jolion. Feature Similarity, In M. Lew eds Principles of Visual Information Retrieval. London : Springer, 2001

[87] Etienne Loupias, Stéphane Bres and Jean-Michel Jolion. From Methods to Images, Computing, 1999, vol.62, p.265-275. (a presentation on KIWI).

[88] G. Octo Barnett. History of the Development of Medical Information Systems at the Laboratory of Computer Science at Massachusetts General Hospital, ACM 0-89791-248-9/87/0011/0043, 1987.

[89] Tamura, H et al. Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics 1978, vol. 8, no.6, p.460-472.

[90] J. R. Smith and S. F. Chang. Automated binary texture feature sets for image retrieval, in Proc. IEEE Int. Conf. Acoust, Speech, and Signal Processing, Atlanta, GA, 1996, p. 443-449.

[91] Gupta, A., Weymouth, T., and Jain, R. Semantic Queries with Pictures: The VIMSYS Model, Proceedings of the 17th Conference on Very Large Databases (VLDB), Palo Alto, California, 1991, p. 69-79.


[92] Mechkour, M. EMIR2. An Extended Model for Image Representation and Retrieval, in Revell, N. and Tjoa, A. (Eds.), 6th International conference Database and Expert Systems Applications (DEXA), 1995 Sep : London, p. 395-414.

[93] Gudivada, V., Raghavan, V., and Vanapipat, K. A Unified Approach to Data Modeling and Retrieval for a Class of Image Database Applications, IEEE Transactions on Knowledge and Data Engineering, 1994.

[94] R. G. G. Cattell, Douglas K. Barry, Mark Berler, et al. The Object Data Standard : ODMG 3.0. San Francisco : Morgan Kaufmann, 2000, 280 p, ISBN 1-55860-647-5.

[95] Vitorino Ramos, Fernando Muge. Less is More: Genetic Optimization of Nearest Neighbour Classifiers; RecPad’98, 10th Portuguese Conference on Pattern Recognition, Lisbon, March 1998, p. 293-302.

[96] K. V. Ravi Kanth, Divyakant Agrawal and Ambuj Singh. Dimensionality Reduction for Similarity Searching in Dynamic Databases, SIGMOD’98 Seattle, WA, USA, 1998, p.166-176.

[97] Thomas Seidl, Hans-Peter Kriegel. Optimal Multi-Step k-Nearest Neighbor Search, SIGMOD’98 Seattle, WA, USA, 1998, p.154-165.

[98] Piero Zamperoni. Feature Extraction, in H. Maître and J. Zinn-Justin, eds., Les progrès du traitement des images: Progress in picture processing. Amsterdam; New York: Elsevier, 1996, p. 119-184.

[99] Paolo Ciaccia, Marco Patella, and Pavel Zezula. A Cost Model for Similarity Queries in Metric Spaces, in the proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'98), Seattle, Washington, USA, June 1998, p. 59-68.

[100] John R. Smith. MPEG-7 Standard for Multimedia Databases, ACM SIGMOD, Santa Barbara, California, USA, May 21-24, 2001, p. 627.

[101] R. Samadani and C. Han. Computer-assisted extraction of boundaries from images, in Proc. of SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, USA, 1993, p. 219-225.

[102] Y. Rui, A. C. She and T. S. Huang. Automated shape segmentation using attraction-based grouping in spatial color texture space, Proc. IEEE Int. Conf. on Image Processing, Lausanne, Switzerland, 1996, p. 53-56.


[103] Chahab Nastar, Matthias Mitschke, Christophe Meilhac, Nozha Boujemaa. Surfimage: a flexible content-based image retrieval system, ACM Multimedia ‘98, Bristol, UK, 1998. Available on http://www-rocq.inria.fr/imedia/Articles/MM98/mm98.html, (Consulted on June 20, 2002).

[104] A. Pentland, R. W. Picard, S. Sclaroff. Photobook: Content-Based Manipulation of Image Databases, International Journal of Computer Vision, 1996, vol. 18, no. 3, p. 233-254.

[105] Sharad Mehrotra, Yong Rui, Michael Ortega, and Thomas S. Huang. Supporting Content-based Queries over Images in MARS, Proceedings of the 4th IEEE Int. Conf. Multimedia Computing and Systems, Chateau Laurier, Ottawa, Ontario, Canada, June 3-6, 1997, p. 632-633

[106] W. Niblack and X. Zhu. Updates to the QBIC system, Proc. SPIE Storage and Retrieval for Image and Video Databases VI, San Jose, CA, United States, 1998, p. 150-161.

[107] C. Faloutsos, et al. Efficient and effective querying by image content, Journal of Intelligent Information Systems, 1994, vol. 3, p. 231-262.

[108] MPEG-21 Overview v.5, (Editors: Jan Bormans, Keith Hill), Approved, ISO/IEC JTC1/SC29/WG11/N5231, Shanghai, October 2002.

[109] The official web-site of MPEG: Available on http://www.mpeg.org/MPEG/index.html. (Consulted on May 8, 2001).

[110] John Watkinson. The MPEG Handbook, Oxford : Focal, 2001, 320 p., ISBN: 0240516567

[111] Chu, W. W., Hsu, C.C., Cárdenas, A.F., and Taira, R. K. Knowledge-Based Image Retrieval with Spatial and Temporal Constraints, IEEE Transactions on Knowledge and Data Engineering, November/December 1998, vol. 10, no. 6, p. 872-888.

[112] The Digital Imaging and Communications in Medicine (DICOM) Standard, Part 1-16: published by National Electrical Manufacturers Association, 1300 N. 17th Street, Rosslyn, Virginia 22209 USA, 2001 Available on http://medical.nema.org/dicom/2001.html (Consulted on Oct. 11, 2002)

[113] The DICOM Standard: Available on Available on http://www.psychology.nottingham.ac.uk/staff/cr1/dicom.html, University of Nottingham, Nottingham NG7 2RD, UK, (Consulted on April 8, 2003).


[114] J. M. Shapiro. Embedded Image Coding Using Zerotrees of Wavelet Coefficients, IEEE Transactions on Signal Processing, December 1993, vol. 41, no. 12, p.3445-3462.

[115] Takeyuki Shimura, Masatoshi Yoshikawa, Shunsuke Uemura. Storage and Retrieval of XML Documents using Object-Relational Databases, Database and Expert Systems Applications (DEXA), 1999, p. 206-217.

[116] Dongwon Lee, Murali Mani, Frank Chiu, Wesley W. Chu. NeT & CoT: Translating Relational Schemas to XML Schemas Using Semantic Constraints. CIKM 2002, p. 282-291.

[117] Dongwon Lee, Wesley W. Chu. Constraints-Preserving Transformation from XML Document Type Definition to Relational Schema. ER 2000, p. 323-338.

[118] Andrew Eisenberg, Jim Melton. SQL: 1999, formerly known as SQL 3. SIGMOD Record, 1999, vol.28, no.1, p. 131-138.

[119] Jim Melton and Alan R. Simon. Understanding the New SQL: A Complete Guide, 2nd Edition. San Mateo, Calif.: Morgan Kaufmann, 2000, 536 p.

[120] Agathoniki Trigoni, G. M. Bierman. Inferring the Principal Type and the Schema Requirements of an OQL Query. BNCOD 2001, p. 185-201.

[121] F. Unglauben, W. Hillen, Th. Kondring. Java DICOM Viewer für die Teleradiologie, Bildverarbeitung für die Medizin, 2001, p. 404-408.

[122] The DICOM Standard (Links to many useful sites): Available on http://www.psychology.nottingham.ac.uk/staff/cr1/dicom.html#links. (Consulted on January 25, 2002)

[123] R. Elmasri and S.B. Navathe. Fundamentals of Database Systems. 2nd ed. California: Addison-Wesley co, 1994, 87 p., ISBN 0-8053-1753-8

[124] Heikki Mannila, K-J. Raiha. The Design of Relational Databases. Addison-Wesley Co. ISBN 0-201-56523-4, 1992.

[125] P. Ciaccia, D. Montesi, W. Penzo, and A. Trombetta. Imprecision and user preferences in multimedia queries: A generic algebraic approach. In Foundations of Information and Knowledge Systems, First International Symposium, Springer, LNCS 1762, Burg Spreewald, Germany, February 2000. p. 50-71.


[126] D. Papadias, N. Mamoulis, and Y. Theodoridis. Processing and optimization of multi-way spatial joins using R-Trees. In Proceedings of the ACM PODS International Conference, Philadelphia, Pennsylvania, USA, May-June 1999, p. 189-200.

[127] T. Brinkhoff, H.-P. Kriegel, and B. Seeger. Efficient processing of spatial joins using R-trees. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, USA, June 1993, p. 10-15.

[128] G.R. Hjaltason and H. Samet. Incremental distance join algorithms for spatial databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, Seattle, USA, June 1998, p.237-248.

[129] A. Corral, Y. Manolopoulos, Y. Theodoridis, and M. Vassilakopoulos. Closest pair queries in spatial databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, Dallas, USA, May 2000, p. 189-200.

[130] H. Tagare. Increasing Retrieval Efficiency by index tree Adaptation, Proc. of IEEE Workshop on content-based Access of Image and Video Libraries, in conjunction with IEEE CVPR’97, 1997.

[131] A. Guttman. R-Tree: a dynamic index structure for spatial searching SIGMOD '84, Proceedings of Annual Meeting (ACM Special Interest Group on Management of Data), Boston, MA, USA, 1984, p 47-57

[132] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. College Park, Md. : University of Maryland, 1987, 24 p.

[133] N. Beckmann, H-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, 1990, p. 322

[134] D. White and R.Jain. Similarity indexing: Algorithms and Performance, Proc. Of SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, USA, 1996, p. 62-73

[135] Raymond Ng and Andishe Sedighian. Evaluating multidimensional indexing structures for images transformed by principal component analysis. In Proc. of SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, USA, 1996, p. 50-61


[136] S. Abiteboul, R. Hull, V. Vianu. Foundations of Databases, New York: Addison-Wesley Pub. Co., 1995, ISBN 0-201-53771-0

[137] G. M. Shaw and S. B. Zdonik. A query algebra for object-oriented databases. In Providence, R.I. : Brown University, Dept. of Computer Science, 1989, 27 p.

[138] S.L. Vandenberg. Algebras for Object-Oriented Query Languages. PhD thesis, University of Wisconsin - Madison, U.S.A., 1993.

[139] T. W. Leung, G. Mitchell, B. Subramanian, B. Vance, S. L. Vandenberg, and S. B. Zdonik. The AQUA data model and algebra. In Proceedings of the 4th Workshop on Database Programming Languages, 1993, p. 157-175.

[140] M.H. Scholl. Extensions to the relational data model. In P. Loucopoulos and R. Zicari, editors, Conceptual Modelling, Databases and CASE: An Integrated View of Information Systems Development. New York: John Wiley & Sons, 1992.

[141] G. Mitchell. Extensible Query Processing in an Object-Oriented Database. PhD thesis, Brown University, U.S.A., 1993.

[142] G. Mitchell, S. Zdonik, and U. Dayal. Object-oriented query optimization: What's the problem? Technical Report CS-91-41, Brown University, 1991.

[143] MIT, Photobook Web Demo, Available on http://www-white.media.mit.edu:80/cgi-bin/tpminka/query?faces,ev,7412,15, (consulted on July 12, 2002).

[144] R. Fagin. Fuzzy queries in multimedia database systems, Proceedings of the 1998 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS, Seattle, WA, USA, 1998, p. 1-10.

[145] D. D. Straube and M. T. Ozsu. Queries and query processing in object-oriented database systems. ACM Transactions on Information Systems, October 1990, vol.8, no.4, p.387-430.

[146] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The Object Oriented Database System Manifesto, in: W. Kim, J.-M. Nicolas, S. Nishio (eds.), Proceedings of The First International Conference on Deductive and Object-Oriented Databases, Kyoto, December 1989, p. 40-57.

[147] E. Codd. A Relational Model of Data for Large Shared Data Banks, CACM, June 1970, vol.13, no.6.


[148] Adiba, M. STORM: Structural and Temporal Object Oriented Multimedia database system; Proceedings of the Int. Workshop on Multi-Media Database Management Systems, August 1995, p. 12-19.

[149] L. L. Dornhoff and F. E. Hohn. Applied Modern Algebra, New York: Macmillan, 1978, 500 p.

[150] A. Makinouchi. A Consideration on Normal Form of Not-Necessarily-Normalized Relations in the Relational Data Model, Proc. VLDB Conf., Tokyo, Japan, 1977.

[151] S. Abiteboul and N. Bidoit. Non-First Normal Form Relations: An Algebra Allowing Data Restructuring, Rocquencourt, France : Institut national de recherche en informatique et en automatique, 1984, 47 p.

[152] V. Deshpande and P.-A. Larson. An Algebra for Nested Relations, Research Report CS-87-65, University of Waterloo, Dec. 1987.

[153] S. Osborn. Identity, Equality, and Query Optimization, in ed. K. Dittrich, Advances in Object-Oriented Database Systems : 2nd International Workshop on Object-oriented Database Systems, Bad Münster am Stein-Ebernburg, FRG, September 27-30, 1988, proceedings. Springer-Verlag : Berlin, Germany, 1988

[154] Jarke, M. and Koch, J. Query optimization in database systems. ACM Computing Surveys, June 1984, vol.16, no.2, p.111–152.

[155] Mannino, M. V., Chu, P., and Sager, T. Statistical profile estimation in database systems. ACM Computing Surveys, Sept. 1988, vol. 20, no. 3, p. 192-221.

[156] Cathy Baird, Gayle Geurin. Oracle9i Database New Features, Release 1 (9.0.1), Oracle Corporation, Part No. A90120-02, 2001.

[157] Rod Ward. Oracle interMedia User’s Guide and Reference, Release 9.0.1, Oracle Corporation, Part No. A88786-01, 2001.

[158] Y. Ioannidis. Query Optimization, ACM Computing Surveys, symposium issue on the 50th Anniversary of ACM, March 1996, vol. 28, no. 1, p. 121-123.

[159] P. Lyman, H. R. Varian. How Much Information?, A research report at the School of Information Management and Systems at the University of California at Berkeley, © Regents of the University of California, October 2000. Available on http://www.sims.berkeley.edu/how-much-info/, (Consulted on April 5, 2003).


[160] T. Gevers and A.W.M. Smeulders. An Approach to Image Retrieval for Image Databases, 1993. Available on http://www.hermitagemuseum.org/fcgi-bin/db2www/browse.mac/category?selLang=English, (Consulted on June 2001).

[161] Chuck Fuller, Virage Image Component Technologies, Virage, Inc., January, 1999.

[162] Michael Ortega, Sharad Mehrotra, Kaushik Chakrabarti, and Kriengkrai Porkaew. WebMARS: A Multimedia Search Engine, Proc. SPIE, 1999, vol. 3964, p.314-321.

[163] C.H.C. Leung and H.H.S. Ip. Benchmarking for Content-Based Visual Information Search, Visual 2000, 4th International Conference, Lyon, France, 2000, p. 442-456.

[164] J. Robie, J. Lapp, D. Schach. XML Query Language (XQL). Available online: http://www.w3.org/TandS/QL/QL98/pp/xql.html, (consulted on April 7, 2003).


List of Publications

International Journals:

1. Solomon Atnafu, Richard Chbeir, Lionel Brunie; Efficient Content-Based and Metadata Retrieval in Image Database, Journal of Universal Computer Science, Volume 8, Issue 6 (http://www.jucs.org), July 2002, Springer Co. Pub. Also appeared in the Proceedings of the I-KNOW'02 Conference, Graz, Austria, July 11-12, 2002, pp. 118-127.

2. Harald Kosch and Solomon Atnafu; A Multimedia Join by the Method of Nearest Neighbor Search; Information Processing Letters (IPL), Volume 82, Issue 5, 15 June 2002, Elsevier Press, pp. 269-276.

International Conferences:

1. David COQUIL, Solomon ATNAFU, Richard CHBEIR, Lionel BRUNIE, EMIMS: A Medical Image Management System with a Visual Multi-Criteria Query Interface, International Conference on Information Resources Management (IRMA 2003), IRMA Press, May 18-21, 2003, Philadelphia, USA.

2. Solomon Atnafu, Richard Chbeir, Lionel Brunie; Content-Based and Metadata Retrieval in Medical Image Database; 15th IEEE Symposium on Computer Based Medical Systems (CBMS 2002) June 4th - 7th, 2002, Maribor, Slovenia.

3. Richard Chbeir, Solomon Atnafu, André Flory, Lionel Brunie; Efficient Method for Image Indexing: A Case for Medical Application; 2nd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT'02), December 18-21, 2002, Marrakesh, Morocco.

4. Richard Chbeir, Solomon Atnafu, Lionel Brunie, Image Data Model for Efficient Multi-Criteria Query in Medical Database; 14th International Conference on Scientific and Statistical Database Management (SSDBM 2002), 24th-26th July, pp.165- 175, Edinburgh, Scotland.

5. Solomon Atnafu, Lionel Brunie, and Harald Kosch, Similarity-Based Operators and Query Optimization for Multimedia Database Systems; International Database Engineering & Applications Symposium (IDEAS'01), July 16-18, 2001, Grenoble, France; IEEE Computer Society Press, pp. 346-355.

6. Solomon Atnafu, Lionel Brunie, and Harald Kosch, Similarity-Based Algebra for Multimedia Databases, In the Proc. of the 12th Australian Database Conference (ADC'01), Gold Coast, Australia, Jan. 29 - Feb. 2, 2001, IEEE Computer Society Press, pp. 115-122.

7. Solomon Atnafu, Lionel Brunie, and Harald Kosch, Modeling Data Repository and Formalizing Content-Based Queries for Image Databases; ACS/IEEE International Conference on Computer Systems and Applications (AICCSA'01), Beirut, Lebanon, June 25-29, 2001, IEEE Computer Society Press.

8. Solomon Atnafu, Lionel Brunie, and Harald Kosch, Similarity-Based Operators in Image Database Systems; The 2nd International Conference on Web-Age Information Management (WAIM'2001), Xi'an, China, July 9-11, 2001, LNCS, Springer Verlag, pp. 14-25.