
Visualization and Interactive Exploration of Factorial Correspondence Analysis Results on Images

Nguyen-Khang Pham*,** — Annie Morin* — Patrick Gros*

* IRISA, Campus de Beaulieu, 35042 Rennes cedex
{pnguyenk;amorin;pgros}@irisa.fr

** Cantho University, Campus III, 1 Ly Tu Trong Street, Cantho City, Vietnam
[email protected]

ABSTRACT. The aim of our investigation is to interactively explore the results of Factorial Correspondence Analysis (FCA) applied to images, in order to extract knowledge and to better interpret the obtained results.

We propose an interactive graphical tool, CAViz, which allows us to view and extract knowledge from FCA results on images. FCA deals with contingency tables and is very often used in Textual Data Analysis (TDA), where the contingency table crosses words and documents. To adapt FCA to images, the first step is to define "visual" words in images (the analogue of words in texts). These words are constructed from local descriptors (SIFT, Scale-Invariant Feature Transform) computed in the images.

CAViz projects point clouds onto factorial planes and allows the user to view and extract valuable information such as characterizing words and relevant indicators (representation quality and contribution to the inertia). An application to the Caltech4 base demonstrates the interest of CAViz for the analysis of FCA results.

KEY WORDS: Factorial correspondence analysis, Visualization, SIFT



1. Introduction

Data mining (Fayyad et al., 1996) aims to extract useful hidden knowledge from the large datasets of a given application. Usefulness depends on the user's goal: only the user can decide whether the resulting knowledge answers his or her goal. Therefore, data mining tools should be highly interactive and user-friendly. The idea here is to increase human involvement in the data mining environment through interactive visualization techniques.

In the past years, many visual methods have been developed in different domains and used for data exploration and knowledge extraction (Fayyad et al., 2001, Keim, 2002). Visual methods are used for data selection (pre-processing step) and for displaying results (post-processing step). Some recent visual data mining methods (Ankerst et al., 2000, Do et al., 2004, Poulet, 2004) try to involve human factors more intensively in the data mining step itself through visualization. This cooperation brings advantages such as the use of domain knowledge during model construction, and better confidence in and understanding of the obtained models, by exploiting human pattern recognition capabilities in model exploration and construction.

Our investigation aims to explore the results of Factorial Correspondence Analysis (FCA) (Benzécri, 1973). In textual data analysis, the Bi-Qnomis tool (Kerbaol et al., 2006) developed by M. Kerbaol allows the user to visualize results and to find relevant topics in a text corpus analyzed by FCA. The point clouds (terms/words and documents) are projected on a factorial plane. Then, for each axis, a list is built of the words and/or documents whose contributions to the inertia are large, generally more than three times the average contribution per word; following M. Kerbaol, such a list of words is called a metakey. FCA is based on classical results in matrix theory, its central result being an eigenvector-eigenvalue decomposition of a square matrix. The total inertia on an axis is equal to the corresponding eigenvalue, so the threshold is easy to compute. The metakeys are displayed on the left, and the titles of the documents with high contributions are listed at the top of the screen; clicking on the title of a document immediately retrieves its plain text. When examining metakeys and/or documents, an expert can summarize the content of these documents. Bi-Qnomis also supports a visualization of metakeys using a hyperbolic tree.

Adapting FCA to images raises a difficulty: there are no actual words in images, so we use "visual words" instead. Recently, some methods originally developed for textual data analysis, such as PLSA (Probabilistic Latent Semantic Analysis) (Hofmann, 1999) and LDA (Latent Dirichlet Allocation) (Blei et al., 2003), have been applied in image analysis for image classification (Willamowski et al., 2004), topic discovery in images (Sivic et al., 2005), scene classification (Bosch et al., 2006), and image retrieval (Lienhart et al., 2007). These methods try to find a model of the corpus and to reduce the dimensions. A disadvantage of these methods is the use of an ad hoc model and of the EM algorithm, which only finds a local optimum. In addition, it is difficult to interpret their results, and most works use such methods as black boxes.

This paper focuses on adapting FCA to images and on interpreting the results using relevant indicators through a visualization tool, CAViz, in which the user can interactively explore the results to better understand them. The article is organized as follows: we briefly describe the FCA method in Section 2; Section 3 presents interactive knowledge extraction from the FCA results; in conclusion, we present some perspectives for this work.

2. Adaptation of FCA on images

2.1. FCA

FCA is a classical exploratory method for the analysis of contingency tables. It was proposed by J.-P. Benzécri (1973) in the linguistic context, i.e. textual data analysis; the first study was performed on the tragedies of Racine. FCA on a table crossing words and documents answers the following questions: is there any proximity between certain words? Is there any proximity between certain documents? Is there any link between certain words and certain documents? Like most factorial methods, FCA uses a singular value decomposition of a particular matrix and allows words and documents to be viewed in a reduced space, one with the property that the projected points (words and/or documents) retain maximum inertia. In addition, FCA provides relevant indicators for the interpretation of the axes, such as the contribution of a word or a document to the inertia of an axis, and the representation quality of a word and/or document on an axis (Morin, 2004).
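To make the computation concrete, here is a minimal sketch of FCA as it is classically computed: a singular value decomposition of the matrix of standardized residuals of the contingency table. This is our own NumPy illustration, not code from CAViz; the function name and return values are assumptions.

import numpy as np

def fca(N, n_axes=2):
    """Correspondence analysis of a contingency table N (documents x words)."""
    P = N / N.sum()                            # correspondence matrix
    r = P.sum(axis=1)                          # row masses (documents/images)
    c = P.sum(axis=0)                          # column masses (words)
    # Standardized residuals: S = D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows and columns on the first n_axes axes
    F = U[:, :n_axes] * s[:n_axes] / np.sqrt(r)[:, None]
    G = Vt.T[:, :n_axes] * s[:n_axes] / np.sqrt(c)[:, None]
    return F, G, s[:n_axes] ** 2               # squared singular value = inertia of axis

The returned squared singular values are the axis inertias (eigenvalues) used below to threshold contributions.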

2.2. Construction of visual words and image representation

In order to adapt FCA to images, we must represent the image corpus in the form of a contingency table. Here, images are treated as documents, and the "visual words" (to be defined) as terms/words.

The words in the images, called visual words, must be computed to form a vocabulary of N words; each image is then represented by a word histogram. The construction of visual words proceeds in two steps: (i) computation of local descriptors for a set of images; (ii) classification (clustering) of the obtained descriptors. Each cluster yields one visual word, so there are as many words as clusters obtained at the end of step (ii). The computation of local descriptors in an image is also done in two stages. We first detect the interest points in images: these points are either maxima of the Laplacian of Gaussian (Lindeberg, 1998), 3D local extrema of the difference of Gaussians (Lowe, 1999), or points detected by a Hessian-Affine detector (Mikolajczyk et al., 2004). Figure 1 shows some interest points detected by a Hessian-Affine detector.

Figure 1. Interest points detected by Hessian-Affine detector

Figure 2. A SIFT descriptor computed from the region around the interest point (the circle): the gradient of the image (left) and the descriptor of the interest point (right)

Then, the descriptor of each interest point is computed on the grey-level gradient of the region around the point. The Scale-Invariant Feature Transform descriptor, SIFT (Lowe, 2004), is often preferred; each SIFT descriptor is a 128-dimensional vector. Figure 2 describes a SIFT descriptor. The second step forms the visual words from the local descriptors computed in the previous step. Most works run a k-means on the descriptors and take the mean of each cluster as a visual word (Willamowski et al., 2004, Sivic et al., 2005, Bosch et al., 2006). Some visual words computed from the Caltech4 dataset are shown in Figure 3. After building the visual vocabulary, each descriptor is assigned to its nearest cluster: we compute, in R^128, the distances from each descriptor to the representatives of the previously defined clusters. An image is then characterized by the frequency histogram of its visual words, and the image corpus is represented as a contingency table crossing images and clusters.
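As an illustration, this bag-of-visual-words pipeline could be sketched as follows with OpenCV and scikit-learn. Note an assumption: OpenCV's SIFT uses Lowe's difference-of-Gaussians detector rather than the Hessian-Affine detector mentioned above; the function names and the vocabulary size are ours.

import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(image_paths, n_words=1000):
    """Steps (i) and (ii): compute SIFT descriptors, then cluster them."""
    sift = cv2.SIFT_create()
    all_descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)   # desc: (n_points, 128)
        if desc is not None:
            all_descriptors.append(desc)
    # Cluster centres play the role of visual words
    return KMeans(n_clusters=n_words).fit(np.vstack(all_descriptors))

def word_histogram(path, vocabulary):
    """Represent one image as a histogram of visual word frequencies."""
    sift = cv2.SIFT_create()
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    labels = vocabulary.predict(desc)                # nearest cluster in R^128
    return np.bincount(labels, minlength=vocabulary.n_clusters)

Stacking the histograms of all images, row by row, yields the contingency table on which FCA is applied.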

Figure 3. Some visual words constructed from the Caltech4 dataset

Since these visual words are defined on descriptors computed from several categories, images in different categories may share a word. We call this the polysemy of visual words.

3. Interactive exploration of FCA results

3.1. Projection on factorial plane

The screen is divided into two parts: point clouds (images and/or visual words) are drawn on the left, and the right part displays the selected images. An "image" point is displayed as a red square, and a "word" point as a blue square. The user can select an image or an image group (a group of points lying close to one another) by pointing at the image of interest. All the images found within a neighbourhood of radius r of the selected image are displayed on the right of the screen, and the points corresponding to the selected images change colour (from red to green). The visual words (in the form of ellipses) are also drawn on the images. The selected images shown on the right-hand side immediately give a general summary of their content.
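A hypothetical sketch of this radius-r selection, operating on the 2D factorial coordinates of the images (CAViz's own implementation is not shown in the paper):

import numpy as np

def select_neighbourhood(coords, clicked, r):
    """coords: (n, 2) image projections on the current factorial plane;
    clicked: (2,) position of the selected image; r: neighbourhood radius."""
    dist = np.linalg.norm(coords - clicked, axis=1)
    return np.flatnonzero(dist <= r)   # indices of the images to highlight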


Figure 4. Projection of the Caltech4 base on axes 1 and 2

To focus only on interesting images and/or words, we display only those whose contribution to the inertia is high, usually 2 or 3 times the average contribution. The total inertia on one axis is equal to the eigenvalue associated with this axis, so the threshold is easy to determine. Figure 4 shows the projection of the Caltech4 base (Sivic et al., 2005) on axes 1 and 2 with the threshold equal to 2 times the average. M. Kerbaol calls a group of words whose contribution on an axis is very high a metakey; there are therefore two metakeys per axis, a positive one and a negative one. The words belonging to each metakey are displayed on the associated image. Figure 5 shows some metakeys and their visual words: the images located at each end of the axes display the corresponding metakeys. The image on the upper left is superimposed with the visual words that are close to it, and it is easy to see that most of these words are found in both the left metakey and the bottom metakey.
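Since the contributions to the inertia of an axis sum to 1, the metakey rule amounts to thresholding them at a chosen multiple of the average contribution 1/n. A minimal sketch, with names of our own choosing; col_masses is the vector of column masses of the contingency table (the c vector of the sketch in Section 2.1):

import numpy as np

def metakeys(G, col_masses, eigenvalues, axis, factor=2.0):
    """G: (n_words, n_axes) word coordinates from the FCA.
    Returns the positive and negative metakeys of one axis."""
    # Contribution of each word to the inertia of the axis (sums to 1)
    ctr = col_masses * G[:, axis] ** 2 / eigenvalues[axis]
    high = ctr > factor / len(ctr)     # threshold = factor x average (1/n)
    pos = np.flatnonzero(high & (G[:, axis] > 0))   # positive-side metakey
    neg = np.flatnonzero(high & (G[:, axis] < 0))   # negative-side metakey
    return pos, neg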

Interesting information on an image is interactively extracted by selecting the image. Two indicators are relevant for interpreting FCA results: the representation quality on the one hand, and the contribution to the inertia on the other. This information supports the tasks described below. Figure 6 shows an example of interactive information extraction: the image in this example is well represented by axes 2 (negative), 3 (negative), 18 (negative) and 12 (positive), and contributes strongly to the inertia of these axes.

One of the advantages of FCA is the double representation of documents and words on the same factorial plane: words close to an image characterize it well. It is thus easy to visualize the words that characterize an image or an image group, by selecting the image and displaying the words close to it. Figure 7 shows the words that represent the "face" topic well.
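Illustratively, "words close to an image" can be read directly off the factorial coordinates; the following sketch (our own, with an assumed k) returns the k word points nearest to a selected image point:

import numpy as np

def characterizing_words(F, G, image_idx, k=10):
    """F: image coordinates, G: word coordinates, on the same axes.
    Returns the indices of the k visual words closest to the image."""
    dist = np.linalg.norm(G - F[image_idx], axis=1)
    return np.argsort(dist)[:k]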

Figure 5. Visualization of metakeys


Figure 6. Image information extraction: histograms of an image's representation quality on the axes and of its contribution to the inertia of the axes (red: positive, blue: negative)

3.2. Image topic discovery

After displaying the point clouds on a factorial plane, we look for groups of points that define a topic in this plane. To discover topics, however, we should look for the axes that represent the topics well. CAViz can display information about the representation quality on the axes, and supports a dual view that makes it easy to see this information for an image group.


Figure 7. The visual words characterizing the “face” topic

3.2.1. Representation quality of images

The representation quality of a point i (image i) on axis j is the squared cosine of the angle between axis j and the vector joining the centre of the point cloud to point i. The closer the squared cosine is to 1, the closer the projected position of the point is to its real position in the original space. We use this criterion to look for relevant axes.
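A minimal sketch of this indicator, under the assumption that the coordinates of each point on all retained axes are available:

import numpy as np

def squared_cosines(F):
    """F: (n_points, n_axes) coordinates on ALL retained axes.
    Row i gives the squared cosine of point i with each axis;
    each row sums to 1 when every axis is kept."""
    return F ** 2 / (F ** 2).sum(axis=1, keepdims=True)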


The point cloud is first projected on the first plane (i.e. axes 1 and 2). We then read off the representation quality of an image on the axes by looking at the first factors of its histogram. This information helps us to find the best plane for an image group.
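Such a search could also be automated, for instance by scanning all pairs of axes for the one maximizing the group's mean representation quality. The following sketch is our assumption of how this could be done (an improvement suggested in the conclusion), not a description of CAViz's current behaviour:

import itertools
import numpy as np

def best_plane(cos2, group):
    """cos2: (n_points, n_axes) squared cosines; group: selected image indices.
    Returns the pair of axes maximizing the group's mean representation quality."""
    mean_quality = cos2[group].mean(axis=0)          # mean quality per axis
    pairs = itertools.combinations(range(cos2.shape[1]), 2)
    return max(pairs, key=lambda p: mean_quality[p[0]] + mean_quality[p[1]])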

3.2.2. Dual views

CAViz displays the points on the left and the selected images on the right. When the user clicks on an image on the right, the corresponding point on the left is selected and all its information is shown. These dual views make it easy to select interesting images, since human visual perception is a good pattern recognition tool: we select similar images on the right-hand side and look at their information on the left. If the same axes represent the selected images well, we take these axes and project the points on them.

3.2.3. Image topic discovery

We give here a case study of topic discovery in the Caltech4 database (Sivic et al., 2005), drawn from the Caltech101 database (Fergus et al., 2003). This database contains 4090 images divided into 5 categories, described in Table 1. Figure 8 shows images drawn from the Caltech4 database.

Category        Number of images
Faces                435
Motorbikes           800
Airplanes            800
Backgrounds          900
Cars (rear)         1155

Table 1. Description of the Caltech4 database

First, FCA is applied to the images. We then project the point clouds onto the first factorial plane and select the group on the left (the cars category); the selected images are displayed on the right part of the screen. We select images by clicking on the right and looking at their representation quality on the axes. We find images that are well represented by axes 6 (negative) and 7 (positive), and many images well represented by axes 11 (negative) and 12 (positive) (Figures 9 and 10). Projecting the point clouds on these axes, we find some high-quality topics containing similar images.


Figure 8. Images drawn from the Caltech4 database

Figure 9. Images well represented by the two axes 6 and 7


Figure 10. Images well represented by the two axes 11 and 12

Projecting on axes 6 and 7, we find a group of points on the top left (axis 7 positive, axis 6 negative). We select this group and look at the images displayed on the right part: these images are very similar (they are white cars, Figure 11). There is also another group on the bottom left; doing the same thing, we find another topic (red cars). Similarly, axes 11 and 12 reveal yet another topic for cars (Figure 12).

4. Conclusion and future work

We have presented in this article a graphical tool, CAViz, which visualizes the results of FCA on images. This tool helps users to extract knowledge and to interpret FCA results. We have also shown an application to image topic discovery in the Caltech4 database; the topics obtained in this study demonstrate the combined value of CAViz and FCA.


Figure 11. Topic discovery by projecting points on the best plane: we found two different "cars" topics on the plane 6 – 7


Figure 12. Topic discovery by projecting points on the best plane: another "cars" topic found on the plane 11 – 12

Some improvements would be useful, such as a quick search for the important axes of an image group, and more flexibility in the selection of images; for this, we could extract information from the group's centre of gravity and/or use a rectangle, an ellipse, or an arbitrary shape instead of a circle. A possible development of this method is a system that would facilitate exploration, navigation, and image labelling by integrating visual information with other information associated with the images.

Acknowledgements

The authors would like to thank A. Zisserman for the Caltech4 database and F. Poulet for useful discussions.


5. Bibliography

Ankerst M., Ester M., Kriegel H-P., “Towards an effective cooperation of the computer and the user for classification”, In Proceedings of KDD'00, 6th ACM SIGKDD, 2000, p. 179 – 188.

Benzécri J.-P., L'analyse des correspondances, Paris: Dunod, 1973.

Blei D. M., Ng A. Y., Jordan M. I., “Latent Dirichlet Allocation”, Journal of Machine Learning Research, vol. 3, 2003, p. 993 – 1022.

Bosch A., Zisserman A., Munoz X., “Scene Classification via PLSA”, In Proceedings of the European Conference on Computer Vision, 2006, p. 517 – 530.

Do T-N., Poulet F., “Enhancing svm with visualization”, In Discovery Science 2004, LNAI-3245, E. Suzuki et S. Arikawa Eds., 2004, p. 183 – 194.

Fayyad U., Piatetsky-Shapiro G., Smyth P., “From data mining to knowledge discovery in databases”, AI Magazine, vol. 17, n° 3, 1996, p. 37 – 54.

Fayyad U., Grinstein G., Wierse A., Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann Publishers, 2001.

Fergus R., Perona P., Zisserman A., “Object Class Recognition by Unsupervised Scale-Invariant Learning”, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003, p. 264 – 271.

Hofmann T., “Probabilistic Latent Semantic Analysis”, In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI’99), 1999, p. 289 – 296.

Keim D., “Information Visualization and Visual Data Mining”, IEEE Transactions on Visualization and Computer Graphics, vol. 8, n° 1, 2002, p. 1 – 8.

Kerbaol M., Bansard J. Y., Coatrieux J. L., “An Analysis of IEEE Publications”, IEEE Engineering in Medicine and Biology Magazine, vol. 25, n° 2, 2006, p. 6 – 9.

Lienhart R., Slaney M., “PLSA on Large Scale Image Databases”, In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, 2007, p. 1217 – 1220.

Lindeberg T., “Feature detection with automatic scale selection”, International Journal of Computer Vision, vol. 30, n° 2, 1998, p. 79 – 116.

Lowe D. G., “Object Recognition from Local Scale-Invariant Features”, In Proceedings of International Conference on Computer Vision, 1999, p. 1150 – 1157.

Lowe D. G., “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, vol. 60, n° 2, 2004, p. 91 – 110.

Mikolajczyk K., Schmid C., “Scale and affine invariant interest point detectors”, International Journal of Computer Vision, vol. 60, n° 1, 2004, p. 63 – 86.

Morin A., “Intensive Use of Correspondence Analysis for Information Retrieval”, In Proceedings of the 26th International Conference on Information Technology Interfaces, 2004, p. 255 – 258.


Poulet F., “SVM and Graphical Algorithms: A Cooperative Approach”, In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04), 2004, p. 499 – 502.

Sivic J., Russell B. C., Efros A. A., Zisserman A., Freeman W. T., “Discovering Objects and their Localization in Images”, In Proceedings of the International Conference on Computer Vision, 2005, p. 370 – 377.

Willamowski J., Arregui D., Csurka G., Dance C., Fan L., “Categorizing Nine Visual Classes Using Local Appearance Descriptors”, In Proceedings of ICPR 2004 Workshop Learning for Adaptable Visual Systems, Cambridge, United Kingdom, 2004.