
Computers in Human Behavior 27 (2011) 677–693


A hybrid computational model for an automated image descriptor for visually impaired users

Tarek Helmy a,*,1, Mohammad M. Hassan b, Muhammad Sarfraz c

a College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Mail Box # 413, Saudi Arabia
b Dammam Community College, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
c Department of Information Science, Kuwait University, Safat 13060, Kuwait


Article history: Available online 23 May 2010

Keywords: Classification; Image analysis and descriptor

doi:10.1016/j.chb.2010.04.018
* Corresponding author. Tel.: +966 38601967. E-mail addresses: [email protected] (T. Helmy), [email protected] (M.M. Hassan), [email protected] (M. Sarfraz).
1 On leave from College of Engineering, Department of Computers Engineering & Automatic Control, Tanta University, Egypt.

Nowadays, with the development of high-quality software, most presentations contain images. This creates a problem for visually impaired people, as there is support for text-to-voice conversion but not for image-to-voice. For documents which combine images and text, we propose a hybrid model to make a meaningful and easily recognizable descriptor for images in three main categories (statistical, geometrical and non-geometrical). First, a neural classifier is trained, by mining the associated texts using advanced concepts, so that it can assign each document to a specific category. Then, a similarity matching with that category's annotated templates is performed for images in every other category. We have made a classifier by using novel features based on color projection, able to differentiate geometrical images from ordinary images. Thus we have significantly improved the similarity matching, to achieve more accurate descriptions of images for visually impaired users. An important feature of the proposed model is that its specific matching techniques, suitable for a particular category, can be easily integrated and developed for other categories.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Personal computers (PCs) are becoming more user-friendly and popular due to intensive research on human–computer interaction (HCI). One of the main aims of HCI is to increase the accessibility of information so that it becomes a right for all human beings. As a part of mankind, disabled people also possess this right. According to estimates by the United Nations, about 10% of the world's population is disabled (http://web.worldbank.org/). They are diversified groups and have different kinds of challenges. To achieve 'usability for disability', several broad categories should be considered (Helmy & Hassan, 2005; Mccrindle, 1999):

• Sensory impairment (e.g. sight and hearing loss).
• Mental impairment (e.g. learning difficulties, aphasia, rehabilitation requirements).
• Motor impairment (e.g. limited movement, co-ordination difficulties).

To work for these disabled people is not only a humanitarian initiative, but also a legal obligation of large organizations. Including disabled users in the new technological revolution has become an important concern, as evidenced by the increasing awareness of their requirements and rights. In our research we mainly focus on sight-disabled people.

As computers improve, documents are no longer just a combination of text with hazy pictures; they are becoming a mixture of several media. Fig. 1 shows the different contents of a typical modern document. We model a system that will increase accessibility to multipart documents for sight-disabled users.

Many studies are going on in the field of document manipulation, such as making presentations more attractive with animation, seeking information from scattered documents, and searching for multimedia data. Several powerful search engines (e.g. Google) basically work on the text parts of documents, and extract and store information from them. But there is no significant solution for the automated content description of images for blind users. Images are an important element of visual information, and so describing images is becoming an important area of research.

If we observe carefully, we will find that most presentations contain two types of graphical elements:

(a) Pictorial (e.g. photos, digital art, etc. from various domains).
(b) Regular geometrical shapes (e.g. bar-charts, pie-charts, etc.).

Our main goal is to present a whole document, which has many applications, in a voice-synthesized form, especially for the sight-disabled users.

Fig. 1. Parts of a modern document (document → text, multimedia; multimedia → picture, animation/video; picture → picture of standard geometrical shape, non-geometrical picture; standard geometrical shape → related geometrical shape picture (bar-charts, pie-charts), non-related geometrical shape picture).


In our model, we mainly emphasize the "picture of standard geometrical shapes" category, as such pictures are easily handled by a standard shape recognition system, and most documents containing statistical information belong to this category. In our approach, we try to make a system that can recognize the most common geometrical shapes, produce phrases to describe the recognized shapes, and finally communicate with the voice synthesizer to produce speech. We find that it is possible to make fine descriptions of those images where shapes are related (e.g. pie-charts or bar-charts). As we plan to make a precise image descriptor, simple content description is not enough; it needs some kind of meaningful semantic description. For this purpose, we propose a model which produces a textual description based on similarity searching, with standard templates stored in a repository. We also use similarity matching to describe non-geometrical images. The rest of the paper is organized as follows: Section 2 covers various related works. A detailed conceptual model of our experimental system is described in Section 3. Section 4 presents the proposed prototype. Section 5 shows detailed results, with observations and discussion. Section 6 concludes the paper, and highlights the future vision of the proposed prototype.

2. Related works

Assisting visually impaired people is a major challenge. There are systems for guiding blind people around, such as "Dristi" (Plessers et al., 2005; Feng, Jeusfeld, & Hoppenbrouwers, 2001). Some systems accept voice commands for operation by blind people, while others use a Braille keyboard or a touching mechanism (Kieninger, 1996; Ran, Hela, & Moore, 2004) to collect the user's input and to give an output as speech by using a text-to-speech synthesizer (Manaris, McCauley, & MacGyvers, 2001; Plessers et al., 2005). Recently, after the web revolution, accessibility of information has become a key issue for visually impaired people. A simple approach is to organize web documents in a way that is more meaningful when synthesized by a voice reader (Rotard, Knödler, & Ertl, 2005). This is a very good approach for text-oriented documents but not good enough for documents which contain images. An interesting solution is to develop a special device with which one can touch and feel the image in a document (Bartolome & Gustavo, 2004). Mathematical charts or curves, and binary or low-color images can be represented in this way, but complications arise for critical images which consist of shades and colors. Due to the cost as well as the performance, such devices probably do not provide a viable solution. In another approach, annotation (text or direct voice) can be a way to describe pictures (Wenyin, Dumais, Sun, Czerwinski, & Field, 2001). Some

automatic annotation of pictures is possible (Lehmann et al., 2005), but the annotation is still dependent on manual effort. To improve this situation, we introduce a tool that will help blind people to get information from pictures in documents by analyzing their content and matching them with previously annotated pictures. To make a suitable system, we encompass several fields of research, such as image analysis, shape recognition, document analysis, and similarity matching. In the following sections, we will discuss the issues related to those areas.

2.1. Image analysis

Image analysis is a vast area. We work mainly on feature extraction, to find some features that can be used for image categorization (normal or geometrical) and similarity matching. The features used in this process can vary widely; they are classified as local or global features (Ding, Lam, & Ching, 1997; Kherfi, Ziou, & Bernardi, 2004). Feature extraction in special-purpose images is more suitable and accurate than in general images; for instance, concavity and horizontal color projection have been used for script categorization (Jagadish, 1991). Finding suitable features is a very challenging task. We have devised new features especially for image categorization, in addition to traditional features such as the color histogram (Jain, Duin, & Mao, 2000) or the contour of a shape (Grigorescu, Petkov, & Westenberg, 2003; Ankerst, Kriegel, & Seidl, 1998) used for further processing. Our main hurdles were to find representative features for different classes and to prove their capability to distinguish the images. In general, automatic categorization as a mapping of images into pre-defined classes involves three basic principles (Kherfi et al., 2004):

(i) The extraction of appropriate features to describe the image content;
(ii) The selection of the best features to be used for distinguishing the images;
(iii) The training and evaluation of the classifier.

In the proposed prototype, we use six features for distinguishing the images, and we defined four of them ourselves. Section 3 gives more details about these features. We also use some domain-specific features in analyzing the geometrical images (e.g. the length and the width of a bar in the bar-charts).

2.2. Shape recognition

After categorizing the images into either a "Normal Image" (NI) or a "Geometrical Image" (GI), the system can describe them.


A GI can be recognized from the extracted parametric information (e.g. radius and center position of a circle, and length, width and center position of a rectangle). There are some advanced techniques for recognizing shapes from an image (Zukas & Price, 2008; Aggarwal & Philip, 2001). Advanced techniques such as neural networks can be used for this purpose. To extract different objects from the figure, color difference from the background can be used as an indication of shape existence. Relative positioning of objects in an image is also an important feature (Yang, 1999). We use these techniques to handle a GI, and we use row and column scanning to get the relative positions of objects. There are systems already developed which combine the above concepts, and an extensive survey on these systems was done by Euripides, Petrakis, and Faloutsos (1997). Most of these systems are based on either annotation or semantic text retrieval with image features. In our approach, we put greater emphasis on simple categorization by using a statistical neural classifier. We determined the types of the documents (e.g. scientific, business, official, etc.) by word tokens, and we used them to choose more similar pictures.
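As an illustration of the row-and-column scanning idea (a minimal sketch only; the function names, the use of NumPy and the crude bounding-box and left-of tests are our own assumptions, not the system's actual implementation), the relative positions of objects that differ from the background color could be obtained roughly as follows:

import numpy as np

def object_boxes(img, background):
    # Return a bounding box (top, bottom, left, right) for every colour that
    # differs from the background, found by scanning rows and columns.
    boxes = {}
    for colour in np.unique(img):
        if colour == background:
            continue
        rows, cols = np.where(img == colour)
        boxes[int(colour)] = (int(rows.min()), int(rows.max()),
                              int(cols.min()), int(cols.max()))
    return boxes

def left_of(box_a, box_b):
    # Crude relative-position test: object A lies entirely to the left of object B.
    return box_a[3] < box_b[2]

# Toy 6x8 image: background colour 0, object 1 on the left, object 2 on the right.
img = np.zeros((6, 8), dtype=int)
img[1:4, 0:3] = 1
img[2:5, 5:8] = 2
boxes = object_boxes(img, background=0)
print(boxes[1], boxes[2], left_of(boxes[1], boxes[2]))

Real images would, of course, need preprocessing (e.g. color quantization) before such a scan becomes reliable.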

2.3. Document analysis

Any document that contains an image usually has some valuable text about the contextual meaning of that image. Hence, some investigation into the content of the text can be fruitful in identifying the image library to be used in the similarity matching. There are some learning and mining techniques to do the investigations (Lehmann et al., 2005; Healey & Jain, 1996). There are other methods for document categorization. A popular approach is LSI (Latent Semantic Indexing) (Chen & Wang, 2004). LSI is capable of automatically extracting the conceptual content of a document's items. One of the facilities of LSI is that it can process arbitrary character strings and is not restricted to working with words only. We did not use this method as it mainly targets extensive information collection. Another approach was proposed by Aggarwal (2001) for text indexing and similarity matching based on creating a representation in terms of conceptual word-chains. An extensive survey about document analysis was reported in Sarfraz & Ghazi (2003). Most of the works in this area target the similarity matching between documents according to their conceptual semantic meaning. A practical example is the Google search engine.

2.4. Similarity matching

Information matching can be divided into two categories: exact matching and similarity matching. Image matching is based on global features or local features extracted from images. Some general features are common to all kinds of images, such as the color histogram (a global feature) and the shape of salient objects (a local feature), but domain-specific knowledge gives more meaning to these features. Some systems for similarity matching, such as QBIC and KMeD, do not use domain-specific knowledge, but they become more meaningful when they incorporate domain-specific knowledge (Ding et al., 1997; Brunelli & Mich, 1999). Other systems, developed in the medical sector, use content-based image retrieval (CBIR), where some standard features are selected to represent the content of an image (Russell & Norvig, 2003; Healey & Jain, 1996). When an image is stored in a database, its features are extracted by a previously defined process and then stored with an indexing schema. Indexing itself is a big research issue, which is very important for faster image retrieval from a mammoth image database. In the matching steps, first the image is analyzed, then its features are extracted, these features are next compared with the stored images' features, and finally those images which have similar features are retrieved. There is some variation in the searching-by-drawing approach, where a query is given in the form of an

interactive manual drawing, which is mostly an outline of the targeted shapes. We follow the query-by-example approach for similarity matching in our implementation. In our model we implement similarity matching as the key for describing an image in text. We develop two sections: in the first, we use no domain-specific knowledge, for NIs; in the second, we use very domain-specific knowledge to formulate the similarity matching for GIs.
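A minimal sketch of query-by-example matching over stored feature vectors is given below; the repository layout, the Euclidean distance and the toy feature vectors are illustrative assumptions, not the actual indexing schema used in the prototype.

import numpy as np

def most_similar(query_features, repository):
    # Query-by-example: rank stored images by Euclidean distance between
    # their global feature vectors and the query image's feature vector.
    query = np.asarray(query_features, dtype=float)
    return sorted(repository,
                  key=lambda name: float(np.linalg.norm(
                      np.asarray(repository[name], dtype=float) - query)))

# Hypothetical repository: image id -> stored feature vector (e.g. a colour histogram).
repository = {
    "sunset_01": [0.70, 0.20, 0.10],
    "field_02":  [0.10, 0.80, 0.10],
    "chart_03":  [0.30, 0.30, 0.40],
}
print(most_similar([0.65, 0.25, 0.10], repository))  # 'sunset_01' ranked first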

3. Conceptual model

In the proposed model, three modules work collectively to give the final description of the images (see Fig. 2):

(1) The text analysis module;
(2) The image repository module;
(3) The image analysis module.

Fig. 2 shows the whole process in the proposed model. We analyze both picture and text before going on to describe the picture by the similarity matching. In Table 1, we show the sequence of the whole process.

3.1. Text analysis module

In this module, we categorize the documents based on their text contents. We use a Neural Network (NN) to implement this module. First, we do supervised training with some pre-categorized documents. In this process, we first extract the tokens of each domain, and then we give them weights according to their frequency of presence in the documents (see Fig. 3). After the training, we reduce the number of tokens by removing the low-frequency tokens, and we make the number of tokens the same for all categories. Fig. 3 shows the model, where 'T' denotes the token, 'S' the subject, and 'TW' the token weight in each subject.
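The following sketch illustrates the token-weighting idea behind this module (frequency-based weights per subject and a maximum-score decision); the exact weighting scheme, the token reduction and the network structure of the classifier may differ, and the training sentences shown are purely illustrative.

from collections import Counter

def train_token_weights(labelled_docs):
    # For each subject, weight every token by its relative frequency in that
    # subject's training documents (a simplification of the training step).
    weights = {}
    for subject, docs in labelled_docs.items():
        counts = Counter(token for doc in docs for token in doc.lower().split())
        total = sum(counts.values())
        weights[subject] = {t: c / total for t, c in counts.items()}
    return weights

def classify(text, weights):
    # Assign the subject whose accumulated token weights are maximal.
    tokens = text.lower().split()
    scores = {s: sum(w.get(t, 0.0) for t in tokens) for s, w in weights.items()}
    return max(scores, key=scores.get)

training = {
    "sports": ["the team won the cup final", "a fast bowler took five wickets"],
    "business": ["the market fell as shares dropped", "the company reported quarterly profit"],
}
w = train_token_weights(training)
print(classify("the bowler and the team played in the final", w))  # 'sports'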

3.2. Image repository module

This module handles the template images database. It takes the extracted feature information from the different module agents, such as the document analysis, image analysis, and image categorization agents (see Fig. 4, which shows a diagrammatic view of this module). It has two different repositories of images: one for the general images, and one for the geometrical images. According to the image type obtained from the image analysis, the module selects one of these repositories. In the searching process, it uses the features extracted from the image as well as from the document to reduce the search domain and to find the most similar template. The image repository module is the main source of semantic descriptions for the NIs, and it enriches the GIs' content descriptions with semantic meaning.
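As an illustrative sketch of this searching process (the repository layout, field names and fallback rule are our own assumptions), the search can be narrowed to the domain reported by the text analysis module before the feature-based ranking:

import numpy as np

def search_repository(query_features, query_domain, repository):
    # Restrict the search to templates tagged with the document's domain,
    # then rank the remaining templates by feature distance.
    candidates = {name: entry for name, entry in repository.items()
                  if entry["domain"] == query_domain}
    if not candidates:                       # fall back to the whole repository
        candidates = repository
    query = np.asarray(query_features, dtype=float)
    return sorted(candidates,
                  key=lambda n: float(np.linalg.norm(
                      np.asarray(candidates[n]["features"], dtype=float) - query)))

# Hypothetical annotated templates: id -> {domain, features, annotation}.
repository = {
    "goal_photo": {"domain": "sports", "features": [0.2, 0.5, 0.3],
                   "annotation": "players near a goal"},
    "stock_plot": {"domain": "business", "features": [0.6, 0.2, 0.2],
                   "annotation": "stock price chart"},
}
print(search_repository([0.25, 0.45, 0.30], "sports", repository))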

3.3. Image analysis module

The image analysis module is the core module of the proposed system, and the other modules are supporting components for it. It mainly consists of different components, such as:

(1) Image categorization agent;
(2) Geometrical image agent;
(3) General image agent.

These agents implement several related concepts, including image categorization, simple image preprocessing, shape recognition, feature extraction, etc. We choose the agent-based paradigm as it combines knowledge and intelligence with modularity and

Fig. 2. Diagrammatic presentation of the image descriptor.

Table 1
Processing steps of the image descriptor.

Step 1: Split the document into two parts (text and image) and send them respectively to the two different modules (the text analysis and the image analysis).
Step 2: Analyze the document to find the category to which it belongs.
Step 3: The image categorization agent analyzes the image. Then, according to the six features' values (described later), it classifies it as either a geometrical or a non-geometrical image.
Step 4: For a geometrical image, the geometrical agent extracts the features that will be used for the similarity matching.
Step 5: For other images, the general image agent extracts the features of the image that will be used for the similarity matching.
Step 6: Finally, search for similar images inside the domain (tagged by the text analysis module and the image categorization agent) with the features extracted by the geometrical image or general image agents.
Step 7: Describe the image by using the similar image's annotation and the features extracted from the content.


extensibility (Ndumu & Nwana, 1997; Helmy, 2006, 2007). In the next sections, we describe these agents in detail.

3.3.1. Image categorization agent

This agent differentiates between two different kinds of images: the NI and the GI. Finding suitable features for this purpose is a tedious task. So, we observed these two kinds of images empirically, and we tried to find some innovative features. We also checked the previous literature of other research domains for suitable traditional features to use. Finally, to discriminate between these two different classes, we find six features (a–f) as follows:

(a) Number of colors: It is calculated by counting the number of distinct colors present in an image.
(b) Percentage of background color: We consider a color as the background color if it contains the maximum number of pixels.

In optical character recognition, a common technique is color histogram projection on either axis. This is used for line and character segmentation. We observed this method carefully, and we devised four new features that can be used for image categorization:

(c) Rate of horizontal color discontinuity: We calculate the number of discontinuities for a single color after projecting it on the horizontal axis, and finally we sum up all the discontinuities for every color.

Fig. 3. Diagrammatic view of neural document classifier.

Fig. 4. Basic components of the image repository module (templates of normal images with textual annotation and their domain type; templates of geometrical images with textual annotation and their domain type; a searching agent based on the document type and the extracted image features; extracted information from similar templates).

Table 2
Preliminary observation for discrimination features.

Feature name                               NI      GI
Number of colors                           High    Low
Percentage of background color             Low     High
Rate of horizontal color discontinuity     High    Low
Rate of vertical color discontinuity       High    Low
Rate of horizontal color overlap           High    Low
Rate of vertical color overlap             High    Low


(d) Rate of vertical color discontinuity: This is the same as the above property, but we take the projection on the vertical axis only.
(e) Rate of horizontal color overlap: After projection, we check the horizontal overlap as the number of colors at a horizontal point. If it is greater than one, it means an overlap occurs and we count it.
(f) Rate of vertical color overlap: This is the same as the above property, but we count it on the vertical axis only.

Table 2 shows our preliminary observation. We combine all these features and find some threshold values that can be used for classification. In the experimental results section, we show the empirical proof of these observations.
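A simplified reading of features (a)–(f) is sketched below for an image given as a 2-D array of color indices; the precise counting rules for discontinuities and overlaps in the prototype may differ, and the toy bar-chart-like image is only an illustration.

import numpy as np

def projection_features(img):
    # Colour-projection features used to separate geometrical from normal images.
    colours, counts = np.unique(img, return_counts=True)
    n_colours = len(colours)
    background_pct = 100.0 * counts.max() / img.size

    def axis_features(axis):
        # presence[c, j] is True if colour c appears at position j of the projection.
        presence = np.stack([(img == c).any(axis=axis) for c in colours])
        # Discontinuity: for each colour, count breaks between runs of presence.
        discontinuity = sum(int((np.diff(p.astype(int)) == -1).sum()) for p in presence)
        # Overlap: projection positions where more than one colour is present.
        overlap = int((presence.sum(axis=0) > 1).sum())
        return discontinuity, overlap

    hd, ho = axis_features(axis=0)   # projection onto the horizontal axis (collapse rows)
    vd, vo = axis_features(axis=1)   # projection onto the vertical axis (collapse columns)
    return n_colours, background_pct, hd, vd, ho, vo

# Toy 'bar chart'-like image: flat background and two solid bars.
img = np.zeros((20, 30), dtype=int)
img[5:20, 3:8] = 1
img[10:20, 12:17] = 2
print(projection_features(img))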

3.3.2. Geometrical image agent

Recently, agent technologies have attracted a lot of interest in both academia and industry. As an extension of our experience with multi-agent-based systems (Helmy & Amamiya, 2006; Helmy, 2006, 2007), an agent library of several agents will be used, where each agent has its own definition and interpretation of an image (see Fig. 5).

Fig. 5. Information extraction process from a geometrical image.


An image is submitted to an agent (such as the bar-chart agent) and then this agent checks the image. If it can be recognized and interpreted, the agent returns a content description with the extracted feature values. Later, these features can be used for finding a similar image. This similar image already has a semantic description, given by a manual annotation, which serves as a meaningful description for the submitted image. On the other hand, if the agent cannot describe the image, it sends the image to other agents in the library. This library can be extended by adding more specialized agents to enrich it. The two main parts of the model are the "shape monitor" and an extensible library of "shape agents". The shape monitor handles the whole shape recognition process. It differentiates the shapes from the background, marks the region of each shape and asks the shape agents to give the proper definition for each shape. The second part is a library which has a collection of distinctive "shape agents". A shape agent identifies its shape properly and gives the right description according to its size and position. Below, we describe the shape agent for a "rectangle":

Shape structure: A rectangle has four lines as well as four points. Opposite lines are parallel to each other, and adjacent lines are perpendicular to each other.

Shape definition: The formal textual description for this shape will be used to describe the whole picture, i.e. "This is a rectangle, with its position at (x, y), with a height (H) and a width (W)".

Shape finding method: It checks for a region that satisfies the definition of the shape and produces an absolute description according to the shape definition (replacing the unspecified parameters of the formal shape definition). In the same way, we can add other shapes to the library.
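For the rectangle agent, once the region parameters have been extracted, filling the formal shape definition is straightforward; the helper below is only a sketch using the quoted template (the parameter names are assumptions).

def describe_rectangle(x, y, height, width):
    # Fill the rectangle agent's formal shape definition with the parameters
    # extracted from the region (positions and sizes in pixels).
    return (f"This is a rectangle, with its position at ({x}, {y}), "
            f"with a height ({height}) and a width ({width}).")

print(describe_rectangle(40, 25, 80, 120))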

This agent is good for non-related GIs only. For related GIs, we build a more complicated agent, such as the bar-chart agent, which can describe bar-charts more accurately. It can find the number of bars, their alignment, the color sequence, etc. This information is very useful for the similarity matching, by which we can give a more meaningful description of a bar-chart image. In the same way, we can extract features from some other charts, such as pie-charts and line-charts. Table 3 shows our observation. We found that two kinds of matching are possible: matching within a sub-domain (such as matching a bar-chart with a bar-chart and a pie-chart with a pie-chart) and cross-matching (such as a bar-chart with a pie-chart). To support both of them, we have done some analysis to find the features of different domains that can be compared with each other; a sketch of one such comparison is given below. In the following sections, we describe both of them as intra- and inter-domain matching. This similarity matching expands the searching area beyond a specific category of GI in order to bring more meaningful descriptions from different categories.
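One plausible way to support such cross-matching, shown here only as an illustrative assumption rather than the paper's actual formulation, is to reduce both chart types to the proportions they encode (bar heights versus pie slice angles) and compare those proportion vectors:

def proportions_from_bars(bar_heights):
    # Normalise bar heights so they sum to 1.
    total = float(sum(bar_heights))
    return [h / total for h in bar_heights]

def proportions_from_pie(slice_angles_deg):
    # Normalise pie slice angles (degrees) so they sum to 1.
    return [a / 360.0 for a in slice_angles_deg]

def chart_distance(p, q):
    # Simple L1 distance between two proportion vectors of equal length.
    return sum(abs(a - b) for a, b in zip(p, q))

bars = proportions_from_bars([30, 50, 20])
pie = proportions_from_pie([108, 180, 72])
print(chart_distance(bars, pie))  # ~0: the two charts encode the same shares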

3.3.3. Normal image agent

We mentioned that we categorize the images into two classes: NI or GI. The normal image agent is used for the NI; Fig. 6 shows the execution process for the NI.

Table 3
Extractable information from different charts.

Bar-charts: (1) number of bars; (2) width of each bar; (3) height of each bar; (4) horizontal or vertical alignment of the base; (5) color sequence; (6) textual annotation.
Pie-charts: (1) number of slices; (2) angle of each slice; (3) area of each slice; (4) color of each slice; (5) textual annotation.
Line-charts: (1) number of lines; (2) number of salient points in a line; (3) position of each salient point; (4) color of lines; (5) textual annotation.


There is great difficulty in making a general policy to handle all of the normal images. The best way is to categorize them into finer classes, and then to implement a single agent for each class, where we can get a better description by the similarity matching within a limited domain. We use a segmented color histogram as a simple feature for this agent, as it produces better discrimination than a normal color histogram. In this process, we divide the colors into different bands. For implementation, we work mainly on 256-color pictures. We divide the 256 colors into 10 bands, and we divide the picture into several segments. Then, we calculate the number of color pixels for each band, choose the highest band as the representative of that segment, and save its information. We use this information later to find the similarity between newly submitted images. We can use domain-specific feature extraction techniques to implement individual agents for each domain, and this method yields better results.
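A minimal sketch of the segmented color histogram is given below; the regular tiling into equal segments and the band size of 256/10 levels are our assumptions about details the text leaves open.

import numpy as np

def segmented_colour_histogram(gray_img, n_segments=4, n_bands=10):
    # Split a 256-level image into n_segments x n_segments tiles and record,
    # for each tile, the band (256 levels grouped into n_bands) that contains
    # the most pixels.
    h, w = gray_img.shape
    bands = np.minimum(gray_img * n_bands // 256, n_bands - 1)
    signature = np.empty((n_segments, n_segments), dtype=int)
    for i in range(n_segments):
        for j in range(n_segments):
            tile = bands[i * h // n_segments:(i + 1) * h // n_segments,
                         j * w // n_segments:(j + 1) * w // n_segments]
            signature[i, j] = np.bincount(tile.ravel(), minlength=n_bands).argmax()
    return signature

img = np.random.randint(0, 256, size=(64, 64))
print(segmented_colour_histogram(img))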

4. Prototype implementation

We have developed a prototype model that provides the following facilities:

• Categorization of images into two classes, NI or GI.
• Automatic image analysis and feature extraction that can be stored in a database for future similarity matching.
• Similarity searching based on image features.
• Support for document classification by the Neural Document Classifier (NDC) to reduce the search space.
• Meaningful image description.

To support these facilities, we have developed the following interfaces:

• Image Categorization Interface (ICI),
• Normal Image Descriptor Interface (NIDI),
• Neural Document Classifier Interface (NDCI),
• Image Descriptor Interface (IDI).

Initially, the user has to load the test image, for which the similarity matching is to be done, by typing the file name or browsing the system repository. S/he also has to provide the associated text in the text box. By clicking on the 'Search' button, the user can start the similarity searching. The system produces the result according to the selected options, and it shows the most similar picture. The similar image's description is shown, and the test image's information can be saved or added to the database. The operation for a GI is similar to that for non-geometric images, except that the image file should be a GI (e.g. a bar-chart or a pie-chart). The user does not need to choose the image type, as it is automatically selected by the image categorization agent. Moreover, the system categorizes the document and trains the NDC. Then the rest of the steps are done automatically to give the meaningful description.

Fig. 6. Information extraction process from a normal image.


5. Experimental results and discussion

We have conducted several experiments to make a consistent evaluation of the system's performance. In this section, we describe the results obtained while testing the proposed model in various aspects. We have implemented all the modules in such a way that they can be tested individually, such as testing the NDC and the categorization agent independently. We also tested their performance in combined form. In the testing phase, we targeted the following features to be verified:

• Check the performance of the six features individually.
• Check the performance with different combinations.
• Find a suitable threshold value as well as a precise formula to be used later for classification.
• Test the performance of the NDC.
• Verify the similarity matching improvement with the NDC.

In the experiments, we have taken 50 images from http://images.search.google.com, 100 images from http://images.search.yahoo.com, and 200 images from http://search.msn.com/images for GI. We have taken 350 images from national geographies (http://www.ngsassignments.com/) for NI. This means a total of 700 images have been used. First, we took visually clear pictures (image dataset one) and tested how the six features work. Second, we took some geometrical images that are unclear and confusing (image dataset two) and tested how the six features work to categorize them despite impurity and noise. Finally, we did rigorous tests by collecting various kinds of GI (i.e. 3D charts with shading and impurity) and clear NI (dataset three). For brevity, we show the results, with discussion, for different combinations of the six features.

We have calculated the number of colors by counting the distinct colors present in an image. In this process, we found some difficulties, as some pictures contain 24-bit colors, which sometimes produce as many colors as the number of pixels in the picture. To reduce this complexity, we convert all the images into 256 gray-level images before extracting feature values. We used formula 1 to convert the RGB image to the gray image, where R, G and B represent the red, green and blue component values of a pixel color.

Gray value = 0.299 · R + 0.587 · G + 0.114 · B    (1)

Selecting a color as the background is crucial, because the background of an image may include non-salient objects (like the green field in a football match image) which are really difficult to extract. However, a simple assumption is possible: the background color is the color which contains the maximum number of pixels. This assumption fits closely with our purpose. Accordingly, we compute this feature as the percentage of the image occupied by the most frequent color.
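Formula 1 and the background-percentage assumption can be expressed directly; the sketch below assumes an H × W × 3 RGB array with 8-bit channels.

import numpy as np

def to_gray_256(rgb):
    # Formula 1: luminance conversion of an RGB image to a 256-level gray image.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.clip(0.299 * r + 0.587 * g + 0.114 * b, 0, 255).astype(np.uint8)

def background_percentage(gray):
    # Background colour assumed to be the gray level with the most pixels;
    # return its share of the image as a percentage.
    counts = np.bincount(gray.ravel(), minlength=256)
    return 100.0 * counts.max() / gray.size

rgb = np.random.randint(0, 256, size=(32, 32, 3))
gray = to_gray_256(rgb)
print(background_percentage(gray))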

The rate of horizontal/vertical color discontinuity is calculated as the number of breaks for a single color after it is projected on the horizontal/vertical axis, summed over all colors. After projection of the image on the horizontal/vertical axis, we calculate the horizontal/vertical overlap as the number of colors at a point of the projection; if it is greater than one, an overlap occurs and we count it.

We used different formulae for combining the features, with the following mathematical notation:

• Number of colors (NC) – g,
• Percentage of the background color (BK) – b,
• Rate of the horizontal color discontinuity (HD) – a1,
• Rate of the vertical color discontinuity (VD) – a2,
• Rate of the horizontal color overlap (HO) – h1,
• Rate of the vertical color overlap (VO) – h2,
• Image height (H) – k,
• Image width (W) – x.

5.1. Combination of number of colors and background color

We use formula 2 to calculate the Combined Value (CV). Here, we take the inverse of the background color percentage, as we want the CV to be high for the NI and low for the GI. This also supports our preliminary observation.

Fig. 7a. Result for combined NC and BK on image dataset one.
Fig. 7b. Result for combined NC and BK on image dataset two.


CV = (g + 1/b) / (k · x)    (2)

Fig. 7 shows a very interesting observation. For the impure pictures, the results do not improve. For the clear pictures, however, it yields 100% accuracy. So, these combined features can be used as a categorizer when the presented images are sure to be clear.

5.2. Combination of NC, BK, HD and VD

We combined the number of colors (NC), the percentage of background color (BK), and the horizontal and vertical discontinuity (HD/VD) according to formula 3.

CV = (g + 1/b + a1 + a2) / (k · x)    (3)

Fig. 8a. Result for combined NC, BK, HD and VD (normalization by image area) on image dataset one.

As shown in Fig. 8, this increases the categorization accuracy for the impure pictures up to 88%. For clear images its accuracy is higher than 98%, but it is less than that of the previous combination (according to formula 2).

5.3. Combination of NC, BK, HO and VO

We combined the number of colors (NC), the percentage of background color (BK), and the horizontal and vertical overlap (HO/VO) according to formula 4.

CV = (g + 1/b + h1 + h2) / (k · x)    (4)

As shown in Fig. 9, this increases the categorization accuracy for the impure pictures up to 88%. For the clear images, its accuracy is 100%.


Fig. 8b. Result for combined NC, BK, HO and VO (normalization by image area) on image dataset two.

Fig. 9a. Result for combined NC, BK, HO and VO (normalization by image area) on image dataset one.
Fig. 9b. Result for combined NC, BK, HO and VO (normalization by image area) on image dataset two.


5.4. Combination of NC, HD, VD, HO and VO

We combined the number of colors (NC), the horizontal and vertical discontinuity (HD/VD) and the horizontal and vertical overlap (HO/VO) according to formula 5.

CV = (g + a1 + a2 + h1 + h2) / (k · x)    (5)

As shown in Fig. 10, this increases the categorization accuracy for the impure pictures up to 88%. For the clear images, its accuracy is 100%.

5.5. Combination of HD, VD, HO and VO

We combined the horizontal and vertical discontinuity (HD/VD) and the horizontal and vertical overlap (HO/VO) according to formula 6.

CV = (a1 + a2 + h1 + h2) / (k · x)    (6)

As shown in Fig. 11, this increases the categorization accuracy for the impure pictures up to 88%. For the clear images, its accuracy is 100%. Accuracy is drastically reduced for dataset three, where it yields only 60%.

Fig. 10a. Result for combined NC, HD, VD, HO and VO on image dataset one.
Fig. 10b. Result for combined NC, HD, VD, HO and VO on image dataset two.

Fig. 11a. Result for combined HD, VD, HO and VO on image dataset one.
Fig. 11b. Result for combined NC, HD, VD, HO and VO on image dataset two.


Fig. 11c. Result for combined NC, HD, VD, HO and VO on image dataset three.


5.6. Combination of all features

Finally, in this section, we show the normalized Combined Values (CV) of all features according to formulas 7 and 8.

CV = g/256 + (1 − b/100) + (a1 + a2 + h1 + h2) / (k · x)    (7)

The results shown in Fig. 12 do not look promising, as we got better results for datasets one and two by using the other approaches. However, dataset three, which contains more challenging images including 3D charts, shows a significant improvement over the other approaches.

CV = (g + 1/b + a1 + a2 + h1 + h2) / (k · x)    (8)

The results for formula 8 are shown in Fig. 13. The normalized combined feature values clearly show more discrimination as well as more categorization accuracy. Formula 8 yields up to 89% categorization accuracy for the impure images (dataset one) and 100% categorization accuracy for the clear images (dataset two), but its performance for dataset three, which contains critical images like 3D charts with shading and impurity, decreases significantly.

Fig. 12a. Result for all combined features on image dataset one according to formula 7.

According to our observation, in most of the cases where a picture is large, this simple normalizing approach gives very insignificant values for g and b. To balance this, we formulated Eq. (7), where g is divided by 256, as this is the maximum number of colors in the gray scale, and b is divided by 100, as this is the maximum percentage value. All of these efforts aim to find a suitable combination of the features, with appropriate threshold values, to make a fine classifier that can discriminate geometrical images from other images. This is an important step, because a geometrical image has more objective information than an ordinary image, and this information is precisely extractable.

In Table 4, we summarize the accuracy results in percentages. According to the testing and observation described in the above sections, we found that the combined approaches produce more accurate results. Notice that there are some results for dataset two with 100% accuracy, as dataset two consists of only clear, high-quality images, while for both datasets one and three the maximum accuracy is 89%. In Table 5, we show the different threshold values that yield the maximum categorization rate for both datasets.

Now the question is which approach we should choose, and what the threshold value for that approach is. We get the best results by selecting the first approach with a threshold of 2.8, which


Fig. 12b. Result for all combined features on image dataset two according to formula 7.
Fig. 12c. Result for all combined features on image dataset three according to formula 7.


produces 80%, 78% and 70% accuracy for datasets one, two and three, respectively. It is obvious that, if we take clearer pictures, the CV value will be reduced and the accuracy will be increased. In this case, the CV value 0.4 is perfect as a threshold for the 2nd approach. We have checked different combinations of these features, and it is proved now that a combination of several features yields a greater difference than a single feature value. Their optimization is possible; as we see, in some cases only four features are enough to get a reasonable result. Optimization will reduce the processing burden to some extent.
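Putting the pieces together, the second approach (formula 8) with a dataset-dependent threshold amounts to the following; the feature values in the example are hypothetical, and only the formula and the Table 5 threshold are taken from the text.

def combined_value_formula8(g, b, a1, a2, h1, h2, height, width):
    # Formula 8: CV = (g + 1/b + a1 + a2 + h1 + h2) / (H * W), where g is the
    # number of colours, b the background-colour percentage, a1/a2 the
    # horizontal/vertical discontinuity rates and h1/h2 the overlap rates.
    return (g + 1.0 / b + a1 + a2 + h1 + h2) / (height * width)

def is_normal_image(cv, threshold):
    # NIs tend to have higher CV than GIs (preliminary observation), so an
    # image is treated as normal when CV exceeds the chosen threshold.
    return cv > threshold

# Hypothetical feature values, with the dataset-two threshold (0.4) from Table 5.
cv = combined_value_formula8(g=180, b=22.0, a1=400, a2=380, h1=150, h2=140,
                             height=50, width=60)
print(cv, is_normal_image(cv, threshold=0.4))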

5.7. Verify effects of NDC in similarity matching

To test the image similarity matching module, we first collected from the Internet around 100 images of natural scenery, including landscapes, flowers and sunset views, and we loaded them into the system. Then, the global features extracted by the "segmented color histogram" were stored in the database. In the next step, we

arbitrarily loaded around 63 images to find images in the database similar to them. We recorded similarity up to five levels. We found 100% similarity for the first similarity index, as we checked only those images already stored in the database. For the subsequent matchings, the percentages are 68, 48, 47 and 38, respectively (see Fig. 14). The result obtained seems very promising, as we chose images of natural scenery only. However, the result may be less good when images of very different domains are searched for the similarity matching. This problem is minimized by using the NDC to reduce the search space to a specific domain.

5.8. Verify neural document classifier

For testing the document categorization module, the system is first trained with some standard data from different domains. Five domains are selected, namely science, sports, business, entertainment and politics. Texts covering each area of these domains are copied from some standard websites for the respective domain,

Fig. 13a. Result for all combined features on image dataset one according to formula 8.
Fig. 13b. Result for all combined features on image dataset two according to formula 8.
Fig. 13c. Result for all combined features on image dataset three according to formula 8.


Table 4
Comparison of various combined approaches based on six features.

Approach                                                              Accuracy with dataset one (%)   Accuracy with dataset two (%)
Number of colors                                                      70                              80
Percentage of the background color                                    60                              80
Rate of horizontal discontinuity                                      75                              80
Rate of vertical discontinuity                                        80                              84
Rate of horizontal overlap                                            70                              82
Rate of vertical overlap                                              70                              84
NC + 1/BK                                                             75                              100
NC + 1/BK + HD + VD                                                   78                              97
NC + 1/BK + HO + VO                                                   88                              100
NC + HD + VD + HO + VO                                                88                              100
HD + VD + HO + VO                                                     88                              100
All features, CV = g/256 + (1 − b/100) + (a1 + a2 + h1 + h2)/(k·x)    87                              96
All features, CV = (g + 1/b + a1 + a2 + h1 + h2)/(k·x)                89                              100

Table 5
Different threshold values for selected approaches.

Approach                                                                          Dataset one   Dataset two   Dataset three
All features – 1st approach, CV = g/256 + (1 − b/100) + (a1 + a2 + h1 + h2)/(k·x)   3             2.4           2.8
All features – 2nd approach, CV = (g + 1/b + a1 + a2 + h1 + h2)/(k·x)               1.2           0.4           0.9
Optimized – 2nd approach, CV = (a1 + a2 + h1 + h2)/(k·x)                             1.2           0.4           0.9

Fig. 14. Percentage of similarity matching of non-geometric images.
Fig. 15. Recognition rates of the five domains (business, entertainment, science, politics, sports) with 2000 tokens each.


such as www.bbc.com for political news, and http://sports.espn.go.com and http://content-usa.cricinfo.com for sports. Then, the system is tested against texts copied from various sources, including some local newspapers. The recognition rates for the chosen five domains are shown in Fig. 15, when the database is enriched with around 2000 tokens per domain.

The effect of database size on the recognition rate is measured by deleting low-frequency data from each domain. We recorded recognition rates with 2000, 1000 and 500 tokens for each domain (see Fig. 16). As shown in Fig. 16, the recognition rate decreases with a decreasing number of tokens, indicating that we need to get enough data from each domain so that any arbitrary test document can be classified correctly. However, we can reduce the number of tokens by choosing the test data carefully.

The memory and time requirements for storing the tokens and classifying a document against different database sizes are also recorded (see Fig. 17). The memory requirement shown in Fig. 17 confirms the obvious reduction of memory requirements when the number of tokens is reduced. However, we found that the recognition time is almost invariant with the number of tokens. This is possibly due to our use of SQL statements, to first select the data from the database and then compute the parameters associated with recognition. However, this might vary if the database size varies widely.

5.9. Improvement due to NDC

Finally, we recorded the improvement in similarity matching when using the hybrid model. We found that the similarity matching rate is improved by around 80%, 20%, 20% and 50% in the case of the sports, politics, science and entertainment test images, respectively (see Fig. 18).


Fig. 18. Percentage of improvement in similarity matching after document categorization.

Fig. 17. Variation of memory requirements for the change of database size.

Fig. 16. Effect of the change of database size on the document recognition rate.


This result proves that using the NDC increases the accuracy rate of the matching.

6. Conclusion and future work

Working for visually impaired people is always a challenging task, and the lack of directly related work made it hard for us to implement our approach. We think our work is pioneering work in this domain, and this makes some challenges harder to address. We

implemented and tested a hybrid model for the similarity matching of images. In addition to the analysis of local and global image features, the associated texts are also considered, to categorize the given images and thus to reduce the search space. After identifying the domain of the test image, the similarity matching is performed within that specific domain only. We made a classifier by using novel features, devised on the basis of color projection, to differentiate the geometrical images from the ordinary/normal images. Thus, we obtained a significant improvement in the similarity matching


that leads to a more accurate description of images for visually impaired people. An important feature of the proposed model is that specific matching techniques suitable for a particular domain can be developed and easily integrated into the system. As future work, we can enhance the proposed model in many dimensions, as it opens up an area of research. For geometrical images, we will extend the agent library by including more critical composite shapes and charts, which will increase the capability and accuracy of the descriptor. We can also extend this idea to the 3D domain, especially for regular three-dimensional objects such as a box and a sphere. Our main goal is to conduct foundational research in this domain, which has great potential to contribute to human well-being.

Acknowledgments

We would like to thank King Fahd University of Petroleum & Minerals for providing the computing facilities and support utilized in this work. Special thanks go to Mr. David Birkett for his help in editing the paper.

References

Aggarwal, C. C., & Philip, S. Y. (2001). Effective conceptual indexing and similarity search in text data. In Proceedings of the 1st IEEE international conference on data mining (pp. 3–10).

Ankerst, M., Kriegel, H. P., & Seidl, T. (1998). A multistep approach for shape similarity search in image database. IEEE Transactions on Knowledge and Data Engineering, 10(6), 996–1004.

Bartolome, H. A., & Gustavo, M. C. (2004). Audesc: translating images into words for Spanish visually impaired people. Meta XLIX, 2, 264–277.

Brunelli, R., & Mich, O. (1999). On the use of histograms for image retrieval. IEEE International Conference on Multimedia Computing and Systems (ICMCS'99), 2, 143–147.

Chen, Y., & Wang, J. Z. (2004). Image categorization by learning and reasoning with regions. Journal of Machine Learning Research, 5, 913–939.

Ding, J., Lam, L., & Ching, Y. S. (1997). Classification of oriental and european scripts by using characteristic features. Proceedings of the 4th International Conference on Document Analysis and Recognition, 2, 1023–1027.

Euripides, G., Petrakis, M., & Faloutsos, C. (1997). Similarity searching in medical image databases. IEEE Transactions on Knowledge and Data Engineering, 9(3), 435–447.

Feng, L., Jeusfeld, M. A., & Hoppenbrouwers, J. (2001). Beyond information searching and browsing: acquiring knowledge from digital libraries. Infolab: Tilburg.

Grigorescu, C., Petkov, N., & Westenberg, M. A. (2003). Contour detection based on nonclassical receptive field inhibition. IEEE Transactions on Image Processing, 12(7), 729–739.

Healey, G., & Jain, A. (1996). Retrieving multispectral satellite images using physics-based invariant representations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 842–848.

Helmy, T. (2006). Towards a user-centric web portals management. Journal of Information Technology, 12(1), 1–15.

Helmy, T. (2007). Collaborative multi-agent-based e-commerce framework. International Journal of Computers Systems Science, 8(1), 3–12.

Helmy, T., & Amamiya, M. (2006). Multi-agent-based adaptive AV interface. International Arab Journal of Information Technology, 3(4), 291–298.

Helmy, T., & Hassan, M. M. (2005). Graph descriptor: an approach to convert standard geometrical and statistical figures into text and voice. In Proceedings of the IEE international conference on intelligent systems, Kuala Lumpur (pp. 180–186).

Jagadish, H. V. (1991). A retrieval technique for similar shapes. In Proceedings of the ACM SIGMOD international conference on management of data, Denver, Colorado, United States (pp. 208–217).

Jain, A. K., Duin, R., & Mao, J. (2000). Statistical pattern recognition – a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4–37.

Kherfi, M. L., Ziou, D., & Bernardi, A. (2004). Image retrieval from the world wide web: issues, techniques and systems. ACM Computing Surveys, 36(1), 35–67.

Kieninger, G. T. (1996). The growing up of hyperbraille – an office workspace for blind people. In ACM annual symposium on user interface (pp. 67–73).

Lehmann, T. M., Guld, M. O., Deselaers, T., Keysers, D., Schubert, H., Spitzer, K., et al. (2005). Automatic categorization of medical images for content-based retrieval and data mining. Computerized Medical Imaging and Graphics, 29, 143–155.

Manaris, B., McCauley, R., & MacGyvers, V. (2001). An intelligent interface for keyboard and mouse control providing full access to PC functionality via speech. In Proceedings of the 14th international Florida AI research symposium (FLAIRS-01) (pp. 182–188).

Mccrindle, R. (1999). The impact of new and distributed technologies on disabled users. In IEE European workshop on distributed imaging (pp. 21/1–21/16).

Ndumu, D. T., & Nwana, H. S. (1997). Research and development challenges for agent-based systems. IEE Proceedings on Software Engineering, 144(1), 2–10.

Plessers, P., Casteleyn, S., Yesilada, Y., Troyer, O. D., Stevens, R., Harper, S., et al. (2005). Accessibility: a web engineering approach. In Proceedings of the 14th international conference on world wide web, Japan (pp. 353–362).

Ran, L., Hela, A., & Moore, S. E. (2004). Drishti: an integrated indoor/outdoor blind navigation system and service. In Proceedings of the second IEEE annual conference on pervasive computing and communications (pp. 23–30).

Rotard, M., Knödler, S., & Ertl, T. (2005). A tactile web browser for the visually disabled. In Proceedings of the sixteenth ACM conference on hypertext and hypermedia (pp. 15–22).

Russell, S., & Norvig, P. (2003). Statistical learning methods: Artificial intelligence, a modern approach. Prentice Hall. pp. 736–748.

Sarfraz, M., Jameel, M. A., & Ghazi, S. A. (2003). Saudi Arabian license plate recognition system. In Proceedings of the international conference on geometric modeling and graphics (pp. 36–41).

Wenyin, L., Dumais, S., Sun, Y., Zhang, H., Czerwinski, M., & Field, B. (2001). Semi-automatic image annotation. In Proc. of Interact: conference on HCI (pp. 326–333).

Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval. Kluwer Academic Publishers.

Zukas, A., & Price, R. J. (2008). Document categorization using latent semantic indexing. Available from http://www.contentanalyst.com/.