Transcript of Assistive Clothing Pattern Recognition for Visually Impaired People

234 IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 44, NO. 2, APRIL 2014

Assistive Clothing Pattern Recognition for Visually Impaired People

Xiaodong Yang, Student Member, IEEE, Shuai Yuan, and YingLi Tian, Senior Member, IEEE

Abstract—Choosing clothes with complex patterns and colors is a challenging task for visually impaired people. Automatic clothing pattern recognition is also a challenging research problem due to rotation, scaling, illumination, and especially large intraclass pattern variations. We have developed a camera-based prototype system that recognizes clothing patterns in four categories (plaid, striped, patternless, and irregular) and identifies 11 clothing colors. The system integrates a camera, a microphone, a computer, and a Bluetooth earpiece for audio description of clothing patterns and colors. A camera mounted upon a pair of sunglasses is used to capture clothing images. The clothing patterns and colors are described to blind users verbally. The system can be controlled by speech input through a microphone. To recognize clothing patterns, we propose a novel Radon Signature descriptor and a schema to extract statistical properties from wavelet subbands, which together capture the global features of clothing patterns. They are combined with local features to recognize complex clothing patterns. To evaluate the effectiveness of the proposed approach, we used the CCNY Clothing Pattern dataset. Our approach achieves 92.55% recognition accuracy, which significantly outperforms state-of-the-art texture analysis methods on clothing pattern recognition. The prototype was also used by ten visually impaired participants. Most thought such a system would support more independence in their daily life, but they also made suggestions for improvements.

Index Terms—Assistive system, clothing pattern recognition, global and local image features, texture analysis, visually impaired people.

I. INTRODUCTION

BASED on statistics from the World Health Organization (WHO), there are more than 161 million visually impaired people around the world, and 37 million of them are blind [9]. Choosing clothes with suitable colors and patterns is a challenging task for blind or visually impaired people. They manage this task with help from family members, by using plastic braille labels or different types of stitching pattern tags on the clothes, or by wearing clothes with a uniform color or without any patterns.

Manuscript received April 4, 2013; revised July 9, 2013, October 20, 2013, and January 7, 2014; accepted January 17, 2014. Date of publication February 13, 2014; date of current version March 12, 2014. This work was supported in part by the National Science Foundation under Grant IIS-0957016 and Grant EFRI-1137172, the National Institutes of Health under Grant 1R21EY020990, Army Research Office Grant W911NF-09-1-0565, the Federal Highway Administration Grant DTFH61-12-H-00002, and Microsoft Research. This paper was recommended by Associate Editor Y. Yuan.

X. Yang and Y. L. Tian are with the City College, City University of New York, New York, NY 10031 USA (e-mail: [email protected]; [email protected]).

S. Yuan is with Teledata Communications Inc., Islandia, NY 11749 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/THMS.2014.2302814

Automatically recognizing clothing patterns and colors may improve their life quality. Automatic camera-based clothing pattern recognition is a challenging task due to the many clothing pattern and color designs as well as the corresponding large intraclass variations [4]. Existing texture analysis methods mainly focus on textures with large changes in viewpoint, orientation, and scaling, but with less intraclass pattern and intensity variation (see Fig. 1). We have observed that traditional texture analysis methods [3], [5], [10], [11], [15], [19], [23], [26], [29], [32] cannot achieve the same level of accuracy in the context of clothing pattern recognition.

Here, we introduce a camera-based system to help visually impaired people recognize clothing patterns and colors. The system contains three major components (see Fig. 2): 1) sensors, including a camera for capturing clothing images, a microphone for speech command input, and speakers (or a Bluetooth earphone) for audio output; 2) data capture and analysis to perform command control, clothing pattern recognition, and color identification using a computer, which can be a desktop in a user's bedroom or a wearable computer (e.g., a mini-computer or a smartphone); and 3) audio outputs to provide the recognition results of clothing patterns and colors, as well as system status.

As an extension of [30], our system can handle clothes with complex patterns and classifies clothing patterns into four categories (plaid, striped, patternless, and irregular) to meet the basic requirements identified in our survey of ten blind participants. Our system is able to identify 11 colors: red, orange, yellow, green, cyan, blue, purple, pink, black, grey, and white. For clothes with multiple colors, the first several dominant colors are spoken to users. In order to handle the large intraclass variations, we propose a novel descriptor, the Radon Signature, to capture the global directionality of clothing patterns. The combination of global and local image features significantly outperforms state-of-the-art texture analysis methods for clothing pattern recognition. We also show that our method achieves comparable results to state-of-the-art approaches on traditional texture classification problems.

This paper is organized as follows. In Section II, we summarize the related work on assistive techniques for visually impaired people and the research work on texture analysis. The computation of global and local features for clothing pattern recognition is described in Section III. Section IV introduces the system and interface design. The details of clothing pattern recognition and color identification are presented in Section V. Section VI presents our experimental results on a challenging clothing pattern dataset and a traditional texture dataset. Section VII describes the preliminary evaluations by blind users. Section VIII concludes the paper.



Fig. 1. Intraclass variations in clothing pattern images and traditional texture images. (a) Clothing pattern samples with large intraclass pattern and color variations. (b) Traditional texture samples with less intraclass pattern and intensity variations.

Fig. 2. Overview and architecture design of the camera-based clothing pattern recognition system for blind and visually impaired persons.

II. RELATED WORK

Assistive systems are being developed to improve the life quality and safety of those with special needs [2], [6], [7], [16], [20]–[22], [27], [28], [30], [31], including indoor navigation and wayfinding, display reading, banknote recognition, rehabilitation, etc. Liu et al. [12] built a clothing recommendation system for specific occasions (e.g., wedding or dating). Hidayati et al. [14] proposed a method for genre classification of upper-wear clothes. Both systems are designed without considering key factors for blind users [1]. Yuan et al. [31] developed a system to assist blind people in matching clothes from a pair of clothing images. This system can tell a user whether or not the clothing patterns and colors match. However, it is not able to automatically recognize clothing patterns.

Texture provides essential information for many image classification tasks, including clothing pattern recognition. Some early research on texture recognition [3], [5], [10], [15], [19], [23] focused on the analysis of global two-dimensional (2-D) image transformations, including in-plane rotation and scaling. Due to the lack of invariance to general geometric transformations, these approaches cannot effectively represent texture images with large 3-D transformations such as viewpoint change and nonrigid surface deformation. Multifractal analysis [25], [26] has achieved good resilience to 3-D deformations. Texture representations based on this method benefit from the invariance of fractal dimensions to geometric transformations. For example, the multifractal spectrum (MFS) proposed by Xu et al. [26] combines fractal dimensions of pixel sets grouped by density functions and orientation templates. In order to make texture representations more robust to 3-D image transformations (e.g., viewpoint change and nonrigid surface deformation) and illumination variation, most recent methods rely on extracting local image features [11], [29], [32]. A texton dictionary is then generated by clustering the extracted local features. Multiple features can capture different aspects of an image, and if the features are highly complementary, their combination improves the representation. For example, Lazebnik et al. [11] proposed a texture representation method based on affine-invariant detectors (Harris and Laplacian) and descriptors (RIFT and SPIN). Zhang et al. [32] also combined the scale invariant feature transform (SIFT) and SPIN for texture classification.

Unlike traditional texture images, clothing patterns contain much larger intraclass variations within each pattern category. Although many computer vision and image processing techniques have been developed for texture analysis and classification, traditional texture analysis methods cannot effectively recognize clothing patterns. Here, we develop a camera-based system specifically for visually impaired people to help them recognize clothing patterns and colors.

III. IMAGE FEATURE EXTRACTION FOR CLOTHING PATTERN RECOGNITION

Some clothing patterns present as visual patterns characterized by the repetition of a few basic primitives (e.g., plaids or stripes). Accordingly, local features are effective for extracting the structural information of repetitive primitives. However, due to large intraclass variance, local primitives of the same clothing pattern category can vary significantly (see Fig. 1). Global features, including the directionality and statistical properties of clothing patterns, are more stable within the same category. Therefore, they are able to provide complementary information to local structural features. Next, we present the extraction of global and local features for clothing pattern recognition, i.e., the Radon Signature, a statistical descriptor (STA), and the scale invariant feature transform (SIFT).

A. Radon Signature

Clothing images present large intraclass variations, which constitute the major challenge for clothing pattern recognition. However, from a global perspective, the directionality of clothing patterns is more consistent within each category and can be used as an important property to distinguish different clothing patterns. As shown in Fig. 8, the plaid and striped clothing patterns are both anisotropic. In contrast, the clothing patterns in the patternless and irregular categories are isotropic. To make use of this difference in directionality, we propose a novel descriptor, the Radon Signature, to characterize the directionality feature of clothing patterns.


Fig. 3. Computation of RadonSig. (a) An intensity image of a clothing pattern. (b) Radon transform performed on the maximum disk area within the gradient map of (a). (c) Result of the Radon transform. (d) Feature vector of the Radon Signature.

The Radon Signature (RadonSig) is based on the Radon transform [8], which is commonly used to detect the principal orientation of an image. The image can then be rotated according to this dominant direction to achieve rotation invariance. The Radon transform of a 2-D function f(x, y) is defined as

$$ R(r, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(r - x\cos\theta - y\sin\theta)\,dx\,dy \qquad (1) $$

where r is the perpendicular distance of a projection line to the origin and θ is the angle of the projection line, as shown in Fig. 3(b). To retain the consistency of the Radon transform across different projection orientations, we compute the Radon transform on the maximum disk area instead of the entire image. The large intraclass variations of clothing patterns also manifest as large changes of color or intensity among images of the same category. To reduce the intensity variations, we use the Sobel operator to compute the gradient map, which serves as f(x, y) in (1). Fig. 3(b) illustrates the Radon transform over a disk area of the gradient map. R(r, θ) in (1) is a function of the two parameters r and θ, as shown in Fig. 3(c). The directionality of an image can be represented by Var(r, θ_i), the variance of r under a given projection direction θ_i:

$$ \mathrm{Var}(r, \theta_i) = \frac{1}{N} \sum_{j=0}^{N-1} \bigl( R(r_j, \theta_i) - \mu(r, \theta_i) \bigr)^2 \qquad (2) $$

$$ \mu(r, \theta_i) = \frac{1}{N} \sum_{j=0}^{N-1} R(r_j, \theta_i) \qquad (3) $$

where R(r_j, θ_i) is the projection value at perpendicular distance r_j and projection direction θ_i; μ(r, θ_i) is the expected value of R(r, θ_i); and N is the number of sampling bins on each projection line. The RadonSig is formed by the variances of r under all sampling projection directions:

$$ \mathrm{RadonSig} = \bigl[ \mathrm{Var}(r, \theta_0),\ \mathrm{Var}(r, \theta_1),\ \ldots,\ \mathrm{Var}(r, \theta_{T-1}) \bigr] $$

where T is the number of sampling projection directions, which determines the feature dimension of the Radon Signature. As the RadonSig is a nonsparse representation, we employ the L2-norm to normalize the feature vector. The principal directions of the image in Fig. 3(a) correspond to the two dominant peaks in the RadonSig in Fig. 3(d).

Fig. 4. Clothing pattern samples and associated RadonSig descriptors.

Fig. 4 illustrates the RadonSig descriptors of four sample images from different clothing pattern categories. The plaid patterns have two principal orientations; the striped ones have one principal orientation; the patternless and irregular images have no obvious dominant direction, but the directionality of the irregular image presents much larger variations than that of the patternless image. Accordingly, there are two dominant peaks corresponding to the two principal orientations in the RadonSig of the plaid image. The RadonSig of the striped image has one peak value associated with its single principal orientation. There is no dominant peak in the irregular and patternless cases, but the RadonSig of the patternless image is much smoother than that of the irregular image.
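The RadonSig computation can be prototyped compactly. The sketch below, in Python with NumPy, SciPy, and scikit-image, is only an illustration of the steps described above (Sobel gradient map, masking to the maximum inscribed disk, Radon transform over sampled directions, per-direction variance, L2 normalization); the function name, parameter defaults, and library choices are assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy import ndimage
from skimage.transform import radon

def radon_signature(gray, num_angles=60):
    """Sketch of the Radon Signature descriptor (Sec. III-A). `gray` is a 2-D
    float array; returns an L2-normalized vector of length `num_angles`."""
    # Sobel gradient magnitude reduces intraclass intensity variations.
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    grad = np.hypot(gx, gy)

    # Crop to a centered square and keep only the maximum inscribed disk so
    # every projection direction integrates over the same area.
    side = min(grad.shape)
    y0 = (grad.shape[0] - side) // 2
    x0 = (grad.shape[1] - side) // 2
    grad = grad[y0:y0 + side, x0:x0 + side]
    yy, xx = np.mgrid[:side, :side]
    c = (side - 1) / 2.0
    grad = grad * (((yy - c) ** 2 + (xx - c) ** 2) <= (side / 2.0) ** 2)

    # Radon transform sampled at num_angles projection directions in (0, 180] degrees.
    theta = np.linspace(1.0, 180.0, num_angles)
    sinogram = radon(grad, theta=theta)      # columns correspond to directions

    # Variance of the projection values along r for each direction, Eqs. (2)-(3).
    sig = sinogram.var(axis=0)

    # L2 normalization of the non-sparse descriptor.
    norm = np.linalg.norm(sig)
    return sig / norm if norm > 0 else sig
```

Run on a plaid image, a descriptor produced this way should show two dominant peaks, mirroring the behavior described for Fig. 4.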

B. Statistics of Wavelet Subbands

The discrete wavelet transform (DWT) decomposes an image I into a low-frequency channel D_j(I) at a coarser scale and multiple high-frequency channels W_{k,j}(I), k = 1, 2, 3, under multiple scales j = 1, 2, ..., J, where J is the number of scaling levels. Therefore, at each scaling level j, we have four wavelet subbands: one low-frequency channel D_j(I) and three high-frequency channels W_{k,j}(I). The high-frequency channels W_{k,j}(I), k = 1, 2, 3, encode the discontinuities of an image along the horizontal, vertical, and diagonal directions, respectively. In this paper, we apply J = 3 scaling levels of the DWT to decompose each clothing image, as shown in Fig. 5.

Fig. 5. Computation of STA on wavelet subbands. Three levels of wavelet decomposition are applied to a clothing image. Each decomposition level includes four wavelet subbands of original, horizontal, vertical, and diagonal components, arranged from the close to the distant in each level. The four statistical values calculated in each wavelet subband are concatenated to form the final descriptor.

Statistical features are well adapted to analyzing textures that lack background clutter and have uniform statistical properties. The DWT provides a generalized multiresolution spectral analysis tool. Therefore, we extract statistical features from the wavelet subbands to capture global statistical information of images at different scales. It is customary to compute a single energy value on each subband [24]. In this paper, we instead apply four statistical values, namely variance, energy, uniformity, and entropy, to all wavelet subbands. Thus, the STA is a feature with a dimension of 48 (3 × 4 × 4). The four normalized statistical values extracted from each wavelet subband are computed by the following equations:

$$ \mathrm{variance} = \sum_{i=0}^{L-1} (z_i - m)^2\, p(z_i) \,/\, (L-1)^2 \qquad (4) $$

$$ \mathrm{energy} = \sum_{i=0}^{L-1} (z_i - m)^3\, p(z_i) \,/\, (L-1)^2 \qquad (5) $$

$$ \mathrm{uniformity} = \sum_{i=0}^{L-1} p^2(z_i) \qquad (6) $$

$$ \mathrm{entropy} = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i) \qquad (7) $$

where z_i and p(z_i), i = 0, 1, 2, ..., L − 1, are the intensity levels and the corresponding histogram, L is the number of intensity levels, and m = Σ_{i=0}^{L−1} z_i p(z_i) is the average intensity level.
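As one way to realize this step, the sketch below uses PyWavelets to perform a three-level 2-D DWT and computes the four statistics of Eqs. (4)–(7) from a normalized histogram of each subband, yielding the 48-dimensional STA vector. The wavelet family ('haar'), the number of histogram bins, and the function name are illustrative assumptions only.

```python
import numpy as np
import pywt

def wavelet_statistics(gray, levels=3, num_bins=256):
    """Sketch of the STA descriptor: 4 statistics x 4 subbands x 3 levels = 48-D."""
    feats = []
    approx = np.asarray(gray, dtype=np.float64)
    for _ in range(levels):
        # One DWT level yields four subbands: approximation (low-frequency)
        # plus horizontal, vertical, and diagonal detail channels.
        cA, (cH, cV, cD) = pywt.dwt2(approx, 'haar')
        for band in (cA, cH, cV, cD):
            hist, _ = np.histogram(band, bins=num_bins)
            p = hist / max(hist.sum(), 1)             # normalized histogram p(z_i)
            z = np.arange(num_bins, dtype=np.float64)
            m = np.sum(z * p)                         # average intensity level
            L = float(num_bins)
            variance = np.sum((z - m) ** 2 * p) / (L - 1) ** 2   # Eq. (4)
            energy = np.sum((z - m) ** 3 * p) / (L - 1) ** 2     # Eq. (5), as printed
            uniformity = np.sum(p ** 2)                          # Eq. (6)
            nz = p[p > 0]
            entropy = -np.sum(nz * np.log2(nz))                  # Eq. (7)
            feats.extend([variance, energy, uniformity, entropy])
        approx = cA        # recurse on the low-frequency channel
    return np.asarray(feats)   # length 48 when levels=3
```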

C. Scale Invariant Feature Transform Bag of Words

Local image features are well adapted to a number of applications, such as image retrieval and the recognition of object, texture, and scene categories [32], as they are robust to partial occlusion, cluttered backgrounds, and viewpoint variations. This has motivated the development of several local image feature detectors and descriptors. Generally, detectors are used to find interest points by searching for local extrema in a scale space; descriptors are employed to compute the representations of interest points based on their associated support regions. In this paper, uniform grids are used as the interest-point sampling strategy, as more sophisticated detectors tend to saturate and fail to provide enough interest points, especially for textureless images [18]. The evenly sampled interest points are then represented by SIFT descriptors.

Fig. 6. Process of local image feature extraction.

We choose the SIFT descriptor as the representation of interest points for the following reasons: 1) the 128-dimensional descriptor is compact and fairly distinctive; 2) the representation is carefully designed to be robust to variations in illumination and viewpoint; and 3) an extensive comparison against other local image descriptors showed that the SIFT descriptor performs well in the context of image matching [17]. The bag-of-words (BOW) method [18] is further applied to aggregate the extracted SIFT descriptors by labeling each SIFT descriptor as a visual word and counting the frequency of each visual word. The local feature representation of an image is therefore the histogram of its quantized SIFT descriptors. We perform L2-norm and inverse document frequency (IDF) normalization of the BOW histograms. Fig. 6 demonstrates the process of local feature extraction.
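The dense-SIFT plus BOW pipeline could be sketched as below with OpenCV (which ships SIFT in recent versions) and scikit-learn's k-means; the grid step, patch size, helper names, and the choice of clustering library are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def dense_sift(gray_u8, step=8, size=16):
    """Describe uniform-grid interest points with SIFT. `gray_u8` is a uint8 image."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(size))
           for y in range(step, gray_u8.shape[0] - step, step)
           for x in range(step, gray_u8.shape[1] - step, step)]
    _, desc = sift.compute(gray_u8, kps)
    return desc                                    # (num_points, 128)

def bow_histograms(descriptor_sets, vocab_size=100):
    """Quantize SIFT descriptors into a 100-word vocabulary and build
    L2- and IDF-normalized BOW histograms, one per image."""
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
    kmeans.fit(np.vstack(descriptor_sets))
    hists = np.array([np.bincount(kmeans.predict(d), minlength=vocab_size)
                      for d in descriptor_sets], dtype=np.float64)
    df = np.count_nonzero(hists, axis=0) + 1.0     # document frequency per visual word
    hists *= np.log(len(descriptor_sets) / df)     # IDF weighting
    hists /= np.linalg.norm(hists, axis=1, keepdims=True) + 1e-12
    return hists, kmeans
```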

IV. SYSTEM AND INTERFACE DESIGN

The camera-based clothing recognition aid prototype for blind people integrates a camera, a microphone, a computer, and a Bluetooth earpiece for audio description of clothing patterns and colors. A camera mounted upon a pair of sunglasses is used to capture clothing images. The clothing patterns and colors are described to blind users by a verbal display with minimal distraction to hearing. The system can be controlled by speech input through a microphone.

To facilitate interaction with blind users, speech commands input from a microphone are used to provide function selection and system control. As shown in Fig. 7, the interface design includes basic functions and high priority commands.

Basic functions: A blind user can verbally request the function he/she wants the clothing recognition aid to perform. The recognition results are presented to the blind user as audio outputs, including recognized, not recognized, and start a new function. For the recognized function, the next level of functions includes pattern/colors, to announce the recognized clothing pattern and dominant colors; repeat results, to repeat the recognized result; and save result, to save the clothing image with the associated pattern and color information on the computer.

Fig. 7. System interface design for the proposed camera-based clothing pattern recognition system using speech commands. The high priority commands can be used at any time to override the basic functions.

High priority commands: A blind user can set the system configuration through several high priority speech commands such as system restart, turn-off system, stop function (i.e., abort the current task), speaker volume and speed control commands (e.g., louder, quieter, slower, and faster), and help. The high priority commands can be used at any time. A user can say help, and the clothing recognition system will respond with the options associated with the current function. Bone-conducted earphones or small wireless Bluetooth speakers can be employed to protect privacy and minimize interference with background sounds. The battery level is also checked, and an audio warning is provided if the battery level is low.

Audio output: For the audio display, we use an operating system speech facility that is standard in modern portable computer systems and smartphones. We currently use the Microsoft Speech Software Development Kit, which supports scripts. A number of configuration options are also available according to user preference, such as speech rate, volume, and voice gender.

V. RECOGNIZING CLOTHING PATTERNS AND COLORS

The extracted global and local features are combined to recognize clothing patterns using a support vector machine (SVM) classifier. The recognition of clothing color is implemented by quantizing clothing colors in the HSI (hue, saturation, and intensity) space. In the end, the recognition results of clothing patterns and colors together provide a more precise and meaningful description of clothes to users.

A. Clothing Pattern Recognition

In our system, we empirically set the size of the visual vocabulary to 100. We apply three scaling levels to decompose clothing images; the statistics-of-wavelet-subbands feature is therefore a vector with a dimension of 48. In the computation of the Radon Signature, we evenly sample 60 projection directions from 1° to 180°, so the feature vector of the RadonSig has a dimension of 60. We combine all the global and local features by concatenating the corresponding feature channels together. The combined feature vector has a dimension of 208.

TABLE I. CLOTHING PATTERNS AND DOMINANT COLORS

The combined feature vector is used as the input to an SVM classifier with an RBF kernel. In our experiments, the optimal parameters of the RBF kernel are found by 5-fold cross-validation, and the one-versus-one scheme is used for multiclass classification.
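A minimal sketch of this classification step, assuming scikit-learn and the three feature channels computed above (100-D BOW, 48-D STA, 60-D RadonSig); the parameter grid and function name are illustrative choices, not values reported in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_pattern_classifier(bow_feats, sta_feats, radon_feats, labels):
    """Concatenate the three channels (100 + 48 + 60 = 208-D) and fit an RBF SVM.
    SVC handles multiclass problems with a one-versus-one scheme internally."""
    X = np.hstack([bow_feats, sta_feats, radon_feats])
    grid = {'C': [1, 10, 100, 1000], 'gamma': ['scale', 0.01, 0.1, 1.0]}
    clf = GridSearchCV(SVC(kernel='rbf'), grid, cv=5)   # 5-fold cross-validation
    clf.fit(X, labels)
    return clf
```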

B. Clothing Color Identification

Clothing color identification is based on the normalized color histogram of each clothing image in the HSI color space. The key idea is to quantize the color space based on the relationships between hue, saturation, and intensity. In particular, for each clothing image, our color identification method quantizes the pixels in the image into the following 11 colors: red, orange, yellow, green, cyan, blue, purple, pink, black, grey, and white.

The detection of white, black, and gray is based on the saturation value S and intensity value I. If the intensity I of a pixel is larger than an upper intensity threshold I_U and the saturation S is less than a saturation threshold S_T, the color of the pixel is white. Similarly, the color of a pixel is determined to be black if the intensity I is less than a lower intensity bound I_L and the saturation S is less than S_T. For the remaining values of I while S is less than S_T, the color of a pixel is identified as gray. For the other colors (i.e., red, orange, yellow, green, cyan, blue, purple, and pink), the hue values are employed. The hue H can be visualized as a 360° color wheel. We quantize red to the ranges 345°–360° and 0°–9°, orange to 10°–37°, yellow to 38°–75°, green to 76°–160°, cyan to 161°–200°, blue to 201°–280°, purple to 281°–315°, and pink to 316°–344°. The weight of each color is the percentage of pixels belonging to that color.

If a clothing image contains multiple colors, the dominant colors (i.e., colors covering more than 5% of the image's pixels) are output. The clothing patterns and colors mutually provide complementary information. As shown in Table I, the recognized patterns provide additional information about how different colors are arranged, e.g., striped clothes with blue and white colors. We tested color identification in a clothing color matching experiment: if the dominant colors of a pair of clothing images are the same, the two clothing images are determined to match in color. The proposed color identification achieves 99% matching accuracy in this experiment. More details can be found in [31].
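The per-pixel quantization and dominant-color report might look like the sketch below. It uses OpenCV's HSV conversion as a stand-in for the HSI space, and the saturation/intensity thresholds (S_T, I_L, I_U) are placeholder values because the paper does not state them numerically; the hue ranges follow Section V-B.

```python
import numpy as np
import cv2

HUE_RANGES = {                      # degrees on the 360-degree hue wheel (Sec. V-B)
    'red': [(345, 360), (0, 9)], 'orange': [(10, 37)], 'yellow': [(38, 75)],
    'green': [(76, 160)], 'cyan': [(161, 200)], 'blue': [(201, 280)],
    'purple': [(281, 315)], 'pink': [(316, 344)],
}

def dominant_colors(bgr, s_thresh=0.1, i_low=0.25, i_high=0.75, min_frac=0.05):
    """Quantize every pixel into one of 11 color names and report the dominant
    colors (those covering more than `min_frac` of the image)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    h = hsv[:, 0] * 2.0             # OpenCV hue is 0-179; rescale to degrees
    s = hsv[:, 1] / 255.0
    v = hsv[:, 2] / 255.0           # value used here as a proxy for intensity I

    names = np.full(h.shape, '', dtype=object)
    low_sat = s < s_thresh
    names[low_sat & (v > i_high)] = 'white'
    names[low_sat & (v < i_low)] = 'black'
    names[low_sat & (names == '')] = 'grey'
    for color, ranges in HUE_RANGES.items():
        in_range = np.zeros_like(low_sat)
        for lo, hi in ranges:
            in_range |= (h >= lo) & (h <= hi)
        names[~low_sat & in_range] = color

    fractions = {c: float(np.mean(names == c)) for c in set(names[names != ''])}
    dominant = [(c, f) for c, f in fractions.items() if f >= min_frac]
    return sorted(dominant, key=lambda t: -t[1])
```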


Fig. 8. Four sample images of the four clothing pattern categories.

VI. CLASSIFICATION EXPERIMENTS

In this section, we evaluate the performance of the proposed method on two different datasets: 1) the CCNY Clothing Pattern dataset, with large intraclass variations, to evaluate our proposed method against state-of-the-art texture classification methods, and 2) the UIUC Texture dataset, to validate the generalization of the proposed approach. Our experiments focus on the evaluation and validation of 1) the complementary relationships between the proposed global and local feature channels; 2) the superiority of our proposed method over state-of-the-art texture classification approaches in the context of clothing pattern recognition; and 3) the generalization of our approach to traditional texture classification.

A. Datasets

CCNY clothing pattern dataset: This dataset includes 627 images of four typical clothing pattern designs: plaid, striped, patternless, and irregular, with 156, 157, 156, and 158 images, respectively. The resolution of each image is downsampled to 140 × 140. Fig. 8 illustrates sample images from each category. As shown in this figure, in addition to the illumination variances, scaling changes, rotations, and surface deformations present in traditional texture datasets, clothing patterns also demonstrate much larger intraclass pattern and color (intensity) variations, which augment the challenge of recognition. The clothing pattern dataset can be downloaded via our research website.1

UIUC texture dataset: We also evaluate the proposed method on the UIUC Texture dataset [11], a well-established traditional texture dataset. It contains 1000 uncalibrated and unregistered images with a resolution of 640 × 480. There are 25 texture classes with 40 images per class. The texture images present rotation, scaling, viewpoint change, and nonrigid surface deformation. Fig. 9 shows four sample images of four texture classes in this dataset. As shown in Fig. 8 and Fig. 9, the traditional texture dataset mainly focuses on the geometric changes of texture surfaces, but with less intraclass pattern and intensity variation compared to the clothing images.

1 media-lab.engr.ccny.cuny.edu/data

Fig. 9. Four sample images of four texture classes in the UIUC Texture dataset.

B. Experiments and Discussions on Clothing Pattern Recognition

1) Experimental Setup: In our implementation, the training set is selected as a fixed-size random subset of each class, and all remaining images are used as the testing set. To eliminate the dependence of the results on the particular training images used, we report the average of the classification rates obtained for 50 randomly selected training sets. This ensures that all the classification rates are statistically comparable. The recognition performance is measured by the average classification accuracy.
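A compact way to reproduce this protocol, assuming scikit-learn; stratified shuffling is used here to approximate "a fixed-size random subset of each class", and `make_clf` is a placeholder for a classifier factory such as the SVM sketch above.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score

def average_accuracy(features, labels, make_clf, train_fraction=0.3, num_splits=50):
    """Average classification accuracy over 50 random training/testing splits."""
    splitter = StratifiedShuffleSplit(n_splits=num_splits,
                                      train_size=train_fraction, random_state=0)
    scores = []
    for train_idx, test_idx in splitter.split(features, labels):
        clf = make_clf()                               # fresh classifier per split
        clf.fit(features[train_idx], labels[train_idx])
        pred = clf.predict(features[test_idx])
        scores.append(accuracy_score(labels[test_idx], pred))
    return float(np.mean(scores))
```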

2) Effectiveness of Different Features and Combinations: We first evaluate and demonstrate the complementary relationships between global and local features on clothing pattern images. A combination of multiple features may obtain better results than any individual feature channel. However, a combination of features that are noisy, contradictory, or overlapping in terms of class distribution could deteriorate the classification performance.

To validate the effectiveness of the proposed features, we first evaluate the complementary relationships between the different feature channels: the global features of the RadonSig and the statistics of wavelet subbands (STA), and the local features (SIFT). SIFT represents the local structural features; STA captures the global statistical characteristics; and RadonSig captures the global directionality. Fig. 10 displays the recognition results of the different features as a function of training set size. Among the individual feature channels, SIFT and STA achieve comparable recognition accuracies. While the results of the single RadonSig channel are worse than those of SIFT or STA, the performance of SIFT+RadonSig is better than that of SIFT+STA, and both combinations outperform any individual feature channel. Therefore, for clothing pattern recognition, the global and local feature combination of SIFT and RadonSig is more effective than that of SIFT and STA. Furthermore, the combination of all three feature channels further improves the recognition results and dominates across all training set sizes.

Fig. 10. Comparative evaluation of the recognition accuracies of different feature channels and their combinations versus the percentage of training samples.

TABLE II. RECOGNITION ACCURACY (%) OF DIFFERENT FEATURE CHANNELS UNDER DIFFERENT VOLUMES OF TRAINING SETS (10%, 30%, 50%, 70% OF THE DATASET USED FOR TRAINING) ON THE CLOTHING PATTERN DATASET

The comparisons of the different feature channels and their combinations validate our intuition about the effectiveness and complementarity of the proposed feature channels. The detailed recognition accuracies of Fig. 10 are listed in Table II. The percentages of training images per class are 10%, 30%, 50%, and 70%, respectively. As shown in Table II and Fig. 10, the recognition accuracy of SIFT+STA+RadonSig using 30% of the images as the training set is comparable to or even better than that of the other feature channels using 70% of the images as the training set. This observation demonstrates another merit of our proposed approach: it is able to achieve a desirable result using much less training data.

3) Comparison With State-of-the-Art Texture Analysis Methods for Clothing Pattern Recognition: We further compare the overall performance of our proposed method with that of state-of-the-art methods including MFS [26], SIFT [13], (H+L)(S+R) [11], and SIFT+SPIN [32], which have achieved state-of-the-art performance on traditional texture classification tasks. MFS [26] is an extension of the fractal dimension based on three density functions of image intensity, image gradient, and image Laplacian. It combines both global spatial invariance and local robust measurement. SIFT [13] corresponds to the single-feature channel in our method and captures the local image structural information. (H+L)(S+R) [11] is based on the extraction of SPIN and RIFT descriptors on affine Harris and Laplacian regions of an image. The elliptic regions detected by the two sophisticated detectors are normalized to circles to handle affine transforms. SPIN is a rotation-invariant local image descriptor formed by a 2-D histogram of the distribution of pixel intensity values. RIFT is an alternate representation of SIFT. SIFT+SPIN [32] is the combination of two local image features, SIFT and SPIN. In the comprehensive evaluation of texture classification in [32], SIFT and SIFT+SPIN organized by the BOW model achieved state-of-the-art results on traditional texture datasets. In our implementation, the sampling strategies for SIFT and SIFT+SPIN are both uniform grid sampling, as mentioned in Section III-C.

TABLE III. RECOGNITION ACCURACY (%) OF DIFFERENT METHODS AND VOLUMES OF TRAINING SETS (10%, 30%, 50%, 70% OF THE DATASET USED FOR TRAINING) ON THE CLOTHING PATTERN DATASET

Table III shows the recognition accuracies of the different methods on the CCNY clothing pattern dataset. The experiments are evaluated using 10%, 30%, 50%, and 70% of the dataset as the training set and the rest as the testing set. As shown in this table, our proposed method significantly outperforms the other well-established approaches, especially when the training set is small. A closer look at Table III also confirms that our proposed method is able to obtain comparable or even better results using much less training data. For instance, the accuracy of our method using 30% of the images as a training set is better than that of the other well-established methods using 70% as a training set. While these methods perform very well on traditional texture datasets, they cannot achieve the same level of accuracy for clothing pattern recognition due to the large intraclass variations. (H+L)(S+R) performs much worse than the other methods, mainly because of the insufficient interest points detected by the two sophisticated detectors, especially for the striped and patternless categories. This also confirms the importance of sampling density for clothing pattern classification. It is also interesting to observe that the performance of SIFT+SPIN is worse than that of SIFT alone. This is probably because SPIN is a local image descriptor based on pixel intensities, and the clothing images present large intraclass color variations, which accordingly result in large intraclass intensity variations. Since SPIN is sensitive to intensity changes, it cannot overcome the large intraclass variations. This observation validates that it is important to choose appropriate feature channels for feature combination.

Table IV shows the confusion table of our method using 70% of the images as the training set. In the confusion table, each row represents the ground-truth category of the clothing pattern images and each column corresponds to the recognized category. The system recognizes patternless best, as those images have the most discriminative local and global properties. On the other hand, plaid tends to be confused with striped.

YANG et al.: ASSISTIVE CLOTHING PATTERN RECOGNITION FOR VISUALLY IMPAIRED PEOPLE 241

TABLE IV. CONFUSION TABLE FOR THE CLOTHING PATTERN RECOGNITION EXPERIMENTS OF OUR PROPOSED METHOD USING 70% OF THE IMAGES AS THE TRAINING SET

TABLE V. CLASSIFICATION ACCURACY (%) OF DIFFERENT METHODS AND NUMBER OF TRAINING IMAGES ON THE UIUC TEXTURE DATASET

C. Experiments With the UIUC Texture Dataset

The proposed method significantly outperforms the traditional texture classification methods in recognizing clothing patterns. In order to validate the generalization of our method, we further compare it with other state-of-the-art approaches on a traditional texture dataset, the UIUC Texture dataset. The experimental setup is the same as in Section VI-B1.

In the performance evaluation on the UIUC Texture dataset, we also compare our proposed method with MFS, SIFT, (H+L)(S+R), and SIFT+SPIN, as described in Section VI-B3. Table V shows the classification rates of the different methods on the UIUC Texture dataset. In these experiments, the numbers of training images per class are 5, 10, 15, and 20, respectively; the remaining images in each class are used as the testing set. As shown in this table, our method achieves comparable results to SIFT, (H+L)(S+R), and SIFT+SPIN, and obtains better accuracy than MFS. In contrast to its performance on clothing pattern recognition, (H+L)(S+R) performs very well on this dataset, as the two sophisticated detectors can localize sufficient interest points on highly textured images. This result again demonstrates the importance of sampling density for robust texture classification. SIFT+SPIN performs the best on the UIUC Texture dataset, while its performance on clothing pattern images is much worse. This is because the intraclass intensities of the texture images in this dataset are much more consistent than those of clothing pattern images, so SPIN provides a complementary appearance feature to SIFT. However, when the number of training images is 5, SIFT+SPIN largely outperforms our method. This is probably because the directionality property in natural textures is not as evident as in clothing images, and the RadonSig needs a relatively large training set to model the directionality of texture images. In the other cases, our method achieves results comparable to SIFT+SPIN. This demonstrates the generalization of our method, i.e., it achieves state-of-the-art results on both the clothing pattern images and the traditional texture dataset.

Fig. 11. Example setup of the clothing pattern recognition prototype, capturing images with an auto-focus web camera mounted on sunglasses.

VII. PROOF-OF-CONCEPT EVALUATION

The proposed clothing pattern and color recognition prototype was tested in a proof-of-concept evaluation with ten blind participants, five men and five women, with ages ranging from 36 to 72 years. The testing was conducted in a laboratory under normal lighting conditions. Because some totally blind users are not aware of the environmental lighting conditions, the system provides a reminder before the testing starts if the environment is too dark.

As shown in Fig. 11, the clothing images are captured by a camera (a Logitech web camera with auto focus) mounted on a pair of sunglasses while the blind user holds the clothes to be recognized. After blind participants are trained for several minutes to hold clothes in front of the camera so that the clothing occupies the image view, the images they capture are similar to those captured by people with normal vision. The images of the clothes are captured and recognized by the system, which then announces the clothing pattern and verbally presents the three dominant colors to the user.

The number of clothing patterns and colors recognized by the system is smaller than what humans can provide. In addition, the audio description is not as flexible as what humans can state. Nevertheless, most blind users expressed wanting such a system to support more independence in their daily life. They also expressed a desire for more detailed colors, such as rose red, for more clothing patterns, and for faster speech feedback in order to gain more information. Some blind participants expressed a desire to have the camera on a cap instead of on glasses and to have the function available on mobile phones.

VIII. CONCLUSION

Here, we have proposed a system to recognize clothing patterns and colors to help visually impaired people in their daily life. We employ the RadonSig to capture the global directionality features, STA to extract the global statistical features on wavelet subbands, and SIFT to represent the local structural features. The combination of multiple feature channels provides complementary information and improves recognition accuracy. Based on a survey and a proof-of-concept evaluation with blind users, we have collected a clothing pattern recognition dataset including four pattern categories: plaid, striped, patternless, and irregular. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art methods in clothing pattern recognition. Furthermore, the performance evaluation on a traditional texture dataset validates the generalization of our method to traditional texture analysis and classification tasks. This research enriches the study of texture analysis and leads to improvements over existing methods in handling complex clothing patterns with large intraclass variations. The method also provides new functions to improve the quality of life for blind and visually impaired people.

ACKNOWLEDGMENT

The authors would like to thank Dr. A. Arditi for recruiting the blind participants who took part in the algorithm and prototype system evaluation. The authors would also like to thank the anonymous reviewers for their constructive comments and insightful suggestions that improved the quality of this paper.

REFERENCES

[1] A. Arditi and Y. Tian, "User interface preferences in the design of a camera-based navigation and wayfinding aid," J. Visual Impairment Blindness, vol. 107, no. 2, pp. 18–129, 2013.
[2] D. Dakopoulos and N. G. Bourbakis, "Wearable obstacle avoidance electronic travel aids for the blind: A survey," IEEE Trans. Syst., Man, Cybern. C, vol. 40, no. 1, pp. 25–35, Jan. 2010.
[3] L. Davis, S. Johns, and J. Aggarwal, "Texture analysis using generalized co-occurrence matrices," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 3, pp. 251–259, Jul. 1979.
[4] D. Gould, "The making of a pattern," Vogue Patterns, 1996.
[5] R. Haralick, "Statistical and structural approaches to texture," Proc. IEEE, vol. 67, no. 5, pp. 786–804, May 1979.
[6] F. Hasanuzzaman, X. Yang, and Y. Tian, "Robust and effective component-based banknote recognition for the blind," IEEE Trans. Syst., Man, Cybern. C, vol. 42, no. 6, pp. 1021–1030, Nov. 2012.
[7] A. Huete, J. Victores, S. Martinez, A. Gimenez, and C. Balaguer, "Personal autonomy rehabilitation in home environment by a portable assistive robot," IEEE Trans. Syst., Man, Cybern. C, vol. 42, no. 4, pp. 561–570, Jul. 2012.
[8] K. Khouzani and H. Zadeh, "Radon transform orientation estimation for rotation invariant texture analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 1004–1008, Jun. 2005.
[9] I. Kocur, R. Parajasegaram, and G. Pokharel, "Global data on visual impairment in the year 2002," Bulletin World Health Org., 2004.
[10] S. Lam, "Texture feature extraction using gray level gradient based on co-occurrence matrices," in Proc. Int. Conf. Syst., Man Cybern., 1996, pp. 267–271.
[11] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using local affine regions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1265–1277, Aug. 2005.
[12] S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yuan, "Hi, magic closet, tell me what to wear," in Proc. ACM Multimedia, 2012.
[13] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[14] S. Hidayati, W. Cheng, and K. Hua, "Clothing genre classification by exploiting the style elements," in Proc. ACM Int. Conf. Multimedia, 2012, pp. 1137–1140.
[15] V. Manian, R. Vasquez, and P. Katiyar, "Texture classification using logical operation," IEEE Trans. Image Process., vol. 9, no. 10, pp. 1693–1703, Oct. 2000.
[16] R. Manduchi and J. Coughlan, "(Computer) vision without sight," Commun. ACM, vol. 55, no. 1, pp. 96–104, 2012.
[17] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005.
[18] E. Nowak, F. Jurie, and B. Triggs, "Sampling strategies for bag-of-features image classification," in Proc. Eur. Conf. Comput. Vis., 2006, pp. 490–503.
[19] T. Randen and J. Husoy, "Filtering for texture classification: A comparative study," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 4, pp. 291–310, Apr. 1999.
[20] S. Shoval, J. Borenstein, and Y. Koren, "Auditory guidance with the Navbelt—A computerized travel aid for the blind," IEEE Trans. Syst., Man, Cybern. C, vol. 28, no. 3, pp. 459–467, Aug. 1998.
[21] E. Tekin, J. Coughlan, and H. Shen, "Real-time detection and reading of LED/LCD displays for visually impaired persons," in Proc. IEEE Workshop Appl. Comput. Vision, 2011, pp. 491–496.
[22] Y. Tian, X. Yang, C. Yi, and A. Arditi, "Toward a computer vision based wayfinding aid for blind persons to access unfamiliar indoor environments," Mach. Vis. Appl., vol. 24, no. 3, pp. 521–535, 2012.
[23] M. Varma and A. Zisserman, "Texture classification: Are filter banks necessary?" in Proc. Comput. Vis. Pattern Recog., 2003, pp. 691–698.
[24] Z. Wang and J. Yong, "Texture analysis and classification with linear regression model based on wavelet transform," IEEE Trans. Image Process., vol. 17, no. 8, pp. 1421–1430, Aug. 2008.
[25] H. Wendt, P. Abry, S. Jaffard, H. Ji, and Z. Shen, "Wavelet leader multifractal analysis for texture classification," in Proc. Int. Conf. Image Process., 2009, pp. 3829–3832.
[26] Y. Xu, H. Ji, and C. Fermuller, "Viewpoint invariant texture description using fractal analysis," Int. J. Comput. Vis., vol. 83, no. 1, pp. 85–100, 2009.
[27] X. Yang and Y. Tian, "Robust door detection in unfamiliar environments by combining edge and corner features," in Proc. IEEE CVPR Workshop Comput. Vis. Appl. Visually Impaired, 2010, pp. 57–64.
[28] X. Yang, Y. Tian, C. Yi, and A. Arditi, "Context-based indoor object detection as an aid to blind persons accessing unfamiliar environments," in Proc. Int. ACM Conf. Multimedia, 2010, pp. 1087–1090.
[29] X. Yang and Y. Tian, "Texture representations using subspace embeddings," Pattern Recog. Lett., vol. 34, no. 10, pp. 1130–1137, 2013.
[30] X. Yang, S. Yuan, and Y. Tian, "Recognizing clothes patterns for blind people by confidence margin based feature combination," in Proc. Int. ACM Conf. Multimedia, 2011, pp. 1097–1100.
[31] S. Yuan, Y. Tian, and A. Arditi, "Clothes matching for visually impaired persons," J. Technol. Disability, vol. 23, no. 2, pp. 75–85, 2011.
[32] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local features and kernels for classification of texture and object categories: A comprehensive study," Int. J. Comput. Vis., vol. 73, no. 2, pp. 213–238, 2007.

Xiaodong Yang (S'12) received the B.S. degree from the Huazhong University of Science and Technology, Wuhan, China, in 2009. He is currently working toward the Ph.D. degree with the Department of Electrical Engineering at City College, City University of New York, New York, NY, USA.

His research interests include computer vision, machine learning, video surveillance, and multimedia. His current research focuses on large-scale image and video recognition, surveillance event detection, human activity analysis, and computer vision-based assistive systems for visually impaired people.


Shuai Yuan received the B.S. degree from Southeast University, Nanjing, China, in 2008, and the M.S. degree in electrical engineering from the City College, City University of New York, New York, NY, USA, in 2011.

In 2009, he joined the Media Lab of the City College of New York and was a Research Assistant while pursuing the M.S. degree. He is currently a Software Developer at Teledata Communications Inc. His research interests include pattern recognition and classification, machine learning, image processing, and object recognition.

YingLi Tian (M'99–SM'01) received the B.S. and M.S. degrees from TianJin University, TianJin, China, in 1987 and 1990, respectively, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 1996.

After holding a faculty position at the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing, she joined Carnegie Mellon University in 1998, where she was a Postdoctoral Fellow with the Robotics Institute. She was then a Research Staff Member with the IBM T. J. Watson Research Center from 2001 to 2008. She is one of the inventors of the IBM Smart Surveillance Solutions. She is currently a Professor with the Department of Electrical Engineering, City College of New York, New York, NY, USA. Her current research interests include a wide range of computer vision problems, from motion detection and analysis and assistive technology to human identification, facial expression analysis, and video surveillance.