Texture classification using invariant features of local textures


Published in IET Image Processing. Received on 30th November 2008. Revised on 5th October 2009. doi: 10.1049/iet-ipr.2008.0229

ISSN 1751-9659

Texture classification using invariant features of local textures
P. Janney 1, G. Geers 2

1 School of Computer Science and Engineering, University of New South Wales, Sydney NSW 2032, Australia
2 National ICT Australia (NICTA), Sydney NSW 2032, Australia
E-mail: [email protected]

Abstract: In this paper, the authors present a texture descriptor algorithm called invariant features of local textures (IFLT). IFLT generates scale, rotation and (essentially) illumination invariant descriptors from a small neighbourhood of pixels around a centre pixel or a texture patch. Texture classification experiments were carried out on the Brodatz, Outex and KTH-TIPS2 databases. Demonstrated texture classification accuracy exceeds the previously published state of the art at a significantly lower computational cost. Experiments also suggest that IFLT descriptors are in a sense intuitive texture descriptors.

1 Introduction

Texture classification is a fundamental low-level processing step in image analysis and computer vision. When images or videos are captured using state-of-the-art cameras or sensors, they are subject to geometric distortions (e.g. translation, rotation, skew and scale) because of varying viewpoints and lens aberrations. Hence affine-invariant descriptors are required for the analysis of real-world texture images/patches. There are numerous algorithms in the open literature for texture feature extraction and classification [1, 2]. The vast majority of these algorithms make an explicit or implicit assumption that all images are captured under the same orientation (i.e. there is no inter-image rotation). For a given texture patch, no matter how it is rotated, it is always perceived as the same texture by a human observer. Therefore, from both the practical and the theoretical point of view, rotation invariant texture classification is highly desirable.

The first few approaches to rotation invariant texture description include generalised co-occurrence matrices [3], polarograms [4] and texture anisotropy [5]. Researchers in [6] derived computationally efficient texture features by applying the partial form of Gabor functions. These features were then transformed to two-dimensional (2D) closed shapes, and their moment invariants and global shape descriptors were derived to classify the rotated textures. Other researchers have used Gabor wavelets and other basis functions to derive rotation invariant features [7–10]. Global textures are not very distinctive when there are texture variations across the image. Hence local texture descriptors are the preferred means of describing textures in an image [11]. Using a circular neighbour set, Porter and Canagarajah [11] presented rotation invariant generalisations for all three mainstream paradigms: wavelets, Gaussian Markov random fields (GMRF) and Gabor filtering. Utilising similar circular neighbourhoods, Arof and Deravi [12] obtained rotation invariant features using the 1D discrete Fourier transform (DFT). Wavelet analysis has been used previously to discriminate textures; however, texture anisotropy caused by rotation will generate a different set of features when wavelet analysis is used [13]. A comprehensive literature survey of existing texture classification techniques is available in Zhang and Tan [7].

In [14], a rotationally invariant approach to material classification based on the 3D texton representation is presented. Despite the clear need for rotationally invariant global texture classification, there is little mention of it in the current literature.

Recently, researchers in [15] developed a new local texture descriptor called the local binary pattern (LBP). This method is based on recognising that certain LBPs, termed 'uniform', are a fundamental property of local image textures, and their occurrence histogram is shown to be a very powerful texture feature. They derive a generalised grey-scale and rotation invariant operator representation that allows for detecting the 'uniform' patterns for any quantisation of the angular space and for any spatial resolution. They also present a method for combining multiple operators for multi-resolution analysis. LBP generates very distinct descriptors for textures which are visually similar. In other words, LBP-generated descriptors tend to place 'visibly alike' textures very far from each other in feature space.

In this paper, we continue to develop the work originally presented in [16]. In particular, we show that invariant features of local textures (IFLT) are computationally efficient, robust and in some sense 'intuitive' texture descriptors that are truly invariant across a wide range of image scales and illuminations. Section 2 provides a detailed description of the algorithm. The experimental setup, results and analysis are presented in Section 3. The experimental setup consists of in-depth analyses on the Brodatz [17], Outex [18] and KTH-TIPS2 [19] databases for scale, rotation and illumination invariance properties.

2 Invariant features of local textures

Researchers in [20] have developed scaling laws from the intensity gradient field and derived a similarity measure for texture retrieval. An approximation of the gradient field in a small image neighbourhood may be derived by considering the pixel intensities in that image region. Such local gradient information forms the basis of the IFLT algorithm.

Consider a 3 × 3 neighbourhood of pixels as shown in Fig. 1a. True circular symmetry around Xc can be achieved by recalculating pixel intensities at the coordinates given by

Xi = (R cos(2πi/P), R sin(2πi/P)) (1)

where Xi is the equivalent position of the ith of P − 1 pixels in circular symmetry around Xc with radius R.

The grey values of neighbours which do not fall exactly on integral pixel positions are estimated by interpolation. With Ic as the intensity of the centre pixel, the gradient of intensity is calculated in all directions with reference to the centre pixel, yielding gradient components which are approximately intensity invariant.
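As an illustration, the following sketch (ours, not the authors' code) computes the circularly symmetric sampling positions of (1) and the corresponding grey values. Bilinear interpolation is assumed here, since the paper says only 'interpolation'; the function name and NumPy usage are our own.

```python
import numpy as np

def circular_neighbours(img, yc, xc, P=8, R=1.0):
    """Grey values of P points on a circle of radius R around pixel (yc, xc).

    Implements the sampling positions of eq. (1); off-grid values are
    estimated by bilinear interpolation (an assumption: the paper does not
    specify the scheme). Assumes (yc, xc) is an interior pixel.
    """
    samples = np.empty(P)
    for i in range(P):
        x = xc + R * np.cos(2.0 * np.pi * i / P)
        y = yc + R * np.sin(2.0 * np.pi * i / P)
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        fx, fy = x - x0, y - y0
        # Weighted average of the four surrounding pixels
        samples[i] = ((1 - fx) * (1 - fy) * img[y0, x0]
                      + fx * (1 - fy) * img[y0, x0 + 1]
                      + (1 - fx) * fy * img[y0 + 1, x0]
                      + fx * fy * img[y0 + 1, x0 + 1])
    return samples
```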


The gradient intensities around the centre pixel can be written as a 1D vector, I, as shown in

I = [Ic − I0, . . . , Ic − IP−1] (2)

where Ic is the intensity of the centre pixel and I0, . . . , IP−1 are the intensities of the neighbouring pixels.

Normalisation of this 1D vector further enhances intensity invariance

Inorm = I / √(Σj (Ij)²) (3)

The vector thus derived represents the intensity gradient (formally it is, in fact, the vector of first-order finite directional differences in each of the P directions) around the centre pixel and is also intensity invariant within the local neighbourhood. It can be seen from Fig. 1a that any rotational effects result in linear shifts in the 1D vector of (2). That is, rotations in image space correspond to linear shifts in the transformed space. In this work, Haar wavelets were used because they are computationally efficient and have the smallest possible support.
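To make the rotation-to-shift property concrete, here is a minimal sketch (our illustration, under the assumption of a fixed circular neighbour ordering) of (2) and (3); the function name is hypothetical.

```python
import numpy as np

def iflt_gradient_vector(ic, neighbours):
    """Normalised intensity-gradient vector of eqs. (2) and (3).

    ic         -- grey value of the centre pixel
    neighbours -- P neighbour grey values, ordered around the circle
    """
    I = ic - np.asarray(neighbours, dtype=float)   # eq. (2)
    norm = np.sqrt(np.sum(I ** 2))
    return I / norm if norm > 0 else I             # eq. (3)

# Rotating the patch re-orders the neighbours cyclically, which only
# shifts the vector cyclically (the basis of rotation invariance):
v = iflt_gradient_vector(10.0, [8, 9, 12, 14, 11, 9, 7, 6])
v_shift = iflt_gradient_vector(10.0, [9, 12, 14, 11, 9, 7, 6, 8])
assert np.allclose(np.roll(v, -1), v_shift)
```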

Figure 1 IFLT methodology
a 3 × 3 neighbourhood of pixels and IFLT method
b Block diagram of multi-scale version of IFLT


The discrete wavelet transform (DWT) of a signal I, in the Haar basis, is calculated by passing it through a series of filters [21] whose coefficients are given in

h = [1/√2, −1/√2], g = [1/√2, 1/√2] (4)

The signal is decomposed simultaneously using a high-pass filter h and a low-pass filter g. The outputs of the high-pass filter are known as the detail coefficients, and those from the low-pass filter are referred to as the approximation coefficients. The filter outputs are then downsampled by 2.

The nth component of downsampling a vector y by k may be written as

(y ↓ k)[n] = y[kn] (5)

where ↓ denotes the downsampling operator. Noting that the wavelet transform operation corresponds to a convolution followed by downsampling by 2 allows the filter outputs to be written more concisely as

ylow = (I ∗ g) ↓ 2, yhigh = (I ∗ h) ↓ 2 (6)

The detail and approximation coefficients have shift invariant energy distributions. As shown above, rotations in image space have been transformed into linear shifts in transform space, and so the energy distributions of the detail and approximation coefficients are also rotation invariant.

The DWT was previously used to extract texture features to discriminate between textures [13]. Those texture features cannot handle texture anisotropy and thus do not possess rotation invariant properties. However, when the DWT is applied in the transformed space, the resulting features are rotation invariant.

In the experiments described below, the mean and standard deviation of the high-pass and the low-pass filter outputs generated by one step of the wavelet transform of (2) are used as the texture features. These features are inherently intensity and rotation invariant for a small 3 × 3 neighbourhood of pixels. A flowchart of the IFLT methodology is shown in Fig. 1a.
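A minimal sketch of this feature computation, covering (4)-(6), is given below. Periodic indexing at the vector ends is an assumption on our part (consistent with the circular neighbourhood, but not stated in the paper), and the function name is ours.

```python
import numpy as np

def iflt_features(I):
    """Mean/std of the one-level Haar DWT bands of the gradient vector I.

    Implements eqs. (4)-(6): filter with h (high-pass) and g (low-pass),
    then downsample by 2. Periodic indexing at the vector end is assumed.
    """
    I = np.asarray(I, dtype=float)
    n = len(I)
    h = np.array([1.0, -1.0]) / np.sqrt(2.0)   # detail (high-pass) filter, eq. (4)
    g = np.array([1.0,  1.0]) / np.sqrt(2.0)   # approximation (low-pass) filter

    def filt(f):
        # Convolution with a length-2 filter, then downsampling by 2 (eqs. 5-6)
        y = np.array([f[0] * I[k] + f[1] * I[(k + 1) % n] for k in range(n)])
        return y[::2]

    y_high, y_low = filt(h), filt(g)
    # The four texture features used in the experiments
    return np.array([y_low.mean(), y_low.std(), y_high.mean(), y_high.std()])
```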

The next step in building the texture descriptor is to extract statistical distributions of local texture features from an image patch.

Before proceeding, it should be noted that an M × N image consisting of a large number of small patches will usually not have been captured under uniform illumination. Such non-uniform illumination can give rise to apparent texture distortion. However, the illumination will (except in pathological cases) be essentially uniform over 'small' image regions. Thus, statistical distributions of local texture features collected over an M × N image patch will give rise to texture features which are substantially illumination invariant.

Given an M × N patch of pixels, the following steps are performed:

1. A 3 × 3 sliding window is applied across the whole patch and local texture features are extracted from all the sliding window locations.

2. A histogram is built from the extracted local texture features in the texture patch. This involves partitioning the four dimensions of texture features (the mean and the standard deviation of the energy distributions of the high-pass and the low-pass wavelet bands) into a number of bins and counting the occurrences of local texture feature values in those bins.

3. To compute the distance between two texture patches, the Euclidean distance between the corresponding histograms can be used; however, any other distance measure between histograms, such as the χ²-distance, could also be used.

The histogram extracted in step 2 above serves as the texture descriptor of an image patch. Thus, we have derived essentially invariant features of local textures.
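The three steps can be sketched as follows, under stated assumptions: binning the four feature dimensions jointly and the default bin count are illustrative choices of ours (the paper does not fix them), and `feature_fn` stands for the local feature extraction described above.

```python
import numpy as np

def iflt_descriptor(patch, feature_fn, n_bins=4, ranges=None):
    """Histogram of local IFLT features over an M x N patch (steps 1 and 2).

    feature_fn maps a 3 x 3 pixel neighbourhood to its four local features
    (mean/std of the low- and high-pass wavelet bands). Joint 4-D binning
    and n_bins = 4 per dimension are illustrative assumptions.
    """
    patch = np.asarray(patch, dtype=float)
    M, N = patch.shape
    # Step 1: slide a 3 x 3 window over every interior position of the patch
    feats = np.array([feature_fn(patch[r - 1:r + 2, c - 1:c + 2])
                      for r in range(1, M - 1) for c in range(1, N - 1)])
    if ranges is None:
        ranges = list(zip(feats.min(axis=0), feats.max(axis=0)))
    # Step 2: count feature occurrences in the binned feature space
    hist, _ = np.histogramdd(feats, bins=n_bins, range=ranges)
    return hist.ravel() / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    # Step 3: chi-squared distance between two descriptor histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```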

2.1 Multiscale version

Local textures present inside an image patch have a structural relationship to the surrounding textures. Scale-space analysis of images provides a means of representing image features that are scale independent, and there is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex [22].

Using the original image, we have texture features that are illumination and rotation invariant. Applying the algorithm to describe local textures on images at different scales of a scale-space decomposition provides a range of local texture descriptors that are better equipped to recognise textures at different scales.

Moreover, applying the same algorithm at different scales of a scale-space representation of an image yields IFLT descriptors that take into consideration the spatial arrangement of these textures in an image. A multiscale version of the algorithm is shown in Fig. 1b. A Gaussian filter was used as a low-pass (blurring) filter. Texture histograms across scales were concatenated to form a texture descriptor for the input image patch. The final distance between two texture patches is taken as the sum of the distances across all scales. Different weights can be given to different scales when calculating the combined distance.
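A sketch of this multiscale scheme, assuming SciPy's Gaussian filter as the blurring low-pass filter; the number of scales, the sigma value and the equal default weights are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_iflt(patch, descriptor_fn, n_scales=3, sigma=1.0):
    """Concatenate single-scale IFLT histograms over a Gaussian scale space
    (Fig. 1b). descriptor_fn is the single-scale descriptor."""
    img = np.asarray(patch, dtype=float)
    hists = []
    for _ in range(n_scales):
        hists.append(descriptor_fn(img))
        img = gaussian_filter(img, sigma)   # low-pass blur for the next scale
    return np.concatenate(hists)

def multiscale_distance(d1, d2, n_scales=3, weights=None):
    """Sum of per-scale histogram distances, optionally weighted."""
    w = weights if weights is not None else [1.0] * n_scales
    parts1 = np.array_split(d1, n_scales)
    parts2 = np.array_split(d2, n_scales)
    return sum(wi * np.linalg.norm(a - b)
               for wi, a, b in zip(w, parts1, parts2))
```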


3 Experimental setup, results and analysis

3.1 Experiment on the Brodatz album

We have benchmarked against the results of the LBP experiments presented in [15]. Researchers in [15] used 16 source textures from the Brodatz album [17], as shown in Fig. 2. The source textures were digitally captured from the sheets in the Brodatz album. Originally each texture class consisted of eight 256 × 256 images; Porter and Canagarajah [11] used bilinear interpolation to create rotated texture images of size 180 × 180 from these source images. A small amount of artificial blur was added to images which were not rotated through multiples of 90°. Considering this in conjunction with the fact that the rotated textures do not have any local intensity distortions such as shadows, the image data provide a slightly simplified but highly controlled problem for rotation invariant texture analysis.

The LBP descriptor, designated LBP^{riu2}_{P,R}, and the IFLT descriptor both depend on P, the number of pixels, and R, the sampling radius. The work described in [15] used (P, R) values of (8, 1), (16, 2) and (24, 3) for the three spatial and three angular resolutions in their experiments. A rotation invariant variance measure (VAR_{P,R}) that characterises the local contrast of image texture is appended to LBP^{riu2}_{P,R} to achieve maximum performance [15].

Figure 2 180 × 180 samples of 16 textures used in experiments at particular angles


LBP^{riu2}_{P,R} and IFLT outputs were accumulated into histograms with P + 2 bins. Each bin effectively provides an estimate of the probability of encountering the corresponding texture pattern [15]. Histograms of every training sample of a given class were added to yield a 'model' histogram for each class. During classification, the test sample histogram was compared with the model histogram of each class. The Euclidean distance between histograms was used for determining feature similarity. Classification accuracy is the reported performance measure.

3.1.1 Experiment 1: In the experimental setup of [15], every image in the training set of each class was divided into 121 disjoint 16 × 16 sub-images at each rotation angle: 0°, 30°, 45° and 60°. Thus the training set consisted of 484 (four angles, 121 samples) images for each of the 16 texture classes. Model histograms for each class were generated, yielding 16 reliable model histograms containing 484(16 − 2R)² entries (the operators have an R-pixel border). The relatively small size of the training samples increases the difficulty of the problem. Textures for classification were presented at rotation angles of 20°, 70°, 90°, 120°, 135° and 150°: 672 samples in total, 42 (six angles, seven images) for each of the 16 textures. Typical histograms of the test samples contained (180 − 2R)² entries.

Researchers in [15] reported 99.6% classification accuracy using LBP^{riu2}_{P,R} texture descriptors and the G-statistic (log-likelihood ratio). However, by appending VAR_{P,R} to LBP^{riu2}_{P,R}, a maximum performance (on certain tests) of 100% was achieved.

The results in Table 1 correspond to the percentages of correctly classified samples over all the test patches. For comparison purposes, LBP^{riu2}_{P,R} performance is also given in Table 1. It is evident from Table 1 that the proposed IFLT algorithm achieves 98.06% classification accuracy, compared to the 88.2% of LBP^{riu2}_{P,R}, for (P, R) = (8, 1), which is the basic texture operator. For the higher (P, R) = (16, 2) and (24, 3), there is a slight improvement in performance when compared to LBP^{riu2}_{P,R}. These test results also confirm that IFLT descriptors do possess rotation invariant properties.

Table 1 Performance (%) of the LBP and IFLT algorithms on Brodatz textures with training samples of size 16 × 16

P, R                 | Bins         | LBP^{riu2}_{P,R} | IFLT (2 wavelet scales)
8, 1                 | 10           | 88.2             | 98.06
16, 2                | 18           | 98.5             | 98.66
24, 3                | 26           | 99.1             | 99.2
8, 1 + 16, 2         | 10 + 18      | 99               | 98.4
8, 1 + 24, 3         | 10 + 26      | 99.6             | 99.3
16, 2 + 24, 3        | 18 + 26      | 99               | 99.4
8, 1 + 16, 2 + 24, 3 | 10 + 18 + 26 | 99.1             | 99.75


It is also evident from Table 1 that the combination of different spatial resolutions in IFLT does not improve the performance to a great extent. The difference in performance between the three spatial resolutions in the proposed algorithm is not considerable when compared to the difference in performance between the three spatial resolutions in LBP^{riu2}_{P,R}. This strongly suggests that the texture features generated by IFLT are more stable across spatial resolutions than those of LBP^{riu2}_{P,R}.

LBP^{riu2}_{P,R} at (P, R) = (8, 1) has difficulty in discriminating strongly oriented textures, with misclassifications of Rattan, Straw and Wood [15] being largely responsible for the decreased performance. The number of misclassifications for IFLT was considerably less than for LBP^{riu2}_{P,R} at (P, R) = (8, 1), where the test samples were misclassified as Rattan or Sand. In this case, the true model was ranked second for all the misclassified test samples. However, at higher spatial resolutions the test samples were misclassified as Matting or Rattan, while the true model was ranked second. Matching at coarser wavelet scales was not possible because the training samples consisted of 16 × 16 images.

3.1.2 Experiment 2: We performed a second set of tests where the image data consisted of the 16 source texture classes from the Brodatz album, shown in Fig. 2, with the training set for each texture class consisting of four 180 × 180 images at angles 0°, 30°, 45° and 60°. The only difference between this experimental setup and the previous one is that the training samples were not divided into 16 × 16 sub-images. We were able to derive coarser wavelet-scale images because the training samples were 180 × 180 images. However, the classification procedure remained the same.

A model histogram for each of the 16 texture classes was calculated, yielding 16 reliable model histograms containing 4(180 − 2R)² entries (the operators have an R-pixel border). The performance of the texture feature was evaluated with 672 test images. Typical histograms of the test samples contained (180 − 2R)² entries.

The test results are provided in Table 2. As seen from Table 2, the proposed algorithm can achieve 100% performance consistently for (P, R) = (16, 2) and (24, 3), while for (P, R) = (8, 1) it is around 98%.

The above tests provide an interesting set of results. As seen from Table 1, the classification accuracy of IFLT at (P, R) = (8, 1) is around 98% when the training samples were 16 × 16 images. However, as seen from Table 2, the classification accuracy at (P, R) = (8, 1) is around 99% when the training samples were 180 × 180 images. The difference between the classification accuracies of these two tests at (P, R) = (8, 1) is negligible. This strongly suggests that IFLT generates distinctive local texture features irrespective of the size of the training images.

Researchers in [15] state that spatial dependencies between adjacent neighbourhoods are inherently incorporated in the histogram because only a small subset of patterns can reside next to a given pattern. They restrict the size of this subset to P + 2. It is clearly evident from Table 3 that by using IFLT descriptors the subset of texture patterns can be at least 25% smaller than P + 2 while still achieving the best performance accuracy. Hence, IFLT descriptors are more concise and accurate in representing textures.

Table 2 Performance (%) of the IFLT algorithm on Brodatz textures with training samples of size 180 × 180

P, R                 | Bins         | IFLT (3+ wavelet scales)
8, 1                 | 10           | 98.8
16, 2                | 18           | 99.4
24, 3                | 26           | 100
8, 1 + 16, 2         | 10 + 18      | 98.95
8, 1 + 24, 3         | 10 + 26      | 99.7
16, 2 + 24, 3        | 18 + 26      | 99.7
8, 1 + 16, 2 + 24, 3 | 10 + 18 + 26 | 99.7

Table 3 Performance (%) of the IFLT methodology on Brodatz textures with fewer than P + 2 bins: (a) training samples of size 16 × 16, (b) training samples of size 180 × 180

(a)
P, R  | Bins | IFLT (2 wavelet scales)
8, 1  | 5    | 98
16, 2 | 5    | 99.4
16, 2 | 10   | 99.1
24, 3 | 18   | 99.6

(b)
P, R  | Bins | IFLT (3+ wavelet scales)
8, 1  | 5    | 98
16, 2 | 5    | 100
16, 2 | 10   | 99.7
24, 3 | 16   | 100
24, 3 | 18   | 100
24, 3 | 24   | 100
24, 3 | 26   | 100


3.2 Experiment on Outex database

This set of experiments was conducted using the Outex image database [18], which provides a large collection of textures and ready-made test suites for different types of texture analysis problems, together with baseline results for well-known published algorithms. Both artificial and natural surface textures are included in the collection. The collection of surface textures exhibits well-defined variations in terms of illumination, rotation and spatial resolution [23]. The diversity of the surface textures provides a rich foundation for building the problems. The Outex framework [23] provides a steadily increasing number of test suites, each of which encapsulates a problem by precisely specifying input and output data. Specifications are provided in the form of generic text and image files, hence the user of the framework is not constrained to any given programming environment. The framework, the image database and the test suites are publicly available on-line [18].

Each sample is 128 × 128 pixels in size. Examples of each of the 24 texture classes are shown in Fig. 3. The underlying texture pattern is roughly uniform over the whole source image, while local grey-scale variations caused by varying colour properties are present, as seen in canvas023 and canvas033 in Fig. 4.

Most of the texture samples are canvases with a strong directional structure [15]. Some of them have a large tactile dimension (i.e. they appear 'rough', e.g. canvas025, canvas033 and canvas038), which can induce considerable local grey-scale distortions. Taking into account variations caused by the different spectra of the illuminants, this collection of textures presents a realistic and challenging set of problems for illumination and rotation invariant texture analysis.

3.2.1 Rotation invariant classification: Rotation invariant classification experiments were conducted on the Outex test suite Outex_TC_00010 [18]. Each surface texture was captured at nine rotation angles (0°, 5°, 10°, 15°, 30°, 45°, 60°, 75° and 90°).

The classifier was trained with the reference textures of 480 (24 classes × 20 samples) models captured under the illuminant inca at angle 0° in each texture class, while the test database consisted of 3840 (24 classes × 20 samples × 8 angles) samples captured under the same illuminant. Two classification configurations were used to evaluate the texture descriptors on this test suite: simple Euclidean distance with k-nearest neighbour (k-NN) classification, with k = 1 and k = 3.
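As an illustration, a minimal sketch of this Euclidean-distance k-NN evaluation (our code, not the authors'; 1-NN and 3-NN correspond to k = 1 and k = 3):

```python
import numpy as np

def knn_classify(test_hist, train_hists, train_labels, k=3):
    """Majority vote among the k training histograms nearest to
    test_hist in Euclidean distance."""
    dists = np.linalg.norm(np.asarray(train_hists) - test_hist, axis=1)
    nearest = np.asarray(train_labels)[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```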

As seen from Table 4, it is clearly evident that IFLT texture descriptors are more stable and powerful than LBP^{riu2}_{P,R} descriptors. There is a drastic increase of around 5–6% in correct classification on average, except for (P, R) = (8, 1), where there is a slight decrease in correct classification when compared with LBP^{riu2}_{P,R}.


Thirty-three samples of canvas025 were misclassified as canvas031 and 30 samples of canvas025 were misclassified as canvas033; 58 samples of canvas033 were misclassified as canvas031; and 57 samples of canvas035 were misclassified as canvas003 and canvas033, respectively. These errors contributed to the heavy reduction in the classification performance of IFLT at (P, R) = (8, 1).

Comparing the performance of IFLT under simple 1-NN classification and 3-NN classification, it appears that the descriptors are slightly noisy; hence 3-NN classification produces better results.

Table 5 shows the number of misclassifications for LBP^{riu2}_{P,R} and IFLT. It is fairly evident that IFLT produces far fewer misclassifications.

Figure 3 128 × 128 samples of each of the 24 texture classes at particular angles


For example, canvas031 has 21 misclassifications using LBP^{riu2}_{P,R} compared to 2 using IFLT; canvas032 has 21 using LBP^{riu2}_{P,R} compared to 9 using IFLT; canvas038 has 30 using LBP^{riu2}_{P,R} compared to 4 using IFLT; canvas039 has 20 compared to none with IFLT; and canvas033 has 48 using LBP^{riu2}_{P,R} compared to 22 using IFLT. These textures have a high tactile dimensionality (i.e. they appear 'rough'). IFLT descriptors thus prove more efficient and precise when used for classification of textures with high tactile dimensionality.

3.2.2 Rotation and illumination invariant classification: This experiment was conducted on the Outex test suite Outex_TC_00012 [18]. This test suite has rotation and illumination transformations. Each sample was captured under three different types of illumination (inca, horizon and tl84) and at nine different rotation angles (0°, 5°, 10°, 15°, 30°, 45°, 60°, 75° and 90°).

The classifier is trained with the reference textures (20 samples) of illuminant inca at angle 0° in each texture class and tested with all samples captured using illuminants tl84 and horizon. Hence, in both problems there are 480 (24 classes × 20 samples) models and 4320 (24 classes × 20 samples × 9 angles) validation samples in total.

Figure 4 Intra grey-scale variations caused by varying colour content of source textures

Table 4 Performance (%) of IFLT compared with LBP^{riu2}_{P,R} on Outex_TC_00010 using 1-NN and 3-NN classification

                     | 1-NN classification       | 3-NN classification
P, R                 | LBP^{riu2}_{P,R} | IFLT   | LBP^{riu2}_{P,R} | IFLT
8, 1                 | 82               | 86.8   | 85.1             | 81.7
16, 2                | 83.2             | 89     | 88.5             | 95.9
24, 3                | 90.1             | 90     | 94.6             | 97.8
8, 1 + 16, 2         | 90.5             | 89.5   | 93.1             | 95.3
8, 1 + 24, 3         | 90.5             | 90     | 96.3             | 98.07
16, 2 + 24, 3        | 92.4             | 89.1   | 95.4             | 97.1
8, 1 + 16, 2 + 24, 3 | 92.6             | 89.2   | 96.1             | 97.8


Table 6 illustrates the performance for rotation and illumination invariant texture classification using LBP^{riu2}_{P,R} and IFLT. The classifier was trained with the reference textures captured under the illuminant inca at angle 0°, and the test samples were captured under the illuminants tl84 and horizon and include samples from all nine rotation angles; that is, 180 samples of each texture were used for testing the classifier.

The difference in classification performance between LBP^{riu2}_{P,R} and IFLT is clearly noticeable. Performance deteriorates with LBP^{riu2}_{P,R} descriptors, whereas IFLT descriptors maintain high classification accuracy when the classifier is evaluated with samples captured under different illumination from the reference textures used in training. IFLT descriptors achieve a correct classification rate of around 93%, whereas LBP^{riu2}_{P,R} could only produce 84% correct classification, when the classifier was trained with samples from illuminant inca and tested with samples from illuminant tl84.

Table 5 Numbers of misclassified samples for each texture for LBP^{riu2}_{P,R} and IFLT with (P, R) = (24, 3) on Outex_TC_00010

Texture                 | LBP^{riu2}_{P,R} | IFLT
canvas002               | 1                | 0
canvas005               | 15               | 0
canvas011               | 6                | 0
canvas023               | 2                | 6
canvas025               | 3                | 4
canvas031               | 21               | 2
canvas032               | 21               | 9
canvas033               | 48               | 22
canvas035               | 15               | 18
canvas038               | 30               | 4
canvas039               | 20               | 0
tile005                 | 3                | 2
tile006                 | 14               | 9
carpet002               | 1                | 0
carpet004               | 2                | 0
carpet005               | 4                | 0
carpet009               | 3                | 0
total                   | 209              | 66
classification accuracy | 85               | 97.8

Textures with no misclassifications are omitted from this table


On the other hand, when the classifier was trained with samples from illuminant inca and tested with illuminant horizon, IFLT produced around 90% correct classification whereas LBP^{riu2}_{P,R} produced around 81%. The difference in performance when tested with two different sets of samples captured under two different illuminations is quite high with LBP^{riu2}_{P,R} descriptors, whereas it is low with IFLT descriptors. This further cements the fact that IFLT descriptors are more stable and reliable under varying illumination conditions.

The basic IFLT descriptor at (P, R) = (8, 1) produces a lower correct classification rate than the descriptors at higher resolutions, (P, R) = (16, 2) and (P, R) = (24, 3), because IFLT at (P, R) = (8, 1) is better equipped for rotation invariant classification and not so much for illumination invariant classification.

In terms of misclassifications, LBP^{riu2}_{P,R} had a total of 692 misclassifications, while IFLT had only 315. Of the 315 misclassifications, around 100 were of canvas038. Fig. 5 shows three different samples of canvas038, which illustrate the prominent tactile dimension of canvas038 and its effect on local texture under different illumination conditions.

These experimental results have shown that IFLT descriptors are better equipped and more stable for texture classification irrespective of the spectral properties of the illuminants affecting the colours of textures. It should also be noted that IFLT descriptors are robust enough to handle significant variations in the imaging geometry of the illuminants, which affect the appearance of local distortions caused by the tactile dimension of textures.

Table 6 Performance (%) for rotation and illumination invariant texture classification using IFLT and LBP^{riu2}_{P,R} on Outex_TC_00012

                     | LBP^{riu2}_{P,R}   | IFLT
P, R                 | 'tl84' | 'horizon' | 'tl84' | 'horizon'
8, 1                 | 67.5   | 62.7      | 77.5   | 76.3
16, 2                | 81.2   | 74.1      | 90.7   | 91
24, 3                | 84.0   | 80.5      | 92.7   | 90.1
8, 1 + 16, 2         | 83.8   | 78.3      | 89.7   | 90.1
8, 1 + 24, 3         | 90.2   | 84.1      | 93.07  | 90.5
16, 2 + 24, 3        | 86.4   | 82.5      | 94.6   | 91.8
8, 1 + 16, 2 + 24, 3 | 88.8   | 83.4      | 94.1   | 91.7


3.3 KTH-TIPS2 materials database

The KTH-TIPS2 database builds upon the well-known KTH-TIPS database by providing multiple images of different samples of different materials. This provides a much tougher platform on which to achieve material categorisation. Both databases are available on-line [19].

Figure 5 Three samples of canvas038
a inca, 0°
b horizon, 45°
c tl84, 90°

The database contains four physical, planar samples of each of 11 materials [24]. The database provides images with variations in 'scale' as well as variations in 'pose' and 'illumination', similar to KTH-TIPS and, in part, the Columbia-Utrecht Reflectance and Texture Database (CUReT) [25]. One image from each sample is shown in Fig. 6. Many of these materials have 3D structure, implying that their appearance can change considerably as pose and lighting are changed. The variation in appearance between the samples in each category is larger for some categories than others. Cork, for instance, contains relatively little intra-class variation, whereas cracker and wool exhibit significant variation. The appearance of wool depends not only on the material, but also on how it has been treated, in this case how the thread was spun and subsequently knitted. Brown bread and white bread are subclasses of bread, and it might also make sense to group linen and cotton together in a woven fabric class. Hence, it is not very obvious how the samples should be split into categories [24]. This database provides a good platform for future studies of unsupervised or supervised grouping of classes into higher-level categories, whether visual or semantic, in a hierarchical structure.

The acquisition procedure for KTH-TIPS2 is described in more detail in [19]. The database contains images at nine scales equally spaced logarithmically over two octaves. KTH-TIPS2 contains images at three poses (frontal, rotated 22.5° left and 22.5° right) captured under four illumination conditions. The three illuminations used were frontal, 45° from the top and 45° from the side (all taken using a desk lamp with a tungsten light bulb), whereas for the fourth illumination condition researchers in [19] used the fluorescent lights in the laboratory. Although some variation in pose and illumination is present, KTH-TIPS2 contains significantly fewer settings for lighting and viewing angle than does CUReT.

3.3.1 Categorisation: The aim of the experiments in this section is to evaluate the performance of IFLT descriptors, which have previously been applied to identifying particular samples. Colour images were converted to grey-scale for our experiments. Recognition and categorisation may be performed within exactly the same pattern recognition framework, but categorisation is likely to provide a tougher platform for analysis.

More specifically, the goals of this section are to study the performance of IFLT descriptors with classifiers such as support vector machines (SVMs) and nearest neighbour classifiers (NNCs), and to investigate the improvement in performance as more samples are introduced into the training set, thus providing a richer model of each material.

We compare the performance of the IFLT descriptor with the rotationally invariant MR8 descriptor [14], the joint descriptor of [26] using an 11 × 11 patch, and the rotationally invariant uniform LBP descriptor at three spatial resolutions, LBP^{riu2}_{(8,1)+(16,3)+(24,5)}, as in [27]. LBP^{riu2}_{(8,1)+(16,3)+(24,5)} denotes a rotationally invariant uniform descriptor with neighbourhoods of 8, 16 and 24 pixels at radii 1, 3 and 5.

For the NNC, the χ²-distance between histograms was used, and for the SVM we followed the methodology of [24], using the Gaussian radial basis function (Gaussian-RBF) with the χ²-kernel (7), since that gave the best results in the identification experiments [24]. The main aim of this experiment was to compare the performance of the IFLT descriptor with the state of the art on the material categorisation problem. Hence, we did not conduct further experiments with different classifiers or different SVM parameters.

Figure 6 Image samples from the KTH-TIPS2 database
Each row shows one example image from each of the four samples of a category. In addition, each sample was imaged under varying pose, illumination and scale conditions


γ was set to 0.01 throughout

K(x, y) = exp{−γχ²(x, y)}, χ²(x, y) = Σi |xi − yi|² / |xi + yi|² (7)

Similarly to the experiments in [24], we first perform experiments where only a single sample is available during training. All the images of that sample are placed in the training set, and testing is subsequently performed on all images of all the remaining samples. This experiment is repeated four times using different training samples. We also perform similar experiments with two and three samples in the training set. Testing is always conducted only on unseen samples, and so we are truly studying categorisation as opposed to exemplar identification.

Figs. 7a and b illustrate the categorisation results for all four descriptors with SVM and NNC classification. Results are averaged over four runs. These results support the fact that including more samples increases performance. At the outset it looks as if there is no significant trend showing the superiority of any descriptor; the plots are all closely grouped. However, comparing the classification output of NNC and SVM, there is an approximately 10–15% increase in performance when SVM classification is used.

Figure 7 Categorisation on the KTH-TIPS2 database, comparing MR8, Joint and LBP with IFLT descriptors
a SVM-based categorisation
b Nearest neighbour classification (NNC)-based categorisation

However, with the MR8, Joint and LBP descriptors, the easiest material to recognise was cork (98%), followed by wool (above 95%); cracker was the hardest to recognise (20–40% depending on the number of training samples), followed by the less predictable cotton (35–55%). With the IFLT descriptor, the easiest materials to recognise were cork and aluminium foil (above 95%), followed by wool (above 93%), and the hardest to recognise was corduroy (22%), followed by cotton (28%). In contrast to the other descriptors, IFLT recorded upwards of 50% correct classification for cracker.

IFLT descriptors found cotton one of the hardest textures to recognise. Approximately 30% of the samples of cotton were misclassified as linen. Fig. 8d shows one sample each from cotton and linen. It is clear that even a trained human eye would find it hard to distinguish between samples of cotton and linen. Similarly, 30% of the linen samples were misclassified as cotton.

When the texture models were visualised in a three-dimensional space, cotton (denoted by red) was close to linen (denoted by blue) in IFLT feature space (Fig. 8a). In LBP feature space (Fig. 8b), linen was considerably further away from cotton, with three other textures nearer in between. In MR8 feature space (Fig. 8c), wool (denoted by green) is the closest to cotton, followed by linen.

The misclassifications between visibly alike materials such as cotton and linen, together with the visualisation in 3D space, suggest that IFLT descriptors of such materials are close together in feature space. IFLT generates descriptors that place 'visibly alike' materials 'close together', in complete contrast to the LBP, MR8 or joint descriptors. In this sense, IFLT is an intuitive texture descriptor.

Thus, there is reason to believe that by considering cotton and linen as belonging to one class, the overall classification performance of IFLT would exceed the current state of the art.

3.4 Computational cost

Researchers in [24] note that LBP descriptors are more compact, and also the fastest to compute, compared to the existing state of the art. In their bare form, both IFLT and LBP^{riu2}_{P,R} consider a neighbourhood of N pixels. In this case, LBP^{riu2}_{P,R} requires O(N²) computations to generate a rotation invariant descriptor, whereas IFLT takes O(N) computations to generate a rotation and scale invariant descriptor per centre pixel.


Figure 8 Representation of different texture classes in feature space
a IFLT feature space
b LBP feature space
c MR8 feature space
d Sample images from cotton and linen, respectively
Note that the feature space diagrams are best viewed in colour. Cotton, linen and wool are denoted by red, blue and green, respectively

Hence, the process of generating rotation, scale and (essentially) illumination invariant texture descriptors using IFLT is computationally less intense than for LBP^{riu2}_{P,R}.

4 Conclusions

IFLT is a local texture descriptor which possesses scale, rotation and (partial) illumination invariance characteristics. Performance results of the IFLT algorithm show that the descriptors are more distinctive with respect to oriented textures. IFLT descriptors have also been able to discriminate between strongly oriented textures very efficiently and are more stable across different spatial resolutions. The experiments carried out in Section 3 unambiguously demonstrate that IFLT outperforms current state-of-the-art texture classifiers on the majority of well-known data sets. In particular, the recognition performance increase on the most challenging data set (Outex_TC_00012) is substantial.

It is useful to observe that the texture misclassifications noted in Section 3.3 show that IFLT is an intuitive texture descriptor, which generates descriptors that place 'visibly alike' materials 'close together', in complete contrast to other methods.

IFLT is formally an O(N) algorithm, and this should not be discounted even for the small values of N (typically 9) used in applications. Repeated use of the algorithmic core at each pixel of an image rapidly magnifies the savings into the observable range.

Hence, IFLT is a robust and computationally efficient descriptor which, as a fundamental method, has a wide range of potential applications in the fields of computer vision and image/video processing.

5 Acknowledgments

NICTA is funded by the Australian Federal Government as represented by the Department of Broadband, Communications and the Digital Economy, the NSW Department of State and Regional Development, the ACT Government and the Australian Research Council through the ICT Centre of Excellence Program.

6 References

[1] HARALICK R.M.: 'Statistical and structural approaches to texture', Proc. IEEE, 1979, 67, (5), pp. 786–804

[2] REED T., BUF J.H.D.: 'A review of recent texture segmentation and feature extraction techniques', CVGIP, Image Underst., 1993, 57, (3), pp. 359–372

[3] DAVIS L., JOHNS S., AGGARWAL J.: 'Texture analysis using generalized cooccurrence matrices', IEEE Trans. Pattern Anal. Mach. Intell., 1979, 1, pp. 251–259

[4] DAVIS L.: 'Polarograms: a new tool for image texture analysis', Pattern Recognit., 1981, 13, (3), pp. 219–223

[5] CHETVERIKOV D.: 'Experiments in the rotation-invariant texture discrimination using anisotropy features'. Proc. Sixth Int. Conf. Pattern Recognition (ICPR), 1982, pp. 1071–1073

[6] LIU G.C.P., RYU J.C.K.H.: 'New shape-based texture descriptors for rotation invariant texture classification'. Proc. Int. Conf. on Image Processing (ICIP), 2003, vol. 3, pp. 533–536

[7] ZHANG J., TAN T.: 'Brief review of invariant texture analysis methods', Pattern Recognit., 2002, 35, (3), pp. 735–747

[8] GREENSPAN H., BELONGIE S., GOODMAN R., PERONA P.: 'Rotation invariant texture recognition using a steerable pyramid'. Proc. 12th Int. Conf. Pattern Recognition (ICPR), 1994, vol. 2, pp. 162–167

[9] HALEY G., MANJUNATH B.: 'Rotation-invariant texture classification using a complete space-frequency model', IEEE Trans. Image Process., 1999, 8, (5), pp. 255–269

[10] LAM W.K., LI C.K.: 'Rotated texture classification by improved iterative morphological decomposition', IEE Proc. Vision Image Signal Process., 1997, 144, pp. 171–179

[11] PORTER R., CANAGARAJAH N.: 'Robust rotation-invariant texture classification: wavelet, Gabor filter and GMRF based schemes', IEE Proc. Vision Image Signal Process., 1997, 144, pp. 180–188

[12] AROF H., DERAVI F.: 'Circular neighbourhood and 1-D DFT features for texture classification and segmentation', IEE Proc. Vision Image Signal Process., 1998, 145, pp. 167–172

[13] POPOVIC M.: 'Texture analysis using 2D wavelet transform: theory'. Proc. Int. Conf. on Telecommunication in Modern Satellite, Cable and Broadcasting Services, 1999, vol. 1, pp. 149–158

[14] VARMA M., ZISSERMAN A.: 'Classifying images of materials: achieving viewpoint and illumination independence'. Proc. European Conf. on Computer Vision (ECCV), 2002, pp. 255–271

[15] OJALA T., PIETIKAINEN M., MAENPAA T.: 'Multiresolution gray-scale and rotation invariant texture classification with local binary patterns', IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 2002, 24, (7), pp. 971–987

[16] JANNEY P., YU Z.: 'Invariant features of local textures – a rotation invariant local texture descriptor'. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2007, pp. 1–7

[17] BRODATZ P.: 'Textures: a photographic album for artists and designers' (Dover Publications, 1966)

[18] University of Oulu Texture database, http://www.outex.oulu.fi/temp/

[19] MALLIKARJUNA P., TARGHI A.T., HAYMAN E., ET AL.: 'The KTH-TIPS and KTH-TIPS2 databases', http://www.nada.kth.se/cvap/databases/kth-tips

[20] YANG Z., XIAO J.: 'Scaling laws in image gradient and texture retrieval'. Proc. Int. Conf. on Pattern Recognition (ICPR), August 1998, vol. 2, pp. 1061–1064

[21] MALLAT S.: 'A wavelet tour of signal processing' (Wavelet Analysis and Its Applications, Academic Press, 2nd edn., September 1999)

[22] YOUNG R.A.: 'The Gaussian derivative model for spatial vision: retinal mechanisms', Spatial Vis., 1987, 2, (4), pp. 273–293

[23] OJALA T., MAENPAA T., PIETIKAINEN M., ET AL.: 'Outex – new framework for empirical evaluation of texture analysis algorithms'. Proc. 16th Int. Conf. on Pattern Recognition (ICPR), 2002, vol. 1, pp. 701–706


[24] CAPUTO B., HAYMAN E., MALLIKARJUNA P.: 'Class-specific material categorisation'. Proc. 10th IEEE Int. Conf. on Computer Vision (ICCV), October 2005, vol. 2, pp. 1597–1604

[25] DANA K., VAN GINNEKEN B., NAYAR S., KOENDERINK J.: 'Reflectance and texture of real-world surfaces', ACM Trans. Graph. (TOG), 1999, 18, (1), pp. 1–34


[26] VARMA M., ZISSERMAN A.: 'Texture classification: are filter banks necessary?'. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2003, vol. 2, pp. 691–699

[27] PIETIKAINEN M., NURMELA T., MAENPAA T., TURTINEN M.: 'View-based recognition of real-world textures', Pattern Recognit., 2004, 37, (2), pp. 313–323
