INDEPENDENT COMPONENT ANALYSIS FOR CLASSIFYING MULTISPECTRAL IMAGES WITH DIMENSIONALITY LIMITATION


International Journal of Information Acquisition, Vol. 1, No. 3 (2004) 201–216
© World Scientific Publishing Company

INDEPENDENT COMPONENT ANALYSIS FOR CLASSIFYING MULTISPECTRAL IMAGES WITH DIMENSIONALITY LIMITATION

QIAN DU
Department of Electrical and Computer Engineering
Mississippi State University, MS 39762, USA

IVICA KOPRIVA and HAROLD SZU
Department of Electrical and Computer Engineering
The George Washington University, Washington, DC 20052, USA

Received 21 July 2004
Accepted 22 August 2004

Airborne and spaceborne remote sensors can acquire invaluable information about the earth's surface, which has many important applications. The acquired information is usually represented as two-dimensional grids, i.e. images. One technique for processing such images is Independent Component Analysis (ICA), which is particularly useful for classifying objects with unknown spectral signatures in an unknown image scene, i.e. unsupervised classification. Since the weight matrix in ICA is a square matrix for the purpose of mathematical tractability, the number of objects that can be classified is equal to the data dimensionality, i.e. the number of spectral bands. When the number of sensors (or spectral channels) is very small (e.g. a 3-band CIR photograph or a 6-band Landsat image with the thermal band removed), it is impossible to classify all the different objects present in an image scene using the original data. In order to solve this problem, we present a data dimensionality expansion technique to generate artificial bands. Its basic idea is to use nonlinear functions to capture and highlight the similarity/dissimilarity between original spectral measurements, which can provide more data with additional information for detecting and classifying more objects. The results from such a nonlinear band generation approach are compared with a linear band generation method using cubic spline interpolation of pixel spectral signatures. The experiments demonstrate that the nonlinear band generation approach can significantly improve unsupervised classification accuracy, while the linear band generation method cannot, since no new information is provided. It is also demonstrated that ICA is more powerful than other frequently used unsupervised classification algorithms such as ISODATA.

Keywords: Independent component analysis; unsupervised classification; data dimensionality; remotely sensed imagery; multispectral imagery; aerial photograph.


1. Introduction

Airborne and spaceborne remote sensors can be used to measure object properties on the earth's surface and to monitor human activities on the earth. When information is acquired by collecting the solar energy reflected from the earth's surface and displayed in two-dimensional grids, the resultant products are images in optical remote sensing. In some cases, there are hundreds of sensors operating at contiguous spectral ranges to capture co-registered images, which is called hyperspectral imaging. In some other cases, the number of sensors is small and only several spectral bands are available, which is called multispectral imaging. Hyperspectral images have abundant spectral information for object detection, classification, and identification, but image analysis is quite complicated due to the data complexity. The analysis of multispectral images is relatively simple, but we may meet the data dimensionality limitation on some occasions. When the hardware condition (e.g. the number of sensors) cannot be improved, we may resort to signal processing techniques to explore additional information from the multispectral data that is originally acquired.

As remote sensing and its applications have received much interest recently, many algorithms for remotely sensed image analysis have been proposed. While they have achieved a certain level of success, most of them are supervised methods, i.e. the information of the objects to be detected and classified is assumed to be known a priori. If such information is unknown, the task is much more challenging. Since the area covered by a single pixel is very large, the reflectance of a pixel can be considered as the mixture of all the materials resident in the area covered by the pixel. Therefore, we have to deal with mixed pixels instead of pure pixels as in conventional digital image processing. Linear spectral unmixing analysis is a popularly used approach in remote sensing image processing to uncover material distribution in an image scene [Singer & McCord, 1979; Adams & Smith, 1986; Adams, Smith & Gillespie, 1993; Settle & Drake, 1993]. Let N be the number of bands (i.e. spectral channels) and r = (r1, r2, . . . , rN)^T a column pixel vector with dimensionality N in an image. An element ri in r is the reflectance collected at the ith band. Assume that there are a set of distinct materials (endmembers) present in the image scene and that all the pixel reflectances are linear mixtures of these endmembers. Let M denote a signature matrix. Each column vector in M represents an endmember signature, i.e. M = [m1, m2, . . . , mp], where p is the total number of endmembers resident in the image scene. Let α = (α1, α2, . . . , αp)^T be the unknown abundance column vector associated with M, which is to be estimated. The ith element αi in α represents the abundance fraction of mi in pixel r. According to the linear mixture model,

r = Mα + v, (1)

where v is the noise term. If the endmember signature matrix M is known, then α can be approximated by using its least squares estimate α̂ to minimize the following estimation error:

e = ‖r − Mα̂‖².   (2)

If the α̂i of all the pixels are plotted, a new image is generated to describe the distribution of the ith endmember mi, where a bright pixel represents high abundance of the ith material in the image scene and a dark pixel represents low abundance. As a result, soft classification can be achieved.
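To make the least squares estimation step concrete, the following NumPy sketch solves Eq. (2) for every pixel and forms the abundance map of one endmember. This is our own illustration, not the authors' code; the variable names and the synthetic test data are assumptions, and the non-negativity and sum-to-one constraints often imposed on abundances are deliberately omitted.

import numpy as np

# Unconstrained least-squares unmixing per Eqs. (1)-(2): alpha_hat minimizes ||r - M alpha||^2.
N, p = 6, 4                              # number of bands, number of endmembers
rows, cols = 100, 100
M = np.random.rand(N, p)                 # endmember signature matrix (assumed known here)
alpha_true = np.random.dirichlet(np.ones(p), size=rows * cols).T     # p x (rows*cols) abundances
R = M @ alpha_true + 0.01 * np.random.randn(N, rows * cols)          # noisy mixed pixels, Eq. (1)

# Least-squares abundance estimates for all pixels at once.
alpha_hat, *_ = np.linalg.lstsq(M, R, rcond=None)                    # p x (rows*cols)

# Abundance map of the first endmember: bright pixels = high abundance, dark = low.
abundance_map_0 = alpha_hat[0].reshape(rows, cols)
print(abundance_map_0.min(), abundance_map_0.max())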

In practice, it may be difficult to have prior information about the image scene and endmember signatures. Moreover, in-field spectral signatures may be different from those in spectral libraries due to atmospheric and environmental effects. So an unsupervised classification approach is preferred. Independent component analysis (ICA) has been successfully applied to blind source separation [Jutten & Herault, 1991; Cardoso & Souloumiac, 1993; Comon, 1994; Cardoso & Laheld, 1996; Cichocki & Unbehauen, 1996; Szu, 1999, 2003, 2004]. The basic idea of ICA is to decompose a set of multivariate signals into a basis of statistically independent sources with minimal loss of information content so as to achieve classification. The standard linear ICA-based data model with additive noise is

u = As + v, (3)

where u ∈ R^(N×M) is an N-dimensional data vector consisting of M samples (for image data, M is the number of pixels), A ∈ R^(N×N) is an unknown mixing matrix, and s ∈ R^(N×M) is an unknown source signal vector. Three assumptions are made on the unknown source signals s:

(1) each source signal is an independent and identically distributed (i.i.d.) stationary random process;

(2) the source signals are statistically independent at any time; and

(3) at most one of the source signals has a Gaussian distribution.

The mixing matrix A, although unknown, is also assumed to be non-singular. Then the solution to the blind source separation problem is obtained with scale and permutation indeterminacy, i.e. Q = WA = PΛ, where W represents the unmixing matrix, P is a generalized permutation matrix and Λ is a diagonal matrix. These requirements ensure the existence and uniqueness of the solution to the blind source separation problem (except for ordering, sign, and scaling). Compared to many conventional techniques, which use only up to the second order statistics, ICA exploits higher order statistics, which makes it a more powerful method for extracting irregular features in the data [Hyvarinen, Karhunen & Oja, 2001].

Several researchers have explored ICA for remote sensing image classification. Szu and Buss [2000], and Yoshida and Omatu [2000] investigated ICA classifiers for multispectral images. Tu [2000] and Chang et al. [2002] developed their own ICA-type approaches for hyperspectral images. Chen and Zhang [1999] and Fiori [2003] presented their ICA results for SAR image processing. In general, when we intend to apply the ICA approach to classify optical multispectral/hyperspectral images, the linear mixture model in Eq. (1) needs to be reinterpreted so as to fit the ICA model in Eq. (3). Specifically, the endmember matrix M in Eq. (1) corresponds to the unknown mixing matrix A in Eq. (3), and the abundance fractions in α in Eq. (1) correspond to the source signals in s in Eq. (3). Moreover, the p abundance fractions α1, α2, . . . , αp are considered as unknown random quantities specified by random signal sources rather than unknown deterministic quantities as assumed in the original linear mixture model (1). With these interpretations and the above-mentioned assumptions, we will use model (3) to replace model (1) hereafter. The advantages offered by using model (3) in remote sensing image classification are:

(1) no prior knowledge of the endmembers in the mixing process is required;

(2) the spectral variability of the endmembers can be accommodated by the unknown mixing matrix A, since the source signals are considered as scalar and random quantities; and

(3) higher order statistics can be exploited for better feature extraction and pattern classification.

In the ICA model (3), A is generally set to be a square matrix for mathematical tractability. Then for an N-band image, we can separate and classify N materials/objects/classes, i.e. p = N in model (1). When dealing with a multispectral image or a color composite image (3 RGB bands or CIR bands), this brings about a limitation to classification performance, because in most practical applications, such as land cover mapping, the number of land cover patterns in an image scene is usually not small. In order to make the ICA technique applicable to such cases and achieve satisfying classification, we will introduce a data dimensionality expansion approach to resolve this problem, which generates artificial spectral measurements to explore additional information in the original data. To ensure such new measurements are linearly independent of the original data, a nonlinear function is to be adopted. In our experiments, an ICA method referred to as the Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm is used, although any other ICA algorithm such as FastICA [Hyvarinen & Oja, 1997] can be used as well.

The remainder of this paper is organized as follows. Section 2 describes the JADE ICA algorithm. Section 3 introduces a dimensionality expansion technique for the application of the JADE algorithm to remote sensing images. Section 4 presents the qualitative and quantitative experimental results with classification performance comparison. Section 5 presents some concluding remarks.

2. ICA and JADE

The strategy of ICA algorithms is to find a linear transform W (i.e. the unmixing matrix of size N × N):

z = Wu = WAs + Wv = Qs + Wv, (4)

such that the components of the vector z are as statistically independent as possible. Based on the assumption that the source signals s are mutually statistically independent and non-Gaussian (except one that is allowed to be Gaussian), the vector z will represent the vector of source signals s up to permutation, scale and sign factors.

There are several different types of ICA algorithms developed recently, such as non-Gaussianity maximization based ICA [Hyvarinen & Oja, 1997], maximum likelihood estimation based ICA [Pham, 1997; Bell & Sejnowski, 1996; Lee, Girolami & Sejnowski, 1999], mutual information minimization based ICA [Comon, 1994], nonlinear decorrelation based ICA [Cichocki & Unbehauen, 1994, 1996], and nonlinear ICA [Oja, 1997]. Here, we select the popular Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm [Cardoso & Souloumiac, 1993], which is based on the use of fourth order statistics (cumulants). The higher order statistical dependence among data samples is measured by the higher order cross-cumulants; smaller values of the cross-cumulants represent less dependent samples. In JADE, fourth order statistical independence is achieved through minimization of the squares of the fourth order cross-cumulants between the components zi of z in (4). The fourth-order cross-cumulants can be computed as

C4(zi, zj, zk, zl) = E[zi zj zk zl] − E[zi zj]E[zk zl] − E[zi zk]E[zj zl] − E[zi zl]E[zj zk]   (5)

for 1 ≤ i, j, k, l ≤ N, where E[·] denotes the mathematical expectation operator. Equation (5) can be expressed as an N² × N² Hermitian matrix. The optimal unmixing matrix W* is the one that satisfies

W* = arg min_W Σ_{i,j,k,l} off(W^T C4(zi, zj, zk, zl) W),   (6)

where off(·) is a measure of the off-diagonality of a matrix, defined as

off(X) = Σ_{1≤i≠j≤N} |xij|²,   (7)

where xij is the ijth element of a matrix X. An intuitive way to solve the optimization problem expressed in Eq. (6) is to jointly diagonalize the eigenmatrices of C4(zi, zj, zk, zl) in Eq. (5) via Givens rotation sweeps. In order for the first and second order statistics not to affect the results, the data u in Eqs. (3) and (4) should be pre-whitened (mean is zero and covariance matrix is the identity matrix).
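As a concrete illustration of Eqs. (3)-(5), the NumPy sketch below pre-whitens a mixed data set and estimates one fourth-order cross-cumulant; JADE would then seek the rotation that jointly minimizes the squares of all such cross-cumulants. The function names (prewhiten, cross_cumulant4) and the synthetic uniform sources are our own assumptions, not part of any JADE implementation.

import numpy as np

def prewhiten(u):
    # Whiten an N x M data matrix: zero mean, (approximately) identity covariance.
    u = u - u.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(u))
    return (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ u

def cross_cumulant4(z, i, j, k, l):
    # Sample estimate of C4(zi, zj, zk, zl) for zero-mean data, following Eq. (5).
    zi, zj, zk, zl = z[i], z[j], z[k], z[l]
    return (np.mean(zi * zj * zk * zl)
            - np.mean(zi * zj) * np.mean(zk * zl)
            - np.mean(zi * zk) * np.mean(zj * zl)
            - np.mean(zi * zl) * np.mean(zj * zk))

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(3, 10000))      # three independent non-Gaussian sources
A = rng.normal(size=(3, 3))                  # unknown non-singular mixing matrix
z = prewhiten(A @ s)                         # whitened observations
print(cross_cumulant4(z, 0, 0, 1, 1))        # generally nonzero before JADE's joint-diagonalization rotation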

One of the advantages of using such a fourth order cumulant based ICA algorithm is its capability to suppress additive Gaussian noise, because the fourth order cumulants asymptotically vanish for Gaussian processes, i.e. they are blind w.r.t. Gaussian noise. Based on the multilinearity property of the cumulants, we can rewrite the fourth order cross-cumulants of z in Eq. (4) as [McCullagh, 1995]:

Ckl(zi, zj) = Σ_{n=1}^{N} (qin)^k (qjn)^l C4(sn) + Σ_{n=1}^{N} (win)^k (wjn)^l C4(vn),   (8)

where qin and win are the corresponding elements of the matrices Q and W in C4. Now, thanks to the fact that the fourth order cumulants are blind w.r.t. Gaussian noise, i.e. C4(v) = 0, Eq. (8) becomes

Ckl(zi, zj) ≅ Σ_{n=1}^{N} (qin)^k (qjn)^l C4(sn)   for k, l ∈ {1, 2, 3} and k + l = 4.   (9)

Therefore, in addition to recovering the unknown source signals, the JADE algorithm can successfully suppress additive Gaussian noise.

A critical factor in ICA applications is the level of singularity/non-singularity of the mixing matrix A. Although A is unknown, we have assumed it to be non-singular and invertible. Hence, in the context of our application of ICA to remote sensing image analysis, the information taken by different spectral channels must be distinct enough in order to make the measurements linearly independent and the sources blindly separable. This requirement can be met when we use a data dimensionality expansion process to generate artificial spectral measurements, which is discussed in the following section.

3. ICA for Multispectral Image Classification

For the purposes of simplicity and mathematical tractability, the mixing and unmixing matrices A and W are assumed to be square, i.e. the number of source signals to be classified p has to be equal to the data dimensionality N. For instance, if there are six classes present in an image scene, we need six spectral channels (sensors) to collect measurement data. For many commercial remote sensing images, such as color-infrared (CIR) photographs and multispectral images (e.g. 3-band SPOT, 4-band QuickBird, 4-band Ikonos, 6-band Landsat with the thermal band removed), it is difficult for the ICA technique to achieve fine classification, since the number of classes present in an image scene is generally larger than the number of spectral bands. When the hardware condition (the number of sensors) is limited, we may have to resort to signal processing approaches to increase the data dimensionality so as to provide additional information for classification improvement.

3.1. Nonlinear band generation

The simplest solution to relaxing the data dimensionality limitation is to adopt some nonlinear functions to generate a set of artificial images as additional linearly independent spectral measurements [Ren and Chang, 2000]. Although theoretically any nonlinear function can be used, some specific rules need to be set for nonlinear function selection. Based on our experimental studies, we find that a nonlinear function that can enlarge or emphasize the discrepancy between original spectral measurements will help to improve classification performance, since the techniques applied here use spectral information. The simplest but effective choice is multiplication. When two original images Bi and Bj are multiplied together (each pair of corresponding pixels at the same location is multiplied), a new image is generated, i.e. {BiBj}, i, j = 1, . . . , N, i ≠ j. Here, multiplication acts as match filtering. When the multiplicand and multiplier are equal, the product is the maximum (at the quantity level of the multiplicand and multiplier). So multiplication can emphasize the spectral similarity between two spectral measurements of the same pixel, which is equivalent to emphasizing their dissimilarity or discrepancy. Multiplication can also be applied to a single band, i.e. {BiBi}, i = 1, . . . , N. Then it emphasizes a single spectral measurement itself, which is also equivalent to enlarging the spectral difference from the other spectral measurements of this pixel. Adding these two sets of artificial bands, there are in total N + N + N(N − 1)/2 = N²/2 + 3N/2 bands available for processing. According to our experience, these two sets of spectral bands can provide useful additional information for detection and classification. Another advantage of using multiplication is the great simplicity of chip design for practical implementation [Du and Nekovei, 2005].
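The band generation itself reduces to elementwise products of co-registered bands. The short sketch below (our own, with illustrative names) builds the two sets of artificial bands {BiBi} and {BiBj}, i ≠ j, so that a 3-band image becomes a 9-band image, consistent with the N²/2 + 3N/2 count above.

import numpy as np
from itertools import combinations

def expand_bands(cube):
    # cube: (N, rows, cols) array of co-registered bands -> (N^2/2 + 3N/2, rows, cols).
    n = cube.shape[0]
    squares = [cube[i] * cube[i] for i in range(n)]                        # BiBi
    products = [cube[i] * cube[j] for i, j in combinations(range(n), 2)]   # BiBj, i != j
    return np.concatenate([cube, np.stack(squares + products)], axis=0)

cir = np.random.rand(3, 128, 128)     # stand-in for a 3-band CIR image
print(expand_bands(cir).shape)        # (9, 128, 128)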

Let us assume there are four three-dimensional pixel vectors as listed in Table 1, which represent four class means. The first pixel vector is r1 = (0.25, 0.5, 0.75)^T, and r2, r3, r4 are generated by adding random values within (−0.1, +0.1) to the elements of r1. So the spectral signatures of these four vectors are very close, and it is difficult to differentiate them from each other. Since there are only three bands, we need to generate more bands for their classification. The above-mentioned techniques are used to generate six additional bands, and the resulting vectors are nine-dimensional.

Table 1. Four three-dimensional pixel vectors with six nonlinearly generated bands.

      B1      B2      B3      B1B2    B2B3    B1B3    (B1)²   (B2)²   (B3)²
r1    0.2500  0.5000  0.7500  0.1250  0.3750  0.1875  0.0625  0.2500  0.5625
r2    0.3400  0.4462  0.7714  0.1517  0.3442  0.2623  0.1156  0.1991  0.5950
r3    0.2472  0.5783  0.8024  0.1429  0.4640  0.1984  0.0611  0.3344  0.6439
r4    0.2413  0.4037  0.8143  0.0974  0.3287  0.1965  0.0582  0.1630  0.6631

The spectral difference can be quantified using the spectral angle mapper (SAM) [Kruse et al., 1993], which calculates the angle between two vectors, i.e. d(ri, rj) = cos⁻¹(ri^T rj / (|ri|·|rj|)). If ri = rj, then d(ri, rj) = 0; if ri ⊥ rj, then d(ri, rj) = π/2. For all other cases, 0 < d(ri, rj) < π/2. A large d(ri, rj) means ri and rj are very dissimilar and easy to differentiate from each other. Tables 2 and 3 list the spectral distances between each pair of vectors before and after data dimensionality expansion.

Table 2. SAM-based spectral distances between original 3D vectors.

      r1      r2      r3      r4
r1    0       0.1117  0.0436  0.1238
r2            0       0.1529  0.1215
r3                    0       0.1596
r4                            0

Table 3. SAM-based spectral distances between 9D vectors (nonlinearly generated).

      r1      r2      r3      r4
r1    0       0.1275  0.0663  0.1507
r2            0       0.1800  0.1409
r3                    0       0.1962
r4                            0

We can see that the distances of the constructed nine-dimensional vectors in Table 3 are larger than their counterparts in Table 2, which means that with a simple nonlinear function such as multiplication the spectral differences between these four classes can be increased. In other words, their separability is increased. As a result, not only is the data dimensionality expanded to make an ICA algorithm applicable, but the classification results can also be improved.
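The SAM distances in Tables 2 and 3 can be reproduced directly from the vectors in Table 1. The following sketch (ours, not the authors' code) computes d(ri, rj) and shows how the nonlinear expansion enlarges the angle between two of the similar class means.

import numpy as np

def sam(ri, rj):
    # Spectral angle (radians) between two pixel vectors.
    return np.arccos(np.dot(ri, rj) / (np.linalg.norm(ri) * np.linalg.norm(rj)))

def expand(r):
    # Append pairwise products and squares to a 3-band pixel vector (as in Table 1).
    b1, b2, b3 = r
    return np.array([b1, b2, b3, b1*b2, b2*b3, b1*b3, b1*b1, b2*b2, b3*b3])

r1 = np.array([0.25, 0.50, 0.75])
r2 = np.array([0.3400, 0.4462, 0.7714])          # second class mean from Table 1
print(round(sam(r1, r2), 4))                      # ~0.1117, as in Table 2
print(round(sam(expand(r1), expand(r2)), 4))      # ~0.1275, as in Table 3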

3.2. Band interpolation

An intuitive method to generate artificial bands is to interpolate the spectral signature of each pixel. Let r = (r1, r2, r3)^T denote a pixel vector in a 3-band image, where r1, r2, and r3 are the reflectances in bands B1, B2, and B3, respectively. In order to expand the data dimensionality, we may interpolate an element r4 using r1 and r2, and r5 using r2 and r3. Then the r4 values of all the pixels construct a new band denoted as I(B1B2), and the r5 values a band I(B2B3). These five bands can be used for classification. If more bands are needed, more elements can be interpolated between [r1, r2] and [r2, r3]. A classical cubic spline interpolation can be used for this purpose [Unser, 1999]. However, because of the linear nature of such interpolation, the generated bands do not contain more information, so the classification results may not be improved, as will be demonstrated in the experiments.
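A sketch of this linear alternative is given below, using SciPy's cubic spline. We assume, purely for illustration, that the three original bands sit at nominal positions 1, 2, 3 and that three new samples are taken strictly inside [B1, B2] and three inside [B2, B3], as in Table 4; the function name and band positions are ours.

import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_signature(r, n_new=3):
    # r: reflectances (r1, r2, r3); returns 2*n_new spline-interpolated values.
    spline = CubicSpline([1.0, 2.0, 3.0], r)
    left = np.linspace(1.0, 2.0, n_new + 2)[1:-1]     # points strictly between B1 and B2
    right = np.linspace(2.0, 3.0, n_new + 2)[1:-1]    # points strictly between B2 and B3
    return spline(np.concatenate([left, right]))

r1 = np.array([0.25, 0.50, 0.75])
print(np.round(interpolate_signature(r1), 4))
# For this perfectly linear signature the interpolants are
# [0.3125 0.375 0.4375 0.5625 0.625 0.6875], matching the r*1 row of Table 4.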

Table 4 lists the same four 3D vectors as in Table 1. For each pixel, three measurements are generated by cubic spline interpolation between its values in B1 and B2, and another three measurements are interpolated between the values in B2 and B3. The spectral distances of these 9D vectors are given in Table 5. Compared to Table 2, they are not larger than the original distances between the four 3D vectors. This means that using measurements generated by a linear method cannot improve the classification, because no more independent spectral information is carried, no matter how sophisticated such a linear method is. Interestingly, the spectral distances in Table 5 are even smaller than the original ones in Table 2. This is due to the fact that more training samples are required to separate two similar classes when the data dimensionality is increased [Hughes, 1968; Landgrebe, 2002], which makes it more difficult to achieve comparable class separation.

Table 4. Four three-dimensional pixel vectors with six linearly generated bands.

      B1      B2      B3      I(B1B2)1  I(B1B2)2  I(B1B2)3  I(B2B3)1  I(B2B3)2  I(B2B3)3
r*1   0.2500  0.5000  0.7500  0.3125    0.3750    0.4375    0.5625    0.6250    0.6875
r*2   0.3400  0.4462  0.7714  0.3491    0.3731    0.4071    0.4992    0.5745    0.6670
r*3   0.2472  0.5783  0.8024  0.3405    0.4274    0.5070    0.6429    0.7024    0.7559
r*4   0.2413  0.4037  0.8143  0.2611    0.2982    0.3474    0.4755    0.5713    0.6859

Table 5. SAM-based spectral distances between 9D vectors (linearly generated).

      r*1     r*2     r*3     r*4
r*1   0       0.0889  0.0335  0.1154
r*2           0       0.1189  0.1108
r*3                   0       0.1440
r*4                           0

4. Experiments

4.1. Qualitative study

Example 1: CIR image experiment

Figure 1 shows a CIR composite image of the urban/suburban area of Houghton, Michigan, which has three bands. There are buildings, vegetation, and water bodies present in this image scene. The ICA classification results using the original 3 bands are shown in Fig. 2. The urban area with part of the vegetation (e.g. grassland) was classified in independent component 1 (IC1), the lake and river were classified in IC2, and other vegetation (e.g. broadleaf trees) was classified in IC3. Three new bands (B1B2, B2B3, B1B3) were nonlinearly generated and used together with the three original bands for ICA classification. As shown in Fig. 3, finer classification results were provided, where the urban area with high residential density was classified in IC2, buildings (with another type of roofing) were classified in IC1, vegetation (e.g. grassland) was classified in IC3 and vegetation (e.g. broadleaf trees) was classified in IC4. It is very interesting that the lake and river were classified into two different ICs: IC5 and IC6. The reason is that they have different clarity and depth. The river flowing through the urban area is more turbid with suspended matter, and its water is relatively shallow. Part of the lake was also displayed with gray shades in the classified river image in IC6, which means the water clarity and depth in different lake areas is different. For instance, the water close to the shore is more turbid and shallow. It should be noted that no more classes were produced if more artificial bands were generated.

Fig. 1. An original CIR image.

Figure 4 shows the ICA classification results using the original 3 bands plus two cubic spline interpolated bands. We can see that IC2, IC3 and IC4 presented the same class information, which corresponds to IC1 in Fig. 2 using the original 3-band image, IC1 is the same as IC2 in Fig. 2, and IC5 is the same as IC3 in Fig. 2. Using the original 3-band image, only part of the vegetation was classified, the river and lake were claimed as a single class, and all the others were grouped together.


Fig. 2. Classification using the original 3-band CIR image: IC1 (urban area and part of vegetation), IC2 (lake and river), IC3 (part of vegetation).

Fig. 3. Classification using three original bands plus three nonlinearly generated bands in the CIR image experiment: IC1 (buildings), IC2 (high residential urban area), IC3 (vegetation 1), IC4 (vegetation 2), IC5 (lake), IC6 (river).


Fig. 4. Classification using data generated by cubic spline band interpolation in the CIR image experiment (IC1–IC5).

This demonstrates that linear methods such as band interpolation cannot provide new data information, so no improvement in classification can be brought about although the data dimensionality is increased.

Example 2: SPOT image experiment

The image used in the second experiment was a 3-band SPOT multispectral image of a Washington, DC suburban area, as shown in Fig. 5. According to prior information, there are five classes present in this image scene: buildings, roads, vegetation, soil, and water. Figure 6 shows the classification using the original 3 bands, where IC1 contained all the classes, and IC2 and IC3 were for different types of buildings. Obviously, these five objects were not well classified, because the data dimensionality was too limited to separate them from each other. Then we generated three new bands by multiplying each pair of bands (B1B2, B2B3, B1B3). These three new bands together with the original three bands enabled us to classify 6 classes, as shown in Fig. 7: roads and some buildings in IC1, vegetation in IC2, soil in IC3, water in IC4, and different types of buildings in IC5 and IC6. Buildings have diverse spectral signatures, mainly depending upon the materials covering the roofs. That is why they were classified into several classes. Some buildings were made of the same material as the roads, so they were classified into the same class in IC1. Compared to Fig. 6, the classification was improved by using nonlinearly generated bands.


Fig. 5. An original 3-band SPOT image (Band 1, Band 2, Band 3).

Fig. 6. Classification using the original 3-band SPOT image: IC1 (all classes), IC2 (buildings), IC3 (buildings).

Fig. 7. Classification using three original bands plus three nonlinearly generated bands in the SPOT image experiment: IC1 (roads and buildings), IC2 (vegetation), IC3 (soil), IC4 (water), IC5 (buildings), IC6 (buildings).


Fig. 8. Classification using data generated by cubic spline interpolation in the SPOT image experiment (IC1–IC5).

If we generate more artificial bands, we do not find more meaningful classes, and buildings and roads cannot be further separated. This also means the classification accuracy is upper-bounded by the information included in the original data. If there is no spectral difference (or the difference is too subtle) between two classes, they will not be separated no matter how many nonlinear measurements are generated. So the nonlinear approach presented here can only "explore" the information embedded in the original data, not "create" brand-new information.

When using band interpolation, the classification could not be improved, as shown in Fig. 8. IC4, IC2, and IC3 in Fig. 8 are the same as IC1, IC2 and IC3 in Fig. 6 using the three original bands, respectively. IC1 and IC5 in Fig. 8 present similar information, which is very close to the negative image of IC4. This experiment further demonstrates that linearly generated bands cannot help to improve classification.

4.2. Quantitative study

In order to quantitatively assess the performance of the JADE algorithm in conjunction with the dimensionality expansion approach and compare it with other frequently used unsupervised classification algorithms in remote sensing such as ISODATA, a HYDICE image scene with precise pixel-level ground truth is used. The original image has 210 bands. After the water absorption bands are removed, 169 bands remain. In order to simulate a multispectral case, only 17 bands, uniformly selected from these 169 bands, are used in the experiment. As shown in Fig. 9(a), there are 15 panels present in the image scene, arranged in a 5 × 3 matrix. Each element in this matrix is denoted by pij, with the row indexed by i and the column indexed by j.

Fig. 9. The HYDICE image scene used in the experiment: (a) image scene; (b) panel arrangement (rows p11, p12, p13; p21, p22, p23; p31, p32, p33; p41, p42, p43; p51, p52, p53).


The three panels in the same row were made from the same material and are of size 3m × 3m, 2m × 2m, and 1m × 1m, respectively, and they are considered as a single class. The ground truth map is plotted in Fig. 9(b) with the precise spatial locations of the panel pixels, where the black pixels are referred to as panel center pixels and the white pixels are considered as panel pixels mixed with background pixels. The panels in each row are in the same class, and the signatures of these five panel classes are very similar. The grassland and trees to the left of the panels are the two background classes.

In this case, 17 bands were enough for classifying the 7 classes and the JADE algorithm was directly applied. The five independent components containing panels are shown in Fig. 10(a), where the five panel classes could be separated although some background pixels were misclassified, particularly when classifying P2 and P3. The ISODATA results are shown in Fig. 10(b), where only two classes contained panels. We can see that the five panel classes could not be well separated because their spectral signatures were very close and ISODATA would only cluster them together. Here, the ISODATA algorithm was initiated by selecting the pixel with the largest norm as the first class centroid, and the pixel with the greatest distance from this pixel as the second class centroid. The algorithm was stopped when the same number of classes as in the JADE algorithm was generated. Actually, if ISODATA continues running, the background will be split into more classes, but the panels cannot be further separated. We also randomly picked a pixel as the initial class; the final results were slightly different, with the panels still being misclassified into two classes. It should be noted that the ISODATA images are black and white instead of grayscale as with the ICA technique, because ISODATA performs 0–1 membership assignment (i.e. hard classification).

In order to further improve the performance, 16 bands were added, which were nonlinearly generated by multiplying all the pairs of adjacent bands. These 16 bands together with the original 17 bands were used for re-classification. As shown in Fig. 11, both ISODATA and JADE provided better results in terms of better background suppression, but ISODATA still could not separate the five panel classes from each other. Table 6 lists the numbers of correctly detected pure panel pixels (ND) and falsely alarmed pixels (NF) after comparison with the pixel-level ground truth. Here, the grayscale classification maps generated by JADE were converted into black and white images by setting the threshold to be the middle point of the gray levels in each classification map. We can see that JADE significantly outperforms ISODATA since its ND is larger and its NF is much smaller. When 33 bands were used, JADE could correctly classify 14 out of 19 pure panel pixels without false alarm. The five missed panel pixels are in the rightmost column and their size is 1m × 1m. The spatial resolution of this image scene is 1.5 m, so these panel pixels are embedded at the subpixel level and prone to be missed.
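For the counts in Table 6, each grayscale JADE map is turned into a black-and-white map before comparison with the ground truth. A minimal sketch of that thresholding step is given below; the function name and the random stand-in map are our own assumptions.

import numpy as np

def binarize_midpoint(ic_map):
    # Threshold an IC classification map at the midpoint of its gray levels.
    threshold = 0.5 * (ic_map.min() + ic_map.max())
    return ic_map > threshold

ic_map = np.random.rand(64, 64)           # stand-in for one JADE classification map
panel_mask = binarize_midpoint(ic_map)    # boolean map compared against ground-truth panel pixels
print(int(panel_mask.sum()))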

Fig. 10. Panel classification results using the original 17 HYDICE bands: (a) JADE results; (b) ISODATA results.


Fig. 11. Panel classification results using 33 HYDICE bands (original 17 bands plus 16 generated bands): (a) JADE results; (b) ISODATA results.

Table 6. Classification performance comparison.

        Number of     JADE (33 bands)   JADE (17 bands)   ISODATA (33 bands)   ISODATA (17 bands)
        pure pixels   ND      NF        ND      NF        ND      NF           ND      NF
P1      3             2       0         1       0         1       47           1       91
P2      4             3       0         3       47        3       40           3       90
P3      4             3       0         3       28        3       46           2       10
P4      4             3       0         3       0         3       39           2       13
P5      4             3       0         3       0         2       38           3       13
Total   19            14      0         13      75        12      210          11      217

The major reason why JADE is more powerful than ISODATA is that JADE, like other ICA algorithms, takes advantage of the higher order statistics in the data (the fourth order cumulants in JADE), while the clustering-based ISODATA uses only the second order statistics.

This experiment demonstrates that ICA is a more powerful classification approach than other commonly used unsupervised classification techniques such as ISODATA. It also shows that, even when the data dimensionality is larger than the number of classes and an ICA algorithm can be directly applied, the classification accuracy can be improved by using nonlinearly generated bands.

5. Conclusions and Discussions

Independent component analysis (ICA) can provide unsupervised classification for remotely sensed images, which is a very useful approach when no prior information is available about an image scene and its classes. But when applied to multispectral images or CIR photographs, its performance is limited by the data dimensionality, i.e. the number of classes (e.g. objects or materials) to be classified cannot be greater than the number of spectral bands. In order to relax this limitation, we present a nonlinear band generation method to produce additional spectral measurements. We find that a simple multiplication operation can explore the spectral discrepancy between the original spectral measurements and improve the classification performance. We also demonstrate that ICA in conjunction with such a nonlinear band generation method outperforms ISODATA, a frequently used unsupervised classification algorithm in remote sensing, because it uses higher order statistics in the data, which are more sensitive to the features of objects/classes.


Several comments are noteworthy.

(1) Although the JADE algorithm is used in our experiments, the data dimensionality expansion approach is applicable to any ICA algorithm that has the data dimensionality limitation problem.

(2) Here we deal with the unsupervised classification problem, so we do not know how many classes ought to be classified in a real application. But when enough spectral bands are generated, we will find that some classification maps contain only noise and some just repeat classification results similar to others. Some techniques are available for dimensionality estimation [Chang & Du, 2004], but this is outside the scope of this paper.

(3) The computational complexity of the ICA approach generally increases on the order of N⁴, which is its major drawback. Hence, in real applications we prefer not to add too many artificial bands, in order to achieve a balance between classification accuracy and computational complexity.

(4) The number of classification maps to be generated is equal to the final data dimensionality. The classification maps are in a random order, so their interpretation has to be made carefully. The ICA model is scale and sign indeterminate, since any scalar multiplier β (sign included) in one of the sources si can always be canceled by multiplying the corresponding column ai of A by 1/β. By displaying the classification maps according to their relative gray levels (i.e. the pixel with the minimum value is shown in black, the pixel with the maximum value in white, and the others in shades between black and white), the (positive) scale factor |β| can easily be offset. As for the sign factor, displaying the negative image of a classification map and then examining both the positive and negative images can help to select the one that is more meaningful; a small sketch of this display convention follows this list.

(5) The success of the data dimensionality expansion approach depends upon the information contained in the original data. For instance, such a technique may not be able to help an RGB photograph, since the information contained in the three original color bands is too limited. But it can usually help a CIR image, because the near-infrared band contains important information that is different from the color bands. In other words, this technique can only explore the information embedded in the original data, not create completely new information, which is reasonable.
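The display convention described in comment (4) can be sketched as follows (our own illustration, with assumed names): each map is min-max stretched so that the unknown positive scale |β| is removed, and the negative image is formed so that the more meaningful sign can be chosen by inspection.

import numpy as np

def stretch(ic_map):
    # Min-max stretch a classification map to [0, 1] for display.
    lo, hi = ic_map.min(), ic_map.max()
    return (ic_map - lo) / (hi - lo)

ic_map = np.random.randn(64, 64)      # stand-in for one independent component map
positive = stretch(ic_map)            # minimum shown as black, maximum as white
negative = 1.0 - positive             # sign-flipped version for visual comparison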

As future work, we will investigate whether multiplication using multiple bands can provide further improvement and conduct more tests on other data sets.

Acknowledgments

The authors wish to acknowledge the support from the Office of Naval Research (ONR) for this study. The authors also thank Professor Chein-I Chang at the University of Maryland for providing the SPOT and HYDICE data used in the experiments.

References

Adams, J. B. and Smith, M. O. [1986] “Spectral mixture modeling: A new analysis of rock and soil types at the Viking Lander 1 suite,” Journal of Geophysical Research 91(B8), 8098–8112.

Adams, J. B., Smith, M. O. and Gillespie, A. R. [1993] “Image spectroscopy: Interpretation based on spectral mixture analysis,” In: C. M. Pieters and P. A. Englert (eds.) Remote Geochemical Analysis: Elemental and Mineralogical Composition, Cambridge University Press, pp. 145–166.

Bell, A. J. and Sejnowski, T. J. [1996] “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation 7(6), 1129–1159.

Cardoso, J. F. and Souloumiac, A. [1993] “Blind beamforming for non-Gaussian signals,” IEE Proceedings-F 140(6), 362–370.

Cardoso, J. F. and Laheld, B. [1996] “Equivariant adaptive source separation,” IEEE Transactions on Signal Processing 44(12), 3017–3030.


Chang, C.-I., Chiang, S.-S., Smith, J. A. and Ginsberg, I. W. [2002] “Linear spectral random mixture analysis for hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing 40(2), 375–392.

Chang, C.-I. and Du, Q. [2004] “Estimation of number of spectrally distinct signal sources in hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing 42(3), 608–619.

Chen, C. H. and Zhang, X. [1999] “Independent component analysis for remote sensing study,” Proceedings of SPIE 387, 150–158.

Cichocki, A., Unbehauen, R. and Rummert, E. [1994] “Robust learning algorithm for blind separation of signals,” Electronics Letters 28(21), 1986–1987.

Cichocki, A. and Unbehauen, R. [1996] “Robust neural networks with on-line learning for blind identification and blind separation of sources,” IEEE Transactions on Circuits and Systems 43(11), 894–906.

Comon, P. [1994] “Independent component analysis — A new concept?” Signal Processing 36, 287–314.

Du, Q. and Nekovei, R. [2005] “Implementation of real-time constrained linear discriminant analysis to remote sensing image classification,” Pattern Recognition (in print).

Fiori, S. [2003] “Overview of independent component analysis technique with an application to synthetic aperture radar (SAR) imagery processing,” Neural Networks 16, 453–467.

Hughes, G. F. [1968] “On the mean accuracy of statistical pattern recognizers,” IEEE Transactions on Information Theory IT-14, 55–63.

Hyvarinen, A. and Oja, E. [1997] “A fast fixed-point algorithm for independent component analysis,” Neural Computation 9(7), 1483–1492.

Hyvarinen, A., Karhunen, J. and Oja, E. [2001] Independent Component Analysis, John Wiley & Sons.

Jutten, C. and Herault, J. [1991] “Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing 24(1), 1–10.

Kruse, F. A., Lefkoff, A. B., Boardman, J. W., Heidebrecht, K. B., Shapiro, A. T., Barloon, P. J. and Goetz, A. F. H. [1993] “The spectral image processing system (SIPS) - interactive visualization and analysis of imaging spectrometer data,” Remote Sensing of Environment 44, 145–163.

Landgrebe, D. [2002] “Hyperspectral image data analysis,” IEEE Signal Processing Magazine 19(1), 17–28.

Lee, T. W., Girolami, M. and Sejnowski, T. J. [1999] “Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources,” Neural Computation 11(2), 417–441.

Lotsch, A., Friedl, M. A. and Pinzon, J. [2003] “Spatio-temporal deconvolution of NDVI image sequences using independent component analysis,” IEEE Transactions on Geoscience and Remote Sensing 41(12), 2938–2942.

McCullagh, P. [1995] Tensor Methods in Statistics, London, Chapman & Hall.

Oja, E. [1997] “The nonlinear PCA learning rule in independent component analysis,” Neurocomputing 17(1), 25–45.

Pham, D. T. [1997] “Blind separation of mixtures of independent sources through a quasi-maximum likelihood approach,” IEEE Transactions on Signal Processing 45, 1712–1725.

Ren, H. and Chang, C.-I. [2000] “A generalized orthogonal subspace projection approach to unsupervised multispectral image classification,” IEEE Transactions on Geoscience and Remote Sensing 38(6), 2515–2528.

Settle, J. J. and Drake, N. A. [1993] “Linear mixing and estimation of ground cover proportions,” International Journal of Remote Sensing 14, 1159–1177.

Singer, R. B. and McCord, T. B. [1979] “Mars: Large scale mixing of bright and dark surface materials and implications for analysis of spectral reflectance,” Proc. 10th Lunar Planet. Sci. Conf., pp. 1835–1848.

Szu, H. [1999] “Independent component analysis (ICA): An enabling technology for intelligent information/image technology (IIT),” IEEE Circuits and Devices Magazine 10, 14–37.

Szu, H. and Buss, J. [2000] “ICA neural net to refine remote sensing with multiple labels,” Proceedings of SPIE 4056, 32–49.

Szu, H., Noel, S., Yim, S.-B., Willey, J. and Landa, J. [2003] “Multimedia authenticity protection with ICA watermarking and digital bacteria vaccination,” Neural Networks 16, 907–914.

Szu, H. [2004] “Geometric topology for information acquisition by means of smart sensor web,” International Journal of Information Acquisition 1(1), 1–22.


Tu, T. M. [2000] “Unsupervised signature extraction and separation in hyperspectral images: A noise-adjusted fast independent component analysis approach,” Optical Engineering 39(4), 897–906.

Unser, M. [1999] “Splines: A perfect fit for signal and image processing,” IEEE Signal Processing Magazine 16(6), 22–38.

Yoshida, T. and Omatu, S. [2000] “Pattern recognition with neural networks,” Proceedings of International Geoscience and Remote Sensing Symposium, pp. 699–701.