Automatic recognition of poleward moving auroras from all-sky image sequences based on HMM and SVM

Planetary and Space Science 69 (2012) 40–48


journal homepage: www.elsevier.com/locate/pss


Qiuju Yang a, Jimin Liang a,*, Zejun Hu b,**, Zanyang Xing b,c, Heng Zhao a

a School of Life Sciences and Technology, Xidian University, Xi’an 710071, China

b SOA Key Laboratory for Polar Science, Polar Research Institute of China, Shanghai 200136, China

c School of Science, Xidian University, Xi’an 710071, China

Article info

Article history:

Received 12 November 2011

Received in revised form 18 April 2012

Accepted 20 April 2012

Available online 30 April 2012

Keywords:

Poleward moving auroras (PMAs)

Hidden Markov model (HMM)

Support vector machine (SVM)

Performance metrics

Imbalance classification

* Corresponding author. Tel.: +86 29 81891070; fax: +86 29 81891060.

** Corresponding author. Tel.: +86 21 58713682; fax: +86 21 58715031.

E-mail addresses: [email protected] (J. Liang), huzejun@pric.gov.cn (Z. Hu).

http://dx.doi.org/10.1016/j.pss.2012.04.008

Abstract

We present an automatic method to recognize poleward moving auroras (PMAs) from all-sky image sequences. A simplified block matching algorithm combined with an orientation coding scheme and a histogram statistics strategy was utilized to estimate the auroral motion between interlaced images. An all-sky image sequence was first modeled by hidden Markov models (HMMs) and then represented by HMM similarities. The imbalanced classification problem, i.e., non-PMA events far outnumbering PMA events, was addressed by the metric-driven biased support vector machine (SVM). The proposed method was evaluated using auroral observations in 2003 at the Chinese Yellow River Station. Five days of observations were manually labeled as PMA or non-PMA events considering both the keogram and all-sky image information. The supervised classification experiments were carried out and achieved satisfactory results. We further detected PMAs from auroral observations in the remaining days and the resultant double-peak occurrence distribution was compared with that of the well-known poleward moving auroral forms (PMAFs).

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Background and motivation

Poleward moving auroral forms (PMAFs) are one of the most intensively investigated phenomena of the dayside auroras (Vorobjev et al., 1975; Sandholt et al., 1986, 1990; Fasel, 1995; Karlson et al., 1996; Zhang et al., 2010). They have been widely accepted as the typical ionospheric signatures of pulsed dayside magnetopause reconnection in the form of flux transfer events (FTEs) (Fasel, 1995). PMAFs are usually observed by meridian-scanning photometers or in the keogram displays of all-sky images. There are various criteria, such as the physical mechanism, lifetime, latitudinal width, longitudinal extent, quasi-periodicities, etc., to define an auroral event as a PMAF (Fasel, 1995; Kozlovsky and Kangas, 2002). However, not all auroras with poleward moving characteristics are identified as PMAFs. Milan et al. (1999) identified another class of quasi-continuous, short period (1–2 min) wave-like auroral emission. Kozlovsky and Kangas (2002) observed near-noon high-latitude poleward moving auroral arcs. Their studies show that although these auroras possess similar poleward moving characteristics, their physical mechanisms are quite different from those of PMAFs. Sandholt and Farrugia (2007a,b) found that the PMAF activities initiated in the pre- and postnoon sectors are separated by a midday auroral brightening sequence (MABS) in a region of much reduced plasma flow shear relative to that on either side of noon. MABSs possess distinct poleward moving signatures in keograms, but show essential differences from PMAFs both in spectrum and morphology.

Because PMAFs are defined in keograms, previous studies usually detected PMAFs directly in keograms or meridian-scanning photometer observations. The widely available all-sky auroral images acquired from all-sky cameras/imagers (ASC/ASI) were usually utilized to assist the analysis of PMAF events. The rapid accumulation of large collections of ASC images and the inherent advantages of all-sky image sequences over keograms have led to our interest in detecting PMAF events directly from all-sky image sequences. Because few investigations have addressed this problem, the definition of PMAFs in all-sky image sequences is not yet clear. Therefore, in this paper, we focus as a first step on the automatic detection of poleward moving auroras (PMAs) in all-sky image sequences, which provides auroral events for further classification and for the study of physical mechanisms. By the term PMAs, we refer to three types of auroral events: (1) PMAFs; (2) poleward moving structures displayed in keograms but not categorized as PMAFs; and (3) poleward moving auroras in all-sky image sequences but not reflected in keograms.

Manual detection of PMAs, especially in a large set of ASC image sequences, is tedious and cumbersome. Our study aims to apply pattern recognition techniques to automatically recognize PMAs in all-sky image sequences. The challenges arise from two aspects. First, the aurora is a complex natural phenomenon that changes gradually but significantly over time, and its morphology and motion do not typically exhibit uniform statistical characteristics. Due to the extremely variable nature of the aurora, PMA events derived from all-sky image sequences differ dramatically in terms of shape, brightness, velocity and duration. Multiple motion forms are likely to co-exist in different image areas. The mixture of poleward movement with other types of motion greatly hampers the automatic recognition of PMAs. Second, PMAs do not occur frequently. Thus the number of non-PMA events in auroral observations is much greater than that of PMA events. This causes the imbalanced (or skewed) data classification problem, where the negatives far outnumber the positives, which is considered one of ten challenging problems in data mining research (Yang and Wu, 2006).

1.2. Outline of proposed approach

We present a three-level framework to detect PMAs in an all-sky image sequence. The low level includes image pre-processing and feature extraction, the middle level addresses modeling auroral sequences with hidden Markov models (HMMs) and the similarity representation based on HMMs, and the high level refers to making a binary classification with a metric-driven SVM classifier. Since the all-sky images were pre-processed in the same manner as in Wang et al. (2010), the pre-processing steps are not described here.

1.2.1. Low level: feature extraction

Many motion estimation methods have been proposed to determine the pixel transformation from one image to another. They can be roughly categorized into pixel-based methods and feature-based methods. Feature-based methods first extract features of the image, such as edges, corners, object boundaries or complete objects, and then analyze their correspondences to determine the motion. Pixel-based methods include transform domain methods, optical flow, block-matching algorithms, etc. Transform domain based methods transform the motion in the space domain to other domains such as the frequency domain to estimate the parameters. Optical flow is an approximation of the local image motion and specifies how much each image pixel moves between adjacent images. Blixt et al. (2006) applied optical flow to estimate motions in auroral image sequences. The aperture problem of optical flow algorithms requires some additional constraints (e.g., constant luminance and spatial continuity) to reach a solution. However, because auroral motion is non-rigid, with deformable and transient brightness patterns, these constraints are always violated in the case of auroral motions. This is especially true for poleward-moving dayside auroral forms that rebrighten during their lifetimes, which makes it even more difficult to meet the constant luminance requirement. Block-matching algorithms are often used to calculate the optical flow and to estimate object motion. The idea behind block matching is to divide each frame into a matrix of ‘macro blocks’ (Barjatya, 2004) and to estimate a motion vector for each block by finding its best match in the previous frame within a predefined search window. In this study, based on the block matching algorithm, we present a feature extraction method to characterize the poleward motion of auroras by combining an orientation coding scheme and a histogram statistics strategy.

1.2.2. Middle level: HMM modeling and similarity-based representation

The aurora evolution involves two distinct stochastic processes: one is the hidden collision of charged particles from the Earth’s magnetosphere and the solar wind, and the other is the resultant observable auroral activity displayed in the sky. This naturally leads us to use the hidden Markov model (HMM) (Rabiner, 1989) to model the auroral event. An HMM is a doubly stochastic process with an underlying stochastic process that is not observable, but can be observed through another set of stochastic processes that produce the sequence of observations. It is one of the most popular methods in machine learning and statistics for modeling sequences such as speech, gestures, proteins, etc. Another reason for choosing HMMs is their ability to deal with variable-length feature sequences, an advantage over other traditional machine learning methods (Milone and Di Persia, 2007). This makes HMMs well suited to the task of PMA recognition, since PMA sequences have different lengths.

An HMM can be used for the classification task by itself. However, standard HMM-based approaches apply Maximum Likelihood (ML) as their classification criterion. The ML criterion is evaluated using information from only one class and does not exploit information from other classes, so the discriminatory power of the HMM is poor. Bicego et al. (2004) proposed a similarity-based classification approach for HMMs that improves the ML classification scheme. Objects were transformed into a similarity space by calculating likelihoods with respect to a few reference HMMs. Their experiments showed that a simple classifier like the k-nearest-neighbor (k-NN) in such a space outperforms standard HMM-based classification schemes. In this paper, we adopt the similarity-based approach to represent all-sky image sequences.

1.2.3. High level: imbalanced classification and SVM classifier

The k-NN classifier suffers from its sensitivity to noisy examples, so more global classifiers are necessary for complex problems. Because k-NN classifies an object based on the majority vote of its neighbors, it is sensitive to skewed class distributions. When applied to PMA recognition, the test sequence tends to be assigned the non-PMA class label. There are a number of solutions to the class imbalance problem at both the data and algorithmic levels (Kotsiantis et al., 2006). At the data level, the solutions include many different forms of over-sampling and under-sampling techniques. At the algorithmic level, it is worth mentioning cost-sensitive learning, SVM, one-class classifiers and ensemble methods.

SVM is a powerful technique in the field of statistical learning theory (Vapnik, 1998); it is a global classifier with strong discriminative ability and good generalization performance. Compared to other classifiers, SVM is more accurate on moderately imbalanced data. The reason is that only support vectors (SVs) are used for classification, and many majority samples far from the decision boundary can be removed without affecting the classification (Akbani et al., 2004). To address the skewed class distribution problem, an improved SVM classifier named biased SVM was proposed, which simply sets a different penalty factor for each class (Veropoulos et al., 1999). In this paper, based on the HMM-similarity representation, the biased SVM instead of k-NN is applied for PMA recognition.

The HMM similarity-based biased SVM classifier may be viewed as a two-level architecture combining HMM and biased SVM. It inherits the dynamic modeling ability of HMM and the robust discriminating capability of SVM. Similar work has been developed by researchers in different fields (Lee and Choi, 2003; Huang et al., 2006; Wan et al., 2007; Sloin and Burshtein, 2008; Florin and Militaru, 2010). In these papers, the standard SVM was adopted and the accuracy (recognition rate) was taken as the sole evaluation criterion. The goal of our study is to investigate the efficiency of the hybrid HMM-SVM architecture for PMA recognition. This is a class imbalance problem, as discussed previously. A frequent choice for evaluating classifier performance is the predictive accuracy, which may not be useful with imbalanced data since the metric is biased toward the majority class. For instance, for a data set consisting of 95 majority examples and only 5 minority examples, assigning all data to the majority class achieves 95% accuracy. A standard SVM classifier is designed to maximize the accuracy of its model, so it will result in deteriorated classification performance on the minority class. Balanced accuracy (BAC), geometric mean (GMean), F1-score (F1S, namely the F-measure $F_\beta$ with $\beta = 1$), mean F1-score (MF1S) and Area Under ROC (AUC) are some metrics that have been widely used for the evaluation of models on imbalanced domains. Naturally, it is believed that if the criteria appropriate for the imbalanced data set are optimized directly, the resultant SVM classifiers will be more suitable and robust for imbalanced classification.

Some related work exists in the literature. Callut and Dupont (2005) introduced an SVM model optimized under $F_\beta$. Brefeld and Scheffer (2005) proposed to optimize an SVM in terms of AUC. Joachims (2005) proposed building SVM classifiers under multivariate performance measures. Although this sounds appealing from a theoretical perspective, their work showed that when SVM classifiers are not designed based on accuracy, many difficulties exist in solving for the model parameters, which is likely to produce sub-optimal solutions and therefore degrade the performance of the classifiers. To avoid this problem, we designed various metric-optimized biased SVM classifiers simply by obtaining the optimum parameters corresponding to each metric through cross-validation in the training process. This may be seen as conducting meta-learning on accuracy-driven biased SVM models (Lin et al., 2010), and is therefore simple and easy to operate. In this study, five new parametric models of biased SVM were obtained, including BAC-SVM, GMean-SVM, F1S-SVM, MF1S-SVM and AUC-SVM. All of them were compared with the accuracy-driven biased SVM in terms of the metrics described above.

1.3. Contributions of this paper

We proposed an automatic method to recognize PMAs from all-sky image sequences. Our contribution is twofold. First, we characterized the poleward auroral motions based on the block-matching algorithm along with an orientation coding scheme and a histogram statistics strategy, and the extracted features were input into HMMs as their observation vectors. Second, the HMM-based similarity representation framework was incorporated into our recognition system, and the biased SVM classification was performed in the similarity space. To cope with the class imbalance problem, various metric-driven biased SVMs were trained with the parameters obtained by cross-validation. The proposed method was verified by supervised classification and PMA detection experiments using the ASC observations of the winter of 2003 at the Chinese Yellow River Station.

2. Feature extraction

2.1. Auroral data

The auroral data used in this paper were obtained from the Chinese Yellow River Station (YRS) at Ny-Alesund, Svalbard (Hu et al., 2009). YRS is located at geographic coordinates 78.92°N, 11.93°E, and is one of the few stations on Earth that can make long-term optical auroral observations of the dayside during the boreal winter seasons. The optical instruments at YRS have been in use since 2003; they can capture photo-emissions at 427.8, 557.7 and 630.0 nm simultaneously, and can make 24-hour surveys of auroral emissions with a temporal resolution of 10 s during the winter season from October to March of the following year.

The data from December 2003 to January 2004 were investigated in the present work. Optical observations were not available during cloudy or snowy conditions; therefore, 20 d were selected altogether in this study. We concentrated on the dayside aurora (03:00–15:00 UT/06:00–18:00 MLT) to avoid the influence of daylight and only auroras at 557.7 nm were adopted. The dataset was divided into two parts. Part one, named dataset 1 (DS1), consisted of 137 manually labeled PMA sequences and 414 non-PMA sequences occurring during 5 d in December 2003. Part two, named dataset 2 (DS2), was made up of auroral images acquired from the remaining 15 d between December 2003 and January 2004. We first manually segmented the continuous auroral observations into sequences of 3 min duration (18 frames). In cases of poor observations or a lack of auroral observations, some ASC image sequences were discontinuous. We merged the fragments into the adjacent sequences, and finally obtained 3319 sequences with lengths of 18–35 frames. The all-sky images were pre-processed in the same manner as in Wang et al. (2010). All all-sky images were rotated counter-clockwise by 61.1° so that geomagnetic north points upward. Hence, PMAs were converted to upward moving auroras in the resultant images.

2.2. Feature extraction

The initial motivation for the proposed feature extraction approach came from Yuan (2008), where it was stated that smoke usually drifts upwards continually, carried by hot airflows. Yuan proposed a fast accumulative motion orientation model based on an integral image for video smoke detection. Auroras possess a non-rigid property similar to that of smoke. Furthermore, PMAs are also identified by their unique moving direction, namely the upward drifting of auroral forms in all-sky image sequences. The features in Yuan’s approach are accumulated over all of the frames in a sequence, and were employed directly to decide whether a video contained smoke by comparing the ratio of the upward motion directions to all directions against a predefined threshold. However, the motion of the aurora is much more complex than that of smoke. As mentioned above, other movement types may also exist in PMA sequences, so Yuan’s accumulative method may be misleading for PMA recognition. Moreover, the threshold is determined heuristically and is crucial in determining the final results. Inspired by Yuan’s work, we proposed a new feature to recognize PMAs based on the block matching algorithm combined with a special orientation coding scheme and a histogram statistics strategy. Its schematic is illustrated in Fig. 1. The features were extracted from every individual frame and then fed into HMMs. The extraction includes three steps.

Step 1: Down-sampling and smoothing. All-sky images were split up into blocks of 4×4 pixels (heuristically chosen). To reduce noise disturbance, each block was represented as a point by the sum of the pixel values within it:

$$f(x,y) = \sum_{(i,j) \in b(x,y)} I(i,j), \quad i,j \in [1,440], \; x,y \in [1,110], \tag{1}$$

where I(i,j) represents an all-sky image of size 440×440, b(x,y) is the block in the xth row and yth column and f(x,y) is the resultant image of size 110×110.
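As an illustration of Eq. (1), the block summation can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation; the function name `block_sum` and the synthetic input frame are assumptions made for the example.

```python
import numpy as np

def block_sum(image: np.ndarray, block: int = 4) -> np.ndarray:
    """Sum pixel values over non-overlapping block x block tiles, as in Eq. (1)."""
    rows, cols = image.shape
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    # Give every tile its own pair of axes, then sum within each tile.
    tiles = image.reshape(rows // block, block, cols // block, block)
    return tiles.sum(axis=(1, 3))

# Synthetic 440x440 frame standing in for a pre-processed all-sky image.
frame = np.ones((440, 440), dtype=np.int64)
f = block_sum(frame)
print(f.shape)  # (110, 110)
```

Summing rather than averaging follows the text; the two differ only by a constant factor of 16 for 4×4 blocks.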

Fig. 1. Schematic of the proposed feature extraction method.


Step 2: Simplified block matching. The block matching operation in the original ASC images degenerates to point matching in the down-sampled images, so the Full Search (FS) strategy can be used while remaining efficient. After analyzing numerous all-sky image sequences, we found that a block size of 7×7 and interlaced frame matching are good choices to estimate the auroral motion (see Fig. 1(c)). The error function for matching is defined as

$$E(x,y) = |f(x,y,t) - f(x,y,t-2)|. \tag{2}$$

Our objective is to estimate the motion orientation, but not the motion magnitude. For computational efficiency, we non-uniformly discretized the motion orientation into 33 directions and coded them from 0 to 32. Fig. 1(d) gives the code value table for different motion directions. Blocks with very small motion magnitude were coded as 0 and considered stationary.

Step 3: Motion feature representation. The histogram of the coded motion directions was used to statistically represent the auroral motion in each all-sky image. The stationary code (c = 0) was excluded when calculating the histogram.
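Steps 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the 33-direction non-uniform code table of Fig. 1(d) is not reproduced in the text, so the sketch substitutes a uniform 8-sector coding (codes 1 to 8, with 0 for stationary); the search radius of 2 points and the use of the summed absolute difference over a 7×7 patch are also assumptions. The function names are hypothetical.

```python
import numpy as np

def motion_codes(prev, curr, half=3, search=2, n_dirs=8):
    """Full-search matching of 7x7 patches between frame t (curr) and t-2 (prev).

    Returns one orientation code per interior point; 0 means stationary.
    The uniform n_dirs-sector coding is an assumption standing in for the
    paper's 33-direction table.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    rows, cols = curr.shape
    margin = half + search
    codes = np.zeros((rows, cols), dtype=int)
    for x in range(margin, rows - margin):
        for y in range(margin, cols - margin):
            patch = curr[x - half:x + half + 1, y - half:y + half + 1]
            best_err, best_d = None, (0, 0)
            for dx in range(-search, search + 1):
                for dy in range(-search, search + 1):
                    cand = prev[x + dx - half:x + dx + half + 1,
                                y + dy - half:y + dy + half + 1]
                    err = np.abs(patch - cand).sum()
                    # On ties, prefer zero displacement (treated as stationary).
                    if (best_err is None or err < best_err
                            or (err == best_err and (dx, dy) == (0, 0))):
                        best_err, best_d = err, (dx, dy)
            # The patch came from (x+dx, y+dy) in the earlier frame,
            # so the motion from t-2 to t is the negated displacement.
            mx, my = -best_d[0], -best_d[1]
            if (mx, my) != (0, 0):
                # Row index grows downward, so upward motion has mx < 0.
                angle = np.arctan2(-mx, my)
                sector = int(np.round(angle / (2 * np.pi / n_dirs))) % n_dirs
                codes[x, y] = 1 + sector
    return codes

def orientation_histogram(codes, n_dirs=8):
    """Normalized histogram of motion codes, excluding the stationary code 0."""
    moving = codes[codes > 0]
    hist = np.bincount(moving, minlength=n_dirs + 1)[1:].astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy example: a bright blob that moves one row upward between the two frames.
prev = np.zeros((20, 20))
prev[10:13, 8:12] = 1.0
curr = np.roll(prev, -1, axis=0)
hist = orientation_histogram(motion_codes(prev, curr))
print(hist)
```

With this coding, sector code 3 corresponds to upward motion, so the toy histogram concentrates in its third bin. The per-frame histogram is the kind of feature vector that would then be passed to the HMM stage.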

3. HMM-based auroral sequence modeling

3.1. HMM design

An HMM is distinguished from a general Markov model in that the states in an HMM cannot be observed directly and can only be estimated through another set of stochastic processes that produce the sequence of observations. Assume that the total number of states is N and the number of distinct observations per state is M, and let $S = \{s_1, \ldots, s_N\}$ and $V = \{v_1, \ldots, v_M\}$ denote the state set and observation set respectively. An HMM $\lambda$ can be formally characterized by $\lambda = (\pi, A, B)$, where $\pi$ is a vector of initial state probabilities, A is the transition matrix representing the probabilities of transition between states, and B is a matrix of observation probability densities which describe the probability function of symbol emission from each state. HMM theory is not covered in detail here; for a comprehensive description, please refer to Rabiner (1989).

In this study, we used the aforementioned histogram features of each all-sky sequence to estimate its HMM model $\lambda = (\pi, A, B)$. The HMM observation $V = \{v_1, \ldots, v_M\}$ was the set of histogram features of the image sequence, where $v_i$, $i = 1, \ldots, M$, was the histogram feature of each frame, and M was the number of frames. The image sequence was divided into 3 sections of about the same number of frames. The mean feature vectors of the sections were set as the initial states of the HMM; thus the number of hidden states was N = 3 in this study. The parameter $B = \{b_i(k)\}$ was defined as a function of the distances between observations and hidden states, given as

$$b_i(k) = \frac{a}{d(i)} \cdot \exp\left(-\frac{d(v_k, s_i)}{d(i)}\right), \tag{3}$$

where $d(v_k, s_i)$ is the Chi-square ($\chi^2$) histogram distance between the frame feature $v_k$ and state $s_i$, and a is a coefficient less than d(i). The quantity d(i) reflects the aggregation degree of the state (Chen et al., 2009):

$$d(i) = \sum_{v_k \in C_i} d(v_k, s_i) / n_i, \tag{4}$$

where $C_i$ represents the ith section, and $n_i$ is the number of frames in the ith section. The only requirement for initializing the transition matrix A is the left–right HMM structure (topology). The initial state of the HMM was set to the first state. The Baum–Welch algorithm (Rabiner, 1989) was adopted to re-estimate $\pi$ and A. The Viterbi algorithm (Rabiner, 1989) was utilized to obtain the most probable path, with which the states and the observation probability matrix B were then updated. The iteration was repeated several times until convergence. In the recognition process, we first extracted the feature vectors of the test auroral sequences. By calculating their distances from the states of the training models, the new emission probabilities were acquired. The recognition result $P(O|\lambda)$ can be readily estimated using the Baum–Welch algorithm or the Viterbi algorithm, and is a measure of the matching degree between the probe sequence and the training model.
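The state initialization and the emission model of Eqs. (3) and (4) can be sketched as below. This is a hedged illustration only: the text does not spell out which form of the Chi-square histogram distance is used, so the common symmetric form $\frac{1}{2}\sum (p-q)^2/(p+q)$ is assumed, and the function names, the random stand-in features and the choice a = 0.5·d(i) are all hypothetical.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    """Symmetric Chi-square histogram distance (assumed form)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def init_states(features, n_states=3):
    """Split a sequence into n_states sections; each state is a section mean."""
    sections = np.array_split(np.asarray(features, dtype=float), n_states)
    return sections, [section.mean(axis=0) for section in sections]

def mean_section_distance(section, state):
    """Eq. (4): average distance between the frames of a section and its state."""
    return float(np.mean([chi2_distance(v, state) for v in section]))

def emission(v, state, d_i, a):
    """Eq. (3): b_i(k) = a / d(i) * exp(-d(v_k, s_i) / d(i))."""
    return a / d_i * np.exp(-chi2_distance(v, state) / d_i)

# Stand-in data: 18 frames, each described by an 8-bin normalized histogram.
rng = np.random.default_rng(0)
feats = rng.dirichlet(np.ones(8), size=18)
sections, states = init_states(feats)
d0 = mean_section_distance(sections[0], states[0])
a0 = 0.5 * d0  # any coefficient smaller than d(i); the value is illustrative
print(emission(feats[0], states[0], d0, a0))
```

Note that Eq. (3) is undefined when d(i) = 0, which would occur if all frames in a section had identical histograms; a practical implementation would need a guard for that case.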

3.2. Similarity-based representation

The general idea of similarity-based representation is to build a new representation space using the similarity values between objects and a predetermined set of objects called the representative set. Subsequent processing (e.g. classification or clustering) is performed in that space. This approach helps to make use of all of the information contained in the data set. Specifically, if two sequences, say $O_i$ and $O_j$, present similar degrees of similarity to several other sequences, for example if they are both very similar to some sequences and both very dissimilar to some other sequences, this reinforces the hypothesis that $O_i$ and $O_j$ belong to the same class. Pekalska and Duin (Pekalska and Duin, 2001, 2002; Pekalska et al., 2002) and Bicego (Bicego et al., 2004; Bicego, 2005) are very active researchers in this area and have produced numerous papers on this topic. The key issue of the similarity-based approach is the choice of the representative set. For simplicity, we adopted the “one per sequence” approach (Bicego et al., 2004) combined with HMMs. Each sequence in the dataset is first used to train an HMM, and then each sequence is represented by a similarity vector consisting of the HMM scores, calculated with the Viterbi algorithm, between the sequence and all the models. Concretely, an arbitrary sequence $O_i$ in a dataset comprised of m sequences is expressed as

$$F(O_i) = \left[ \frac{\log P(O_i|\lambda_1)}{T_i}, \; \frac{\log P(O_i|\lambda_2)}{T_i}, \; \ldots, \; \frac{\log P(O_i|\lambda_m)}{T_i} \right], \tag{5}$$

where $\lambda_i$ is the HMM model estimated from sequence $O_i$ and $T_i$ is the length of sequence $O_i$.
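A compact sketch of Eq. (5) is given below: a log-domain Viterbi score is computed for a sequence against every model and divided by the sequence length. To keep the example self-contained, it assumes toy HMMs with discrete symbol emissions instead of the distance-based emission of Eq. (3); the helper names `make_model`, `discrete_emission`, `viterbi_log_score` and `similarity_vector` are hypothetical.

```python
import numpy as np

def make_model(pi, A, B):
    """Store an HMM in the log domain; log(0) = -inf encodes forbidden moves."""
    with np.errstate(divide="ignore"):
        return {"log_pi": np.log(np.asarray(pi, dtype=float)),
                "log_A": np.log(np.asarray(A, dtype=float)),
                "log_B": np.log(np.asarray(B, dtype=float))}

def discrete_emission(seq, model):
    """T x N matrix of log b_i(o_t) for a sequence of symbol indices (toy case)."""
    return model["log_B"][:, seq].T

def viterbi_log_score(log_pi, log_A, log_B_seq):
    """Log-probability of the single most probable state path."""
    delta = log_pi + log_B_seq[0]
    for t in range(1, log_B_seq.shape[0]):
        # Best predecessor for every state j, then emit observation t.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B_seq[t]
    return float(np.max(delta))

def similarity_vector(seq, models, emission_logprob=discrete_emission):
    """Eq. (5): length-normalized Viterbi log-scores against all models."""
    T = len(seq)
    return np.array([
        viterbi_log_score(m["log_pi"], m["log_A"], emission_logprob(seq, m)) / T
        for m in models
    ])

# Two toy left-right models over a binary alphabet.
model_a = make_model([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
model_b = make_model([1.0], [[1.0]], [[0.5, 0.5]])
seq = [0, 0, 1, 1]
F = similarity_vector(seq, [model_a, model_b])
print(F)
```

In the setting described above, `models` would hold one HMM per sequence in the data set, so each sequence is mapped to an m-dimensional vector regardless of its original length.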

4. Metrics for imbalanced classification

The results of binary classification are usually summarized by a confusion matrix as shown in Table 1, which records the correctly and incorrectly recognized instances for each class. Besides the accuracy (Eq. (6)), there are some comprehensive metrics for an imbalanced system, which are composed of a few atomic metrics. BAC (Eq. (11)) and GMean (Eq. (12)) are the arithmetic mean and geometric mean, respectively, of sensitivity (Eq. (9)) and specificity (Eq. (10)). By integrating precision (Eq. (7)) and recall (Eq. (8)), the F1-score in Eq. (13) is another popular metric proposed to evaluate biased system performance. Since both precision and recall focus on the positive class, the F1-score is still a biased metric to some extent. MF1S in Eq. (14) is then introduced to eliminate this effect by adding $F1S_{-1}$, namely the F1-score calculated from the negative class (Ferri et al., 2009).

$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{6}$$

$$\text{precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{sensitivity} = \frac{TP}{TP + FN} = \text{recall} \tag{9}$$

$$\text{specificity} = \frac{TN}{TN + FP} \tag{10}$$

$$BAC = \frac{\text{sensitivity} + \text{specificity}}{2} \tag{11}$$

$$GMean = \sqrt{\text{sensitivity} \times \text{specificity}} \tag{12}$$

$$F1S = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{13}$$

$$MF1S = \frac{F1S + F1S_{-1}}{2} \tag{14}$$
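Eqs. (6)–(14) translate directly into code. The sketch below is illustrative; the function name `imbalance_metrics` and the convention of returning 0 when a denominator is zero are assumptions, not part of the original method. The first example reproduces the 95-versus-5 situation in which every sample is assigned to the majority class.

```python
from math import sqrt

def _ratio(num, den):
    """Safe division; returning 0.0 for an empty denominator is a convention."""
    return num / den if den else 0.0

def imbalance_metrics(tp, fn, fp, tn):
    """Metrics of Eqs. (6)-(14) computed from a confusion matrix."""
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)            # identical to sensitivity
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    # F1-score of the negative class: swap the roles of the two classes.
    precision_neg = _ratio(tn, tn + fn)
    recall_neg = specificity
    f1_neg = _ratio(2 * precision_neg * recall_neg, precision_neg + recall_neg)
    return {
        "ACC": accuracy,
        "BAC": (recall + specificity) / 2,
        "GMean": sqrt(recall * specificity),
        "F1S": f1,
        "MF1S": (f1 + f1_neg) / 2,
    }

# All 100 samples predicted as the majority (negative) class.
trivial = imbalance_metrics(tp=0, fn=5, fp=0, tn=95)
print(trivial)
# A classifier that actually finds most positives.
useful = imbalance_metrics(tp=40, fn=10, fp=20, tn=30)
print(useful)
```

For the trivial classifier the accuracy is 0.95 while BAC is 0.5 and GMean and F1S are 0, which is the discrepancy that motivates the alternative metrics.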

Table 1
Confusion matrix.

                  Predicted positives    Predicted negatives
Real positives    True positive (TP)     False negative (FN)
Real negatives    False positive (FP)    True negative (TN)

Another popular and visual metric for imbalanced classification is the receiver operating characteristic (ROC) curve, which is a two-dimensional plot of sensitivity vs. 1−specificity for all possible thresholds. An ROC graph depicts relative trade-offs between benefits (true positives) and costs (false positives) (Fawcett, 2006). For convenient comparison, the ROC performance is often reduced to a single scalar value, namely the area under the ROC curve (AUC). There are many methods for calculating the AUC value. Hand and Till (2001) provided a very straightforward way to estimate the AUC without reference to any particular threshold:

$$AUC = \frac{S_1 - n_1(n_1+1)/2}{n_1 n_{-1}}, \tag{15}$$

where $n_1$ and $n_{-1}$ represent the number of positive and negative examples respectively. The parameter $S_1$ is defined as

$$S_1 = \sum_{i=1}^{n_1} r_i, \tag{16}$$

with $r_i$ being the rank of the ith positive example in the data set.
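The rank-based estimate of Eqs. (15) and (16) can be sketched as follows. Ranks are taken in increasing order of the classifier score; assigning tied scores their average rank is an assumption added here, since the text does not discuss ties. The function name `auc_rank_sum` is hypothetical.

```python
def auc_rank_sum(scores, labels):
    """AUC from the rank sum of the positive examples (Eqs. (15)-(16)).

    scores: classifier outputs, larger meaning "more positive".
    labels: +1 for positive examples, -1 for negative examples.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    s1 = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (s1 - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc = auc_rank_sum([0.1, 0.4, 0.35, 0.8], [-1, -1, 1, 1])
print(auc)  # 0.75
```

In the example, the positives hold ranks 2 and 4, so $S_1 = 6$ and AUC = (6 − 3)/(2·2) = 0.75; no threshold is ever chosen.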

5. SVM classification

5.1. Biased-SVM

The main idea of SVM is to separate examples with a linear decision surface and maximize the margin of separation between the two different classes. The imbalanced class distribution is usually accounted for by pre-specifying different penalty constants $C^+$ and $C^-$ on the Lagrange multipliers of the different classes, which adjusts the cost of false positives vs. false negatives (Morik et al., 1999; Veropoulos et al., 1999). Thus the objective function becomes

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C^+ \sum_{\{i \mid y_i = +1\}} \xi_i + C^- \sum_{\{i \mid y_i = -1\}} \xi_i. \tag{17}$$

This is called biased SVM and is incorporated in the SVM solver LIBSVM (Chang and Lin, 2011), which we used for our study. The asymmetric penalties are always set to ensure that the total penalty is equal for the two classes, i.e.

$$C^+ N^+ = C^- N^-, \tag{18}$$

where $N^+$ and $N^-$ are the number of examples belonging to the positive and negative classes respectively. By increasing the margin on the side of the minority class, this method provides a way to induce a decision boundary which is much more distant from the “critical” class than it is from the other (Wang and Japkowicz, 2009).
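To make Eqs. (17) and (18) concrete, the sketch below derives the class penalties from the class counts and evaluates the biased objective for a given linear decision function. It is an illustration only: the slack definition $\xi_i = \max(0, 1 - y_i(w \cdot x_i + b))$ is the standard soft-margin one and is assumed here, since the constraints are not written out in the text, and the function names and toy data are hypothetical. In LIBSVM itself, per-class penalties are set through the `-wi` training option, which scales C for class i.

```python
def biased_penalties(c_neg, n_pos, n_neg):
    """Eq. (18): choose C+ so that C+ * N+ equals C- * N-."""
    return c_neg * n_neg / n_pos, c_neg

def biased_objective(w, b, X, y, c_pos, c_neg):
    """Value of Eq. (17) for a fixed linear decision function (w, b)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        decision = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack = max(0.0, 1.0 - yi * decision)  # assumed hinge slack
        penalty += (c_pos if yi == 1 else c_neg) * slack
    return margin_term + penalty

# Class counts of DS1: 137 PMA (positive) and 414 non-PMA (negative) sequences.
c_pos, c_neg = biased_penalties(1.0, n_pos=137, n_neg=414)
print(round(c_pos, 3))  # 3.022

# Toy 1-D data: only the second positive example violates the margin.
X = [[2.0], [0.5], [-2.0]]
y = [1, 1, -1]
value = biased_objective([1.0], 0.0, X, y, c_pos=3.0, c_neg=1.0)
print(value)  # 2.0
```

With the DS1 class ratio, a margin violation by a PMA sequence costs roughly three times as much as one by a non-PMA sequence, which pushes the decision boundary away from the minority class.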

5.2. SVM under various metrics

As described above, standard SVMs are optimized for accuracy (minimization of the overall error), which is not an appropriate measure for imbalanced data sets. Various metric-driven biased SVMs were therefore developed, implemented in this paper by conducting the meta-learning method on a standard SVM model. The specific procedure is as follows, and is repeated multiple times to reach a reliable result. Let the training set and test set at each repetition be denoted as $S_{train}$ and $S_{test}$ respectively.

Step 1: Select the form of the kernel function and designate a set of parameters. For example, if the RBF kernel is chosen, the parameter $\gamma$ of the RBF kernel and the regularization constant C may be taken into account, since both have a strong influence on SVM performance.

Step 2: Based on the set of parameters, v-fold cross-validation experiments are performed on $S_{train}$. That is, after obtaining the SVM model using (v−1)/v of the $S_{train}$ data, the remaining 1/v of the $S_{train}$ data is used for SVM prediction and the various metric values are calculated. This procedure is repeated v times and the average results are computed.

Step 3: Based on the grid-search method, by repeating Step 2, the optimum model parameters corresponding to each metric are obtained.

Step 4: By utilizing the optimum model parameters under each metric to train the SVM models on the whole $S_{train}$ data, SVM classifiers driven by different metrics are obtained. We call them **-SVM, where ** refers to a metric described in Section 4.

Step 5: The labels of the $S_{test}$ data are predicted with the models obtained in Step 4, and the various metrics corresponding to each model are calculated.
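Steps 2 to 4 amount to a grid search in which the cross-validated score is the target metric rather than the accuracy. The sketch below shows the control flow with pluggable training, prediction and metric functions. It is a schematic under stated assumptions: a trivial threshold classifier on 1-D data stands in for the biased SVM, the grid holds thresholds rather than (γ, C) pairs, folds are contiguous rather than randomized, and all names are hypothetical.

```python
def kfold_indices(n, v):
    """Split range(n) into v contiguous folds (no shuffling, for simplicity)."""
    return [list(range(i * n // v, (i + 1) * n // v)) for i in range(v)]

def balanced_accuracy(y_true, y_pred):
    """BAC, Eq. (11), for labels in {+1, -1}."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    sensitivity = tp / n_pos if n_pos else 0.0
    specificity = tn / n_neg if n_neg else 0.0
    return (sensitivity + specificity) / 2

def select_params(X, y, grid, train_fn, predict_fn, metric_fn, v=5):
    """Steps 2-3: return the grid point with the best mean v-fold metric."""
    best_params, best_score = None, float("-inf")
    folds = kfold_indices(len(X), v)
    for params in grid:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(X)) if i not in held]
            model = train_fn([X[i] for i in train_idx],
                             [y[i] for i in train_idx], params)
            predictions = predict_fn(model, [X[i] for i in held_out])
            scores.append(metric_fn([y[i] for i in held_out], predictions))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Stand-in "classifier": predict +1 when the feature exceeds a threshold.
def train_threshold(X_train, y_train, threshold):
    return threshold  # nothing to fit in this toy model

def predict_threshold(model, X_test):
    return [1 if x > model else -1 for x in X_test]

X = [0.1, 0.9, 0.2, 0.3, 0.15, 0.8, 0.25, 0.05, 0.35, 0.85]
y = [-1, 1, -1, -1, -1, 1, -1, -1, -1, 1]
best, score = select_params(X, y, [0.0, 0.5, 0.95],
                            train_threshold, predict_threshold,
                            balanced_accuracy, v=2)
final_model = train_threshold(X, y, best)  # Step 4: refit on all training data
print(best, score)
```

Swapping `balanced_accuracy` for another metric function yields the corresponding metric-driven variant without changing the underlying training routine, which is the sense in which the procedure acts as meta-learning on an accuracy-driven model.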


6. Experiments and results

6.1. Data labeling

Continuous all-sky image frames containing poleward moving structures were picked out from the observations and labeled as a PMA sequence. Keograms were also consulted when labeling, where PMAFs are characterized as long, northward pointing structures slanted toward increasing time. The durations of the labeled PMA events are between 2 and 18 min. Non-PMA events were later picked from the remaining observations and each one was separated by considering both the morphology and motion information in the all-sky image sequences. We labeled 137 PMA sequences and 414 non-PMA sequences from five days’ worth of auroral observations in December 2003 at YRS.

6.2. Supervised binary classification

Ten repetitions of ten-fold cross-validation were applied to test the performance of the various algorithms on DS1. The experiment contains two parts.

Fig. 2. Comparisons of the proposed algorithm and some existing methods.

Table 2
List of parameters.

Parameters   Values                      Meaning
γ            2^-20 : 2^0.5 : 2^4         RBF kernel parameter
C1           C1 = u·C2, u = 1, IR        Regularization constant of PMAs; IR = (# non-PMAs)/(# PMAs)
C2           2^-10 : 2^0.5 : 2^15        Regularization constant of non-PMAs
v            5                           v-fold cross-validation
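As a rough illustration of the search space implied by Table 2, the sketch below enumerates the candidate parameter combinations. It rests on assumptions: the range notation is read as start : multiplicative step : end, u is taken to be either 1 or IR, and IR is computed from the labeled counts in Section 6.1 rather than from each training fold. The helper name power_grid is hypothetical.

```python
def power_grid(lo_exp, hi_exp, step_exp=0.5):
    """Return 2**lo_exp, 2**(lo_exp + step_exp), ..., 2**hi_exp (inclusive)."""
    count = int(round((hi_exp - lo_exp) / step_exp)) + 1
    return [2.0 ** (lo_exp + i * step_exp) for i in range(count)]

n_pma, n_non_pma = 137, 414      # labeled sequence counts (Section 6.1)
IR = n_non_pma / n_pma           # imbalance ratio, about 3.02

gammas = power_grid(-20, 4)      # RBF kernel parameter candidates
c2_values = power_grid(-10, 15)  # regularization constant of non-PMAs
u_values = [1.0, IR]             # u = 1: unbiased SVM; u = IR: biased SVM

# C1 (regularization constant of PMAs) is tied to C2 through u.
grid = [
    {"gamma": g, "C1": u * c2, "C2": c2}
    for g in gammas
    for c2 in c2_values
    for u in u_values
]
print(len(gammas), len(c2_values), len(grid))
```

Under this reading the grid holds 49 × 51 × 2 combinations, each of which would be evaluated by v-fold cross-validation with v = 5.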

Fig. 3. Comparisons between different metrics driven SVM classifiers.

6.2.1. Comparing the proposed algorithm with existing methods

In this part, the proposed HMM similarity-based biased-SVM method (HMM+bSVM) was compared with Yuan’s method (Yuan, 2008), the HMM classifier and the HMM similarity-based k-NN (k = 3) classifier (HMM+kNN) (Bicego et al., 2004). In the case of HMM classification, the HMM model parameters are estimated from multiple training sequences (Davis and Lovell, 2002) in order to ensure their robustness. Two models were built, for PMAs and non-PMAs respectively, and an all-sky image sequence was assigned to the class whose model shows the higher probability (likelihood) of having generated this sequence. The weighted Euclidean distance was used to improve the performance of the HMM similarity-based k-NN classifier, with a weight value w_i = exp(log P(O|λ_i)/T). SVM classifiers were built using the LIBSVM package. A good kernel function K(x, y), which is in fact a measure of data similarity, is critical for SVMs. Considering that the auroral sequences had been represented by similarity vectors composed of HMM scores, we focused on the RBF kernel. The kernel parameter γ and the regularization constant C were set to the LIBSVM default values. Accuracy (ACC), BAC, GMean, F1-score (F1S), mean F1-score (MF1S) and AUC were adopted to quantitatively evaluate the performance, as depicted in Fig. 2.

Fig. 2 shows that: (1) The simple accumulative method proposed by Yuan (2008) cannot recognize PMAs because the motion of auroras is much more complex than that of smoke. In addition to poleward movement, many other types of motion exist in a PMA sequence. The accumulative approach is ineffective on this problem and more complicated models are needed. (2) HMM-based k-NN is not superior to the conventional HMM in this experiment. This does not mean that the similarity-based method proposed by Bicego et al. (2004) is poor; rather, the imbalanced data sets have detrimental effects on the performance of the k-NN classifier. HMMs, by contrast, are not so sensitive to imbalanced class distributions, provided there is a sufficient number of examples from each class to train a robust model. This is somewhat similar to the way an SVM counteracts imbalanced distributions by relying on SVs. (3) Even though only the default values are used, the proposed HMM-based biased SVM performs much better overall than the other algorithms. Nevertheless, on some metrics such as BAC and GMean, the advantage is not very obvious. This may be because the biased SVM classifier is designed for accuracy maximization, as discussed above.

6.2.2. Comparisons between various parametric SVMs

In the second part, the accuracy driven SVM classifier was compared with five parametric SVMs in the HMM-based similarity representation space, which were optimized by BAC, GMean, F1-score, mean F1-score and AUC respectively. The parameters corresponding to each SVM were obtained by a grid-search method in the SVM training process, as given in Table 2. The results shown in Fig. 3 demonstrate that a conventional accuracy driven SVM classifier has the greatest accuracy and an obvious bias toward the majority class (non-PMAs), with a high TNR and a low TPR being its typical behavior. The other parametric SVMs rectify this "bias" to varying degrees. The AUC-based SVM behaves very badly except on the AUC metric, which may be attributed to AUC being a rank metric. The much higher TPR and much lower TNR of the F1-score driven SVM as compared to the mean F1-score driven SVM indicate that the former focuses more on the positive class than the latter. GMean-SVM and BAC-SVM outperform the other metric-driven SVMs.
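The bias of accuracy toward the majority class can be made concrete with a small calculation. The sketch below uses the standard confusion-matrix definitions, which are assumed to match those of Section 4; the mean F1-score and AUC are left out because their exact definitions are not restated here. The classifier evaluated is a deliberately trivial one that labels every sequence as non-PMA, applied to the 137 PMA and 414 non-PMA labeled sequences.

```python
from math import sqrt

def metrics(tp, fn, tn, fp):
    """Confusion-matrix metrics, with PMA as the positive class."""
    tpr = tp / (tp + fn) if tp + fn else 0.0        # sensitivity (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0        # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)
          if precision + tpr else 0.0)
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),
        "TPR": tpr,
        "TNR": tnr,
        "BAC": (tpr + tnr) / 2,
        "GMean": sqrt(tpr * tnr),
        "F1S": f1,
    }

# Trivial classifier: every sequence is predicted to be a non-PMA.
always_non_pma = metrics(tp=0, fn=137, tn=414, fp=0)
for name, value in always_non_pma.items():
    print(f"{name}: {value:.3f}")
```

This classifier reaches an accuracy of about 0.751 without finding a single PMA, while BAC falls to 0.5 and GMean and F1-score to 0, which is why the choice of driving metric matters on imbalanced data.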

Fig. 4. Distribution of detected PMAs versus MLT.

Fig. 5. Detection results shown in keograms. The white lines indicate the time duration of PMAs obtained by our algorithm and the pink lines are PMAFs labeled by experts combining information from three wavelengths in the keograms. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Examples of omitted PMAs in keograms.

6.3. PMAs-detection experiments

The above supervised experiments showed that the proposed algorithm performs well for PMA recognition. The auroral observations used in those experiments were manually labeled as PMAs and non-PMAs, which is cumbersome and impractical when abundant instances are required for a statistical study; the PMAs in a large dataset should instead be recognized automatically. Therefore, we further investigated the effectiveness of the proposed method by attempting to detect PMAs in AS2, using all 551 labeled sequences in AS1 as the training set. Using the best SVM classifier identified in the supervised experiments, namely the GMean-based SVM, 909 PMA and 2410 non-PMA sequences were recognized in AS2. The ratio of PMAs to non-PMAs is 909/2410 = 37.7%, which is broadly consistent with the manual labeling result in AS1 (137/414 = 33.1%). Fig. 4 gives the distribution of the detected PMAs, with the temporal axis divided into 240 bins of 3 min duration. The occurrence of PMAs has a distinct double-peak distribution in the dayside sectors, peaking around 0900 MLT (pre-noon) and 1500 MLT (post-noon). The double-peak distribution is in accordance with previous statistical studies in the Northern Hemisphere, which found a significant decrease in PMAF activity at magnetic noon (Fasel, 1995). Recent studies also found that the PMAFs occurring in the pre-noon and post-noon sectors are often separated by a less intense auroral band around noon, which is often referred to as the "midday gap aurora" (Cogger et al., 1977; Sandholt and Farrugia, 2003, 2007a,b). The occurrence of PMAs approximates that of PMAFs, because PMAFs account for the majority of PMAs (as Fig. 5 illustrates). However, there are some differences between their distributions since PMAs include three event types, as described in Section 1.1. In the following, we further investigate the relationship between the PMAs detected by the proposed algorithm in all-sky image sequences and the manually labeled PMAFs in keograms.
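The occurrence statistics of Section 6.3 (the PMA to non-PMA ratios and the histogram over 240 bins of 3 min) amount to simple arithmetic, sketched below. The start of the 12-hour window on the MLT axis is not restated here, so event times are expressed as minutes since a hypothetical window start, and the example event times are invented for illustration.

```python
BIN_MIN = 3     # bin width in minutes
N_BINS = 240    # 240 bins x 3 min = 720 min = 12 h

def bin_index(minutes_since_window_start):
    """Map an event time to its 3-minute bin."""
    idx = int(minutes_since_window_start // BIN_MIN)
    if not 0 <= idx < N_BINS:
        raise ValueError("event time falls outside the 12-hour window")
    return idx

def occurrence_histogram(event_minutes):
    """Count detected events per bin."""
    counts = [0] * N_BINS
    for minutes in event_minutes:
        counts[bin_index(minutes)] += 1
    return counts

# Ratios quoted in the text.
ratio_detected = 909 / 2410   # detected PMAs / non-PMAs in AS2
ratio_labeled = 137 / 414     # labeled PMAs / non-PMAs in AS1
print(f"{ratio_detected:.1%}  {ratio_labeled:.1%}")

# Invented event times (minutes since window start), for illustration only.
hist = occurrence_histogram([0, 2.9, 3, 180, 719])
print(hist[0], hist[1], hist[60], hist[239])
```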


Fig. 5 illustrates some of the comparison results in keograms, where the white lines indicate the time periods of PMAs detected by the proposed method and the pink lines are PMAFs labeled by experts combining information from three wavelengths in the keograms. For a clear display, each subfigure depicts only three hours of auroras. In Fig. 5, the detected PMAs include almost all of the labeled PMAFs, while some detected PMAs have no indication in the keograms. This indicates that not all poleward moving structures are PMAFs. The fact that most detected PMAs are reflected in keograms demonstrates that the keogram is an effective tool for analyzing auroral activities. The PMAs omitted in keograms mainly occurred on 3 January 2004. Four events are illustrated in Fig. 6, which occurred during 0421–0426 UT, 0533–0540 UT, 0609–0613 UT and 0642–0645 UT respectively. Poleward moving auroral forms clearly exist in each all-sky image sequence, which further supports the validity and robustness of our algorithm in detecting poleward structures in all-sky image sequences.

The start time and end time of PMAs are not precisely determined in our experiments, because we segmented the auroral observations with a fixed length and the motion of each sequence was estimated from the majority of its frames. Thus some detected results have prolonged durations, such as 0445–0527 UT on Jan. 03 and 1027–1106 UT on Jan. 01. This does not mean that a single PMA event lasted such a long time; instead, PMAs existed in each segmented sequence.
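The way fixed-length segmentation produces prolonged detections can be sketched as follows: when several consecutive segments are each classified as PMA, they appear as one long interval. The function name, the 3 min segment length and the example labels are all hypothetical; the segment length used in this study is not restated here.

```python
def merge_detections(labels, start_min, seg_len_min):
    """Merge runs of consecutive PMA segments into (start, end) intervals.

    labels: one boolean per fixed-length segment, True if classified as PMA.
    start_min: start time of the first segment, in minutes.
    seg_len_min: length of every segment, in minutes.
    """
    intervals = []
    run_start = None
    for i, is_pma in enumerate(labels):
        if is_pma and run_start is None:
            run_start = i                      # a new run of PMA segments begins
        elif not is_pma and run_start is not None:
            intervals.append((start_min + run_start * seg_len_min,
                              start_min + i * seg_len_min))
            run_start = None
    if run_start is not None:                  # run reaches the last segment
        intervals.append((start_min + run_start * seg_len_min,
                          start_min + len(labels) * seg_len_min))
    return intervals

# Hypothetical classifier output for six consecutive 3-minute segments.
labels = [False, True, True, True, False, True]
print(merge_detections(labels, start_min=0, seg_len_min=3))
```

Three adjacent positive segments are reported as a single 9 min interval even if they contain separate PMA events, which is the effect described above.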

7. Conclusion and future work

The automatic recognition of PMAs from a series of all-sky image sequences is addressed in this paper. The approach is valuable for two reasons: on the one hand, all-sky image sequences carry much more information than keograms; on the other hand, the automatic method makes it possible to extract abundant cases from the huge volume of auroral observations for statistical studies. The hybrid HMM-SVM architecture integrates the dynamic modeling ability of the HMM and the robust discriminating capability of the SVM. The problem of non-PMAs far outnumbering PMAs is emphasized and addressed by designing metric-driven SVMs. The validity of the proposed algorithm was verified by supervised classification and PMA detection experiments.

Automatic auroral sequence analysis is still at an early stage. Because the start time, end time, and lifetime of PMAs are closely related to their physical mechanism, our future work includes automatic segmentation of auroral sequences for precise determination of the duration of PMAs. Automatic segmentation can also avoid the manual process, which is prone to inconsistent results due to varying levels of concentration. This study focused on the auroral observations at 557.7 nm between 2003 and 2004. More observations, including observations at multiple wavelengths, may be taken into account in future studies.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (60872154, 60902083, 41031064, 40904041, 40974103), the Ocean Public Welfare Scientific Research Project, State Oceanic Administration People’s Republic of China (no. 201005017), and the Youth Fund of State Oceanic Administration People’s Republic of China (2010613).

References

Akbani, R., Kwek, S., Japkowicz, N., 2004. Applying support vector machines to imbalanced data sets. In: Proceedings of the 15th European Conference on Machine Learning. 39–50.

Barjatya, A., 2004. Block Matching Algorithms for Motion Estimation, Technical Report. Utah State University.

Bicego, M., Murino, V., Figueiredo, M.A.T., 2004. Similarity-based classification of sequences using hidden Markov models. Pattern Recognition 37, 2281–2291.

Bicego, M., 2005. Odor classification using similarity-based representation. Sensors and Actuators B: Chemical 110, 225–230.

Blixt, E.M., Semeter, J., Ivchenko, N., 2006. Optical flow analysis of the Aurora borealis. IEEE Transactions on Geoscience and Remote Sensing 3, 159–163.

Brefeld, U., Scheffer, T., 2005. AUC maximizing support vector learning. In: Proceedings of the 22nd International Conference on Machine Learning, Workshop on ROC Analysis in Machine Learning.

Callut, J., Dupont, P., 2005. Fβ support vector machines. In: Proceedings of the International Joint Conference on Neural Networks, 1443–1448.

Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2:27:1–27:27.

Chen, C., Liang, J., Zhao, H., Hu, H., Tian, J., 2009. Frame difference energy image for gait recognition with incomplete silhouettes. Pattern Recognition Letters 30, 977–984.

Cogger, L.L., Murphree, J.S., Ismail, S., Anger, C.D., 1977. Characteristics of dayside 5577A and 3914A aurora. Geophysical Research Letters 4, 413–416.

Davis, R.I.A., Lovell, B.C., 2002. Improved estimation of hidden Markov model parameters from multiple observation sequences. In: Proceedings of the International Conference on Pattern Recognition. 168–171.

Fasel, G.J., 1995. Dayside poleward moving auroral forms: a statistical study. Journal of Geophysical Research 100 (A7), 11891–11905.

Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters, 861–874.

Ferri, C., Hernández-Orallo, J., Modroiu, R., 2009. An experimental comparison of performance measures for classification. Pattern Recognition Letters 30, 27–38.

Florin, R., Militaru, D., 2010. A HMM/SVM hybrid method for speaker Verification. In: Proceedings of the 8th Communications International Conference. 111–114.

Hand, D.J., Till, R.J., 2001. A simple generalization of the area under the ROC curve to multiple class classification problems. Machine Learning 45, 171–186.

Huang, B.Q., Du, C.J., Zhang, Y.B., Kechadi, M., 2006. A hybrid HMM-SVM method for online handwriting symbol recognition. In: Proceedings of the International Conference on Intelligent Systems Design and Applications (ISDA). 887–891.

Hu, Z.-J., Yang, H., Huang, D., Araki, T., Sato, N., Taguchi, M., Seran, E., Hu, H., Liu, R., Zhang, B., Han, D., Chen, Z., Zhang, Q., Liang, J., Liu, S., 2009. Synoptic distribution of dayside aurora: multiple-wavelength all-sky observation at Yellow River Station in Ny-Alesund, Svalbard. Journal of Atmospheric and Solar-Terrestrial Physics 71, 794–804.

Joachims, T., 2005. A support vector method for multivariate performance measures. In: Proceedings of the International Conference on Machine Learning (ICML). 377–384.

Karlson, K.A., Øieroset, M., Moen, J., Sandholt, P.E., 1996. A statistical study of flux transfer event signatures in the dayside aurora: the IMF By-related prenoon-postnoon asymmetry. Journal of Geophysical Research 101 (A1), 59–68.

Kotsiantis, S., Kanellopoulos, D., Pintelas, P., 2006. Handling imbalanced datasets: a review. GESTS International Transactions on Computer Science and Engineering 30, 25–36.

Kozlovsky, A., Kangas, J., 2002. Motion and origin of noon high-latitude poleward moving auroral arcs on closed magnetic field lines. Journal of Geophysical Research 107 (A2), 1017.

Lee, H., Choi, S., 2003. PCA+HMM+SVM for EEG pattern classification. In: Proceedings of the International Symposium on Signal Processing and its Applications, 1, 541–544.

Lin, Z., Hao, Z., Yang, X., 2010. Effects of several evaluation metrics on imbalanced data learning. Journal of South China University of Technology (Natural Science Edition) 38, 147–154.

Milan, S.E., Yeoman, T.K., Lester, M., Moen, J., Sandholt, P.E., 1999. Post-noon two-minute period pulsating aurora and their relationship to the dayside convection pattern. Annales Geophysicae 17, 877.

Milone, D.H., Di Persia, L.E., 2007. An EM algorithm to learn sequences in the wavelet domain. In: Lecture Notes in Computer Science. vol. 4827. 518–528.

Morik, K., Brockhausen, P., Joachims, T., 1999. Combining statistical learning with a knowledge-based approach - a case study in intensive care monitoring. In: Proceedings of the 16th ICML. 268–277.

Pekalska, E., Duin, R., 2001. Automatic pattern recognition by similarity representations. Electronics Letters 37, 159–160.

Pekalska, E., Duin, R., 2002. Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters 23, 943–956.

Pekalska, E., Paclik, P., Duin, R., 2002. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2, 175–211.

Rabiner, L., 1989. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–285.

Sandholt, P.E., Farrugia, C.J., 2003. Does the aurora provide evidence for the occurrence of antiparallel magnetopause reconnection? Journal of Geophysical Research 108 (A12). 2003JA010066.

Sandholt, P.E., Farrugia, C.J., 2007a. Role of poleward moving auroral forms in the dawn-dusk auroral precipitation asymmetries induced by IMF By. Journal of Geophysical Research 112 (A4), A04203.

Sandholt, P.E., Farrugia, C.J., 2007b. Poleward moving auroral forms (PMAFs) revisited: responses of aurorae, plasma convection and Birkeland currents in the pre- and postnoon sectors under positive and negative IMF BY conditions. Annales Geophysicae 25, 1629–1652.

Sandholt, P.E., Lockwood, M., Freeman, K.S.C., Oguti, T., Cowley, S.W.H., 1990. Midday auroral breakup events and related energy and momentum transfer from the magnetosheath. Journal of Geophysical Research 95, 1039–1060.

Sandholt, P.E., Deehr, C.S., Egeland, A., Lybekk, B., Viereck, R., Romick, G.J., 1986. Signatures in the dayside aurora of plasma transfer from the magnetosheath. Journal of Geophysical Research 91, 10,063.

Sloin, A., Burshtein, D., 2008. Support vector machine training for improved hidden Markov modeling. IEEE Transactions on Signal Processing 56, 172–188.

Vapnik, V.N., 1998. Statistical Learning Theory: Inference from Small Samples.

Veropoulos, K., Campbell, C., Cristianini, N., 1999. Controlling the sensitivity of support vector machines. International Joint Conferences on Artificial Intelligence, 55–60.

Vorobjev, V.G., Starkov, G.V., Gustafsson, G., Feldshtein, Ia.I., Shevnina, N.F., 1975. Dynamics of day and night aurora during substorms. Planetary and Space Science 23, 269–278.

Wang, B.X., Japkowicz, N., 2009. Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems 25, 1–20.

Wang, Q., Liang, J.M., Hu, Z.-J., Hu, H.-H., Zhao, H., Hu, H.-Q., Gao, X.-B., Yang, H.-G., 2010. Spatial texture based automatic classification of dayside Aurora in all-sky images. Journal of Atmospheric and Solar-Terrestrial Physics 72, 498–508.

Wan, W., Liu, H., Wang, L., Shi, G., Li, W.J., 2007. A hybrid HMM/SVM classifier for motion recognition using uIMU data. In: Proceedings of the IEEE International Conference on ROBIO, Sanya, Hainan, China, 15–18 December 2007. 115–120.

Yang, Q., Wu, X., 2006. 10 challenging problems in data mining research. International Journal of Information Technology and Decision Making 5, 597–604.

Yuan, F., 2008. A fast accumulative motion orientation model based on integral image for video smoke detection. Pattern Recognition Letters 29, 925–932.

Zhang, Q.-H., Dunlop, M.W., Lockwood, M., Liu, R.-Y., Hu, H.-Q., Yang, H.-G., Hu, Z.-J., Bogdanova, Y.V., Shen, C., Zhang, B.-C., Han, D.-S., Liu, S.-L., McCrea, I.W., Lester, M., 2010. Simultaneous observations of reconnection pulses at Cluster and their effects on the cusp aurora observed at the Chinese Yellow River Station. Journal of Geophysical Research 115, A10237, http://dx.doi.org/10.1029/2010JA015526.
