
Evolutionary Training of SVM for Multiple

Category Classification Problems with Self-

Adaptive Parameters

ANGEL KURI-MORALES

Instituto Tecnológico Autónomo de México

[email protected]

IVÁN MEJÍA-GUEVARA

Universidad Nacional Autónoma de México

[email protected]

Contents

I. SVM and GA

1. SVM

2. VGA

II. Multiple Category Classification

III. Experiments

IV. Conclusions

I. SVM and GA

1. SVM

(Figure: Network Architecture of SVM — a layer of m0 input data x1, ..., xm0, a hidden layer of m1 kernels K(x,x1), ..., K(x,xm1), a bias b, and a linear output neuron producing y.)

• SVMs are NNs with a hidden layer of nonlinear units

• The choice of the "correct" architecture is not an issue that determines the proper functionality of this method, as happens in other NNs (e.g., perceptrons)

I. SVM and GA

1. SVM

• There are many linear functions which solve the problem of linearly separable patterns

(Figure: separating hyperplane wᵀx + b = 0, with wᵀx + b > 0 on one side, wᵀx + b < 0 on the other, and margin of separation ρ.)

• SVMs find the optimal hyperplane (OHP): the one that maximizes the margin of separation between the nearest points of both classes

I. SVM and GA

1. SVM

• For nonlinearly separable patterns, a set of slack variables {ξi} is defined in order to identify the number of misclassified points

I. SVM and GA

1. SVM

Φ: x → φ(x)

• In this case, SVMs solve the problem by mapping the input space into a higher-dimensional feature space and finding a linear function that separates the classes in that space

I. SVM and GA

1. SVM

• SVM finds the OHP in the feature space by solving a Quadratic Optimization (QO) problem

• The "dual" problem is a better way to optimize because, among other things, the nonzero optimal points correspond to the support vectors

• The regularization parameter (C) reflects a trade-off between acceptable misclassification and the power of generalization of the network

Primal Problem:

$$\min_{\mathbf{w},\,\boldsymbol{\xi}}\ \Phi(\mathbf{w},\boldsymbol{\xi}) = \frac{1}{2}\,\mathbf{w}^{T}\mathbf{w} + C\sum_{i=1}^{N}\xi_{i}$$

s.t.:

$$d_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i} + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\qquad i = 1, 2, \ldots, N$$

Dual Problem:

$$\max_{\boldsymbol{\alpha}}\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}\,d_{i}d_{j}\,K(\mathbf{x}_{i},\mathbf{x}_{j})$$

s.t.:

$$\sum_{i=1}^{N}\alpha_{i}d_{i} = 0,\qquad 0 \le \alpha_{i} \le C,\quad i = 1, 2, \ldots, N$$
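As a rough numerical sketch (not the authors' code; all names here are ours), the dual objective Q(α) and its constraints can be evaluated for a precomputed kernel matrix K:

```python
import numpy as np

def dual_objective(alpha, d, K):
    # Q(alpha) = sum_i alpha_i
    #          - 1/2 * sum_i sum_j alpha_i alpha_j d_i d_j K(x_i, x_j)
    v = alpha * d                      # elementwise alpha_i * d_i
    return alpha.sum() - 0.5 * (v @ K @ v)

def constraints_satisfied(alpha, d, C, tol=1e-9):
    # dual constraints: sum_i alpha_i d_i = 0 and 0 <= alpha_i <= C
    return (abs(alpha @ d) <= tol
            and np.all(alpha >= -tol)
            and np.all(alpha <= C + tol))
```

A GA maximizes Q over the α_i (and C), using the constraint check to count how many conditions an individual satisfies.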

I. SVM and GA

1. SVM

C Parameter:

• It is traditionally set by the user

• C controls the tradeoff between the complexity of the machine and the number of non-separable points; it may, therefore, be viewed as a form of "regularization" parameter

I. SVM and GA

2. VGA

• GAs have been successfully used to solve numerical (and non-numerical) optimization problems

• In our experiments every variable was expressed in fixed-point format

• In order for C to be genetically determined, it was included in the genome as shown:

αi:  Sign (1 bit) | Integer (4 bits) | Decimal (20 bits)
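A minimal sketch of how such a 25-bit field could be decoded into a real value (the bit ordering and helper name are our assumptions, not taken from the paper):

```python
def decode_fixed_point(bits):
    # bits: 25-character bit string -- 1 sign bit, 4 integer bits, 20 decimal bits
    assert len(bits) == 25
    sign = -1.0 if bits[0] == '1' else 1.0
    integer = int(bits[1:5], 2)              # integer part: 0..15
    decimal = int(bits[5:], 2) / 2 ** 20     # fractional part in [0, 1)
    return sign * (integer + decimal)
```

Each genetic variable (every αi, and C) would occupy one such field in the genome.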

I. SVM and GA

2. VGA

• Once the initial population is generated, Vasconcelos' model is used. Basically, this model considers:

  • Deterministic coupling

  • Annular crossover

  • Full elitism
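A toy sketch of one generation under these three operators, assuming bit-string genomes (function names and details are ours; mutation is omitted):

```python
import random

def annular_crossover(p1, p2, rng):
    # treat genomes as rings and swap a random contiguous arc between them
    n = len(p1)
    start, length = rng.randrange(n), rng.randrange(1, n)
    c1, c2 = list(p1), list(p2)
    for k in range(length):
        i = (start + k) % n
        c1[i], c2[i] = p2[i], p1[i]
    return ''.join(c1), ''.join(c2)

def vasconcelos_generation(pop, fitness, rng):
    # deterministic coupling: the i-th best individual mates with the i-th worst
    pop = sorted(pop, key=fitness, reverse=True)
    n = len(pop)
    offspring = []
    for i in range(n // 2):
        offspring.extend(annular_crossover(pop[i], pop[n - 1 - i], rng))
    # full elitism: keep the n best individuals among parents and offspring
    return sorted(pop + offspring, key=fitness, reverse=True)[:n]
```

Full elitism guarantees that the best individual found so far is never lost between generations.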

(Figures illustrating deterministic coupling, annular crossover, and full elitism.)

I. SVM and GA

2. VGA

• GAs were originally conceived to solve unconstrained optimization problems

• However, SVM's dual problem is a constrained optimization problem

• Hence, it is necessary to modify the problem in order for it to be solved with a GA

• A penalty function F(x) is chosen to transform the original constrained maximization problem into an unconstrained one

$$F(x) = \begin{cases} -\left(Z - \displaystyle\sum_{i=1}^{s}\frac{Z}{t}\right), & \text{if } s \ne t \\[6pt] f(x), & \text{otherwise} \end{cases}$$


In the formula:

Z is a very large positive number (Z = 10^9)

In this problem,

• t corresponds to the number of support vectors + 1

• s is the number of constraints satisfied by the phenotype of the individual in the GA's population

II. Multicategory Classification

1. One-vs-One

• An SVM model is built for each pair of classes. This results in p(p−1)/2 SVM classifiers, where p is the number of classes in a specific problem.

• We used the majority voting scheme (MVS) to measure the accuracy of the one-vs-one method:

For a new example xi, class l wins a new vote if the classifier l_m (meaning "class l vs class m") says xi is in class l. Otherwise, the vote for class m is increased by one. After the p(p−1)/2 binary classifiers cast their votes, the MVS assigns xi to the class with the largest number of votes.
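The voting rule above can be sketched as follows (a simplified illustration; the classifier interface is our assumption):

```python
from itertools import combinations
from collections import Counter

def majority_vote(x, classifiers, classes):
    # classifiers[(l, m)] is the binary SVM for "class l vs class m";
    # it returns True when x is judged to belong to class l
    votes = Counter()
    for l, m in combinations(classes, 2):
        if classifiers[(l, m)](x):
            votes[l] += 1
        else:
            votes[m] += 1
    # assign x to the class with the largest number of votes
    return votes.most_common(1)[0][0]
```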

II. Multicategory Classification

1. One-vs-One

(Figure: pairwise decision boundaries among classes 1, 2, and 3.)

II. Multicategory Classification

2. One-vs-All

• In this strategy, p classifiers are used.

• All N objects are used in each classifier.

• The accuracy is measured using the winner-takes-all (WTA) strategy, where every new xi is assigned to the class with the largest output.
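A minimal sketch of the WTA rule (the decision-function interface is our assumption):

```python
def winner_takes_all(x, decision_functions):
    # decision_functions[c](x) is the real-valued output of the
    # "class c vs all" SVM; pick the class with the largest output
    return max(decision_functions, key=lambda c: decision_functions[c](x))
```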

II. Multicategory Classification

2. One-vs-All: Winner Takes All

(Figure: three one-vs-all classifiers A1, A2, A3 with outputs ρ1, ρ2, ρ3, where ρ1 > ρ2 and ρ2 > ρ3.)

III. Experiments: Data

• Three data sets were considered in this paper, all taken from the UCI Machine Learning Repository. In each case we used a training set (a random sample of size 85%) and a test set (the remaining 15%).

1. Lung Cancer: 3 classes, 100 instances (enriched with natural splines) and 55 attributes.

2. Wine Recognition: 3 kinds of wines, 178 instances and 13 attributes.

3. Iris Plant: 3 classes, 150 instances and 4 attributes.

III. Experiments: Parameters

• Probability of Crossover (Pc) was set to 0.9.

• Probability of Mutation (Pm) was set to 0.05.

• A Radial Basis Function kernel was chosen with σ² = 2, where σ is the width parameter of the kernel.

$$k(\mathbf{x}, \mathbf{x}_{i}) = \exp\left(-\frac{1}{2\sigma^{2}}\,\lVert \mathbf{x} - \mathbf{x}_{i} \rVert^{2}\right)$$
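With σ² = 2, the kernel can be computed directly (a straightforward sketch, not the authors' code):

```python
import math

def rbf_kernel(x, xi, sigma2=2.0):
    # k(x, x_i) = exp(-||x - x_i||^2 / (2 * sigma^2)), with sigma^2 = 2 as above
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma2))
```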

III. Experiments: Results

• The following table shows the classification accuracy obtained from the application of the one-vs-one and Majority Voting Scheme methodologies.

one-vs-one MVS

Problem        Training Accuracy   Test Accuracy   Average C
Iris           88.0%               88.0%           12.03
Wines          96.0%               96.0%           13.67
Lung Cancer    99.3%               96.7%           4.17

III. Experiments: Results

• The following table shows the classification accuracy obtained from the application of the one-vs-all and Winner Takes All methodologies.

one-vs-all WTA

Problem        Training Accuracy   Test Accuracy   Average C
Iris           97.6%               100.0%          12.01
Wines          94.7%               92.9%           11.00
Lung Cancer    92.0%               93.3%           12.00

Remarks

• We should keep in mind that there is a potentially infinite number of possible values for C, out of which, typically, the user has to select the most adequate one.

• However, as shown, an adequately chosen GA is able to relieve the user from this task.

IV. Conclusions

• The application of SVM to some multi-class classification problems resulted in good performance.

• The value of the regularization parameter was automatically found through the algorithm rather than set by hand.

• Hence, the theoretical advantages of SVMs may be fully exploited.

• This seems to solve (we are in the process of performing exhaustive tests) one of the most cumbersome issues regarding the practical implementation of SVMs.

• If C's determination can be automated (as we expect), then the theoretical advantages of SVMs may be fully exploited and the shortcoming mentioned above will be eliminated.
