Linear Neural Network Training Algorithms for Real-World Benchmark Problems
Transcript of Linear Neural Network Training Algorithms for Real-World Benchmark Problems
This article was downloaded by:[HEAL-Link Consortium]On: 23 April 2008Access Details: [subscription number 772810551]Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
International Journal of ComputerMathematicsPublication details, including instructions for authors and subscription information:http://www.informaworld.com/smpp/title~content=t713455451
LINEAR NEURAL NETWORK TRAININGALGORITHMS FOR REAL-WORLD BENCHMARKPROBLEMSK. Goulianas a; M. Adamopoulos; S. Katsavounis b; Ch. FRAGAKIS b; C. C.Tsouros ba Department of Informatics, Technological Educational Institute of Thessaloniki,Greece.b Faculty of Engineering, Aristotle University of Thessaloniki, Greece.
Online Publication Date: 01 January 2002To cite this Article: Goulianas, K., Adamopoulos, M., Katsavounis, S., FRAGAKIS,
Ch. and Tsouros, C. C. (2002) 'LINEAR NEURAL NETWORK TRAINING ALGORITHMS FOR REAL-WORLDBENCHMARK PROBLEMS', International Journal of Computer Mathematics, 79:11, 1149 - 1167To link to this article: DOI: 10.1080/00207160213945URL: http://dx.doi.org/10.1080/00207160213945
PLEASE SCROLL DOWN FOR ARTICLE
Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
This article maybe used for research, teaching and private study purposes. Any substantial or systematic reproduction,re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expresslyforbidden.
The publisher does not give any warranty express or implied or make any representation that the contents will becomplete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should beindependently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings,demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with orarising out of the use of this material.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
Intern. J. Computer Math., 2002, Vol. 79(11), pp. 1149–1167
LINEAR NEURAL NETWORK TRAININGALGORITHMS FOR REAL-WORLD BENCHMARK
PROBLEMS
K. GOULIANASa, M. ADAMOPOULOSa,b, S. KATSAVOUNISc Ch. FRAGAKISc
and C. C. TSOUROSc
aDepartment of Informatics, Technological Educational Institute of Thessaloniki, Greece;bDepartment of Informatics, University of Macedonia, Thessaloniki, Greece;
cFaculty of Engineering, Aristotle University of Thessaloniki, Greece
(Received 22 June 2001; In final form 1 July 2001)
This paper describes the Adaptive Steepest Descent (ASD) and Optimal Fletcher-Reeves (OFR) algorithms for linearneural network training. The algorithms are applied to well-known pattern classification and function approximationproblems, belonging to benchmark collection Proben1. The paper discusses the convergence behavior andperformance of the ASD and OFR training algorithms by computer simulations and compares the results withthose produced by linear-RPROP method.
Keywords: Neural nets; Training algorithms; Iterative methods
C.R. Categories: I.5.1, I.2.6, G.1.3
1 INTRODUCTION
Linear feedforward neural network architectures have been proved capable of solving sys-
tems of linear equations [2, 5–7, 14, 15], and pattern classification problems [8]. Most of
these applications use LMS and Batch-LMS training [4, 17, 18], and require a selection
of appropriate parameters by the user, executed with a trial-and-error process. In real
world problems, it is essential to consider learning methods with a good average perfor-
mance. This paper describes some methods that have been shown to accelerate the conver-
gence of the learning phase, and that do not require the choice of critical parameters, like the
learning rate or the momentum. A simple two-layer feedforward neural network with linear
neuron functions is studied. Batch-LMS training is extended to implement steepest descent
algorithms, like the Adaptive Steepest Descent (ASD) and the Optimal Fletcher-Reeves
(OFR) algorithm. The performance of these algorithms is compared to linear-RPROP
training [9], a technique for optimizing the backpropagation training [10, 11], which uses
ISSN 0020-7160 print; ISSN 1029-0265 online # 2002 Taylor & Francis LtdDOI: 10.1080=0020716021000030684
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
a fixed update size not influenced by the magnitude of the gradient. Instead, only the sign of
the derivative is used to find the proper update direction. Those three methods are applied to
well-known benchmarks from the Proben1 collection. Proben1 contains data from the UCI
repository of machine learning databases for 9 pattern classification problems and 3 function
approximation problems. The 9 pattern classification problems are the following: cancer (a
dataset for diagnosis of breast cancer, originally obtained from the University of Wisconsin
Hospital, Madison, from Dr. William H. Wolberg), card (a dataset for approval or non-ap-
proval of a credit card to a customer), diabetes (a dataset for diagnosis of diabetes of Pima
Indians), gene (a dataset for detection of intron/exon boundaries in nucleotide sequences),
glass (a dataset for classification of glass types), heart (a dataset for heart disease predic-
tion), horse (a dataset for prediction of the fate of a horse that has a colic), soybean (a da-
taset for recognition of 19 diseases of soybeans), and thyroid (a dataset for diagnosis of
thyroid hyper- or hypofunction). The 3 function approximation problems are the following:
building (a dataset for prediction of energy consumption in a building), flare (a dataset for
prediction of sonar flares), and hearta (the analogue version of the heart disease diagnosis
problem). All the datasets are partitioned into training, validation, and test set, while the size
of the training, validation, and test set data files is 50%, 25%, and 25% respectively. Since
results may vary for different partitionings, Proben1 contains three different permutations of
each dataset. For instance, the problem cancer is available in three datasets cancer1, can-
cer2, and cancer3, which differ only in the ordering of the patterns. Validation set is
used as a pseudo test set in order to evaluate the quality of the network during training, a
method called cross-validation, which avoids overfitting, a problem created when many
training examples are available causing the loss of much of the regularities needed for
good generalization [13]. The method used for cross-validation is early stopping
[1, 12, 16], where training proceeds until a minimum of the error on validation set (and
not the training set) is reached.
The formulation of the above problems is as follows: Given a set of input patterns
xi ¼ ½xi1; xi
2; . . . ; xin�
T and the targets di ¼ ½di1; di
2; . . . ; dip�
T; i ¼ 1; 2; . . . ;m the task is to
find a set of weights W 2 <n�p that provides the best fit between the input/output pairs
ðxi; diÞ; i ¼ 1; 2; . . . ;m. A simple linear architecture for the above mapping is a hetero-asso-
ciative two-layer feedforward neural network, with n inputs and p output neurons shown in
Figure 1. The m patterns are presented to the input layer in a cyclical fashion and an output
FIGURE 1 Linear neural network for classification and function approximation problems.
1150 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
yi ¼ ½ yi1; yi
2; . . . ; yip�
T is generated. The goal is to minimize the mean square error, or the cost
function
EðW Þ ¼Xp
j¼1
Eðw jÞ ¼1
2
Xp
j¼1
Xm
i¼1
ðdij yi
jÞ2¼
1
2
Xm
i¼1
Xp
j¼1
dij
Xn
k¼1
xikw
jk
!2
¼1
2X TW D�� ��2
ð1Þ
With X ¼ ½x1; x2; . . . ; xm�, and D ¼ ½d1; d2; . . . ; dm�;w j the weight vector of the jth output
neuron, and E(w j) its cost function. Using a general gradient approach, any minimum of the
cost function in Eq. (1) must satisfy HEðW Þ ¼ X ðX TW DÞ ¼ 0, which can be rewritten as
XX TW ¼ XD ð2Þ
or the equivalent system BW ¼ C with B ¼ XX T;B 2 <n�n and C ¼ XD;C 2 <n. Equation
(2) consists of p systems of normal equations, with B ¼ XX T positive definite and symmetric.
The solution of systems (2) for a non-singular matrix X T with m n, gives the unique least
mean square solution W ¼ XþD with Xþ ¼ ðXX TÞ1X the Moore-Penrose generalized
inverse [3].
The cost functions E(w j) defined in (1) are quadratic in the weights w j and for m n they
define convex hyper-paraboloidal surfaces with a single minimum, the global minimum, the
solutions of Eq. (2), which are unique, since the Hessian matrix of E(w j) at
w j;H2Eðw jÞ ¼ XX T is positive definite.
Since the mean squared error defined in (1) depends on the number of output coefficients,
and on the range of the output values used, Prechelt [8] suggests the use of the squared error
percentage, a normalization of these factors, as follows:
EðW Þ ¼Xp
j¼1
E w j� �
¼ 100 �ðomax ominÞ
m � p
Xp
j¼1
Xm
i¼1
dij yi
j
� �2
ð3Þ
with omin and omax to be the minimum and maximum value of output coefficients, p the num-
ber of output nodes, and m the number of patterns in the dataset.
The material is organized as follows. In Section 2 we simulate the ASD and OFR
algorithm with the above architecture and discuss convergence issues for obtaining estimates
of the optimal solution. In Section 3, we compare the performance of the ASD and OFR
training algorithm with linear-RPROP in the solution of the pattern classification and
function approximation benchmarking problems. Finally, in Section 4, we draw some final
conclusions.
2 ASD AND OFR METHODS FOR LINEAR NEURAL NETWORK TRAINING
With zero or random in [0:01, 0.01] initial weights wjk; j ¼ 1; 2; . . . ; p; k ¼ 1; 2; . . . ; n the
train set patterns xi ¼ ½xi1; xi
2; . . . ; xin�
T; i ¼ 1; 2; . . . ;m are presented to the network in a
NEURAL NET TRAINING 1151
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
cyclical fashion. Thus, with the presentation of the ith pattern, the outputs yðtþ1;iÞj ;
j ¼ 1; 2; . . . ; p; will be
yðtþ1;iÞj ¼ wðt;jÞxi ¼
Xn
k¼1
wðt;jÞk xi
k ð4Þ
with ðt þ 1; iÞ the step i of the training cycle t þ 1. Delta Rule is applied, and the discrepancy
between desired and calculated output dij and y
ðtþ1;iÞj , for every pattern i, i ¼ 1; 2; . . . ;m and
output neuron j, j ¼ 1; 2; . . . ; p is
eðtþ1;iÞj ¼ di
j yðtþ1;iÞj ð5Þ
We define the batch error dðtþ1Þk for every input neuron k, k ¼ 1; 2; . . . ; n to be
dðtþ1Þk ¼
Xm
i¼1
eðtþ1;iÞj xi
k ¼Xm
i¼1
dij y
ðtþ1;iÞj
� �xi
k ð6Þ
or in a matrix-vector form
dðtþ1Þ¼Xm
i¼1
eðtþ1;iÞj xi ¼
Xm
i¼1
dij y
ðtþ1;iÞj
� �xi ð7Þ
and the adaptive learning rate aðtþ1Þk as
aðtþ1Þk ¼
dðtþ1Þk dðtþ1Þ
k
dðtþ1Þk X TXdðtþ1Þ
k
ð8Þ
2.1 The Adaptive Steepest Descent (ASD) Method
The connection update after the presentation of all the training set patterns xi; i ¼ 1; 2; . . . ;m
at training cycle t þ 1, could have the form
wðtþ1;jÞk ¼ w
ðt;jÞk þ aðtþ1Þ
k
Xm
i¼1
ðdij y
ðtþ1;iÞj Þxi
k
¼ wðt;jÞk þ
dðtþ1Þdðtþ1Þ
dðtþ1ÞX TXdðtþ1Þ
Xm
i¼1
dij
Xn
q¼1
wðt;jÞq xi
q
!xi
k
¼ wðt;jÞk þ
dðtþ1Þdðtþ1Þ
dðtþ1ÞCdðtþ1Þ
Xm
i¼1
cik
Xn
q¼1
wðt;jÞq bi
q
!ð9Þ
with k ¼ 1; 2; . . . ; n and bkq; ci
k the corresponding elements of B and C, as defined in (2).
The operation of the ANN in Figure 1 using ASD method with the adaptive learning
rate aðtþ1Þk defined in (2) simulates the Adaptive Steepest Descent Method. Proof can be
found in [2].
1152 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
2.2 The Optimal Fletcher-Reeves (OFR) Method
We define bðtþ1Þ to be
bðtþ1Þ¼
dðtþ1Þdðtþ1Þ
dðtÞdðtÞð10Þ
and the connection update after the presentation of all the training set patterns
xi; i ¼ 1; 2; . . . ;m at training cycle t þ 1, has the form
wðtþ1;jÞk ¼ w
ðt;jÞk þ aðtþ1Þ
k
Xm
i¼1
dij y
ðtþ1;iÞj
� �xi
k þ bðtþ1ÞDwðt;jÞk
¼ wðt;jÞk
dðtþ1Þdðtþ1Þ
dðtÞCdðtÞXm
i¼1
dij
Xn
q¼1
wðt;jÞq xi
q
!xi þ
dðtþ1Þdðtþ1Þ
dðtÞdðtÞDw
ðt;jÞk
¼ wðt;jÞk þ
dðtþ1Þdðtþ1Þ
dðtþ1ÞCdðtþ1Þ
Xm
i¼1
cik
Xn
q¼1
wðt;jÞq bi
q
!þdðtþ1Þdðtþ1Þ
dðtÞdðtÞDw
ðt;jÞk ð11Þ
with k ¼ 1; 2; . . . ; n and bkq; ci
k the corresponding elements of B and C, as defined in (2). The
operation of the ANN in Figure 1 using OFR method with the adaptive learning rate aðtþ1Þk
defined in (2) simulates the Optimal Fletcher-Reeves Method.
3 EXPERIMENTAL STUDY
In order to check the performance and the convergence behavior of the proposed algorithm
the benchmark problems used were taken from the Proben1 benchmark set, with the standard
Proben1 benchmarking rules. The error measures reported include: the training set error
(mean and standard deviation of minimum squared error percentage on training set, reached
at any time during training), validation set error (mean and standard deviation of minimum
squared error percentage on validation set, reached at any time during training), test set error
(mean and standard deviation of minimum squared error percentage on test set, at point of
minimum validation set error), and the test set classification error (i.e. the percentage of in-
correctly classified examples). For the classification problems, winner-takes-all method was
used to determine the classification, i.e. the output with the highest activation designates the
class, while in approximation problems, a threshold of 0.3 was used in the output, and the
network accepts an output as 0, if it is below 0.3, and as 1, if it is above 0.7. In order to
measure the training time, we also report the number of epochs used, the number of relative
epochs, i.e. the epochs needed to reach the minimum validation error, and the connection
traversals. One stopping criterion is the loss of generality. The generalization loss at epoch
t is defined as the relative increase of the validation squared error percentage over the mini-
mum so far in percent
GLðtÞ ¼ 100 �EvaðtÞ
mint0�t
Evaðt0Þ 1
0@
1A ð12Þ
NEURAL NET TRAINING 1153
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
and the algorithm stops as the generalization loss exceeds a threshold a ¼ 5. Another stop-
ping criterion is the training progress, defined in terms of a training strip. A training strip of
length k is a sequence of k epochs, and the training progress is how much is the average train-
ing error during the strip larger than the minimum error during the strip
PkðtÞ ¼ 1000 �
Pt02kkþ1���t Etrðt
0Þ
k � mint02kkþ1���t Etrðt0Þ 1
� �ð13Þ
Since training involves some kind of random generalization, in order to make reliable state-
ments about the performance of the three algorithms, we used 20 runs on each problem, for
each one of the three datasets. The data reported include the mean and standard deviation of
the above error and training measures for these 20 runs, along with generalization loss at end
of training.
The results of ASD and OFR training of the linear network in Figure 1 with the classifica-
tion and approximation problems, are compared to linear-RPROP, with the following
parameters used by Prechelt [8]: nþ ¼ 1:2; n ¼ 0:5; D0 2 0:005 � � � 0:02; Dmax ¼ 50;Dmin ¼ 0, and initial weights randomly chosen in ½0:01; 0:01�.
The first criterion checked is speed of convergence, i.e. the number of epochs required by
each algorithm to reach one of the three stopping criteria. In most problems, the best results
are achieved by OFR method, which outperforms the other two algorithms in 23 out of 30
datasets for the classification problems, and in 10 out of 12 datasets for the approximation
problems. Graphs 3.1 and 3.2 show the mean value of the epochs required by each algorithm
GRAPH 3.1 Mean value of iterations needed for convergence of classification problems (30 datasets).
GRAPH 3.2 Best run iterations needed for convergence of classification problems (30 datasets).
1154 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
TA
BL
EII
I.1
Res
ult
sfo
rC
lass
ifica
tion
Pro
ble
ms.
Epo
chs
Rel
eva
nt
epo
chs
Connec
tion
trave
rsals
Tra
inin
gse
ter
ror
Va
lid
ati
on
set
erro
rTes
tse
ter
ror
Tes
tse
tcl
ass
ific
ati
on
Gen
era
lity
loss
Pro
ble
mM
eth
od
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
Can
cer1
RP
RO
P8
51
47
31
88
51
42
0.7
50
.01
20
.21
0.1
21
8.6
10
.13
15
.15
0.5
00
.18
0.2
8A
SD
40
02
31
40
02
0.7
40
.00
20
.20
0.0
01
8.6
30
.00
16
.09
0.0
00
.16
0.0
1O
FR
20
19
02
01
20
.74
0.0
02
0.0
70
.04
18
.61
0.0
51
5.7
90
.39
0.8
30
.28
Can
cer2
RP
RO
P8
42
36
31
98
42
31
8.7
10
.18
20
.56
0.1
02
2.0
30
.19
19
.03
0.4
51
.12
2.6
8A
SD
50
03
01
50
01
8.6
50
.00
20
.51
0.0
02
2.0
20
.01
18
.39
0.0
00
.13
0.0
1O
FR
24
41
10
24
41
8.6
40
.00
20
.47
0.0
22
2.1
40
.03
18
.94
0.1
30
.37
0.0
8
Can
cer3
RP
RO
P8
51
47
31
88
51
42
0.7
50
.01
20
.21
0.1
21
8.6
10
.13
15
.15
0.5
00
.18
0.2
8A
SD
48
22
10
48
22
0.4
30
.00
18
.70
0.0
02
0.3
40
.01
17
.66
0.2
50
.46
0.0
3O
FR
23
51
35
23
52
0.4
30
.01
18
.75
0.0
32
0.2
80
.07
16
.88
0.2
80
.33
0.1
6
Car
d1
RP
RO
P6
49
27
56
49
9.8
30
.01
8.9
30
.14
10
.67
0.2
01
3.7
10
.74
4.1
91
.23
AS
D8
83
13
08
83
10
.04
0.0
18
.21
0.0
11
0.4
70
.01
13
.95
0.0
05
.08
0.0
6O
FR
49
13
13
04
91
31
0.0
50
.07
28
.20
0.0
11
0.4
60
.01
13
.95
0.0
05
.90
0.8
6
Car
d2
RP
RO
P5
51
92
46
55
19
8.3
80
.35
10
.76
0.2
41
4.9
30
.26
19
.37
0.3
84
.38
1.4
9A
SD
23
14
21
12
31
48
.32
0.0
09
.71
0.0
11
3.6
70
.03
19
.77
0.0
05
.02
0.0
4O
FR
85
30
19
08
53
08
.34
0.0
39
.71
0.0
11
3.7
20
.02
19
.77
0.0
05
.59
0.7
7
Car
d3
RP
RO
P1
04
11
45
13
10
41
19
.47
0.0
08
.39
0.1
01
2.6
00
.21
14
.60
0.6
51
.75
1.1
9A
SD
15
84
26
11
58
49
.64
0.0
07
.68
0.0
11
2.3
40
.01
15
.45
0.2
95
.05
0.0
4O
FR
76
35
20
27
63
59
.65
0.0
47
.67
0.0
11
2.3
40
.03
15
.51
0.4
25
.69
0.8
1
Dia
bet
es1
RP
RO
P1
03
15
85
19
10
31
52
0.3
40
.01
22
.58
0.0
32
4.0
60
.09
38
.65
0.6
00
.19
0.1
6A
SD
30
01
00
30
02
0.3
20
.00
22
.54
0.0
02
4.0
20
.00
38
.54
0.0
00
.54
0.0
1O
FR
20
18
02
01
20
.31
0.0
02
2.5
10
.00
23
.89
0.0
13
7.5
00
.00
0.8
80
.04
Dia
bet
es2
RP
RO
P1
00
16
99
16
10
01
62
1.0
50
.01
20
.87
0.0
62
4.2
90
.08
37
.31
0.7
80
.02
0.0
3A
SD
30
01
50
30
02
1.0
30
.00
20
.71
0.0
02
3.9
80
.00
35
.94
0.0
00
.16
0.0
1O
FR
20
01
11
20
02
1.0
20
.00
20
.72
0.0
02
4.1
60
.09
36
.38
0.7
20
.22
0.0
2
Dia
bet
es3
RP
RO
P106
11
35
8106
11
20.4
00.0
123.0
30.1
422.6
40.1
737.0
61.2
42.5
50.5
7A
SD
25
03
02
50
20
.38
0.0
02
2.8
70
.01
22
.04
0.0
13
3.7
40
.47
3.7
00
.03
OF
R2
78
30
27
82
0.3
90
.01
22
.89
0.0
12
2.0
30
.01
33
.28
0.5
33
.65
0.2
2
Gen
e1R
PR
OP
30
01
86
30
02
1.4
80
.00
25
.55
0.0
52
5.0
30
.04
39
.06
0.6
00
.24
0.2
0A
SD
19
22
01
92
21
.48
0.0
02
4.7
30
.10
24
.46
0.0
64
0.7
00
.68
3.5
20
.43
OF
R2
02
20
20
22
1.4
80
.00
24
.74
0.0
82
4.5
30
.07
40
.66
0.6
63
.46
0.3
4
NEURAL NET TRAINING 1155
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
TA
BL
EII
I.1
(co
nti
nued
)
Epo
chs
Rel
eva
nt
epo
chs
Connec
tion
trave
rsals
Tra
inin
gse
ter
ror
Va
lid
ati
on
set
erro
rTes
tse
ter
ror
Tes
tse
tcl
ass
ific
ati
on
Gen
era
lity
loss
Pro
ble
mM
eth
od
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
Gen
e2R
PR
OP
30
01
95
30
02
1.6
20
.00
25
.20
0.0
32
4.9
90
.04
39
.74
0.5
00
.15
0.0
9A
SD
15
02
01
50
21
.62
0.0
02
4.5
40
.10
24
.54
0.0
94
1.1
40
.58
2.7
50
.39
OF
R1
50
20
15
02
1.6
20
.00
24
.54
0.1
12
4.5
70
.11
41
.05
0.7
72
.75
0.4
5
Gen
e3R
PR
OP
30
01
85
30
02
1.8
80
.00
24
.33
0.0
52
5.3
70
.07
41
.93
0.6
40
.21
0.2
0A
SD
15
03
01
50
21
.88
0.0
02
4.0
10
.08
24
.88
0.0
84
2.3
70
.41
1.4
50
.32
OF
R1
50
30
15
02
1.8
80
.00
24
.00
0.0
62
4.8
60
.06
42
.50
0.3
41
.50
0.2
6
Gla
ss1
RP
RO
P1
24
14
24
51
24
14
8.8
40
.01
9.7
10
.07
10
.13
0.1
24
7.6
73
.05
3.7
50
.69
AS
D1
57
34
41
15
73
8.8
00
.00
9.9
50
.01
9.7
70
.01
47
.17
0.0
01
.65
0.0
6O
FR
75
14
29
87
51
48
.79
0.0
29
.86
0.0
89
.80
0.1
14
3.3
02
.07
3.6
81
.95
Gla
ss2
RP
RO
P3
39
16
33
39
8.7
50
.16
10
.27
0.1
61
0.3
00
.14
55
.91
2.3
96
.19
1.0
9A
SD
25
07
02
50
8.6
20
.01
10
.54
0.0
11
0.3
70
.01
54
.72
0.0
05
.40
0.1
4O
FR
13
35
01
33
8.7
00
.11
10
.59
0.0
21
0.4
80
.01
52
.83
0.0
06
.16
0.9
7
Gla
ss3
RP
RO
P1
24
22
26
12
12
42
28
.72
0.0
29
.36
0.0
51
1.1
80
.20
60
.77
4.1
32
.03
0.5
5A
SD
18
23
46
21
82
38
.67
0.0
09
.41
0.0
01
0.9
40
.01
56
.60
0.0
01
.36
0.0
4O
FR
66
22
28
76
62
28
.68
0.0
69
.41
0.0
31
1.0
10
.07
56
.31
1.2
64
.07
4.4
0
Hea
rt1
RP
RO
P1
38
15
49
91
38
15
11
.19
0.0
11
3.2
50
.06
14
.31
0.0
52
1.0
80
.41
1.2
20
.55
AS
D1
61
21
60
21
61
21
1.2
10
.00
13
.13
0.0
01
4.1
30
.00
20
.43
0.0
00
.01
0.0
1O
FR
75
17
69
15
75
17
11
.19
0.0
21
3.1
10
.02
14
.08
0.0
22
0.4
80
.13
0.0
60
.06
Hea
rt2
RP
RO
P1
89
24
16
82
61
89
24
11
.67
0.0
21
2.2
10
.02
13
.53
0.0
31
6.5
00
.22
0.1
50
.11
AS
D2
16
32
15
32
16
31
1.6
60
.00
12
.22
0.0
01
3.6
00
.00
16
.52
0.0
00
.00
0.0
0O
FR
81
23
72
23
81
23
11
.64
0.0
21
2.2
20
.01
13
.63
0.0
41
6.5
40
.10
0.0
70
.08
Hea
rt3
RP
RO
P1
41
13
85
48
14
11
31
1.1
10
.00
10
.73
0.0
61
6.4
00
.11
23
.18
0.9
00
.42
0.4
8A
SD
18
72
13
72
18
72
11
.11
0.0
01
0.6
10
.00
16
.27
0.0
02
2.6
10
.00
0.0
90
.02
OF
R8
72
15
71
08
72
11
1.0
90
.01
10
.56
0.0
21
6.2
50
.04
22
.68
0.3
20
.58
0.3
2
Hea
rtc1
RP
RO
P1
20
27
82
40
12
02
71
0.2
00
.15
9.6
20
.12
16
.47
0.6
42
0.0
00
.97
0.8
21
.40
AS
D1
55
01
55
01
55
01
0.1
80
.00
9.6
10
.00
16
.00
0.0
02
0.0
00
.00
0.0
00
.00
OF
R6
61
34
41
86
61
31
0.1
60
.01
9.5
70
.03
16
.04
0.0
71
9.4
40
.66
0.4
00
.25
Hea
rtc2
RP
RO
P9
66
12
61
69
66
11
1.7
80
.93
16
.58
0.4
36
.66
0.9
83
.79
1.9
05
.38
2.2
5A
SD
17
00
28
11
70
01
1.2
30
.00
16
.54
0.0
16
.15
0.0
13
.23
0.6
62
.87
0.0
4O
FR
72
19
22
27
21
91
1.2
20
.01
16
.53
0.0
46
.22
0.1
02
.46
0.4
93
.41
0.4
1
Hea
rtc3
RP
RO
P2
06
13
22
06
11
.12
0.6
21
3.9
90
.35
13
.21
0.4
11
3.7
51
.44
8.3
13
.33
1156 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
AS
D2
42
50
24
21
0.4
40
.06
13
.12
0.0
31
2.0
80
.02
15
.44
0.6
66
.06
0.5
3O
FR
20
05
02
00
10
.33
0.0
21
3.1
10
.03
12
.09
0.0
31
5.7
20
.54
8.7
50
.50
Ho
rse1
RP
RO
P3
06
13
43
06
11
.30
0.2
01
5.5
50
.24
12
.87
0.3
32
6.2
02
.60
6.2
50
.64
AS
D3
22
50
32
21
1.2
50
.05
15
.21
0.0
41
2.7
60
.05
26
.49
1.2
35
.59
0.3
6O
FR
29
25
02
92
11
.10
0.0
71
5.2
00
.05
12
.75
0.0
42
6.9
51
.09
6.8
40
.82
Ho
rse2
RP
RO
P3
27
14
33
27
8.9
00
.21
15
.73
0.2
71
7.3
40
.50
36
.50
1.3
15
.60
0.9
6A
SD
60
01
51
60
08
.36
0.0
11
5.6
80
.02
16
.62
0.0
63
5.8
00
.54
5.2
20
.13
OF
R2
73
13
12
73
8.4
40
.06
15
.69
0.0
31
6.6
60
.08
35
.51
0.5
16
.82
1.3
1
Ho
rse3
RP
RO
P2
27
92
22
71
0.7
30
.42
15
.43
0.2
81
5.2
40
.34
31
.87
1.8
25
.89
0.9
0A
SD
25
17
12
51
10
.26
0.0
31
5.3
30
.04
15
.13
0.0
53
3.2
00
.67
5.3
00
.29
OF
R2
00
71
20
01
0.2
41
0.0
31
5.3
10
.04
15
.13
0.0
53
3.4
90
.75
6.0
60
.35
Soy
bea
n1
RP
RO
P5
43
15
42
05
15
43
15
0.6
50
.00
0.9
80
.00
1.1
60
.00
9.5
70
.32
0.2
60
.11
AS
D1
19
82
01
19
82
01
19
82
00
.67
0.0
00
.96
0.0
01
.15
0.0
09
.35
0.1
80
.00
0.0
1O
FR
82
17
48
02
81
82
17
40
.67
0.0
00
.96
0.0
01
.15
0.0
19
.20
0.2
80
.07
0.0
8
Soy
bea
n2
RP
RO
P4
91
18
47
42
84
91
18
0.8
00
.00
0.8
10
.00
1.0
50
.00
4.1
20
.00
0.0
50
.04
AS
D1
01
31
51
01
21
51
01
31
50
.82
0.0
00
.82
0.0
01
.07
0.0
04
.12
0.0
00
.00
0.0
1O
FR
72
85
47
18
54
72
85
40
.82
0.0
00
.82
0.0
11
.07
0.0
14
.12
0.0
00
.06
0.0
9
Soy
bea
n3
RP
RO
P5
18
20
49
83
45
18
20
0.7
80
.00
0.9
60
.00
1.0
40
.00
7.0
90
.13
0.0
40
.03
AS
D1
12
82
51
12
82
51
12
82
50
.80
0.0
00
.96
0.0
01
.02
0.0
06
.53
0.1
80
.00
0.0
1O
FR
75
96
07
52
60
75
96
00
.79
0.0
00
.96
0.0
11
.03
0.0
16
.59
0.3
10
.09
0.1
3
Thy
roid
1R
PR
OP
24
65
02
43
51
24
65
03
.91
0.0
13
.95
0.0
24
.08
0.0
26
.53
0.0
30
.03
0.0
3A
SD
81
01
18
10
11
81
01
14
.03
0.0
04
.17
0.0
04
.22
0.0
06
.56
0.0
00
.01
0.0
1O
FR
34
61
21
34
41
20
34
61
21
3.9
70
.04
4.0
50
.09
4.1
30
.08
6.5
50
.02
0.0
10
.01
Thy
roid
2R
PR
OP
25
14
02
48
39
25
14
04
.10
0.0
13
.68
0.0
23
.86
0.0
26
.38
0.0
00
.05
0.0
5A
SD
91
41
09
13
10
91
41
04
.23
0.0
03
.81
0.0
03
.99
0.0
06
.38
0.0
00
.01
0.0
1O
FR
45
31
10
45
21
10
45
31
10
4.1
70
.04
3.7
70
.04
3.9
50
.04
6.3
80
.00
0.0
10
.01
Thy
roid
3R
PR
OP
28
06
52
75
67
28
06
54
.01
0.0
13
.48
0.0
14
.19
0.0
17
.23
0.0
20
.05
0.0
3A
SD
90
47
90
37
90
47
4.1
60
.00
3.6
50
.00
4.3
30
.00
7.1
70
.00
0.0
20
.01
OF
R4
24
12
64
22
12
54
24
12
64
.08
0.0
63
.56
0.0
64
.25
0.0
57
.22
0.0
60
.02
0.0
2
NEURAL NET TRAINING 1157
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
TA
BL
EII
I.2
Res
ult
sfo
rA
pp
rox
imat
ion
Pro
ble
ms.
Ep
och
sR
elev
ant
epoch
sC
on
nec
tio
ntr
ave
rsals
Tra
inin
gse
ter
ror
Va
lid
ati
on
set
erro
rTes
tse
ter
ror
Tes
tse
tcl
ass
ific
ati
on
Gen
era
lity
loss
Pro
ble
mM
eth
od
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
mea
nst
dev
Buil
din
g1
RP
RO
P3
48
35
34
43
53
48
35
0.3
40
.00
0.3
70
.00
0.3
50
.00
0.2
90
.00
0.0
30
.03
AS
D5
44
24
54
42
45
44
24
0.3
40
.00
0.3
70
.00
0.3
50
.00
0.2
90
.00
0.0
00
.00
OF
R3
43
13
03
42
12
93
43
13
00
.33
0.0
00
.37
0.0
00
.34
0.0
00
.29
0.0
00
.00
0.0
1
Buil
din
g2
RP
RO
P3
40
26
33
52
73
40
26
0.3
40
.00
0.3
70
.00
0.3
50
.00
0.2
90
.00
0.0
40
.02
AS
D5
25
31
52
53
15
25
31
0.3
40
.00
0.3
70
.00
0.3
50
.00
0.2
90
.00
0.0
00
.00
OF
R3
67
11
83
66
11
83
67
11
80
.33
0.0
00
.37
0.0
00
.34
0.0
00
.29
0.0
00
.00
0.0
0
Buil
din
g3
RP
RO
P3
51
23
34
12
23
51
23
0.3
50
.00
0.3
50
.00
0.3
40
.00
0.2
90
.00
0.0
20
.01
AS
D4
62
44
46
24
44
62
44
0.3
50
.00
0.3
50
.00
0.3
50
.00
0.2
90
.00
0.0
10
.01
OF
R3
04
59
29
85
93
04
59
0.3
50
.00
0.3
50
.00
0.3
50
.00
0.2
50
.05
0.0
40
.07
Fla
re1
RP
RO
P2
42
08
62
42
00
.39
0.0
20
.34
0.0
10
.55
0.0
33
.84
0.4
56
.94
4.8
2A
SD
50
10
50
0.5
10
.03
0.4
30
.03
0.7
00
.04
5.2
60
.00
0.0
00
.00
OF
R5
01
05
00
.52
0.0
30
.43
0.0
20
.70
0.0
35
.26
0.0
00
.00
0.0
0
Fla
re2
RP
RO
P1
92
07
61
92
00
.46
0.0
30
.48
0.0
20
.31
0.0
12
.56
0.2
46
.29
4.6
7A
SD
50
10
50
0.6
10
.03
0.5
90
.03
0.3
30
.02
3.0
10
.00
0.0
00
.00
OF
R5
01
05
00
.61
0.0
30
.59
0.0
30
.33
0.0
23
.01
0.0
00
.00
0.0
0
Fla
re3
RP
RO
P3
41
91
17
34
19
0.4
10
.03
0.4
70
.02
0.3
60
.02
3.0
10
.12
4.6
36
.01
AS
D5
01
05
00
.58
0.0
20
.57
0.0
20
.43
0.0
23
.76
0.0
00
.00
0.0
0
1158 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
OF
R5
01
05
00
.58
0.0
30
.58
0.0
30
.44
0.0
33
.76
0.0
00
.00
0.0
0
Hea
rta1
RP
RO
P6
35
71
71
36
35
74
.85
1.3
35
.07
1.0
65
.25
1.1
81
4.0
74
.72
8.4
76
.09
AS
D2
25
48
12
25
43
.84
0.0
04
.36
0.0
14
.65
0.0
21
0.9
20
.44
2.5
90
.26
OF
R8
72
31
21
08
72
33
.84
0.0
14
.35
0.0
14
.62
0.0
81
0.8
50
.52
3.1
70
.65
Hea
rta2
RP
RO
P9
81
29
01
19
81
24
.17
0.0
14
.27
0.0
34
.19
0.0
11
0.6
90
.57
0.1
00
.12
AS
D1
76
31
76
31
76
34
.18
0.0
04
.25
0.0
04
.19
0.0
01
0.8
70
.00
0.0
10
.01
OF
R7
51
97
31
97
51
94
.17
0.0
14
.25
0.0
14
.18
0.0
11
0.5
90
.21
0.0
10
.02
Hea
rta3
RP
RO
P9
53
18
13
49
53
14
.09
0.0
84
.15
0.0
64
.59
0.1
21
2.2
21
.10
0.9
22
.21
AS
D1
43
31
43
31
43
34
.08
0.0
04
.12
0.0
04
.58
0.0
01
2.1
10
.16
0.0
20
.02
OF
R5
87
53
95
87
4.0
60
.01
4.1
00
.02
4.5
60
.02
11
.69
0.3
10
.07
0.1
1
Hea
rtac
1R
PR
OP
75
41
69
41
75
41
4.1
00
.09
4.7
10
.04
2.7
30
.14
5.0
50
.82
2.7
44
.35
AS
D1
29
31
28
21
29
34
.05
0.0
04
.67
0.0
02
.65
0.0
15
.33
0.0
00
.03
0.0
2O
FR
74
24
71
25
74
24
4.0
50
.01
4.6
60
.02
2.6
80
.03
4.9
10
.62
0.0
40
.05
Hea
rtac
2R
PR
OP
13
69
41
36
4.2
31
.18
5.4
40
.93
4.8
11
.10
11
.23
3.2
51
3.9
71
0.6
3A
SD
50
20
50
3.8
90
.02
4.6
70
.04
4.0
70
.04
9.4
70
.41
15
.25
0.5
9O
FR
50
20
50
3.8
80
.02
4.6
70
.02
4.0
70
.03
9.4
70
.41
15
.10
0.3
4
Hea
rtac
3R
PR
OP
13
10
85
13
10
4.0
41
.28
5.8
90
.79
5.9
90
.82
16
.42
3.7
71
2.4
46
.71
AS
D5
02
05
03
.24
0.0
15
.10
0.0
45
.45
0.0
21
4.9
50
.69
6.6
20
.21
OF
R5
02
05
03
.24
0.0
15
.08
0.0
45
.45
0.0
31
5.0
90
.97
6.8
10
.25
NEURAL NET TRAINING 1159
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
for every dataset. OFR also requires the lowest number of relevant epochs, and connection
traversals. These results, are shown in Tables III.1 and III.2.
We also report some results of the above error and training measures of the 20 runs for
each of the three datasets of each problem, as it concerns the ‘‘best run’’, i.e. the one with
the lowest validation set error. These results of linear-RPROP, ASD and OFR training also
show that OFR outperforms all other algorithms in 19 out of 30 datasets in classification pro-
blems, and in 9 out of 12 datasets in approximation problems, while OFR needs the lowest
number of relevant epochs and connection traversals. Results, are shown in Graphs 3.3 and
3.4 and Tables III.3 and III.4.
The convergence behavior of the three algorithms, with respect to the test set classification
error during time, applied on the ‘‘best run’’, is shown in Graphs 3.5–3.18. It can be seen that
in most of the problems, the test set classification error for ASD and OFR methods decrease
rapidly, and reaches its minimum value.
The results shown on Tables III.1–III.4, and Graphs 3.1–3.18, show that real world problems
can be solved with linear neural networks, and training time for convergence can be acceler-
ated by using adaptive steepest descent methods, like ASD and OFR. Some of the problems are
very sensitive to overfitting, which suggests that using early stopping was very useful.
GRAPH 3.3 Mean value of iterations needed for convergence of approximation problems (12 datasets).
GRAPH 3.4 Best run iterations needed for convergence of approximation problems (12 datasets).
1160 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
TA
BL
EII
I.3
Bes
tR
un
Res
ult
sfo
rC
lass
ifica
tio
nP
roble
ms.
Pro
ble
mM
eth
od
Epo
chs
Rel
eva
nt
epoch
sC
on
nec
tio
ntr
ave
rsa
lsTra
inin
gse
ter
ror
Va
lid
ati
on
set
erro
rTes
tse
ter
ror
Tes
tse
tcl
ass
ific
ati
on
Gen
era
lity
loss
Can
cer1
RP
RO
P1
05
79
10
52
0.7
52
0.0
31
8.4
51
4.3
70
.38
AS
D4
02
34
02
0.7
42
0.1
91
8.6
31
6.0
90
.17
OF
R2
09
20
20
.75
20
.01
18
.54
16
.67
1.3
0
Can
cer2
RP
RO
P9
54
39
51
8.6
62
0.3
82
1.9
51
7.8
20
.72
AS
D5
03
15
01
8.6
52
0.5
12
2.0
21
8.3
90
.13
OF
R3
01
03
01
8.6
62
0.4
42
2.1
91
8.9
70
.32
Can
cer3
RP
RO
P1
00
49
10
02
0.4
41
8.4
82
0.3
91
6.0
91
.23
AS
D5
02
15
02
0.4
21
8.7
02
0.3
41
7.8
20
.50
OF
R3
51
13
52
0.4
21
8.7
12
0.2
91
7.2
40
.54
Car
d1
RP
RO
P4
52
84
59
.86
8.6
41
0.6
71
3.9
56
.24
AS
D8
01
38
01
0.0
68
.19
10
.46
13
.95
5.0
3O
FR
55
13
55
10
.05
8.1
91
0.4
41
3.9
55
.21
Car
d2
RP
RO
P2
01
52
08
.48
10
.07
14
.85
19
.77
6.5
6A
SD
22
52
12
25
8.3
29
.70
13
.65
19
.77
5.0
1O
FR
65
19
65
8.3
59
.69
13
.70
19
.77
5.1
0
Car
d3
RP
RO
P1
15
23
11
59
.48
8.1
61
2.2
31
3.9
54
.45
AS
D1
55
27
15
59
.64
7.6
71
2.3
51
5.7
05
.00
OF
R7
52
47
59
.68
7.6
81
2.2
71
6.2
85
.06
Dia
bet
es1
RP
RO
P110
87
110
20.3
322.5
123.9
638.5
40.2
7A
SD
30
10
30
20
.32
22
.54
24
.01
38
.54
0.5
5O
FR
20
82
02
0.3
12
2.5
02
3.8
73
7.5
00
.91
Dia
bet
es2
RP
RO
P100
97
100
21.0
320.7
524.3
438.0
20.0
3A
SD
30
15
30
21
.03
20
.71
23
.99
35
.94
0.1
7O
FR
20
10
20
21
.02
20
.71
24
.20
35
.94
0.2
4
Dia
bet
es3
RP
RO
P105
33
105
20.4
022.8
322.5
037.5
03.2
2A
SD
25
32
52
0.3
82
2.8
62
2.0
53
4.3
83
.73
OF
R3
03
30
20
.39
22
.88
22
.03
33
.33
3.7
0
Gen
e1R
PR
OP
30
13
30
21
.48
25
.45
25
.01
39
.34
0.6
5A
SD
20
22
02
1.4
82
4.5
22
4.3
64
1.7
44
.38
OF
R1
52
15
21
.48
24
.63
24
.48
40
.48
3.8
2
Gen
e2R
PR
OP
30
16
30
21
.62
25
.15
24
.91
38
.59
0.3
2A
SD
15
21
52
1.6
22
4.3
42
4.5
04
0.9
83
.58
OF
R1
52
15
21
.62
24
.38
24
.49
42
.12
3.3
8
NEURAL NET TRAINING 1161
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
TA
BL
EII
I.3
(co
nti
nued
)
Pro
ble
mM
eth
od
Epo
chs
Rel
eva
nt
epo
chs
Co
nn
ecti
on
trave
rsals
Tra
inin
gse
ter
ror
Va
lid
ati
on
set
erro
rTes
tse
ter
ror
Tes
tse
tcl
ass
ific
ati
on
Gen
era
lity
loss
Gen
e3R
PR
OP
30
93
02
1.8
82
4.2
02
5.2
84
0.2
30
.77
AS
D1
53
15
21
.88
23
.91
24
.85
42
.62
1.8
9O
FR
15
31
52
1.8
82
3.9
02
4.7
54
2.6
21
.96
Gla
ss1
RP
RO
P1
40
18
14
08
.84
9.5
91
0.3
24
9.0
64
.87
AS
D1
60
45
16
08
.80
9.9
49
.75
47
.17
1.7
9O
FR
90
34
90
8.7
89
.73
9.6
34
3.4
03
.88
Gla
ss2
RP
RO
P2
01
42
08
.95
9.9
31
0.3
35
4.7
25
.45
AS
D2
57
25
8.6
21
0.5
11
0.3
65
4.7
25
.61
OF
R1
55
15
8.5
81
0.5
21
0.4
65
2.8
38
.71
Gla
ss3
RP
RO
P1
40
18
14
08
.71
9.2
41
1.0
26
0.3
83
.33
AS
D1
85
43
18
58
.67
9.4
01
0.9
55
6.6
01
.44
OF
R1
00
28
10
08
.61
9.3
61
0.9
55
6.6
03
.38
Hea
rt1
RP
RO
P1
35
39
13
51
1.1
81
3.1
51
4.3
62
2.1
71
.82
AS
D1
60
15
91
60
11
.21
13
.12
14
.13
20
.43
0.0
2O
FR
60
52
60
11
.18
13
.07
14
.06
20
.43
0.1
1
Hea
rt2
RP
RO
P1
60
13
71
60
11
.67
12
.18
13
.51
16
.52
0.1
7A
SD
22
02
20
22
01
1.6
61
2.2
21
3.6
01
6.5
20
.00
OF
R5
54
45
51
1.6
41
2.2
01
3.5
91
6.5
20
.33
Hea
rt3
RP
RO
P1
30
34
13
01
1.1
01
0.5
91
6.5
62
3.4
81
.42
AS
D1
90
13
51
90
11
.11
10
.61
16
.27
22
.61
0.1
2O
FR
65
52
65
11
.09
10
.53
16
.21
22
.17
0.7
0
Hea
rtc1
RP
RO
P1
25
17
12
51
0.1
69
.44
17
.45
22
.67
2.1
9A
SD
15
51
55
15
51
0.1
89
.60
16
.00
20
.00
0.0
0O
FR
75
24
75
10
.14
9.5
11
6.0
41
8.6
70
.74
Hea
rtc2
RP
RO
P3
01
53
01
1.8
71
5.8
06
.08
2.6
75
.31
AS
D1
70
27
17
01
1.2
31
6.5
26
.14
4.0
02
.99
OF
R9
52
69
51
1.2
31
6.3
96
.27
1.3
34
.01
Hea
rtc3
RP
RO
P2
01
12
01
0.6
61
3.2
41
2.6
41
6.0
01
2.7
2A
SD
20
52
01
0.5
51
3.0
61
2.0
81
6.0
05
.19
OF
R2
05
20
10
.32
13
.05
12
.06
16
.00
9.1
7
Ho
rse1
RP
RO
P2
51
42
51
1.4
31
5.1
81
2.9
12
7.4
75
.60
AS
D3
05
30
11
.29
15
.13
12
.70
26
.37
5.4
5O
FR
25
52
51
1.2
41
5.1
01
2.7
32
6.3
75
.45
1162 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
Ho
rse2
RP
RO
P2
01
22
09
.36
15
.18
17
.48
35
.16
9.6
5A
SD
60
15
60
8.3
81
5.6
31
6.6
13
5.1
65
.14
OF
R3
01
33
08
.41
15
.65
16
.68
35
.16
6.4
5
Ho
rse3
RP
RO
P1
59
15
11
.01
14
.71
15
.33
30
.77
9.0
8A
SD
25
52
51
0.2
41
5.2
21
5.0
03
4.0
75
.88
OF
R2
07
20
10
.23
15
.25
15
.12
32
.97
6.3
7
Soy
bea
n1
RP
RO
P5
25
48
65
25
0.6
50
.98
1.1
68
.82
0.2
2A
SD
11
70
11
70
11
70
0.6
70
.96
1.1
59
.41
0.0
0O
FR
73
57
28
73
50
.67
0.9
61
.14
9.4
10
.01
Soy
bea
n2
RP
RO
P5
20
48
45
20
0.8
00
.81
1.0
54
.12
0.1
1A
SD
10
35
10
35
10
35
0.8
20
.82
1.0
74
.12
0.0
0O
FR
65
56
30
65
50
.82
0.8
11
.05
4.1
20
.02
Soy
bea
n3
RP
RO
P5
55
54
85
55
0.7
80
.96
1.0
37
.06
0.0
4A
SD
11
00
11
00
11
00
0.8
00
.96
1.0
26
.47
0.0
0O
FR
74
57
39
74
50
.79
0.9
41
.02
6.4
70
.00
Thy
roid
1R
PR
OP
29
02
87
29
03
.91
3.9
64
.08
6.5
60
.01
AS
D8
15
81
48
15
4.0
34
.17
4.2
26
.56
0.0
2O
FR
55
55
48
55
53
.88
3.8
73
.99
6.5
60
.03
Thy
roid
2R
PR
OP
31
02
97
31
04
.08
3.6
53
.82
6.3
80
.14
AS
D9
30
92
99
30
4.2
23
.81
3.9
96
.38
0.0
2O
FR
37
03
70
37
04
.10
3.6
73
.83
6.3
80
.00
Thy
roid
3R
PR
OP
34
03
37
34
04
.02
3.4
84
.19
7.2
20
.05
AS
D9
15
91
59
15
4.1
63
.64
4.3
27
.17
0.0
0O
FR
51
05
03
51
03
.98
3.4
64
.16
7.2
80
.07
NEURAL NET TRAINING 1163
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
GRAPHS 3.5, 3.6 Test set classification error for the best run of problems cancer1, card1.
TABLE III.4 Best Run Results for Approximation Problems.
Problem Method EpochsRelevantepochs
Connectiontraversals
Trainingset error
Validationset error
Testset error
Test setclassification
Generalityloss
Building1 RPROP 390 390 390 0.33 0.37 0.35 0.29 0.00ASD 470 470 470 0.34 0.37 0.35 0.29 0.00OFR 395 394 395 0.33 0.37 0.34 0.29 0.00
Building2 RPROP 360 357 360 0.33 0.37 0.35 0.29 0.06ASD 475 475 475 0.34 0.37 0.35 0.29 0.00OFR 480 475 480 0.33 0.37 0.34 0.29 0.00
Building3 RPROP 305 295 305 0.35 0.34 0.34 0.29 0.03ASD 525 525 525 0.35 0.34 0.34 0.29 0.00OFR 285 278 285 0.35 0.34 0.34 0.29 0.03
Flare1 RPROP 20 6 20 0.37 0.32 0.54 3.38 5.38ASD 5 1 5 0.47 0.39 0.65 5.26 0.00OFR 5 1 5 0.46 0.40 0.64 5.26 0.00
Flare2 RPROP 10 8 10 0.44 0.45 0.31 2.26 5.34ASD 5 1 5 0.57 0.54 0.30 3.01 0.00OFR 5 1 5 0.57 0.54 0.29 3.01 0.00
Flare3 RPROP 55 8 55 0.39 0.45 0.36 3.38 3.09ASD 5 1 5 0.54 0.55 0.40 3.76 0.00OFR 5 1 5 0.53 0.54 0.40 3.76 0.00
Hearta1 RPROP 40 10 40 3.94 4.26 4.60 12.17 8.67ASD 220 6 220 3.84 4.34 4.69 11.30 2.96OFR 75 6 75 3.82 4.34 4.68 11.30 3.67
Hearta2 RPROP 105 86 105 4.16 4.20 4.19 10.87 0.34ASD 175 175 175 4.18 4.25 4.18 10.87 0.00OFR 70 64 70 4.16 4.23 4.18 10.43 0.06
Hearta3 RPROP 95 57 95 4.06 4.07 4.56 11.30 1.06ASD 140 139 140 4.08 4.11 4.58 12.17 0.04OFR 50 48 50 4.06 4.06 4.52 11.30 0.02
Heartac1 RPROP 15 9 15 4.19 4.63 3.14 6.67 7.85ASD 125 125 125 4.05 4.67 2.66 5.33 0.00OFR 75 75 75 4.04 4.62 2.72 5.33 0.00
Heartac2 RPROP 10 7 10 3.74 4.49 4.28 6.67 11.97ASD 5 2 5 3.90 4.59 4.11 9.33 16.32OFR 5 2 5 3.89 4.64 4.09 9.33 15.91
Heartac3 RPROP 20 9 20 2.89 5.12 5.07 12.00 7.43ASD 5 2 5 3.23 5.00 5.43 16.00 7.04OFR 5 2 5 3.24 5.00 5.44 16.00 6.84
1164 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
GRAPHS 3.7, 3.8 Test set classification error for the best run of problems diabetes1, gene1.
GRAPHS 3.9, 3.10 Test set classification error for the best run of problems glass1, heart1.
GRAPHS 3.11, 3.12 Test set classification error for the best run of problems heartac1, horse1.
NEURAL NET TRAINING 1165
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
4 CONCLUSIONS
In this paper we discussed a linear neural network design and implementation for solving
pattern classification and function approximation problems, taken by the Proben1 benchmark
collection. Batch-LMS neural network training rule is modified in order to lead to Adaptive
GRAPHS 3.13, 3.14 Test set classification error for the best run of problems soybean1, thyroid1.
GRAPHS 3.15, 3.16 Test set classification error for the best run of problems building1, flare1.
GRAPHS 3.17, 3.18 Test set classification error for the best run of problems hearta1, heartac1.
1166 K. GOULIANAS et al.
Dow
nloa
ded
By:
[HE
AL-
Link
Con
sorti
um] A
t: 22
:35
23 A
pril
2008
Steepest Descent (ASD) and Optimal Fletcher-Reeves (OFR) methods. These methods have
been shown to accelerate the convergence of the learning phase, and they do not require the
choice of critical parameters, like the learning rate or the momentum. When these methods
are applied to real-world benchmarking problems, they produce adequate solutions, although
they use a linear neural network architecture. They guarantee fast network convergence, gen-
erating a Least Square Solution, and they have good convergence properties. An extension of
the presented architecture could be used, introducing multilayer networks with sigmoidal hid-
den nodes. Such an architecture has been tested by Prechelt [8], who used one-hidden and
two-hidden layer networks, with various numbers of hidden nodes, and linear-RPROP train-
ing. The results of these architectures for some problems were worse than those obtained
using linear networks, and the tendency to overfit was much higher than for linear networks,
which suggests that introducing non-linearity is not an improvement.
References
[1] Amari, S., Murata, N., Muller, K. R., Finke, M. and Yang, H. (1995) Asymptotic statistical theory of overtrainingand cross-validation, METR 95-06; Department of Mathematical Engineering and Information Physics,University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113, Japan.
[2] Goulianas, K., Adamopoulos, M. and Margaritis, K. G. (1997) Structured artificial neural networks for fast batchLMS algorithms, Neural, Parallel and Scientific Computations 5(4), 549–562.
[3] Golub, G. H. and Van Loan, C. F. (1983) Matrix Computations (The Johns Hopkins University Press, Baltimore,MD).
[4] Hassoun, M. H. (1995) Fundamentals of Artificial Neural Networks (The MIT Press, Cambridge, MA).[5] Luo Zhi-Quan (1991) On the convergence of the LMS algorithm with adaptive learning rate for linear
feedforward networks, Neural Computation 3, 226–245.[6] Margaritis, K. G., Adamopoulos, M., Goulianas, K. and Evans, D. J. (1994) Artificial neural networks and
iterative linear algebra methods, Parallel Algorithms and Applications 3, 31–44.[7] Polycarpou, M. and Ioannou, P. (1992) Learning and convergence analysis of neural-type structured networks,
IEEE Transactions on Neural Networks 3(1), 39–50.[8] Prechelt Lutz (1994) Proben1 – A Set of Neural Network Benchmark Problems and Benchmarking Rules,
Technical Report, Universitat Karlsruhe 21/94.[9] Riedmiller, M. and Braun, H. (1993) A direct adaptive method for faster backpropagation learning: the RPROP
algorithm, Proceedings of the IEEE International Conference on Neural Networks, San Fransisco, CA, April1993, IEEE.
[10] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986) Learning internal representation by error propagation.In: Rumelhart, D. E. and McClelland, J. (eds.), Parallel Distributed Processing I (MIT Press, Cambridge, MA).
[11] Rumelhart, D. E. and McClelland, J. (1986) Parallel Distributed Processing: Explorations in the Microstructureof Cognition (MIT Press, Cambridge, MA).
[12] Sarle, W. S. (1995) Stopped training and other remedies for overfitting, Proceedings of the 27th Symposium onthe Interface of Computing Science and Statistics, pp. 352–360.
[13] Stuart, G., Bienenstock, E. and Doursat, R. (1992) Neural networks and the bias/variance dilemma, NeuralComputation 4, 1–58.
[14] Wang, L. X. and Mendel, J. M. (1991) Three-dimensional structured networks for matrix equation solving, IEEETransactions on Computers 40(12), 1337–1346.
[15] Wang, L. X. and Mendel, J. M. (1992) Parallel structured networks for solving a wide variety of matrix algebraproblems, Journal of Parallel and Distributed Computing 14, 236–247.
[16] Wang, C., Venkatesh, S. S. and Judd, J. S. (1994) Optimal stopping and effective machine complexity in learning,NIPS6, 303–310.
[17] Widrow, B. and Hoff, M. E. Jr. (1960) Adaptive switching circuits, IRE Western Electric Show and ConventionRecord, Part 4, 96–104.
[18] Widrow, B. and Lehr, M. (1990) 30 years of adaptive neural networks: perceptron, madaline and back-propagation, Proceedings of the IEEE 78(9), 1415–1442.
NEURAL NET TRAINING 1167