Current Alzheimer Research, 2010, 7, 173-187 173

1567-2050/10 $55.00+.00 ©2010 Bentham Science Publishers Ltd.

The I.F.A.S.T. Model Allows the Prediction of Conversion to Alzheimer Disease in Patients with Mild Cognitive Impairment with High Degree of Accuracy

M. Buscema1,*, E. Grossi2, M. Capriotti1, C. Babiloni3,4 and P. Rossini4,5

1Semeion Research Centre of Sciences of Communication, Via Sersale, 117, 00128 Rome, Italy; 2Bracco SpA Medical Department, Via E. Folli, 50, 20134 Milan, Italy; 3Department of Human Physiology and Pharmacology, University of Rome La Sapienza, 00185 Rome, Italy; 4Casa di Cura S. Raffaele, Cassino, Italy; 5Department of Clinical Neurosciences, University of Rome Campus Biomedico, 00155 Rome, Italy

Abstract: This paper presents the results obtained with the innovative use of special types of artificial neural networks (ANNs) assembled in a novel methodology named IFAST (implicit function as squashing time), capable of compressing the temporal sequence of electroencephalographic (EEG) data into spatial invariants. The aim of this study is to test the potential of this parallel and nonlinear EEG analysis technique in providing an automatic classification of mild cognitive impairment (MCI) subjects who will convert to Alzheimer's disease (AD) with a high degree of accuracy.

Eyes-closed resting EEG data (10-20 electrode montage) were recorded in 143 amnesic MCI subjects. Based on a 1-year follow-up, the subjects were retrospectively classified as MCI converted to AD and MCI stable. The EEG tracks were successively filtered according to four different frequency ranges, in order to evaluate the hypothesis that a specific range, corresponding to a specific brain wave type, could provide a better classification (0-12 Hz; 12.2-29.8 Hz; 30.2-40 Hz; and finally a notch filter at 48-50 Hz).

The spatial content of the EEG voltage was extracted by the IFAST step-wise procedure using ANNs. The data input for the classification operated by ANNs were not the EEG data, but the connection weights of a nonlinear auto-associative ANN trained to reproduce the recorded EEG tracks. These weights represented a good model of the peculiar spatial features of the EEG patterns at the scalp surface. The classification based on these parameters was binary and performed by a supervised ANN.

The best results in distinguishing between MCI stable and MCI/AD reached 85.98% (0-12 Hz band). They confirmed the working hypothesis that a correct automatic classification can be obtained by extracting the spatial information content of the resting EEG voltage with ANNs, and represent the basis for research aimed at integrating the spatial and temporal information content of the EEG.

These results suggest that this low-cost procedure can reliably distinguish eyes-closed resting EEG data in individual MCI subjects who will have different prognoses at 1-year follow-up, and is promising for a large-scale periodic screening of large populations of amnesic MCI subjects at risk of AD.

Keywords: Artificial neural networks, EEG processing, implicit function as squashing time, mild cognitive impairment, Alzheimer disease, auto contractive map.

1. INTRODUCTION

In previous papers we have shown that, with the innovative use of special types of artificial neural networks assembled in a novel methodology capable of compressing the temporal sequence of EEG data into spatial invariants, it is possible to accurately classify and distinguish patients affected by Mild Cognitive Impairment (MCI) from patients affected by Alzheimer Disease [1-3].

*Address correspondence to this author at the Semeion Research Centre of Sciences of Communication, Via Sersale, 117, 00128 Rome, Italy; E-mail: [email protected]

The proposed methodology is straightforward: resting eyes-closed EEG data are recorded (10-20 electrode system; common average reference; 128-Hz sampling frequency) from 19 channels for one minute. The spatial content of the EEG voltage is extracted by a step-wise procedure using advanced artificial neural networks. The core of the procedure is that the ANNs do not classify individuals by directly using the EEG data as an input; rather, the data inputs for the classification are the weights of the connections within a nonlinear auto-associative ANN trained to reproduce the recorded EEG data. These connection weights represent an optimal model of the peculiar spatial features of the EEG patterns at the scalp surface. The classification based on these weights is then performed by a supervised ANN. The best results distinguishing between AD and MCI with rigorous training-testing protocols were equal to 92.33%. The comparative results obtained with the best method so far described in the literature, based on blind source separation and wavelet pre-processing, are 80.43% (p<0.001). These results

confirmed the working hypothesis and represented the basis for research aimed at integrating the spatial and temporal information content of the EEG.

Alzheimer Disease is the final step along a several-year track passing through a condition in which there is not yet a disease, but only a significant, though still minor, compromise of only one cognitive territory (i.e. memory). This condition is nowadays recognizable through standardized neuropsychological tests and is named Mild Cognitive Impairment (MCI). MCI subjects are quite numerous and represent a group at very high risk of developing dementia in the following 5 years (about 50 to 70% of cases, while the remaining do not proceed, but remain MCI or even return to normal). On the basis of the obtained results, our new hypothesis is that, thanks to our methodology of EEG signal analysis, it is possible to make an early discrimination of the evolution of MCI subjects to Alzheimer disease in the following two years using just the EEG recording at the time of the initial identification of the MCI state. If this hypothesis were confirmed, this approach could lead to a remarkable change in the scenario, aiming to identify subjects at risk of developing dementia while they are still competent on the cognitive side. This would be of paramount importance now that pharmacologic research is in the process of offering new disease-modifying agents able to prevent the occurrence or progression of Alzheimer Disease.

2. THE METHOD: IFAST (IMPLICIT FUNCTION AS SQUASHING TIME)

The IFAST procedure has been reported in previous studies [1-3]. This method aims to understand the implicit function in a multivariate data series by compressing the temporal sequence of data into spatial invariants. It is based on three general observations:

1. Every multivariate sequence of signals coming from the same natural source is a complex asynchronous dynamic system, highly nonlinear, in which each channel's behaviour is understandable only in relation to all the others.

2. The implicit function defining the above-mentioned asynchronous process is the conversion of that same process into a complex hyper-surface, representing the interaction in time of all the channels' behaviour. The parameters of the nonlinear function define a meta-pattern of interaction of all channels in time.

3. The 19 channels in the EEG represent a dynamic system characterised by asynchronous parallelism. The nonlinear implicit function that defines them as a whole represents a meta-pattern that translates into space (hyper-surface) the temporal interactions among all the channels.

The IFAST method aims to synthesise each patient's 19-channel EEG track by the connection parameters of an auto-associated nonlinear artificial neural network (ANN), previously trained on that same track's data (in this kind of network, the input vector is the target for the output vector).

The core of IFAST is that the ANNs do not classify subjects by directly using the EEG data as an input, but by using the connection parameters themselves. These connection weights represent an optimal model of the peculiar spatial features of the EEG patterns at the scalp surface for the final classification. The classification itself is based on these weights and is performed by standard supervised ANNs.

There can be several topologies and learning algorithms for auto-associated nonlinear ANNs: it is only required that the selected ANN be of the auto-associated type and that the transfer functions defining it be nonlinear and differentiable at any point. Furthermore, it is required that all the processing made on every patient be carried out with the same type of ANN, and that the initial randomly generated weights be the same in every learning trial. This means that, for every EEG, every ANN has to have the same starting point, even if that starting point is random.

The experimental design we present in this paper includes two steps:

2.1. The Squashing Phase

It consists in squashing and compressing an EEG track in order to project, on the connections of a nonlinear auto-associated ANN, the invariant patterns of that track. For an EEG track with 19 channels in the 10-20 standard positions and a sampling frequency of 128 Hz for about 60 s, the squashing phase may be represented as follows:

If

$F_i(\cdot)$ = implicit function of the i-th EEG;

$X_i$ = matrix of the values of the i-th EEG;

$W^{*}_{j,k,i}$ = trained matrix of the connections of the i-th EEG (* = objective of the squashing);

$W^{0}_{j,k}$ = random starting matrix, the same for all EEGs;

then, in the case of a two-layered auto-associated ANN:

$$X_i = F_i\left(X_i, W^{*}_{j,k,i}, W^{0}_{j,k}\right), \quad \text{with } W^{0}_{j,j} = 0.$$
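To make the squashing idea concrete, here is a minimal sketch: each EEG track trains its own auto-associative network from an identical starting point, and the flattened trained weights, not the EEG samples themselves, become the subject's feature vector. The linear auto-associator, the function name, and all parameter values below are illustrative simplifications under stated assumptions, not the IFAST implementation.

```python
import numpy as np

def squash_eeg(X, n_epochs=50, lr=0.01, seed=0):
    """Sketch of the 'squashing' step: train a small auto-associative
    network on one EEG track (rows = time samples, cols = 19 channels)
    and return its trained weight matrix, flattened, as the subject's
    spatial invariant. A simplified linear auto-associator with a zeroed
    diagonal (no self-connections) stands in for the paper's ANN."""
    n_ch = X.shape[1]
    rng = np.random.default_rng(seed)        # same starting point for every EEG
    W = rng.normal(0.0, 0.01, size=(n_ch, n_ch))
    np.fill_diagonal(W, 0.0)                 # W[j, j] = 0, as in the paper
    for _ in range(n_epochs):
        Y = X @ W                            # network output (input is the target)
        grad = X.T @ (Y - X) / len(X)        # MSE gradient w.r.t. W
        W -= lr * grad
        np.fill_diagonal(W, 0.0)
    return W.flatten()

# Two synthetic one-minute "EEG tracks": 128 Hz * 60 s samples, 19 channels
rng = np.random.default_rng(42)
eeg_a = rng.standard_normal((7680, 19))
eeg_b = rng.standard_normal((7680, 19))
feat_a, feat_b = squash_eeg(eeg_a), squash_eeg(eeg_b)
print(feat_a.shape)   # one fixed-length feature vector per subject
```

The shared seed reproduces the paper's requirement that the initial random weights be identical for every EEG, so any difference between two feature vectors comes from the tracks themselves.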

2.2. The Validation Protocol, which Includes Two Steps

2.2.1. A First ANNs Processing

The classification based on the weights of the squashing phase is performed by standard supervised ANNs. We employed the so-called 5x2 cross-validation protocol (see Fig. 1) [4].

This is a robust protocol that allows one to evaluate the allocation of classification errors. In this procedure, the study sample is randomly divided ten times into two subsamples, always different but containing a similar distribution of cases and controls. A good or excellent ability of the ANNs to diagnostically classify all patients in the sample, judged from the confusion matrices of these 10 independent experiments, would indicate that the spatial invariants extracted and selected with our method truly relate to the functioning quality of the brains examined through their EEG. At the same time, LDA is used as a first benchmark.
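The 5x2 protocol above can be sketched as follows. The `train_fn` factory, the nearest-centroid toy classifier, and all names here are illustrative assumptions, not the supervised ANNs the paper actually used.

```python
import numpy as np

def five_by_two_cv(X, y, train_fn, seed=0):
    """Sketch of 5x2 cross-validation: the sample is split in half 5 times;
    each half serves once as training and once as test set, yielding 10
    independent accuracy estimates. Splits are stratified so that each half
    keeps a similar case/control mix."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(5):
        halves = [[], []]
        for cls in np.unique(y):                  # stratify: shuffle within class
            idx = rng.permutation(np.flatnonzero(y == cls))
            halves[0].extend(idx[: len(idx) // 2])
            halves[1].extend(idx[len(idx) // 2:])
        for tr, te in ((0, 1), (1, 0)):           # each half trains once, tests once
            model = train_fn(X[halves[tr]], y[halves[tr]])
            accs.append(np.mean(model(X[halves[te]]) == y[halves[te]]))
    return accs                                   # 10 error estimates

# Toy usage with a nearest-centroid "trainer" on separable synthetic data
def centroid_trainer(Xtr, ytr):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return lambda X: (np.linalg.norm(X - c1, axis=1)
                      < np.linalg.norm(X - c0, axis=1)).astype(int)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(2, 1, (40, 5))])
y = np.repeat([0, 1], 40)
accs = five_by_two_cv(X, y, centroid_trainer)
print(len(accs), float(np.mean(accs)))
```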

2.2.2. Noise Elimination and Optimization

We believe that not all spatial models contained in an EEG refer to the brain's functioning quality. Other invariant patterns, relating to specific characteristics of that brain at that moment, could be present: anxiety level, recurring thoughts, background noise in that minute-long recording, etc. Our hypothesis is that the health and cerebral quality invariants are more significant than the others, so it is recommended to separate the functioning and cerebral quality invariants from the ones that are not needed for this task. This new phase includes the utilization of a new system, named Twist, based on a genetic algorithm, GenD, developed at the Semeion Research Centre [5, 6]. The new compressed dataset was finally split into two halves (training and test) used for the final binary classification. We will show that the new procedure has obtained better performances.
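The Twist/GenD systems themselves are not publicly documented, so the following is only a generic evolutionary feature-selection sketch of the same idea, discarding "noise" invariants by evolving boolean masks over the weight columns; every name and the toy fitness function are assumptions, not the Semeion algorithms.

```python
import numpy as np

def evolve_features(X, y, fitness, n_gen=30, pop=20, seed=0):
    """Generic evolutionary feature selection. Each individual is a boolean
    mask over the columns of X; fitness scores a classification proxy using
    only the selected columns. The better half survives each generation and
    produces mutated children."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    masks = rng.random((pop, n_feat)) < 0.5
    for _ in range(n_gen):
        scores = np.array([fitness(X[:, m], y) if m.any() else 0.0 for m in masks])
        parents = masks[np.argsort(scores)[::-1][: pop // 2]]   # elitist survival
        children = parents.copy()
        children ^= rng.random(children.shape) < 1.0 / n_feat   # bit-flip mutation
        masks = np.vstack([parents, children])
    scores = np.array([fitness(X[:, m], y) if m.any() else 0.0 for m in masks])
    return masks[int(np.argmax(scores))]

# Toy fitness: correlation of the selected columns' mean with the labels
def toy_fitness(Xsel, y):
    return abs(np.corrcoef(Xsel.mean(axis=1), y)[0, 1])

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 10))
X[:, 2] += 3 * y                     # only column 2 is informative
best = evolve_features(X, y, toy_fitness)
print(best)
```

On this toy dataset the evolved mask retains the single informative column, which is the behaviour the noise-elimination phase relies on.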

Fig. (1). 5 x 2 validation protocol for the independent identification of the spatial invariants of EEGs. (LDA: Linear Discriminant Analysis).

3. THE SQUASHING PHASE. A NEW ARTIFICIAL NEURAL NETWORK: AUTO CONTRACTIVE MAP

This phase of IFAST is defined as "squashing", since it consists in compressing an EEG track in order to project on the connections of an auto-associated ANN the invariant patterns of that track. In our previous works we have used four different types of auto-associated ANNs to run this search:

1. A backpropagation without a hidden unit layer and without connections on the main diagonal (for short, AutoBp) [7, 8].

2. A new recirculation network (for short, NRC), an original variation [9] of a known ANN already described in the literature [10], previously not considered useful for auto-associating variables.

3. An auto-associative multilayer perceptron, used for an auto-associative purpose (encoding) thanks to its hidden units layer, which decomposes the input vector into main nonlinear components. The algorithm used to train the MLP was a typical backpropagation algorithm.

4. Elman's hidden recurrent network, which can be used for auto-associating purposes, again using the backpropagation algorithm (auto-associative hidden recurrent [11]).

In the present work, the squashing procedure is performed by a new artificial neural network architecture, the Auto Contractive Map (AutoCM) [12, 13], which allows for basic improvements in both robustness of use in badly specified and/or computationally demanding problems, and output usability and intelligibility. The statistically-oriented literature has developed a variety of methods with different power and usability, all of which, however, share a few basic problems, among which the most outstanding are: the nature of the a-priori assumptions that have to be made on the data-generating process; the near impossibility of computing all the joint probabilities among the vast number of possible couples and n-tuples that are in principle necessary to reconstruct the underlying process' probability law; and the difficulty of organizing the output in an easily grasped, ready-to-access format for the non-technical analyst. The consequence of the first two weaknesses is that, when analyzing poorly understood problems characterized by heterogeneous sets of potentially relevant variables, traditional methods can become very unreliable, when not unusable. The consequence of the last one is that, even in cases where traditional methods manage to provide a sensible output, its statement and implications can be so articulated as to become practically useless or, even worse, easily misunderstood.

AutoCMs 'spatialize' the correlation among variables by constructing a suitable embedding space in which a visually transparent and cognitively natural notion such as 'closeness' among variables accurately reflects their associations.

The AutoCM is characterized by a three-layer architecture: an Input layer, where the signal is captured from the environment; a Hidden layer, where the signal is modulated inside the AutoCM; and an Output layer, through which the AutoCM feeds back upon the environment on the basis of the stimuli previously received and processed (Fig. 2).

Each layer contains an equal number of N units, so that the whole AutoCM is made of 3N units. The connections between the Input and the Hidden layers are mono-dedicated, whereas the ones between the Hidden and the Output layers are fully saturated, i.e. at maximum gradient. Therefore, given N units, the total number of the connections, Nc, is given by Nc = N (N + 1).

Fig. (2). An example of an AutoCM with N = 4.

All of the connections of AutoCM may be initialized either by assigning a same, constant value to each, or by assigning values at random. The best practice is to initialize all the connections with a same, positive value, close to zero.

The learning algorithm of AutoCM may be summarized in a sequence of four characteristic steps:

1. Signal Transfer from the Input into the Hidden layer;

2. Adaptation of the values of the connections between the Input and the Hidden layers;

3. Signal Transfer from the Hidden into the Output layer;

4. Adaptation of the value of the connections between the Hidden and the Output layers.

Notice that steps 2 and 3 may take place in parallel.

We write $m^{[s]}$ for the units of the Input layer (sensors), scaled between 0 and 1; $m^{[h]}$ for the units of the Hidden layer; and $m^{[t]}$ for the units of the Output layer (system target). We moreover define: v, the vector of mono-dedicated connections; w, the matrix of the connections between the Hidden and the Output layers; and n, the discrete time that spans the evolution of the AutoCM weights, that is, the number of cycles of processing, counting from zero and stepping up one unit at each completed round of computation: $n \in \mathbb{N}$.

In order to specify the steps 1-4 that define the AutoCM algorithm, we have to define the corresponding signal forward-transfer equations and the learning equations, as follows:

1. Signal transfer from the Input to the Hidden layer:

$$m^{[h]}_{i(n)} = m^{[s]}_{i}\left(1 - \frac{v_{i(n)}}{C}\right) \qquad (1)$$

where C is a positive real number not lower than 1, which we will refer to as the contraction parameter (see below for comments), and where the (n) subscript has been omitted from the notation of the input layer units, as these remain constant at every cycle of processing.

2. Adaptation of the connections $v_{i(n)}$ through the variation $\Delta v_{i(n)}$, which amounts to trapping the energy difference generated according to equation (1):

$$\Delta v_{i(n)} = \left(m^{[s]}_{i} - m^{[h]}_{i(n)}\right)\left(1 - \frac{v_{i(n)}}{C}\right) \qquad (2)$$

$$v_{i(n+1)} = v_{i(n)} + \Delta v_{i(n)} \qquad (3)$$

3. Signal transfer from the Hidden to the Output layer:

$$Net_{i(n)} = \sum_{j=1}^{N} m^{[h]}_{j(n)}\left(1 - \frac{w_{i,j(n)}}{C}\right) \qquad (4)$$

$$m^{[t]}_{i(n)} = m^{[h]}_{i(n)}\left(1 - \frac{Net_{i(n)}}{C}\right) \qquad (5)$$

4. Adaptation of the connections $w_{i,j(n)}$ through the variation $\Delta w_{i,j(n)}$, which amounts, accordingly, to trapping the energy difference as to equation (5):

$$\Delta w_{i,j(n)} = \left(m^{[h]}_{i(n)} - m^{[t]}_{i(n)}\right)\left(1 - \frac{w_{i,j(n)}}{C}\right) m^{[h]}_{j(n)} \qquad (6)$$

$$w_{i,j(n+1)} = w_{i,j(n)} + \Delta w_{i,j(n)} \qquad (7)$$
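Steps 1-4 can be transcribed almost line for line. The sketch below assumes a contraction parameter C equal to N (the text only requires C >= 1, so this value is an assumption) and a small constant initialization, as the text recommends; it illustrates the equations, not the authors' software.

```python
import numpy as np

def autocm_train(X, C=None, n_epochs=200):
    """Direct transcription of the AutoCM update rules, eqs. (1)-(7).
    X: records scaled to [0, 1], one row per record, N columns.
    Returns (v, w): the mono-dedicated vector and the hidden-to-output
    matrix."""
    n_rec, N = X.shape
    if C is None:
        C = float(N)                   # assumed choice of the contraction parameter
    v = np.full(N, 0.01)               # same small positive start for all weights
    w = np.full((N, N), 0.01)
    for _ in range(n_epochs):
        for m_s in X:
            m_h = m_s * (1.0 - v / C)                        # eq (1)
            dv = (m_s - m_h) * (1.0 - v / C)                 # eq (2)
            net = ((1.0 - w / C) * m_h).sum(axis=1)          # eq (4)
            m_t = m_h * (1.0 - net / C)                      # eq (5)
            dw = np.outer(m_h - m_t, m_h) * (1.0 - w / C)    # eq (6)
            v += dv                                          # eq (3)
            w += dw                                          # eq (7)
    return v, w

# Tiny example: 4 variables, a handful of records in [0, 1]
rng = np.random.default_rng(0)
X = rng.random((10, 4))
v, w = autocm_train(X)
print(np.all(v <= 4.0 + 1e-6), np.all(w >= 0.0))   # v stays below C, w stays positive
```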

Even a cursory comparison of (1) and (5), and of (2-3) and (6-7), respectively, clearly shows how both steps of the signal transfer process are guided by the same (contraction) principle, and likewise for the two weight adaptation steps (for which we could speak of an energy entrapment principle).

Notice how the term $m^{[h]}_{j(n)}$ in (6) makes the change in the connection $w_{i,j(n)}$ proportional to the quantity of energy liberated by node $m^{[h]}_{i(n)}$ in favour of node $m^{[t]}_{i(n)}$. The whole learning process, which essentially consists of a progressive adjustment of the connections aimed at the global minimization of energy, may be seen as a complex juxtaposition of phases of acceleration and deceleration of velocities of the learning signals (adaptations $\Delta w_{i,j(n)}$ and $\Delta v_{i(n)}$) inside the ANN connection matrix. To get a clearer understanding of this feature of the AutoCM learning mechanics, begin by considering its convergence condition:

$$\lim_{n \to \infty} v_{i(n)} = C \qquad (8)$$

Indeed, when $v_{i(n)} = C$, then $\Delta v_{i(n)} = 0$ (according to eq. 2) and $m^{[h]}_{j(n)} = 0$ (according to eq. 1) and, consequently, $\Delta w_{i,j(n)} = 0$ (as from eq. 6): the AutoCM then converges.

There are, moreover, four variables that play a key role in the learning mechanics of AutoCM. Specifically,

1. $\alpha_{i(n)}$ is the contraction factor of the first layer of AutoCM weights:

$$\alpha_{i(n)} = 1 - \frac{v_{i(n)}}{C}$$

As is apparent from (1), the parameter C modulates the transmission of the Input signal into the Hidden layer by 'squeezing' it for given values of the connections; the actual extent of the squeeze is controlled by the value of C, thereby explaining its interpretation as the contraction parameter. Clearly, the choice of C and the initialization of the connection weights must be such that the contraction factor is a number always falling within the [0, 1] range, decreasing at every processing cycle n and becoming infinitesimal as n diverges.

2. $\beta_{i,j(n)}$ is, analogously, the contraction factor of the second layer of AutoCM weights which, once again given the initialization choice, falls strictly within the unit interval:

$$\beta_{i,j(n)} = 1 - \frac{w_{i,j(n)}}{C}$$

As for the previous layer, the value of the contraction factor is modulated by the contraction parameter C.

3. $\delta_{i(n)}$ is the difference between the Input and the Hidden nodes:

$$\delta_{i(n)} = m^{[s]}_{i} - m^{[h]}_{i(n)}$$

It is a real function of n, and it always takes positive values in view of the contractive character of the signal transfer process.

4. $\varepsilon_{i(n)}$ is, likewise, the difference between the Hidden and the Output nodes:

$$\varepsilon_{i(n)} = m^{[h]}_{i(n)} - m^{[t]}_{i(n)}$$

It is, by the same token, a real function with positive values, decreasing in n.

The second step to gain insight into the AutoCM learning mechanics is to demonstrate how, during the AutoCM learning phase, $\Delta v_{i(n)}$ describes a parabola arc, always lying in the positive orthant. To see this, re-write equation (2) as:

$$\Delta v_{i(n)} = \left(m^{[s]}_{i} - m^{[s]}_{i}\left(1 - \frac{v_{i(n)}}{C}\right)\right)\left(1 - \frac{v_{i(n)}}{C}\right) = m^{[s]}_{i}\,\frac{v_{i(n)}}{C}\left(1 - \frac{v_{i(n)}}{C}\right) \qquad (2a)$$

Remembering how $\alpha_{i(n)}$ was defined, we can write $\frac{v_{i(n)}}{C} = 1 - \alpha_{i(n)}$ and then further re-write (2a) as a function of the contraction factor $\alpha_{i(n)}$:

$$\Delta v_{i(n)} = m^{[s]}_{i}\,(1 - \alpha_{i(n)})\left(1 - (1 - \alpha_{i(n)})\right) = m^{[s]}_{i}\,(1 - \alpha_{i(n)})\,\alpha_{i(n)} \qquad (2b)$$

Keeping in mind the definition of $\delta_{i(n)}$, and letting the values of the input layer units decrease along the unit interval, one can easily check that the $\Delta v_{i(n)}$ parabola arc (2b) meets the following condition:

$$0 < \Delta v_{i(n)} < C - v_{i(n)} \qquad (2c)$$

Equation (2c) tells us that the adaptation of the connections between the Input and Hidden layers, $\Delta v_{i(n)}$, will always be smaller than the gap that $v_{i(n)}$ needs to close to reach C. Actually, $v_{i(n+1)} = v_{i(n)} + \Delta v_{i(n)} = C - \left(C - v_{i(n)} - \Delta v_{i(n)}\right)$, but from (2c) we know that $C - v_{i(n)} - \Delta v_{i(n)} \ge 0$, which proves our claim. As a consequence, $v_{i(n)}$ will never exceed C. However, the convergence condition requires that $\lim_{n \to \infty} v_{i(n)} = C$, which in turn implies the following:

$$\lim_{n \to \infty} \alpha_{i(n)} = 0,\quad \lim_{n \to \infty} m^{[h]}_{i(n)} = 0,\quad \lim_{n \to \infty} m^{[t]}_{i(n)} = 0,\quad \lim_{n \to \infty} \delta_{i(n)} = m^{[s]}_{i},\quad \lim_{n \to \infty} \varepsilon_{i(n)} = 0 \qquad (8)$$

In view of the contractive character of the signal transmission process at both of the AutoCM's layer levels, combining equations (1) and (5) we can also write:

$$m^{[t]}_{i(n)} \le m^{[h]}_{i(n)} \le m^{[s]}_{i} \qquad (1\text{-}5)$$

and in particular, we can reformulate them as follows:

$$m^{[h]}_{i(n)} = m^{[s]}_{i}\,\alpha_{i(n)} \qquad (1a)$$

and

$$m^{[t]}_{i(n)} = m^{[s]}_{i}\,\alpha_{i(n)}\left(1 - \frac{Net_{i(n)}}{C}\right) \qquad (5a)$$

We are now in the position to clarify the relationship between $\Delta v_{i(n)}$ and $\Delta w_{i,j(n)}$. From equation (1-5), we can stipulate:

$$m^{[h]}_{i(n)} = m^{[s]}_{i} - \delta_{i(n)} \qquad (1b)$$

where $\delta_{i(n)}$ is a small positive real number;

and

$$m^{[t]}_{i(n)} = m^{[h]}_{i(n)} - \varepsilon_{i(n)} \qquad (5b)$$

where $\varepsilon_{i(n)}$ is another positive real number. As already remarked, these quantities evolve with n: in the limit, $\varepsilon_{i(n)}$ becomes close to 0 while $\delta_{i(n)}$ approaches $m^{[s]}_{i}$. We can also write:

$$m^{[t]}_{i(n)} = m^{[s]}_{i} - \delta_{i(n)} - \varepsilon_{i(n)} \qquad (5c)$$

At this point, we can easily reformulate equation (2) as:

$$\Delta v_{i(n)} = \left(m^{[s]}_{i} - m^{[h]}_{i(n)}\right)\left(1 - \frac{v_{i(n)}}{C}\right) = \delta_{i(n)}\,\alpha_{i(n)} \qquad (2d)$$

And, likewise, equation (6) as:

$$\Delta w_{i,j(n)} = \left(m^{[h]}_{i(n)} - m^{[t]}_{i(n)}\right)\left(1 - \frac{w_{i,j(n)}}{C}\right) m^{[h]}_{j(n)} = \varepsilon_{i(n)}\,\beta_{i,j(n)}\,\alpha_{j(n)}\,m^{[s]}_{j} \qquad (6b)$$

noting that

$$\lim_{n \to \infty} \Delta w_{i,j(n)} = 0 \qquad (6e)$$

Plugging (6b) into (7) and remembering the definition of the contraction factor $\beta_{i,j(n)}$ yields:

$$w_{i,j(n+1)} = C\left(1 - \beta_{i,j(n)}\right) + \varepsilon_{i(n)}\,\beta_{i,j(n)}\,\alpha_{j(n)}\,m^{[s]}_{j} \qquad (7a)$$

Finally, from (7a) and (8), we can conclude that:

$$\lim_{n \to \infty} w_{i,j(n)} = C\left(1 - \lim_{n \to \infty}\beta_{i,j(n)}\right) \qquad (7b)$$

nCw = (7b)

In a nutshell, the learning mechanics of the AutoCM boils down to the following. At the beginning of the training, the Input and Hidden units will be very similar (see equation (1)) and, consequently, $\Delta v_{i(n)}$ will be very small (see equation (2d)), while for the same reason $\varepsilon_{i(n)}$ (see its definition above) will at the beginning be very big, and $\Delta w_{i,j(n)}$ bigger than $\Delta v_{i(n)}$ (see equations (2d) and (6b)). During the training, while $v_{i(n)}$ rapidly increases as the processing cycles n pile up, $m^{[h]}_{i(n)}$ decreases, and so do, accordingly, $\alpha_{i(n)}$ and $\varepsilon_{i(n)}$. Consequently, $\Delta w_{i,j(n)}$ rolls along a downward slope, whereas $\Delta v_{i(n)}$ slows down the pace of its increase. When $\varepsilon_{i(n)}$ becomes close to zero, this means that $m^{[h]}_{i(n)}$ is now only slightly bigger than $m^{[t]}_{i(n)}$ (see equation (5b)). $\Delta v_{i(n)}$ is accordingly getting across the global maximum of the equation $\Delta v_{i(n)} = m^{[s]}_{i}\,(1 - \alpha_{i(n)})\,\alpha_{i(n)}$, so once the critical point has been hit, $\Delta v_{i(n)}$ will in turn begin its descent toward zero.
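This dynamic can be checked numerically for a single mono-dedicated weight; the values of C and the input unit below are arbitrary illustration choices.

```python
import numpy as np

# One scalar weight: v rises monotonically toward C, while its adaptation
# dv = m_s * (1 - alpha) * alpha (eq. 2b) first accelerates, peaks at
# alpha = 1/2, then decays to zero, tracing the parabola arc.
C, m_s = 2.0, 0.8
v, dvs = 0.01, []
for n in range(400):
    alpha = 1.0 - v / C                  # contraction factor of the first layer
    dv = m_s * (1.0 - alpha) * alpha     # eq (2b)
    dvs.append(dv)
    v += dv                              # eq (3)
print(round(v, 3), int(np.argmax(dvs)))
```

The adaptation peaks at an intermediate cycle (neither the first nor the last), exactly the "acceleration then deceleration" described above, while v converges to C.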

There are a few important peculiarities of AutoCMs with respect to more familiar classes of ANNs that need special attention and call for careful reflection:

• AutoCMs are able to learn even when starting from initializations where all connections are set at the same value; i.e., they do not suffer from the problem of symmetric connections.

• During the training process, AutoCMs always assign positive values to connections. In other words, AutoCMs do not allow for inhibitory relations among nodes, but only for different strengths of excitatory connections.

• AutoCMs can learn also in difficult conditions, namely, when the connections of the main diagonal of the second layer connection matrix are removed. In the context of this kind of learning process, AutoCMs seem to reconstruct the relationship occurring between each couple of variables. Consequently, from an experimental point of view, it seems that the ranking of its connections matrix translates into the ranking of the joint probability of occurrence of each couple of variables.

• Once the learning process has occurred, any input vector belonging to the training set will generate a null output vector. So, the energy minimization of the training vectors is represented by a function through which the trained connections absorb completely the input training vectors. Thus, the AutoCM seems to learn how to transform itself into a 'dark body'.

• At the end of the training phase ($\Delta w_{i,j} = 0$), all the components of the weights vector v reach the same value:

$$\lim_{n \to \infty} v_{i(n)} = C. \qquad (8)$$

The matrix w, then, represents the AutoCM knowledge about the whole dataset.

One can use the information embedded in the w matrix to compute in a natural way the joint probability of occurrence among variables:

$$p_{i,j} = \frac{w_{i,j}}{\sum_{j=1}^{N} w_{i,j}}; \qquad (9)$$

$$P\left(m^{[s]}_{j}\right) = \sum_{i=1}^{N} p_{j,i} = 1 \qquad (10)$$

The new matrix p can be read as the probability of transition from any state variable to any other:

$$P\left(m^{[t]}_{i} \mid m^{[s]}_{j}\right) = p_{j,i}. \qquad (11)$$
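The row normalization of eq. (9) is a one-liner; the small symmetric matrix below is a made-up stand-in for a trained AutoCM weight matrix.

```python
import numpy as np

# Row-normalizing the trained weight matrix turns it into a transition
# reading: p[i, j] is the weight from variable i to variable j as a
# fraction of all of i's outgoing weights, so every row sums to 1.
w = np.array([[0.0, 2.0, 1.0, 1.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [1.0, 1.0, 2.0, 0.0]])   # symmetric, null diagonal, like an AutoCM output

p = w / w.sum(axis=1, keepdims=True)   # eq (9)
print(p.sum(axis=1))
```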

The software IFAST (developed in Borland C) [14] produces the squashing phase through the training operated by this network; in the "MetaTask" section the user can define the whole procedure by selecting:

• the files that will be processed (in our case every complete EEG),

• the type of network,

• the sequence of the records for every file (generally random),

• the number of epochs of training,

• a training stop criterion (number of epochs or minimum RMSE),

• the number of hidden nodes of the auto-associated network, which determines the length of the output vector of the file processed,

• the number of matrices, depending on the type of auto-associated network selected,

• the learning coefficient and delta rate.

The ‘‘squashing’’ operation of IFAST was carried out blindly, based only on the patients’ EEG tracks, without any indication of their clinical state.

In the following table we report all the parameters of the squashing procedure (Table 1).

The number of weights is 171 because we extract the upper triangular part of the final AutoCM output matrix, which is symmetric with a null diagonal.
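The 19 × 19 → 171 reduction above can be sketched as follows. This is an illustrative snippet of ours, not the original IFAST code; the matrix values are placeholders.

```python
import numpy as np

# Flatten a 19x19 symmetric connection matrix (null diagonal) into the
# 171 upper-triangular weights used as a subject's feature vector.
N = 19
w = np.arange(N * N, dtype=float).reshape(N, N)
w = (w + w.T) / 2.0          # symmetric placeholder matrix
np.fill_diagonal(w, 0.0)

iu = np.triu_indices(N, k=1)  # indices strictly above the diagonal
features = w[iu]              # 19 * 18 / 2 = 171 values
print(features.shape)         # (171,)
```
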

After this processing, we have an independent dataset whose records were the squashing of the original EEG tracks and whose variables were the trained weights of the AutoCM.

Table 1. AutoCM Parameters Used During the Processing

AutoCM Parameters

Number of inputs 19

Number of outputs 19

Number of state units 0

Number of hidden units 0

Number of weights 171

Number of epochs 200

Learning coefficient 0.1

Projection coefficient Null

3.1. The Semantic Meaning of the Connections Matrix Generated by AutoCM NN

The association matrix of the trained AutoCM is also the optimal matrix of forces that minimizes the field’s energy, with the assigned dataset as the set of constraints¹:

$$E = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \sum_{q=1}^{M} f_i^{(q)} f_j^{(q)} f_k^{(q)} \, A_{i,j} A_{i,k}; \qquad A_{i,j} > 0, \quad A_{i,k} > 0,$$

where:

$N$ = the number of variables of the assigned dataset (the 19 channels sampled);

$M$ = the number of records of the assigned dataset (the time of the EEG track);

$f_i^{(q)}, f_j^{(q)}, f_k^{(q)}$ = the value of record $(q)$ at variable $(i)$, $(j)$, or $(k)$;

$A_{i,j} = 1.0 - \dfrac{w_{i,j}}{C}$;

$A_{i,k} = 1.0 - \dfrac{w_{i,k}}{C}$;

$w_{i,j}, w_{i,k}$ = the weight found by AutoCM between the variables $(i)$ and $(j)$, or $(i)$ and $(k)$.
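A numerical sketch of this energy function follows. It is our own illustration with invented data (N, M, C, f, w are all placeholders, not values from the study); the only non-obvious step is factorizing the triple sum over j and k into two identical inner sums.

```python
import numpy as np

# Illustrative data: f is an M x N record matrix, w a symmetric
# AutoCM-like weight matrix, C the contraction parameter.
rng = np.random.default_rng(1)
M, N, C = 50, 19, 20.0
f = rng.random((M, N))
w = rng.random((N, N))
w = (w + w.T) / 2.0
np.fill_diagonal(w, 0.0)
A = 1.0 - w / C                  # A[i, j] = 1 - w[i, j] / C  (all > 0 here)

# E = sum_q sum_{i,j,k} f_i^q f_j^q f_k^q A[i, j] A[i, k]
#   = sum_q sum_i f_i^q * (sum_j f_j^q A[i, j]) * (sum_k f_k^q A[i, k])
s = f @ A.T                      # s[q, i] = sum_j f[q, j] * A[i, j]
E = float(np.sum(f * s * s))
print(E > 0.0)                   # True for positive data and positive A
```
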

For each subject the AutoCM matrix of connections specifies the global strength of association of each EEG channel with the others, along the EEG track section (1 minute). Consequently, each subject of the whole sample is represented by a weighted network whose nodes are the 19 channels of the EEG. In each of these networks the strength of association between the 19 channels is specific to that subject, so the values of the weights of each network (one per subject) establish how each part of that subject’s brain was synchronized with the others during the EEG. The analysis of these findings could be very useful from a clinical viewpoint. A specific analysis of these matrices is planned, but in this research we want to show only the informative power of these weighted networks in predicting MCI conversion.

4. THE VALIDATION PROTOCOL

First of all, a new dataset called “Diagnostic DB” was created for easier understanding. The diagnostic gold standard was established, for every patient, in a completely independent way through the clinical and instrumental examinations (magnetic resonance imaging, etc.) carried out by a group of experts, whose diagnosis was also reconfirmed over time. The initial diagnosis was MCI for the whole group of examined subjects. The EEG signals examined in the present report were collected for all these subjects at T0, that is, the time when the diagnosis of MCI status was made. Thereafter, these subjects were followed up with standardized neuropsychological tests over the following two years. At the end of the follow-up period this original group of subjects was divided into the following two classes, based on specific inclusion criteria:

¹ This equation was found with the help of Roberto Benzi (Tor Vergata University, Rome), in the context of a basic research project about the connections among physical issues and formalisms and Auto Contractive Map ANNs.

(a) elderly patients with “cognitive decline” fitting criteria for Alzheimer Disease = MCI converted;

(b) elderly patients still in the MCI condition = MCI stable;

We rewrote the last generated dataset, adding to every Hms vector the diagnostic class that an objective clinical examination had assigned to the patient. The Hms vectors represent the invariant traits s, as defined by the squashing phase, of every m-th subject’s EEG track; their length is the number of columns of the connections matrix, which depends on the specific auto-associated network used. The dataset is then ready for the next steps.

4.1. The 5×2 Cross-Validation Protocol

At this point, we used normal, supervised feedforward ANNs to calculate the following classification function:

$$y = f(H, r^*)$$

where y is the diagnostic class of the patient {MCI Converted, MCI Stable}; f a proper nonlinear function, simple or complex; H the ANN’s input vector, containing the invariants that IFAST found; and r* the weight matrix (or matrices) defining the parameters of the function to be approximated.
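A minimal sketch of such a classification function y = f(H, r*) is given below. The architecture, sizes and weights are invented for illustration; they are not the networks actually trained in the study.

```python
import numpy as np

# Tiny feedforward network mapping a subject's 171 invariants H to a
# score in (0, 1). All parameters here are illustrative placeholders.
rng = np.random.default_rng(2)

def f(H, r):
    W1, b1, W2, b2 = r                            # r* = weights and biases
    h = np.tanh(H @ W1 + b1)                      # hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

H = rng.random(171)                               # invariants for one subject
r = (rng.standard_normal((171, 12)) * 0.1, np.zeros(12),
     rng.standard_normal(12) * 0.1, 0.0)
y = f(H, r)           # score: <= 0.5 -> MCI Stable, > 0.5 -> MCI Converted
print(0.0 < y < 1.0)  # True
```
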

The experimental design consisted of 10 different and independent processing runs for the classification MCI Converted vs MCI Stable. Every experiment was conducted in a blind and independent manner in two directions: training with subsample A and blind testing with subsample B, and training with subsample B and blind testing with subsample A.

4.2. Noise Elimination and Optimization (Twist Phase)

The choice of following a different methodology was motivated by the aim of improving the classification results and removing causes of information loss. In the former study, the dataset coming from the squashing phase was compressed by another auto-associated ANN, in an attempt to eliminate the invariant patterns, codified by the previous ANN, relating to specific characteristics of the brain (anxiety level, background level, etc.) that are not useful for the classification, leaving the most significant ones unaltered. The new compressed datasets were then split into two halves (training and test) using the T&T evolutionary algorithm [15] for the final binary classification.

In this work, instead, the elimination of the noisiest features and the classification run parallel to each other.


This new phase is called TWIST and includes the use of two systems, T&T and IS (Input Selection), both based on a genetic algorithm, GenD, developed at Semeion Research Centre [16, 17].

T&T systems are robust data resampling techniques able to arrange the source sample into subsamples, each one with a similar probability density function. In this way the data are split into two or more subsamples in order to train, test, and validate the ANN models more effectively. The IS system is an evolutionary system for feature selection based on a wrapper approach. While the filter approach looks at the inner properties of a dataset, providing a selection that is independent of the classification algorithm to be used afterwards, in the wrapper approach various subsets of features are generated and evaluated with a specific classification model, using its performance as guidance for the optimization of the subsets.

The IS system reduces the amount of data while conserving the largest amount of information available in the dataset. The combined action of these two systems allows us to solve two frequent problems in managing artificial neural networks:

(1) the size and quality of the training and testing sets,

(2) the large number of variables which, apparently, seem to provide the largest possible amount of information.

Some of the attributes may contain redundant information already included in other variables, confused information (noise), or no significant information at all, being completely irrelevant. Genetic algorithms have been shown to be very effective as global search strategies when dealing with nonlinear and large problems.

4.2.1. T&T

The “training and testing” algorithm is based on a population of n ANNs managed by an evolutionary system. In its simplest form, this algorithm reproduces several distribution models of the complete dataset (one for every ANN of the population) into two subsets (d[tr], the training set, and d[ts], the testing set). During the learning process each ANN, according to its own data distribution model, is trained on the subsample d[tr] and blind-validated on the subsample d[ts].

The performance score reached by each ANN in the testing phase represents its “fitness” value (i.e., the individual probability of evolution). The genome of each “network individual” thus codifies a data distribution model with an associated validation strategy. The n data distribution models are combined according to their fitness criteria using an evolutionary algorithm. The selection of “network individuals” based on fitness determines the evolution of the population, that is, the progressive improvement of the performance of each network until the optimal performance is reached, which is equivalent to the best division of the global dataset into subsets. The evolutionary algorithm mastering this process, named the “genetic doping algorithm” (GenD for short) and created at Semeion Research Centre, has characteristics similar to a genetic algorithm [18-22], but it is able to maintain an inner instability during the evolution, carrying out a natural increase of biodiversity and a continuous “evolution of the evolution” in the population.
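The T&T loop just described can be caricatured as follows. This is a toy sketch of ours, not GenD: each genome is a boolean train/test mask, fitness is blind test accuracy, a nearest-centroid rule stands in for the backpropagation ANN, and the data are invented.

```python
import numpy as np

# Toy T&T: evolve train/test splits so that a simple classifier trained
# on one half scores well, blind, on the other half.
rng = np.random.default_rng(3)
X = rng.random((40, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # invented labels

def fitness(mask):
    tr, ts = mask, ~mask
    if tr.sum() == 0 or ts.sum() == 0 or y[tr].min() == y[tr].max():
        return 0.0
    c0 = X[tr][y[tr] == 0].mean(axis=0)     # class centroids from training half
    c1 = X[tr][y[tr] == 1].mean(axis=0)
    pred = (np.linalg.norm(X[ts] - c1, axis=1)
            < np.linalg.norm(X[ts] - c0, axis=1)).astype(int)
    return float((pred == y[ts]).mean())    # blind accuracy on testing half

pop = [rng.random(len(y)) < 0.5 for _ in range(20)]
for _ in range(30):                         # crude evolution: elitism + mutation
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [m ^ (rng.random(len(y)) < 0.05) for m in pop[:10]]

best = max(pop, key=fitness)
best_fit = fitness(best)
print(0.0 <= best_fit <= 1.0)               # fitness is an accuracy in [0, 1]
```
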

The elaboration of T&T is articulated in two phases. In a preliminary phase, an evaluation of the parameters of the fitness function that will be used on the global dataset is performed: the configuration of the standard backpropagation network that best “suits” the available dataset is determined, including the number of layers and hidden units, some possible generalizations of the standard learning law, and the fitness values of the population’s individuals during evolution. The parameters thus determined define the configuration and the initialization of all the individual networks of the population and then stay fixed in the following computational phase. The accuracy of the ANN performance on the testing set is the fitness of that individual (i.e., of that hypothesis of distribution of the whole dataset into two halves). In the computational phase, the system extracts from the global dataset the best training and testing sets. During this phase, each individual network of the population runs according to the established configuration and the initialization parameters.

4.2.2. IS

Parallel to T&T runs “Input Selection” (IS), an adaptive system based on the same evolutionary algorithm GenD, consisting of a population of ANNs, each of which carries out a selection of the independent and relevant variables of the available database. The elaboration of IS, as for T&T, is developed in two phases. In the preliminary phase, a standard backpropagation ANN is configured in order to avoid possible overfitting problems. In the computational phase, each individual network of the population, identified by the most relevant variables, is trained on the training set and tested on the testing set.

The evolution of the individual networks of the population is based on the algorithm GenD. In the IS approach, the GenD genome is built of n binary values, where n is the cardinality of the original input space. Every gene indicates whether an input variable is to be used or not during the evaluation of the population fitness. Through the evolutionary algorithm GenD, the different “hypotheses” of variable selection, generated by each ANN of the population, change over time, at each generation; this leads to the selection of the best combination of input variables. As in the T&T systems, the genetic operators crossover and mutation are applied to the ANN population; the rates of occurrence of both operators are self-determined by the system in an adaptive way at each generation. When the evolutionary algorithm no longer improves its performance, the process stops, and the best selection of the input variables is employed on the testing subset. The software implementing the TWIST phase algorithm (developed in C-Builder [23]) allows the configuration of the genetic algorithm GenD:
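The binary-genome idea behind IS can be sketched in the same toy style. Again this is our own stand-in, not GenD: each genome marks which variables a candidate model may use, fitness is test accuracy with only those variables, and a nearest-centroid rule replaces the ANN; the data are invented so that only one feature is informative.

```python
import numpy as np

# Toy IS wrapper: a genome of n binary genes selects input variables;
# fitness is the blind test accuracy obtained using only those variables.
rng = np.random.default_rng(4)
n = 12
X = rng.random((60, n))
y = (X[:, 2] > 0.5).astype(int)            # only feature 2 matters (by design)
tr, ts = np.arange(30), np.arange(30, 60)  # fixed split for this sketch

def fitness(genome):
    if genome.sum() == 0:
        return 0.0
    Xg = X[:, genome]                      # keep only the selected variables
    c0 = Xg[tr][y[tr] == 0].mean(axis=0)
    c1 = Xg[tr][y[tr] == 1].mean(axis=0)
    pred = (np.linalg.norm(Xg[ts] - c1, axis=1)
            < np.linalg.norm(Xg[ts] - c0, axis=1)).astype(int)
    return float((pred == y[ts]).mean())

pop = [rng.random(n) < 0.5 for _ in range(20)]
for _ in range(40):                        # crude evolution: elitism + mutation
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [g ^ (rng.random(n) < 0.1) for g in pop[:10]]

best = max(pop, key=fitness)
ideal = np.zeros(n, dtype=bool)
ideal[2] = True                            # genome selecting only feature 2
print(0.0 <= fitness(best) <= 1.0)
```
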

• the population size (the number of individual networks),

• the number of hidden nodes of the standard BP,

• the number of epochs,

• the output function (SoftMax),

• the cost function (the classification rate in our case).

The generated outputs are the pair of files SetA and SetB (subsets of the initial database defined by the selected variables) that will be used in the second step of the validation protocol.

In Fig. (3) we show the whole validation protocol.

Fig. (3). The whole validation protocol.

5. EXPERIMENTAL SETTINGS

5.1. Subjects and Diagnostic Criteria

In this study we have analyzed the data of 143 patients with amnesic Mild Cognitive Impairment (91 females; 53 males). Inclusion criteria for MCI aimed at selecting elderly persons with objective cognitive deficits, especially in the memory domain, who did not yet meet criteria for a diagnosis of dementia or AD [24, 25]: i) objective memory impairment on neuropsychological evaluation, as defined by performances 1.5 standard deviations below the mean value of age- and education-matched controls on a test battery including the Buschke-Fuld and Rey memory tests; ii) normal activities of daily living; and iii) a clinical dementia rating score of 0.5. Exclusion criteria were: i) mild AD; ii) evidence of other concomitant dementia; iii) evidence of concomitant extrapyramidal symptoms; iv) clinical and indirect evidence of depression as revealed by Geriatric Depression Scale scores of 13 or above; v) other psychiatric diseases, epilepsy, drug addiction; and vi) current or previous uncontrolled systemic diseases or recent traumatic brain injuries. During a clinical follow-up of about 14 months, 51 MCI subjects (from now on designated MCI Converted) showed progression to clinically evident AD (according to the NINCDS-ADRDA criteria [26]), while 92 MCI subjects remained stable within that period (MCI Stable).

In Table 2 we report the main details of the study population.

5.2. Data Pre-Processing

EEG data were recorded in the late morning in a fully awake, resting state (eyes closed) from 19 electrodes positioned according to the International 10–20 System (i.e., Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2; 0.3–70 Hz band pass; cephalic reference; see Fig. 4). To monitor eye movements, the horizontal and vertical electrooculogram (0.3–70 Hz band pass) was also collected. All data were digitized in continuous recording mode (5 min of EEG; 128–256 Hz sampling rate). In order to keep the level of vigilance constant, an experimenter monitored the subject and the EEG traces on-line and alerted the subject any time there were signs of behavioural and/or EEG drowsiness. Continuous EEG data were down-sampled to 128 Hz (EEG track).

Fig. (4). Electroencephalographic montage. EEG recordings were performed (0.3–70 Hz band pass; cephalic reference) from 19 electrodes positioned according to the International 10–20 System.

The EEG tracks were subsequently filtered according to four different frequency ranges, in order to evaluate the hypothesis that a specific range, corresponding to a specific brain wave type, could provide a better classification:

Table 2. Demographic and Neuropsychological Data of MCI Subjects

                     MCI Converted                     MCI Stable
N                    51                                92
MMSE                 25.88 (±2.25 SD) [Range 20-30]    26.75 (±2.36 SD) [Range 15-30]
Age                  73.65 (±5.95 SD) [Range 63-85]    70.60 (±9.07 SD) [Range 28-87]
Gender: Female/Male  31/21                             60/32


- 0 - 12 Hz;

- 12.2 – 29.8 Hz;

- 30.2 – 40 Hz;

- Notch Filter 48 – 50 Hz
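The band-limiting step can be sketched with a simple FFT mask on one channel. This is an assumption-laden stand-in of ours for whatever filters were actually used in the study; the signal is synthetic.

```python
import numpy as np

# One channel of a synthetic 128 Hz "EEG": an 8 Hz component (inside
# the 0-12 Hz band) plus a 30 Hz component (outside it).
fs = 128
t = np.arange(10 * fs) / fs                                  # 10 s of signal
x = np.sin(2 * np.pi * 8 * t) + np.sin(2 * np.pi * 30 * t)

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
X[freqs > 12.0] = 0.0                                        # keep only 0-12 Hz
x_low = np.fft.irfft(X, n=x.size)

# The 30 Hz component is removed; the 8 Hz component survives intact.
print(np.allclose(x_low, np.sin(2 * np.pi * 8 * t), atol=1e-8))  # True
```
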

6. RESULTS

6.1. Global Results

Regarding the 5×2 CV protocol, the experimental design consisted of 10 different and independent processing runs for the classification MCI Stable versus MCI Converted. Every experiment was conducted in a blind and independent manner in two directions: training with subsample A and blind testing with subsample B, and training with subsample B and blind testing with subsample A.

Table 3 shows the mean results summary for the classification, compared with the performances of the noise reduction protocol (TWIST phase), for the different frequency ranges, where:

SE = sensitivity,

SP = specificity,

VP+ = positive predictive value,

VP− = negative predictive value,

LR+ = likelihood ratio for positive test results (benchmark value 2),

LR− = likelihood ratio for negative test results (benchmark value 0.2),

AUC = area under the ROC curve (average ROC curve calculated by the threshold method).
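As a worked check of these definitions, the raw counts reported later for the first network in Table 4 (VP = 13, VN = 25, FP = 3, FN = 1) reproduce its sensitivity, specificity, and both accuracy measures. The snippet is our own, not part of the original analysis.

```python
# Summary measures from raw counts (VP = true positives, VN = true
# negatives, FP = false positives, FN = false negatives).
VP, VN, FP, FN = 13, 25, 3, 1

SE = 100 * VP / (VP + FN)                        # sensitivity
SP = 100 * VN / (VN + FP)                        # specificity
PPV = 100 * VP / (VP + FP)                       # positive predictive value (VP+)
NPV = 100 * VN / (VN + FN)                       # negative predictive value (VP-)
acc_am = (SE + SP) / 2                           # arithmetic-mean accuracy
acc_wm = 100 * (VP + VN) / (VP + VN + FP + FN)   # weighted (overall) accuracy

print(round(SE, 2), round(SP, 2), round(acc_am, 2), round(acc_wm, 2))
# 92.86 89.29 91.07 90.48
```
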

The artificial neural network used was a feedforward backpropagation network (two hidden layers with 36 and 12 nodes for the 5×2CV protocol, and one hidden layer with 12 nodes for the TWIST phase).

The noise reduction phase (TWIST) produced better performances over all the frequency ranges. In the range 0-12 Hz this procedure reduced the input variables from 171 to 56 and obtained the best results (accuracy of 85.98%, with a high specificity of 91.74%, with respect to the 5×2CV protocol results).

Fig. (5) shows the respective average ROC curves. These results might confirm the working hypothesis that the information codified by the IFAST procedure to discriminate individual EEG data is contained in the 0-12 Hz range.

Fig. (5). Average ROC curves of the results of the 5x2CV protocol and the noise reduction (band 0-12 Hz).

6.2. Individual Classification

Additional information might be obtained by evaluating the classification values produced by the different types of ANNs used within the general IFAST procedure. In the present control analysis, we used the data filtered in the range 0-12 Hz, processed by the TWIST algorithm (noise reduction), and a subset of 6 feedforward backpropagation networks which obtained the best results with dataset “A” as the training set and dataset “B” as the testing set of the cross-validation procedure. Dataset “B” includes 42 subjects (14 MCI Converted and 28 MCI Stable). Table 4 shows the results.

Table 3. Classification Mean Results

Frequency      Protocol  SE     SP     Accuracy  VP    VN    FP    FN   VP+    VP-    RV+   RV-   AUC
0-12 Hz        5x2CV     61.29  79.57  70.43     15.6  36.6  9.4   9.9  65.74  79.27  3.75  0.48  0.67
0-12 Hz        Twist     80.21  91.74  85.98     19    42    4     6.5  83.66  89.58  10.1  0.22  0.85
12.2-29.8 Hz   5x2CV     72.63  76.3   74.47     18.5  35.1  10.9  7    64.1   83.74  3.39  0.35  0.74
12.2-29.8 Hz   Twist     76.43  81.36  78.9      19.5  36    10    6    70.86  85.19  5     0.29  0.74
30.2-40 Hz     5x2CV     72.54  74.78  73.66     18.4  34.4  11.6  7    62.24  83.47  3.09  0.36  0.75
30.2-40 Hz     Twist     68.57  83.7   76.13     17.5  38.5  7.5   8    69.71  82.86  4.39  0.38  0.72
0-48/50-60 Hz  5x2CV     74.17  65.43  69.8      18.9  30.1  15.9  6.6  55.21  82.31  2.34  0.39  0.66
0-48/50-60 Hz  Twist     81.37  85     83.19     20    39    7     5.5  74.31  88.36  5.5   0.21  0.82


Each ANN gave a classification value from 0 to 1, where the values from 0 to 0.5 corresponded to the classification as MCI Stable and the values from 0.51 to 1 to the classifica-tion as MCI Converted.
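The thresholding rule just described can be sketched as follows, together with simple descriptive statistics over the ensemble of networks; the six output values are invented for illustration.

```python
import numpy as np

# Hypothetical outputs of the 6 networks for one subject, each in [0, 1].
outputs = np.array([0.91, 0.88, 0.95, 0.79, 0.86, 0.90])

votes = outputs > 0.5                        # True -> MCI Converted
label = "MCI Converted" if votes.mean() > 0.5 else "MCI Stable"
mean, spread = outputs.mean(), outputs.std()  # informal degree of confidence

print(label)  # MCI Converted
```
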

Fig. (6) shows the output results of the 6 neural networks, that is, how the networks classified the subjects belonging to dataset B.

Figs. (7, 8) show histograms of the outputs of the networks for some subjects.

7. DISCUSSION

Mild Cognitive Impairment is a common condition defined, for the majority of subjects, as a transitional state between normality and frank dementia of the Alzheimer type. Clinically, it is characterized by subjective and objective memory loss beyond that expected for age and educational level, although a broad range of other cognitive impairments may appear. The activities of daily living are generally preserved.

Approximately 15% of the patients convert to dementia every year, so about half of them are expected to become demented within 3 years. Since a non-trivial percentage of subjects do not convert to dementia, it is essential to find reliable predictors; this would help in starting pharmaceutical and rehabilitative treatment as soon as possible in those who will proceed toward frank dementia, as well as avoiding unnecessary stress in those cases never progressing toward dementia. Many investigators have tried to predict the conversion of MCI to Alzheimer disease using different approaches, with controversial results.

Researchers in this area have approached the problem from multiple perspectives, attempting to develop predictive models based on neuropsychological testing; brain imaging with MRI, SPECT and PET; analytical feature extraction models for electroencephalograms (EEGs) and evoked potentials; and the use of CSF biomarkers, genotyping and proteomics.

Table 4. Results of the Subset of 6 Back Propagation Network on Dataset “B” after the Noise Reduction Phase

n SE SP A.M.Acc. W.M.Acc. VP VN FP FN AUC

1 92.86 89.29 91.07 90.48 13 25 3 1 0.921

2 92.86 92.86 92.86 92.86 13 26 2 1 0.957

3 100 85.71 92.86 90.48 14 24 4 0 0.932

4 100 82.14 91.07 88.1 14 23 5 0 0.926

5 92.86 89.29 91.07 90.48 13 25 3 1 0.935

6 92.86 85.71 89.29 88.1 13 24 4 1 0.888

Mean 95.24 87.5 91.37 90.08 13.3 24.5 3.5 0.7 0.92

Fig. (6). Box plot with the results of the individual classification obtained by the 6 networks. The subjects in red are MCI Converted, those in green are MCI Stable.


Fig. (7). Examples of classification of MCI Converted subjects based on the outputs of the selected subset of networks.

Fig. (8). Examples of classification of MCI Stable subjects based on the outputs of the selected subset of networks.

Comprehensive neuropsychological tests alone generally do not allow high accuracy rates, rarely surpassing 70% sensitivity and specificity even when combined with the presence of the APOE epsilon 4 allele (a genetically determined risk factor for dementia) [27-29].

CSF biochemical markers are being developed with encouraging results. Beta-amyloid 42 protein is usually lower in converters than in people with stable cognitive status, and tau protein is higher. The sensitivity is substantial but the specificity is so far low [30]. An epitope of tau protein (P231) looks more specific to Alzheimer's disease and is therefore a promising biomarker. In the blood, high beta-amyloid protein levels indicate a risk of conversion, but only a few studies have been published so far [31].

Hippocampal or entorhinal atrophy on MRI is one of the most used radiological markers of conversion, but quantification of atrophy is not simple, as it is subject to artefacts and anatomic variations. The results obtained so far are not satisfactory [32, 33].

Proton Magnetic Resonance Spectroscopy (MRS) and Positron Emission Tomography (PET) are emerging as the most promising predictive tools. The highest degree of accuracy (>90%) has been achieved by means of PET plus either memory performance or APOE4 genotype [34]. However, the samples of the published studies are mostly small, and PET instruments are expensive, complex technologies that are not widely available.

Hirao et al. [35] described the use of regional cerebral blood flow SPECT in the prediction of rapid conversion to Alzheimer's disease in 76 amnesic mild cognitive impairment patients. The logistic regression model revealed that reduced rCBF in the inferior parietal lobule, angular gyrus, and precunei has high predictive value and discriminative ability.

Other studies have focused on brain electromagnetic rhythm biomarkers as the principal source of information.

It is well known that delta, theta, and alpha rhythms are affected in dementia, delta and theta showing an increment in various dementias. A recent study [36] mapped (LORETA) the source differences of resting EEG rhythms between mild AD and vascular dementia (VaD). There was a decline of central, parietal, temporal, and limbic alpha 1 (low alpha) sources specific to the mild AD group with respect to the normal elderly (Nold) and VaD groups. Furthermore, occipital alpha 1 sources showed a strong decline in mild AD compared to the VaD group. Finally, distributed theta sources were largely abnormal in the VaD but not in the mild AD group.

The study of the non-linear dynamics (NDA) of the EEG indicates that electromagnetic brain activity in patients affected by AD exhibits a decrease in complexity. EEG rhythms lose the usual modulation in complexity observed when comparing the eyes-open vs eyes-closed condition [37-39], and that might arise from neuronal death, deficiency in neurotransmission and/or loss of connectivity in local neuronal networks. Non-linear dynamics also evaluate parameters dealing with the flexibility of information processing of the brain, intended as the ability to reach different states of information processing starting from identical initial conditions [40, 41]. Here too there is a significant drop in AD, with a decrement in the flexibility of information processing. In general, the decrease of EEG complexity in AD might be attributable to decreased nonlinear dynamics, possibly associated with cognitive decline.

Modern EEG can accurately index normal and abnormal brain aging, facilitating the non-invasive analysis of cortico-cortical connectivity, neuronal synchronization of firing, and coherence of rhythmic oscillations at various frequencies. A recent review of the work done before 2007 [42] provided a perspective on these issues by assaying different neurophysiological methods and integrating the results with functional brain imaging findings. It concluded that discrimination between physiological and pathological brain aging clearly emerges at the group level, with applications at the individual level also suggested. Integrated approaches utilizing neurophysiological techniques together with biological markers and structural and functional imaging are promising for the large-scale, low-cost and non-invasive evaluation of at-risk populations.

In very recent years other works have been published on this topic.

Missonnier used endogenous event-related potentials (ERP) and brain rhythm synchronization during memory activation to predict cognitive decline in mild cognitive impairment [43]. The authors assessed P200 and N200 latencies, as well as beta event-related synchronization (ERS), in 16 elderly controls (EC), 29 MCI cases and 10 patients with AD during the successful performance of a pure attentional detection task as compared with a highly working-memory-demanding two-back task. At 1-year follow-up, 16 MCI patients showed progressive cognitive decline (PMCI) and 13 remained stable (SMCI). Univariate models showed that P200 and N200 latencies in the two-back task were significantly related to the SMCI/PMCI distinction, with areas under the receiver operating characteristic curve of 0.93 and 0.78 respectively. The combination of EEG hallmarks was the strongest predictor of MCI deterioration, with 90% of MCI cases correctly classified.

In another study, carried out in Greece, Papaliagkas et al. [44] determined whether changes in the latencies and amplitudes of the major waves of Auditory Event-Related Potentials (AERP) correlate with the memory status of patients with mild cognitive impairment (MCI) and conversion to Alzheimer's disease (AD). Fifty-four MCI patients were re-examined after an average period of 14 (+/- 5.2) months. During this time period 5 patients converted to AD. The establishment of an N200 latency cut-off value of 287 ms resulted in a sensitivity of 100% and a specificity of 91% in the prediction of the MCI patients that converted to AD.

Prichep [45] reported results from initial quantitative electroencephalography (QEEG) evaluations of 44 normal elderly subjects (with only subjective reports of memory loss), predicting future cognitive decline or conversion to dementia with high prediction accuracy (approximately 95%). In this report, source localization algorithms were used to identify the mathematically most probable underlying generators of the abnormal features of the scalp-recorded EEG from the patients with differential outcomes. Using this QEEG method, abnormalities in the brain regions identified in studies of AD using MEG, MRI, and positron emission tomography (PET) imaging were found in the premorbid recordings of those subjects who went on to decline or convert to dementia.

Using logistic regression, an R2 of 0.93 (p < 0.001) was obtained between baseline QEEG features and probability of future decline, with an overall predictive accuracy of 90%.

The major methodological criticism of all these studies is the lack of blind classification, with trained models applied to new cases that the model has never seen, i.e., a cross-validation protocol.

The internal validation of prediction accuracy is one of the most important problems in statistical analysis. In fact, due to the restriction of training procedures to just a part of the data set, generally one half, a potential loss of power to recognize hidden patterns emerges. In our study the issue of optimizing the training and testing procedure was addressed with the use of the evolutionary training and testing algorithm, which ensured that the two halves of the data set contained the same amount of relevant information. Thus, the best division of the whole data set into a training and a testing set was reached after a finite number of generations. Finally, ANNs are able to identify variable combinations that are likely to produce accurate predictions of MCI outcome for a single individual.

Thus, the use of a rigorous training and testing protocol, with blind classification of new cases never seen by the model, is one of the major points of strength of our study.

The results obtained in this work open new avenues in the management of MCI patients, showing that the EEG information is crucial in predicting the fate of these subjects. To our knowledge, this evidence is the first to appear in the literature.

The possibility of performing individual classifications with artificial neural networks is another point of strength of our study.

A major, unavoidable pitfall of the translation of group statistics onto the individual level is linked to the problem of the wide confidence interval of classifications. Within classical statistical approaches the individual is assimilated into a subgroup of individuals who have, on average, a given probability of an event. For this reason, predictive models can dramatically fail when applied to the single individual. In other words, at the single-subject level the confidence interval is wider than the mean accuracy rate at the group level. In addition to their increased power as modelling techniques, neural networks allow for building up several independent models which have different predictive capacities in classifying patients according to certain targets, due to slight differences in their architecture, topology and learning laws. In this way it is possible to produce a set of neural networks with high training variability, able to independently process a set of new patients and to predict their outcome. Therefore, when a new patient has to be classified, thanks to this sort of consensus of independent judges acting simultaneously, a specific distribution of output values can be obtained, with resulting descriptive statistics.

According to the above reasoning, it is possible to establish a degree of confidence for a specific classification of an individual patient. The examples showing the output of six independent neural networks on the same subject provide interesting insights into this new philosophy.
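The consensus idea can be sketched in a few lines. This is an illustrative example, not the Semeion software: a small ensemble of logistic classifiers, each trained from a different random initialisation, votes on a new subject, and the spread of the outputs gives a per-subject confidence estimate:

```python
# Illustrative ensemble-consensus sketch (not the authors' ANN software).
# Several logistic models differ only in their random initialisation; their
# outputs on one subject form a distribution whose mean and spread act as a
# consensus prediction and a per-subject confidence measure.
import math
import random

def train_logistic(X, y, seed, lr=0.5, epochs=200):
    """Train one logistic classifier by stochastic gradient descent."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(X[0]))]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def ensemble_predict(models, x):
    """Return consensus mean, spread, and raw votes for one subject."""
    outs = []
    for w, b in models:
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        outs.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(outs) / len(outs)
    sd = (sum((o - mean) ** 2 for o in outs) / len(outs)) ** 0.5
    return mean, sd, outs
```

For example, six models trained with seeds 0 to 5 on the same data act as six independent judges: a subject on whom their outputs agree tightly (small spread) is classified with high confidence, while a wide spread flags an uncertain case.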

The main limitation of this study is the relatively small number of subjects. It is important to note, however, that artificial neural networks, unlike classical statistical tests, can handle complexity even with relatively small samples and with the resulting unbalanced ratio between variables and records. In this connection, adaptive inference algorithms based on functional estimation, such as artificial neural networks, overcome the problem of dimensionality.

In any case, a multicenter study with a larger population is currently in progress to define the clinical and biological characteristics predictive of a positive response to treatment.

Once the results obtained with the mathematical models generated by ANNs are confirmed in a larger setting, they could pave the way to prognostic tools in AD therapy that even physicians with no specific training could use.

8. CONCLUSION

This study demonstrates, in a relatively large population of MCI subjects followed up for a reasonably long interval, that the EEG signal, when evaluated via a classification system such as the one reported here, contains information sensitive and specific enough to identify at a very early stage those subjects who will convert to a more severe cognitive decline fitting the definition of dementia.

This is of paramount importance, because it is known that medical and rehabilitative treatments are more efficacious when started early. Moreover, since several drugs with potentially disease-modifying properties are undergoing clinical trials, new horizons for the treatment of dementia are opening. This means that national health systems will face the problem of screening large samples of the elderly population in order to identify, as early as possible, the high-risk subjects who would most benefit from early treatment. Modern types of EEG analysis, such as the present method of signal evaluation, are excellent candidates for such screening procedures: the basic technique (digital EEG recording equipment) is widely available, the procedures are non-invasive and can easily be repeated whenever results are doubtful, and, not least, their execution requires little financial effort.


Received: March 19, 2009 Revised: October 13, 2009 Accepted: October 14, 2009