Current Proteomics, 2007, 4, 53-66
1570-1646/07 $50.00+.00 ©2007 Bentham Science Publishers Ltd.
Multivariate Statistical Tools for the Evaluation of Proteomic 2D-maps: Recent Achievements and Applications
Emilio Marengo*, Elisa Robotti and Marco Bobba
Department of Environmental and Life Sciences, University of Eastern Piedmont, Via Bellini 25/G, 15100 Alessandria, Italy
Abstract: Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) maps are an indispensable tool in many fields connected with proteome research, such as the development of new diagnostic assays or new drugs. Unfortunately, the information contained in the maps is often so complex that recognising and extracting it requires sophisticated statistical treatment. Statistics accompanies many phases of 2D-PAGE map management: spot detection, map matching, and the extraction and rationalisation of useful information. This review describes the most recent achievements in the field of statistical tools applied to proteome research by two-dimensional gel electrophoresis (2D-GE). The first section briefly describes the theoretical aspects of the multivariate methods most widely adopted in this field, such as Principal Component Analysis, Cluster Analysis, classification methods and Artificial Neural Networks. The most recent applications are then described. These cover both the analysis of spot volume datasets obtained by standard differential analysis and the direct analysis of 2D-map images. Applications of multivariate tools to the analysis of DNA and RNA profiles are also reported.
Key Words: Principal component analysis, classification methods, linear discriminant analysis, soft-independent model of class analogy, image analysis, moment functions, fuzzy logic, spot volume data.
INTRODUCTION
Two-dimensional gel electrophoresis (2D-GE) has developed rapidly in the last few years for the separation and analysis of protein extracts in many fields of proteomic research, e.g. clinical chemistry, botany, microbiology, toxicology, food security and control. Although it is a very powerful tool for protein analysis, 2D-GE is characterised by low reproducibility. This is due in particular to the complexity of the specimens and of the instrumental technique adopted to obtain the final electrophoretic maps. The same limitation also affects one-dimensional (1D) gel electrophoresis (Righetti et al., 2001). The sample covers a wide range of properties, structures and molecular weights, and this complexity carries over to the final map. In addition, the instrumental technique itself (from sample preparation to the electrophoretic run) can further reduce the reproducibility of 2D-GE.
*Address correspondence to this author at the Department of Environmental and Life Sciences - University of Eastern Piedmont - Via Bellini 25/G - 15100 Alessandria, Italy; Tel: +39 0131 360272; Fax: +39 0131 360250; E-mail: marengoe@tin.it

These limitations of 2D-GE have made dedicated software packages mandatory for analysing the information contained in two-dimensional maps (2D-maps), because such packages can take the intrinsic uncertainty of the technique into account, at least in part. Many software packages for the comparison of 2D-maps have become available in the last few years (PDQuest, Progenesis, Melanie, Z3, Phoretix, Z4000, etc.) (Anderson et al., 1981; Mahon et al., 2001; Rubinfeld et al., 2003). All the available commercial solutions have advantages and disadvantages (Almeida et al., 2005; Campostrini et al., 2005; Molloy et al., 2003; Moritz et al., 2003; Raman et al., 2002; Rosengren et al., 2003; Voss et al., 2000; Wheelock et al., 2005). However, almost all of them use a multi-step procedure that analyses sets of 2D-maps from the digitised images of the gels. These images are obtained by laser densitometry, phosphor imaging or a CCD camera. The analysis of the digitised images involves several steps, described here with particular reference to the PDQuest system (Garrels et al., 1979, 1984, 1989):
1) Scanning: it turns each gel image into pixel data and each
pixel is characterised by x-y coordinates indicating its
position on the 2D-image and a Z value corresponding to
its signal intensity (optical density value - OD).
2) Filtering images: a pre-processing step eliminating noise,
background effects, specks and imperfections.
3) Automated spot detection: a step identifying the spots
present on each gel independently. The operator has to
select: the faintest spot (to set the sensitivity and mini-
mum peak value); the smallest spot (to set the size scale
parameter); the largest spot (to set the maximum size of
the spots to be detected). A final smoothing is applied to
remove spots close to the background level. Spots are lo-
cated on the gel image (i.e. each spot is identified by x-y
coordinates indicating its position), substituted by ideal
Gaussian distributions and quantified by the sum of the
OD values within each Gaussian distribution.
4) Matching of protein profile: sets of 2D-gels can be edited and matched to one another in a “match set”. Each spot is matched to the same spot in all of the other gels of the set under investigation. Landmarks are needed for this purpose: PDQuest uses reference spots to align and position the match set members for matching. The landmarks are then used to estimate parameters that account for the distortions existing among the gels to be compared.
5) Normalisation: it is applied to the maps to compensate for gel-to-gel variations due to sample preparation and loading, as well as to staining and destaining procedures, etc.
6) Differential analysis: it allows the comparison of different sets of 2D-maps, e.g. control and diseased samples. For each group of 2D-maps, a “sample group” is created containing the average values of all the spots identified. The groups are then compared through their “sample groups” to find differentially expressed proteins. Usually, only spots showing at least a two-fold variation (i.e. a 100% change) are accepted as significantly changed. This threshold is meant to exclude differences that arise from the large experimental error rather than from actual systematic variations.
7) Statistical analysis: it is applied to identify the differen-
tially expressed proteins. Statistical analysis is usually
based on Student’s t-test (p<0.05).
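Steps 5 to 7 can be illustrated with a short sketch. It assumes Python with NumPy and a matrix of matched spot volumes, with rows for gels and columns for spots. Normalisation divides by the total spot volume of each gel. This is one common choice and is assumed here; PDQuest's own normalisation options may differ. The test is a pooled-variance Student's t-test, and its critical value is passed in from t tables. The function names are hypothetical and do not reproduce any commercial package.

```python
import numpy as np


def normalise_total_volume(volumes):
    """Divide each gel's spot volumes by the gel's total spot volume.

    Hypothetical helper: total-volume normalisation is assumed for illustration;
    real packages offer several normalisation modes.
    """
    volumes = np.asarray(volumes, dtype=float)      # rows = gels, columns = matched spots
    return volumes / volumes.sum(axis=1, keepdims=True)


def differential_spots(control, treated, fold=2.0, t_crit=2.306):
    """Flag spots with at least a `fold`-fold change that also pass a two-sided t-test.

    control, treated: (n_gels, n_spots) arrays of matched, normalised spot volumes.
    t_crit: critical |t| of the pooled-variance Student t-test; the default 2.306
    is the two-sided 5% value for 8 degrees of freedom (e.g. 5 + 5 gels).
    """
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    n1, n2 = len(control), len(treated)
    m1, m2 = control.mean(axis=0), treated.mean(axis=0)    # "sample group" averages
    fold_change = np.maximum(m1, m2) / np.minimum(m1, m2)  # >= 1 whatever the direction
    pooled_var = ((n1 - 1) * control.var(axis=0, ddof=1)
                  + (n2 - 1) * treated.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (m2 - m1) / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (fold_change >= fold) & (np.abs(t) > t_crit)
```

In this sketch a spot is flagged only if it passes both the fold-change threshold and the t-test. This is one way of combining steps 6 and 7; a given package may apply them in a different order or separately.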
Since the final result of the overall procedure depends on the accuracy of the software package adopted, the choice of the most suitable analysis software is critical. The image-analysis step therefore adds a further source of uncertainty to the final result.
Commercial software packages are certainly powerful tools for 2D-map analysis, but they have two main disadvantages. The first is operator intervention, introduced mainly in steps 2 and 3. The second is the treatment of replicates. Groups of 2D-maps (e.g. control and diseased, or control and drug-treated) are usually compared through the “sample group” obtained for each class. In this way, the individual replicates are not considered and the information about the reproducibility of the maps is not properly taken into account.
In summary, the large number of spots present on each map and the low reproducibility of 2D gel electrophoresis make it difficult to classify the samples clearly. They also make it difficult to use 2D-PAGE maps for diagnostic/prognostic purposes or for drug-design studies. Mainly for these reasons, many papers have recently appeared in the literature that use robust multivariate statistical tools to evaluate sets of 2D-maps. These multivariate methods can be applied both to the spot volume datasets produced by the differential analysis of classical software packages and to the direct analysis of 2D-PAGE images.
This review reports the most recent applications of multivariate tools to the analysis of spot volume datasets (or of profiles of DNA and RNA fragments), as well as to the analysis of 2D-map images. A first section gives an insight into the theoretical aspects of the multivariate tools most widely applied in proteomics. A later section is devoted to some applications of artificial neural networks.
THEORY
Principal Component Analysis
PCA (Massart et al., 1988; Vandeginste et al., 1998) is a multivariate pattern recognition method. It represents the objects, described by the original variables, in a new reference system given by new variables called Principal Components (PCs). Each PC is calculated so that it explains the maximum possible amount of the variance of the original dataset not already explained by the previous PCs. The PCs are therefore calculated hierarchically: the first one explains the maximum variance, the second one the maximum residual variance, and so on. As a result, experimental noise and random variations are collected in the last PCs, provided that experimental noise is a minor contribution compared with systematic variations. PCs have other important features as well. They are related to the original reference system, since each PC is a linear combination of the original variables. They are orthogonal to each other and thus carry independent sources of information. Their hierarchical structure allows the dimensionality of the original dataset to be reduced, since only the few PCs accounting for the most significant amount of variance need to be considered.
The results of PCA provide two main tools for data analysis: the scores (the coordinates of the samples in the new reference system) and the loadings (the weights of the original variables on each PC). Score and loading plots show the scores and loadings in the space of two PCs at a time. Their analysis serves two main purposes:
a) identifying groups of samples with similar or opposite behaviour (score plot), i.e. samples grouped together or in opposite positions with respect to the origin of the axes;
b) identifying the reasons for the similarities and differences found among the samples (loading plot).
PCA is therefore a very powerful visualisation tool, since it allows a multivariate dataset to be represented by only the few PCs identified as the most relevant.
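The following is a minimal sketch of how scores, loadings and explained variance can be obtained from the singular value decomposition of the mean-centred data matrix. It is not the implementation used in the cited works. It assumes Python with NumPy and only mean-centres the variables. Autoscaling, which also divides each spot by its standard deviation, is a common alternative that is not shown. The function name is hypothetical, and the sign of each PC is arbitrary, as in any PCA.

```python
import numpy as np


def pca(X, n_components):
    """PCA of a (samples x variables) matrix through the SVD of the centred data.

    Hypothetical helper. Returns the scores (coordinates of the samples on the
    PCs), the loadings (weights of the original variables on each PC) and the
    fraction of the total variance explained by each retained PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # mean-centre every variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T             # (variables x PCs), orthonormal columns
    scores = Xc @ loadings                     # (samples x PCs)
    explained = s**2 / np.sum(s**2)            # non-increasing: PCs are hierarchical
    return scores, loadings, explained[:n_components]
```

To get the score plot, plot `scores[:, 0]` against `scores[:, 1]`. The rows of `loadings` give the corresponding loading plot.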
Cluster Analysis
Cluster analysis techniques are unsupervised pattern recognition methods. They identify groups of samples or variables in a dataset by investigating the relationships between the objects or between the variables. The most widely used clustering methods are the agglomerative hierarchical methods (Massart et al., 1983, 1988; Vandeginste et al., 1998). In these methods, the objects are grouped (linked together) on the basis of a measure of their similarity, and the most similar objects or groups of objects are linked first. The final result is a graph (dendrogram) in which the objects are represented on the X axis and are connected at decreasing levels of similarity along the Y axis.
The results of hierarchical clustering methods depend on the specific similarity measure and on the linkage method adopted. Several methods are therefore usually applied to get a general idea of the number of groups present. Clustering techniques can be applied either to the original variables or to the results of PCA (the scores of the significant PCs). In the latter case, the samples are clustered on the useful sources of variation only, and the contribution of experimental error is largely removed.
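The agglomerative procedure can be sketched naively in Python with NumPy. The function name is hypothetical, distances are Euclidean, and single, complete or average linkage can be selected. Passing the scores of the significant PCs instead of the raw spot volumes corresponds to the second option described above. This naive loop scales poorly with the number of objects and is meant only to show the merging logic; optimised library routines would normally be used on real datasets.

```python
import numpy as np


def agglomerative(X, n_clusters, linkage="average"):
    """Naive agglomerative hierarchical clustering with Euclidean distances.

    Hypothetical helper. Every object starts as its own cluster; the two closest
    clusters are merged repeatedly until one cluster is left. Returns the merge
    history (the content of a dendrogram) and the labels when n_clusters remain.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    reduce = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = {i: [i] for i in range(n)}   # cluster id -> member indices

    def current_labels():
        labels = np.empty(n, dtype=int)
        for label, members in enumerate(clusters.values()):
            labels[members] = label
        return labels

    labels = current_labels() if n_clusters >= n else None
    merges = []                              # (id_a, id_b, linkage distance)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = reduce(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((a, b, float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)   # cluster b is absorbed into a
        if len(clusters) == n_clusters:
            labels = current_labels()
    return merges, labels
```

Each entry of `merges` identifies a cluster by the index of its first member, and the last element gives the linkage distance at which the merge happened. Those distances are the heights of a dendrogram.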
Classification Methods
Several multivariate supervised classification methods are available in the literature. Linear discriminant analysis (LDA), soft-independent model of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA) are briefly described below, since they have recently been used to classify proteomic datasets.
Linear Discriminant Analysis
LDA (Eisenbeis et al., 1972; Klecka et al., 1980) is a Bayesian classification method that classifies the samples of a dataset while taking its multivariate structure into account.
In Bayesian classification methods, an object x is assigned to the class g for which the posterior probability P(g|x) is maximum:

$$P(g|x) = \frac{P_g\, f(x|g)}{\sum_k P_k\, f(x|k)}$$

where $P_g$ is the prior probability of class g; $P_k$ is the prior probability of class k, with the sum in the denominator running over all the classes; $f(x|g)$ is the probability density function of class g; $f(x|k)$ is the probability density function of class k.
Each class is usually described by a Gaussian multivariate probability distribution:

$$P_g\, f(x|g) = \frac{P_g}{(2\pi)^{p/2}\, |S_g|^{1/2}} \exp\left[-\frac{1}{2}(x - c_g)^T S_g^{-1} (x - c_g)\right]$$

where $P_g$ is the prior probability of class g; $S_g$ is the covariance matrix of class g; $c_g$ is the centroid of class g; p is the number of descriptors.
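The quadratic form in the argument of the exponential function is the squared Mahalanobis distance between object x and the centroid of class g. It takes the covariance structure of the class (i.e. its shape) into account, because it contains the class covariance matrix. Take −2 times the logarithm of $P_g f(x|g)$ and drop the constant term $p \ln 2\pi$. Each object is then assigned to the class g for which the resulting discriminant score is minimum, which is equivalent to maximising the posterior probability:

$$D(g|x) = (x - c_g)^T S_g^{-1} (x - c_g) + \ln|S_g| - 2\ln P_g$$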
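In LDA, the covariance matrix of each class is approximated by the pooled covariance matrix computed over all the classes. All the classes are therefore assumed to have a common shape, namely a weighted average of the shapes of the classes present. Because $S_g$ and $\ln|S_g|$ are then the same for every class, the quadratic terms in x cancel when two classes are compared, and the decision boundaries become linear.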
The variables included in the LDA model can be chosen by a stepwise algorithm that iteratively selects the most discriminating variables. As already mentioned for cluster analysis, LDA can be performed either on the original variables or on the principal components.
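The LDA rule above can be sketched as follows. The sketch assumes Python with NumPy and estimates the priors $P_g$ from the class frequencies. The pooled within-class covariance matrix is used in place of every $S_g$. Because this matrix must be invertible, the sketch needs more samples than variables. With spot volume data, this is why LDA is usually run on a few PC scores or on stepwise-selected spots. All function names are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Hypothetical helper: class centroids c_g, priors P_g and pooled covariance S."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = len(X)
    centroids = np.array([X[y == g].mean(axis=0) for g in classes])
    priors = np.array([np.mean(y == g) for g in classes])   # assumed: class frequencies
    # pooled covariance: within-class scatter summed over all classes, divided by n - G
    S = sum((X[y == g] - c).T @ (X[y == g] - c) for g, c in zip(classes, centroids))
    S = S / (n - len(classes))
    return classes, centroids, priors, S


def discriminant_scores(X, model):
    """D(g|x) = (x - c_g)^T S^-1 (x - c_g) + ln|S| - 2 ln P_g for every object and class.

    With a pooled S the ln|S| term is the same for all classes; it is kept only
    so that the expression matches the discriminant score term by term.
    """
    classes, centroids, priors, S = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    S_inv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    D = np.empty((len(X), len(classes)))
    for k, (c, P) in enumerate(zip(centroids, priors)):
        diff = X - c
        # squared Mahalanobis distance of each row of X from centroid c
        D[:, k] = np.einsum("ij,jk,ik->i", diff, S_inv, diff) + logdet - 2.0 * np.log(P)
    return D


def predict_lda(X, model):
    """Assign each object to the class with the minimum discriminant score."""
    classes = model[0]
    return classes[np.argmin(discriminant_scores(X, model), axis=1)]
```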
Soft-Independent Model of Class Analogy
The SIMCA classification method (Wold, 1976) is based on PCA. Each class is described by its relevant PCs, and the samples of each class are contained in the so-called SIMCA boxes defined by those PCs. Because each class is described only by its relevant PCs, the classification ignores the part of the variance due to experimental uncertainty and spurious information. The method is also useful for small datasets with more variables than objects, since it performs a substantial dimensionality reduction.
SIMCA classification starts with a separate PCA calculated on each class, which identifies the relevant PCs of that class. These PCs define the so-called class model.
The classification rule for object i is based on Fisher's F-test: object i is assigned to class g if

$$\frac{rsd_{ig}^2}{rsd_g^2} < F_{critic}\left(\alpha,\ \nu_1 = p - A_g,\ \nu_2 = (p - A_g)(n_g - A_g - 1)\right)$$

where $rsd_{ig}$ is the residual standard deviation of object i from the model of class g; $rsd_g$ is the residual standard deviation of class g; $F_{critic}$ is the critical value of F defining the SIMCA box; α is the significance level (usually set at 0.05, corresponding to a probability level of 95%); $\nu_1$ and $\nu_2$ are the degrees of freedom of the numerator and denominator of the F-test, respectively; p is the number of variables, $A_g$ the number of relevant PCs of class g and $n_g$ the number of objects of class g.

SIMCA also provides some important statistics for an in-depth analysis of the classification. The Modelling Power (MP) of a variable in a class model measures the weight of that variable in the model, i.e. its ability to describe and characterise the corresponding class. It is defined as:

$$MP_{vc} = 1 - \frac{rsd_{vc}}{sd_{vc}}$$

where $sd_{vc}$ is the standard deviation of variable v in class c and $rsd_{vc}$ is the residual standard deviation of variable v for the objects of class c from the model of their own class.

The modelling power ranges from 0 (variable irrelevant to the definition of the class model) to 1.

The Discrimination Power (DP), instead, measures the ability of each variable to discriminate between two classes (c and g) at a time. The greater the discrimination power, the more a variable weighs in the classification of an object in class c or g. It is defined as:

$$DP_{vcg} = \frac{rsd_{vcg}^2 + rsd_{vgc}^2}{rsd_{vc}^2 + rsd_{vg}^2}$$

where:
- $rsd_{vcg}^2$ is the squared residual standard deviation of variable v for the objects of class c from the model of class g;
- $rsd_{vgc}^2$ is the same quantity for the objects of class g from the model of class c;
- $rsd_{vc}^2$ is the same quantity for the objects of class c from the model of their own class;
- $rsd_{vg}^2$ is the same quantity for the objects of class g from the model of their own class.

The discrimination power is positive by definition. Unlike the modelling power, however, it is not bounded (for example, it is not limited to the range from 0 to 1). Values close to 1 indicate that the variable fits both class models equally well and therefore does not discriminate between the two classes.
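A compact sketch of SIMCA follows, assuming Python with NumPy. The residual standard deviations match the degrees of freedom of the F-test above: $rsd_{ig}^2 = \sum_v e_{iv}^2/(p-A_g)$ and $rsd_g^2 = \sum_i\sum_v e_{iv}^2/((p-A_g)(n_g-A_g-1))$. This is one usual convention and is assumed here. The critical F value is passed in from tables rather than computed. For the discrimination power, the per-variable residual variances are simplified to plain means of squared residuals. Published implementations differ in these degree-of-freedom corrections. All function names are hypothetical.

```python
import numpy as np


def fit_class_model(Xg, A):
    """Hypothetical SIMCA class model: PCA with A components on the objects of one class."""
    Xg = np.asarray(Xg, dtype=float)
    n, p = Xg.shape
    mean = Xg.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xg - mean, full_matrices=False)
    model = {"mean": mean, "P": Vt[:A].T, "A": A, "n": n, "p": p}
    E = residuals(Xg, model)
    model["rsd"] = np.sqrt(np.sum(E**2) / ((p - A) * (n - A - 1)))   # rsd_g
    return model


def residuals(X, model):
    """Part of X (after centring on the class mean) not described by the class PCs."""
    Xc = np.atleast_2d(np.asarray(X, dtype=float)) - model["mean"]
    return Xc - Xc @ model["P"] @ model["P"].T


def in_class(X, model, F_crit):
    """True where (rsd_ig / rsd_g)^2 < F_crit, i.e. the object lies inside the SIMCA box."""
    E = residuals(X, model)
    rsd_i2 = np.sum(E**2, axis=1) / (model["p"] - model["A"])
    return rsd_i2 / model["rsd"] ** 2 < F_crit


def modelling_power(Xg, model):
    """MP_v = 1 - rsd_v / sd_v, computed on the objects used to build the class model."""
    Xg = np.asarray(Xg, dtype=float)
    E = residuals(Xg, model)
    rsd_v = np.sqrt(np.sum(E**2, axis=0) / (model["n"] - model["A"] - 1))
    return 1.0 - rsd_v / Xg.std(axis=0, ddof=1)


def discrimination_power(Xc, model_c, Xg, model_g):
    """DP_v = (rsd2_vcg + rsd2_vgc) / (rsd2_vc + rsd2_vg), one value per variable."""
    def ms(X, model):   # simplified: mean squared residual of each variable
        return np.mean(residuals(X, model) ** 2, axis=0)
    return (ms(Xc, model_g) + ms(Xg, model_c)) / (ms(Xc, model_c) + ms(Xg, model_g))
```

For example, with p = 4 variables, $A_g$ = 1 and $n_g$ = 20, the degrees of freedom are 3 and 54. The tabulated 5% critical value is then about 2.78.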
Partial Least Squares – Discriminant Analysis (PLS-DA)
Partial Least Squares (PLS) (De Noord et al., 1994; Kleinbaum et al., 1988; Martens et al., 1989) is a multivariate regression method that establishes a relationship between one or more dependent variables (Y) and a group of descriptors (X). X- and Y-variables are modelled simultaneously to find the latent variables (LVs) in X that predict the latent variables in Y. At the same time, these LVs account for the largest possible amount of the information present in X. The latent variables are therefore selected so as to explain both the descriptors and the responses simultaneously. They are similar to the principal components calculated by PCA: the first one accounts for the largest amount of information, and the following ones account for the maximum residual variance. As for PCs, the last LVs mostly describe random variations and experimental error. The optimal number of LVs is the one that models the information in X useful for predicting Y while avoiding overfitting. It is determined from the residual variance in prediction, and cross-validation techniques are used to evaluate the predictive ability and select the optimal number of latent variables.
PLS was developed to model continuous responses, but it can also be used for classification by building a Y that encodes the class membership of each sample. In this case it is called Partial Least Squares – Discriminant Analysis (PLS-DA). For proteomic data, one response variable is usually adopted for each group of samples. Each response variable takes the value 1 for the samples belonging to the corresponding class and 0 for the samples belonging to the other classes.
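A minimal PLS-DA sketch follows, assuming Python with NumPy. The PLS model is fitted with the NIPALS algorithm, one standard way of computing PLS that is chosen here for illustration. The response matrix has one 0/1 column per class. Each new sample is assigned to the class with the largest predicted response, which is one common decision rule among several. Selecting the number of latent variables by cross-validation is not shown. Function names are hypothetical.

```python
import numpy as np


def pls2_nipals(X, Y, n_lv, tol=1e-10, max_iter=500):
    """Hypothetical helper: PLS regression of Y on X by NIPALS (mean-centred X and Y).

    Returns coefficients B for centred data and the column means, so that
    Y_hat = (X - x_mean) @ B + y_mean.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]   # start from the most variable Y column
        t = None
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                   # X weights (unit length)
            t_new = E @ w                            # X scores (latent variable)
            q = F.T @ t_new / (t_new.T @ t_new)      # Y loadings
            u = F @ q / (q.T @ q)                    # Y scores
            converged = t is not None and np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                      # X loadings
        E = E - t @ p.T                              # deflate X ...
        F = F - t @ q.T                              # ... and Y before the next LV
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return B, x_mean, y_mean


def plsda_fit(X, labels, n_lv):
    """PLS-DA: one 0/1 response column per class (1 = sample belongs to that class)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    return (classes,) + pls2_nipals(X, Y, n_lv)


def plsda_predict(X, model):
    """Assign each sample to the class whose predicted response is largest."""
    classes, B, x_mean, y_mean = model
    Y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return classes[np.argmax(Y_hat, axis=1)]
```

As a sanity check on the sketch, using as many latent variables as X has (full-rank) columns reproduces the ordinary least-squares coefficients. PLS becomes useful precisely when fewer LVs than variables are retained.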
Artificial Neural Networks
Artificial Neural Networks (ANNs) (Zupan and Gasteiger, 1993) are mathematical algorithms that solve complex problems by simulating the functioning of the human brain. Back-Propagation Artificial Neural Networks (BP-ANNs) are mainly used to model the behaviour of complex systems. In this role they usually give better results than Ordinary Least Squares (OLS), especially when non-linear relationships are present. The main problem in their application is the high risk of overfitting, which must be handled with particular care.
A back-propagation network consists of:
- an input layer, where each neuron is associated to an ex-
perimental variable;
- one or more layers of processing neurons, the so-called
hidden layers;
- an output layer, where each neuron is associated to a re-
sponse.
Fig. (1). General ANN architecture.
The signal moves from the input layer towards the output layer (Fig. 1). In this process, each neuron feeds all the neurons of the following layer, transferring a portion of the value (signal) it has accumulated. The portion transferred is regulated by a transfer function, usually sigmoid in shape. For central values of the signal, the portion transferred is approximately proportional to the signal itself; for extreme values, it is close to either zero or one. In every neuron of the hidden layers and of the output layer, the signals coming from every neuron of the previous layer are summed, each multiplied by a weight. These weights are optimised during network training by the back-propagation algorithm (Wythoff et al., 1993; Walczak et al., 1996; Goh et al., 1995; Zhang et al., 2002a, 2002b). The algorithm determines the weight of each pair of connected neurons so that the network gives the correct output for a given input vector. In this process, every experiment of the training set is presented to the network in turn, and the weights are corrected to reduce the network's error in estimating the corresponding responses. One learning epoch is a cycle in which all experiments are presented to the network once. Learning epochs are repeated until the network produces satisfactory results.
The number of hidden layers, the number of neurons in each hidden layer and the geometry of the network (i.e. the connections between the neurons of different layers) must be chosen to give a satisfactory fitting ability together with a satisfactory predictive ability. Increasing the number of hidden layers and/or of hidden neurons yields very flexible ANNs with a very high modelling ability. However, such a network may learn the data by heart without generalising the rules that determine the behaviour of the system. It is therefore very important to check the predictive ability of the network by validation techniques, e.g. by partitioning the available dataset into a training set and a test set.
The learning rate η, which determines how fast the weights change, and the momentum μ must both be optimised, usually by trial and error. The momentum takes into account the correction made in the previous cycle, in order to damp oscillations around the network optimum.
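The training procedure described above can be sketched for a network with one hidden layer and sigmoid transfer functions. It includes online presentation of the experiments, learning epochs, the learning rate η and the momentum μ. The sketch assumes Python with NumPy. The network size, initial weights and default values of η and μ are arbitrary choices for illustration, not recommended settings. Bias terms are handled by appending a constant input of 1 to each layer. Function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid transfer function: about linear near 0, saturating towards 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def predict(X, W1, W2):
    """Forward pass; a constant 1 is appended to each layer to act as a bias."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((len(X), 1))
    H = np.hstack([sigmoid(np.hstack([X, ones]) @ W1), ones])
    return sigmoid(H @ W2)


def train_bp(X, Y, n_hidden=4, eta=0.5, mu=0.5, epochs=1000, seed=0):
    """Hypothetical helper: online back-propagation with learning rate eta and momentum mu.

    Every training experiment is presented in turn and the weights are corrected;
    one pass over all experiments is a learning epoch. Returns the weights and the
    mean squared error recorded after each epoch.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W1 = rng.normal(scale=1.0, size=(X.shape[1] + 1, n_hidden))   # input(+bias) -> hidden
    W2 = rng.normal(scale=1.0, size=(n_hidden + 1, Y.shape[1]))   # hidden(+bias) -> output
    dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)
    history = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = np.append(X[i], 1.0)
            h = np.append(sigmoid(x @ W1), 1.0)
            o = sigmoid(h @ W2)
            delta_o = (o - Y[i]) * o * (1.0 - o)                      # output error term
            delta_h = (W2[:-1] @ delta_o) * h[:-1] * (1.0 - h[:-1])   # back-propagated error
            # gradient step plus a fraction mu of the previous correction (momentum)
            dW2 = -eta * np.outer(h, delta_o) + mu * dW2_prev
            dW1 = -eta * np.outer(x, delta_h) + mu * dW1_prev
            W2 += dW2
            W1 += dW1
            dW1_prev, dW2_prev = dW1, dW2
        history.append(float(np.mean((predict(X, W1, W2) - Y) ** 2)))
    return W1, W2, history
```

The `history` list records the error on the training data only. To check for overfitting, it should be compared with the error of `predict` on a separate test set.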
SPOT VOLUME DATASETS
Spot volume datasets generated by differential analysis with dedicated software are well suited to multivariate statistical tools. There are two reasons: their high dimensionality (a large number of spots identified on each map), and the intrinsic difficulty of finding small differences between groups of maps when many spots are detected simultaneously on each sample. Multivariate methods are effective because they represent the multivariate structure of the dataset clearly while largely removing the contribution of experimental error. Several methods have recently been applied to spot volume datasets in proteomics: pattern recognition methods (PCA, Cluster Analysis), classification methods (LDA, SIMCA, PLS-DA) and artificial neural networks (ANNs).
PCA was the first multivariate tool applied to proteomic datasets and remains the most widely used; today it can be considered a classical approach. The first applications reported in the literature date from the mid-eighties (Anderson et al., 1984; Tarroux et al., 1987).
The application of PLS-DA is more recent. One of the first applications (Jessen et al., 2002) showed how information can be extracted from 2-DE data by discriminant PLS with variable selection. Two examples were compared, both concerning the time course of post-mortem proteome changes in pig muscle tissue. PLS related the volumes of the detected spots to a binary variable indicating each animal or the sampling time (increasing time after the death of the animal). PLS successfully identified the spots characterised by a systematic variation. A variable selection (jack-knife) procedure was also used to retain only the spots with truly relevant variations.
Following these first applications, many papers have reported the use of PCA, classification tools and PLS-DA for the analysis of proteomic datasets. Since these applications span several fields of proteomics, the most interesting papers are presented below by application field: clinical proteomics, botany and food safety, microbiology and toxicology. The last paragraph describes the application of artificial neural networks to spot volume data.
Clinical Proteomics
Clinical Proteomics represents one of the most important
fields in proteomic research and many applications of multi-
variate tools are reported in this area (Drew et al., 2006;
Gottfries et al., 2004; Karp et al., 2005; Iwadate et al., 2004;
Verhoeckx et al., 2004a, 2004b; Marengo et al., 2004a, c,
2006; Fujii et al., 2005).
One of the most recent studies is by Drew et al. (2006), who reported the effect of salicylate on oxidative stress in the rat colon. Salicylic acid is a dietary plant-based phenolic compound and also the main metabolite of aspirin. It decreases oxidative stress and contributes to the colon-protective effects of plant-based diets. In this study, the rat diet was supplemented with salicylic acid (1 mg/kg diet), and 2D-PAGE was carried out on soluble colon protein extracts. A preliminary differential analysis based on t-tests and ANOVA identified 55 relevant spots out of a total of 124. PCA on this subset showed an effective clustering of the samples. PLS analysis was then applied to search for relationships between protein expression, the dietary treatment and the biochemical data.
Another interesting study was reported by Gottfries et al. (2004), who applied PCA and PLS-DA to two datasets. The first consists of cerebrospinal fluid samples from control and diseased individuals (12 controls, 15 Alzheimer's disease, 15 fronto-temporal dementia and 10 Parkinson's disease), giving a final dataset of 52 samples described by 96 identified spots. The second consists of liver samples from normal and obese mice (6 groups of 4 to 8 animals each), giving a dataset of 30 samples described by 603 identified spots. In both cases, the first three PCs clearly separated the groups of samples present. The first latent variable computed by PLS-DA identified the spots responsible for the differences between each pair of groups.
PLS-DA was also applied by Karp et al. (2005) to demonstrate its ability to identify the differences in three proteomic datasets. The first dataset consisted of brain samples from control individuals and patients with schizophrenia. Ten gels were run for each group, and a total of 1505 protein spots were detected on the 20 maps obtained. The second dataset came from mouse liver samples in a circadian time-course study. Three time points within a circadian cycle were chosen, each described by three samples, and a total of 1100 protein spots were detected. A final dataset, for which no difference was expected between the two groups of samples, was also included in the study. It consisted of twelve samples of Erwinia carotovora soluble proteins (1057 spots identified), divided into two groups. PLS-DA identified the differences between the groups of the first two datasets. As expected, it could not separate the samples of the last dataset into groups.
Discriminant analysis was applied by Iwadate et al. (2004) to the classification of human gliomas. 85 tissue samples (52 glioblastoma multiforme, 13 anaplastic astrocytomas, 10 astrocytomas, 10 normal brain tissues) were compared on the basis of their proteomic patterns. Cluster analysis distinguished control samples from glioma tissues. Discriminant analysis extracted a set of 37 proteins differentially expressed according to histological grade.
PCA was applied to identify the differences due to
macrophage maturation in the U937 human lymphoma cell
line (Verhoeckx et al., 2004a). PCA was able to identify the
variations between samples belonging to different macro-
phage maturation times; usual t-tests identified a smaller
number of biomarkers. A similar application of the multi-
variate procedure (Verhoeckx et al., 2004b) involves the
characterisation of anti-inflammatory compounds.
Some studies by Marengo et al. (2004a, c, 2006) make
use of PCA coupled to both cluster analysis and classifica-
tion tools to identify the differences between groups of maps.
The first paper (Marengo et al., 2004a) describes the study of
two different cell lines of control and drug-treated pancreatic
ductal carcinoma cells. A total of 435 spots was identified
from 18 samples. The first three PCs calculated by PCA al-
lowed the clear separation of the four groups of samples. The
results were further confirmed by cluster analysis.
The other two studies (Marengo et al., 2004c, 2006) describe the use of SIMCA to classify proteomic maps. The first application (Marengo et al., 2006) showed that PCA could identify the differences between the adrenal gland proteomic profiles of healthy and diseased nude mice. SIMCA was then applied to classify the samples into the two classes, using only the first PC in the model of each class. It correctly classified all the samples, and the analysis of the discrimination powers identified the most discriminating spots. 84 polypeptide chains out of a total of 700 detected spots were found to be up- or down-regulated. The same authors followed a similar approach to compare the phenotypic expression of the mantle cell lymphoma cell lines GRANTA-519 and MAVER-1 (Marengo et al., 2004c). In this case too, PCA and SIMCA correctly classified the samples and identified the causes of the differences observed.
Finally, Fujii et al. (2005) studied the histological subtypes of lymphoid neoplasms on 42 cell lines derived from human lymphoid neoplasms. Different statistical methods were used to identify the discriminating spots:
(i) Wilcoxon or Kruskal–Wallis tests, to find the spots whose intensity differed significantly (p<0.05) among the cell-line groups;
(ii) statistical-learning methods, to prioritise the spots according to their contribution to the classification;
(iii) unsupervised classification methods, to validate the robustness of the classification obtained with the selected spots.
31 spots were found to be significant, and 24 of them were also identified by mass spectrometry.
Botany and Food Safety
Two of the first applications of multivariate techniques to proteomic datasets in botany and in food quality and control were reported by Dewettinck et al. (1997) and Alika et al. (1995). Because the authors used SDS-PAGE, the samples were described not by spot volumes but by the volume profile along the SDS-PAGE lane. Dewettinck et al. (1997) used PCA and discriminant analysis to compare the SDS-PAGE profiles of four Belgian cheeses at different degrees of maturity. The cheeses were Passendale, Wijnendale, Nazareth and Oud Brugge, which were compared with other international brands. PCA separated Nazareth and Oud Brugge and, to a lesser extent, Passendale and Wijnendale. Discriminant analysis correctly classified the four cheeses.
Another study, by Alika et al. (1995), characterised 27 maize accessions (Bendel State, Nigeria). The accessions were compared through the different band mobility of maize zein proteins in SDS-PAGE, combined with PCA and cluster analysis. Cluster analysis identified five clusters, while PCA separated the accessions with yellow kernels from those with early maturity. Moreover, samples from the same geographical area were grouped together.
More recently, two studies were carried out by Tuomainen et al. (2006) and Lilley et al. (2006). Tuomainen et al. (2006) applied PCA to the spot volumes of three accessions of the plant Thlaspi caerulescens exposed to various levels of metals (Zn and Cu), in order to investigate metal hyper-accumulation in plants. PCA was used to verify the separation of the protein profiles of the three accessions at the various metal exposures and to detect the groups of proteins responsible for the differences. PCA clearly separated the samples according to the accession, while the effects of metal exposure were less pronounced. Forty-eight spots were identified as relevant for the differentiation of the groups of samples, and the possible roles of some of these proteins in heavy metal accumulation and tolerance were discussed. By contrast, Lilley et al. (2006) reported the use of quantitative proteomics methods for the characterisation of plant organelles.
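As an illustration of how PCA scores and loadings are obtained from a spot volume matrix, the sketch below computes PCA by singular value decomposition of the column-centred samples × spots matrix. It is a generic sketch, not the software used in the cited studies; autoscaling, which is often applied to spot volumes, is omitted, and the function name `pca` is hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples x spots matrix by SVD of the column-centred data.
    Returns scores, loadings and the fraction of variance explained."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # centre each spot (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    loadings = Vt[:n_components].T                    # contribution of each spot
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Spots with large absolute loadings on a PC that separates groups of samples are the candidates for the "groups of proteins responsible for the differences".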
Another interesting application in the field of food quality and control is the study by Kjaersgard et al. (2006) on the changes in the proteomic profile of cod muscle under different storage conditions. The authors studied 11 storage conditions, varying the storage temperature (two levels), the storage period (four levels) and the chill storage period (five levels). Each sample was replicated twice, on different batches. PCA separated the samples according to the frozen storage time, while no information emerged about the other two parameters. PLS-DA with variable selection (jack-knife procedure) was then applied to identify the spots relevant for differentiating the samples according to storage time.
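A PLS-DA model with jack-knife variable selection can be sketched in a few lines of NumPy. The sketch assumes a two-class problem coded as a ±1 dummy response, a PLS1 model fitted by NIPALS, and a selection rule that keeps the spots whose regression coefficient exceeds twice its leave-one-out jack-knife standard error; the threshold, the function names and the two-class coding are illustrative assumptions, not details reported by Kjaersgard et al. (2006).

```python
import numpy as np


def pls1(X, y, n_lv):
    """PLS1 (NIPALS) on column-centred data; returns the regression vector b
    such that y_hat - mean(y) = (X - mean(X)) @ b."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w = w / np.linalg.norm(w)     # weight vector
        t = X @ w                     # score vector
        tt = t @ t
        p = X.T @ t / tt              # X loading
        c = (y @ t) / tt              # y loading
        X = X - np.outer(t, p)        # deflation of X and y
        y = y - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def jackknife_select(X, y, n_lv=1, t_crit=2.0):
    """Indices of the variables whose |b| exceeds t_crit jack-knife standard errors."""
    n = X.shape[0]
    b_full = pls1(X, y, n_lv)
    b_loo = np.array([pls1(np.delete(X, i, axis=0), np.delete(y, i), n_lv)
                      for i in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((b_loo - b_loo.mean(axis=0)) ** 2, axis=0))
    return np.flatnonzero(np.abs(b_full) / se > t_crit)
```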
Finally, Olias et al. (2006) investigated whether dry-cured hams from two European countries could be distinguished using SDS-PAGE. The dataset consisted of 37 commercial hams (19 Spanish, 18 French). Four protein fractions were extracted from each sample, and each fraction was analysed in triplicate lanes; the complete extraction process was carried out in duplicate. In total, 118 gels were analysed. Inter-gel registration was carried out using a genetic algorithm (GA), and a GA was also used for feature selection, passing subsets of features to the LDA routine. Cross-validated classification success rates were 84, 91, 81 and 85% for the four fractions, respectively. SDS-PAGE proved to be a sufficiently quantitative method, and the authors concluded that it could potentially be used to verify the regional origin of speciality dry-cured hams.
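The cross-validated success rates quoted above come from an LDA classifier evaluated on samples left out of the model. A minimal two-class version with leave-one-out cross-validation is sketched below. It assumes equal class priors and adds a small ridge term to keep the pooled covariance invertible, and it does not reproduce the genetic-algorithm registration or feature selection used by Olias et al. (2006); the function names are hypothetical.

```python
import numpy as np


def fisher_lda_fit(X, labels):
    """Two-class Fisher LDA: returns (w, threshold); class 1 is predicted
    when x @ w > threshold (equal priors assumed)."""
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class covariance, with a small ridge for stability
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2      # midpoint between projected class means
    return w, threshold


def loo_success_rate(X, labels):
    """Leave-one-out cross-validated classification success rate."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        w, thr = fisher_lda_fit(X[keep], labels[keep])
        hits += int((X[i] @ w > thr) == bool(labels[i]))
    return hits / len(X)
```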
Microbiology
Some of the first applications of PCA and cluster analysis to the study of DNA and RNA fragments of several biological systems in the field of microbiology were reported by Couto et al. (1995), Johansson et al. (1995) and Boon et al. (2002). Boon et al. (2002) used PCR-DGGE to study the diversity of bacterial groups in activated sludge samples receiving wastewater from four different types of industry. Statistical analysis and the Shannon diversity index of the band patterns were used to identify the differences between the samples. Cluster analysis, multidimensional scaling and PCA clearly grouped two of the four activated sludge types into separate clusters.
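The Shannon diversity index mentioned above is commonly computed from the relative intensities of the bands in a DGGE lane as $H' = -\sum_i p_i \ln p_i$. The sketch below assumes that $p_i$ is the intensity of band $i$ divided by the total lane intensity; how band intensities were quantified in the cited study is not specified here, and the function name is hypothetical.

```python
import math


def shannon_index(band_intensities):
    """Shannon diversity index H' = -sum(p_i * ln p_i) of one DGGE lane,
    where p_i is the relative intensity of band i (zero bands are ignored)."""
    total = sum(band_intensities)
    return -sum((b / total) * math.log(b / total)
                for b in band_intensities if b > 0)
```

A lane with n equally intense bands gives the maximum value ln(n); a single band gives 0.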
Other applications came from the group of Gadea (1999, 2000), who studied the immunological diagnosis of hydatidosis. The more recent study (Gadea et al., 2000) applied discriminant techniques to serological patterns obtained by Enzyme-linked Immuno-electro-Transfer Blotting (EITB) and by conventional immunological tests, in order to differentiate the residual antibody patterns of healed hydatidosis from those of active hydatidosis. Discriminant analysis of the serological patterns obtained by EITB and conventional serology correctly classified 92.54% of patients (93.3% if patients were differentiated according to the time elapsed since surgery). The method detected active hydatidosis in 95.6% of the patients in whom abdominal ultrasonography had confirmed the presence of active hydatid cysts. The global specificity was 88.9%.
A previous study by the same authors (Gadea et al., 1999) applied the method to 67 patients, 25 with active hydatid cysts (24 hepatic and 1 pulmonary) and 42 without a history of hydatid disease, and compared it with conventional serology. EITB combined with discriminant analysis was more sensitive than conventional serological diagnosis and detected 100% of the patients with an active hepatic hydatid cyst, with 100% specificity. However, the method failed to detect an uncomplicated hyaline pulmonary hydatid cyst.
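As a worked check of the 1999 figures, assume that the 25 patients with active cysts split into 24 detected hepatic cases and 1 missed pulmonary case, and that none of the 42 patients without hydatid disease was misclassified (the reported 100% specificity). The overall sensitivity then follows directly from these counts; the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


# counts assumed from the 1999 study description above
sens, spec = sensitivity_specificity(tp=24, fn=1, tn=42, fp=0)
# sens = 24 / 25 = 0.96, spec = 42 / 42 = 1.0
```

Under these assumptions, 100% sensitivity for hepatic cysts corresponds to an overall sensitivity of 96% once the missed pulmonary cyst is included.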
Kovarova and co-workers (1998, 2000) and De Moor and co-workers (2003) carried out some of the first applications of multivariate tools to microarray data. More recently, Correa et al. (2007) evaluated the effect of plant variety and Azospirillum brasilense inoculation on the microbial communities colonizing the roots and leaves of tomato plants. Microbial communities of the rhizoplane and phyllosphere were analysed by DGGE of PCR-amplified 16S rRNA sixty days after planting. Differences in the bacterial communities between the two tomato types were detected by PCA of the DGGE fingerprints.
Webster et al. (2007) investigated the structure and com-
position of microbial communities inhabiting the soft coral
Alcyonium antarcticum across three differentially contami-
nated sites within McMurdo Sound (Antarctica). Microbial
communities were detected at all sites using culture-based
analysis, DGGE, 16S rRNA gene clone-library analysis and
FISH. Multivariate analysis of DGGE band patterns and
PCA of quantitative FISH data revealed no distinct differ-
ences in community composition between differentially con-
taminated sites. The study (the first investigation of micro-
bial communities associated with Antarctic soft corals) sug-
gests that spatially stable microbial associations exist across
an environmental impact gradient.
Another application was by Shoji et al. (2006), who characterised the microbial community of a biological phosphorus removal process under different electron acceptor conditions by PCR-DGGE and PCA. A lab-scale sequencing batch reactor fed with municipal wastewater was operated under anaerobic-aerobic, anaerobic-anoxic-aerobic and anaerobic-anoxic conditions. The 16S rRNA-based PCA showed that a limited oxygen supply caused a decline of aerobic bacteria, including aerobic polyphosphate-accumulating organisms (PAOs). The PCA results also reflected the presence of nitrate-utilizing denitrifiers.
Two other applications were reported by Licht et al. (2006) and Fry et al. (2006). Licht et al. (2006) studied the effects of selected carbohydrates on the composition and activity of the intestinal microbiota. Five groups of eight rats were fed a Western-type diet containing cornstarch (reference group), sucrose, potato starch, inulin or oligofructose. Principal Component Analysis of the faecal microbiota profiles, obtained by DGGE of PCR-amplified bacterial 16S rRNA genes and of Reverse Transcriptase-PCR-amplified bacterial 16S rRNA, revealed different phylogenetic profiles for each of the five animal groups. Even though sucrose and cornstarch are both easily digestible and are not expected to reach the large intestine, the DGGE band patterns indicated that these carbohydrates did affect the composition of the bacteria in the large gut. The two fructans also produced completely different molecular fingerprints of the faecal microbiota, indicating that, although chemically similar, they are fermented by different intestinal bacteria.
The second study, by Fry et al. (2006), investigated the relationship between prokaryotic community composition and biogeochemical processes in deep subseafloor sediments from the Peru Margin by PCA of DGGE band patterns.
Toxicological Studies
Several studies have reported the use of multivariate tools in toxicology. Kleno et al. (2004) studied the mechanism of hydrazine toxicity in rat liver samples by PCA and PLS. PCA was carried out on a dataset of 30 samples (5 animals × 3 doses of hydrazine × 2 sampling times after administration) described by 431 spots detected on the 2D maps. PC1 differentiated the samples according to the 3 dose levels, while PC4 separated the two sampling times, but only for the highest dose. Since the analysis of the loadings did not clearly identify the most discriminating spots, a PLS-DA model was built to model the hydrazine dose level (variable selection by jack-knifing). PLS identified the spots responsible for the differences between the samples according to the dose administered. The results showed that some of the spots identified by PLS were not considered relevant by standard t-tests.
Amin et al. (2004), Heijne et al. (2003) and Anderson et al. (1996) applied PCA to toxicological studies. Amin et al. (2004) studied the gene expression profiles induced in rats by three nephrotoxicants (cisplatin, gentamicin and puromycin) as a function of time after the initial administration. PCA and cluster analysis separated the samples according to dose, time after administration and renal toxicity. Heijne et al. (2003) studied the acute hepatotoxicity induced in rats by the administration of bromobenzene. Control and treated samples were effectively separated by PCA for both protein and gene expression profiles, and some of the significant proteins (those changing upon bromobenzene treatment) were identified by mass spectrometry. Finally, Anderson et al. (1996) investigated the effects of five peroxisome proliferators on the protein profile in the livers of treated mice at 5- and 35-day time points. PCA was carried out on a total of 107 liver protein spots: the first PC was identified as a global measure of peroxisome proliferation through its correlation with enzymatic peroxisomal β-oxidation, while PC2 separated the samples on the basis of exposure time.
Perrot et al. (2001) used PCA to compare the protein expression of gel-entrapped Escherichia coli cells exposed to a cold shock at 4°C with that of exponential- and stationary-phase free-floating cells. The authors covered a total of 10 different incubation conditions, replicating each experiment 3 times and running each gel in duplicate. PCA was carried out on the 203 spots whose synthesis was significantly higher than at 37°C, using the average spot intensities for each experimental condition. PCA showed that the protein response of immobilised cells after the cold shock is significantly different from those of exponential- and stationary-phase free-floating organisms. The analysis of the loadings identified nine families of proteins.
Application of Artificial Neural Networks (ANNs)
More recently, some papers have appeared in the literature reporting the use of ANNs for the study of proteomic spot volume datasets (Ramadan et al., 2005; Izawa et al., 2006; Bloom et al., 2007).
The first application was by Ramadan et al. (2005), who applied PLS and Back-Propagation Artificial Neural Networks (BP-ANN) to the estimation of soil properties. The two multivariate calibration methods were applied to microbial community DNA to predict soil properties (%Sand, %Silt, %Clay, %Nitrogen, %Organic Carbon, DNA) in environmental soil samples. The microbial community DNA was extracted from 48 environmental soil samples, and the samples were replicated to obtain a total of 256 DNA band patterns. Each band pattern was described by 320 variables (volumes along the band pattern). The samples were divided into a training set of 171 samples and a test set of 85. PLS did not provide the best results (R² < 0.80). ANNs with 5 neurons in the hidden layer performed better than PLS, and performed even better when applied to the first 39 PCs (R² > 0.85).
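A back-propagation network of the kind described, with a single hidden layer of 5 neurons, can be sketched in NumPy as follows. The tanh hidden units, linear output, full-batch gradient descent, learning rate and number of epochs are all illustrative assumptions rather than the settings of Ramadan et al. (2005); in their best configuration the inputs would be the scores on the first 39 PCs instead of the raw band-pattern variables.

```python
import numpy as np


def train_mlp(X, y, n_hidden=5, lr=0.1, epochs=3000, seed=0):
    """Train a one-hidden-layer regression network by back-propagation
    (tanh hidden layer, linear output, full-batch gradient descent on MSE).
    Returns a prediction function and the training-loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                       # forward pass: hidden layer
        err = H @ W2 + b2 - y                          # output error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / n                          # dLoss/dOutput
        g_hid = np.outer(g_out, W2) * (1.0 - H ** 2)   # back-propagated to hidden layer
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses
```

In practice the test-set R² would be computed from `predict` applied to samples not used for training, as in the 171/85 split described above.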
Another application (Izawa et al., 2006) used artificial neural networks for the recognition of culture states by two-dimensional gel electrophoresis. Proteomic technologies were applied to the examination of nutrient components in culture broth. Natural nutrients are often used in fermentation processes such as the production of baker's yeast, alcoholic beverages, amino acids and pharmaceuticals, and the catabolic activities of the microorganisms in these processes vary with the species used. A total of 23 gels were run: 7 from control E. coli, 10 from Fe-deficient E. coli and 6 from Mg-deficient E. coli. More than 300 spots were identified on each final map. A three-layer ANN was used, with 357 neurons in the input layer (i.e. one neuron for each spot), 100 neurons in the hidden layer and 3 neurons in the output layer (one neuron for each culture state). Leave-one-out cross-validation was adopted. Sensitivity analysis identified about 20 spots as significant for recognising the correct culture state of each sample.
The recent study by Bloom et al. (2007) reported the use of ANNs for the discrimination of six common types of adenocarcinoma. 2D-GE was used to analyse the proteomic expression pattern of 77 histomorphologically similar adenocarcinomas from six different sites of origin: ovary, colon, kidney, breast, lung and stomach. Discriminating sets of proteins were identified and used to train an ANN, with leave-one-out cross-validation. Different ANN structures were investigated by varying the number of input neurons from 60 to 600 (the input neurons corresponded to differentially expressed spots ranked according to their significance in standard t-tests). The best accuracy was reached with an ANN architecture giving a final number of 227 spots relevant for classification. Some spots were identified as relevant for several classes of samples, while others were relevant for only one class.
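Ranking spots by t-test significance, as used to choose the ANN inputs, can be sketched as below. This is a two-group illustration using the absolute Welch t statistic; Bloom et al. (2007) discriminated six classes, and how the per-class tests were combined is not described here. Ranking by $|t|$ rather than by p-value is a simplifying assumption, and the function name is hypothetical.

```python
import numpy as np


def rank_spots_by_t(group_a, group_b, top_k):
    """Return the indices of the top_k spots (columns) with the largest
    absolute Welch t statistic between two groups of samples (rows)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = diff / se
    return np.argsort(-np.abs(t))[:top_k]
```

Varying `top_k` over a range (60 to 600 in the cited study) and retraining the network for each value is one way to explore input-layer sizes.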
The large number of applications of PCA, PLS and other multivariate tools in proteomics clearly shows the importance of multivariate methods in this field. In fact, such
techniques are able to identify a larger number of variables
(spots) relevant for the discrimination between the classes of
samples compared to the classical t-tests usually carried out
by standard software packages.
IMAGE ANALYSIS
An alternative to the analysis of spot volume datasets is the direct analysis of proteomic 2D-map images, which can also serve as an alternative to the use of standard software packages. Some methods for the analysis of 2D-map images that are not based on the standard approach presented in the introduction are available in the literature, but they are not yet widespread enough to be included in common software packages. The different approaches described in the literature use ANNs, fuzzy logic principles and the calculation of mathematical moments. Such procedures represent the frontier in bioinformatics, and some of them are still under development. The main principles of these methods are presented below, together with a review of the most interesting applications in the literature.
Fuzzy logic
As already pointed out in the introduction, 2D gel electrophoresis suffers from low reproducibility, which leads to differences even among replicates of the same electrophoretic run. The differences can consist of changes in spot position, size and shape, so describing the position of each spot in terms of x-y coordinates is difficult. Marengo et al. (2003b, c, d, 2004b) proposed a procedure that takes into account the uncertainty in spot position and shape by exploiting fuzzy logic principles coupled with multivariate statistical tools. The procedure consists of four main steps:
1) Image digitalization. Each map image is turned into a grid of a given step, each cell containing the optical density (OD), ranging from 0 to 1. The background contribution to the signal is eliminated by setting to 0 all values below a selected threshold (e.g. 0.3 or 0.4).
2) Image de-fuzzification. The effect of the destaining protocol is eliminated by turning the digitalised image into a grid of binary values: 0 corresponds to a cell where no signal is detected and 1 to a cell containing a value above the threshold.
3) Image re-fuzzification. The information about spatial uncertainty, eliminated in step 2 together with the contribution of the destaining protocol, is reintroduced here. Each cell containing a 1 is replaced by a 2-D probability function (a 2-D Gaussian function). The probability of finding a signal in cell $(x_i, y_j)$ when a signal is already present in cell $(x_k, y_l)$ is then given by:
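$$f(x_i, y_j, x_k, y_l) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_i - x_k)^2}{\sigma_x^2} + \frac{(y_j - y_l)^2}{\sigma_y^2}\right]\right\}$$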
where $\rho$ is the correlation between the 1st and 2nd dimensions; $(x_k, y_l)$ is the position of the spot influencing the cell in position $(x_i, y_j)$; $\sigma_y$ is the standard deviation along the 1st dimension; and $\sigma_x$ is the standard deviation along the 2nd dimension.
The parameter $\rho$ is usually set to 0 (complete independence of the two electrophoretic runs); $\sigma_x$ and $\sigma_y$ are the standard deviations of the 2-D Gaussian function along the x and y axes, and they can be set identical (identical uncertainty along the two electrophoretic runs: $\sigma = \sigma_x = \sigma_y$) or to different values, usually $\sigma_x = 1.5\,\sigma_y$ (uncertainty along the second dimension, molecular mass, about 50% larger than along the first dimension). A change in the parameter $\sigma$ corresponds to a change in the distance at which an occupied cell exerts its effect. When larger values are adopted, the perturbation operates at a larger distance; smaller values correspond to a perturbation operating at a shorter distance (spots exerting a smaller effect on their neighbourhood and a crisper final image). In general, the best results are expected for intermediate values of the parameters, corresponding to maps that are not too fuzzy (final images that are not too blurred). The value of the signal $S_k$ in each cell $(x_i, y_j)$ of the final map is calculated as the sum of the effects of all neighbouring cells $(x_{i'}, y_{j'})$ containing spots:
$$S_k = \sum_{i',j'=1}^{n} f(x_i, y_j, x_{i'}, y_{j'})$$
The procedure turns each digitalised image into a virtual
map containing in each cell the sum of the influence of
all the spots of the original 2-D PAGE. These virtual
maps can be called fuzzy maps.
4) Application of multivariate tools to fuzzy maps. The final
fuzzy maps are then analysed by several multivariate
tools for diagnostic/prognostic purposes.
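Steps 1 to 3 can be sketched in NumPy as follows. The sketch assumes $\rho = 0$, treats array rows as the x coordinate and columns as the y coordinate, and expresses $\sigma_x$ and $\sigma_y$ in grid cells; these choices, the default parameter values and the function name are illustrative, not taken from the cited papers. The explicit loop over occupied cells is clear but slow for large maps, where an equivalent convolution with a Gaussian kernel would be faster.

```python
import numpy as np


def fuzzy_map(od, threshold=0.3, sigma_x=2.0, sigma_y=2.0):
    """Turn a digitised map (2-D array of optical densities in [0, 1]) into
    a fuzzy map: remove background, binarise, then replace every occupied
    cell by a 2-D Gaussian (rho = 0) and sum all contributions."""
    od = np.asarray(od, dtype=float)
    occupied = od >= threshold            # steps 1-2: background removal + binarisation
    rows, cols = np.indices(od.shape)
    fuzzy = np.zeros(od.shape)
    norm = 1.0 / (2 * np.pi * sigma_x * sigma_y)
    for k, l in zip(*np.nonzero(occupied)):   # step 3: re-fuzzification
        fuzzy += norm * np.exp(-0.5 * ((rows - k) ** 2 / sigma_x ** 2
                                       + (cols - l) ** 2 / sigma_y ** 2))
    return fuzzy
```

Calling `fuzzy_map` for a series of $\sigma$ values reproduces the kind of parameter scan described for the applications below.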
Marengo et al. (2003b) proposed this procedure based on
fuzzy logic principles and presented two approaches for the
evaluation of fuzzy maps: 1) the coupling of PCA and classi-
fication tools; 2) the use of Multi Dimensional Scaling tech-
niques (MDS).
In the first approach (Marengo et al., 2003d), PCA and LDA were applied to compare eight 2D-maps belonging to control and mantle cell lymphoma samples. PCA was applied to the images after unwrapping each of them, i.e. turning each image into a series of variables describing the signal in each position of the map. The authors used 200 × 200 pixel images, giving a total of 40000 variables for each map. The significant PCs were used to build an LDA model to classify the samples; the variables to be included in the LDA model were selected by a stepwise algorithm in forward search (F-to-enter = 4.0). The procedure was repeated for increasing values of the parameter $\sigma$ to identify the value providing completely correct classification of the samples with the smallest number of components in the LDA model. The best $\sigma$ values ranged from 1.75 to 2.25, and the corresponding LDA model included PC1 and PC4. The analysis of their loadings gave insight into the reasons for the differences between the two groups of samples.
The other two applications by Marengo et al. (2003c, 2004b) reported the use of Multi Dimensional Scaling (MDS). This technique performs a dimensionality reduction and an effective graphical representation of the data on the basis of the similarity calculated between pairs of objects. MDS searches for the smallest number of dimensions in which the objects can be represented as points, while matching as closely as possible the distances between the objects in the new reference system to those calculated in the original one. As in the previous application, increasing values of the parameter $\sigma$ were investigated to identify the one providing the best classification. For each value, a similarity matrix was built by comparing all pairs of maps; the similarity $S_{kl}$ between two fuzzy maps $k$ and $l$ was calculated as the ratio between the common signal $SC_{kl}$ (the signal shared by both maps) and the total signal $ST_{kl}$:
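$$SC_{kl} = \sum_{i=1}^{n} \min\left(S_i^k, S_i^l\right), \qquad ST_{kl} = \sum_{i=1}^{n} \max\left(S_i^k, S_i^l\right), \qquad S_{kl} = \frac{SC_{kl}}{ST_{kl}}$$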
where $n$ is the number of cells in the grid and $S_i^k$ is the signal in cell $i$ of map $k$. $S_{kl}$ ranges from 0 (two completely different maps) to 1 (identical maps). In both applications, the optimal $\sigma$ value could be effectively identified.
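The similarity index above and an MDS embedding of the resulting matrix can be sketched as follows. The sketch assumes classical (Torgerson) metric MDS applied to the dissimilarity $1 - S_{kl}$; the MDS variant and the dissimilarity used in the cited works may differ, and the function names are hypothetical.

```python
import numpy as np


def similarity(map_k, map_l):
    """S_kl = sum of cell-wise minima / sum of cell-wise maxima of two fuzzy maps."""
    a, b = np.ravel(map_k), np.ravel(map_l)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()


def classical_mds(maps, n_dims=2):
    """Classical (Torgerson) MDS of a list of maps, using 1 - S_kl as the
    dissimilarity between maps k and l."""
    n = len(maps)
    D = np.array([[1.0 - similarity(maps[k], maps[l]) for l in range(n)]
                  for k in range(n)])
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_dims]        # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

Plotting the returned coordinates for each candidate $\sigma$ gives the graphical representation from which the best-separating value can be chosen.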
Moment Functions
Moment functions have been widely used in image
analysis in applications related to invariant pattern recogni-
tion, object classification, pose estimation, image coding and
reconstruction (Zenkouar et al., 1997; Yin et al., 2002; Hu et
al., 1962; Teague, 1980; Li et al., 1991). A set of moments computed from a digital image represents global characteristics of the image shape and provides substantial information about different types of geometrical features of the image. Among moment functions, the first to be applied to images were geometric moments, since they are quite easy to compute. Other moment functions were subsequently introduced for image processing, such as orthogonal moments, rotational moments and complex moments; these are useful tools in pattern recognition and can be used to describe features of objects such as shape, area, border, location and orientation. Each moment function has its own advantages in specific applications.
The most widespread moments are the orthogonal ones,
e.g. Legendre (Chong et al., 2004; Mukundan et al., 1995;
Zhou et al., 2002) and Zernike moments (Wee et al., 2004;
Kan et al., 2002; Khotanzad et al., 1990), which can achieve zero redundancy within a set of moment functions, so that the orthogonal moments correspond to independent characteristics of the image. Moments with orthogonal basis functions can be used to represent the image by a set of mutually independent descriptors with a minimum amount of redundant information. Orthogonal moments have some important properties: they are more robust than non-orthogonal moments in the presence of noise in the image, and they allow the analytical reconstruction of an image intensity function from a finite set of moments using the inverse moment transform.
Among orthogonal moments, Legendre moments are the most widespread and can be used as feature descriptors for the classification of 2D-PAGE maps. The main advantages of using Legendre moments to cluster maps derive from the possibility of obtaining invariance to translation, scale effects and rotation, so that the original maps can be used for classification without any further pre-treatment. Because a large number of moments is calculated, some of them are unrelated to the classification purpose, and variable selection methods (e.g. Stepwise Linear Discriminant Analysis) should be applied.
The Legendre polynomials form a complete orthogonal set on the interval $[-1, 1]$. Moments with the Legendre polynomials as kernel functions were first introduced by Teague (1980). The kernels of Legendre moments are products of Legendre polynomials defined along the rectangular image coordinate axes inside the square $[-1, 1] \times [-1, 1]$. The two-dimensional Legendre moments of order $(p+q)$ of an image intensity map $f(x, y)$ are defined as:

$$L_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} P_p(x)\, P_q(y)\, f(x, y)\, dx\, dy, \qquad x, y \in [-1, 1],$$

where the Legendre polynomial $P_p(x)$ of order $p$ is given by:

$$P_p(x) = \frac{1}{2^p} \sum_{\substack{k=0 \\ p-k\ \text{even}}}^{p} (-1)^{\frac{p-k}{2}} \frac{(p+k)!\, x^k}{\left(\frac{p-k}{2}\right)!\, \left(\frac{p+k}{2}\right)!\, k!}$$

The recurrence relation of the Legendre polynomials $P_p(x)$ is:

$$P_p(x) = \frac{(2p-1)\, x\, P_{p-1}(x) - (p-1)\, P_{p-2}(x)}{p},$$

where $P_0(x) = 1$, $P_1(x) = x$ and $p > 1$. Since the region of definition of the Legendre polynomials is the interval $[-1, 1]$, a square image of $N \times N$ pixels with intensity function $f(i, j)$, $0 \le i, j \le N-1$, is scaled into the region $-1 \le x, y \le 1$.

Legendre moments can be expressed in discrete form as:

$$L_{pq} = \lambda_{pq} \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P_p(x_i)\, P_q(y_j)\, f(i, j),$$

where the normalizing constant is:

$$\lambda_{pq} = \frac{(2p+1)(2q+1)}{N^2}$$

and $x_i$ and $y_j$ denote the normalized pixel coordinates in the range $[-1, 1]$:

$$x_i = \frac{2i}{N-1} - 1 \qquad \text{and} \qquad y_j = \frac{2j}{N-1} - 1$$

The reconstruction of the image function from the calculated moments can be performed by the following inverse transformation:

$$f(i, j) \approx \sum_{p=0}^{p_{\max}} \sum_{q=0}^{q_{\max}} L_{pq}\, P_p(x_i)\, P_q(y_j)$$
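The discrete formulas above translate directly into matrix products. The sketch below evaluates the polynomials with the three-term recurrence (numerically safer than the explicit sum at high orders), computes all $L_{pq}$ with $0 \le p, q \le p_{\max}$ for a square image, and applies the inverse transformation. Because the discrete kernels are only approximately orthogonal, the reconstruction is approximate. The function names are illustrative and not taken from the cited work.

```python
import numpy as np


def legendre_polys(x, p_max):
    """Values of P_0 .. P_pmax at the points x, via the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    P = np.zeros((p_max + 1, len(x)))
    P[0] = 1.0
    if p_max >= 1:
        P[1] = x
    for p in range(2, p_max + 1):
        P[p] = ((2 * p - 1) * x * P[p - 1] - (p - 1) * P[p - 2]) / p
    return P


def legendre_moments(image, p_max):
    """Discrete Legendre moments L_pq, 0 <= p, q <= p_max, of an N x N image."""
    f = np.asarray(image, dtype=float)
    N = f.shape[0]
    coords = 2.0 * np.arange(N) / (N - 1) - 1.0   # x_i = 2i/(N-1) - 1
    P = legendre_polys(coords, p_max)            # shape (p_max + 1, N)
    orders = np.arange(p_max + 1)
    lam = np.outer(2 * orders + 1, 2 * orders + 1) / N ** 2
    # (P @ f @ P.T)[p, q] = sum_i sum_j P_p(x_i) f(i, j) P_q(y_j)
    return lam * (P @ f @ P.T)


def reconstruct(moments, N):
    """Inverse transform: f(i, j) ~ sum_p sum_q L_pq P_p(x_i) P_q(y_j)."""
    p_max = moments.shape[0] - 1
    coords = 2.0 * np.arange(N) / (N - 1) - 1.0
    P = legendre_polys(coords, p_max)
    return P.T @ moments @ P
```

With $p_{\max} = 100$ each image yields $(p_{\max}+1)^2 = 10201$ moments, all of which can be passed to a variable-selection step.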
The study by Marengo et al. (2005a) reports an interest-
ing application of Legendre moments to a set of 2D-PAGE
maps belonging to two different cell lines of control (untreated) and drug-treated pancreatic ductal carcinoma cells. The Legendre moments were used as discriminant variables to obtain the correct classification of the 18 samples. Each digitalised 2D-map was described by a 200 × 200 matrix of pixels (whose values range from 0 to 1, according to the corresponding staining intensity). Moments up to a maximum order of 100 were computed from each image, i.e. all $L_{pq}$ with $0 \le p, q \le 100$, so that the final dataset consisted of 18 samples described by $101 \times 101 = 10201$ variables. LDA was applied with variable selection by a stepwise algorithm in forward search (F-to-enter = 4.0). The results showed that only six Legendre moments were necessary to correctly classify the 18 samples.
Other Methods
In addition to applying PCA and PLS to spot volume data, Schultz et al. (2004) applied PCA to the analysis of gel images after their digitalisation and unwrapping. The choice of the alignment procedure for the sets of gels proved to be decisive for the final result, and PCA was effective in identifying the groups of maps present.
Pietrogrande and co-workers (2002, 2003, 2005, 2006) developed a method for evaluating spot overlapping in 2D-PAGE maps based on the quantitative theory of peak overlapping, previously developed and extended to 2-D separations (Pietrogrande et al., 2002, 2003). The map is divided into many strips in order to obtain 1-D separations to which the statistical procedure can be applied. Several important pieces of information can be extracted, such as the number of proteins present, the model describing the distribution of the interdistances between adjacent spots in both separation dimensions, and the presence of repeated interdistances in the spot positions; such regularities suggest specific protein modifications. In a more recent paper (Pietrogrande et al., 2005), the same authors applied a mathematical method based on the 2-D autocovariance function (2D-ACVF) computed on an experimental digitised map. The first part of the 2D-ACVF allows the estimation of the number of proteins present in the sample and of the separation performance. Moreover, the 2D-ACVF plot is a powerful tool for identifying order in the spot positions and singling it out from the complex separation pattern. The results provide specific information such as sample complexity, separation performance and the identification of spot trains related to post-translational modifications.
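A basic empirical 2-D autocovariance function of a digitised map can be computed with zero-padded FFTs, as sketched below. The mean-centring, the $1/(NM)$ normalisation and the layout of the output are assumptions made for illustration; the estimation of the number of proteins and the detection of spot trains described by Pietrogrande et al. are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def acvf_2d(image):
    """Empirical 2-D autocovariance of a digitised N x M map:
    C(a, b) = (1 / (N * M)) * sum_{i,j} g(i, j) * g(i + a, j + b), g = f - mean(f).
    Zero-padding to (2N, 2M) avoids circular wrap-around. After fftshift,
    entry [N + a, M + b] holds lag (a, b), for -N < a < N and -M < b < M."""
    f = np.asarray(image, dtype=float)
    g = f - f.mean()
    N, M = g.shape
    F = np.fft.rfft2(g, s=(2 * N, 2 * M))
    acf = np.fft.irfft2(F * np.conj(F), s=(2 * N, 2 * M)) / (N * M)
    return np.fft.fftshift(acf)
```

The value at lag (0, 0) equals the variance of the map; peaks at non-zero lags indicate repeated spot interdistances.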
Another study was reported by Marengo et al. (2003a), who applied three-way PCA to the identification of the differences among groups of 2D-maps. Three-way PCA was preceded by a data transformation to scale all the samples and make them comparable: maximum scaling was selected, and each digitalised 2-D PAGE map was scaled to its own maximum value. This method was successfully applied to datasets of human lymph nodes and rat sera, allowing the identification of the main differences existing among the sets of 2D-maps.
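The maximum scaling step is simple to express. The sketch below scales each map by its own maximum and stacks the maps into the maps × rows × columns array on which a three-way decomposition would operate; the three-way PCA model itself (e.g. a Tucker-type decomposition) is not shown, and the function names are hypothetical.

```python
import numpy as np


def max_scale(maps):
    """Scale each digitised map by its own maximum, so every map peaks at 1."""
    return [np.asarray(m, dtype=float) / np.max(m) for m in maps]


def stack_three_way(maps):
    """Stack equally sized, max-scaled maps into a (maps x rows x cols) array."""
    return np.stack(max_scale(maps))
```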
A more recent application uses the Fast Fourier Transform to cluster proteomics data (Bensmail et al., 2005). This work presents new algorithms to cluster and derive meaningful patterns of expression from mass-spectrometry proteomic signals. Raw data were processed and transformed from a real-space to a complex-space representation using the discrete Fourier transform. A thresholding approach was then applied to denoise each spectrum and reduce its length, and Bayesian clustering was applied to the reconstructed data. The method provided very good results when compared with other algorithms: K-means, Kohonen self-organizing maps and linear discriminant analysis.
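The Fourier-domain denoising idea can be illustrated with hard thresholding of DFT coefficients, as sketched below. The relative threshold rule (keep the coefficients whose magnitude is at least a fraction of the largest one) and the function name are assumptions for illustration; the exact thresholding scheme and the Bayesian clustering step of Bensmail et al. (2005) are not reproduced.

```python
import numpy as np


def fourier_threshold_denoise(signal, rel_threshold=0.1):
    """Hard-threshold the DFT of a real signal: keep the coefficients whose
    magnitude is at least rel_threshold * (largest magnitude), zero the rest,
    and transform back. Returns (denoised signal, number of kept coefficients)."""
    x = np.asarray(signal, dtype=float)
    X = np.fft.rfft(x)
    mask = np.abs(X) >= rel_threshold * np.abs(X).max()
    X_kept = np.where(mask, X, 0.0)
    return np.fft.irfft(X_kept, n=len(x)), int(mask.sum())
```

The few retained coefficients form a compact description of each spectrum, which is what makes the subsequent clustering cheaper.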
CONCLUSIONS
As already pointed out in the introduction, the main disadvantages of 2D-GE are the low reproducibility of the technique and the high complexity of the signal (complex 2D-maps with a large number of spots). The different methods presented here are useful for taking one or both of these aspects into account.
First of all, one must choose between methods for the analysis of spot volume datasets and methods for the direct analysis of 2D-map images. In the case of spot volume datasets, each multivariate method applied always depends on a first step of differential analysis carried out by dedicated software packages. In this case, the contribution of human interference in the differential analysis is not avoided, but it can be almost eliminated, or at least taken into account, in the subsequent multivariate analysis. PCA, PLS-DA and other robust multivariate tools extract the systematic information by means of only the first PCs calculated, thus largely removing human interference and random variations, which are contained in the last PCs. This is also true for SIMCA, which exploits the principles of PCA, and for LDA when applied to the significant PCs. Therefore, the application of multivariate tools to spot volume datasets allows effective treatment of the complexity characterising 2D-PAGE maps. Regarding the low reproducibility of the technique, the methods presented are more or less effective depending on their robustness to random variations in the data. However, this approach does not completely avoid human interference.
Different considerations apply to image analysis methods. Although they appear more complex, they properly take into account both the low reproducibility of the technique and the high complexity of the signal to be processed. Moreover, they avoid the preliminary treatment by standard software packages, thus limiting human interference.
A final conclusion can be drawn about the relevance to proteomics of the two types of methods reported here. Approaches based on spot volume datasets are more direct for the identification of possible biomarkers, i.e. spots characterised by significant differences between groups of samples. On the other hand, methods based on image analysis appear very promising for the development of automatic tools for clinical diagnosis and, from this perspective, it is the authors' opinion that they represent the most interesting applications.
ABBREVIATIONS
1D = One-dimensional
2-D ACVF = Two-Dimensional AutoCoVariance Function
2D = Two-dimensional
2D-GE = Two-Dimensional Gel Electrophoresis
2D-PAGE = Two-Dimensional Polyacrylamide Gel Electrophoresis
ANN = Artificial Neural Networks
ANOVA = ANalysis Of VAriance
BP-ANN = Back-Propagation Artificial Neural Networks
CCD = Charge-Coupled Device
DGGE = Denaturing Gradient Gel Electrophoresis
DNA = DeoxyriboNucleic Acid
DP = Discrimination Power
EITB = Enzyme-linked Immuno-electro Transfer Blotting
FISH = Fluorescent In Situ Hybridisation
GA = Genetic Algorithm
LDA = Linear Discriminant Analysis
LV = Latent Variable
MDS = Multidimensional Scaling
MP = Modelling Power
OD = Optical Density
OLS = Ordinary Least Squares
PAOs = Polyphosphate-Accumulating Organisms
PC = Principal Component
PCA = Principal Component Analysis
PCR-DGGE = Polymerase Chain Reaction-Denaturing Gradient Gel Electrophoresis
PLS-DA = Partial Least Squares Discriminant Analysis
RNA = RiboNucleic Acid
SDS-PAGE = Sodium Dodecyl Sulphate – Polyacrylamide Gel Electrophoresis
SIMCA = Soft-Independent Model of Class Analogy
REFERENCES
Alika, J.E., Akenova, M.E. and Fatokun, C.A. (1995). Variation among maize (Zea mays L) accessions of Bendel State, Nigeria - Numerical analysis of zein protein band patterns. Genet. Resour. Crop. Ev. 42: 393-9.
Almeida, J.S., Stanislaus, R., Krug, E. and Arthur, J.M. (2005). Normalisation and analysis of residual variation in two-dimensional gel electrophoresis for quantitative differential proteomics. Proteomics 5: 1242-9.
Amin, R.A., Vickers, A.E., Sistare, F., Thompson, K.L., Roman, R.J., Lawton, M., Kramer, J., Hamadeh, H.K., et al. (2004). Identification of putative gene-based markers of renal toxicity. Environ. Health Persp. 112: 465-79.
Anderson, N.L., Esquer-Blasco, R., Richardson, F., Foxworthy, P. and Eacho, P. (1996). The effects of peroxisome proliferators on protein abundances in mouse liver. Toxicol. Appl. Pharm. 137: 75-89.
Anderson, N.L., Hofmann, J.P., Gemmell, A. and Taylor, J. (1984). Global approaches to quantitative analysis of gene-expression patterns observed by use of two-dimensional gel electrophoresis. Clin. Chem. 30: 2031-6.
Anderson, N.L., Taylor, J., Scandora, A.E., Coulter, B.P. and Anderson, N.G. (1981). The TYCHO System for computer analysis of two-dimensional gel electrophoresis patterns. Clin. Chem. 27: 1807-20.
Bensmail, H., Golek, J., Moody, M.M., Semmes, J.O. and Haoudi, A. (2005). A novel approach for clustering proteomics data using Bayesian fast Fourier transform. Bioinformatics 21: 2210-24.
Bloom, G.C., Eschrich, S., Zhou, J.X., Coppola, D. and Yeatman, T.J. (2007). Elucidation of a protein signature discriminating six common types of adenocarcinoma. Int. J. Cancer 120: 769-75.
Boon, N., De Windt, W., Verstraete, W. and Top, E.M. (2002). Evaluation of nested PCR-DGGE (denaturing gradient gel electrophoresis) with group-specific 16S rRNA primers for the analysis of bacterial communities from different wastewater treatment plants. FEMS Microbiol. Ecol. 39: 101-12.
Campostrini, N., Areces, L.B., Rappsilber, J., Pietrogrande, M.C., Dondi, F., Pastorino, F., Ponzoni, M. and Righetti, P.G. (2005). Spot overlapping in two-dimensional maps: a serious problem ignored for much too long. Proteomics 5: 2385-95.
Chong, C., Raveendran, P. and Mukundan, R. (2004). Translation and scale invariants of Legendre moments. Pattern Recogn. 37: 119-29.
Correa, O.S., Romero, A.M., Montecchia, M.S. and Soria, M.A. (2007). Tomato genotype and Azospirillum inoculation modulate the changes in bacterial communities associated with roots and leaves. J. Appl. Microbiol. 102: 781-6.
Couto, M.M.B., Vogels, J.T.W.E., Hofstra, H., Husiintveld, J.H.J. and Vandervossen, J.M.B.M. (1995). Random amplified polymorphic DNA and restriction enzyme of PCR amplified RDNA in taxonomy – 2 Identification techniques for food-borne yeasts. J. Appl. Bacteriol. 79: 525-35.
De Moor, B., Marchal, K., Mathys, J. and Moreau, Y. (2003). Bioinformatics: Organisms from Venus, technology from Jupiter, algorithms from Mars. Eur. J. Control 9: 237-78.
De Noord, O.E. (1994). Multivariate calibration standardization. Chemometr. Intell. Lab. Syst. 25: 85-97.
Dewettinck, K., Dierckx, S., Eichwalder, P. and Huyghebaert, A. (1997). Comparison of SDS-PAGE profiles of four Belgian cheeses by multivariate statistics. Lait 77: 77-89.
Drew, J.E., Padidar, S., Horgan, G., Duthie, G.G., Russell, W.R., Reid, M., Duncan, G. and Rucklidge, G.J. (2006). Salicylate modulates oxidative stress in the rat colon: A proteomic approach. Biochem. Pharmacol. 72: 204-16.
Eisenbeis, R.A. (Ed.) (1972). Discriminant Analysis and Classification Procedures: Theory and Applications, Lexington, USA.
Fry, J.C., Webster, G., Cragg, B.A., Weightman, A.J. and Parkes, R.J. (2006). Analysis of DGGE profiles to explore the relationship between prokaryotic community composition and biogeochemical processes in deep subseafloor sediments from the Peru Margin. FEMS Microbiol. Ecol. 58: 86-98.
Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K. and Hirohashi, S. (2005). Protein expression pattern distinguishes different lymphoid neoplasms. Proteomics 5: 4274-86.
Gadea, I., Ayala, G., Diago, M.T., Cunat, A. and Garcia de Lomas, J. (1999). Immunological diagnosis of human cystic echinococcosis: Utility of discriminant analysis applied to the enzyme-linked immunoelectrotransfer blot. Clin. Diagn. Lab. Immun. 6: 504-8.
Gadea, I., Ayala, G., Diago, M.T., Cunat, A. and Garcia de Lomas, J. (2000). Immunological diagnosis of human hydatid cyst relapse: Utility of the enzyme-linked immunoelectrotransfer blot and discriminant analysis. Clin. Diagn. Lab. Immun. 7: 549-52.
Garrels, J.I. (1979). Two dimensional gel electrophoresis and computer analysis of proteins synthesized by clonal cell lines. J. Biol. Chem. 254: 7961-77.
Garrels, J.I. (1989). The QUEST system for quantitative analysis of two-dimensional gels. J. Biol. Chem. 264: 5269-82.
Garrels, J.I., Farrar, J.T. and Burwell IV, C.B., in: Celis, J.E. and Bravo, R. (Eds.) (1984). Two-Dimensional Gel Electrophoresis of Proteins. Academic Press, Orlando, FL, USA, pp. 38-91.
Goh, A.T.C. (1995). Backpropagation Neural Networks for modelling complex systems. Artif. Intell. Engin. 9: 143-51.
Gottfries, J., Sjogren, M., Holmberg, B., Rosengren, L., Davidsson, P. and Blennow, K. (2004). Proteomics for drug target discovery. Chemometr. Intell. Lab. Syst. 73: 47-53.
Heijne, W.H.M., Stierum, R.H., Slijper, M., van Bladeren, P.J. and van Ommen, B. (2003). Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics and proteomics approach. Biochem. Pharmacol. 65: 857-75.
Hu, M.K. (1962). Visual pattern recognition by moment invariants. IRE Transac. Inform. Theory 8: 179-87.
Iwadate, Y., Sakaida, T., Hiwasa, T., Nagai, Y., Ishikura, H., Takiguchi, M. and Yamaura, A. (2004). Molecular classification and survival prediction in human gliomas based on proteome analysis. Cancer Res. 64: 2496-501.
Izawa, N., Kishimoto, M., Konishi, M., Omasa, T., Shioya, S. and Ohtake, H. (2006). Recognition of culture state using two-dimensional gel electrophoresis with an artificial neural network. Proteomics 6: 3730-8.
Jessen, F., Lametsch, R., Bendixen, E., Kjaersgard, I.V.H. and Jorgensen, B.M. (2002). Extracting information from two-dimensional electrophoresis gels by partial least squares regression. Proteomics 2: 32-5.
Johansson, M.L., Quednau, M., Ahrne, S. and Molin, G. (1995). Classification of Lactobacillus plantarum by restriction-endonuclease analysis of total chromosomal DNA using conventional agarose-gel electrophoresis. Int. J. Syst. Bacteriol. 45: 670-5.
Kan, C. and Srinath, M.D. (2002). Invariant character recognition with Zernike and orthogonal Fourier-Mellin moments. Pattern Recogn. 35: 143-54.
Karp, N.A., Griffin, J.L. and Lilley, K.S. (2005). Application of partial least squares discriminant analysis to two-dimensional difference gel studies in expression proteomics. Proteomics 5: 81-90.
Khotanzad, A. and Hong, Y.H. (1990). Invariant image recognition by Zernike moments. IEEE T. Pattern Anal. 12: 489-97.
Kjaersgard, I.V.H., Norrelykke, M.R. and Jessen, F. (2006). Changes in cod muscle proteins during frozen storage revealed by proteome analysis and multivariate data analysis. Proteomics 6: 1606-18.
Klecka, W.R. (Ed.) (1980). Discriminant Analysis, Sage Publications, Beverly Hills, USA.
Kleinbaum, D., Kupper, L. and Muller, K. (1988). Applied Regression Analysis and Other Multivariate Methods, 2nd ed., PWS-Kent, Boston.
Kleno, T.G., Leonardsen, L.R., Kjeldal, H.O., Laursen, S.M., Jensen, O.N. and Baunsgaard, D. (2004). Mechanisms of hydrazine toxicity in rat liver investigated by proteomics and multivariate data analysis. Proteomics 4: 868-80.
Kovarova, H., Hajduch, M., Korinkova, G., Halada, P., Krupickova, S., Gouldsworthy, A., Zhelev, N. and Strnad, M. (2000). Proteomics approach in classifying the biochemical basis of the anticancer activity of the new olomoucine-derived synthetic cyclin-dependent kinase inhibitor, bohemine. Electrophoresis 21: 3757-64.
Kovarova, H., Radzioch, D., Hajduch, M., Sirova, M., Blaha, V., Macela, A., Stulik, J. and Hernychova, L. (1998). Natural resistance to intracellular parasites: A study by two-dimensional gel electrophoresis coupled with multivariate analysis. Electrophoresis 19: 1325-31.
Li, B.C. and Shen, J. (1991). Fast computation of moment invariants. Pattern Recogn. 24: 807-13.
Licht, T.R., Hansen, M., Poulsen, M. and Dragsted, L.O. (2006). Dietary carbohydrate source influences molecular fingerprints of the rat faecal microbiota. BMC Microbiol. 6: Art. No. 98.
Lilley, K.S. and Dupree, P. (2006). Methods of quantitative proteomics and their application to plant organelle characterization. J. Exp. Bot. 57: 1493-9.
Mahon, P. and Dupree, P. (2001). Quantitative and reproducible two-dimensional gel analysis using Phoretix 2D Full. Electrophoresis 22: 2075-85.
Marengo, E., Bobba, M., Robotti, E. and Liparota, M.C. (2005). Use of Legendre moments for the fast comparison of 2D-PAGE maps images. J. Chromatogr. A 1096: 86-91.
Marengo, E., Leardi, R., Robotti, E., Righetti, P.G., Antonucci, F. and Cecconi, D. (2003). Application of three-way principal component analysis to the evaluation of two-dimensional maps in proteomics. J. Prot. Res. 2: 351-60.
Marengo, E., Robotti, E., Bobba, M., Liparota, M.C., Antonucci, F., Rustichelli, C., Zamò, A., Chilosi, M., et al. (2006). Characterisation of the proteomic profiles of two human lymphoma cell lines by two-dimensional gel-electrophoresis and multivariate statistical tools. Electrophoresis 27: 484-94.
Marengo, E., Robotti, E., Cecconi, D., Scarpa, A. and Righetti, P.G. (2004). Identification of the regulatory proteins in human pancreatic cancers treated with Trichostatin-A by 2D-PAGE maps and Multivariate Statistical Analysis. Anal. Bioanal. Chem. 379: 992-1003.
Marengo, E., Robotti, E., Cecconi, D., Scarpa, A. and Righetti, P.G. (2004). Application of fuzzy logic principles to the classification of 2D-PAGE maps belonging to human pancreatic cancers treated with Trichostatin-A. Proceedings of 2004 IEEE International Conference on Fuzzy Systems, Budapest, Hungary, 1: 359-64.
Marengo, E., Robotti, E., Gianotti, V. and Righetti, P.G. (2003). A new approach to the statistical treatment of 2D-Pages in proteomics using fuzzy logic. Ann. Chim. Rome 93: 105-16.
Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., Domenici, E. and Cecconi, D. (2003). A new integrated statistical approach to the diagnostic use of proteomic two-dimensional maps. Electrophoresis 24: 225-36.
Marengo, E., Robotti, E., Righetti, P.G. and Antonucci, F. (2003). A new approach based on fuzzy logic and Principal Component Analysis for the classification of 2D-maps in health and disease: application to lymphomas. J. Chromatogr. A 1004: 13-28.
Marengo, E., Robotti, E., Righetti, P.G., Campostrini, N., Pascali, J. and Ponzoni, M. (2004). Study of Proteomic changes associated with healthy and tumoral murine samples in Neuroblastoma by Principal Component Analysis and classification methods. Clin. Chim. Acta 345: 55-67.
Martens, H. and Naes, T. (1989). Multivariate Calibration, Wiley, London.
Massart, D.L., Kaufman, L., in: Elving, P.J. and Winefordner, J.D. (Eds.) (1983). The interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, USA.
Massart, D.L., Vandeginste, B.G.M., Deming, S.N., Michotte, Y. and Kaufman, L. (1988). Chemometrics: a textbook, Amsterdam: Elsevier.
Molloy, M.P., Brzezinski, E.E., Hang, J., McDowell, M.T. and VanBogelen, R.A. (2003). Overcoming technical variation and biological variation in quantitative proteomics. Proteomics 3: 1912-9.
Moritz, B. and Meyer, H.E. (2003). Approaches for the quantification of protein concentration ratios. Proteomics 3: 2208-20.
Mukundan, R. and Ramakrishnan, K.R. (1995). Fast computation of Legendre and Zernike moments. Pattern Recogn. 28: 1433-42.
Olias, R., Maldonado, B., Radreau, P., Le Gall, G., Mulholland, F., Colquhoun, I.J. and Kemsley, E.K. (2006). Sodium dodecyl sulphate-polyacrylamide gel electrophoresis of proteins in dry-cured hams: Data registration and multivariate analysis across multiple gels. Electrophoresis 27: 1288-99.
Perrot, F., Hebraud, M., Charlionet, R., Junter, G.A. and Jouenne, T. (2001). Cell immobilisation induces changes in the protein response of Escherichia coli K-12 to a cold shock. Electrophoresis 22: 2110-9.
Pietrogrande, M.C., Marchetti, N., Dondi, F. and Righetti, P.G. (2002). Spot overlapping in two-dimensional polyacrylamide gel electrophoresis separations: A statistical study of complex protein maps. Electrophoresis 23: 283-91.
Pietrogrande, M.C., Marchetti, N., Dondi, F. and Righetti, P.G. (2003). Spot overlapping in two-dimensional polyacrylamide gel electrophoresis maps: Relevance to proteomics. Electrophoresis 24: 217-24.
Pietrogrande, M.C., Marchetti, N., Tosi, A., Dondi, F. and Righetti, P.G. (2005). Decoding two-dimensional polyacrylamide gel electrophoresis complex maps by autocovariance function: A simplified approach useful for proteomics. Electrophoresis 26: 2739-48.
Pietrogrande, M.C., Marchetti, N., Dondi, F. and Righetti, P.G. (2006). Decoding 2D-PAGE complex maps: Relevance to proteomics. J. Chromatogr. B 833: 51-62.
Ramadan, Z., Hopke, P.K., Johnson, M.J. and Scow, K.M. (2005). Application of PLS and Back-Propagation Neural Networks for the estimation of soil properties. Chemometr. Intell. Lab. Syst. 75: 23-30.
Raman, B., Cheung, A. and Marten, M.R. (2002). Quantitative comparison and evaluation of two commercially available, two-dimensional electrophoresis image analysis software packages, Z3 and Melanie. Electrophoresis 23: 2194-202.
Righetti, P.G., Stoyanov, A. and Zhukov, M. (2001). The proteome revisited: theory and practice of the relevant electrophoretic steps, Elsevier, Amsterdam.
Rosengren, A.T., Salmi, J.M., Aittokallio, T., Westerholm, J., Lahesmaa, R., Nyman, T.A. and Nevalainen, O.S. (2003). Comparison of PDQuest and Progenesis software packages in the analysis of two dimensional electrophoresis gels. Proteomics 3: 1936-46.
Rubinfeld, A., Keren-Lehrer, T., Hadas, G. and Smilansky, Z. (2003). Hierarchical analysis of large-scale two-dimensional gel-electrophoresis experiments. Proteomics 3: 1930-5.
Schultz, J., Gottlieb, D.M., Petersen, M., Nesic, L., Jacobsen, S. and Sondergaard, I. (2004). Explorative data analysis of two-dimensional electrophoresis gels. Electrophoresis 25: 502-11.
Shoji, T., Nittami, T., Onuki, M., Satoh, H. and Mino, T. (2006). Microbial community of biological phosphorus removal process fed with municipal wastewater under different electron acceptor conditions. Water Sci. Technol. 54: 81-9.
Tarroux, P., Vincens, P. and Rabilloud, T. (1987). HERMeS: A second generation approach to the automatic analysis of two-dimensional electrophoresis gels. Part V: Data analysis. Electrophoresis 8: 187-99.
Teague, M.R. (1980). Image analysis via the general theory of moments. J. Opt. Soc. Am. 70: 920-30.
Tuomainen, M.H., Nunan, N., Lehesranta, S.J., Tervahauta, A.I., Hassinen, V.H., Schat, H., Koistinen, K.M., Auriola, S., et al. (2006). Multivariate analysis of protein profiles of metal hyperaccumulator Thlaspi caerulescens accessions. Proteomics 6: 3696-706.
Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C., De Jong, S., Lewi, P.J. and Smeyers-Verbeke, J. (1998). Handbook of Chemometrics and Qualimetrics: Part B, Amsterdam: Elsevier.
Verhoeckx, K.C.M., Bijlsma, S., de Groene, E.M., Witkamp, R.F., van der Greef, J. and Rodenburg, R.J.T. (2004). A combination of proteomics, principal component analysis and transcriptomics is a powerful tool for the identification of biomarkers for macrophage maturation in the U937 cell line. Proteomics 4: 1014-28.
Verhoeckx, K.C.M., Bijlsma, S., Jespersen, S., Ramaker, R., Verheij, E.R., Witkamp, R.F., van der Greef, J. and Rodenburg, R.J.T. (2004). Characterization of anti-inflammatory compounds using transcriptomics, proteomics, and metabolomics in combination with multivariate data analysis. Int. Immunopharmacol. 4: 1499-514.
Voss, T. and Haberl, P. (2000). Observations on the reproducibility and matching efficiency of two-dimensional electrophoresis gels: consequences for comprehensive data analysis. Electrophoresis 21: 3345-50.
Walczak, B. (1996). Neural networks with robust backpropagation learning algorithm. Anal. Chim. Acta 322: 21-9.
Webster, N.S. and Bourne, D. (2007). Bacterial community structure associated with the Antarctic soft coral, Alcyonium antarcticum. FEMS Microbiol. Ecol. 59: 81-94.
Wee, C., Paramesran, R. and Takeda, F. (2004). New computational methods for full and subset Zernike moments. Inform. Sciences 159: 203-20.
Wheelock, A.M. and Buckpitt, A.R. (2005). Software-induced variance in two-dimensional gel electrophoresis image analysis. Electrophoresis 26: 4508-20.
Wold, S. (1976). Pattern recognition by means of disjoint principal components models. Pattern Recogn. 8: 127-39.
Wythoff, B.J. (1993). Backpropagation neural networks - A tutorial. Chemometr. Intell. Lab. Syst. 18: 115-55.
Yin, J., Rodolfo De Pierro, A. and Wei, M. (2002). Analysis for the reconstruction of a noisy signal based on orthogonal moments. Appl. Math. Comput. 132: 249-63.
Zenkouar, H. and Nachit, A. (1997). Images compression using moments method of orthogonal polynomials. Mat. Sci. Eng. B 49: 211-5.
Zhang, L. and Subbarayan, G. (2002). An evaluation of back-propagation neural networks for the optimal design of structural systems: Part I. Training procedures. Comput. Method Appl. M. 191: 2873-86.
Zhang, L. and Subbarayan, G. (2002). An evaluation of back-propagation neural networks for the optimal design of structural systems: Part II. Numerical evaluation. Comput. Method Appl. M. 191: 2887-904.
Zhou, J.D., Shu, H.Z., Luo, L.M. and Yu, W.X. (2002). Two new algorithms for efficient computation of Legendre moments. Pattern Recogn. 35: 1143-52.
Zupan, J. and Gasteiger, J. (1993). Neural Networks for Chemists: An Introduction. Wiley: New York.
Received: May 1, 2007 Revised: July 10, 2007 Accepted: July 10, 2007