
    Overview

Multiple factor analysis: principal component analysis for multitable and multiblock data sets

Hervé Abdi¹, Lynne J. Williams² and Dominique Valentin³

Multiple factor analysis (MFA, also called multiple factorial analysis) is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations, or, alternatively (in dual-MFA), multiple data tables where the same variables are measured on different sets of observations. MFA proceeds in two steps: First it computes a PCA of each data table and normalizes each data table by dividing all its elements by the first singular value obtained from its PCA. Second, all the normalized data tables are aggregated into a grand data table that is analyzed via a (non-normalized) PCA that gives a set of factor scores for the observations and loadings for the variables. In addition, MFA provides for each data table a set of partial factor scores for the observations that reflects the specific viewpoint of this data table. Interestingly, the common factor scores could be obtained by replacing the original normalized data tables by the normalized factor scores obtained from the PCA of each of these tables. In this article, we present MFA, review recent extensions, and illustrate it with a detailed example. © 2013 Wiley Periodicals, Inc.

How to cite this article:
WIREs Comput Stat 2013. doi: 10.1002/wics.1246

Keywords: multiple factor analysis (MFA); multiple factorial analysis; multiblock correspondence analysis; STATIS; INDSCAL; multiblock barycentric discriminant analysis (MUDICA); multiple factor analysis barycentric discriminant analysis (MUFABADA); barycentric discriminant analysis (BADA); generalized Procrustes analysis (GPA); generalized singular value decomposition; principal component analysis; consensus PCA; multitable PCA; multiblock PCA

    INTRODUCTION

Multiple factor analysis (MFA, also sometimes named multiple factorial analysis to avoid confusion with Thurstone's multiple factor analysis described in Ref 1) is a generalization of principal component analysis (PCA). Its goal is to analyze several data sets of variables collected on the same set of observations, or, as in its dual version, several sets of observations measured on the same set of variables (see Ref 2). As such, MFA is part of the multitable (also called multiblock or consensus analysis, Refs 3–20) PCA family, which comprises related techniques such as STATIS, multiblock correspondence analysis (MUDICA), and SUM-PCA.

Correspondence to: [email protected]
¹School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
²Rotman Research Institute, Baycrest, Toronto, ON, Canada
³Université de Bourgogne, Dijon, France

MFA is a recent technique (ca 1980) that originated from the work of the French statisticians Brigitte Escofier and Jérôme Pagès (see Refs 14, 21, 22 for an introduction, Ref 23 for an example, and Ref 24 for an extensive and comprehensive review). The goals of MFA are (1) to analyze several data sets measured on the same observations; (2) to provide a set of common factor scores (often called compromise factor scores); and (3) to project each of the original data sets onto the compromise to analyze communalities and discrepancies. The main


idea behind MFA is remarkably simple and akin to the idea behind the Z-score normalization that makes variables comparable by dividing each element of a variable by that variable's standard deviation (i.e., the square root of its variance). For a PCA, a notion similar to the standard deviation is the singular value, which is the square root of an eigenvalue (which can be seen as a variance). So, in MFA each data table is normalized by dividing all of its elements by the first singular value of this data table. This transformation ensures that the length (i.e., the singular value) of the first principal component of each data table is equal to 1 and therefore that no data table can dominate the common solution only because it has a larger inertia on its first dimension.
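To make this normalization concrete, here is a minimal NumPy sketch of the idea just described. The function name and the `tables` variable (a list of K preprocessed matrices sharing the same rows) are illustrative assumptions, not part of the original article, and observation masses are omitted for simplicity.

    import numpy as np

    def mfa_normalize_and_stack(tables):
        """Divide each table by its first singular value, then concatenate."""
        normalized = []
        for X_k in tables:
            gamma_1 = np.linalg.svd(X_k, compute_uv=False)[0]  # first singular value
            normalized.append(X_k / gamma_1)  # first PC of this block now has length 1
        return np.hstack(normalized)          # grand table for the final (plain) PCA

    # X = mfa_normalize_and_stack(tables)
    # U, delta, Vt = np.linalg.svd(X, full_matrices=False)
    # F = U * delta                           # compromise factor scores (up to masses)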

MFA is a popular method for analyzing multiple sets of variables measured on the same observations and it has recently been used in various domains, such as sensory and consumer science research (a domain where MFA applications and developments have been particularly rich and varied, see Refs 9, 21, 25–40), chemometrics and process monitoring (Refs 9, 41–43), ecology (Refs 44–53), agriculture (Refs 54, 55), broadcasting (Ref 56), geology (Ref 57), neuroimaging (Refs 4, 5, 58–60), medicine and health (Refs 61–64), genetics (Refs 54, 65, 66), statistical quality control (Refs 27, 67–72), economics (Ref 73), and molecular biology (Refs 46, 50), to name but a few. In addition to being used in several domains of applications, MFA is also a vigorous domain of theoretical developments that are explored later in this article.

When to Use It

MFA is used when several sets of variables have been measured on the same set of observations. The number and/or nature of the variables used to describe the observations can vary from one set of variables to the other, but the observations should be the same in all the data sets.

For example, the data sets can be measurements taken on the same observations (individuals or objects, e.g., students) at different occasions (e.g., semesters). In this case, the first data set corresponds to the data collected at time 1 (e.g., the first semester), the second one to the data collected at time 2 (e.g., the second semester), and so on. The goal of the analysis, then, is to evaluate how the positions of the observations change over time (note, however, that MFA does not explicitly model the time variable as it does not make assumptions about the relationships between the measurements).

In another example, the different data sets can be the same observations (e.g., wines) evaluated by different subjects (e.g., wine experts) or groups of subjects with different variables (e.g., each wine expert evaluates the wines with his/her own set of scales). In this case, the first data set corresponds to the first subject, the second one to the second subject, and so on. The goal of the analysis, then, is to evaluate if there is an agreement between the subjects or groups of subjects.

Alternatively, dual-MFA can be used when the same variables are measured on different populations or on different participants. When both observations and variables are the same for all data tables, the technique could be called (by analogy with STATIS, see Ref 3) partial triadic MFA.

The Main Idea

The general idea behind MFA is to normalize each of the individual data sets so that their first principal component has the same length (as measured by the first singular value of each data table) and then to combine these data tables into a common representation of the observations sometimes called the compromise, or the consensus. This compromise is obtained from a (non-normalized) PCA of the grand table obtained from the concatenation of the normalized data tables. This PCA decomposes the variance of the compromise into a set of new orthogonal variables (i.e., the principal components, also often called dimensions, axes, factors, or even latent variables) ordered by the amount of variance that each component explains. The coordinates of the observations on the components are called factor scores and these can be used to plot maps of the observations in which the observations are represented as points such that the distances in the map best reflect the similarities between the observations. The positions of the observations as seen by each data set are called partial factor scores and can also be represented as points in the compromise map. The average of the factor scores of all the tables gives back the factor scores of the compromise. A pictorial sketch of the technique is provided in Figure 1.

As the components are obtained by combining the original variables, each variable contributes a certain amount to each component. This quantity, called the loading of a variable on a component, reflects the importance of that variable for this component and can also be used to plot maps of the variables that reflect their associations. Squared loadings can also be used to evaluate the importance of variables. A variation over squared loadings, called contributions, evaluates the importance of each variable as the proportion of the explained variance of the component by the variable. The contribution of a data table to a component can be obtained by adding the contributions of its variables. These contributions can then be used to draw plots expressing the importance of the data tables in the common solution.

[FIGURE 1 | The different steps of MFA. Step 1: K tables of J[k] variables collected on the same observations. Step 2: Compute a generalized PCA on each of the K tables (where γ1 is the first singular value of each table). Step 3: Normalize each table by dividing by its first singular value (γ1). Step 4: Concatenate the K normalized tables. Step 5: Compute a generalized PCA on the concatenated table.]

    NOTATIONS AND PRELIMINARIES

Matrices are denoted by boldface uppercase letters (e.g., X), vectors by boldface lowercase letters (e.g., q), and elements of vectors and matrices are denoted by italic lowercase letters with appropriate indices if needed (e.g., x_ij is an element of X). Blocks of variables (i.e., tables) are considered as sub-matrices of larger matrices and are represented in brackets separated by vertical bars (e.g., a matrix X made of two sub-matrices X[1] and X[2] is written X = [X[1] | X[2]]). The identity matrix is denoted by I, and a vector of ones is denoted by 1 (indices may be used to specify the dimensions if the context is ambiguous). The transpose of a matrix is denoted by the superscript ^T. The inverse of a matrix is denoted by the superscript ^{-1}. When applied to a square matrix, the diag operator takes the diagonal elements of this matrix and stores them into a column vector; when applied to a vector, the diag operator stores the elements of this vector on the diagonal elements of a diagonal matrix. The trace operator computes the sum of the diagonal elements of a square matrix. The vec operator transforms a matrix into a vector by stacking the elements of a matrix into a column vector. The standard product between matrices is implicitly denoted by simple juxtaposition or by × when it needs to be explicitly stated (e.g., XY = X × Y is the product of matrices X and Y). The Hadamard or element-wise product is denoted by ∘ (e.g., X ∘ Y).

The raw data consist of K data sets collected on the same observations. Each data set is also called a table, a sub-table, or sometimes also a block or a study (in this article we prefer the term table or occasionally block). The data for each table are stored in an I × J[k] rectangular data matrix denoted by Y[k], where I is the number of observations and J[k] the number of variables collected on the observations for the k-th table. The total number of variables is denoted by J (i.e., J = Σ_k J[k]). Each data matrix is, in general, preprocessed (e.g., centered, normalized) and the preprocessed data matrices actually used in the analysis are denoted by X[k] (the preprocessing steps are detailed below in the section More on Preprocessing).

The K data matrices X[k], each of dimensions I rows by J[k] columns, are concatenated into the complete I by J data matrix denoted by X:

X = [X[1] | ... | X[k] | ... | X[K]].   (1)

A mass, denoted by m_i, is assigned to each observation. These masses are collected in the mass


vector, denoted by m, and in the diagonal elements of the mass matrix denoted by M, which is obtained as

M = diag{m}.   (2)

Masses are non-negative elements whose sum equals one. Often, equal masses are chosen, with m_i = 1/I.

To each matrix X[k], we associate its cross-product matrix defined as

S[k] = X[k] X[k]^T.   (3)

A cross-product matrix of a table expresses the pattern of relationships between the observations as seen by this table. Note that because of the block structure of X, the cross-product of X can be expressed as

X X^T = [X[1] | ... | X[k] | ... | X[K]] [X[1] | ... | X[k] | ... | X[K]]^T = Σ_k X[k] X[k]^T = Σ_k S[k].   (4)

Singular Value Decomposition and Generalized Singular Value Decomposition

MFA is part of the PCA family and therefore its main analytical tool is the singular value decomposition (SVD) and the generalized singular value decomposition (GSVD) of a matrix (see, for tutorials, Refs 74–80). We briefly describe these two methods below.

SVD

Recall that the SVD of a given I × J matrix X decomposes it into three matrices as:

X = U Δ V^T   with   U^T U = V^T V = I,   (5)

where U is the I by L matrix of the normalized left singular vectors (with L being the rank of X), V the J by L matrix of the normalized right singular vectors, and Δ the L by L diagonal matrix of the L singular values; also, γ_ℓ, u_ℓ, and v_ℓ are, respectively, the ℓth singular value, left singular vector, and right singular vector. Matrices U and V are orthonormal matrices (i.e., U^T U = V^T V = I). The SVD is closely related to and generalizes the well-known eigendecomposition, as U is also the matrix of the normalized eigenvectors of XX^T, V is the matrix of the normalized eigenvectors of X^T X, and the singular values are the square roots of the eigenvalues of XX^T and X^T X (these two matrices have the same eigenvalues).

Key property: the SVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix with a lower rank.
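This key property (the Eckart–Young theorem) is easy to check numerically. The following self-contained sketch, using made-up random data, verifies that the least-squares error of the rank-R truncation equals the square root of the sum of the discarded squared singular values.

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(12, 8))                      # arbitrary example matrix
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)

    R = 2
    Z_hat = U[:, :R] @ np.diag(d[:R]) @ Vt[:R, :]     # best rank-R reconstitution
    error = np.linalg.norm(Z - Z_hat)                 # Frobenius norm of the residual
    assert np.isclose(error, np.sqrt((d[R:] ** 2).sum()))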

GSVD

The GSVD generalizes the SVD of a matrix by incorporating two additional positive definite matrices (recall that a positive definite matrix is a square symmetric matrix whose eigenvalues are all positive) that represent constraints to be incorporated in the decomposition (formally, these matrices are constraints on the orthogonality of the singular vectors, see Refs 75, 78 for more details). Specifically, let M denote an I by I positive definite matrix representing the constraints imposed on the rows of an I by J matrix X, and A a J by J positive definite matrix representing the constraints imposed on the columns of X. Matrix M is almost always a diagonal matrix of the masses of the observations (i.e., the rows); whereas matrix A implements a metric on the variables and is often, but not always, diagonal. Obviously, when M = A = I, the GSVD reduces to the plain SVD. The GSVD of X, taking into account M and A, is expressed as (compare with Eq. (5)):

X = P Δ Q^T   with   P^T M P = Q^T A Q = I,   (6)

where P is the I by L matrix of the normalized left generalized singular vectors (with L being the rank of X), Q the J by L matrix of the normalized right generalized singular vectors, and Δ the L by L diagonal matrix of the L generalized singular values. The GSVD implements the whole class of generalized PCA, which includes (with a proper choice of matrices M and A and preprocessing of X) techniques such as discriminant analysis, correspondence analysis, canonical variate analysis, etc. With the so-called triplet notation, which is used as a general framework to formalize multivariate techniques, the GSVD of X under the constraints imposed by M and A is equivalent to the statistical analysis of the triplet (X, A, M) (see Refs 81–85).

Key property: the GSVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix with a lower rank under the constraints imposed by two positive definite matrices. The generalized singular vectors are orthonormal with respect to their respective matrix of constraints.
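In practice, the GSVD of Eq. (6) is often computed from a plain SVD of a transformed matrix. The sketch below assumes, as is typical in MFA, that M and A are diagonal (for general positive definite constraints one would use a proper matrix square root, e.g., scipy.linalg.sqrtm); the function name is illustrative.

    import numpy as np

    def gsvd(X, M, A):
        """GSVD of X under diagonal constraints M (rows) and A (columns)."""
        M_half, A_half = np.sqrt(M), np.sqrt(A)       # valid because M, A are diagonal
        U, delta, Vt = np.linalg.svd(M_half @ X @ A_half, full_matrices=False)
        P = np.linalg.inv(M_half) @ U                 # generalized left vectors, P^T M P = I
        Q = np.linalg.inv(A_half) @ Vt.T              # generalized right vectors, Q^T A Q = I
        return P, delta, Q                            # X == P @ np.diag(delta) @ Q.T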

    THE DIFFERENT STEPS OF MFA

MFA comprises three main steps: In the first step, a PCA of each data table is performed and the first


singular value of each table is recorded. In the second step, a grand matrix is created by concatenating all the data tables and a non-normalized generalized PCA is performed by decomposing the grand matrix with a GSVD where the column weights are obtained from the first (squared) singular value of each data table. An equivalent way of performing this second step is to divide all the elements of a data table by the table's first singular value, concatenate the data tables into a grand data table, and then perform a non-normalized PCA of this grand data table. In the third step, the observation partial factor scores for each table are computed by projecting each data table onto the common space.

As MFA boils down to the PCA of the grand data table, the usual PCA indices can be computed to identify the important components, observations, and variables (see Ref 75). In addition, some indices specific to MFA can also be derived to quantify the importance of each table in the common solution.

Step 1: PCA of Each Data Table

In the first step of MFA each data table is analyzed via a standard PCA. Specifically, each table is expressed via its SVD as

X[k] = U[k] Δ[k] V[k]^T   with   U[k]^T U[k] = V[k]^T V[k] = I.   (7)

From the SVD of each table we also obtain its factor scores (as in any standard PCA), which are computed as

G[k] = U[k] Δ[k].   (8)

The matrices U[k] and V[k] store (respectively) the left and right singular vectors of the table X[k], whose singular values are stored in the diagonal of the (diagonal) matrix Δ[k]:

[γ_{1,k}, ..., γ_{ℓ,k}, ..., γ_{L,k}] = diag{Δ[k]}.   (9)

In MFA, the weight of a table is obtained from the first singular value of its PCA. This weight, denoted by α_k, is equal to the inverse of the first squared singular value:

α_k = 1 / γ_{1,k}^2 = γ_{1,k}^{-2}.   (10)

For convenience, the weights can be gathered in a J by 1 vector denoted a, where each variable is assigned the weight of the matrix to which it belongs. Specifically, a is constructed as:

a = [α_1 1[1]^T, ..., α_k 1[k]^T, ..., α_K 1[K]^T]^T,   (11)

where 1[k] stands for a J[k] vector of ones. Alternatively, the weights can be stored as the diagonal elements of a diagonal matrix denoted by A obtained as

A = diag{a} = diag{α_1 1[1]^T, ..., α_k 1[k]^T, ..., α_K 1[K]^T}.   (12)

Step 2: Generalized PCA of X

GSVD of X

After the weights have been collected, they are used to compute the GSVD of X under the constraints provided by M (masses for the observations) and A (squared-singular-value-derived weights for the K tables). This GSVD is expressed as:

X = P Δ Q^T   with   P^T M P = Q^T A Q = I.   (13)

This GSVD corresponds to a generalized PCA of matrix X and, consequently, will provide factor scores to describe the observations and factor loadings to describe the variables. Each column of P and Q refers to a principal component, also called a dimension (because the numbers in these columns are often used as coordinates to plot maps, see Ref 75 for more details). In PCA, Eq. (13) is often rewritten as

X = F Q^T   with   F = P Δ,   (14)

where F stores the factor scores (describing the observations) and Q stores the loadings (describing the variables). Note, incidentally, that in the triplet notation, MFA is equivalent to the statistical analysis of the triplet (X, A, M).

Because the matrix X concatenates K tables, each of them, in turn, comprising J[k] variables, the matrix Q of the right singular vectors can be partitioned in the same way as X. Specifically, Q can be expressed as a column block matrix as:

Q = [Q[1]^T | ... | Q[k]^T | ... | Q[K]^T]^T,   (15)

where Q[k] is a J[k] by L (with L being the rank of X) matrix storing the right singular vectors corresponding to the variables of matrix X[k]. With this in mind,


Eq. (13) can be re-expressed as:

X = [X[1] | ... | X[k] | ... | X[K]] = P Δ Q^T = P Δ [Q[1]^T | ... | Q[k]^T | ... | Q[K]^T] = [P Δ Q[1]^T | ... | P Δ Q[k]^T | ... | P Δ Q[K]^T].   (16)

Note that the pattern in Eq. (13) does not completely generalize to Eq. (16) because, if we define A[k] as

A[k] = α_k I,   (17)

we have, in general, Q[k]^T A[k] Q[k] ≠ I.

Factor Scores

The factor scores for X represent a compromise (i.e., a common representation) for the set of the K matrices. Recall that these compromise factor scores are computed (cf. Eqs (13) and (14)) as

F = P Δ.   (18)

Factor scores can be used to plot the observations as done in standard PCA, for which each column of F represents a dimension. Note that the variance of the factor scores of the observations is computed using their masses (stored in matrix M) and can be found as the diagonal of the matrix F^T M F. This variance is equal, for each dimension, to the square of the singular value of this dimension, as shown by

F^T M F = Δ P^T M P Δ = Δ^2.   (19)

As in standard PCA, F can be obtained from X by combining Eqs (13) and (18) to get:

F = P Δ = X A Q.   (20)

Taking into account the block structure of X, A, and Q, Eq. (20) can also be rewritten as (cf. Eq. (17)):

F = X A Q = [X[1] | ... | X[k] | ... | X[K]] A [Q[1]^T | ... | Q[k]^T | ... | Q[K]^T]^T = Σ_k X[k] A[k] Q[k] = Σ_k α_k X[k] Q[k].   (21)

This equation suggests that the partial factor scores for a table can be defined from the projection of this table onto its right singular vectors (i.e., Q[k]). Specifically, the partial factor scores for the k-th table are stored in a matrix denoted by F[k], computed as

F[k] = K α_k X[k] Q[k].   (22)

Note that the compromise factor scores matrix is the barycenter (also called centroid or center of gravity, see Ref 86) of the partial factor scores because it is the average of all K partial factor scores (cf. Eq. (20)):

(1/K) Σ_k F[k] = (1/K) Σ_k K α_k X[k] Q[k] = Σ_k α_k X[k] Q[k] = F.   (23)
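A small NumPy sketch of Eqs (22) and (23), assuming `tables`, the weights `alphas`, and the per-table loading blocks `Q_blocks` (the row slices of Q) are available from the GSVD step; all names are illustrative.

    import numpy as np

    def partial_factor_scores(tables, alphas, Q_blocks):
        """F_[k] = K * alpha_k * X_[k] @ Q_[k] for each table (Eq. (22))."""
        K = len(tables)
        return [K * a_k * X_k @ Q_k
                for X_k, a_k, Q_k in zip(tables, alphas, Q_blocks)]

    # Barycentric property (Eq. (23)): averaging the partial factor scores
    # recovers the compromise factor scores F.
    # F_partials = partial_factor_scores(tables, alphas, Q_blocks)
    # F = np.mean(F_partials, axis=0)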

Also, as in standard PCA, the elements of Q are loadings and can be plotted either on their own or along with the factor scores as a biplot (see Refs 87, 88). As the loadings come in blocks (i.e., the loadings correspond to the variables of a table), it makes sense to create a biplot with the partial factor scores (i.e., F[k]) for a block and the loadings (i.e., Q[k]) for this block. In doing so, it is often practical to normalize the loadings such that their variance is commensurable with the variance of the factor scores. This can be achieved, for example, by normalizing, for each dimension, the loadings of a block such that their variance is equal to the square of the singular value of the dimension or even to the singular value itself (as illustrated in the example that we present in a following section). These biplots are helpful for understanding the statistical structure of each block, even though the relative positions of the factor scores and the loadings are not directly interpretable because only the projections of observations on the loading vectors can be meaningfully interpreted in a biplot (cf. Refs 87, 88).

An alternative pictorial representation of the variables and the components plots the correlations between the original variables of X and the factor scores. These correlations are plotted as two-dimensional maps in which a circle of radius one (called the circle of correlation, Refs 75, 89) is also plotted. The closer to the circle a variable is, the better this variable is explained by the components used to create the plot (see Refs 23, 24 for examples). Loadings and correlations are often used interchangeably because these two concepts are very similar, and sometimes the name loading is used for both concepts (see Ref 75). In fact, loadings and correlations differ only by a normalization factor: the sum of the squared loadings of all the variables for a given dimension is equal to


one, whereas the sum of the squared correlations of all the dimensions for a given variable is equal to one (and therefore it is always possible to transform one set into the other).

HOW TO FIND THE IMPORTANT ELEMENTS: CONTRIBUTIONS, ETC.

Contributions of Observations, Variables, and Tables to a Dimension

In MFA, just like in standard PCA, the importance of a dimension (i.e., principal component) is reflected by its eigenvalue, which indicates how much of the total inertia (i.e., variance) of the data is explained by this component.

To better understand the relationships between components, observations, variables, and tables, and also to help interpret a component, we can evaluate how much an observation, a variable, or a whole table contributes to the inertia extracted by a component. In order to do so, we compute descriptive statistics, called contributions (see Refs 78, 89–91 and Ref 75, p. 437ff.). The stability of these descriptive statistics can be assessed by cross-validation techniques such as the bootstrap, whose results can be used to select the relevant elements for a dimension.

Contribution of an Observation to a Dimension

As stated in Eq. (19), the variance of the factor scores for a given dimension is equal to the eigenvalue (i.e., the square of the singular value) associated with this dimension. If we denote by λ_ℓ the eigenvalue of a given dimension, we can rewrite Eq. (19) as

λ_ℓ = Σ_i m_i f_{i,ℓ}^2,   (24)

where m_i and f_{i,ℓ} are, respectively, the mass of the i-th observation and the factor score of the i-th observation for the ℓ-th dimension. As all the terms m_i f_{i,ℓ}^2 are positive or null, we can evaluate the contribution of an observation to a dimension as the ratio of the squared weighted factor score to the dimension eigenvalue. Formally, the contribution of observation i to component ℓ, denoted ctr_{i,ℓ}, is computed as

ctr_{i,ℓ} = m_i f_{i,ℓ}^2 / λ_ℓ.   (25)

Contributions take values between 0 and 1, and for a given component, the sum of the contributions of all observations is equal to 1. The larger a contribution, the more the observation contributes to the component. A useful heuristic is to base the interpretation of a component on the observations that have contributions larger than the average contribution. Observations with high contributions and whose factor scores have different signs can then be contrasted to help interpret the component. Alternatively (as described in a later section), we can derive pseudo-t statistics (called bootstrap ratios) in order to find the observations important for a given dimension.

Contributions of a Variable to a Dimension

As we did for the observations, we can find the important variables for a given dimension by computing variable contributions. The variance of the loadings for the variables is equal to one when the α weights are taken into account (cf. Eq. (13)). So if we denote by a_j the weight for the j-th variable (recall that all variables from the same table share the same weight, cf. Eq. (11)), we have

1 = Σ_j a_j q_{j,ℓ}^2,   (26)

where q_{j,ℓ} is the loading of the j-th variable for the ℓ-th dimension. As all terms a_j q_{j,ℓ}^2 are positive or null, we can evaluate the contribution of a variable to a dimension as its squared weighted loading for this dimension. Formally, the contribution of variable j to component ℓ, denoted ctr_{j,ℓ}, is computed as

ctr_{j,ℓ} = a_j q_{j,ℓ}^2.   (27)

Variable contributions take values between 0 and 1, and for a given component, the contributions of all variables sum to 1. The larger the contribution of a variable to a component, the more this variable contributes to this component. Variables with high contributions and whose loadings have different signs can then be contrasted to help interpret the component.

Contribution of a Table to a Dimension

Specific to multiblock analysis is the notion of a table contribution. As a table comprises several variables, the contribution of a table can simply be defined as the sum of the contributions of its variables (a simple consequence of the Pythagorean theorem, which states that squared lengths are additive). So the contribution of table k to component ℓ, denoted ctr_{k,ℓ}, is defined as

ctr_{k,ℓ} = Σ_j^{J[k]} ctr_{j,ℓ},   (28)

where the sum is taken over the J[k] variables of table k.
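The three contribution formulas can be computed together, as in the following sketch. The assumed inputs (mass vector m, factor scores F, variable weights a, loadings Q, per-dimension eigenvalues, and an integer array mapping each variable to its table) are illustrative names, not from the article.

    import numpy as np

    def contributions(m, F, a, Q, eigvals, table_of_var):
        """ctr for observations (Eq. 25), variables (Eq. 27), tables (Eq. 28)."""
        ctr_obs = (m[:, None] * F ** 2) / eigvals[None, :]   # I x L, columns sum to 1
        ctr_var = a[:, None] * Q ** 2                        # J x L, columns sum to 1
        n_tables = int(table_of_var.max()) + 1
        ctr_tab = np.array([ctr_var[table_of_var == k].sum(axis=0)
                            for k in range(n_tables)])       # K x L, columns sum to 1
        return ctr_obs, ctr_var, ctr_tab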


Table contributions take values between 0 and 1, and for a given component, the contributions of all tables sum to 1. The larger the contribution of a table to a component, the more this table contributes to this component. Because the contributions of the tables for a given dimension sum to one, an alternative approach re-scales the contributions so that the sum of the contributions for a dimension is instead equal to the eigenvalue of this dimension. These re-scaled contributions are called partial inertias and are denoted I_partial. The partial inertias are obtained from the contributions by multiplying the contributions for a dimension by the dimension's eigenvalue.

Table contributions and partial inertias can be used to create plots that show the importance of these tables for the components. These plots can be drawn one component at a time or two (or rarely three) components at a time in a manner analogous to factor maps.

How to Analyze the Between-Table Structure

To evaluate the similarity between two tables one can compute coefficients of similarity between data tables. A traditional coefficient is Escoufier's RV coefficient (see Refs 92, 93; see also Refs 94 and 95 for alternatives), which can be interpreted as a non-centered squared coefficient of correlation between two matrices. The RV coefficient varies between 0 and 1 and reflects the amount of variance shared by two matrices. Specifically, the RV coefficient between data tables k and k′ is computed as

RV_{k,k′} = trace(X[k] X[k]^T X[k′] X[k′]^T) / sqrt( trace(X[k] X[k]^T X[k] X[k]^T) × trace(X[k′] X[k′]^T X[k′] X[k′]^T) ).   (29)

A slightly different coefficient, called the Lg coefficient, is also often used in the context of MFA. This coefficient reflects the MFA normalization and takes positive values. Specifically, the Lg coefficient between data tables k and k′ is computed as

Lg(k, k′) = trace(X[k] X[k]^T X[k′] X[k′]^T) / (γ_{1,k}^2 × γ_{1,k′}^2) = trace(X[k] X[k]^T X[k′] X[k′]^T) × (α_k × α_{k′}).   (30)
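Both coefficients are direct to compute from the preprocessed tables; a minimal sketch (function names are illustrative):

    import numpy as np

    def rv_coefficient(Xk, Xl):
        """Escoufier's RV between two tables sharing their rows (Eq. (29))."""
        Sk, Sl = Xk @ Xk.T, Xl @ Xl.T
        return np.trace(Sk @ Sl) / np.sqrt(np.trace(Sk @ Sk) * np.trace(Sl @ Sl))

    def lg_coefficient(Xk, Xl):
        """Lg coefficient, which reflects the MFA normalization (Eq. (30))."""
        g1k = np.linalg.svd(Xk, compute_uv=False)[0]
        g1l = np.linalg.svd(Xl, compute_uv=False)[0]
        return np.trace((Xk @ Xk.T) @ (Xl @ Xl.T)) / (g1k ** 2 * g1l ** 2)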

An eigen-decomposition of the K by K between-table matrix of RV or Lg coefficients can provide factor scores for the tables that can be used to plot maps of the tables in an analogous way to the STATIS method (see Ref 3).

ALTERNATIVE PRESENTATION OF MFA

MFA with Cross-Product Matrices

An alternative computational approach to MFA (particularly useful for generalizing to other data types and for comparison to other methods) uses the cross-product matrices S[k]. In this context, the first step is to compute an average cross-product matrix (called a compromise by analogy with the STATIS method, see Ref 3), which is computed as the weighted average of the S[k] matrices with weights provided by the elements of α. Specifically, the compromise cross-product matrix is denoted S[+] and is computed as

S[+] = Σ_k^K α_k S[k].   (31)

Note that S[+] can also be directly computed from X as

S[+] = X A X^T.   (32)

The compromise matrix, being a weighted sum of cross-product matrices, is also a cross-product matrix (i.e., it is a positive semi-definite matrix) and therefore its eigendecomposition amounts to a PCA. The generalized eigendecomposition of the compromise under the constraints provided by matrix M gives:

S[+] = P Λ P^T   with   P^T M P = I.   (33)

Eqs (32) and (13) together indicate that the generalized eigenvectors of the compromise are the left generalized singular vectors of X (cf. Eq. (13)) and that the eigenvalues of S[+] are the squares of the singular values of X (i.e., Λ = Δ^2). The loadings can be computed by rewriting Eq. (13) as

Q = X^T M P Δ^{-1}.   (34)

Similarly, the compromise factor scores can be computed from S[+] (cf. Eq. (20)) as

F = S[+] M P Δ^{-1}.   (35)


In this context, the loadings for the variables from table k are obtained from Eq. (34) as

Q[k] = X[k]^T M P Δ^{-1}.   (36)

The factor scores for table k are obtained from Eqs (36) and (22) as

F[k] = K α_k X[k] Q[k] = K α_k X[k] X[k]^T M P Δ^{-1} = K α_k S[k] M P Δ^{-1}.   (37)
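A sketch of this cross-product route (Eqs (31)–(35)), assuming equal masses in the vector m and per-table weights in alphas; with a diagonal M, the generalized eigendecomposition reduces to an ordinary symmetric eigendecomposition of a rescaled matrix. All names are illustrative.

    import numpy as np

    def mfa_from_crossproducts(tables, alphas, m):
        """Compromise S_plus, its eigenvalues, and the factor scores F = P Delta."""
        S_plus = sum(a_k * X_k @ X_k.T for X_k, a_k in zip(tables, alphas))  # Eq. (31)
        M_half = np.diag(np.sqrt(m))
        lam, U = np.linalg.eigh(M_half @ S_plus @ M_half)
        lam, U = lam[::-1], U[:, ::-1]                  # sort eigenvalues descending
        P = np.diag(1.0 / np.sqrt(m)) @ U               # P^T M P = I  (Eq. (33))
        F = P * np.sqrt(np.clip(lam, 0.0, None))        # F = P Delta
        return S_plus, lam, F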

MFA as Simple PCA

MFA can also be computed as the simple PCA of the set of the X[k] matrices, each weighted by the square root of its respective α weight (this assumes, as is the case in general for MFA, that matrix M is equal to (1/I) I). Specifically, if we define the matrix

X̃ = [√α_1 X[1] | ... | √α_k X[k] | ... | √α_K X[K]],   (38)

whose (simple) SVD is given by

X̃ = P̃ Δ̃ Q̃^T   with   P̃^T P̃ = Q̃^T Q̃ = I,   (39)

then the factor scores for the observations can be obtained as

F = P̃ Δ̃.   (40)

The loadings for the k-th table are obtained as

Q[k] = (1/√α_k) Q̃[k].   (41)

To prove this last identity, we use the relations between the simple SVD and the GSVD (see Refs 74, 76, 78, 79 for more details). To do so, we first need to re-express X̃ as a function of X and A as:

X̃ = [√α_1 X[1] | ... | √α_k X[k] | ... | √α_K X[K]] = X A^{1/2}.   (42)

To obtain this result we use the facts (1) that A is defined as blocks of K values α_k and (2) that A, being a diagonal matrix with positive elements, is positive definite (and therefore its square root is uniquely determined). Rewriting the GSVD of X in terms of X̃ shows that

X̃ = X A^{1/2} = P Δ Q^T A^{1/2},   (43)

and therefore that

Q̃ = A^{1/2} Q,   (44)

which is equivalent to Eq. (41) (and completes the proof).

MFA from the Table Factor Scores

The whole set of factor scores derived from the PCA of one of the tables (i.e., from Eq. (8)) spans the same space as the column space of this table. Therefore, the MFA can be computed directly from these factor scores, and so the MFA factor scores (and loadings) can be obtained from the SVD of the table factor scores (as defined in Eq. (8)) instead of from the SVD of X. This approach can be particularly useful when the number of variables in some tables is significantly larger than the number of observations (a configuration in which the number N of observations is much smaller than the number of variables).


SUPPLEMENTARY ELEMENTS (A.K.A. OUT OF SAMPLE)

As in standard PCA, we can use the results of the analysis to compute approximate statistics (e.g., factor scores, loadings, or optimum weights) for new elements (i.e., elements that have not been used in the analysis). These new elements are called supplementary (Refs 75, 78, 89–91, 102–105), illustrative, or out-of-sample elements (Ref 106). In contrast with the supplementary elements, the elements actually used in the analysis are called active elements. The statistics for the supplementary elements are obtained by projecting these elements onto the active space. In the MFA framework, we can have supplementary rows and columns (like in PCA) but also supplementary tables. Supplementary rows for which we have values for all J variables and supplementary variables for which we have measurements for all I observations are projected in the same way as for PCA (see Ref 75, pp. 436ff.). Computing statistics for supplementary tables, however, is specific to MFA.

Supplementary Rows and Columns

As MFA is a generalized PCA, we can add supplementary rows and columns as in standard PCA (see Ref 75 for details). Note, incidentally, that this procedure assumes that the supplementary rows and columns are scaled in a manner comparable to the rows and columns of the original matrix X. Specifically, from Eqs (22) and (13), we can compute the factor scores, denoted f_sup, for a supplementary observation (i.e., a supplementary row of dimensions 1 by J recording measurements on the same variables as the whole matrix X). If this supplementary row is represented by a 1 by J vector denoted r_sup^T (which has been preprocessed in the same way as X), the supplementary factor scores are computed as

f_sup = r_sup^T A Q.   (45)

Loadings are denoted q_sup for a new column, which is itself denoted by an I by 1 vector o_sup (note that o_sup needs to have been preprocessed in a way comparable with the tables). These loadings are obtained, in a way similar to Eq. (45), as

q_sup = o_sup^T M P Δ^{-1}.   (46)
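A minimal sketch of these two projections, assuming the preprocessed supplementary row r_sup (length J), supplementary column o_sup (length I), and the active-analysis quantities a, m, P, Q, delta are available; names are illustrative.

    import numpy as np

    def sup_row_scores(r_sup, a, Q):
        """f_sup = r_sup^T A Q (Eq. (45)), with A = diag(a)."""
        return (r_sup * a) @ Q

    def sup_col_loadings(o_sup, m, P, delta):
        """q_sup = o_sup^T M P Delta^{-1} (Eq. (46)), with M = diag(m)."""
        return (o_sup * m) @ P / delta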

Supplementary Partial Observations

In some cases, we have supplementary observations for only one (or some) table(s). In this case, called a supplementary partial observation, we can obtain the supplementary partial factor scores for this observation from Eq. (22). Specifically, let x_sup[k]^T denote a 1 by J[k] vector of measurements collected on the J[k] variables of table k (note that x_sup[k]^T should have been preprocessed in the same way as the whole matrix X[k]). The partial factor scores for this supplementary observation from table k are obtained as:

f_sup[k] = K α_k x_sup[k]^T Q[k].   (47)

Incidentally, the factor scores of a supplementary observation collected on all tables can also be obtained as the average of the supplementary partial factor scores (see Eqs (23) and (45)).

To compute the loadings for a supplementary variable for a specific table, it suffices to preprocess this variable like the variables of this table (and this includes dividing all the data of the table by the first singular value of this table) and then to use Eq. (46).

Supplementary Tables

As MFA involves tables, it is of interest to be able to project a whole table as a supplementary element. This table will include new variables measured on the same observations described by the active tables. Such a table is represented by the I by J_sup matrix Y_sup. The matrix Y_sup is preprocessed in the same manner (e.g., centered, normalized) as the Y[k] matrices to give the supplementary matrix X_sup, which is also normalized by dividing all its elements by its first singular value (i.e., this normalization ensures that the first singular value of X_sup is equal to 1). This matrix will provide supplementary factor scores and loadings for the compromise solution, as described below.

Factor Scores

To obtain the factor scores for a new table, the first step is to obtain (from Eq. (46)) the supplementary loadings, which are computed as

Q_sup = X_sup^T M P Δ^{-1}.   (48)

Then, using Eq. (22) (and taking into account that its first singular value is equal to one), we obtain the supplementary factor scores for the new table X_sup as

F_sup = K X_sup Q_sup = K X_sup X_sup^T M P Δ^{-1} = K S_sup M P Δ^{-1}.   (49)
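A sketch of Eqs (48) and (49); X_sup is assumed to have been preprocessed like the active tables and divided by its own first singular value, as required above (names are illustrative).

    import numpy as np

    def sup_table_projection(X_sup, m, P, delta, K):
        """Supplementary loadings (Eq. (48)) and factor scores (Eq. (49))."""
        Q_sup = X_sup.T @ np.diag(m) @ P / delta      # Q_sup = X_sup^T M P Delta^{-1}
        F_sup = K * X_sup @ Q_sup                     # F_sup = K X_sup Q_sup
        return Q_sup, F_sup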

    INFERENTIAL ASPECTS

MFA is a descriptive multivariate technique, but it is often important to be able to complement the descriptive conclusions of an analysis by assessing


if its results are reliable and replicable. For example, a standard question, in the PCA framework, is to find the number of reliable components, and most of the approaches used in PCA will also work for MFA. For example, we can use the informal scree test (a.k.a. elbow) and the more formal RESS, PRESS, and Q² statistics (see for details Ref 75, p. 440ff., and also Refs 107–109). We can also use techniques such as the jackknife (Ref 110) or the bootstrap (Refs 111, 155) to identify important observations, variables, or tables. These approaches are implemented differently if we consider the observations, variables, or even tables as being a random or a fixed factor (see Ref 112).

Bootstrap for the Factor Scores

If we assume that the tables constitute a random factor (i.e., the tables are independent and identically distributed, i.i.d., and sampled from a potentially infinite set of tables) and if we consider the observations as a fixed factor, we may want to estimate the stability of the compromise factor scores. Such an evaluation could be done, for example, by using a jackknife or a bootstrap approach.

We briefly sketch here a possible bootstrap approach for the factor scores (see Ref 113 for the problem of bootstrapping in the context of the SVD, Refs 114, 115 for a review of the bootstrap in the PCA context, and Refs 116–119 for recent applications and developments to MFA). The main idea is to use the properties of Eq. (23), which indicate that the compromise factor scores are the average of the partial factor scores. Therefore, we can obtain bootstrap confidence intervals (CIs) by repeatedly sampling with replacement from the set of tables and computing new compromise factor scores (this approach corresponds to the partial bootstrap of Ref 114; see Ref 59 for an alternative approach using split-half resampling). From these estimates we can also compute bootstrap ratios for each dimension by dividing the mean of the bootstrap estimates by their standard deviation. These bootstrap ratios are akin to t statistics and can be used to detect observations that reliably contribute to a given dimension. So, for example, for a given dimension and a given observation, a value of the bootstrap ratio larger than 2 will be considered reliable (by analogy with a t larger than 2, which would be significant at p < .05). When evaluating bootstrap ratios, the multiple comparisons problem can be taken into account by using, for example, a Bonferroni-type correction (see Ref 120), and instead of using critical values corresponding to, say, p < .05 we would use values corresponding to p < .05/I.

More formally, in order to compute a bootstrap estimate, we need to generate a bootstrap sample. To do so, we first take a sample of integers with replacement from the set of integers from 1 to K. Recall that, when sampling with replacement, any element from the set can be sampled zero, one, or more than one time. We call this set B (for bootstrap). For example, with five elements, a possible bootstrap set could be B = {1, 5, 1, 3, 3}. We then generate a new data set (i.e., a new X matrix comprising K tables) using the matrices X[k] with these indices. So, with K = 5, this would give the following bootstrap set:

{X[1], X[5], X[1], X[3], X[3]}.   (50)

From this set we would build a data matrix denoted X*_1 that would then be analyzed by MFA. This analysis would provide a set of bootstrapped factor scores (denoted F*_1) obtained by projecting the bootstrapped data table as a supplementary element (see Eqs (37) and (49)). Interestingly, as a consequence of the barycentric properties of the factor scores (see Eq. (23)), this last step can also be directly obtained by computing F*_1 as the weighted average of the corresponding partial factor scores. We then repeat the procedure a large number of times (e.g., L = 1000) and generate L bootstrapped matrices of factor scores F*_ℓ. From these bootstrapped matrices of factor scores, we can derive CIs and estimate the mean factor scores as the mean of the bootstrapped factor scores. Formally, F̄* denotes the bootstrap estimated factor scores and is computed as

F̄* = (1/L) Σ_ℓ^L F*_ℓ.   (51)

In a similar way, the bootstrapped estimate of the variance is obtained as the variance of the F*_ℓ matrices. Formally, σ̂²_F* denotes the bootstrapped estimate of the variance and is computed (element-wise) as

σ̂²_F* = (1/L) Σ_ℓ^L (F*_ℓ − F̄*) ∘ (F*_ℓ − F̄*).   (52)

Bootstrap ratios (denoted T*) are computed by dividing the bootstrapped means by the corresponding bootstrapped standard deviations (denoted σ̂_F*, the square root of the bootstrapped estimate of the variance). These bootstrap ratios are often interpreted as Student's t statistics.
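The whole procedure reduces to resampling and averaging the partial factor scores, as in this sketch of the partial bootstrap described above (names and the number of replicates are illustrative).

    import numpy as np

    def bootstrap_ratios(F_partials, n_boot=1000, seed=0):
        """Bootstrap ratios from the K x I x L array of partial factor scores."""
        rng = np.random.default_rng(seed)
        F_partials = np.asarray(F_partials)
        K = F_partials.shape[0]
        boots = np.array([F_partials[rng.integers(0, K, size=K)].mean(axis=0)
                          for _ in range(n_boot)])    # resample tables, then average
        ratios = boots.mean(axis=0) / boots.std(axis=0)
        return ratios, boots

    # 95% percentile confidence intervals, per observation and dimension:
    # lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)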

Bootstrapped Confidence Intervals

The bootstrapped factor scores (i.e., the F*_ℓ's) can also be used to compute CIs for the observations. For a given dimension, the bootstrapped factor scores of an observation can be ordered from the smallest to the largest, and a CI for a given p value can be obtained by trimming the upper and lower ½p proportion of the distribution. In general, for a given dimension, the bootstrap ratios and the CIs will agree in detecting the relevant observations. In addition, CIs can also be plotted directly on the factor scores map as confidence ellipsoids or confidence convex hulls which comprise a 1 − p proportion of the bootstrapped factor scores (see Ref 59 and our example illustration in Figure 8). When the ellipsoids or convex hulls of two observations do not overlap, these two observations can be considered as reliably different. As for the bootstrap ratios, in order to correct for the potential problem of multiple comparisons, a Bonferroni-type correction can also be implemented when plotting hulls or ellipsoids (see Ref 59 for details). Recent work (Ref 118) suggests that these CIs could be too liberal (i.e., too small) when the number of tables is large and that a better procedure would be to use instead a total bootstrap (i.e., recomputing the factor scores from the whole bootstrapped tables rather than from the partial factor scores, cf. Ref 114).

    AN EXAMPLE

To illustrate MFA, we selected a (fictitious) example previously used (see Ref 3) to illustrate the STATIS method, which is a possible alternative to MFA. In this example, a set of wines was described by a group of expert tasters called assessors. This type of data could be analyzed using a standard PCA, but this approach obviously neglects the inter-assessor differences. MFA has the advantage of providing a common space for the products (i.e., the factor scores), as well as information about how each assessor relates to this common space (i.e., the partial factor scores).

Specifically, this example concerns 12 wines made from Sauvignon Blanc grapes coming from three wine regions (four wines from each region): New Zealand, France, and Canada. Ten expert assessors were asked to evaluate these wines. The assessors were asked (1) to evaluate the wines on 9-point rating scales, using four variables considered as standard for the evaluation of these wines (cat-pee, passion-fruit, green pepper, and mineral) and (2), if they felt the need, to add some variables of their own (some assessors chose none, some chose one or two more variables). The raw data are presented in Table 1. The goals of the analysis were twofold:

1. to obtain a typology of the wines, and
2. to discover the agreement (if any) between the assessors.

For the example, the data consist of K = 10 tables (one for each assessor) shown in Table 1. For example, the first table, denoted Y[1], is equal to

Y[1] =
    8 6 7 4 1 6
    7 5 8 1 2 8
    6 5 6 5 3 4
    9 6 8 4 3 5
    2 2 2 8 7 3
    3 4 4 9 6 1
    5 3 5 4 8 3
    5 2 4 8 7 4
    8 6 8 4 4 7
    4 6 2 5 3 4
    8 4 8 1 3 3
    5 3 6 4 4 2 .   (53)

Each table was then preprocessed by first centering and normalizing each column such that its mean is equal to 0 and the sum of the square values of all its elements is equal to 1. For example, X[1], the preprocessed matrix for Assessor 1, is equal to:

X[1] =
     0.30  0.32  0.18 -0.09 -0.44  0.27
     0.16  0.13  0.31 -0.45 -0.31  0.57
     0.02  0.13  0.04  0.03 -0.17 -0.02
     0.43  0.32  0.31 -0.09 -0.17  0.12
    -0.52 -0.45 -0.49  0.39  0.37 -0.17
    -0.39 -0.06 -0.22  0.51  0.24 -0.47
    -0.11 -0.26 -0.09 -0.09  0.51 -0.17
    -0.11 -0.45 -0.22  0.39  0.37 -0.02
     0.30  0.32  0.31 -0.09 -0.03  0.42
    -0.25  0.32 -0.49  0.03 -0.17 -0.02
     0.30 -0.06  0.31 -0.45 -0.17 -0.17
    -0.11 -0.26  0.04 -0.09 -0.03 -0.32 .   (54)

PCA of the Data Tables

A PCA of each of the data tables is then performed. For example, the SVD of the first data table gives

X[1] = U[1] Δ[1] V[1]^T   with   U[1]^T U[1] = V[1]^T V[1] = I,   (55)

with

U[1] =
    0.32 0.26 0.03 0.09 0.32 0.22
    0.38 0.06 0.35 0.49 0.17 0.41
    0.06 0.14 0.15 0.07 0.14 0.10
    0.30 0.00 0.06 0.45 0.13 0.23
    0.49 0.06 0.24 0.23 0.19 0.01
    0.38 0.20 0.28 0.41 0.00 0.52
    0.21 0.40 0.12 0.08 0.65 0.01
    0.31 0.16 0.42 0.19 0.32 0.46
    0.29 0.05 0.39 0.29 0.26 0.25
    0.07 0.63 0.18 0.39 0.37 0.30
    0.21 0.45 0.44 0.14 0.01 0.23
    0.10 0.27 0.39 0.17 0.27 0.16 ,   (56)


TABLE 1 | Raw Data (Tables Y[1] Through Y[10])

    Assessor 1 Assessor 2 Assessor 3 Assessor 4 Assessor 5

    V1 V2 V3 V4 V5 V6 V1 V2 V3 V4 V7 V8 V1 V2 V3 V4 V9 V10 V1 V2 V3 V4 V8 V1 V2 V3 V4 V11 V12

    NZ1 8 6 7 4 1 6 8 6 8 3 7 5 8 6 8 3 7 2 9 5 8 2 6 9 6 9 3 8 2

    NZ2 7 5 8 1 2 8 6 5 6 3 7 7 8 7 7 2 8 2 8 7 7 3 5 7 7 7 1 9 2

    NZ3 6 5 6 5 3 4 6 6 6 5 8 7 8 7 7 6 9 1 8 8 9 2 7 7 7 7 1 7 2

    NZ4 9 6 8 4 3 5 8 6 8 4 6 6 8 2 8 3 9 3 8 8 9 4 7 8 9 7 5 6 1

    FR1 2 2 2 8 7 3 2 3 1 7 4 3 3 4 3 6 4 6 4 2 2 4 3 4 4 4 2 4 4

    FR2 3 4 4 9 6 1 4 3 4 9 3 5 4 3 4 8 3 9 3 2 2 6 2 4 5 5 6 1 5

    FR3 5 3 5 4 8 3 3 3 2 7 4 4 5 4 5 2 3 6 4 4 4 6 4 6 5 7 2 3 1

    FR4 5 2 4 8 7 4 4 3 5 5 3 3 6 3 7 7 1 7 5 2 2 9 4 6 6 5 8 4 5

    CA1 8 6 8 4 4 7 8 6 9 5 5 6 8 5 9 1 5 2 7 5 6 3 2 8 6 8 2 5 4

    CA2 4 6 2 5 3 4 5 5 5 6 5 8 5 5 4 6 5 1 5 6 6 4 4 6 6 6 4 6 3

    CA3 8 4 8 1 3 3 8 4 8 3 7 7 8 3 7 3 5 4 7 3 6 1 6 7 4 8 4 5 1

    CA4 5 3 6 4 4 2 5 3 7 4 8 5 5 4 4 5 4 3 5 2 2 6 6 5 5 5 5 6 1

    Assessor 6 Assessor 7 Assessor 8 Assessor 9 Assessor 10

    V1 V2 V3 V4 V13 V1 V2 V3 V4 V1 V2 V3 V4 V14 V5 V1 V2 V3 V4 V15 V1 V2 V3 V4

    NZ1 8 5 6 2 9 8 5 8 4 7 6 7 4 9 2 8 6 9 1 7 8 6 7 5

    NZ2 6 6 6 2 4 7 6 8 4 6 5 6 2 7 2 8 7 9 1 6 7 5 7 3

    NZ3 7 7 7 2 7 6 7 6 3 6 6 6 4 9 2 7 7 8 4 7 7 6 6 2

    NZ4 8 7 8 2 8 7 8 6 1 8 7 8 2 8 2 8 9 9 3 9 8 7 7 4

    FR1 3 2 2 7 2 4 2 3 6 3 3 4 4 4 4 3 4 4 5 4 2 3 1 7

    FR2 3 3 3 3 4 4 4 4 4 4 4 4 7 3 6 5 5 5 7 2 3 3 3 9

    FR3 4 2 3 3 3 4 3 4 4 5 3 5 3 3 5 5 5 5 6 3 4 2 5 8

    FR4 5 3 5 9 3 5 3 5 7 6 4 6 3 2 4 5 5 6 5 3 3 4 2 8

    CA1 7 7 7 1 4 8 4 9 4 8 6 5 4 5 4 8 7 8 4 7 8 6 7 4

    CA2 4 6 2 4 6 4 7 5 2 5 7 5 4 6 1 5 6 4 5 6 5 6 4 4

    CA3 7 4 8 2 3 8 5 7 3 7 4 8 2 6 2 8 4 7 4 5 7 4 8 5

    CA4 4 5 3 3 7 4 3 5 2 5 4 6 2 4 3 5 4 5 3 4 5 4 6 6

V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Grass; V15, Peach.

Δ[1] = diag{2.037, 0.886, 0.701, 0.589, 0.402, 0.255},   (57)

and

V[1] =
    0.45 0.25 0.02 0.44 0.03 0.73
    0.38 0.62 0.17 0.26 0.58 0.20
    0.42 0.47 0.04 0.39 0.19 0.65
    0.40 0.39 0.20 0.69 0.41 0.02
    0.41 0.38 0.41 0.25 0.67 0.06
    0.37 0.19 0.87 0.22 0.12 0.05 .   (58)

From this PCA we obtain the following factor scores for the first table:

G[1] = U[1] Δ[1] =
    0.65 0.23 0.02 0.05 0.13 0.06
    0.77 0.05 0.25 0.29 0.07 0.11
    0.13 0.13 0.11 0.04 0.06 0.03
    0.60 0.00 0.04 0.26 0.05 0.06
    0.99 0.05 0.17 0.14 0.08 0.00
    0.77 0.18 0.19 0.24 0.00 0.13
    0.43 0.35 0.09 0.05 0.26 0.00
    0.64 0.15 0.29 0.11 0.13 0.12
    0.59 0.04 0.27 0.17 0.10 0.06
    0.15 0.56 0.12 0.23 0.15 0.08
    0.43 0.40 0.31 0.08 0.00 0.06
    0.20 0.24 0.27 0.10 0.11 0.04 .   (59)


From Eq. (57), we have found that the first singular value of X[1] has a value of 2.037, and therefore the weight for the first table is obtained as (cf. Eq. (10)):

α_1 = 1 / γ_{1,1}^2 = γ_{1,1}^{-2} = 1 / 2.037^2 = 0.241.   (60)
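This weight can be recomputed from the raw table of Eq. (53). The sketch below applies the preprocessing described above (center each column, then scale it to unit sum of squares) and should recover γ_{1,1} ≈ 2.037 and hence α_1 ≈ 0.241, the values reported in the text.

    import numpy as np

    Y1 = np.array([[8, 6, 7, 4, 1, 6], [7, 5, 8, 1, 2, 8], [6, 5, 6, 5, 3, 4],
                   [9, 6, 8, 4, 3, 5], [2, 2, 2, 8, 7, 3], [3, 4, 4, 9, 6, 1],
                   [5, 3, 5, 4, 8, 3], [5, 2, 4, 8, 7, 4], [8, 6, 8, 4, 4, 7],
                   [4, 6, 2, 5, 3, 4], [8, 4, 8, 1, 3, 3], [5, 3, 6, 4, 4, 2]],
                  dtype=float)                         # Y_[1], Eq. (53)

    X1 = Y1 - Y1.mean(axis=0)                          # center each column
    X1 /= np.sqrt((X1 ** 2).sum(axis=0))               # unit sum of squares per column
    gamma_11 = np.linalg.svd(X1, compute_uv=False)[0]  # text reports 2.037
    alpha_1 = 1.0 / gamma_11 ** 2                      # text reports 0.241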

Creating the Weight Vector

We collect the values of the weights of the K tables into a K by 1 weight vector denoted by α:

α = [0.241, 0.239, 0.275, 0.273, 0.307, 0.302, 0.417, 0.272, 0.264, 0.309]^T.   (61)

There are K = 10 values in α (see Eqs (12) and (61)). These values are stored in the J = 53 by 1 vector a, which can itself be used to fill in the diagonal elements of the 53 × 53 diagonal matrix A:

a = [0.241 × 1[1]^T, 0.239 × 1[2]^T, 0.275 × 1[3]^T, 0.273 × 1[4]^T, 0.307 × 1[5]^T, 0.302 × 1[6]^T, 0.417 × 1[7]^T, 0.272 × 1[8]^T, 0.264 × 1[9]^T, 0.309 × 1[10]^T]^T

and

A = diag{a},   (62)

where 1[k] is a J[k] by 1 vector of ones.

Generalized PCA of X

The 12 × 12 diagonal mass matrix M for the observations (with equal masses set to m_i = 1/I = 0.08) is given as

M = diag{m} = diag{(1/12) × 1} = diag{0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08}.   (63)

The GSVD of X with matrices M (Eq. (63)) and A (Eq. (62)) gives X = P Δ Q^T. The eigenvalues (denoted λ) are equal to the squares of the singular values and are often used to informally decide upon the number of components to keep for further inspection. The eigenvalues and the percentage of inertia that they explain are given in Table 2.

The pattern of the eigenvalue distribution suggests keeping two or three dimensions for further examination, and we decided (somewhat arbitrarily) to keep only the first two dimensions for this example.

The matrix Q of the right singular vectors (loadings for the variables) is given in Table 3; the matrix P of the left singular vectors and the matrix Δ of the singular values are given below:

P =
    1.117 0.466
    0.922 0.093
    0.867 1.295
    1.270 0.473
    1.564 0.366
    1.440 0.308
    0.921 0.584
    1.054 1.163
    0.762 1.051
    0.083 2.158
    0.542 1.463
    0.418 0.217

and

Δ = diag{0.878, 0.351}.   (64)

Factor Scores

The factor scores for X show the best two-dimensional representation of the compromise of the K tables. Using Eqs (18), (20), and (64), we obtain:


TABLE 2 | Eigenvalues and Percentage of Explained Inertia of the MFA of X

Component:             1     2     3     4     5     6     7     8     9    10    11
Singular value (γ):  0.878 0.351 0.301 0.276 0.244 0.198 0.176 0.158 0.137 0.116 0.106
Eigenvalue (λ = γ²): 0.770 0.123 0.091 0.076 0.060 0.039 0.031 0.025 0.019 0.013 0.011
  cumulative:        0.770 0.893 0.984 1.060 1.120 1.159 1.190 1.215 1.233 1.247 1.258
% Inertia (τ):          61    10     7     6     5     3     2     2     1     1     1
  cumulative:           61    71    78    84    89    92    94    96    97    98   100

F = P Δ = X A Q =
    0.980 0.163
    0.809 0.033
    0.761 0.454
    1.115 0.166
    1.373 0.128
    1.264 0.108
    0.808 0.205
    0.925 0.408
    0.669 0.369
    0.073 0.757
    0.476 0.513
    0.367 0.076 .   (65)

In the F matrix, each row represents a wine and each column a component. Figure 2a shows the wines in the space created by the first two components. The first component (with an eigenvalue equal to λ_1 = 0.878² = 0.770) explains 61% of the inertia and contrasts the French and New Zealand wines. The second component (with an eigenvalue of λ_2 = 0.351² = 0.123) explains 10% of the inertia and is more delicate to interpret from the wines alone (its interpretation will become clearer after looking at the loadings).

Partial Factor Scores

The partial factor scores (which are the projections of the tables onto the compromise) are computed from Eq. (22). For example, for the first assessor, the partial factor scores of the 12 wines are obtained as:

F[1] = K α_1 X[1] Q[1] = 10 × 0.241 × X[1] Q[1] =
    1.037 0.155
    1.179 0.596
    0.213 0.104
    0.946 0.446
    1.546 0.676
    1.176 0.747
    0.698 0.166
    1.006 0.063
    0.922 0.486
    0.189 0.936
    0.643 0.640
    0.323 0.036   (66)

and are displayed in Figure 3 (i.e., first top left panel; see also Figure 2). From Figure 3, we can see that, for all the assessors, Component 1 separates the New Zealand from the French Sauvignon Blancs, a configuration replicating the pattern seen in the compromise (Figure 2a). However, the assessors show large inter-individual differences in how they rated the Canadian wines.

The original variables are analyzed, as in standard PCA, by computing loadings, which are given in Table 3 for the first two dimensions. The loadings are also plotted in a biplot fashion in Figure 3. Here we show the partial factor scores (of the wines) along with the loadings for each assessor (which we have re-scaled so that their variance, for each dimension, is equal to the singular value of the compromise). From these plots, we can see that for all the assessors, the New Zealand wines are rated as having a more cat-pee aroma, with some green pepper and passion fruit, while the French wines are rated as being more mineral, smoky, or hay-like.

    Determining the Importance of the Tablesin the CompromiseThere are two ways to determine which tables playthe largest role in the compromise: contributions andpartial inertias. The contribution of a table reflectsthe proportion of the variance of a dimension thatcan be attributed to this table (see Eq. (28)). Thelarger the contribution of a table to a component, themore important this table is for this component. Thecontributions for the tables are as follows:

ctr_{k,ℓ} =
    0.101  0.095
    0.100  0.068
    0.101  0.152
    0.096  0.049
    0.098  0.063
    0.101  0.104
    0.102  0.224
    0.096  0.134
    0.100  0.053
    0.105  0.057
    . (67)
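In the sketch introduced above, the contribution of a table to a dimension is simply the sum, over that table's columns, of the squared entries of the corresponding right singular vectors (the α-weights are already folded into the normalized grand table); the columns of the resulting matrix sum to one, as in Eq. (67):

    def table_contributions(normalized, Vt, n_components=2):
        # Sum the squared (weighted) loadings over each table's columns.
        sizes = [X.shape[1] for X in normalized]
        starts = np.cumsum([0] + sizes)
        V2 = (Vt.T ** 2)[:, :n_components]
        return np.array([V2[starts[k]:starts[k] + sizes[k]].sum(axis=0)
                         for k in range(len(normalized))])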


TABLE 3 | α-Weights, Loadings, Squared Loadings, and Contributions for the MFA of X
(Q = loadings; Q² = squared loadings; ctr = contributions × 1000)

Assessor 1
                 V1      V2      V3      V4      V5      V6
    α-Weights    0.241   0.241   0.241   0.241   0.241   0.241
    Q, Dim 1     0.294   0.267   0.260   0.241   0.286   0.233
    Q, Dim 2     0.318   0.248   0.396   0.184   0.161   0.129
    Q², Dim 1    0.087   0.071   0.068   0.058   0.082   0.054
    Q², Dim 2    0.101   0.062   0.157   0.034   0.026   0.017
    ctr, Dim 1   21      17      16      14      20      13
    ctr, Dim 2   24      15      38      8       6       4

Assessor 2
                 V1      V2      V3      V4      V7      V8
    α-Weights    0.239   0.239   0.239   0.239   0.239   0.239
    Q, Dim 1     0.297   0.296   0.267   0.256   0.238   0.222
    Q, Dim 2     0.183   0.178   0.200   0.240   0.113   0.333
    Q², Dim 1    0.088   0.088   0.071   0.066   0.057   0.049
    Q², Dim 2    0.034   0.032   0.040   0.058   0.013   0.111
    ctr, Dim 1   21      21      17      16      14      12
    ctr, Dim 2   8       8       10      14      3       27

Assessor 3
                 V1      V2      V3      V4      V9      V10
    α-Weights    0.275   0.275   0.275   0.275   0.275   0.275
    Q, Dim 1     0.305   0.136   0.258   0.203   0.277   0.267
    Q, Dim 2     0.234   0.228   0.379   0.365   0.297   0.283
    Q², Dim 1    0.093   0.018   0.067   0.041   0.077   0.071
    Q², Dim 2    0.055   0.052   0.144   0.133   0.088   0.080
    ctr, Dim 1   26      5       18      11      21      20
    ctr, Dim 2   15      14      39      37      24      22

Assessor 4
                 V1      V2      V3      V4      V8
    α-Weights    0.273   0.273   0.273   0.273   0.273
    Q, Dim 1     0.313   0.261   0.303   0.230   0.205
    Q, Dim 2     0.082   0.353   0.169   0.066   0.117
    Q², Dim 1    0.098   0.068   0.092   0.053   0.042
    Q², Dim 2    0.007   0.125   0.029   0.004   0.014
    ctr, Dim 1   27      19      25      14      12
    ctr, Dim 2   2       34      8       1       4

Assessor 5
                 V1      V2      V3      V4      V11     V12
    α-Weights    0.307   0.307   0.307   0.307   0.307   0.307
    Q, Dim 1     0.296   0.213   0.268   0.124   0.259   0.177
    Q, Dim 2     0.201   0.249   0.258   0.132   0.144   0.019
    Q², Dim 1    0.088   0.045   0.072   0.015   0.067   0.031
    Q², Dim 2    0.040   0.062   0.067   0.017   0.021   0.000
    ctr, Dim 1   27      14      22      5       20      10
    ctr, Dim 2   12      19      20      5       6       0

Assessor 6
                 V1      V2      V3      V4      V13
    α-Weights    0.302   0.302   0.302   0.302   0.302
    Q, Dim 1     0.302   0.277   0.265   0.231   0.205
    Q, Dim 2     0.215   0.274   0.328   0.031   0.340
    Q², Dim 1    0.091   0.077   0.070   0.053   0.042
    Q², Dim 2    0.046   0.075   0.107   0.001   0.116
    ctr, Dim 1   28      23      21      16      13
    ctr, Dim 2   14      23      32      0       35

Assessor 7
                 V1      V2      V3      V4
    α-Weights    0.417   0.417   0.417   0.417
    Q, Dim 1     0.275   0.246   0.277   0.180
    Q, Dim 2     0.380   0.410   0.290   0.376
    Q², Dim 1    0.076   0.061   0.076   0.032
    Q², Dim 2    0.144   0.168   0.084   0.141
    ctr, Dim 1   31      25      32      14
    ctr, Dim 2   60      70      35      59

Assessor 8
                 V1      V2      V3      V4      V14     V5
    α-Weights    0.272   0.272   0.272   0.272   0.272   0.272
    Q, Dim 1     0.276   0.247   0.235   0.138   0.286   0.239
    Q, Dim 2     0.309   0.376   0.231   0.219   0.261   0.293
    Q², Dim 1    0.076   0.061   0.055   0.019   0.082   0.057
    Q², Dim 2    0.096   0.142   0.054   0.048   0.068   0.086
    ctr, Dim 1   21      17      15      5       22      16
    ctr, Dim 2   26      39      15      13      19      23

Assessor 9
                 V1      V2      V3      V4      V15
    α-Weights    0.264   0.264   0.264   0.264   0.264
    Q, Dim 1     0.303   0.235   0.287   0.251   0.296
    Q, Dim 2     0.241   0.221   0.226   0.083   0.188
    Q², Dim 1    0.092   0.055   0.082   0.063   0.088
    Q², Dim 2    0.058   0.049   0.051   0.007   0.035
    ctr, Dim 1   24      14      22      17      23
    ctr, Dim 2   15      13      13      2       9

Assessor 10
                 V1      V2      V3      V4
    α-Weights    0.309   0.309   0.309   0.309
    Q, Dim 1     0.323   0.274   0.286   0.282
    Q, Dim 2     0.080   0.262   0.187   0.272
    Q², Dim 1    0.104   0.075   0.082   0.080
    Q², Dim 2    0.006   0.069   0.035   0.074
    ctr, Dim 1   32      23      25      25
    ctr, Dim 2   2       21      11      23

V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Grass; V15, Peach.

The contributions can also be plotted to obtain a visual representation of the importance of the studies. Figure 4 shows the relative contributions of each of the tables to Components 1 and 2. From this figure, we can see that Assessor 10 contributes the most to the first component of the compromise, while Assessor 7 contributes most to the second component. Assessor 5, by contrast, contributes the least to both Components 1 and 2.

In addition to using contributions, we can determine a table's importance with the partial inertia, which gives the proportion of the compromise variance (i.e., inertia) explained by the table. This is obtained by multiplying, for each dimension, the table contribution by the eigenvalue of the dimension. For our example, the partial inertias, denoted I_partial, are as follows:

I_partial =
    0.0779  0.0117
    0.0771  0.0084
    0.0778  0.0186
    0.0743  0.0060
    0.0751  0.0078
    0.0776  0.0128
    0.0787  0.0275
    0.0736  0.0165
    0.0771  0.0065
    0.0810  0.0070
    . (68)

Like the contributions, the partial inertias can be plotted to get a visual representation. From Figure 5, we see that Assessor 10 accounts for the most inertia on the first dimension, while Assessor 5 accounts for the lowest proportion of the total variance.
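Continuing the same sketch, the partial inertias can be obtained by rescaling the table contributions by the eigenvalues (here we assume, as before, row masses of 1/I):

    def partial_inertias(normalized, Vt, s, n_rows, n_components=2):
        # Partial inertia = table contribution x dimension eigenvalue.
        eigenvalues = (s[:n_components] ** 2) / n_rows
        return table_contributions(normalized, Vt, n_components) * eigenvalues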

Supplementary Table
As we would like to know what qualities of the wines are associated with the assessors' ratings, we wanted to include some chemical components of the wines as variables, namely, titratable acidity, pH, alcohol, and residual sugar. The values for these variables are shown in Table 4. However, because these properties are qualitatively different from the assessors' ratings (i.e., they did not come from the same population), we did not want to include them as active elements in the analysis and therefore projected them as a supplementary table.

To obtain the factor scores, the first step is to center and normalize each variable of the supplementary data table. Then, after computing its first singular value (equal to 1.3867), each element of the centered and normalized table is divided by this first singular value.


F I G U R E 2 | Compromise of the 10 tables. (a) Factor scores (wines). (b) Assessors' partial factor scores projected into the compromise as supplementary elements. Each assessor is represented by a dot, and for each wine a line connects the wine factor scores to the partial factor scores of a given assessor for this wine. (λ1 = 0.770, τ1 = 61%; λ2 = 0.123, τ2 = 10%).

This gives the matrix X_sup (whose first singular value is now equal to 1):

X_sup =
    0.094  0.081  0.315  0.139
    0.152  0.171  0.143  0.288
    0.023  0.015  0.315  0.139
    0.470  0.032  0.143  0.362
    0.210  0.213  0.200  0.234
    0.039  0.146  0.200  0.110
    0.307  0.051  0.029  0.408
    0.094  0.093  0.372  0.085
    0.295  0.033  0.029  0.089
    0.074  0.111  0.143  0.085
    0.023  0.033  0.200  0.014
    0.159  0.625  0.029  0.110
    . (69)

We then compute the supplementary loadings, which are obtained as (cf. Eq. (48)):

Q_sup = X_sup^T M P Δ^{-1} =
    0.125  0.009
    0.024  0.032
    0.173  0.298
    0.201  0.037
    . (70)

From these loadings, we can see that, on Component 1, the New Zealand Sauvignon Blancs are more acidic, have a greater alcohol content, and have more residual sugar than the French wines.

Next, we compute the supplementary factor scores for the first two components as (cf. Eq. (49)):

F_sup = K X_sup Q_sup = K X_sup X_sup^T M P Δ^{-1} =
    0.727  0.954
    0.677  0.463
    0.857  0.986
    1.556  0.615
    1.030  0.771
    0.651  0.594
    1.241  0.281
    0.910  1.178
    0.506  0.035
    0.011  0.353
    0.281  0.600
    0.219  0.089
    . (71)

In a manner similar to the biplot approach we used for the assessors (see Figure 3), we plotted together the supplementary partial factor scores and the loadings for the chemical properties of the wines (see Figure 6). This figure confirms the interpretation that we reached from the numerical values.
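The projection of a supplementary table (Eqs (48) and (49)) translates directly into the same sketch; note that X_sup must already be centered, normalized, and divided by its own first singular value, and that, under our conventions, M P Δ^{-1} reduces to U diag(1/s):

    def project_supplementary(X_sup, U, s, K, n_components=2):
        # Supplementary loadings (Eq. (70)) and factor scores (Eq. (71)).
        Q_sup = X_sup.T @ U[:, :n_components] / s[:n_components]
        F_sup = K * X_sup @ Q_sup
        return F_sup, Q_sup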

F I G U R E 3 | Partial factor scores and variable loadings for the first two dimensions of the compromise space. The loadings have been re-scaled to have a variance equal to the singular values of the compromise analysis.

F I G U R E 4 | Contributions of the tables to the compromise. The sizes of the assessors' icons are proportional to their contribution to Components 1 and 2.

Bootstrap
To estimate the stability of the compromise factor scores, we used a bootstrap approach. We generated 1000 bootstrap samples that gave 1000 estimated bootstrapped factor scores. For example, for the first bootstrapped sample, we sampled with replacement in the set of the integers from 1 to 10 and obtained the following bootstrap set:

B = {6 4 4 2 9 3 1 1 2 8}. (72)

F I G U R E 5 | Partial inertias of the tables. The sizes of the assessors' icons are proportional to their explained inertia for Components 1 and 2.

With this sample, we then computed the bootstrapped estimate of the factor scores by projecting the whole bootstrapped matrix as a supplementary element. Interestingly (and conveniently for computational efficiency), these factor scores can be obtained directly as the average of the partial factor scores. Specifically, the bootstrapped estimate of the factor scores from the first bootstrap sample, denoted F*_1, is computed as:

F*_1 = (1/K) Σ_{k∈B} F[k]
     = (1/10) (F[6] + F[4] + F[4] + F[2] + F[9] + F[3] + F[1] + F[1] + F[2] + F[8])
     =
    0.986  0.040
    0.817  0.048
    0.765  0.432
    1.088  0.004
    1.335  0.157
    1.252  0.143
    0.856  0.164
    0.919  0.393
    0.619  0.265
    0.096  0.692
    0.524  0.449
    0.340  0.059
    . (73)

TABLE 4 | Supplementary Table: Chemical Properties of the Wines

          Titratable Acidity   pH     Alcohol   Residual Sugar
    NZ1   5.60                 3.38   14.00     3.00
    NZ2   5.30                 3.53   13.50     3.60
    NZ3   6.20                 3.27   14.00     3.00
    NZ4   8.50                 3.19   13.50     3.90
    F1    5.00                 3.60   12.50     1.50
    F2    5.88                 3.00   12.50     2.00
    F3    4.50                 3.33   13.00     0.80
    F4    5.60                 3.40   12.00     2.10
    CA1   7.60                 3.30   13.00     2.80
    CA2   5.70                 3.43   13.50     2.10
    CA3   6.20                 3.30   12.50     2.50
    CA4   6.90                 2.20   13.00     2.00
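Because a bootstrapped compromise is just the average of the resampled partial factor scores, the whole resampling step can be sketched as follows (again our own illustration, building on the partial factor scores computed earlier; the seed value is arbitrary):

    def bootstrap_compromise(partials, n_boot=1000, seed=0):
        # Resample the K tables with replacement and average their
        # partial factor scores; returns n_boot bootstrapped compromises.
        rng = np.random.default_rng(seed)
        stack = np.stack(partials)              # shape: K x I x L
        K = stack.shape[0]
        idx = rng.integers(0, K, size=(n_boot, K))
        return stack[idx].mean(axis=1)          # shape: n_boot x I x L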

From the bootstrap estimates, we can also compute bootstrap ratios, which, like t statistics, can be used to find the observations that reliably contribute to a given component. To get the bootstrap ratios, we first computed the mean of the bootstrap samples, which is equal to (see Eq. (51)):

F̄* = (1/L) Σ_l F*_l =
    0.979  0.163
    0.810  0.037
    0.762  0.456
    1.115  0.167
    1.374  0.128
    1.261  0.110
    0.809  0.205
    0.927  0.410
    0.672  0.374
    0.075  0.763
    0.476  0.515
    0.369  0.081
    , (74)

and the standard deviations as (see Eq. (52)):

σ̂_{F*} =
    0.038  0.127
    0.077  0.105
    0.093  0.086
    0.070  0.156
    0.068  0.120
    0.082  0.117
    0.094  0.097
    0.060  0.110
    0.099  0.149
    0.073  0.139
    0.077  0.077
    0.072  0.124
    . (75)

F I G U R E 6 | Supplementary table: chemical components of the wines. Supplementary partial scores and loadings (cf. Figure 2a).

Then, a bootstrap ratio is computed by dividing each bootstrap mean by its standard deviation. This gives the following bootstrap ratios:

T =
    25.787  1.286
    10.456  0.351
     8.212  5.270
    15.907  1.068
    20.056  1.061
    15.401  0.940
     8.628  2.122
    15.454  3.733
     6.768  2.515
     1.024  5.497
     6.224  6.687
     5.091  0.654
    . (76)

These bootstrap ratios can be plotted as a bar chart to give a visual representation of which observations most reliably contribute to a component (see Figure 7).

F I G U R E 7 | Bootstrap ratio plot for Components 1 and 2.

To be conservative and to take into account the multiple comparisons problem, we chose a bootstrap ratio critical value of ±3. This value corresponds roughly to a Bonferroni-corrected p-value for J = 53 comparisons (i.e., the corrected p-value is equal to 0.05/53 ≈ 0.001, which approximately corresponds to a t value of 3).
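The ratios and the ±3 threshold can be computed directly from the bootstrapped compromises of the sketch above (note that using the sample standard deviation, ddof=1, is our choice rather than something specified in the text):

    F_star = bootstrap_compromise(partials)      # n_boot x I x L
    boot_mean = F_star.mean(axis=0)              # Eq. (74)
    boot_sd = F_star.std(axis=0, ddof=1)         # Eq. (75)
    boot_ratios = boot_mean / boot_sd            # Eq. (76)
    reliable = np.abs(boot_ratios) > 3           # conservative threshold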

F I G U R E 8 | Bootstrap confidence ellipses plotted on Components 1 and 2.

From the bootstrap ratios shown in Figure 7 (see also Figure 8), we can see that all of the wines, except the second Canadian Sauvignon Blanc, contribute reliably to the first component, with the New Zealand wines separated from the French wines. This confirms our previous interpretation of Figure 2a. However, only 4 wines contribute reliably to the second component, with the third New Zealand Sauvignon Blanc grouping with the second Canadian wine, and the fourth French wine grouping together with the third Canadian wine.

We also used the set of bootstrapped factor scores to obtain the 95% confidence intervals around the factor scores (see Figure 8). Here, around each wine, we fitted an ellipsoid that comprises 95% of the bootstrapped factor scores. This ellipsoid represents the possible positions of a wine for replications of the analysis (assuming that the assessors were randomly sampled from a population of assessors). When the ellipses of two wines do not overlap, these two wines can be considered as reliably differentiated by the assessors.

RECENT DEVELOPMENTS

HMFA: Hierarchical Multiple Factor Analysis
In standard MFA, each variable belongs to one and only one table (e.g., a variable comes from one and only one assessor). In an analysis of variance framework, this structure is defined as a nesting relation stating that the experimental factor corresponding to the variables is nested in the table factor (see Ref 112). Formally, we say that one factor is nested in another factor if every level of the first factor occurs in conjunction with one and only one level of the second factor (i.e., a given variable represents the rating of one and only one assessor). In some cases, the variables (i.e., the variable factor) can be nested in a first factor that is itself nested within another factor. For example, the first five assessors (see Figure 2) are males and the rest of the assessors are females, and therefore the assessor factor is nested in the gender factor (i.e., one assessor is either a male or a female). Note that, by transitivity, the variables are also nested in the gender factor. The analysis of data sets with such a nested structure can be performed with the recently developed HMFA (see Refs 121,122, Ref 2 for a dual version, and Refs 26,29,30,54 for applications), which can be seen as an integration of MFA with the analysis of variance tradition.112

In HMFA, the MFA normalization is applied hierarchically. First, all the tables at the first level are normalized by dividing them by their first singular value; then a subsequent SVD is performed on each group of (normalized) tables, and the data of all the tables of a given group are divided by the first singular value obtained from this last PCA. The procedure continues until the highest level of the hierarchy is reached. The grand matrix X is then analyzed as indicated in Eq. (38). Alternatively, the weights can be computed hierarchically, stored in the A matrix (see Eq. (12)), and then analyzed as indicated in Eq. (13).
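For a two-level hierarchy (e.g., assessors nested in gender), the hierarchical weights can be sketched as follows (our illustration, not published HMFA code; the groups argument lists the table indices belonging to each group):

    def hmfa_weights(tables, groups):
        # Each table is weighted by 1 / (delta_1 * gamma_1)^2, where
        # delta_1 is its own first singular value and gamma_1 is the first
        # singular value of the PCA of its group of normalized tables.
        delta1 = [np.linalg.svd(X, compute_uv=False)[0] for X in tables]
        normalized = [X / d for X, d in zip(tables, delta1)]
        alphas = {}
        for members in groups:          # e.g., [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
            group_table = np.hstack([normalized[k] for k in members])
            gamma1 = np.linalg.svd(group_table, compute_uv=False)[0]
            for k in members:
                alphas[k] = 1.0 / (delta1[k] * gamma1) ** 2
        return alphas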

For example, the first singular value of the first table was equal to 2.037 (cf. Eq. (57)). When the first five tables (each normalized by division by their first singular value) are analyzed by a new PCA, we find that the first singular value of this "men" table is equal to 2.154. Therefore, the HMFA α-weight for the first table will be equal to (2.037 × 2.154)^{-2} = 0.0519. Taking into account that the first singular value of the PCA of the last five tables is equal to 2.169, and using the values from Eq. (61), we can compute the HMFA weights as:

α_HMFA = [0.241, 0.239, 0.275, 0.273, 0.307, 0.302, 0.417, 0.272, 0.264, 0.309]^T
       ÷ [2.154², 2.154², 2.154², 2.154², 2.154², 2.169², 2.169², 2.169², 2.169², 2.169²]^T
       = [0.0519, 0.0514, 0.0592, 0.0588, 0.0661, 0.0643, 0.0886, 0.0579, 0.0560, 0.0656]^T,
    (77)

where ÷ denotes elementwise division.

From α_HMFA, we can build the a_HMFA vector and the A_HMFA weight matrix (see Eqs (11) and (12)). A GSVD with M and A_HMFA (see Eq. (13)) will then provide the following factor scores (see Eq. (14)):

F_HMFA =
    0.454  0.075
    0.375  0.014
    0.352  0.211
    0.515  0.076
    0.635  0.060
    0.585  0.050
    0.373  0.095
    0.428  0.189
    0.309  0.170
    0.034  0.350
    0.220  0.237
    0.170  0.035
    . (78)

As with standard MFA, these factor scores could have been obtained from a simple SVD of the grand matrix storing the (twice) normalized data matrices.

In our example, the singular value of the table of the male assessors (i.e., 2.154) is almost identical to the singular value of the table of the female assessors (i.e., 2.169), and, therefore, the effect of the normalization due to gender is very small. In fact, the factor scores of the HMFA (given in Eq. (78)) are almost perfectly correlated with their respective factor scores of the plain MFA (given in Eq. (65)).

    DUAL-MFA

In dual-MFA, the data consist of K sets of observations measured on the same set of variables (see Ref 2; see also Ref 123 for a review of related methods). In this case, the pattern of Eq. (1) is replaced by

X =
    ⎡ X[1] ⎤
    ⎢  ⋮   ⎥
    ⎢ X[k] ⎥
    ⎢  ⋮   ⎥
    ⎣ X[K] ⎦
    , (79)

with X[k] being an I[k] by J matrix (measuring the J variables for the kth sample of observations).

Here, instead of computing K cross-product matrices between the observations, we compute K cross-product covariance matrices between the variables (one per set of observations). The dual-MFA approach then follows the same steps as standard MFA and will provide a compromise map for the variables (instead of the observations in MFA), and partial loadings for each table.

An issue in dual-MFA is how to center the variables. If the variables are centered for the whole table (i.e., the mean of all the observations of a variable is subtracted from each observation), then the analysis will be sensitive to differences in means between the tables (this will correspond to a main effect of the tables). If the data are considered as ipsative (i.e., the differences are meaningful only within a table, see Ref 124), then the variables should be centered separately for each table.
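The two centering options can be made explicit in a small helper (a sketch of ours, not part of any published dual-MFA code):

    def center_for_dual_mfa(tables, ipsative=True):
        # ipsative=True: center each table on its own means; False: center
        # on the grand means, keeping between-table mean differences.
        if ipsative:
            return [X - X.mean(axis=0) for X in tables]
        grand_mean = np.vstack(tables).mean(axis=0)
        return [X - grand_mean for X in tables]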

Procrustes MFA
Procrustes MFA (PMFA, see Ref 36) can be used to analyze several Euclidean distance matrices describing the same observations. To do so, PMFA combines ideas from metric multidimensional scaling (MDS; see Refs 101,125-127,153 for details) and MFA. Recall that, in MDS, a (Euclidean) distance matrix between some observations is transformed into a set of factor scores, ordered by their variance (i.e., eigenvalue), such that the distances between observations computed from their factor scores match (as well as possible) the original distances. These factor scores are then used in lieu of the variables of a plain MFA. Specifically, in PMFA, each distance matrix is analyzed via MDS, and the factor scores of each matrix are then normalized à la MFA such that the first eigenvalue of each set of factor scores is equal to one; all the normalized factor scores are then collected into a grand matrix that is analyzed with a plain PCA (see Eq. (38)).
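A minimal sketch of PMFA, assuming classical (Torgerson) MDS and Euclidean distance matrices (ours; function names are illustrative):

    def mds_factor_scores(D):
        # Classical metric MDS: double-center the squared distances and
        # eigendecompose the resulting cross-product matrix.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        w, V = np.linalg.eigh(B)
        order = np.argsort(w)[::-1]
        w = np.clip(w[order], 0.0, None)       # drop tiny negative eigenvalues
        return V[:, order] * np.sqrt(w)

    def pmfa(distance_matrices, n_components=2):
        # MDS each matrix, normalize a la MFA (first eigenvalue set to one),
        # stack the normalized scores, and run a plain PCA (via SVD).
        scores = [mds_factor_scores(D) for D in distance_matrices]
        normalized = [F / np.linalg.svd(F, compute_uv=False)[0] for F in scores]
        U, s, Vt = np.linalg.svd(np.hstack(normalized), full_matrices=False)
        return U[:, :n_components] * s[:n_components]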

MFA for Qualitative Data
MFA was originally developed as an extension of PCA and as such it requires quantitative variables. Just like PCA, MFA can be extended to handle qualitative variables. The equivalent of PCA for qualitative data is correspondence analysis or multiple correspondence analysis (see Refs 78,90,128,129). When dealing with multiple qualitative data tables, MFA becomes MFACT (for multiple factor analysis of contingency tables, see Refs 24,130-132; see also Refs 72,133 for an alternative approach called simultaneous analysis). The main idea of MFACT is the same as MFA, namely, to normalize each table by its first singular value. When all the rows in each table have the same sum (as in multiple correspondence analysis), this approach boils down to implementing the MFA normalization and then proceeding to the analysis of the grand data table (with correspondence analysis instead of PCA). When the rows of each table do not have the same sum, then these sums need to be taken into account and each table needs to be expressed as the deviations to its own centroid (see Refs 130,131 for details). This approach can also be extended to the analysis of mixtures of quantitative and qualitative data (see Refs 134,154).

MFA Barycentric Discriminant Analysis: MUFABADA
The idea of MFA can also be incorporated in the multiblock barycentric discriminant analysis framework (MUBADA, see Refs 4,5,10,135,136). Recall that MUBADA and related techniques generalize discriminant analysis to the multitable case when the goal is to assign observations to a priori defined groups and these observations are described according to several tables. As an illustration, for our example, the goal of the analysis would be to use the description provided by the ten assessors to assign each of the twelve wines to one of the three wine groups (i.e., France, Canada, or New Zealand). To do so, a multitable PCA is performed on the table of the barycenters (i.e., the mean vector of each category) of the groups, and each of the observations is projected as a supplementary element. The factor scores of the observations and the groups are then used to compute the Euclidean distances between each observation and all the groups. Then each observation is assigned to its nearest group. In the multiple factor analysis version of MUBADA (called multiple factor barycentric discriminant analysis or MUFABADA), each data table is first centered and normalized, and the table of the group barycenters is computed. For each group barycenter table, the first singular value is computed, and this singular value is then used to normalize (like in MFA) the group barycenters and the observation tables. A standard MFA is then performed on the group barycenters grand data table, and the observations are projected as supplementary elements.

In order to assess the quality of the discrimination, confusion matrices can be computed (as fixed or random effect models), and bootstrap confidence ellipsoids can be computed (in our example, the bootstrap can be performed in the factor space and will treat both wines and assessors as random factors). As an illustration, the result of MUFABADA performed on the wines is shown in Figure 9. Interestingly, the result is almost identical to the more sophisticated canonical STATIS analysis of the same data (cf. Ref 3, for the original analysis), and in fact the same figure can illustrate both techniques.

F I G U R E 9 | MUFABADA. Left: Discriminant factor scores for the three wine groups (regions). Right: Discriminant factor scores for the three wine groups with bootstrapped 95% confidence intervals.
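A rough sketch of the MUFABADA steps (ours; it assumes NumPy arrays, tables that are already centered and normalized, and a labels array of group indices):

    def mufabada(tables, labels, n_components=2):
        # MFA on the group barycenters; observations are projected as
        # supplementary elements and assigned to the nearest group.
        groups = np.unique(labels)
        bary_blocks, obs_blocks = [], []
        for X in tables:
            B = np.vstack([X[labels == g].mean(axis=0) for g in groups])
            d1 = np.linalg.svd(B, compute_uv=False)[0]
            bary_blocks.append(B / d1)          # normalize barycenters and
            obs_blocks.append(X / d1)           # observations by the same d1
        U, s, Vt = np.linalg.svd(np.hstack(bary_blocks), full_matrices=False)
        F_groups = U[:, :n_components] * s[:n_components]
        F_obs = np.hstack(obs_blocks) @ Vt.T[:, :n_components]
        dist2 = ((F_obs[:, None, :] - F_groups[None, :, :]) ** 2).sum(axis=2)
        return groups[np.argmin(dist2, axis=1)], F_groups, F_obs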

    RELATED METHODS

MFA is part of the multitable family and also of the Procrustes family. A thorough evaluation of the complex relationships between all these techniques is beyond the scope of this article, but some directions for comparisons could be of interest. MFA is also closely related to generalized canonical correlation analysis (GCCA). We will look first at the relationship between MFA and GCCA and then at the relationships between MFA and the other techniques.

MFA and Generalized Canonical Correlation Analysis
An important multitable technique is generalized canonical correlation analysis (GCCA). In GCCA,137-139 the goal is to maximize the inner product of the factor scores of the compromise under a constraint that makes it equivalent to maximizing the correlations between the compromise factor scores and the partial factor scores. Specifically, in GCCA, we seek a matrix of J weights by L dimensions (with L being the number of dimensions for the solution), W_gcca, such that

F_gcca = X W_gcca with trace(F_gcca^T F_gcca) = max, (80)

under the constraint that

W_gcca^T Σ W_gcca = I, (81)

where Σ is a J by J block diagonal matrix with each J[k] by J[k] diagonal block equal to (1/I) X[k]^T X[k] (note that, here, as in several equivalent techniques, the inverse can be replaced by a pseudo-inverse138,139). The set of W_gcca weights is obtained from the following GSVD of X:

X = P_gcca Δ_gcca Q_gcca^T with P_gcca^T M P_gcca = Q_gcca^T Σ^{-1} Q_gcca = I. (82)

The factor scores F_gcca are obtained as

F_gcca = P_gcca Δ_gcca = X Σ^{-1} Q_gcca = X W_gcca with W_gcca = Σ^{-1} Q_gcca. (83)

So, in brief, GCCA and MFA are both multiblock techniques. By contrast to MFA, GCCA is an anisotropic technique (i.e., it gives a different weight to each variable, whereas MFA gives the same weight to all variables of a table). Also, GCCA optimizes a well-defined criterion. Further work is needed to develop the comparison between GCCA, MFA, and their variants.
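One way to see the difference in practice is that GCCA whitens each block by the (pseudo-)inverse square root of its covariance matrix before the final SVD, whereas MFA only rescales each block by a scalar. A sketch (ours, with the pseudo-inverse handled through an eigenvalue threshold):

    def inv_sqrt_psd(S, eps=1e-10):
        # (Pseudo-)inverse square root of a symmetric PSD matrix.
        w, V = np.linalg.eigh(S)
        w = np.where(w > eps, 1.0 / np.sqrt(w), 0.0)
        return (V * w) @ V.T

    def gcca(tables, n_components=2):
        # Whiten each block by its covariance, stack, fold in masses 1/I,
        # and SVD; the factor scores then follow Eq. (83).
        I = tables[0].shape[0]
        whitened = [X @ inv_sqrt_psd(X.T @ X / I) for X in tables]
        U, s, Vt = np.linalg.svd(np.hstack(whitened) / np.sqrt(I),
                                 full_matrices=False)
        return np.sqrt(I) * U[:, :n_components] * s[:n_components]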

MFA and Procrustes Methods
Interestingly, MFA was originally98,140 developed as a possible alternative