Global, local and unique decompositions in OnPLS for multiblock data analysis


Analytica Chimica Acta 791 (2013) 13–24
http://dx.doi.org/10.1016/j.aca.2013.06.026

Tommy Löfstedt a, Daniel Hoffman b, Johan Trygg a,∗

a Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, SE-90187 Umeå, Sweden
b Computational Life Science Cluster (CLiC), Department of Molecular Biology, Umeå University, SE-90187 Umeå, Sweden

Highlights

• Extending OnPLS by introducing locally joint and unique variation.
• Each matrix is fully decomposed in several parts.
• Different parts are related to subsets of matrices and can be analysed separately.
• OnPLS facilitates interpretation in multiblock data analysis.
• Example on metabolomic, proteomic and transcriptomic data.

Article history:
Received 10 December 2012
Received in revised form 3 May 2013
Accepted 19 June 2013
Available online 26 June 2013

Keywords: OnPLS; Orthogonal partial least squares; Multiblock analysis; Global, local and unique variation

Abstract

OnPLS is an extension of O2PLS that decomposes a set of matrices, in either multiblock or path model analysis, such that each matrix consists of two parts: a globally joint part containing variation shared with all other connected matrices, and a part that contains locally joint and unique variation, i.e. variation that is shared with some, but not all, other connected matrices or that is unique in a single matrix.

A further extension of OnPLS suggested here decomposes the part that is not globally joint into locally joint and unique parts. To achieve this it uses the OnPLS method to first find and extract a globally joint model, and then applies OnPLS recursively to subsets of matrices that contain the locally joint and unique variation remaining after the globally joint variation has been extracted. This results in a set of locally joint models. The variation that is left after the globally joint and locally joint variation has been extracted is (by construction) not related to the other matrices and thus represents the strictly unique variation in each matrix. The method's utility is demonstrated by its application to both a simulated data set and a real data set acquired from metabolomic, proteomic and transcriptomic profiling of three genotypes of hybrid aspen. The results show that OnPLS can successfully decompose each matrix into global, local and unique models, resulting in lower numbers of globally joint components and higher intercorrelations of scores. OnPLS also increases the interpretability of models of connected matrices, because of the locally joint and unique models it generates.

© 2013 Published by Elsevier B.V.

☆ Special Issue: CAC 2012. Paper presented at the XIII Conference on Chemometrics in Analytical Chemistry (CAC 2012), Budapest, Hungary, 25–29 June 2012.
∗ Corresponding author. Tel.: +46 90 7866917; fax: +46 90 7867655.
E-mail address: [email protected] (J. Trygg).

1. Introduction

Experimental sciences, e.g. biology, chemistry and medicine, have to a large extent become information sciences, hence bioinformatics, chemometrics and their equivalents in other disciplines are now key elements of pure and applied research [1]. Further, with the increasing availability of multiple high-throughput systems for parallel analyses of multiple variables, massive amounts of data are being collected [2]. Extracting useful information from such data sets is a nontrivial task that requires powerful computational methods to identify common trends and detect underlying patterns [3].

Methods capable of analysing several different sets of (possibly massive numbers of) variables measured on the same set of samples are called multiblock methods. Examples of such methods include the following.

Hierarchical methods, such as Hierarchical PCA and PLS, Consensus PCA, SUM-PCA etc., are capable of analysing several blocks of data simultaneously, often by putting them side-by-side in the same model. Such models often relate the matrices by superscores. Superscores are a set of scores from multiple blocks that summarise all variation in all blocks, as opposed to scores for each block (block scores) that summarise the joint variation in their corresponding block only [4–8].

Several related, but more general, multiblock methods that use other, more clearly stated, criteria have been proposed, see e.g. [9–11]. Examples include generalised canonical correlation, MAXBET and MAXDIFF. These methods do not usually find superscores that describe all matrices simultaneously, but instead find separate models with block scores for each matrix that are related by the maximised criteria.

“Path modelling”, or “structural equation modelling (SEM)”, approaches comprise another important class of multiblock methods. Path modelling connects a number of data sets and allows analysis of paths along which information is considered to flow from one block to another. These paths may represent (for instance) a known time sequence, an assumed causality order, or some other chosen organisational scheme [12,13]. One of the most frequently used criteria for estimating parameters in path models is called partial least squares (PLS, see e.g. [14,15]).

Tenenhaus and Hanafi [16] presented an important review of some of the methods mentioned above and showed how they relate to the path modelling approaches. See also Ref. [17].

There is a risk when projection based latent variable methods are employed that variation in a matrix that is not related (i.e. orthogonal) to the other matrices is incorporated in the resulting joint model's scores and loadings [18,19]. The importance of analysing and interpreting such unrelated (unique) variation has been recognised, but the methods employed for this purpose have until recently focussed on at most two blocks [20–23].

In particular, orthogonal projections to latent structures (OPLS and O2PLS) have demonstrated the potential to integrate and analyse “omics” data sets while separating systematic variation and noise, as well as unique structures unrelated to the other block, from the joint variation. O2PLS is an extension of OPLS and is both an exploratory and predictive bidirectional modelling tool. Generally, O2PLS produces three main outputs: the joint variation that exists between X and Y, the unique variation in X, and the unique variation in Y.

An attempt has been published to apply such methods sequentially in multiblock modelling, with extraction of locally joint and unique variation [24], but the results of this approach depend on the order in which the matrices are modelled and its global objective function remains unclear. A related approach is called parallel and orthogonalised partial least squares, or PO-PLS; this method works in a regression setting where several blocks of independent variables, X1, . . ., Xn, relate to another block of dependent variables, Y [25,26]. PO-PLS decomposes the variation in the independent blocks into parts with joint (between all independent blocks, or between subsets of the blocks) and unique variation. PO-PLS uses a combination of PLS regression, generalised canonical correlation and PCA to achieve this decomposition. While not the same, the approach used in this paper to separate globally joint, locally joint and unique variation is related to the approach used in PO-PLS.

The authors recently generalised the O2PLS method [22,23] to a symmetric multiblock method called OnPLS [27], which finds a globally joint (all-against-all) model that does not depend on the order in which the matrices are modelled. O2PLS is a special case, with n = 2 blocks, of this generalisation to OnPLS. The OnPLS method was further developed for cases in which not all blocks need be connected to all other blocks, allowing it to conceptualise path models [28].

The OnPLS method facilitates interpretation of a multiblock model (by separating the different types of variation), improves the regression (in terms of correlation between components and overall predictive power) and reduces the complexity of the joint model (in terms of the number of components) [27,28]. There is, however, an obvious limitation of OnPLS as described in [27,28]. These approaches only distinguish between what is globally joint and what is not, implying that the remaining variation is unique, when it may in fact contain systematic variation that is locally joint for some combination of matrices, but just not for all matrices.

This paper extends OnPLS to model not only the globally joint variation, but also the locally joint variation that is shared between some (but not all) matrices, and the remaining unique variation in each matrix. The locally joint variation is variation that is joint in a subset of matrices, but unrelated to the matrices excluded from this subset.

This paper is organised as follows: Section 2 presents the OnPLS method as it has been presented previously, and then presents two approaches for finding and separating the globally joint, locally joint and unique variation; Section 3 presents two illustrative applications (one on simulated data and another on data acquired from combined metabolomic, proteomic and transcriptomic analyses of three genotypes of hybrid aspen); Section 4 discusses the results and presents conclusions.

2. Method and theory

We will start in Section 2.1 by presenting the “conventional” OnPLS as it was presented in [27,28]. This includes describing how to find a representation of the globally joint space in each matrix (Section 2.1.2), how to use this representation to extract (deflate) everything that is locally joint or unique (Section 2.1.3) and finally how to build a multiblock or path model of the deflated matrices (Section 2.1.4).

This is followed by Section 2.2, which describes the novel extension of OnPLS presented in this paper: how to extend the “conventional” OnPLS to separate the locally joint and unique variation into separate parts. Two approaches are suggested in order to fully decompose the matrices into all separate parts. The first method is presented in Section 2.2.1 and is called the “full approach”. The second method is called the “partial approach” and is presented in Section 2.2.2.

2.1. OnPLS

OnPLS was recently proposed as a general extension of O2PLS for multiblock or path model cases [27,28]. This section briefly describes what has been published before about how OnPLS works and how this relates to O2PLS.

Section 2.1.1 describes the fundamental decomposition of O2PLS and how this notion is generalised in OnPLS. Section 2.1.2 describes how to find the joint weight matrix, the initial representation of the joint part that is used in Section 2.1.3. Section 2.1.3 describes how to use this joint representation, the weight matrix, to deflate the locally joint and unique variation from each block. The deflated matrices are then used in Section 2.1.4 to find the globally joint model linking all connected matrices.


2.1.1. Locally joint and unique variation

This section briefly describes the motivation behind the O2PLS method and explains how this notion is generalised to the multiblock case in OnPLS.

In O2PLS [22,23] the objective is to find and separate joint and unique variation in two matrices, X (M × N) and Y (M × K), such that

$$X = X_G + X_U + E \quad (1)$$

and

$$Y = Y_G + Y_U + F, \quad (2)$$

where E and F are residual matrices containing noise, $Y^T X_U = X^T Y_U = X_G^T X_U = Y_G^T Y_U = 0$, and $c^T Y_G^T X_G w$ is maximised, subject to $\|w\| = \|c\| = 1$.

The original OnPLS proposes an analogue to this approach in multiblock and path model analysis where n blocks, $X_i$ (M × N_i), for i = 1, . . ., n, are to be analysed. OnPLS decomposes each matrix into one part that maximally covaries with all other connected matrices, and another that contains variation orthogonal to at least one of the other matrices. That is, OnPLS decomposes each matrix as

$$X_i = \underbrace{X_{G,i}}_{\text{globally joint}} + \underbrace{X_{LU,i}}_{\text{locally joint and unique}} + \underbrace{E_i}_{\text{residual noise}}, \quad (3)$$

where the variation in $X_{LU,i}$ is either unique in $X_i$ or jointly shared with only some of the other connected matrices. I.e. $X_j^T X_{LU,i}$ may be non-zero if locally joint variation is shared between $X_i$ and $X_j$, but we always have $X_{G,j}^T X_{LU,i} = 0$ since all variation shared with all other matrices is collected in the $X_{G,i}$ matrices.
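To make the orthogonality constraints of Eqs. (1) and (2) concrete, the following minimal numpy sketch (our own illustration, not code from the paper; all names are hypothetical) builds two blocks from shared and block-specific latent variables and checks the cross-product conditions numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 50  # number of samples

# Joint scores shared by X and Y, plus scores unique to each block.
t_joint = rng.normal(size=(M, 2))
t_ux = rng.normal(size=(M, 1))
t_uy = rng.normal(size=(M, 1))

def orthogonalise(v, basis):
    """Remove from v its projection onto the column space of basis."""
    return v - basis @ np.linalg.pinv(basis) @ v

# Orthogonalise the unique scores against the joint scores (and each
# other) so that the O2PLS conditions hold by construction.
t_ux = orthogonalise(t_ux, t_joint)
t_uy = orthogonalise(t_uy, np.hstack([t_joint, t_ux]))

XG = t_joint @ rng.normal(size=(2, 30))  # joint part of X (M x N)
XU = t_ux @ rng.normal(size=(1, 30))     # unique part of X
YG = t_joint @ rng.normal(size=(2, 20))  # joint part of Y (M x K)
YU = t_uy @ rng.normal(size=(1, 20))     # unique part of Y
X, Y = XG + XU, YG + YU                  # noise-free for clarity

print(np.allclose(XG.T @ XU, 0))  # X_G^T X_U = 0
print(np.allclose(YG.T @ YU, 0))  # Y_G^T Y_U = 0
print(np.allclose(Y.T @ XU, 0))   # Y^T X_U = 0
print(np.allclose(X.T @ YU, 0))   # X^T Y_U = 0
```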

2.1.2. Find a globally joint representation

OnPLS mimics O2PLS by first finding a common joint representation. In OnPLS, the structures corresponding to the globally joint variation for each of the blocks $X_i$, for i = 1, . . ., n, are found by first taking the SVD of all pairs of matrices

$$V_{i,j} \Sigma_{i,j} W_{i,j}^T = X_j^T X_i, \quad (4)$$

where $X_i$ (M × N_i) and $X_j$ (M × N_j) are connected, for j = 1, . . ., n with j ≠ i. These weight matrices, $W_{i,j}$ (N_i × A_{i,j}, where A_{i,j} is the number of joint components between $X_i$ and $X_j$), are then put in an augmented matrix that is also subject to SVD. That is, let

$$W_i \Sigma_i V_i^T = \left[ W_{i,1} \,|\, \cdots \,|\, W_{i,i-1} \,|\, W_{i,i+1} \,|\, \cdots \,|\, W_{i,n} \right] \quad (5)$$

for all connected matrices $X_i$ and $X_j$ (this approach is known as SUM-PCA [5]). The weight matrix $W_i$ (N_i × A, where A ≤ min{A_{i,j}} is the number of globally joint components) now represents the structures in $X_i$ corresponding to all the variation that $X_i$ shares with the other (connected) matrices.

This is performed for each matrix, $X_i$, such that we obtain one weight matrix $W_i$ for each matrix $X_i$, for i = 1, . . ., n.

The purpose of OnPLS is now to find a score matrix $T_i$ that does not contain any locally joint or unique variation, such that $X_{G,i} = T_i W_i^T$.
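As a concrete illustration of Eqs. (4) and (5), the following numpy sketch (ours, not the authors' code; the function name is hypothetical and the component numbers are simplified to fixed inputs) computes one globally joint weight matrix $W_i$ per block:

```python
import numpy as np

def global_weights(Xs, C, A_pair, A_glob):
    """Globally joint weight matrix W_i for each block (Eqs. (4)-(5)).

    Xs     : list of (M x N_i) arrays
    C      : (n x n) 0/1 adjacency matrix
    A_pair : number of pairwise joint components A_{i,j} (fixed here)
    A_glob : number of globally joint components A
    """
    n = len(Xs)
    Ws = []
    for i in range(n):
        blocks = []
        for j in range(n):
            if j == i or not C[i, j]:
                continue
            # Eq. (4): SVD of X_j^T X_i; the leading right singular
            # vectors give W_{i,j} (N_i x A_pair).
            _, _, Vt = np.linalg.svd(Xs[j].T @ Xs[i], full_matrices=False)
            blocks.append(Vt[:A_pair].T)
        # Eq. (5): SVD of the augmented matrix [W_{i,1} | ... | W_{i,n}]
        # (the SUM-PCA step); keep the A_glob leading left singular vectors.
        U, _, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
        Ws.append(U[:, :A_glob])  # N_i x A_glob
    return Ws
```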

2.1.3. Find and extract locally joint and unique variation

To extract the locally joint and unique variation the $X_i$ matrices are orthogonalised with respect to the weight matrices $W_i$, as

$$X_{LU,i} = X_i(I_i - W_i W_i^T) = X_i - T_i W_i^T. \quad (6)$$

However, note that since

$$T_i = X_i W_i = (X_{G,i} + X_{LU,i} + E_i) W_i = X_{G,i} W_i + X_{LU,i} W_i + E_i W_i, \quad (7)$$

in which $X_{LU,i} W_i$ need not be zero, these score matrices, $T_i$, may thus contain variation that is not globally joint between all matrices.

Since the variation in $X_{LU,i} W_i$ is not globally joint, it needs to be extracted before the multiblock or path model is built in order to have only strictly globally joint variation in $T_i$. This overlapping variation is found, in analogy with O2PLS, by maximising

$$\left\| T_i^T t_{LU,i} \right\|^2 = \left\| T_i^T X_{LU,i} w_{LU,i} \right\|^2, \quad (8)$$

where $w_{LU,i}$ is clearly orthogonal to $W_i$. Note that

$$\begin{aligned}
X_j^T t_{LU,i} &= X_{G,j}^T t_{LU,i} + X_{LU,j}^T t_{LU,i} + \underbrace{E_j^T t_{LU,i}}_{=0} \\
&= X_{G,j}^T X_i w_{LU,i} + X_{LU,j}^T X_i w_{LU,i} \\
&= X_{G,j}^T X_{G,i} w_{LU,i} + \underbrace{X_{G,j}^T X_{LU,i}}_{=0} w_{LU,i} + \underbrace{X_{LU,j}^T X_{G,i}}_{=0} w_{LU,i} + X_{LU,j}^T X_{LU,i} w_{LU,i} \\
&= \underbrace{X_{G,j}^T T_i W_i^T w_{LU,i}}_{=0} + X_{LU,j}^T X_{LU,i} w_{LU,i} \\
&= X_{LU,j}^T X_{LU,i} w_{LU,i},
\end{aligned} \quad (9)$$

which may thus be non-zero for locally joint variation in $X_j$.

The locally joint and unique weight vectors of Eq. (8) are thus found as the eigenvectors corresponding to the largest eigenvalues in

$$X_{LU,i}^T T_i T_i^T X_{LU,i} w_{LU,i} = \lambda_{\max} w_{LU,i}. \quad (10)$$

Once such a weight vector is known, the score vector is calculated, $t_{LU,i} = X_i w_{LU,i}$, and a loading vector is computed as $p_{LU,i} = X_i^T t_{LU,i} / (t_{LU,i}^T t_{LU,i})$. The locally joint and unique variation found is deflated one component at a time by

$$X_i \leftarrow X_i - t_{LU,i} p_{LU,i}^T = \left( I_i - \frac{t_{LU,i} t_{LU,i}^T}{t_{LU,i}^T t_{LU,i}} \right) X_i. \quad (11)$$

These locally joint and unique score vectors, $t_{LU}$, thus capture locally joint and unique variation; or, stated differently, variation not globally joint in all blocks. Finding and extracting these score vectors maximises the true globally joint sum of covariances and increases the correlations between the globally joint score vectors [27,28].
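Eqs. (6)–(11) translate almost line for line into numpy. The sketch below (our illustration; the function name is hypothetical) extracts and deflates a single locally joint/unique component, using the fact that the leading eigenvector in Eq. (10) equals the leading right singular vector of $T_i^T X_{LU,i}$:

```python
import numpy as np

def deflate_one_lu_component(Xi, Wi):
    """Extract one locally joint/unique component from Xi (Eqs. (6)-(11)).

    Xi : (M x N_i) block; Wi : (N_i x A) globally joint weight matrix.
    Returns the deflated Xi and the component vectors (w, t, p).
    """
    Ti = Xi @ Wi                          # scores, Eq. (7)
    X_lu = Xi - Ti @ Wi.T                 # orthogonalisation, Eq. (6)
    # Eq. (10): the matrix X_LU^T T T^T X_LU equals K^T K with
    # K = T^T X_LU, so its leading eigenvector is the leading right
    # singular vector of K.
    _, _, Vt = np.linalg.svd(Ti.T @ X_lu, full_matrices=False)
    w = Vt[0]                             # w_LU,i, unit norm
    t = Xi @ w                            # t_LU,i
    p = Xi.T @ t / (t @ t)                # p_LU,i
    return Xi - np.outer(t, p), w, t, p   # deflation, Eq. (11)
```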

2.1.4. Find a globally joint multiblock model

After the locally joint and unique variation has been found and extracted (deflated), the global model can be built using the deflated matrices as input.

The objective of nPLS (the joint model, presented in [27,28]) is to maximise the sum of covariances between all connected score vectors, i.e. to maximise

$$f_C(w_1, \ldots, w_n) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} c_{i,j} w_i^T X_i^T X_j w_j = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} c_{i,j} t_i^T t_j, \quad (12)$$

with the constraints that $\|w_i\| = 1$, and where C is an adjacency matrix in which the elements $c_{i,j}$ are 1 if the matrices $X_i$ and $X_j$ are connected and 0 otherwise. Two algorithms have been used in the literature to find a solution to this problem, and two new proofs of their convergence were recently presented in [28]. Once the locally joint and unique variations have been extracted, this approach is used to find a model of the globally joint variation.

The algorithms thus find weight vectors, $w_i$, corresponding to each matrix $X_i$, for i = 1, . . ., n. We then compute the scores

$$t_i = X_i w_i \quad (13)$$


Fig. 1. Illustration of decompositions in OnPLS. An example of how an OnPLS model of n = 4 matrices is decomposed into a globally joint model, locally joint submodels and models of each matrix's unique variation. The globally joint variation is the variation shared by all n matrices. The locally joint variation is all variation shared between at least two and at most n − 1 matrices. All this variation is collectively called locally joint. The unique variation is the variation in each matrix that is not shared with any other matrices.

and the loadings

$$p_i = \frac{X_i^T t_i}{t_i^T t_i} \quad (14)$$

and deflate the matrices by

$$X_i^{(h+1)} = X_i^{(h)} - t_{i,h} p_{i,h}^T = \left( I_i - \frac{t_i t_i^T}{t_i^T t_i} \right) X_i^{(h)}, \quad (15)$$

where h defines the order of the component. These deflated matrices are then used to compute the higher order components, i.e. the next set of globally joint components.
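The two solution algorithms themselves are given in [28] and are not reproduced here. Purely as a hedged illustration, the following sketch uses a simple alternating scheme of the kind commonly applied to criteria like Eq. (12), in which each $w_i$ is repeatedly set proportional to $X_i^T \sum_j c_{i,j} t_j$ and renormalised; this is our own sketch and not necessarily the exact algorithm of [27,28]:

```python
import numpy as np

def npls_weights(Xs, C, n_iter=500, tol=1e-10):
    """Weight vectors increasing Eq. (12) by alternating updates.

    Xs : list of (M x N_i) arrays; C : (n x n) 0/1 adjacency matrix
         (assumed symmetric with zero diagonal).
    """
    n = len(Xs)
    ws = [np.ones(X.shape[1]) / np.sqrt(X.shape[1]) for X in Xs]
    f_old = -np.inf
    for _ in range(n_iter):
        ts = [X @ w for X, w in zip(Xs, ws)]
        for i in range(n):
            # Ascent direction of f_C with respect to w_i.
            g = Xs[i].T @ sum(C[i, j] * ts[j] for j in range(n) if j != i)
            ws[i] = g / np.linalg.norm(g)  # keep ||w_i|| = 1
            ts[i] = Xs[i] @ ws[i]
        f = sum(C[i, j] * ts[i] @ ts[j]
                for i in range(n) for j in range(n) if j != i)
        if abs(f - f_old) < tol:
            break
        f_old = f
    return ws
```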

The computations described in Sections 2.1.2, 2.1.3 and 2.1.4 comprise the OnPLS method. The OnPLS method is described in Algorithm 1 in Appendix A.

2.2. Locally joint variation

An obvious limitation of the approach described in the previous sections is that it does not distinguish between unique and locally joint variation; it only distinguishes between globally joint and other variation.

The globally joint variation is defined as the variation present in all n matrices and the unique variation as the variation present in only one matrix. The locally joint variation, on the other hand, is all variation shared between at least two and at most n − 1 matrices. This can also be regarded as (for instance) the variation shared between $X_i$ and $X_j$ but not with $X_k$. See Figs. 1 and 2 for illustrations of these different parts.

Fig. 2 is often particularly helpful for understanding this. The set analogy in Fig. 2 shows that each matrix can be split into $2^{n-1}$ parts, e.g. in the n = 3 case we would split $X_1$ into

$$X_1 = \underbrace{(X_1 \cap X_2 \cap X_3)}_{\text{globally joint part}} + \underbrace{((X_1 \cap X_2) \setminus X_3) + ((X_1 \cap X_3) \setminus X_2)}_{\text{locally joint parts}} + \underbrace{(X_1 \cap \overline{X_2 \cup X_3})}_{\text{unique part}}, \quad (16)$$

where ∪ is the set union operator, ∩ is the set intersection operator, \ is the set difference operator and $\overline{S}$ is the set complement. $X_1 \cap X_2 \cap X_3$ would thus be the globally joint part of the variation (shared with all matrices), $(X_1 \cap X_2) \setminus X_3$ and $(X_1 \cap X_3) \setminus X_2$ would be the locally joint parts (shared with some but not all matrices) and $X_1 \cap \overline{X_2 \cup X_3}$ would be the unique part (shared with no other matrices).

With many matrices, however, keeping track of $2^{n-1}$ parts for each matrix is not feasible, hence we lump all the locally joint variation together in a locally joint part when discussing the models in the text below. This is not a restriction in itself, and the methods described below do not depend on how the found components are stored. We do this to simplify the notation and the algorithm descriptions. In order to separate the locally joint variation in different submodels, all that is needed is to save the local models separately in Steps 26–28 in the Full algorithm, and in Steps 22–24 in the Partial algorithm.

The approaches described below are thus intended to remedy the drawback of the originally proposed OnPLS method by splitting the matrices into three parts: the globally joint, locally joint and unique parts. We then have a completely decomposed model of the form

$$X_i = \underbrace{X_{G,i}}_{\text{globally joint}} + \underbrace{X_{L,i}}_{\text{locally joint}} + \underbrace{X_{U,i}}_{\text{unique}} + \underbrace{E_i}_{\text{residual}}, \quad (17)$$

in which all locally joint variation is put in the same matrix $X_{L,i}$ for brevity.

2.2.1. Full approach

The variation that the full approach finds is illustrated in Figs. 1 and 2. The locally joint and unique variation are split (in contrast to the originally proposed OnPLS method, which keeps them together) so that all three kinds of variation are separated.

First, the globally joint variation is found and extracted as described in Section 2.1 and in [27,28]. The globally joint variation is thus not changed by the subsequent decomposition of the locally joint and unique variation.

Fig. 2. The parts of an OnPLS model. The methods described in this article aim to divide the variation in each matrix considered into three parts: a globally joint part, i.e. the part that each matrix shares with all other matrices (the black area in the centre); a locally joint part containing the variance that each matrix shares with some but not all of the other matrices (the areas marked by lines, dots and squares); and a part containing variance unique to each matrix (open areas).

Fig. 3. An illustration of the full and the partial approaches with three blocks. The left part describes how Algorithm 2 (see Appendix B) works and the right part describes how Algorithm 3 (see Appendix B) works. Both the full and the partial algorithms first compute the global model, connecting all n matrices. The full algorithm then computes a full set of models for successively smaller subsets of matrices, and selects the model with the highest score (model quality). Found models are deflated, and the procedure is restarted either with the same cardinality of the subsets, or with a smaller subset. The partial algorithm, on the other hand, computes a model of all blocks, removes the “weakest”, builds a new model and repeats until no block is “weak”. When this happens it deflates the found variation and restarts. Both the full and the partial approaches continue these procedures until there is no more variation connecting the matrices. The variation that is left is thus by construction unique; a PCA model is built to model the unique variation in each block.

We then recursively compute a new “regular” OnPLS model for each combination of n − 1 matrices, and identify the combination of matrices that yields the highest value of $f_C$ in Eq. (12). From this combination of matrices (neglecting other combinations) we deflate and extract one locally joint component, as described above for the globally joint components.

This procedure is repeated for n − 1 matrices until there are no more significant locally joint components for that number of matrices; the procedure is then repeated for n − 2 matrices, and so on, until all components in all combinations of successively fewer matrices have been examined.

Note that some combinations of matrices will represent models that are not connected according to the original path structure of the model defined by C. These submodels need to be discarded entirely, or the original path structure needs to be revised.

When no more locally joint variation can be found for any matrices, the remaining variation in each matrix is necessarily unique. A cross-validated PCA of the remaining variation is computed to separate systematic variation from noise. The systematic variation found is the unique variation for the matrix.

The approach described here does not depend on the order in which the matrices are processed; the variation found is optimal and equivalent (but on a lower level) to the globally joint variation.

The full approach is explained in the left-hand part of Fig. 3 and in Algorithm 2 in Appendix B.
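Schematically, the search loop of the full approach looks as follows (our sketch of Algorithm 2 in Appendix B; fit_onpls, connected, strong_enough and the model object's deflate method are hypothetical stand-ins for the corresponding steps):

```python
from itertools import combinations

def full_approach_local_models(Xs, fit_onpls, connected, strong_enough):
    """Search successively smaller subsets for locally joint components.

    fit_onpls(subset) -> (model, quality) for a tuple of block indices;
    connected and strong_enough are the path-connectivity and
    significance checks of Algorithm 2.
    """
    n = len(Xs)
    local_models = []
    for size in range(n - 1, 1, -1):       # cardinalities n-1, ..., 2
        while True:
            best, best_quality = None, 0.0
            for subset in combinations(range(n), size):
                if not connected(subset):
                    continue               # respect the path structure C
                model, quality = fit_onpls(subset)
                if strong_enough(model) and quality > best_quality:
                    best, best_quality = model, quality
            if best is None:
                break                      # no more components at this size
            best.deflate(Xs)               # remove the found component
            local_models.append(best)
    return local_models
```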

2.2.2. Partial approach

For n matrices there are $2^n − n − 2$ combinations of matrices with potential locally joint variation that need to be investigated in the full approach (all combinations minus each matrix on its own, the combination of all matrices and the empty set). When n is small this is not problematic, but if n is about ten or larger the calculations become prohibitive with current personal computers. Therefore we propose the following approach for cases where the full approach is not feasible.

A one-component nPLS or OnPLS model of all n matrices is created. The significance of the components found is examined and the matrix corresponding to the least significant component is discarded. A new nPLS or OnPLS model is created of the n − 1 matrices that were kept, and this is repeated until no more matrices are discarded, i.e. until all components in the matrices that are kept are significant. If no matrices were discarded at all, the globally joint model may need to be revised. If all, or all but one, of the matrices are discarded, there are no more locally joint components.

Note that matrices can only be discarded if the resulting model is connected according to the original structure of the model as defined by C. This means that the next to least significant component may need to be discarded if discarding the least significant component would result in a disconnected path model, and so on. A “regular” OnPLS model is then created of the combination of matrices that were deemed significant. From this combination of matrices we extract one locally joint component that is deflated from the matrices as described above, and then the procedure starts again from the beginning.

The significance of the model and its components can be assessed using various appropriate techniques, e.g. cross-validation, analysis of components' contributions to variance, t-tests and tests of correlations between components. In the examples below, components had to contribute more than 1% of the variance and have coefficients of correlation with the others exceeding 0.5 to be deemed significant.

When the locally joint model has been extracted, the unique variation is found, as in the full approach, by a cross-validated PCA of the residual matrices to separate systematic variation from noise.

This approach is also symmetric; it is less computationally demanding than the full approach and, as we will see below, it also gives very good results.

The partial approach is explained in the right-hand part of Fig. 3 and in Algorithm 3 in Appendix B.
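One elimination round of the partial approach can be sketched in the same style (again our own schematic of Algorithm 3, with hypothetical helper callables; the tie-breaking rule for components whose removal would disconnect the path model is omitted for brevity):

```python
def partial_approach_one_component(Xs, fit_npls, connected,
                                   strong_enough, weakest_block):
    """One round of the partial approach: shrink the block set until all
    components are significant, then return the surviving subset.

    fit_npls(subset)     -> one-component nPLS/OnPLS model
    weakest_block(model) -> index of the least significant block
    """
    subset = list(range(len(Xs)))
    while True:
        model = fit_npls(subset)
        if strong_enough(model):
            return subset, model   # extract and deflate this component
        candidate = [i for i in subset if i != weakest_block(model)]
        if len(candidate) < 2 or not connected(candidate):
            return None, None      # no more locally joint components
        subset = candidate
```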

2.3. Model validation and selecting the numbers of components

The numbers of components in the first step of OnPLS, i.e. where the SVD between pairs of blocks is computed (in Eqs. (4) and (5)), were decided using cross validation (performed using SIMCA 12, MKS Umetrics AB, Umeå, Sweden).

Table 1
Components in Example 1. The distribution of components in the simulated data set (Example 1). Each column indicates a component and each row indicates a matrix (and thus which components are present in each matrix). The columns for which all matrices have checkmarks represent globally joint variation (components 1 and 2). The columns for which two or more, but not all, matrices have checkmarks represent locally joint variation (components 3 through 10). Finally, the columns for which only one matrix has a checkmark represent unique variation (columns 11 through 16). The last column indicates the amount of variation (sum of squares) for each matrix and the last row indicates the amount of variation for each component. Note also that about 1% of normally distributed noise was added to each matrix (not indicated in this table).

Matrix   Global (1–2)    Locally joint (3–10)   Unique (11–16)   SS
X1       both            eight components       one component    44
X2       both            eight components       one component    44
X3       both            three components       one component    17
X4       both            three components       one component    12
X5       both            three components       one component    9
X6       both            none                   one component    3

Component SS: 1 (components 1–4), 4 (components 5–7), 9 (components 8–10) and 1 (components 11–16).

This is also how we determined the adjacency matrix C. The numbers of components between pairs of matrices were put as the elements of C. The lowest non-zero number of components in C was considered an upper limit on the number of globally joint components.

The numbers of non-globally joint components were determined by an extensive search of combinations of non-globally joint components for each block. For each number of non-globally joint components the following criterion was used to differentiate the different models, namely

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_{i,j}\, \mathrm{corr}(t_i, t_j) + |C_i| \sum_{i=1}^{n} \mathrm{corr}(w_i, p_i), \quad (18)$$

where $|C_i|$ is the number of non-zero elements of the i-th row of the adjacency matrix C. This scaling constant is added to normalise the values (i.e. put them on the same level) of the two sums; with this normalisation they have the same maximum. The correlation between scores and the correlation between weights and loadings are the two quantities that increase in O2PLS when unique components are removed. This is thus a generalisation to OnPLS of that property of O2PLS. This quantity increases up to a limit for an increased number of components, and when the quantity does not increase anymore, we have found the numbers of non-globally joint components to use. The components are also subject to the correlation and variance constraints mentioned above.

The ModelQuality function in Algorithms 2 and 3 is Eq. (12), and the StrongEnough function honours the constraints mentioned above: that the correlation between connected scores be at least 0.5 and that the contribution of each component be more than 1% (i.e. an R2 value higher than 0.01). These criteria (correlation and variance thresholds) are the same as those used in [25,26], but with lower thresholds. We thus allow models with weaker components than those used in [25,26], but these criteria could be altered to better fit the needs of the analyst, the analysis or the data. It is very important that highly correlated variables also carry explained variance, in order to avoid modelling noise. Further, it is important that components be correlated, but if they are only weakly correlated, we assume that they belong to a submodel and exclude them. Also, because of the excessive number of models built, we need to use criteria that are fast to compute. It is simply not possible to perform e.g. cross validation, because it would take too much time to compute.
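Read with $|C_i|$ applied per block, so that the two sums share the same maximum $\sum_i |C_i|$ as the text states, Eq. (18) can be computed as in the following sketch (our reading and our names, not the authors' code):

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def model_quality(ts, ws, ps, C):
    """Model-selection criterion in the spirit of Eq. (18).

    ts : list of score vectors t_i; ws, ps : weight and loading vectors;
    C  : (n x n) 0/1 adjacency matrix (zero diagonal assumed).
    """
    n = len(ts)
    score_term = sum(C[i, j] * corr(ts[i], ts[j])
                     for i in range(n) for j in range(n))
    # |C_i|, the number of connections of block i, scales the second sum
    # so both terms have the same maximum, sum_i |C_i|.
    load_term = sum(C[i].sum() * corr(ws[i], ps[i]) for i in range(n))
    return score_term + load_term
```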

Finally, the unique variation left after the globally and locally joint variation has been extracted is found using a cross-validated PCA (also performed using SIMCA 12, MKS Umetrics AB, Umeå, Sweden).

3. Applications

3.1. Simulated data

The simulated data set considered here consists of six matrices, $X_i$, with i = 1, . . ., 6, which have two globally joint components, varying numbers of locally joint components (none, three or eight) and a single unique component each (see Table 1 and Fig. 4). About 1% of normally distributed noise was added to each matrix. The data set is identical to that used to illustrate the use of OnPLS in Example 4 in [27], apart from the addition of the unique components.

3.2. Integrated analysis of metabolomic, proteomic and transcriptomic data

The real data set considered in this paper was acquired from a metabolomic, proteomic and transcriptomic analysis of three hybrid aspen (Populus tremula × Populus tremuloides) genotypes, reported in [24]. The genotypes were: wild-type (WT), used as a reference; G5, carrying several antisense constructs of the growth-related gene PttMYB21a; and G3, carrying a single antisense construct of the gene. Ten replicated samples of xylem tissue were collected from stems at three internode positions (denoted A–C: A close to the top, B and C equally spaced farther down the stem) of plants representing each of the three genotypes. Of the resulting 90 samples, 36 were selected using a multivariate selection strategy, and three of the selected samples were identified as outliers and removed.

The samples were analysed using GC/TOFMS, and 281 metabolites corresponding to various peaks were identified and quantified. Proteins were extracted from frozen tissue powder and digested, then recovered peptides were analysed by UPLC-MS, resulting in the detection of 3132 peptide markers. The transcript profiling was performed using cDNA microarrays, resulting in 27 648 single-spotted cDNA clones from the Populus genus. See [24] for more information about sample selection, data acquisition, preprocessing, etc.

Fig. 4. Illustration of how the matrices overlap in Example 1. Schematic diagram showing how the six matrices in Example 1 overlap. The matrices all overlap in the centre, there are particular combinations of overlaps between some, but not all, matrices and each matrix has a unique part that does not overlap with any other matrices. Note that X6 has been split into two parts: one that overlaps with all other matrices and another (to the right) that does not overlap with any other matrices.


Table 2
Summary statistics for Example 1. Summary statistics of the Full and Partial approaches. The summary for the Full approach is presented in the first column, and the summary for the Partial approach is presented in the second column. The R2 values for each component are presented in the rows of three parts: one each for the global, local and unique parts. The components are listed from top to bottom in each part, and the amount of variance (if any) they capture in each block is listed from left to right, as the column labels indicate. The values in parentheses are the true values. Below the sum of each part is the modified RV coefficient between the true and extracted components (this value is 1 for a perfect model). In the last row the model's total R2 value is presented. This value should be about 0.99, since about 1% of noise was added to each matrix. Note that the R2 values for the locally joint models differ between the full and the partial approaches, but also that the variation that the local models capture for each matrix is very similar to the true variation (when looking at the RVmod values). The locally joint variation is thus distributed differently among the components in the two approaches, but they both capture the same variation.

Full model:

                       X1           X2           X3           X4           X5           X6
Global
R2(t_G,1 p_G,1^T)      0.02         0.06         0.06         0.16         0.19         0.33
R2(t_G,2 p_G,2^T)      0.02         0.00         0.06         0.01         0.07         0.33
Sum                    0.04 (0.05)  0.06 (0.05)  0.12 (0.12)  0.18 (0.17)  0.27 (0.22)  0.67 (0.67)
RVmod(T_G P_G^T, X_G)  0.99         0.84         1.00         0.99         0.91         1.00
Local
R2(t_L,1 p_L,1^T)      0.19         0.25         0.18         0.32         0.33         –
R2(t_L,2 p_L,2^T)      0.32         0.24         0.48         –            0.26         –
R2(t_L,3 p_L,3^T)      0.36         0.31         –            0.13         0.05         –
R2(t_L,4 p_L,4^T)      0.02         0.02         0.11         –            0.05         –
R2(t_L,5 p_L,5^T)      0.02         0.02         0.02         0.26         –            –
R2(t_L,6 p_L,6^T)      –            0.02         –            0.09         –            –
Sum                    0.90 (0.93)  0.84 (0.93)  0.79 (0.82)  0.80 (0.75)  0.68 (0.67)  – (–)
RVmod(T_L P_L^T, X_L)  1.00         0.95         0.99         0.96         0.97         –
Unique
R2(t_U,1 p_U,1^T)      0.03         0.07         0.07         0.02         0.04         0.33
R2(t_U,2 p_U,2^T)      0.02         0.03         0.02         –            –            –
Sum                    0.05 (0.02)  0.09 (0.02)  0.08 (0.06)  0.02 (0.08)  0.04 (0.11)  0.33 (0.33)
RVmod(T_U P_U^T, X_U)  0.69         0.33         0.96         0.72         0.81         0.99
Total sum              0.99         0.99         0.99         0.99         0.99         0.99

Partial model:

                       X1           X2           X3           X4           X5           X6
Global
R2(t_G,1 p_G,1^T)      0.02         0.06         0.06         0.16         0.19         0.33
R2(t_G,2 p_G,2^T)      0.02         0.00         0.06         0.01         0.07         0.33
Sum                    0.04 (0.05)  0.06 (0.05)  0.12 (0.12)  0.18 (0.17)  0.27 (0.22)  0.67 (0.67)
RVmod(T_G P_G^T, X_G)  0.99         0.84         1.00         0.99         0.91         1.00
Local
R2(t_L,1 p_L,1^T)      0.41         0.39         0.57         –            –            –
R2(t_L,2 p_L,2^T)      0.11         0.04         0.21         0.52         –            –
R2(t_L,3 p_L,3^T)      0.32         0.30         –            –            –            –
R2(t_L,4 p_L,4^T)      0.04         0.05         –            –            0.54         –
R2(t_L,5 p_L,5^T)      0.03         0.01         –            0.13         0.06         –
R2(t_L,6 p_L,6^T)      –            0.06         0.03         0.07         0.06         –
Sum                    0.91 (0.93)  0.86 (0.93)  0.80 (0.82)  0.72 (0.75)  0.66 (0.67)  – (–)
RVmod(T_L P_L^T, X_L)  0.99         0.97         1.00         1.00         0.98         –
Unique
R2(t_U,1 p_U,1^T)      0.03         0.05         0.06         0.09         0.06         0.33
R2(t_U,2 p_U,2^T)      –            0.02         0.01         –            –            –
Sum                    0.03 (0.02)  0.07 (0.02)  0.07 (0.06)  0.09 (0.08)  0.06 (0.11)  0.33 (0.33)
RVmod(T_U P_U^T, X_U)  0.96         0.46         0.97         0.99         0.85         0.99
Total sum              0.98         0.99         0.99         0.99         0.99         0.99


Fig. 5. Variation explained by the matrices in Example 2. Variation explained by the different parts (global, local, unique and residual) of the OnPLS model for the metabolite, protein and transcript data. “Global” here means the globally joint variation, and “Local” means locally joint variation. The numbers for the globally joint model are similar to the amount of variation found by the sequential O2PLS approach. The elevated amount of joint variation in the transcript model is explained in [24] as being related to a reduction of noise before building the model.

The authors of [24] described a method for multiblock data analysis with O2PLS-type deflation of locally joint and unique variation using a sequential application of O2PLS that they applied to the data. In the study reported here, these data were analysed using OnPLS and the results were compared to the results of the sequential O2PLS method presented in [24].

4. Results

4.1. Simulated data

One Full model and one Partial model were created. The variation captured is presented in Table 2. Table 2 is divided into two main columns, the first for the Full model and the second for the Partial model; and three main rows, the first for the global model (two components), the second for the locally joint models (up to six components) and the third for the unique models (up to two components). In each of the two main columns there are six subcolumns, one for each block. A dash means that the corresponding submodel did not partake in that component. I.e. the first locally joint component was joint between X1, X2, X3, X4 and X5; and the fourth locally joint component was joint between X1, X2, X3 and X5 in the full model and between X1, X2 and X5 in the partial model. X6 had no locally joint components at all, which is seen in both the full and partial models by the dashes in the columns for this block. The rows indicated with “Sum” contain the sum of the variance explained by the components on the rows above; the rows indicated by RV coefficients show the RV coefficients of the model for each matrix and the corresponding true part (an RV coefficient of 1 means a perfect model).

The amount of globally joint variation extracted is now the same as in [27] (when adjusted for the extra unique component). The amount of locally joint variation found is adequate, and interestingly the amount of variation found is closer to the true value for the Partial model. The amount of unique variation found is of the correct order, although it deviates slightly from the true amount for all matrices except X6 (which does not have any locally joint components).

The modified RV coefficients [29] between the true and extracted components, presented in Table 2, indicate that the model of the globally joint variation was very good for all matrices except X2 (modified RV coefficient, 0.84), in which there still seems to be some locally joint or unique variation.

Fig. 6. Score plots for the metabolomic data. (A) The first and second joint score vectors (t1 and t2) clearly separate the internodes (A–C), showing a joint internode effect and internode gradient. (B) The first and third joint score vectors (t1 and t3) clearly separate the genotypes (G3, G5 and WT), showing a joint genotype effect.


Fig. 7. Locally joint score and loading plots between metabolites and proteins. (A) The first and second locally joint score vectors of the metabolites. (B) The first and second locally joint correlation loadings for the metabolites. (C) The corresponding locally joint score plot for the proteins. (D) The corresponding locally joint correlation loading plot for the proteins. There is a significant separation between the metabolite profiles of internodes B and C, which seems to be related to differential levels of participants (and thus presumably activity) of the Krebs cycle, e.g. the highlighted metabolites succinate and 2-oxo-glutarate.

The model of the locally joint variation was also very good, with modified RV coefficients higher than or equal to 0.95 in both the full and partial approaches. The full approach extracted the true unique variation from X3 and X6, came close to the true variation in X1, X4 and X5, but was not very close in X2. The partial approach extracted the true unique variation in all matrices except X2. Since X2 had the lowest modified RV coefficients in the globally joint, locally joint and unique models, some variation apparently spills from the global to the local and unique models and/or from the local to the unique model. A small overlap was confirmed by the modified RV coefficients between the extracted models and the true components (not shown here).

The correlations between pairs of score vectors increase as the purity of the components increases, i.e. as the locally joint and/or unique variation is extracted. The sum of all pair-wise correlations between the globally joint components in these models was greater than 9.16 for all matrices (close to the maximum possible value, 10), indicating very good extraction. Compare to Fig. 7(a) and (b) in [27].

The correlations between the weights and loadings for the two components in these models also indicate that the true underlying components were extracted, because the lowest value is 1.90, very close to the maximal attainable value of 2 (if all non-joint variation has been removed, corresponding weight and loading vectors should all have correlations close to 1).

A simulation study was conducted to investigate the general properties of the two modelling approaches and to compare OnPLS and nPLS models with different numbers of matrices (3–4), sizes (60 × 40 to 60 × 80) and numbers of globally joint (1–3), locally joint (1–3) and unique (1–3) components (with 1% normally distributed noise added in each case), totalling 108 models for each approach. These 108 models thus yielded 108 values for each measure.

Finding the locally joint components took almost three times longer on average using the full approach than the partial approach when n = 3, and more than six times longer when n = 4.

We calculated 108 modified RV coefficients for each matrix between the true and extracted globally joint components. The sum of these modified RV coefficients for each model was larger for the OnPLS models (2.64 when n = 3 and 3.57 when n = 4, on average) than for the corresponding nPLS models (1.75 when n = 3 and 2.35 when n = 4, on average) in all but one case (99.1%), and in the case where the RV sum was smaller for the OnPLS model the difference was only about 0.1.

The sum of modified RV coefficients for each locally joint model was slightly larger for the full approach than for the partial approach when n = 3, but the sum was larger for the partial approach when n = 4, by about 0.2. The sum of modified RV coefficients for each unique model was larger for the partial approach than for the full approach, by about 0.1 when n = 3 and about 0.5 when n = 4. The sums of correlations between pairs of score vectors for the true and extracted components were higher in the OnPLS models than in the corresponding nPLS models. Similarly, the sums of correlations between weight and loading vectors in the OnPLS models were higher than their sums in the nPLS models.

The amounts of joint variation found (R2 values for each matrix) for the OnPLS and nPLS models were compared to the true amount of joint variation. The correlations between the 108 model values and the 108 true values were higher in all cases for the OnPLS models compared to the nPLS models.

The locally joint variation found for the OnPLS models was also compared to the true amount of locally joint variation. The correlations were similar, but on average slightly higher for the partial approach than for the full approach. A corresponding comparison of the unique OnPLS variation and the true variation showed that the correlation for each matrix was slightly higher for the partial approach than for the full approach.

4.2. Integrated analysis of the metabolomic, proteomic and transcriptomic data

An OnPLS model of the omic data (Example 2) was constructed with: three globally joint components; four locally joint X1–X2 components; three locally joint X1–X3 components; three locally joint X2–X3 components; and twelve, twelve and eight unique components for the respective matrices.

The amount of variation explained by the different parts of the model for each data set is presented in Fig. 5. The globally joint variation is the part of each matrix that is shared with all other matrices, the locally joint variation is the part of each matrix shared with one of the other matrices and the unique variation is systematic variation that is not represented in any of the other matrices.

The same two main effects that were found by the global model in [24] were also found by the global OnPLS model. The first main effect is an internode gradient reflecting the developmental stage of the stem samples, as illustrated in Fig. 6(A), and the second is a separation of genotypes G3 and G5 from the wild-type, as illustrated in Fig. 6(B). Similar conclusions can be drawn from the correlation loading plots of the globally joint OnPLS model as in [24], since the same, or similar, variables end up in corresponding positions in these plots (not shown).

The globally joint models are thus very similar when using OnPLS and sequentially applied O2PLS. The benefit of OnPLS (in addition to having a clear objective function and being symmetric) is that it separates locally joint and unique variation, as will be illustrated next.

The first and second locally joint score vectors in a local model between metabolites and proteins are plotted in Fig. 7(A) and (C). The corresponding locally joint correlation loadings are plotted in Fig. 7(B) and (D). The variables highlighted in the correlation loading plots were found as the 25 variables, identified by Bylesjö et al. [24], with the greatest Mahalanobis distances from the origin. Note that this is essentially an O2PLS model between metabolites and proteins, but where the variation that is related between metabolites, proteins and transcripts has been removed. This model is thus unrelated to the transcript data.

The score plots mainly separate the internode B samples from the internode C samples, particularly so for the metabolites (where a t-test gives p ≈ 0.01). The proteins do not have a significant separation at the 95% level (the separation is marginally significant with p ≈ 0.057), but their local score vectors are strongly correlated to the local score vectors of the metabolite data (for which the correlation is greater than 0.86 for all four locally joint components). In addition, the first two locally joint components capture 15% of the variation in the metabolites and 11% of the variation in the proteins, so the separation is considered biologically interesting.

Four distinct groups can be observed among the identified metabolites that strongly contribute to the correlation loadings of the metabolite data (Fig. 7(B)). After searching the KEGG (Kyoto Encyclopedia of Genes and Genomes) database we saw that four of the identified metabolites (β-alanine, asparagine, succinate and 2-oxo-glutarate) are all involved in the alanine, aspartate and glutamate metabolism pathway, while, as mentioned above, two others (succinate and 2-oxo-glutarate) are involved in the Krebs (citric acid) cycle.

In the correlation loading plot for the protein data, Fig. 7(D), we can observe three different groups of loadings. However, only a few of the corresponding proteins were identified by Bylesjö et al. [24], one of which was malate dehydrogenase, an enzyme catalysing the conversion of malate to oxaloacetic acid in the Krebs cycle. The separation between internodes B and C in the local model could thus be related to the Krebs cycle. This separation is different from the internode effect that was found in the globally joint model in Fig. 6(A) and is not present in the transcript data.

5. Discussion and conclusions

This paper presents a novel extension of the recently published OnPLS multiblock data analysis method [27]. The method is well suited for data exploration studies where the objective is to find joint structures between several different sets of measurements. OnPLS finds joint structures within all matrices, within all subsets of measurements and also the unique structures that only exist in one of the blocks.

Before the extension presented here, the local and unique variation was lumped together and analysed together when OnPLS was used, but in this paper the OnPLS method has been extended to allow decomposition into global, local and unique variation in multiblock models. The variation that is shared with some, but not all, other matrices is found and denoted locally joint. The variation remaining after the globally joint and locally joint variation has been extracted is by construction unique.

The extended OnPLS algorithm that is presented in this paper applies the “original” OnPLS algorithm [27] recursively on smaller and smaller subsets of blocks to find variation that is joint between all combinations of blocks. OnPLS achieves this by applying O2PLS on pairs of matrices, finding an analogue to the weight matrices in O2PLS for each block and then using the same approach as in O2PLS to extract all variation that is not joint between all involved blocks.

OnPLS can handle any number of blocks, but the complexity of the model, and the computational burden, increase with n, the number of blocks. Perhaps the most natural way to find the locally joint variation is to examine successively smaller subsets of matrices and find the joint variation between them. The computational burden of this approach, called the full approach in this paper, grows exponentially with n. Because of this, the partial approach was suggested as a computationally less demanding approach to the same problem. It was shown in this paper that these two approaches both give satisfying and comparable results. Both approaches were designed in this way to be symmetric, i.e. they do not favour any order in which the blocks are analysed.

An OnPLS model is interpreted in much the same way as its related methods (such as generalised canonical correlation, MAXDIFF, PLS-PM and O2PLS, and even PLS regression). But OnPLS has an interpretational advantage compared to previous methods in that joint structures between subsets of blocks, or unique structures within one block, can be interpreted separately.

The synthetic examples presented in this paper showed that the proposed method is capable of extracting relevant variation for all models (global, local and unique) and that both of the proposed approaches give satisfying results. The real example showed that the OnPLS method is able to extract biologically relevant information from both the global and local models of metabolite, protein and transcript data. The extended OnPLS method found local variation between metabolites and proteins that turned out to contain biologically relevant information pertaining to the Krebs cycle.

The main drawback of the extended OnPLS method, as it stands currently, is in deciding how many components to extract in each model. Full cross validation of all submodels is not feasible because of the excessive number of models built, especially if the blocks are large. The results change when different numbers of components are extracted, and the components will be distributed differently among the submodels. While we are confident from experience that the fairly simple model validation rules we used in this work perform well, this is still an important topic for future work.

This paper shows that the extended OnPLS method gives satisfying and important decompositions of a set of matrices into parts that are related between all of them, parts that are related between subsets of them, and parts that are unique to each block.

Acknowledgements

Many thanks to Dr. Max Bylesjö for helping us use the data from [24]. This research was supported by the Swedish Research Council (JT) grant no. 2011-6044, MKS Umetrics AB (TL), and the Swedish National Strategic e-Science Research Program eSSENCE (JT).

Appendix A. The OnPLS algorithm

Algorithm 1. The OnPLS algorithm
Input: A set of n matrices Xi, for i = 1, ..., n, and adjacency matrix C
Output: Globally joint score matrices TG,i, weight matrices WG,i and loadings PG,i; and non-globally joint score matrices TLU,i, weight matrices WLU,i and loadings PLU,i

 1: {Find pairwise joint spaces}
 2: for i = 1 to number of matrices n do
 3:   for j = 1 to number of matrices n, with j ≠ i do
 4:     Vi,j Σi,j Wi,j^T ← SVD(Xj^T Xi)
 5:     Si ← [Si | Wi,j]
 6:   end for
 7:   Wi Σi Vi^T ← SVD(Si)
 8: end for
 9: {Build non-globally joint model}
10: for i = 1 to number of matrices n do
11:   for a = 1 to number of non-globally joint components in Xi do
12:     Ti ← Xi Wi
13:     XLU,i ← Xi − Ti Wi^T
14:     wLU,i,a ← EIG(XLU,i^T Ti Ti^T XLU,i)
15:     tLU,i,a ← Xi wLU,i,a
16:     pLU,i,a ← Xi^T tLU,i,a / (tLU,i,a^T tLU,i,a)
17:     Xi ← Xi − tLU,i,a pLU,i,a^T
18:     WLU,i ← [WLU,i | wLU,i,a]
19:     TLU,i ← [TLU,i | tLU,i,a]
20:     PLU,i ← [PLU,i | pLU,i,a]
21:   end for
22: end for
23: {Build joint nPLS model}
24: for a = 1 to number of globally joint components do
25:   ({wG,i,a}, {tG,i,a}, {pG,i,a}) ← nPLS({Xi})
26:   for i = 1 to number of matrices n do
27:     Xi ← Xi − tG,i,a pG,i,a^T
28:     WG,i ← [WG,i | wG,i,a]
29:     TG,i ← [TG,i | tG,i,a]
30:     PG,i ← [PG,i | pG,i,a]
31:   end for
32: end for
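A minimal NumPy sketch of steps 12–17 above (our transcription, not the authors' implementation) shows how one non-globally joint component is peeled off a block Xi given its joint weight matrix Wi from the stacked-SVD step:

import numpy as np

def non_global_component(Xi, Wi):
    # Step 12: scores of Xi on the joint weight matrix Wi
    Ti = Xi @ Wi
    # Step 13: remove the joint part, leaving a locally joint/unique candidate
    X_lu = Xi - Ti @ Wi.T
    # Step 14: leading eigenvector of XLU' Ti Ti' XLU (symmetric, so eigh applies)
    _, vecs = np.linalg.eigh(X_lu.T @ Ti @ Ti.T @ X_lu)
    w = vecs[:, -1:]                     # eigh sorts eigenvalues ascending
    # Steps 15-16: score and loading for this component
    t = Xi @ w
    p = Xi.T @ t / (t.T @ t)
    # Step 17: deflate Xi by the extracted component
    return w, t, p, Xi - t @ p.T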

Appendix B. The full and partial approaches

Algorithm 2. The full approach
Input: A set of matrices Xi, where i = 1, ..., n
Output: Globally joint weight, score and loading matrices WG,i, TG,i and PG,i; locally joint weight, score and loading matrices WL,i, TL,i and PL,i; unique score and loading matrices TU,i and PU,i; and residual matrices Ei.

 1: for cardinalities C ← n, ..., 2 do
 2:   loop
 3:     fmax ← 0
 4:     Mmax ← ()
 5:     for all subsets {Xk} ⊆ {Xi} of cardinality |{Xk}| = C do
 6:       if Connected({Xk}) then
 7:         ({wk}, {tk}, {pk}) ← OnPLS({Xk})
 8:         if StrongEnough({wk}, {tk}, {pk}) then
 9:           if ModelQuality({wk}, {tk}, {pk}) > fmax then
10:             fmax ← ModelQuality({wk}, {tk}, {pk})
11:             Mmax ← ({Xk}, {wk}, {tk}, {pk})
12:           end if
13:         end if
14:       end if
15:     end for
16:     if fmax = 0 then
17:       Break loop.
18:     end if
19:     for all Xj ∈ {Xk} in Mmax do
20:       Xj ← Xj − tj pj^T
21:       if C = n then
22:         WG,j ← [WG,j | wj]
23:         TG,j ← [TG,j | tj]
24:         PG,j ← [PG,j | pj]
25:       else
26:         WL,j ← [WL,j | wj]
27:         TL,j ← [TL,j | tj]
28:         PL,j ← [PL,j | pj]
29:       end if
30:     end for
31:   end loop
32: end for
33: for i ← 1, ..., n do
34:   TU,i PU,i^T ← PCAcv(Xi)
35: end for
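The outer search of Algorithm 2 (steps 3–15, for one cardinality C) can be summarised structurally as follows. This is only a skeleton: connected, onpls_fit, strong_enough and model_quality are hypothetical callables standing in for the Connected, OnPLS, StrongEnough and ModelQuality routines of the pseudocode:

from itertools import combinations

def best_subset_model(blocks, C, connected, onpls_fit, strong_enough, model_quality):
    # Scan every connected subset of cardinality C, fit a candidate model,
    # and keep the strongest one found (fmax = 0 means none qualified).
    f_max, m_max = 0.0, None
    for idx in combinations(range(len(blocks)), C):
        if not connected(idx):
            continue
        model = onpls_fit([blocks[k] for k in idx])
        if strong_enough(model) and model_quality(model) > f_max:
            f_max, m_max = model_quality(model), (idx, model)
    return f_max, m_max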

Algorithm 3. The partial approach
Input: A set of matrices Xi, where i = 1, ..., n
Output: Globally joint weight, score and loading matrices WG,i, TG,i and PG,i; locally joint weight, score and loading matrices WL,i, TL,i and PL,i; unique score and loading matrices TU,i and PU,i; and residual matrices Ei.

 1: findMoreComponents ← True
 2: while findMoreComponents = True do
 3:   loop
 4:     ({wi}, {ti}, {pi}) ← nPLS({Xi})
 5:     if ¬StrongEnough({wi}, {ti}, {pi}) then
 6:       findMoreComponents ← False
 7:       Break loop.
 8:     end if
 9:     ({Xk}, {wk}, {tk}, {pk}) ← RemoveWeakest({Xi}, {wi}, {ti}, {pi})
10:     if ¬Connected({Xk}) ∨ |{Xk}| < 2 then
11:       findMoreComponents ← False
12:       Break loop.
13:     end if
14:     if StrongEnough({wk}, {tk}, {pk}) then
15:       for all Xj ∈ {Xk} do
16:         Xj ← Xj − tj pj^T
17:         if |{Xk}| = n then
18:           WG,j ← [WG,j | wj]
19:           TG,j ← [TG,j | tj]
20:           PG,j ← [PG,j | pj]
21:         else
22:           WL,j ← [WL,j | wj]
23:           TL,j ← [TL,j | tj]
24:           PL,j ← [PL,j | pj]
25:         end if
26:       end for
27:       Break loop.
28:     end if
29:   end loop
30: end while
31: for i ← 1, ..., n do
32:   TU,i PU,i^T ← PCAcv(Xi)
33: end for
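Both approaches end in the same way: after all globally and locally joint components have been deflated, what remains in each block is modelled by cross-validated PCA (PCAcv). The sketch below, assuming NumPy, uses a plain truncated SVD in place of PCAcv and takes the component count as given, whereas the paper chooses it by cross-validation:

import numpy as np

def unique_model(X_deflated, n_comp):
    # Truncated SVD of the fully deflated block yields the unique
    # scores TU and loadings PU for that block.
    U, s, Vt = np.linalg.svd(X_deflated, full_matrices=False)
    T_u = U[:, :n_comp] * s[:n_comp]
    P_u = Vt[:n_comp].T
    return T_u, P_u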

References

[1] J. van der Greef, P. Stroobant, R. van der Heijden, The role of analytical sciences in medical systems biology, Curr. Opin. Chem. Biol. 8 (2004) 559–565.

[2] A. Fukushima, M. Kusano, H. Redestig, M. Arita, K. Saito, Integrated omics approaches in plant systems biology, Curr. Opin. Chem. Biol. 13 (2009) 532–538.

[3] S.E. Richards, M.-E. Dumas, J.M. Fonville, T.M. Ebbels, E. Holmes, J.K. Nicholson, Intra- and inter-omic fusion of metabolic profiling data in a systems biology framework, Chemometr. Intell. Lab. Syst. 104 (2010) 121–131.

[4] J.M.F. Ten Berge, H.A.L. Kiers, V. Van der Stel, Simultaneous component analysis, Stat. Appl. 4 (1992) 277–392.

[5] A.K. Smilde, J.A. Westerhuis, S. de Jong, A framework for sequential multiblock component methods, J. Chemometr. 17 (6) (2003) 323–337.

[6] P. Casin, A generalization of principal component analysis to K sets of variables, Comput. Stat. Data Anal. 35 (4) (2001) 417–428.

[7] S. Wold, N. Kettaneh, K. Tjessem, Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection, J. Chemometr. 10 (1996) 463–482.

[8] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, J. Chemometr. 12 (1998) 301–321.

[9] J.R. Kettenring, Canonical analysis of several sets of variables, Biometrika 58 (3) (1971) 433–451.

[10] J.P. Van de Geer, Linear relations among k sets of variables, Psychometrika 49 (1) (1984) 79–94.

[11] M. Hanafi, H.A.L. Kiers, Analysis of k sets of data with differential emphasis on agreement between and within sets, Comput. Stat. Data Anal. 51 (3) (2006) 1491–1508.

[12] M. Tenenhaus, V. Esposito Vinzi, Y.-M. Chatelin, C. Lauro, PLS path modeling, Comput. Stat. Data Anal. 48 (1) (2005) 159–205.

[13] M. Hanafi, PLS path modelling: computation of latent variables with the estimation mode B, Comput. Stat. 22 (2007) 275–292.

[14] H. Wold, Partial least squares, in: S. Kotz, N.L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 6, Wiley, 1985, pp. 581–591.

[15] H. Wold, Nonlinear iterative partial least squares (NIPALS) modelling: some current developments, in: P.R. Krishnaiah (Ed.), Multivariate Analysis, Academic Press, New York, 1973, pp. 383–407.

[16] M. Tenenhaus, M. Hanafi, A bridge between PLS path modeling and multiblock data analysis, in: V. Esposito Vinzi, W.W. Chin, J. Henseler, H. Wang (Eds.), Handbook of Partial Least Squares: Concepts, Methods and Applications, Springer, Berlin, 2010.

[17] M. Tenenhaus, V. Esposito Vinzi, PLS regression, PLS path modeling and generalized procrustean analysis: a combined approach for multiblock analysis, J. Chemometr. 19 (3) (2005) 145–153.

[18] O.M. Kvalheim, History, philosophy and mathematical basis of the latent variable approach – from a peculiarity in psychology to a general method for analysis of multivariate data, J. Chemometr. 26 (6) (2012) 210–217.

[19] R.C. Pinto, J. Trygg, J. Gottfries, Advantages of orthogonal inspection in chemometrics, J. Chemometr. 26 (6) (2012) 231–235.

[20] S. Wold, H. Antti, F. Lindgren, J. Öhman, Orthogonal signal correction of near-infrared spectra, Chemometr. Intell. Lab. Syst. 44 (1998) 175–185.

[21] J. Trygg, S. Wold, Orthogonal projections to latent structures (O-PLS), J. Chemometr. 15 (2002) 1–18.

[22] J. Trygg, O2-PLS for qualitative and quantitative analysis in multivariate calibration, J. Chemometr. 16 (2002) 283–293.

[23] J. Trygg, S. Wold, O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter, J. Chemometr. 17 (2003) 53–64.

[24] M. Bylesjö, R. Nilsson, V. Srivastava, A. Grönlund, A.I. Johansson, S. Jansson, J. Karlsson, T. Moritz, G. Wingsle, J. Trygg, Integrated analysis of transcript, protein and metabolite data to study lignin biosynthesis in hybrid aspen, J. Proteome Res. 8 (1) (2009) 199–210.

[25] I. Måge, B.-H. Mevik, T. Næs, Regression models with process variables and parallel blocks of raw material measurements, J. Chemometr. 22 (2008) 443–456.

[26] I. Måge, E. Menichelli, T. Næs, Preference mapping by PO-PLS: separating common and unique information in several data blocks, Food Qual. Prefer. 24 (1) (2012) 8–16.

[27] T. Löfstedt, J. Trygg, OnPLS – a novel multiblock method for the modelling of predictive and orthogonal variation, J. Chemometr. 25 (2011) 441–455.

[28] T. Löfstedt, M. Hanafi, G. Mazerolles, J. Trygg, OnPLS path modelling, Chemometr. Intell. Lab. Syst. 118 (2012) 139–149.

[29] A.K. Smilde, H.A.L. Kiers, S. Bijlsma, C.M. Rubingh, M.J. van Erk, Matrix correlations for high-dimensional data: the modified RV-coefficient, Bioinformatics 25 (3) (2009) 401–405.