
Colocalization of Fluorescence and Raman Microscopic Images for the Identification of Subcellular Compartments: A Validation Study†

Sascha D. Krauß, Dennis Petersen, Daniel Niedieker, Inka Fricke, Erik Freier, Samir F. El-Mashtoly, Klaus Gerwert, and Axel Mosig∗

Received Xth XXXXXXXXXX 20XX, Accepted Xth XXXXXXXXX 20XX
First published on the web Xth XXXXXXXXXX 20XX
DOI: 10.1039/b000000x

A major promise of Raman microscopy is the label-free recognition of cellular and subcellular structures in detail. To this end, identifying colocalization patterns between Raman spectral images and fluorescence microscopic images is a key step for annotating subcellular components in Raman spectroscopic images. While existing approaches to resolve subcellular structures are based on fluorescence labeling, we propose a combination of a colocalization scheme with subsequent training of a supervised classifier that allows label-free resolution of cellular compartments. Our colocalization scheme unveils statistically significant overlapping regions by identifying correlations between the fluorescence color channels and clusters obtained from unsupervised machine learning methods such as hierarchical cluster analysis. The colocalization scheme serves as a pre-selection to gather appropriate spectra, which are then used as training data to establish a supervised random forest classifier that automatically identifies lipid droplets and nuclei. We validate our approach by examining Raman spectral images overlaid with fluorescence labelings of different cellular compartments, indicating that specific components may indeed be identified label-free in the spectral image. A Matlab implementation of our colocalization software is available at http://www.mathworks.de/matlabcentral/fileexchange/46608-frcoloc.

1 Introduction

Identifying overlapping observations between different microscopic images of one and the same sample has been a recurrent topic in microscopic image analysis. While corresponding approaches to identify colocalization patterns between two fluorescence microscopic images are well-established1,2, there are essentially no established approaches for advanced microscopic setups where samples are measured across different types of microscopes. Yet, cross-microscopy-platform studies are gaining popularity and relevance. One setting where cross-platform image analysis plays an important role is the combination of Raman microscopy with fluorescence microscopy in order to obtain a label-free protocol for resolving subcellular compartments of cultured cells3. A similar setting is found in studies combining other types of vibrational microscopy, such as coherent anti-Stokes Raman scattering (CARS)4 or infrared (IR) microscopy5, with either fluorescence or brightfield microscopy. In these applications, correlating observations between vibrational spectroscopic images and fluorescence or histopathological staining images is required to obtain training data for supervised classifiers, which make it possible to resolve compartments of cellular or tissue material without labeling, using only vibrational microscopy.

† Electronic Supplementary Information (ESI) available: [details of any supplementary information available should be included here]. See DOI: 10.1039/b000000x/
∗ Department of Biophysics, Ruhr-University Bochum, Universitätsstr. 150, 44780 Bochum, Germany. Tel: +49-234-32-29827; E-mail: [email protected]

The main step in the colocalization task is to use fluorescence as a means of “annotation” of spectral images, so that representative reference spectra of different cellular compartments can be collected based on an overlay between a Raman image and a fluorescence microscopic image. These reference spectra can subsequently be used for training a supervised classifier4 or for interpolating the contributions of different compartments to an observed location spectrum3. Obtaining suitable reference spectra, however, turns out to be a delicate task. A naive approach would be to use spectra from all positions where the fluorescence intensity exceeds a suitable threshold value. However, this would produce a heterogeneous data set for several reasons. Small differences in the z-layer between the fluorescence and Raman images, for instance, lead to an imperfect overlay that generally cannot be compensated for. Also, differences in confocal volume lead to slight morphological differences between the fluorescence image and the Raman spectral image. To compensate for these shortcomings and obtain consistent spectra for training supervised classifiers, one can presegment the spectral image, aiming to identify a segment that has the best possible overlap with the above-threshold positions in the fluorescence image.

In this work, we present a systematic computational approach to utilizing colocalization across different microscopy platforms. This colocalization approach yields supervised classifiers, for which we introduce an appropriate validation measure that allows us to systematically assess robustness across a larger set of samples. Our approach utilizes ideas developed in the context of analyzing colocalization between two fluorescence images. Being based on presegmentations, our colocalization procedure naturally carries over to settings involving other combinations of microscopes.

Our reference application of resolving the subcellular organization of cells is an important foundation for studying the function of proteins, with applications ranging from identifying disease-related location patterns6–8 to the characterization of drug response9. While the gold standard for identifying cell organelles is fluorescence microscopy10, label-free approaches based on Raman3 or CARS4 microscopy promise to overcome the need for fluorescent labeling of the sample under consideration. In this contribution, we present a systematic validation of such colocalization studies between vibrational microspectroscopic and fluorescence microscopic images. While one variant of this method has been investigated previously4, the present contribution provides a more general approach to colocalization involving different colocalization measures, including a quantitative comparison of these measures. As a guiding example for our study, we investigate the fully automated identification of nuclei and lipid droplets (LD) in colon and pancreatic cancer cell lines. Knowledge about these two organelles is valuable, because their size, morphology, and amount can be signs of cancer and infections11–14.

1.1 Segmentation of Raman Microscopic Images

Raman microscopy makes it possible to characterize cell or tissue samples with a pixel resolution of a few hundred nanometers, where each pixel location is represented by a Raman emission spectrum. Biologically or chemically relevant information is commonly obtained by high-dimensional data analysis of the pixel spectra using techniques such as supervised and unsupervised learning or factorization methods.

Using Raman (and also CARS) microscopy to resolve different parts of subcellular architecture has proven successful in several studies3,4,15–17, based on a large choice of either clustering approaches or interactive segmentation tools18. In order to obtain cellular images from the pixel spectra of a microspectroscopic image, Miljkovic et al.16 compare methods that segment the pixel spectra of one dataset into base classes, and categorize the commonly employed approaches into crisp clustering, where each pixel is assigned one similarity class, and soft clustering, where each pixel spectrum is decomposed into a mixture of several base spectra. Remarkably, the study by Miljkovic et al.16 as well as most other studies investigate

unsupervised approaches in the sense that the observed spectra of one dataset are partitioned into base classes. Which of the identified base classes corresponds to which cellular compartment is then essentially left to subsequent visual inspection, e.g. using fluorescence images of the same sample.

The first studies to shift from this unsupervised paradigm to supervised approaches are provided by Klein et al.3 and Bocklitz et al.19. Klein et al.3 systematically overlay a Raman spectral image with fluorescence labelings of the same sample. As each organelle to be identified is labeled by one marker protein, they identify Raman spectral bands that are most informative for one particular organelle by measuring mutual information between spectral bands and fluorescence intensities. These spectral bands are utilized in a supervised learning spirit to infer a nonlinear interpolation function, which can predict a fluorescence intensity from a given pixel spectrum. This results in an intensity image in the spirit of a soft clustering approach. Compared to unsupervised soft clustering, and due to the supervised approach of inferring a prediction function, the base class intensities can be assigned to one cellular organelle. Furthermore, supervised approaches were recently used to automatically identify colon tissue types including adenocarcinoma in Raman spectral datasets20, following an annotation-based approach as commonly employed in IR microscopy based spectral histopathology5,21. In the latter studies, random forests (RF) turned out to be convenient tools for supervised classification of both Raman and IR spectra due to their simplicity and efficiency as well as their robustness against overfitting.

While in a previous contribution4 a colocalization approach was introduced to train supervised classifiers for resolving subcellular architecture, our present contribution provides a systematic comparison between different correlation measures for this approach, along with a cross-validation scheme that provides a more realistic assessment of the classification power than conventional cross-validation. As further contributions of this work, we demonstrate that the colocalization-based training of supervised classifiers originally proposed for CARS data in the above-mentioned work also performs well on Raman spectral images, and we assess classifiers for Raman spectra, in particular with respect to factors such as subcellular organelle, cell type, and confounders.

Supervised classification for resolving subcellular structures has been broadly investigated on the basis of morphological features extracted from fluorescence images9,22–24. An advantage of combining label-free Raman microscopy with supervised classification is that once a supervised classifier has been trained, it can be applied to new datasets to identify organelles without any fluorescence labeling or visual inspection of either spectra or segmentations. At the same time, the accuracy of supervised classifiers can be quantified using well-established methods. A complication introduced by Raman microscopy is that both training and, more importantly, validating these classifiers needs to deal with the presence of hundreds or thousands of feature vectors for each component in each cell (namely one vector for each pixel), whereas fluorescence microscopic images yield a single feature vector for each cell and each fluorescence labeling. To deal with the abundance of spectra for training classifiers, our approach follows the procedure typically taken in spectral histopathology to resolve tissue structure in tissue sections5,21. In these approaches, one first collects training spectra that are representative of different tissue components. Then, based on these spectra, a supervised classifier is trained. For the validation of Raman spectral classifiers, we use the concept of leave-one-sample-out cross-validation, where all spectra from one sample are assigned to either the training or the validation set. This validation scheme facilitates a systematic assessment of the robustness across a larger set of samples, whereas validation in previous studies was either limited to a single sample19 or a small number of samples3, lacking a comprehensive validation measure.
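The leave-one-sample-out scheme can be sketched in a few lines. The following is an illustrative Python sketch (the published implementation is in MATLAB); the function name and data layout are hypothetical, and scikit-learn's `LeaveOneGroupOut` is assumed as the grouping mechanism, with each Raman image acting as one group:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_sample_out_accuracy(spectra, labels, sample_ids, n_trees=300):
    """Per-sample accuracies: all spectra from one sample (image) form the
    validation set, the spectra of all remaining samples the training set."""
    accuracies = []
    for train_idx, test_idx in LeaveOneGroupOut().split(
            spectra, labels, groups=sample_ids):
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        clf.fit(spectra[train_idx], labels[train_idx])
        accuracies.append(clf.score(spectra[test_idx], labels[test_idx]))
    return np.array(accuracies)
```

In contrast to conventional cross-validation, no spectrum of the held-out image can leak into the training set, which is what makes the assessment more realistic for whole-image classification.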

As with any supervised classification task, our approach involves recruiting training data, which indeed constitutes the core of our methodological approach. To obtain representative training spectra for different cellular compartments, we overlay the spectral image with its fluorescence counterpart and perform a colocalization analysis. For this colocalization analysis, we employ ideas that have been extensively and successfully utilized in previous studies to determine and quantify colocalization between two fluorescence images1,2,25–27. In our setting, one of the two fluorescence images is replaced by a presegmented version of the spectral image. As it is initially unclear which presegmentation of the spectral image will resolve a particular cellular compartment, we systematically utilize the hierarchy yielded by hierarchical cluster analysis (HCA), as illustrated in Fig. 1. Our approach to identifying representative spectra for one cellular compartment in fact amounts to identifying the branch in the HCA that exhibits the highest degree of colocalization with the corresponding fluorescence image.

1.2 Colocalization Schemes

In order to quantify which area exhibits the highest degree of colocalization between segments obtained by HCA and a thresholded fluorescence image, we employ colocalization schemes that have been established for measuring colocalization between fluorescence images. Several such approaches have been proposed in the past1,25,26, as surveyed in Bolte and Cordelières2. Among these measures, the Pearson correlation coefficient (PCC) has gained significant popularity. The PCC is defined as

$$\mathrm{PCC} = \frac{\sum_i (R_i - R_{\mathrm{avg}}) \cdot (G_i - G_{\mathrm{avg}})}{\sqrt{\sum_i (R_i - R_{\mathrm{avg}})^2 \cdot \sum_i (G_i - G_{\mathrm{avg}})^2}}, \qquad (1)$$

where $R_i$ denotes the intensity of the first color channel (red) at position $i$, and $R_{\mathrm{avg}}$ the average intensity of the red channel; correspondingly, $G_i$ and $G_{\mathrm{avg}}$ represent the pixel and average intensities of the second color channel (green).
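Eq. (1) translates directly into a few lines of array code. The snippet below is an illustrative Python/NumPy sketch (the published implementation is in MATLAB; the function name is hypothetical), applicable to any pair of equally sized images, e.g. a binarized cluster mask and a fluorescence channel:

```python
import numpy as np

def pearson_colocalization(red, green):
    """Pearson correlation coefficient of Eq. (1) between two images of
    equal shape, flattened over all pixel positions i."""
    r = np.asarray(red, dtype=float).ravel()
    g = np.asarray(green, dtype=float).ravel()
    dr = r - r.mean()   # R_i - R_avg
    dg = g - g.mean()   # G_i - G_avg
    return (dr * dg).sum() / np.sqrt((dr ** 2).sum() * (dg ** 2).sum())
```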

This motivates us to introduce the following procedure: for every possible combination of a cluster and a color channel, the degree of colocalization is calculated according to the PCC (see Fig. 1). As every possible cluster from every level of the dendrogram is checked for colocalization, the clusters with the highest PCC found for the two or three color channels might overlap, meaning that they are sub- or supernodes of each other. If this is the case, only the one with the highest value is kept in this round, and a new cluster has to be found for the remaining color channels.

Note that the first cluster chosen may cover a large area of the image, which may be much larger than the area covered by fluorescence foreground. This may prevent the identification of best matching clusters for other organelles. When assessing the suitability of a colocalization measure, it is thus of crucial importance to determine the number of unidentifiable clusters, which should be as small as possible for an appropriate measure.
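The procedure above can be read as a greedy assignment: score every (cluster, channel) pair by its PCC, walk through the pairs in decreasing order, and skip clusters that overlap a cluster that has already been assigned (the sub-/supernode case). The sketch below is an illustrative Python version under the assumption that dendrogram branches are available as boolean pixel masks; all names are hypothetical:

```python
import numpy as np

def assign_best_clusters(cluster_masks, channels):
    """Greedily assign one dendrogram cluster to each fluorescence channel.
    cluster_masks: list of boolean masks, one per dendrogram branch.
    channels: list of fluorescence images of the same shape.
    Returns {channel index: (cluster index, PCC)}."""
    def pcc(mask, img):
        m = mask.ravel().astype(float)
        f = img.ravel().astype(float)
        dm, df = m - m.mean(), f - f.mean()
        denom = np.sqrt((dm ** 2).sum() * (df ** 2).sum())
        return (dm * df).sum() / denom if denom > 0 else -np.inf

    # score every (cluster, channel) combination, best pairs first
    scores = sorted(((pcc(cm, ch), ci, hi)
                     for ci, cm in enumerate(cluster_masks)
                     for hi, ch in enumerate(channels)), reverse=True)
    assignment, used = {}, []
    for score, ci, hi in scores:
        if hi in assignment:
            continue  # this channel already has its best matching cluster
        if any((cluster_masks[ci] & cluster_masks[u]).any() for u in used):
            continue  # overlaps an assigned cluster: keep searching
        assignment[hi] = (ci, score)
        used.append(ci)
    return assignment
```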

2 Methods

2.1 Experimental Materials and Methods

2.1.1 Cell culture. Human pancreatic cancer cells MIA PaCa-2 (CRL-1420) as well as human colon adenocarcinoma cells HT29 (HTB-38) were obtained from the American Type Culture Collection (kindly provided by Stefan Hahn's laboratory at Ruhr University Bochum). They were treated as described previously4.

2.1.2 Confocal Raman microscopy. Raman hyperspectral data sets were acquired using a confocal Raman microscope (Alpha300AR, WITec Inc., Ulm, Germany) coupled to a frequency-doubled solid-state laser operating at 532 nm (Nd:YAG, max. 40 mW, Reno, USA), using a laser power of 10 mW. A 25 µm diameter single-mode optical fiber was used to couple the laser radiation into a Zeiss microscope. The incident laser beam was collimated via an achromatic lens and passed through a holographic band-pass filter before being focused into the sample through a 60x/1.00 NA water immersion objective (Nikon, Japan). The Raman scattered light was collected with the same objective and passed through a holographic edge filter onto a multi-mode optical fiber (50 µm diameter) leading to a spectrometer equipped with a back-illuminated electron-multiplying charge-coupled device (emCCD) camera (1600 × 200 px) operating at −60 °C. The sample was located on a piezoelectrically driven scanning stage. Raman data sets were obtained by raster-scanning with a pixel size of 0.5 µm for regions of around 60 µm × 60 µm and an exposure time of 0.3 s per pixel.

Fig. 1 Dendrogram from HCA including exemplary overlays of clusters with the fluorescence color channel representing the nucleus. In the cell overlay plots, the overlap of cluster and fluorescence is shown in yellow, the rest of the fluorescence in red, the rest of the cluster in green, and the rest of the cell (neither cluster nor fluorescence) in black. (A) The best matching cluster (colocalized with a PCC of 0.89), shown in a cell overlay plot and labeled in blue in the dendrogram. (B) The least matching cluster (anti-colocalized with a PCC of −0.43), shown in a cell overlay plot and labeled in orange in the dendrogram. (C) The last cluster, consisting of both the best and least matching clusters, barely colocalized with a PCC of 0.11.

2.1.3 Fluorescence staining and imaging. After permeabilization with 0.2% Triton X-100 for 5 min at room temperature, the cells were washed with PBS and blocked with 1% bovine serum albumin for 30 min. The cells were incubated for 10 min with LD540 (4,4-difluoro-2,3,5,6-bis-tetramethylene-4-bora-3a,4a-diaza-s-indacene), washed with PBS buffer, and incubated with 1,5-bis[2-(dimethylamino)ethyl]amino-4,8-dihydroxyanthracene-9,10-dione (DRAQ-5; Cell Signaling Technology, Danvers, USA). Excess fluorescence dye was removed with PBS buffer. The fluorescence measurements were always performed sequentially on double-stained specimens with a confocal laser scanning microscope (Leica TCS SP5 II) using a Leica HCX IRAPO L (25x/0.95 W) water immersion objective. In order to enable an optimal match with the Raman images, stacks of fluorescence images were recorded with a distance of 0.5 µm between layers.

2.2 Algorithms and Data Analysis

2.2.1 Preprocessing. Cosmic spikes were removed by an impulse noise filter28, and the spectra were interpolated to a reference wavenumber scale. Further data analysis was performed on the normalized data in the regions between 700 cm−1 and 1800 cm−1 and between 2600 cm−1 and 3100 cm−1. The spectra of each image data set were hierarchically clustered based on Ward's algorithm using Pearson's correlation distance to obtain a dendrogram.
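With the common Python scientific stack, this clustering step might be sketched as follows (illustrative only; the published implementation is in MATLAB). Pixel spectra are assumed to be the rows of a matrix, and every internal dendrogram node yields one candidate cluster for the subsequent colocalization step:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_pixel_spectra(spectra):
    """Ward linkage on Pearson correlation distance (1 - r) between all
    pixel spectra; returns the linkage matrix encoding the dendrogram."""
    distances = pdist(spectra, metric='correlation')
    return linkage(distances, method='ward')

def branch_members(Z, n_leaves):
    """Pixel (leaf) indices under each internal dendrogram node - every
    such set is one candidate cluster for the colocalization scheme."""
    members = {i: [i] for i in range(n_leaves)}
    for k, row in enumerate(Z):
        a, b = int(row[0]), int(row[1])
        members[n_leaves + k] = members[a] + members[b]
    return [members[n_leaves + k] for k in range(len(Z))]
```

`fcluster(Z, k, criterion='maxclust')` additionally yields a flat segmentation into k clusters when a fixed partition rather than the full hierarchy is needed.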

The fluorescence images were scaled, clipped, and manually registered to the spectral images.

2.2.2 Colocalization Scheme. After hierarchical clustering, each branch in the dendrogram is associated with one area in the spectral image comprising a group of similar spectra. For each branch in the dendrogram, a colocalization index with the foreground locations of each corresponding fluorescence image was computed using the PCC. The branch exhibiting the highest colocalization index was considered the best matching cluster, as formally defined in Supplement 4.1.

Training spectra were extracted from the best matching cluster based on several post-processing steps, aiming to restrict the training spectra to a “condensed” core region. First, 100 intensity thresholds, ranging from 1% to 100% intensity, were tested on the fluorescence images. The image was binarized with each of these thresholds, and the PCC was computed. The threshold achieving the highest PCC was kept as the best colocalizing threshold. In other words, the HCA is also utilized to find an optimal fluorescence threshold for each fluorescence channel. With the binarized version of the fluorescence and the best matching clusters, additional enhancements are possible, starting with a connected components filter: the number of nuclei in the image was given, and it was tested whether reducing the number of connected components (keeping the biggest ones) to the number of nuclei alters the degree of correlation (without deleting more than half of the pixels). Then, isolated pixels were filtered out by grain filtering. Finally, lipid droplets were identified by their specific marker band at 1750 cm−1 and masked out whenever they were not covered by corresponding fluorescence foreground.
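Two of these post-processing steps are easy to make concrete: the threshold sweep that maximizes the PCC, and the connected components filter that keeps only the expected number of regions. The following Python sketch is illustrative (hypothetical names; SciPy's `ndimage` is assumed for the labeling):

```python
import numpy as np
from scipy import ndimage

def best_threshold(fluorescence, cluster_mask, n_steps=100):
    """Sweep binarization thresholds from 1% to 100% of the maximum
    intensity; keep the threshold whose binary image has the highest PCC
    with the best matching cluster."""
    def pcc(a, b):
        a = a.ravel().astype(float); b = b.ravel().astype(float)
        da, db = a - a.mean(), b - b.mean()
        denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
        return (da * db).sum() / denom if denom > 0 else -np.inf

    best_t, best_p = None, -np.inf
    for k in range(1, n_steps + 1):
        t = k / n_steps * fluorescence.max()
        p = pcc(fluorescence >= t, cluster_mask)
        if p > best_p:
            best_t, best_p = t, p
    return best_t, best_p

def keep_largest_components(mask, n_keep):
    """Connected components filter: keep only the n_keep biggest regions."""
    labeled, n = ndimage.label(mask)
    if n <= n_keep:
        return mask
    sizes = ndimage.sum(mask, labeled, index=list(range(1, n + 1)))
    keep = np.argsort(sizes)[::-1][:n_keep] + 1  # component labels start at 1
    return np.isin(labeled, keep)
```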

2.2.3 Implementation. All data processing was implemented in MATLAB Version 8.2 along with the Image Processing and Statistics toolboxes (The MathWorks, Natick, MA).

3 Results and Discussion

3.1 Comparison of Correlation Coefficients

We compared the values of the Pearson correlation coefficient (PCC)25, Mander's overlap coefficient (MOC)1, intensity correlation quotient (ICQ)29, and mutual information (MI)3 on a series of synthetic images involving two color channels (referred to as red and green, respectively). The image series starts with 0% overlap between the red and the green channel, and the overlap between the channels was gradually increased to 100%; see Supplementary Video 1 for an illustration. To illustrate the effect of varying overlap on the different coefficients, binary images were used, while relative intensities result in a similar pattern (data not shown). The results of these coefficients are plotted against the percentage of red pixels overlapping with green pixels (see Fig. 2 A). The ratio of background versus foreground pixels is 1:1, which leads to the desired effect that every coefficient ranges from its minimal to its maximal possible value. While for the PCC and the ICQ a negative value indicates anti-colocalization, the MOC has no corresponding anti-colocalization indicator, as it yields only positive values, which are identical to the percentage of overlap. The MI is also limited to positive numbers. Furthermore, the image series demonstrates a more severe disadvantage of MI, namely that it does not differentiate between the overlap of foreground and background pixels. In other words, the same MI value is obtained for the same degree of colocalization and anti-colocalization.

In a second series of synthetic images, the ratio of background versus foreground pixels was increased to 1000:1 (see Fig. 2 B). This high proportion of background pixels, which is realistic as far as small organelles inside cells are concerned, produces very high ICQ values and very low MI values, making them uninformative. The PCC, however, is sensitive to this ratio, whereas the MOC does not adapt at all when the ratio changes, as it does not take the probability of the colocalization into account.
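For reference, the three alternative coefficients can be written down compactly next to the PCC. The definitions below follow the cited literature (MOC1, ICQ29, and a histogram-based MI); the NumPy code is an illustrative sketch with hypothetical function names:

```python
import numpy as np

def moc(r, g):
    """Mander's overlap coefficient: pure overlap, no background term,
    so it never becomes negative."""
    r, g = r.ravel().astype(float), g.ravel().astype(float)
    return (r * g).sum() / np.sqrt((r ** 2).sum() * (g ** 2).sum())

def icq(r, g):
    """Intensity correlation quotient: fraction of pixels whose intensities
    vary synchronously around their means, shifted into [-0.5, 0.5]."""
    r, g = r.ravel().astype(float), g.ravel().astype(float)
    sync = (r - r.mean()) * (g - g.mean()) > 0
    return sync.mean() - 0.5

def mutual_information(r, g, bins=2):
    """Mutual information of the joint intensity histogram (in nats); it
    cannot distinguish colocalization from anti-colocalization."""
    joint, _, _ = np.histogram2d(r.ravel(), g.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()
```

On a binary image pair with a 1:1 foreground/background ratio these definitions reproduce the behavior in Fig. 2 A: the MOC equals the fractional overlap, and the MI of perfectly colocalized and perfectly anti-colocalized channels coincides.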

To confirm these findings on non-synthetic data, we investigated an additional set of 75 Raman microscopic images with fluorescence counterparts. Besides the nucleus, the corresponding fluorescence images label two further organelles: 29 measurements with a combination of the endoplasmic reticulum (ER) and Golgi apparatus, 13 with ER and mitochondria, 9 with Golgi and peroxisomes, 4 with mitochondria and peroxisomes, and 20 with Golgi and mitochondria. These membrane-rich organelles were used here instead of lipid droplets as they are more challenging to differentiate4 due to their strong functional and physical connection. They are therefore better suited than the more regular morphological (and also spectral) patterns of lipid droplets to demonstrate the differences between the four measures.

As it turns out, the differences between colocalization measures are reflected by the number of samples in which it was not possible to collect training data for at least one of the labeled organelles represented by a fluorescence color channel, because the better matching organelles did not leave enough unmatched area in the HCA for the lesser matching ones. For the PCC, this was the case in 29 out of the 75 images (38.6%). Using the MI, this number of organelles without training data rose to 43 (57.3%); with the ICQ it was 50 (66.7%), and for the MOC even 61 (81.3%).

Fig. 2 Comparison of four different correlation coefficients. The values of the Pearson correlation coefficient (solid), mutual information (dashed), Mander's overlap coefficient (dotted), and intensity correlation quotient (dash-dotted) are plotted against the percentage of red pixels overlapping with green pixels. (A) The ratio of background versus foreground pixels is 1:1. It can be seen that only PCC and ICQ indicate anti-colocalization. (B) The ratio of background versus foreground pixels was increased to 1000:1 for a further test. This high proportion of background pixels produces very high ICQ values and very low MI values, making them uninformative. The PCC adapted to the new ratio, whereas the MOC does not change, as it does not consider the probability of the colocalization.

This issue is more or less pronounced for different combinations of organelles; the worst case occurs for ER with Golgi, which are both part of the endomembrane system. Here, training data could not be found in 55.2% of the images using the best method (PCC), while with the MOC it was even 93.1%. These numbers can be explained by the average size of the best matching clusters found by the different measures: 625 pixels for the PCC, 652 for the MI, 666 for the ICQ, and 729 for the MOC. On average, the biggest cluster (the nucleus) is four times bigger than the next biggest organelle when identified by the PCC, but six times bigger when selected by the MOC. Identifying too large an area as the best matching cluster for one organelle hampers the identification of best matching clusters for the smaller organelles, as the nucleus cluster already occupies a (too) large area. This problem becomes particularly obvious for the MOC, as its value is determined only by the amount of overlap without taking background into account at all. This behavior favors the identification of larger overlapping areas than the PCC does, for which a simultaneous reduction of the two overlapping areas (while keeping the percentage of overlap) increases the value of the coefficient, while it does not change the MOC. While this property of the MOC may be desirable under other circumstances, it is inadequate in the context of determining best matching clusters.

It is important to note that the effect of unidentified best matching clusters is not represented in the validation of supervised classifiers, as the affected organelles contribute no training data that could be (mis-)classified. This implies that, when assessing the quality of a colocalization-based classifier, the number of unidentified clusters for each class is an important quality indicator.

Overall, our observations on both real data and the two synthetic image series clearly support the PCC as the method of choice to determine colocalization in this work.

3.2 Supervised Classification of Cell Images

Subsequent to identifying best matching clusters for all fluorescence channels, these clusters were used to extract representative training spectra for a supervised classifier. As shown in Fig. 3 A-D, the colocalization provides a best matching cluster for every organelle, in this case the nucleus in blue and the lipid droplets in red. Superimposing these clusters with the corresponding fluorescence color channels yields an area of overlap (shown in yellow), which is the main goal of this procedure: using it as a mask to recruit the underlying spectra from the Raman image produces a relatively homogeneous data set. The mean spectra of the training data sets for lipid droplets, nucleus and the rest class (consisting of all remaining organelles and the cytoplasm) are shown in Fig. 4. The spectra gained from the colocalization method are used as training data for a random forest classifier 30 using 300 trees.

Fig. 3 Colocalization of Fluorescence with HCA and Random Forest. (A) The best matching clusters for nucleus (blue) and lipid droplets (red). (B) The corresponding fluorescence image. (C) The overlay (yellow) of the LD fluorescence color channel (red) and its best matching cluster (green), colocalized with a PCC of 0.79. (D) The overlay of the nucleus fluorescence color channel and its best matching cluster, colocalized with a PCC of 0.93. (E) The false color image produced by the RF trained on the spectra derived from C&D. (F) The corresponding fluorescence image. (G) The overlay of the LD fluorescence color channel and the corresponding RF class, colocalized with a PCC of 0.7. (H) The overlay of the nucleus fluorescence color channel and the corresponding RF class, colocalized with a PCC of 0.96.
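As a rough illustration of this training step, the sketch below fits a 300-tree random forest to synthetic stand-in "spectra". The class structure, marker-band positions, and the use of scikit-learn in place of the authors' Matlab pipeline are all assumptions made for the example; in the actual workflow the rows of the training matrix come from the yellow overlap pixels of Fig. 3 C&D.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_wavenumbers = 50

def toy_spectra(band, n=200):
    # hypothetical stand-in spectra: one marker band per class plus noise
    base = np.zeros(n_wavenumbers)
    base[band] = 1.0
    return base + 0.1 * rng.standard_normal((n, n_wavenumbers))

# training matrix assembled from the (here: simulated) overlap-mask pixels
X = np.vstack([toy_spectra(5), toy_spectra(20), toy_spectra(40)])
y = np.repeat(["lipid droplet", "nucleus", "rest"], 200)

# 300 trees, as in the paper
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
```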

Note that in Fig. 3, the best matching clusters as well as the agreement between the random forest and fluorescence-based segmentations are indicated by their PCC. In order to additionally assess the statistical significance, we computed p-values based on the hypergeometric distribution underlying randomly scrambled pixels as a null hypothesis 26. For all clusters in our dataset, this p-value turns out to be 0, indicating that the correlation is significantly different from randomly scrambled pixels and therefore rejecting the null hypothesis of random overlap.

Fig. 4 Mean spectra of organelles in the training data set. Lipid droplets, nucleus, and the rest class, consisting of the other organelles and the cytoplasm, are presented. These spectra were automatically collected by the colocalization method to obtain a homogeneous data set for training a random forest.
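Such a p-value follows directly from the hypergeometric survival function. The sketch below (Python/SciPy; the pixel counts are invented toy numbers) implements the scrambled-pixels null model: the cluster's pixels are placed uniformly at random, and we ask how often an overlap at least as large as the observed one would occur.

```python
from scipy.stats import hypergeom

def overlap_p_value(n_pixels, n_foreground, n_cluster, n_overlap):
    # P(overlap >= n_overlap) when the cluster's n_cluster pixels are placed
    # uniformly at random among n_pixels, of which n_foreground belong to the
    # fluorescence foreground (the "scrambled pixels" null model)
    return float(hypergeom.sf(n_overlap - 1, n_pixels, n_foreground, n_cluster))

# toy numbers: a 100-pixel image, 30 foreground pixels, a 20-pixel cluster
p_perfect = overlap_p_value(100, 30, 20, 20)  # complete overlap: vanishingly small
p_chance = overlap_p_value(100, 30, 20, 6)    # overlap near its expectation (20*30/100)
```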

3.3 Validation of Classification Results

In general, supervised classifiers can be validated in a straightforward manner using different variants of cross-validation such as leave-one-out, k-fold, or Monte-Carlo cross-validation. However, in the case of vibrational microspectroscopy, training data are in a sense more structured, because each sample contributes not one but a large number of training spectra for each class. In other words, each class in the training data set is further subdivided into samples (see Fig. 5). In this situation, an important question to be addressed through a suitable validation scheme is whether spectral variability between samples (e.g. due to variability during sample preparation) is a potential confounding factor when classifying for subcellular compartments. Note that this question is generally not addressed by conventional cross-validation. To illustrate this, assume an "outlier" sample in which all spectra are biased, e.g. through a strong baseline effect affecting every spectrum from that sample. As spectra from this same sample will be contained in both the training and the validation set, they can be classified with high accuracy during cross-validation. However, if no spectra from the biased sample are contained in the training data set, classification of the biased spectra will fail during validation.

In order to validate our random forest classifiers appropriately with regard to sample variability, we performed validation using two different approaches (Fig. 5). First, we performed conventional k-fold cross-validation (k = 6) on training data obtained from all six available samples. Next, we performed leave-one-out cross-validation on a per-sample basis, i.e., the validation set was established from all spectra belonging to one particular sample (see Fig. 5 for an illustration). Both approaches lead to nearly identically high accuracies. In order to simulate high spectral variability between samples, we artificially perturbed all spectra in one of the six samples and re-evaluated both types of cross-validation. Remarkably, conventional cross-validation was hardly affected by this artifact: while the maximal accuracy of the two versions was identical (100%) and the means were still comparable (99.5% for k-fold vs. 91% for sample-based), the minimal accuracy differed clearly, at 99.1% for k-fold but only 55% for leave-one-sample-out. Compared to conventional cross-validation, this indicates that leave-one-sample-out provides a more realistic assessment of the quality of a spectral classifier, one that also assesses spectral variability between samples, as there is no overlap of data from the same measurement between the validation and the training data set.
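The contrast between the two validation schemes can be reproduced on synthetic data. In the sketch below (scikit-learn; the group structure, class means, and the +10 offset simulating a strong baseline artifact are all invented for illustration), k-fold cross-validation barely registers the perturbed sample, while leave-one-group-out exposes it through a collapsed minimum accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)

# six "samples", each contributing 20 toy spectra per class (two classes)
X_parts, y, groups = [], [], []
for sample in range(6):
    for label in (0, 1):
        spectra = rng.normal(loc=label, scale=0.1, size=(20, 5))
        if sample == 5:
            spectra += 10.0  # perturbed sample: strong baseline-like offset
        X_parts.append(spectra)
        y += [label] * 20
        groups += [sample] * 20
X, y, groups = np.vstack(X_parts), np.array(y), np.array(groups)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# conventional k-fold: perturbed spectra occur in training AND validation folds
kfold_scores = cross_val_score(clf, X, y, cv=KFold(6, shuffle=True, random_state=0))
# leave-one-sample-out: the perturbed sample is validated against unperturbed training data
logo_scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
```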

Nonetheless, the classifier achieved sensitivity and precision values of 97-100% and an accuracy of 99.3% on the original dataset, demonstrating the reliability and consistency of its results and that the colocalization method did produce suitable training data sets. Interestingly, these values for the classifier trained with spontaneous Raman spectral data sets are higher than those of the classifier trained with CARS results 4. This can be explained by the higher spectral resolution of the Raman data sets. In addition, the current Raman spectra provide more spectral information (700-1800 and 2700-3100 cm−1) than the CARS spectra (2700-3000 cm−1). The data set involving Golgi, ER, peroxisomes and mitochondria achieves a per-sample cross-validation accuracy of 91.2%.

Furthermore, the random forest was additionally tested towards its ability to reproduce the results of fluorescence. The degree of correlation between the organelle localization predicted by the random forest and the fluorescence is presented for one of these cells (see Fig. 3 E-H), where a high correlation between the results of the two methods can be seen. While in this case a PCC of 0.96 for the position of the nucleus could be observed, the average over 71 cells was 0.86 (standard deviation 0.08). Even when this random forest was tested on a colon cancer cell line (HT29), although being trained on MIA PaCa-2 pancreatic cancer cells, the correlation was still at least 0.6 for both organelles on average. This demonstrates the quality of the supervised classifier in reproducing the fluorescence images without the necessity of labels or other chemical alterations.

Fig. 5 Comparison of conventional and sample-based leave-one-out cross validation. Upper Left. For conventional leave-one-out cross-validation, each validation round uses one data point from the original training data set for validation, while the remaining data points are used for training. Lower Left. In leave-one-sample-out cross-validation, the validation data set always consists of all spectra from a complete sample, whereas training data are recruited from all remaining samples. Confusion Matrices. The confusion matrices are based on six measurements including training spectra of the lipid droplet, nucleus and rest classes. Artificially perturbed spectra in one sample are hardly visible in conventional k-fold cross-validation (upper matrices). In leave-one-sample-out validation, however, sensitivities for different organelles are strongly affected (lower right matrix).

Relevance of performing HCA. In order to assess the relevance of using training spectra obtained from the overlap between fluorescence foreground and the best matching cluster, we trained a classifier based on spectra from fluorescence foreground positions for nucleus and lipid droplets, without any utilization of HCA. In this setting, a global threshold in the fluorescence images was determined using the well-established method of Otsu 31. As it turns out, the accuracy in leave-one-sample-out cross-validation drops from 99.3% to 80% in this setting, while the average PCC between the predicted nucleus position and the unthresholded nucleus fluorescence drops from .97 to .62 (refer to Supplementary Figure 5 for an example). This can be explained by mismatches between the fluorescence foreground and the best matching cluster, which seem to be unavoidable and not correctable by registration (see Supplementary Figure 3). Evidently, the utilization of HCA keeps false training spectra out of the training data set, as also indicated by Supplementary Figure 4, and thus leads to significantly higher accuracy.
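For reference, Otsu's method simply picks the histogram threshold that maximizes the between-class variance of the intensity distribution. A minimal re-implementation (not the authors' code; the bimodal toy intensities are invented) on a synthetic "fluorescence" image:

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    # Otsu: choose the threshold maximizing the between-class variance
    # w0 * w1 * (m0 - m1)^2 of the intensity histogram
    hist, edges = np.histogram(values, bins=nbins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    total, sum_all = hist.sum(), (hist * mids).sum()
    w0 = s0 = 0.0
    best_t, best_var = mids[0], -1.0
    for i in range(nbins - 1):
        w0 += hist[i]
        s0 += hist[i] * mids[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var_between = w0 * w1 * (s0 / w0 - (sum_all - s0) / w1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, mids[i]
    return best_t

# bimodal toy intensities: many dim background pixels, few bright foreground pixels
rng = np.random.default_rng(2)
intensities = np.concatenate([rng.normal(20, 5, 5000), rng.normal(200, 10, 500)])
t = otsu_threshold(intensities)   # lands between the two modes
foreground = intensities > t
```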

3.4 Organelle specificity, cell line specificity, and confounders

Beside the specificity with respect to subcellular organelles, Raman spectra may also distinguish other conditions. To assess this, we trained classifiers to distinguish subcellular organelles of different cell lines. As it turns out, organelles of MIA PaCa-2 cells are spectrally distinguishable from their counterparts in HT29 cells (Supplement 1, classifier C1). Furthermore, a classifier may distinguish spectra from the non-cellular surroundings of samples from the two cell lines (classifier C2). However, as classifier C3 indicates, Raman spectra, in particular those observed in areas not covered by any cell, might as well reflect different experimental conditions such as fluctuations of laser power or a different laser focus. Yet, the transferability of the organelle classifier between cell lines described above suggests that the spectral differences between different organelles are sufficiently large not to be overshadowed by the spectral differences between cell lines or instrumental conditions. Related phenomena regarding fluorescence signals in non-cellular surroundings have recently been observed for fluorescence markers of subcellular components 32,33.

While subcellular organelles and cell types are biologically relevant factors, spectral classifiers may at the same time distinguish factors that are commonly considered confounders. For example, two different days of experiment can be distinguished in spectra from areas not covered by cells (classifier C3). For details on the aforementioned classifiers, we refer to Supplement 1.

4 Conclusion

Our approach extends label-free microscopy for live cell imaging in several directions. It can be seen as the first application of Raman microscopy following a completely supervised paradigm. Furthermore, our approach predicts a crisp segmentation, which makes the result accessible to cross-validation, while soft segmentations are difficult to validate quantitatively. Along the line of quantitative validation, we have shown that sample-based cross-validation may uncover problematic effects of spectral variability in the training data and should be preferred as a more realistic assessment of classification power. The results shown in Fig. 5 clearly indicate that leave-one-sample-out cross-validation can uncover the usage of unsuitable samples that would have stayed hidden if conventional k-fold cross-validation had been applied. More generally, leave-one-sample-out validation, as a more rigorous validity measure, may also indicate whether the number of samples in the training data set is sufficient to match the spectral variability between samples. At the same time, the ratio of unidentified best matching clusters for each class should be taken into account when assessing the quality of a classifier. While it would be of interest for future work, currently no objective and quantitative validation scheme for either unsupervised or supervised soft segmentations is available, neither on a per-spectrum nor on a per-sample basis.

Beside the specificity towards organelles, we could demonstrate that Raman spectra are at the same time specific towards other factors, including factors that are commonly considered confounders. We also find that Raman subcellular classifiers are transferable (with a loss of accuracy), which has recently been an issue of investigation for fluorescence-based approaches 34. As the two cell lines under consideration are both epithelial cells, it remains to be answered whether classifiers are also transferable to less similar cell types, for instance stem cells or immune cells. It may also be of future relevance to use our colocalization approach to distinguish cell types, which may be a useful tool for Raman (or CARS) based cell sorting.

Both our present case study of identifying nuclei and lipid droplets and the previous study of identifying other cellular compartments 4 utilizing our novel colocalization scheme support the claim that colocalization approaches are an important ingredient for obtaining label-free microscopy protocols. Beyond the identification of cellular compartments, colocalization schemes may in general also be useful for resolving tissue structure. In fact, colocalization studies between immunohistologically stained tissue sections and corresponding IR or Raman microscopic images promise a label-free alternative to immunohistochemistry, which is an important tool for tissue diagnostics 35. Yet, carrying our automated colocalization approach from cells to tissue requires dealing with artifacts of fluorescence microscopy, which are much more pronounced in tissue than in cells 36.

Just as the quantitative approaches for colocalization in fluorescence microscopy helped to obtain more reliable conclusions from fluorescence-based studies, our colocalization scheme to align observations between fluorescence and Raman microscopic images promises an objective and highly reproducible approach for label-free microscopy. As correlating observations on one sample across different types of microscopes has gained popularity recently 19,37, colocalization measures provide objective and quantitative means to correlate observations in settings involving other combinations of microscopes.

Finally, utilizing colocalization measures provides further support for utilizing hierarchical clustering in a more advanced manner. Conventionally, the dendrogram of hierarchically clustered image spectra is cut "horizontally" to obtain a segmentation into a fixed number of clusters. Zhong et al. 38, including one of the present authors, have shown that cutting dendrograms through "non-horizontal" cuts yields biologically more meaningful segmentations for IR image spectra. As our newly contributed colocalization scheme generally also identifies such non-horizontal cuts, the present study supports this claim also for Raman spectral image segmentation.

Acknowledgement

The authors thank Ralf Erdmann for providing us with LD540 staining material.

This research was supported by the Protein Research Unit Ruhr within Europe (PURE), Ministry of Innovation, Science and Research (MIWF) of North-Rhine Westphalia, Germany, to K. Gerwert, and by the Center for Vibrational Microscopy (CVM), European Regional Development Fund, European Union and North-Rhine Westphalia, Germany.

References

1 E. Manders, F. Verbeek and J. Aten, Journal of Microscopy, 1993, 169, 375–382.
2 S. Bolte and F. Cordelieres, Journal of Microscopy, 2006, 224, 213–232.
3 K. Klein, A. M. Gigler, T. Aschenbrenner, R. Monetti, W. Bunk, F. Jamitzky, G. Morfill, R. W. Stark and J. Schlegel, Biophysical Journal, 2012, 102, 360–368.
4 S. F. El-Mashtoly, D. Niedieker, D. Petersen, S. D. Krauß, E. Freier, A. Maghnouj, A. Mosig, S. Hahn, C. Kötting and K. Gerwert, Biophysical Journal, 2014, 106, 1910–1920.
5 A. Kallenbach-Thieltges, F. Großerüschkamp, A. Mosig, M. Diem, A. Tannapfel and K. Gerwert, Journal of Biophotonics, 2013, 6, 88–100.
6 D. C. Chan, Cell, 2006, 125, 1241–1252.
7 B. K. Yoder, X. Hou and L. M. Guay-Woodford, Journal of the American Society of Nephrology, 2002, 13, 2508–2516.
8 P. M. McDonough, R. M. Agustin, R. S. Ingermanson, P. A. Loy, B. M. Buehrer, J. B. Nicoll, N. L. Prigozhina, I. Mikic and J. H. Price, Assay and Drug Development Technologies, 2009, 7, 440–460.
9 A. E. Carpenter, T. R. Jones, M. R. Lamprecht, C. Clarke, I. H. Kang, O. Friman, D. A. Guertin, J. H. Chang, R. A. Lindquist, J. Moffat et al., Genome Biology, 2006, 7, R100.
10 M. V. Boland, M. K. Markey, R. F. Murphy et al., Cytometry, 1998, 33, 366–375.
11 M. T. Accioly, P. Pacheco, C. M. Maya-Monteiro, N. Carrossini, B. K. Robbs, S. S. Oliveira, C. Kaufmann, J. A. Morgado-Diaz, P. T. Bozza and J. P. Viola, Cancer Research, 2008, 68, 1732–1740.
12 P. T. Bozza and J. P. Viola, Prostaglandins, Leukotrienes and Essential Fatty Acids, 2010, 82, 243–250.
13 Y. Yuan, H. Failmezger, O. M. Rueda, H. R. Ali, S. Graf, S.-F. Chin, R. F. Schwarz, C. Curtis, M. J. Dunning, H. Bardwell et al., Science Translational Medicine, 2012, 4, 157ra143.
14 S. Yue, J. Li, S.-Y. Lee, H. J. Lee, T. Shao, B. Song, L. Cheng, T. A. Masterson, X. Liu, T. L. Ratliff et al., Cell Metabolism, 2014, 19, 393–406.
15 C. Matthäus, T. Chernenko, J. A. Newmark, C. M. Warner and M. Diem, Biophysical Journal, 2007, 93, 668–673.
16 M. Miljković, T. Chernenko, M. J. Romeo, B. Bird, C. Matthäus and M. Diem, Analyst, 2010, 135, 2002–2013.
17 S. F. El-Mashtoly, D. Petersen, H. K. Yosef, A. Mosig, A. Reinacher-Schick, C. Kötting and K. Gerwert, Analyst, 2014, 139, 1155–1161.
18 J. Kölling, D. Langenkämper, S. Abouna, M. Khan and T. W. Nattkemper, Bioinformatics, 2012, 28, 1143–1150.
19 T. W. Bocklitz, A. C. Crecelius, C. Matthäus, N. Tarcea, F. von Eggeling, M. Schmitt, U. S. Schubert and J. Popp, Analytical Chemistry, 2013, 85, 10829–10834.
20 L. Mavarani, D. Petersen, S. F. El-Mashtoly, A. Mosig, A. Tannapfel, C. Kötting and K. Gerwert, Analyst, 2013, 138, 4035–4039.
21 B. Bird, S. Remiszewski, A. Akalin, M. Kon, M. Diem et al., Laboratory Investigation, 2012, 92, 1358–1373.
22 M. V. Boland and R. F. Murphy, Bioinformatics, 2001, 17, 1213–1223.
23 N. Orlov, L. Shamir, T. Macura, J. Johnston, D. M. Eckley and I. G. Goldberg, Pattern Recognition Letters, 2008, 29, 1684–1693.
24 J. Zhou, S. Lamichhane, G. Sterne, B. Ye and H. Peng, BMC Bioinformatics, 2013, 14, 291.
25 E. Manders, J. Stap, G. Brakenhoff, R. Van Driel and J. Aten, Journal of Cell Science, 1992, 103, 857–862.
26 S. V. Costes, D. Daelemans, E. H. Cho, Z. Dobbin, G. Pavlakis and S. Lockett, Biophysical Journal, 2004, 86, 3993–4003.
27 J. Adler and I. Parmryd, Cytometry Part A, 2010, 77, 733–742.
28 G. Judith and N. Kumarasabapathy, Signal and Image Processing: An International Journal (SIPIJ), 2011, 2, 82–92.
29 Q. Li, A. Lau, T. J. Morris, L. Guo, C. B. Fordyce and E. F. Stanley, The Journal of Neuroscience, 2004, 24, 4070–4081.
30 L. Breiman, Machine Learning, 2001, 45, 5–32.
31 N. Otsu, Automatica, 1975, 11, 23–27.
32 L. Shamir, Journal of Microscopy, 2011, 243, 284–292.
33 L. P. Coelho, J. D. Kangas, A. W. Naik, E. Osuna-Highley, E. Glory-Afshar, M. Fuhrman, R. Simha, P. B. Berget, J. W. Jarvik and R. F. Murphy, Bioinformatics, 2013, 29, 2343–2349.
34 X. Chen and R. F. Murphy, Bioinformatics Research and Development, Springer, 2007, pp. 328–342.
35 P. M. Baker and E. Oliva, International Journal of Gynecologic Pathology, 2005, 24, 39–55.
36 J. Pawley, Handbook of Biological Confocal Microscopy, Springer, 2010.
37 R. Masyuko, E. J. Lanni, J. V. Sweedler and P. W. Bohn, Analyst, 2013, 138, 1924–1939.
38 Q. Zhong, C. Yang, F. Großerüschkamp, A. Kallenbach-Thieltges, P. Serocka, K. Gerwert and A. Mosig, BMC Bioinformatics, 2013, 14, 333.


Colocalization of Fluorescence and Raman Microscopic Images for the Identification of Subcellular Compartments: A Validation Study

Supplementary Material

Sascha D. Krauß, Dennis Petersen, Daniel Niedieker, Inka Fricke, Erik Freier, Samir F. El-Mashtoly, Klaus Gerwert, Axel Mosig

1 Organelle specificity, cell line specificity, and confounders of Raman spectra

As Raman microscopy is conducted label-free, the image spectra obtained may potentially contain signals specific for different biologically relevant factors as well as signals that are irrelevant or even confounding. In fluorescence microscopy, conversely, each image carries one or a few specific labels associated with one condition and one factor (e.g. a subcellular organelle). It is important to notice that in a supervised classification scenario, Raman spectra may in principle be distinguishable with respect to biologically relevant factors (e.g. cell type, subcellular organelle), while at the same time being distinguishable with respect to irrelevant or confounding factors (e.g. date of experiment). Correspondingly, in order to assess the distinguishability of different factors, we trained classifiers and evaluated their accuracy in leave-one-sample-out validation in different settings:

(C1) Cell type specificity. We trained a six-class classifier in which the classes for organelle types (nucleus, lipid droplets, rest) are further subdivided into cell-type-specific classes. The resulting classifier achieves an accuracy of 98%; for the complete confusion matrix refer to Supplementary Figure 1.

(C2) Cell type specificity of spectra from non-cellular surroundings. Spectra from areas not covered by any cell in five Raman images of HT29 cells, along with corresponding spectra in six images of MIA PaCa-2 cells, were collected. A classifier to distinguish these spectra in the two types of samples achieved an accuracy of 99%.

(C3) Confounder specificity of spectra from non-cellular surroundings. We extracted spectra from areas not covered by any cell in Raman images of 26 MIA PaCa-2 cell samples, where 15 measurements were conducted on Day 1 and the remaining 11 on Day 2. As it turns out, the spectra from the two experimental dates can be distinguished with an accuracy of 96%.

(C4) "Null experiment". We trained a classifier on nucleus spectra of four HT29 cells vs. nucleus spectra from four MIA PaCa-2 cells. The labels were shuffled by flipping the labels of two randomly selected cells of each class, resulting in an accuracy of 45%, matching the roughly 50% accuracy to be expected by chance. In particular, this supports the claim that the accuracies observed above are not due to overfitting.


Supplementary Figure 1: Row-by-row relative confusion matrix of a random forest trained on two cell lines with three organelles each. The classes in this random forest are Nucleus, Lipid Droplets and Rest, with group 1 from MIA PaCa-2 and group 2 from HT29 cells. The values were calculated according to the concept of leave-one-sample-out cross-validation; the sensitivities can be seen on the diagonal.


2 Effect of background correction on segmentation

Several groups have shown that there is no or only a weak fluorescence background in Raman measurements of cells [3, 1] and that cluster analysis on normalized data allows a good distinction of a certain number of cell compartments [2]. Nevertheless, we compared cluster analysis on baseline-corrected and uncorrected data and concluded that there are only minor differences for our setup on cell measurements (Supplementary Figure 2, Figure 3 in the main paper). No peaks of the substrate are present in the region selected for normalization and data analysis.

Our baseline correction of Raman spectra from cell measurements in media was executed in two steps. First, the water bands at 1645 cm−1 and 3100-3600 cm−1 were subtracted; in the second step, the residual baseline was removed by a polynomial baseline correction. Afterwards, the spectra were normalized in the region between 700 cm−1 and 3100 cm−1. For the subtraction of the water band, endmembers of a Vertex Component Analysis (VCA) were calculated from a set of pure buffer spectra. These endmembers were fitted to the individual spectra in a least-squares solution and subtracted from them. In the second step, a sweep algorithm was applied to wavelet-denoised spectra (Daubechies wavelet D4) to obtain supporting points for the polynomial baseline correction. A fifth-order polynomial was fitted to these supporting points and subtracted from the spectra.
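The second step (polynomial baseline removal) can be sketched as follows. This simplified example omits the VCA water-band subtraction and the wavelet-based sweep algorithm, and instead assumes the peak-free supporting regions are already known; the synthetic spectrum, band positions, and baseline shape are invented for illustration.

```python
import numpy as np

# toy spectrum: two Gaussian bands on top of a slowly varying 5th-order baseline
wn = np.linspace(700, 3100, 1200)                      # wavenumber axis / cm^-1
baseline = 1e-17 * (wn - 1800.0) ** 5 + 0.2
peaks = np.exp(-((wn - 1450) / 15) ** 2) + 0.8 * np.exp(-((wn - 2930) / 20) ** 2)
spectrum = baseline + peaks

# supporting points: here simply assumed-known peak-free regions (the paper
# instead derives them with a sweep algorithm on wavelet-denoised spectra)
support = (np.abs(wn - 1450) > 100) & (np.abs(wn - 2930) > 120)

# fit a 5th-order polynomial to the supporting points and subtract it
poly = np.polynomial.Polynomial.fit(wn[support], spectrum[support], deg=5)
corrected = spectrum - poly(wn)
```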


Supplementary Figure 2: Effect of background correction on segmentation. Panels A and B display best matching clusters and fluorescence, respectively. Panels C and D show overlays between the best matching cluster (green) and the corresponding thresholded fluorescence image (red), along with the PCC between the two. The PCC values of .8 and .92 compare to PCCs of .79 and .93, respectively, obtained for uncorrected spectra (see Figure 3 in the main paper).


3 Dataset for nucleus and lipid droplet identification


Supplementary Figure 3: Complete dataset used for the identification of nucleus and lipid droplets. The left column displays overlays between the best matching cluster (green) and the corresponding thresholded fluorescence (red) for lipid droplets; the right column shows the overlay for the nucleus.


4 Parameter-free extraction of training spectra

In this supplementary section, we provide details on the extraction of spectra for training supervised classifiers. In our approach, the only inputs utilized for extracting training spectra are

• the dendrogram D and

• the fluorescence image I.

The output of the function consists of a set of training spectra. Extracting training spectra involves no further parameters, in particular no fluorescence threshold or number of clusters.

When extracting training spectra from a fluorescence image I and a spectral image J, we perform hierarchical clustering using Ward's method on the pixel spectra in J to obtain a dendrogram D. If N denotes the number of pixels in J, dendrogram D is a binary tree with N leaf vertices and N − 1 internal vertices, i.e., 2N − 1 vertices in total. As a notational convention, we let V denote the set of all 2N − 1 vertices. Each vertex v ∈ V is associated with a binary image G(v), which exhibits intensity 1 (white) at the coordinates of all pixel spectra at the leaf vertices below v, and intensity 0 (black) at all other coordinates. The fluorescence image I can be turned into a binary image by setting a threshold t, so that the thresholded image RI(t) has intensity 1 at all positions where the fluorescence intensity in I exceeds t, and 0 at all other positions.

4.1 Determining a best matching cluster

Our approach starts with determining a best matching cluster in D. To this end, the cluster images G(v) for all vertices v ∈ V constitute the potential best matching clusters, and the thresholded images RI(t) for all possible thresholds t constitute the potential foreground masks, where both v and t are to be determined automatically without utilizing any further parameters. Note that for any given pair of a vertex v and a threshold t, we can determine the Pearson correlation PCC(v, t) between G(v) and RI(t) using Equation (1) from the main paper. For pairs (v, t) where the images G(v) and RI(t) display highly similar areas, PCC(v, t) will yield a correlation close to 1. If the cluster image and the thresholded image disagree, the correlation will be close to 0 or even negative. To identify the highest possible degree of colocalization between I and J, an optimal pair of vertex v_opt and threshold t_opt is obtained through

(v_opt, t_opt) = argmax_{v ∈ V, t} PCC(v, t).    (1)

The vertex v_opt obtained using Eqn. (1) is the formal definition of what is referred to as the best matching cluster in the main paper. Due to the statistical origin of the Pearson correlation, and as well established in the corresponding literature on colocalization between fluorescence images (References 1, 26, 27 in the main paper), a value of PCC(v, t) close to 1 allows the interpretation that there is a statistically significant high degree of agreement between I and J.
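A compact sketch of this search (Python/SciPy in place of the authors' Matlab implementation; the toy image, spectra, and fluorescence values are invented) enumerates all 2N − 1 dendrogram vertices and all candidate thresholds, and returns the pair maximizing the PCC:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def cluster_masks(Z, n):
    # binary membership masks G(v) for all 2n-1 dendrogram vertices
    members = [{i} for i in range(n)]
    for a, b, _, _ in Z:              # each row of the linkage matrix merges two vertices
        members.append(members[int(a)] | members[int(b)])
    masks = np.zeros((len(members), n), dtype=bool)
    for v, leaves in enumerate(members):
        masks[v, list(leaves)] = True
    return masks

def pcc(x, y):
    x = x.astype(float) - x.mean()
    y = y.astype(float) - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return 0.0 if denom == 0 else float((x * y).sum() / denom)

def best_matching_cluster(spectra, fluor):
    # exhaustive search over vertices v and thresholds t maximizing
    # PCC(G(v), R_I(t)), in the spirit of Eqn. (1)
    n = len(fluor)
    Z = linkage(spectra, method="ward")
    masks = cluster_masks(Z, n)
    best = (-2.0, None, None)
    for t in np.unique(fluor)[:-1]:   # top value would give an empty foreground
        fg = fluor > t
        for v in range(len(masks)):
            best = max(best, (pcc(masks[v], fg), v, t))
    return best, masks

# toy image: 16 pixel "spectra" in two spectral groups; the fluorescence
# channel highlights the second group
rng = np.random.default_rng(3)
spectra = np.repeat([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], 8, axis=0)
spectra += 0.01 * rng.standard_normal(spectra.shape)
fluor = np.array([0.1] * 8 + [0.9] * 8)

(best_r, best_v, best_t), masks = best_matching_cluster(spectra, fluor)
```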

It is important to notice the following:

• No user-adjustable parameters are involved in determining the best matching cluster.

• Identification of the best matching cluster is performed separately for each fluorescence image (i.e., fluorescently labelled organelle) of each sample.

• Once a vertex v has been identified with an organelle, v as well as all descendant vertices of v in the dendrogram are removed and thus cannot be identified with other organelles.


4.2 Obtaining training data from best matching clusters

In our approach, training data for a given subcellular organelle are obtained on the grounds of the best matching cluster. More precisely, we extract training spectra from all positions where both the thresholded fluorescence image RI(t_opt) = 1 and the cluster image G(v_opt) = 1, corresponding to the yellow pixels in panels C and D of Fig. 3 in the main paper.


Supplementary Figure 4: Blue spectrum: mean spectrum of training spectra obtained from colocalization of fluorescence and the best matching cluster. The spectra exhibit strong lipid bands at 1750 and 2850 cm−1. Red spectrum: corresponding mean spectrum from fluorescence positions not colocalized with the best matching cluster. Obviously, these spectra do not exhibit clear signals of lipid droplets: in particular, the peak at the lipid band at 1750 cm−1 is not present, and the peak at 2850 cm−1 is only very weak. The shading of the spectra shows the standard deviation.

Note that red pixels in these overlap images correspond to pixels where the fluorescence is above threshold, i.e., RI(t_opt) = 1, but which are not covered by the best matching cluster. We do not include such spectra in the training data set. The motivation for this is that the overlap between fluorescence and Raman image will never be perfect, due to at least three factors, namely (i) differences in confocal volume between Raman and fluorescence microscopy, (ii) slight deviations in the z-layer, and (iii) small distortions of the sample due to fluorescence staining. Consequently, some positions in border regions of the organelles that exhibit fluorescence cover positions in the spectral image where the organelle is actually not present. This will obviously lead to false training spectra and reduce the accuracy of the resulting classifier. Overlap with the best matching cluster, conversely, yields training spectra that exhibit a high degree of spectral consistency, so that these positions of misaligned overlap can be expected not to be part of the best matching cluster.


Supplementary Figure 5: Classification result of classifier trained on fluorescence above-thresholdpositions directly in comparison with fluorescence labeling. The cell shown is the same cell as Fig. 3in the main paper. (A): Classification result with nucleus positions indicated in blue and lipid dropletsin red. The lipid droplets close to the nucleus are obviously not well resolved, as well as larger regionsthat are obviously not part of the nucleus are recognized as nucleus. This leads to a drop in the PCCbetween the predicted lipid droplet regions and the respective fluorescence image (panel (B)) from .79to .5 (see panel (C) for an overlay). The PCC between predicted nucleus and unthresholded nucleusfluorescence drops from 0.96 to 0.79 (overlay shown in panel (D)).
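A PCC between a predicted binary compartment mask and an unthresholded fluorescence image can be computed as a plain Pearson correlation over all pixels; the small arrays below are illustrative, not data from the study:

```python
import numpy as np

def pcc(pred_mask, fluorescence):
    """Pearson correlation coefficient between two images, pixel-wise."""
    a = pred_mask.ravel().astype(float)
    b = fluorescence.ravel().astype(float)
    a -= a.mean()
    b -= b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# Toy example: a binary prediction vs. continuous fluorescence intensities.
pred = np.array([[1, 1, 0], [0, 0, 0]])
fluo = np.array([[0.9, 0.8, 0.1], [0.2, 0.1, 0.0]])
```

Here `pred` agrees well with the bright fluorescence pixels, so the coefficient is close to 1.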

Beyond the observed drop in accuracy, we assessed in further detail the influence of choosing only overlap ("yellow") positions as training data, compared to choosing all fluorescence foreground ("yellow and red") positions. The spectra shown in Supplementary Figure 4 indicate that spectra at the fluorescent non-overlap positions show no clear signals of lipid droplets. For supervised classifiers and regression-based approaches alike, falsely assigned spectra in the training data will obviously lead to weaker classifiers with reduced accuracy during cross-validation.
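Training the supervised classifier on the selected spectra can be sketched as follows; the synthetic Gaussian "spectra", class labels, and forest parameters are assumptions for illustration, not the authors' setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for colocalized training spectra of two classes.
rng = np.random.default_rng(1)
lipid_spectra = rng.normal(1.0, 0.1, (50, 200))    # class 0: lipid droplets
nucleus_spectra = rng.normal(0.0, 0.1, (50, 200))  # class 1: nucleus
X = np.vstack([lipid_spectra, nucleus_spectra])
y = np.array([0] * 50 + [1] * 50)

# Random forest trained on the colocalized ("yellow") spectra only.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

With well-separated classes like these, the forest trivially recovers the labels; on real spectra, including the "red" non-overlap positions would inject mislabeled rows into `X` and weaken the classifier, as discussed above.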

It is worth noting that high classification accuracies are obtained only if the agreement between the best-matching cluster and the fluorescence image is high across all data sets in the training data. As can be seen from the cross-validation on perturbed data (Figure 5 of the main paper), low agreement between fluorescence and HCA leads to low cross-validation accuracies.

