Bo Dong-ISBI2015-CameraReady


DEEP LEARNING FOR AUTOMATIC CELL DETECTION IN WIDE-FIELD MICROSCOPY ZEBRAFISH IMAGES

Bo Dong⋆† Ling Shao§ Marc Da Costa‡ Oliver Bandmann‡ Alejandro F Frangi⋆†

⋆ Centre of Computational Imaging & Simulation Technologies in Biomedicine (CISTIB)
† Department of Electronic & Electrical Engineering, The University of Sheffield

‡ Department of Neuroscience, The University of Sheffield
§ Department of Computer Science and Digital Technologies, Northumbria University

ABSTRACT

The zebrafish has become a popular experimental model organism for biomedical research. In this paper, a unique framework is proposed for automatically detecting Tyrosine Hydroxylase-containing (TH-labeled) cells in larval zebrafish brain z-stack images recorded through the wide-field microscope. In this framework, a supervised max-pooling Convolutional Neural Network (CNN) is trained to detect cell pixels in regions that are preselected by a Support Vector Machine (SVM) classifier. The results show that the proposed deep-learned method outperforms hand-crafted techniques and demonstrates its potential for automatic cell detection in wide-field microscopy z-stack zebrafish images.

1. INTRODUCTION

Zebrafish is an excellent model system for neurodegenerative diseases such as Parkinson's Disease (PD). It is easy to visualise the dopaminergic neurons in the zebrafish brain using a process called whole-mount in situ hybridisation (WISH) with a probe against messenger ribonucleic acid (mRNA) for tyrosine hydroxylase (TH). TH is the rate-limiting enzyme that synthesises L-dihydroxyphenylalanine (L-DOPA), the precursor of dopamine. Some of the genetic zebrafish models of PD show a reduction of around 25% of their dopaminergic neurons (assessed by WISH for TH) as early as 3 days post fertilisation [1]. Once WISH has been performed on the larvae, they are mounted onto microscope slides and a series of images is taken of each larva in slices (called z-stack images). When these z-stack images are combined, they can be rendered to give a three-dimensional (3D) image of the dopaminergic neurons in the larva's head. Using this 3D image, it is possible to detect or count the individual neurons. At the moment, this is a manual, time-consuming, subjective, and error-prone process. The expectation is that automatic algorithms would free neuroscientists from having to do this detection or counting manually.

Detecting TH-labeled neurons in wide-field microscopy images is difficult. Different neurons have highly variable

(a) 1st frame (b) 15th frame (c) 41st frame

(d) 12th (e) 13th (f) 14th (g) 15th (h) 16th (i) 17th (j) 18th

Fig. 1: Z-stack images. (a)-(c): The 1st, 15th and 41st frames of the stack. (d)-(j): The enlargement of the yellow box region shown in (b) and its adjacent frames. In (g), the centre pixel of the 3D TH-labeled cell (red point) is labelled by a human observer for supervised learning and testing.

appearances, and the TH-labeled cells are often clustered together, which makes detecting individual cells harder. In addition, different imaging conditions, such as the exposure time, the light source intensity and the transparency of the specimen, can cause variations of the image intensity in each of the three RGB channels, resulting in cells appearing with different colours in different specimens. Furthermore, when imaging z-stacks of the zebrafish larva using wide-field microscopy, slightly out-of-focus light appears in the recorded z-stack image. In this paper, deblurring or deconvolution [2] is deliberately not applied to the z-stack images, both to make the task more challenging and to keep the proposed method applicable to realistic scenarios where the deconvolution parameters are difficult or impossible to obtain. Fig. 1 shows that the further a frame is from the central frame, the blurrier the cell appears.

Deep learning is a set of methods in machine learning that attempts to represent high-level features in data with a multiple-layer architecture [3, 4, 5, 6]. The CNN, one of the most effective deep learning techniques, was introduced in [7] and further developed by other researchers [3]. Instead of adopting hand-crafted features, such as Histogram

Fig. 2: The workflow of the TH-labeled cell detection framework. The off-line training process and the detection process are shown in red and green dotted rectangles, respectively.

Fig. 3: Colour normalisation through IIS.

of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT), deep learning can learn high-level features from training data automatically. Furthermore, a CNN is able to reduce the number of learnable parameters significantly by sharing the same basis function across different image locations [3]. Pooling methods are used to build translation-invariant features; we use max-pooling in this paper.
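The max-pooling operation described above can be sketched in a few lines of NumPy. This is a generic illustration of non-overlapping max-pooling, not the authors' implementation; the function name and example values are hypothetical.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling over a 2D feature map.

    Keeps the maximum response in each size*size block, which makes
    the pooled features tolerant to small translations of the input.
    """
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # crop to a multiple of `size`
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1., 2., 5., 0.],
               [3., 4., 1., 2.],
               [0., 1., 7., 8.],
               [2., 0., 6., 5.]])
pooled = max_pool(fm)  # -> [[4., 5.], [2., 8.]]
```

Shifting a peak by one pixel inside its pooling block leaves the pooled output unchanged, which is the translation invariance the text refers to.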

In this paper, we propose an automatic deep-learning-based cell detection framework for 3D z-stack microscopy zebrafish images. In the framework, to improve the efficiency and accuracy of training a CNN from the large-size images, an SVM classifier is applied to detect cell regions for collecting the CNN training set. The major contributions of this paper are twofold: we apply a deep learning technique, a supervised max-pooling CNN, to a new and significant area, and the results show its potential for automatic cell detection in z-stack images. Moreover, we make available a wide-field zebrafish microscopy database with TH-labeled neurons and expert manual annotations of neuron localisation that can be used for evaluation and benchmarking by other competing methods.

2. RELATED WORK

Cell detection in microscopy images is a significant step for further cell-based experiments. Several researchers focus on cell analysis in zebrafish images, for example, segmentation and characterisation of zebrafish retinal horizontal neurons [8], counting of zebrafish neuron somata [9], and high-throughput analysis for region detection and quantification of zebrafish embryo images [10]. There are also many cell detection algorithms published in the computer vision and medical image analysis areas [11, 12]. However, most of the automated 3D cell detection methods are not a suitable replacement for manual cell detection [12].

In the past few decades, two kinds of cell detection algorithms have emerged. The first kind is segmentation- or thresholding-based approaches [13], for which various software implementations exist, including plugins for 'ImageJ' [11] and the 'FARSIGHT' toolkit [14]. The second class is model- or feature-based methods [15, 16]. With the development of machine learning techniques, learning-based cell detection has shown potential for automatic cell detection. Some learning-based cell detection methods [17, 18] have been proposed for two-dimensional (2D) immunohistochemistry images. However, there is no dedicated automatic TH-labeled cell detection method for light microscopy z-stack zebrafish images. In this paper, we propose an automatic TH-labeled cell detection framework with deep learning for wide-field microscopy z-stack zebrafish images.

3. DATASET, EXPERIMENTS AND RESULTS

3.1. Zebrafish Dataset

The dataset contains 35 stacks scanned under 20X magnification through a wide-field microscope. 25 of these stacks are selected for off-line training and 10 stacks for testing. In total, there are approximately 1000 TH-labeled cells. The volume (in voxels) of each stack is 1024 ∗ 1344 ∗ z, and different stacks have different values of z. The spatial resolution is 3µm and the axial resolution is 1.5µm. The magnification of the objective is 20X, and the numerical aperture (NA) is 0.7. We make this dataset publicly available1

for other researchers to develop and evaluate their TH-labeled cell detection or cell counting algorithms. For each stack in this dataset, a professional observer labelled all centres of TH-labeled cells as the ground truth using 'Point Picker' [19], a plugin of the open-source Java program 'ImageJ' [11].

3.2. Pre-processing: Colour Normalisation

The zebrafish larvae are recorded in several sessions spanning a number of days to complete the whole dataset. The exposure time is not guaranteed to be the same for each recording session through the light microscope, so the colour of each stack may differ. We apply Image Intensity Standardisation (IIS), which was first introduced in [20] for intensity normalisation of 2D grey-scale images. More recently, Bogunovic et al. [18] modified the IIS algorithm for normalising the intensity of 3D grey-scale rotational angiography. In this paper, we apply the original IIS as a colour normalisation method for 3D light microscopy images. We first calculate three histograms, one for each channel of the whole RGB stack. Then, the stack histogram of each channel is aligned to the corresponding reference based on the non-linear registration method described in [20]. An example of this process is shown in Fig. 3.

1The link for downloading the dataset will be released in the final version of our paper: http://www.cistib.com/cistib_shf/index.php/translation/downloads
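The per-channel histogram-alignment idea can be sketched as follows. Note this is a simplified stand-in, not the IIS algorithm of [20]: instead of the landmark-based piecewise-linear mapping, it matches full cumulative histograms channel by channel, which conveys the same intent of pulling every stack's colour distribution towards a common reference. Function names and shapes are assumptions.

```python
import numpy as np

def match_channel(source, reference):
    """Map `source` intensities so their histogram matches `reference`.

    CDF-based histogram matching: each source value is sent to the
    reference value occupying the same cumulative-histogram position.
    """
    s_vals, s_idx, s_counts = np.unique(source.ravel(),
                                        return_inverse=True,
                                        return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    mapped = np.interp(s_cdf, r_cdf, r_vals)   # source CDF -> reference value
    return mapped[s_idx].reshape(source.shape)

def normalise_stack(stack, reference):
    """Normalise each RGB channel of a (z, y, x, 3) stack independently."""
    return np.stack([match_channel(stack[..., c], reference[..., c])
                     for c in range(3)], axis=-1)
```

After this step all stacks share a similar RGB histogram, which is what makes the colour-histogram SVM of Section 3.3 reliable.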

3.3. Cell Region Detection

We determine the cell regions R to discard the irrelevant background regions. Selecting background training patches is also important for training a CNN. Therefore, we detect cell regions efficiently and roughly using an SVM classifier, and then cell and background training patches are collected from R instead of the whole stack.

When recording the zebrafish larvae, all the TH-labeled cells are located in the centre of the image. We notice that all TH-labeled cells have colours distinct from the background, and the largest part of the image area is covered by background pixels. To collect supportive training patches for training the CNN detector, we need to collect patches near the TH-labeled cells. All the stacks have a similar RGB histogram after IIS colour normalisation, so the colour histogram is the most useful and reliable feature for distinguishing cell from non-cell regions. A binary SVM classifier based on RGB histogram features (SVM-RGB Histogram) [21] is used as a rough and fast cell region detector in this paper. Using the SVM-RGB Histogram detector also guarantees that all TH-labeled cells fall within the detected cell regions. Another reason to detect cell regions first is that, although the CNN is a powerful tool, its training set needs to be carefully selected. Based on the size of the TH-labeled cell, we extract patches

(a) Original Frame (b) Cell region R

Fig. 4: Cell region detection using the SVM-RGB Histogram detector.

with size 41 ∗ 41 (123µm ∗ 123µm). To increase the number of cell samples, for each labelled ground-truth central pixel C(x, y, z), we draw a cube with a spatial radius rs = 3 pixels (9µm) and an axial radius ra = 2 pixels (3µm) along the z direction. All pixels in the cube are considered cell-sample centres Cp. All the remaining pixels in the stacks belong to background-sample centres Cn. We extract all cell patches from Cp in 20 training stacks, and extract the same number of background patches randomly from Cn. In total, approximately 0.1 million cell patches and the same number of background patches are extracted to train the SVM-RGB Histogram detector. The SVM detector is used to detect the cell region for collecting CNN training patches, which removes most of the background pixels. This stage acts as a feature-selection pre-process, so the CNN can detect cell samples in the cell region more accurately. Similarly, in the test stage, for accuracy, we also first apply the SVM detector to detect those regions. We will compare the proposed method with training the CNN without this cell region detection stage. For training the conventional CNN, the cell samples are the same, but the non-cell samples are randomly selected from the background.
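A minimal sketch of the SVM-RGB-Histogram detector follows. It is not the authors' code: the toy patches, colour means, patch size and the simple per-channel histogram (a simplification of the multi-dimensional colour histograms of [21]) are all hypothetical, chosen only to show the shape of the pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def rgb_histogram(patch, bins=8):
    """L1-normalised, concatenated per-channel histogram of an RGB patch."""
    feats = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(feats).astype(float)
    return hist / hist.sum()

# Hypothetical toy data: WISH-stained cell patches are purplish, background
# is pale; real patches would be 41x41 crops from the IIS-normalised stacks.
rng = np.random.default_rng(0)
cells = rng.normal([150, 60, 120], 20, size=(200, 9, 9, 3)).clip(0, 255)
backg = rng.normal([220, 215, 210], 15, size=(200, 9, 9, 3)).clip(0, 255)
X = np.array([rgb_histogram(p) for p in np.concatenate([cells, backg])])
y = np.array([1] * 200 + [0] * 200)

clf = SVC(kernel="linear").fit(X, y)   # rough, fast cell-region detector
```

Because the feature is a low-dimensional colour histogram, scoring every pixel neighbourhood with this classifier is far cheaper than running the CNN over the whole stack, which is exactly the role the region detector plays here.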

3.4. Training Set Pre-processing

After the cell region R is detected by the SVM-RGB Histogram detector, we extract cell and background patches in region R from all training stacks for training the CNN, with the same patch size and neighbourhood setting described in 3.3. Pixels in the cell region R have similar colours, so the colour feature within the cell region is not reliable for distinguishing cell from background patches. To reduce the training time, we convert all RGB patches into the YUV colour space, and only the Y-channel patches are kept. Each Y-channel cell patch is rotated by 0, 90, 180 and 270 degrees to make the detector rotation invariant and to increase the number of cell samples. The cell and background patches can have overlapping pixels, which is beneficial for increasing the probability of detecting true cells. About 0.5 million cell patches are extracted from all training stacks, and about the same number of background patches from the cell region R.
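The two pre-processing steps above, keeping only the luma channel and augmenting by 90-degree rotations, can be sketched as below. The BT.601 luma weights are a standard RGB-to-Y conversion; whether the authors used exactly these weights is an assumption, and the function names are illustrative.

```python
import numpy as np

def rgb_to_y(patch):
    """Luma (Y) channel of an RGB patch, using BT.601 weights."""
    return patch[..., :3] @ np.array([0.299, 0.587, 0.114])

def augment_rotations(patch):
    """The four 90-degree rotations of a patch, used to make the
    detector rotation invariant and to enlarge the cell training set."""
    return [np.rot90(patch, k) for k in range(4)]
```

Dropping U and V cuts the CNN input from three channels to one, which is the stated training-time saving, while the four rotations quadruple the number of cell samples.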

3.5. Cell Pixel Detection

After the max-pooling CNN is trained, we test it on the test stacks. Firstly, the cell region R is detected by the SVM-RGB Histogram detector in every frame of each stack in the test dataset. Then, the pre-trained CNN is used to detect cells by scanning every pixel in R, and each pixel is given a probability value Pc. It requires less than 300 seconds to process a slice. This yields a 3D probability map Mp for a stack, and a Gaussian filter is used to smooth this probability map Mp. Finally, all 3D local maxima are found in the smoothed Mp using the method described in [22], which requires a self-tuning threshold.
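The smoothing and local-maxima steps can be sketched with SciPy. This is an illustrative reimplementation, not the method of [22]: it uses a fixed threshold where the paper's is self-tuned, and the filter sizes are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def detect_cells(prob_map, sigma=1.0, threshold=0.5):
    """Smooth a 3D CNN probability map and keep local maxima above a
    threshold, returning (z, y, x) candidate cell centres."""
    smoothed = gaussian_filter(prob_map, sigma=sigma)
    # A voxel is a peak if it equals the max of its 3x3x3 neighbourhood
    # and clears the detection threshold.
    peaks = (smoothed == maximum_filter(smoothed, size=3)) & (smoothed > threshold)
    return np.argwhere(peaks)
```

The Gaussian smoothing merges the many high-probability pixels a single cell produces into one dominant mode, so each detected maximum corresponds to one candidate cell centre.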

Fig. 5: The outline of the convolutional neural network architecture.

3.6. CNN Training

We train the CNN network shown in Fig. 5. Training this network takes 3 days of computation using a MATLAB R2011b implementation with a C++ library [23] on a personal computer with an i5-3470 CPU clocked at 3.4GHz, 14GB of memory and a 64-bit operating system. Fewer than 15 epochs are required to reach the minimum error (less than 10%) with a total of 1 million training patches.

3.7. Results

We evaluate the detection results against the human observer's labelled cells on the test set. Detected cells that are closer than 10 pixels (30µm) in the 2D slice and 5 pixels (7.5µm) in the vertical direction from a ground-truth centroid are defined as true positives. Besides the number of true positives (NTP), we also count the number of false positives (NFP) and false negatives (NFN). Then, performance measures including recall (recall = NTP/(NTP + NFN)), precision (precision = NTP/(NTP + NFP)), and the F1 score (F1 = 2PR/(P + R)) are calculated. We name our proposed method 'Refined CNN', and we compare it with the CNN without the cell region detection stage (CNN), and the SVM method with different features, including RGB histogram features (RGBHist), RGB colour & Scale-Invariant Feature Transform features (RGB&SIFT), and RGB&SIFT combined with Histogram of Oriented Gradients (RGB&SIFT+HOG). Table 1 shows that the proposed method performs significantly better than the compared techniques. Additionally, it can be seen from the example result in Fig. 6 that the detection accuracy is good when cells are not clustered together, and that the proposed method needs to be improved in cell-cluttered areas.
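The matching rule and the three measures can be sketched as follows. The greedy one-to-one matching of detections to ground-truth centroids is an assumption (the paper does not state its matching strategy); the tolerances follow the text (10 px ≈ 30 µm in-plane, 5 px ≈ 7.5 µm axially).

```python
import numpy as np

def evaluate(detections, ground_truth, r_xy=10, r_z=5):
    """Greedily match (x, y, z) detections to ground-truth centroids and
    return (recall, precision, F1).

    A detection is a true positive when it lies within `r_xy` pixels
    in-plane and `r_z` slices axially of a not-yet-matched centroid.
    """
    unmatched = list(ground_truth)
    tp = 0
    for (x, y, z) in detections:
        for g in unmatched:
            if np.hypot(x - g[0], y - g[1]) <= r_xy and abs(z - g[2]) <= r_z:
                unmatched.remove(g)   # each centroid matches at most once
                tp += 1
                break
    fp, fn = len(detections) - tp, len(unmatched)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return recall, precision, f1
```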

Table 1: Comparison results on test stacks.

Method                    Recall   Precision   F1
Refined CNN (Proposed)    0.7692   0.6452      0.7018
CNN                       0.6071   0.3542      0.4474
RGBSift+HOG               0.5600   0.2593      0.3544
RGBHist                   0.4839   0.1136      0.1840
RGBSift                   0.3704   0.0962      0.1527

4. CONCLUSION AND FUTURE WORK

In this paper, an automatic deep-learning-based cell detection approach for light microscopy zebrafish z-stack images has been introduced. The results show that the proposed framework has potential for 3D cell detection and outperforms techniques based on hand-crafted features. Future work will aim at validating our approach on larger datasets with label information from two observers, and at extending the 2D CNN model to a 3D RGB CNN model to make it more straightforward for cell detection, with the ultimate goal of gradually bringing automated 3D cell detection into practice. Furthermore, other feature learning methods [24, 25, 26] will be investigated as well.

Fig. 6: The detection result on two stacks. In the observer's annotation, the number of TH-labeled cells is 28. In the result produced by the proposed method, there are 20 red, 11 blue and 6 green circles in the stack. Red, blue and green circles represent true positives, false positives and false negatives, respectively. Numbers in circles indicate the slice location of each cell.

5. REFERENCES

[1] E. Rink and M. F. Wullimann, "Development of the catecholaminergic system in the early zebrafish brain: an immunohistochemical study," Developmental Brain Research, vol. 137, no. 1, pp. 89–100, 2002.

[2] B. Dong, L. Shao, A. F. Frangi, O. Bandmann, and M. D. Costa, "Three-dimensional deconvolution of wide field microscopy with sparse priors: Application to zebrafish imagery," in ICPR 2014.

[3] J. Ngiam, Z. Chen, D. Chia, P. W. Koh, Q. V. Le, and A. Y. Ng, "Tiled convolutional neural networks," in NIPS 2010.

[4] L. Shao, D. Wu, and X. Li, "Learning deep and wide: A spectral method for learning deep networks," IEEE TNNLS, vol. 25, no. 12, pp. 2303–2308, 2014.

[5] D. Wu and L. Shao, "Multimodal dynamic networks for gesture recognition," in ACMMM 2014.

[6] ——, "Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition," in CVPR 2014.

[7] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.

[8] M. Karakaya, R. A. Kerekes, S. S. Gleason, R. A. Martins, and M. A. Dyer, "Automatic detection, segmentation and characterization of retinal horizontal neurons in large-scale 3D confocal imagery," in SPIE Medical Imaging 2011.

[9] M. Oberlaender, V. J. Dercksen, R. Egger, M. Gensel, B. Sakmann, and H.-C. Hege, "Automated three-dimensional detection and counting of neuron somata," Journal of Neuroscience Methods, vol. 180, no. 1, pp. 147–160, 2009.

[10] X. Xu, X. Xu, X. Huang, W. Xia, and S. Xia, "A high-throughput analysis method to detect regions of interest and quantify zebrafish embryo images," Journal of Biomolecular Screening, vol. 15, no. 9, pp. 1152–1159, 2010.

[11] M. D. Abramoff, P. J. Magalhaes, and S. J. Ram, "Image processing with ImageJ," Biophotonics International, vol. 11, no. 7, pp. 36–43, 2004.

[12] C. Schmitz, B. S. Eastwood, S. J. Tappan, J. R. Glaser, D. A. Peterson, and P. R. Hof, "Current automated 3D cell detection methods are not a suitable replacement for manual stereologic cell counting," Frontiers in Neuroanatomy, vol. 8, 2014.

[13] Y. Al-Kofahi, W. Lassoued, W. Lee, and B. Roysam, "Improved automatic detection and segmentation of cell nuclei in histopathology images," IEEE Transactions on Biomedical Engineering, vol. 57, no. 4, pp. 841–852, 2010.

[14] G. Lin and M. K. Chawla, "A multi-model approach to simultaneous segmentation and classification of heterogeneous populations of cell nuclei in 3D confocal microscope images," Cytometry Part A, vol. 71, no. 9, pp. 724–736, 2007.

[15] M. K. K. Niazi, A. A. Satoskar, and M. N. Gurcan, "An automated method for counting cytotoxic T-cells from CD8 stained images of renal biopsies," in SPIE Medical Imaging, 2013, pp. 867606–867606.

[16] S. Wienert, D. Heim, and K. Saeger, "Detection and segmentation of cell nuclei in virtual microscopy images: a minimum-model approach," Scientific Reports, vol. 2, 2012.

[17] T. Chen and C. Chefdhotel, "Deep learning based automatic immune cell detection for immunohistochemistry images," in Machine Learning in Medical Imaging 2014.

[18] H. Bogunovic and J. M. Pozo, "Automated segmentation of cerebral vasculature with aneurysms in 3DRA and TOF-MRA using geodesic active regions: An evaluation study," Medical Physics, vol. 38, no. 1, pp. 210–222, 2011.

[19] P. Thevenaz, "Point Picker (a plugin for ImageJ)," http://bigwww.epfl.ch/thevenaz/pointpicker/, 2008–2014.

[20] L. G. Nyúl and J. K. Udupa, "On standardizing the MR image intensity scale," Magnetic Resonance in Medicine, vol. 42, pp. 1072–1081, 1999.

[21] M. Kolesnik and A. Fexa, "Multi-dimensional color histograms for segmentation of wounds in images," in Image Analysis and Recognition, 2005, pp. 1014–1022.

[22] D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Mitosis detection in breast cancer histology images with deep neural networks," in MICCAI 2013.

[23] S. Demyanov. (2014) ConvNet library for MATLAB. [Online]. Available: http://www.demyanov.net/

[24] L. Shao, L. Liu, and X. Li, "Feature learning for image classification via multiobjective genetic programming," IEEE TNNLS, vol. 25, no. 7, pp. 1359–1371, 2014.

[25] F. Zhu and L. Shao, "Weakly-supervised cross-domain dictionary learning for visual recognition," IJCV, vol. 109, no. 1-2, pp. 42–59, 2014.

[26] L. Liu and L. Shao, "Learning discriminative representations from RGB-D video data," in IJCAI 2013.