Surface-based multi-template automated hippocampal segmentation: Application to temporal lobe epilepsy

Hosung Kim a, Tommaso Mansi b, Neda Bernasconi a,1, Andrea Bernasconi a,*,1

a Neuroimaging of Epilepsy Laboratory, McConnell Brain Imaging Center, Montreal Neurological Institute and Hospital, McGill University, Montreal, Quebec, Canada
b Image Analytics and Informatics, Siemens Corporate Research, Princeton, NJ, USA

* Corresponding author. Address: Montreal Neurological Institute, 3801 University Street, Montreal, Quebec, Canada H3A 2B4. Tel.: +1 514 398 3361; fax: +1 514 398 2975. E-mail address: [email protected] (A. Bernasconi).
1 These authors contributed equally to this work.

Medical Image Analysis 16 (2012) 1445–1455. Journal homepage: www.elsevier.com/locate/media
1361-8415/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.media.2012.04.008

Article history: Available online 3 May 2012

Keywords: Hippocampus; Automatic segmentation; Multiple templates; Texture; Surface parametrization

Abstract

In drug-resistant temporal lobe epilepsy (TLE), detecting hippocampal atrophy on MRI is crucial as it allows defining the surgical target. In addition to atrophy, about 40% of patients present with malrotation, a developmental anomaly characterized by atypical morphologies of the hippocampus and collateral sulcus. We have recently shown that both atrophy and malrotation negatively impact the performance of volume-based techniques. Here, we propose a novel hippocampal segmentation algorithm (SurfMulti) that integrates deformable parametric surfaces, vertex-wise modeling of locoregional texture and shape, and multiple templates in a unified framework. To account for inter-subject variability, including shape variants, we used a library derived from a large database of healthy (n = 80) and diseased (n = 288) hippocampi. To quantify malrotation, we generated 3D models from manual hippocampal labels and automatically extracted collateral sulci. The accuracy of SurfMulti was evaluated relative to manual labeling and segmentation obtained through a single atlas-based algorithm (FreeSurfer) and a volume-based multi-template approach (Vol-multi) using the Dice similarity index and surface-based shape mapping, for which we computed vertex-wise displacement vectors between automated and manual segmentations. We then correlated segmentation accuracy with malrotation features and atrophy. SurfMulti outperformed FreeSurfer and Vol-multi, and achieved a level of accuracy in TLE patients (Dice = 86.9%) virtually identical to healthy controls (Dice = 87.5%). Vertex-wise shape mapping showed that SurfMulti had an excellent overlap with manual labels, with sub-millimeter precision. Its performance was not influenced by atrophy or malrotation (|r| < 0.20, p > 0.2), while FreeSurfer (|r| > 0.35, p < 0.0001) and Vol-multi (|r| > 0.28, p < 0.05) were hampered by both anomalies. The magnitude of atrophy detected using SurfMulti was the closest to manual volumetry (Cohen's d: manual = 1.71, t = 7.6; SurfMulti = 1.60, t = 7.0; Vol-multi = 1.38, t = 6.1; FreeSurfer = 0.91, t = 3.9). The high performance of SurfMulti regardless of cohort, atrophy and shape variants identifies this algorithm as a robust segmentation tool for hippocampal volumetry.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Temporal lobe epilepsy (TLE) is the most frequent form of drug-resistant focal epilepsy. Hippocampal sclerosis, the histopathological hallmark of this condition, generally appears as atrophy on MRI (Bernasconi, 2006; Cascino, 2008; Cendes et al., 1993; Jackson et al., 1990). Volumetry is more sensitive in detecting hippocampal sclerosis than visual evaluation and allows defining the surgical target in the majority of patients (Bernasconi et al., 2003; Jackson et al., 1993; Kuzniecky et al., 1999). Removing the diseased hippocampus is an effective treatment, offering seizure freedom in a large proportion of patients (Cascino, 2004; Schramm and Clusmann, 2008). In addition to atrophy, about 40% of TLE patients show atypical shape and positioning of the hippocampus (Bernasconi et al., 2005; Voets et al., 2011). These features, commonly referred to as malrotation, are considered markers of neurodevelopmental anomalies (Baulac et al., 1998; Voets et al., 2011) and may contribute to the pathogenesis of this condition (Blumcke et al., 2002; Sloviter et al., 2004). They are mainly characterized by a rounder appearance and vertical orientation of the hippocampus, and an abnormally deep collateral sulcus (Baulac et al., 1998; Bernasconi et al., 2005).

Given the clinical utility of volumetry in defining the side of the epileptic focus, automated hippocampal segmentation may constitute a valuable tool in the presurgical evaluation in TLE. However, to date performance in patients has been rather unsatisfactory
(Akhondi-Asl et al., 2011; Avants et al., 2010; Chupin et al., 2009b; Hammers et al., 2007; Heckemann et al., 2010; Pardoe et al., 2009). We recently evaluated the influence of hippocampal malrotation on three state-of-the-art automatic segmentation algorithms (Kim et al., 2012), including a region-growing approach that utilizes rule-based detection of anatomical landmarks (Chupin et al., 2009b), an algorithm based on the nonlinear warp of a target image to a probabilistic atlas (Fischl et al., 2002), and a multi-template approach (Collins and Pruessner, 2010). While the overall performance of the multi-template method was superior to the others, the accuracy and clinical utility of all three algorithms were affected by malrotation. The presence of abnormal anatomical variants altering the morphology of the hippocampus likely modifies its spatial relationship with surrounding structures, leading algorithms that rely on templates or prior knowledge derived from healthy subjects to fall into local minima (Chupin et al., 2009b; Khan et al., 2008; Pardoe et al., 2009). On the other hand, by building prior shape models and automatically selecting those that best fit the structure to segment, methods based on multi-template libraries and label fusion (Aljabar et al., 2009; Collins and Pruessner, 2010; Heckemann et al., 2006) have the potential to overcome the limitations of individual or averaged template techniques. However, nonlinear image registration may fail in cases of atypical morphology (Kim et al., 2012).

Besides volume-based segmentation, the use of non-parametric deformable models based on level-set formulations has allowed for flexible deformations against morphological and topological variations (Leventon et al., 2000; Yang and Duncan, 2004). As this approach does not guarantee point-wise correspondence, sampling of local texture and shape at the boundary is not straightforward. Conversely, parametric models permit vertex-wise sampling (Kelemen et al., 1999; Klemencic et al., 2004; Pitiot et al., 2004). Nevertheless, both methods have so far provided relatively poor results in healthy controls, likely due to the initialization step through a single average surface or a seed point, which may not sufficiently account for shape variations of the structure to segment.

In this paper, we propose a novel hippocampal segmentation algorithm that integrates deformable parametric surfaces, vertex-wise modeling of locoregional texture and shape features, and multiple templates in a unified framework. We compared the performance of our method (henceforth named SurfMulti) to manual tracing and segmentation obtained through a volume-based single-template (Fischl et al., 2002) and multi-template approach (Collins and Pruessner, 2010). This work is an extension of our previously published methodology (Kim et al., 2011). Here, we evaluated the contribution of single features to the performance of SurfMulti. In addition, we evaluated the impact of hippocampal malrotation and atrophy on the global performance of the automated algorithms and assessed local segmentation accuracy using surface-based shape analysis. Finally, we investigated the ability of each automated method to lateralize the seizure focus.

2. Methods

Our approach consists of a template library construction stage and a segmentation stage, as illustrated in Fig. 1. Each stage is detailed in the following sections.

2.1. Template library construction

2.1.1. Surface extraction from manual labels

Manual hippocampal labels were converted into surface meshes and parameterized using an area-preserving, distortion-minimizing mapping technique based on spherical harmonics (SPHARM). The uniform icosahedron subdivision of the SPHARM yields a point distribution model (PDM) that guarantees shape-inherent vertex-wise correspondence across subjects (Styner et al., 2006).

2.1.2. Regional texture models

To compute regional textures, each surface was mapped to its corresponding MRI. At each vertex $v_i$, we defined a spherical neighborhood with various radii (3 mm, 5 mm, 7 mm). The "inner region" (IR) and "outer region" (OR) of these local neighborhoods were determined with respect to the surface boundary. The following texture features were then computed:

(i) Normalized intensity (NI) to capture regional tissue homogeneity. Let $\mu_{IR,i}$ and $\mu_{OR,i}$ be the mean intensities within IR and OR at $v_i$, respectively, and $SD_{IR,i}$ and $SD_{OR,i}$ the corresponding standard deviations. We defined $NI_{IR,i} = \mu_{IR,i}/SD_{IR,i}$ and $NI_{OR,i} = \mu_{OR,i}/SD_{OR,i}$. Notably, because NI is measured in a small region and normalized by the regional intensity variability, it is robust against both large-scale intensity inhomogeneities and inter-scanner or inter-subject MRI intensity variations.

(ii) Relative intensity (RI) to assess the contrast between IR and OR voxels. RI was defined as $RI_i = 2(\mu_{OR,i} - \mu_{IR,i})/(\mu_{OR,i} + \mu_{IR,i})$.

(iii) Gabor energy (GE) to capture image texture through a multichannel filtering strategy (Grigorescu et al., 2002). Mimicking human visual perception, this feature portrays the complexity, directionality and repetition of the intensity patterns. The original 2D function is defined as:

$$
g_{\theta,\lambda,\sigma,\gamma,\varphi}(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\exp\left(-2\pi\frac{x'}{\lambda} + \varphi\right), \qquad
x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta
\tag{1}
$$

where $x$ and $y$ are the spatial coordinates and the angle $\theta$ defines the orientation of the normal to the parallel stripes of a Gabor function. $\gamma$ is the aspect constant that defines the elongation of the filter, $\varphi$ is the phase offset that makes the filter asymmetric when it is non-zero, and the ratio $\sigma/\lambda$ describes the bandwidth, namely the filter size $b$, according to:

$$
b = \log_2\frac{\pi\sigma/\lambda + \sqrt{0.5\ln 2}}{\pi\sigma/\lambda - \sqrt{0.5\ln 2}}, \qquad
\frac{\sigma}{\lambda} = \frac{1}{\pi}\sqrt{\frac{\ln 2}{2}}\;\frac{2^b + 1}{2^b - 1}
\tag{2}
$$

To expand this equation to 3D, let $x$, $y$, $z$ be the spatial coordinates and $R_\theta$ a 3 × 3 rotation matrix whose 3D Euler angle $\theta$ defines the orientation of the normal to the parallel stripes of a Gabor function. The Gabor filter is thus defined as:

$$
g_{\theta,\lambda,\sigma,\gamma,\varphi}(x, y, z) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2 + \gamma^2 z'^2}{2\sigma^2}\right)\exp\left(-2\pi\frac{x'}{\lambda} + \varphi\right), \qquad
(x', y', z') = R_\theta\,(x, y, z)
\tag{3}
$$

At a voxel $I(x, y, z)$, the Gabor energy is given by $GE_{\theta,\lambda,\sigma,\gamma,\varphi}(x, y, z) = \|g_{\theta,\lambda,\sigma,\gamma,\varphi}(x, y, z) * I(x, y, z)\|$, where $*$ is the convolution operator. Here, we calculated the Gabor energy of the immediate surroundings only along the surface normal $\theta$. We fixed $\varphi$ at 0 (no offset) and $\gamma$ at 1 (same amount of information horizontally and vertically). Multiscale texture analysis was performed by varying the bandwidth $b = [0.5, 1, 2]$.

(iv) Intensity gradient (IG) to capture edge information. Gradients along the x, y and z directions were computed and linearly interpolated on the vertices.
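As an illustration only, the two intensity features and the bandwidth relation of Eq. (2) can be sketched in Python. The function names, the use of the population SD and the toy intensity samples are assumptions made for this sketch; they are not taken from the SurfMulti implementation, and the sampling of the IR and OR voxels themselves is not shown.

```python
import math
from statistics import mean, pstdev


def normalized_intensity(values):
    """NI = mean / SD of the intensities sampled in one region (IR or OR).

    Assumption: the population SD is used; the text does not say which
    SD estimator was applied.
    """
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("perfectly homogeneous region: NI is undefined")
    return mean(values) / sd


def relative_intensity(inner, outer):
    """RI = 2 * (mu_OR - mu_IR) / (mu_OR + mu_IR)."""
    mu_in, mu_out = mean(inner), mean(outer)
    return 2.0 * (mu_out - mu_in) / (mu_out + mu_in)


def sigma_over_lambda(b):
    """Ratio sigma/lambda implied by a bandwidth b, second part of Eq. (2)."""
    return (1.0 / math.pi) * math.sqrt(math.log(2) / 2.0) * (2**b + 1) / (2**b - 1)


def bandwidth(ratio):
    """Bandwidth b recovered from sigma/lambda, first part of Eq. (2)."""
    c = math.sqrt(0.5 * math.log(2))
    return math.log2((math.pi * ratio + c) / (math.pi * ratio - c))


# Hypothetical intensity samples around one vertex.
inner_region = [100, 110, 90, 100]
outer_region = [60, 70, 50, 60]
ni_inner = normalized_intensity(inner_region)
ri = relative_intensity(inner_region, outer_region)
ratios = {b: sigma_over_lambda(b) for b in (0.5, 1, 2)}
```

The two halves of Eq. (2) are inverses of each other, so `bandwidth(sigma_over_lambda(b))` returns `b` up to rounding for the three bandwidths used in the text.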

Fig. 1. SurfMulti automatic hippocampal segmentation steps. (A) Training set consists of MR images and manual labels of controls and patients. (B) Labels are converted into surface meshes using spherical harmonics and point distribution model (SPHARM-PDM) that ensure shape-inherent point-wise correspondences across subjects. (C) Each surface is mapped onto its corresponding MRI to compute vertex-wise locoregional texture and shape. (D) For a given MRI, an optimal subset of training surfaces and features is selected based on a similarity function. The optimal subset is then averaged using adaptive weighting. (E) The final segmentation is obtained by evolving the averaged surface of the selected subset. See text for details.

2.1.3. Regional shape models

The following features were designed to constrain the deformable model evolution; they were used to impose penalties on the evolution according to the shape difference between the current deformation and the training data (i.e., the selected template subset).

(i) Distance between adjacent vertices to prevent irregular vertex topology.

(ii) Gaussian curvature to constrain local convexity/concavity.

(iii) Local orientation. Manual tracing in regions where anatomical boundaries are not visible often relies on arbitrary oblique planes as geometric landmarks (e.g., the inferomedial border separating CA1 from the subiculum) (Konrad et al., 2009). To model this feature, we projected the surface normals onto the xy, yz and zx planes and computed angles with respect to the x-, y- and z-axes, respectively.
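The local orientation feature can be illustrated with a short Python sketch. It assumes that each angle is measured with `atan2` from the named axis within the corresponding plane; the text does not specify the angle convention, and the function name is hypothetical.

```python
import math


def local_orientation(normal):
    """Angles of a surface normal projected onto the xy, yz and zx planes.

    Assumed convention: the angle in the xy-plane is measured from the
    x-axis, in the yz-plane from the y-axis, and in the zx-plane from
    the z-axis, each in radians.
    """
    nx, ny, nz = normal
    angle_xy = math.atan2(ny, nx)  # projection (nx, ny), relative to x
    angle_yz = math.atan2(nz, ny)  # projection (ny, nz), relative to y
    angle_zx = math.atan2(nx, nz)  # projection (nz, nx), relative to z
    return angle_xy, angle_yz, angle_zx


angles = local_orientation((1.0, 1.0, 0.0))
```

For a normal lying in the xy-plane halfway between the x- and y-axes, the first angle is π/4, the second is 0 and the third is π/2.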

Subject-wise normalization of texture and shape features was achieved at each vertex $v$ through a z-transformation of the feature value $f_v$ relative to the distribution across all vertices: $F_{v,zscore} = (f_v - \mu_f)/\sigma_f$, with $\mu_f = \frac{1}{N_v}\sum_{v=1}^{N_v} f_v$ and $\sigma_f = \sqrt{\frac{1}{N_v}\sum_{v=1}^{N_v}(f_v - \mu_f)^2}$, where $N_v$ is the number of vertices.

Normalized vertex-wise features were combined into a vector $F_{v_i} = [F_{texture,v_i}\;\; F_{shape,v_i}]$.
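A minimal Python sketch of this normalization follows, assuming that each feature is stored as one list of per-vertex values for a given subject. The helper names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_across_vertices(values):
    """z-transform one feature relative to its distribution over all vertices."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD, i.e. division by N_v
    return [(v - mu) / sigma for v in values]


def feature_vectors(texture, shape):
    """Concatenate normalized texture and shape features at each vertex.

    texture and shape map a feature name to its list of per-vertex values.
    """
    normalized = [zscore_across_vertices(vals) for vals in texture.values()]
    normalized += [zscore_across_vertices(vals) for vals in shape.values()]
    # Transpose: one feature vector per vertex.
    return [list(per_vertex) for per_vertex in zip(*normalized)]


z = zscore_across_vertices([1.0, 2.0, 3.0, 4.0, 5.0])
vectors = feature_vectors(
    {"RI": [0.1, 0.3, 0.2]},
    {"curvature": [1.0, 2.0, 3.0]},
)
```

After normalization every feature has zero mean and unit variance across the vertices of a subject, which makes features with different units comparable inside one vector.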

2.2. Automatic segmentation of the hippocampus

2.2.1. Automatic selection of the optimal shape and feature template

All individual templates and test images are linearly registered to the MNI ICBM-152 nonlinear template. Let $S_j = [v_1, v_2, \ldots, v_i, \ldots, v_{N_v}]$ be a SPHARM-PDM surface of the template $j$ in the library, which is initially placed on its own MR image and transformed to the MNI ICBM space using the transformation matrix obtained above. Let $F_{v_i,j}$ be the true features, i.e., the set of features computed at $v_i$ from the template MRI. We then compute a set of estimated features $\hat{F}_{v_i,j}$ from the test MRI and the following similarity measure:

$$
O_j = -\sum_i \frac{\|F_{v_i,j} - \hat{F}_{v_i,j}\|}{\sqrt{\frac{1}{N_t}\sum_{k=1}^{N_t}\left(F_{v_i,k} - \bar{F}_{v_i}\right)^2}}, \qquad
\bar{F}_{v_i} = \frac{1}{N_t}\sum_{k=1}^{N_t} F_{v_i,k}
\tag{4}
$$

where $N_t$ is the number of templates in the library. Eq. (4) represents a normalized similarity between the true features of the $j$th template and the $j$th estimated features extracted from the test image. Based on Eq. (4), we were thus able to identify the templates that were most similar to the test image. We investigated two template selection approaches.
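The ranking implied by Eq. (4) can be sketched in Python as follows. The nested-list data layout (`features[template][vertex]` is a feature vector), the interpretation of the squared term in the denominator as a squared Euclidean norm, and all function names are assumptions made for this sketch.

```python
import math


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def library_spread(true_feats):
    """Per-vertex spread of the true features across all N_t templates."""
    n_t = len(true_feats)
    n_v = len(true_feats[0])
    spread = []
    for i in range(n_v):
        dim = len(true_feats[0][i])
        centre = [sum(true_feats[k][i][d] for k in range(n_t)) / n_t
                  for d in range(dim)]
        mean_sq = sum(_dist(true_feats[k][i], centre) ** 2
                      for k in range(n_t)) / n_t
        spread.append(math.sqrt(mean_sq))  # must be non-zero at every vertex
    return spread


def similarity(true_j, est_j, spread):
    """O_j: negative sum of normalized feature distances over vertices."""
    return -sum(_dist(t, e) / s for t, e, s in zip(true_j, est_j, spread))


def rank_templates(true_feats, est_feats, n):
    """Return the indices of the n most similar templates and all scores."""
    spread = library_spread(true_feats)
    scores = [similarity(t, e, spread) for t, e in zip(true_feats, est_feats)]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:n], scores


# Toy library: 3 templates, 2 vertices, 1-D features.
true_feats = [[[0.0], [0.0]], [[2.0], [2.0]], [[4.0], [4.0]]]
est_feats = [[[0.0], [0.0]], [[3.0], [3.0]], [[8.0], [8.0]]]
best, scores = rank_templates(true_feats, est_feats, n=2)
```

In this toy library the first template reproduces its own features on the test image exactly and therefore ranks first, followed by the second template.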

2.2.1.1. Equal weight. Given a test image, we selected the $n$ most similar templates (surfaces and their corresponding features) (Aljabar et al., 2009; Collins and Pruessner, 2010) and averaged the $n$ selected surfaces ($S_{opt\,1}, S_{opt\,2}, \ldots, S_{opt\,n}$) to generate an initial shape. Best results were obtained experimentally with $n = 10$ templates. At each vertex of these 10 surfaces, we obtained the mean ($\bar{F}_{v_i}$) and SD ($\sigma_{F_{v_i}}$) of their true features (5) to compute the objective function (9) in later stages.

$$
\bar{F}_{v_i} = \frac{1}{n}\sum_{j=1}^{n} F_{v_i,opt\,j}, \qquad
\sigma_{F_{v_i}} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(F_{v_i,opt\,j} - \bar{F}_{v_i}\right)^2}
\tag{5}
$$

2.2.1.2. Adaptive weight. Recent studies (Artaechevarria et al., 2009; Coupé et al., 2011) have proposed a weighted averaging of selected labels according to their global similarity measure. Even though our library includes a large number of pathological hippocampi, it may not cover the entire spectrum of morphological variability. Notably, however, selected shape and texture features in the subset may be more comparable to the target, and thus their separate weighting likely offers a better strategy to create the optimal subset. We therefore investigated a weighted averaging strategy as follows:

Let $w_S$ and $w_F$ be $n \times 1$ weight vectors for optimal surfaces and features, respectively. We defined the new average surface as:

$$
\bar{S} = \sum_{j=1}^{n} w_{S,j}\, S_{opt\,j}, \qquad \sum_{j} w_{S,j} = 1
\tag{6}
$$

Similarly, we defined the weighted mean and SD of features at vertex $v_i$ by:

$$
\bar{F}_{v_i} = \sum_{j=1}^{n} w_{F,j}\, F_{v_i,opt\,j}, \qquad \sum_{j} w_{F,j} = 1, \qquad
\sigma_{F_{v_i}} = \sqrt{\sum_{j=1}^{n} w_{F,j}\left(F_{v_i,opt\,j} - \bar{F}_{v_i}\right)^2}
\tag{7}
$$

To take into account the weights, we redefined the similarity function based on the top-$n$ ranked subset:

$$
O_{subset} = -\sum_i \frac{\|\bar{F}_{v_i} - \hat{F}_{v_i,\bar{S}}\|}{\sigma_{F_{v_i}}}
\tag{8}
$$

where $\hat{F}_{v_i,\bar{S}}$ is the estimated feature set computed on the surface $\bar{S}$ mapped onto the test image. Finally, both weights were determined by maximizing the similarity between the current template subset and the test image:

$$
w = [w_S\;\; w_F] = \arg\max_{w} O_{subset}
\tag{9}
$$

To optimize the vector $w$, we first set all entries of $w_S = [w_{S,1}, w_{S,2}, \ldots, w_{S,j}, \ldots, w_{S,n}]$ and $w_F = [w_{F,1}, w_{F,2}, \ldots, w_{F,j}, \ldots, w_{F,n}]$ to $1/n$. We then iteratively perturbed each $w_{S,j}$ and $w_{F,j}$ separately by $\pm\delta \cdot 1/n$ and updated it if the similarity computed by Eq. (8) increased. The step-size parameter $\delta$ was initialized to 1 and decreased by 0.1 at each iteration. As this perturbation process does not guarantee that the weights sum to 1, we renormalized them to satisfy the constraints in Eqs. (6) and (7) after each perturbation. The process stopped when $O_{subset}$ did not increase after the current iteration, or after a maximum of 10 iterations.
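The perturbation scheme can be sketched in Python under stated assumptions. The sketch handles a single weight vector (the same loop would apply to $w_S$ and $w_F$), takes the similarity of Eq. (8) as a black-box callable supplied by the caller, and skips candidates with a negative weight; the last choice is an assumption of this sketch, not something stated in the text.

```python
def optimize_weights(similarity, n, max_iter=10):
    """Greedy perturbation of n weights; larger similarity is better.

    similarity: callable taking a list of n weights (stand-in for Eq. 8).
    """
    weights = [1.0 / n] * n
    best = similarity(weights)
    delta = 1.0
    for _ in range(max_iter):
        improved = False
        for j in range(n):
            for sign in (+1.0, -1.0):
                candidate = list(weights)
                candidate[j] += sign * delta / n
                if candidate[j] < 0 or sum(candidate) <= 0:
                    continue  # assumption: weights are kept non-negative
                total = sum(candidate)
                candidate = [c / total for c in candidate]  # sum back to 1
                score = similarity(candidate)
                if score > best:
                    weights, best, improved = candidate, score, True
        if not improved:
            break  # no gain during this iteration
        delta -= 0.1
    return weights, best


# Toy similarity: closeness to a hypothetical ideal weighting.
target = [0.6, 0.3, 0.1]
toy_similarity = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
initial = toy_similarity([1.0 / 3] * 3)
weights, best = optimize_weights(toy_similarity, n=3)
```

Because a candidate is accepted only when the similarity increases, the returned score is never below that of the equal-weight starting point.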

2.2.2. Automatic segmentation: evolution and objective function

We first linearly mapped the average template computed in the previous section to the test image. Then, we locally deformed the surface at each vertex along the surface normal based on a multi-level B-spline interpolation (Lee et al., 1997). We centered a B-spline function at each vertex on the evolving surface. Using a coarse-to-fine hierarchy of control lattices, a 2D smooth scalar field was generated to adaptively interpolate the given deformation magnitude. We then mapped these scalar values to their corresponding vertices for the later transformation. Previous multi-template approaches have shown that the average shape of the optimal subset produces good agreement with manual labeling (Aljabar et al., 2009; Collins and Pruessner, 2010). Its use as an initial segmentation allows the subsequent local deformation to be robust against local minima and thus to reach convergence within few iterations.

In our algorithm, we empirically set the initial magnitude of deformation to 5 mm and decreased it iteratively by 1 mm. Inward and outward deformations along the surface normal were tested at each vertex. The final deformation was achieved by maximizing a cost function.

Analogous to Eq. (8), the cost function was defined at iteration $k$ as:

$$
O_k = -\sum_i \frac{\|\bar{F}_{v_i} - \hat{F}_{v_i,S_k}\|}{\sigma_{F_{v_i}}}
= -\sum_i \frac{\sqrt{\sum_{texture}\left(\bar{F}_{v_i} - \hat{F}_{v_i,S_k}\right)^2 + \sum_{shape}\left(\bar{F}_{v_i} - \hat{F}_{v_i,S_k}\right)^2}}{\sigma_{F_{v_i}}}
\tag{10}
$$

$S_k$ is the deformed surface at iteration $k$ and $\hat{F}_{v_i,S_k}$ is its estimated feature vector. For the feature vector $F$, the vector norm $\|\cdot\|$ combines a texture-based and a shape-based term. Thus, optimizing this cost function reconciles texture and shape information. We maximized $O_k$ using a gradient descent approach. As our optimization function is not explicitly differentiable, the gradient is not computed analytically. Instead, we used a subgradient method (Kiwiel, 2001), which keeps track of the best objective function value found throughout the iterations.
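The coarse-to-fine search along the surface normal can be illustrated with a deliberately simplified Python sketch. Each vertex is reduced to a scalar offset along its normal, the objective is an arbitrary callable standing in for Eq. (10), and the multi-level B-spline smoothing of the deformation field is omitted; all names are hypothetical.

```python
def evolve(offsets, objective, start_mm=5.0, step_mm=1.0):
    """Greedy inward/outward search along the normal at each vertex.

    offsets: current displacement of each vertex along its normal (mm).
    objective: callable on a list of offsets; larger is better.
    """
    offsets = list(offsets)
    best = objective(offsets)
    magnitude = start_mm
    while magnitude > 0:
        for i in range(len(offsets)):
            for direction in (+1.0, -1.0):  # outward, then inward
                trial = list(offsets)
                trial[i] += direction * magnitude
                score = objective(trial)
                if score > best:  # keep the best value seen so far
                    offsets, best = trial, score
        magnitude -= step_mm  # 5, 4, 3, 2, 1 mm
    return offsets, best


# Toy objective: negative squared distance to a hypothetical true boundary.
boundary = [3.0, -2.0, 0.0]
toy_objective = lambda o: -sum((a - b) ** 2 for a, b in zip(o, boundary))
final_offsets, final_score = evolve([0.0, 0.0, 0.0], toy_objective)
```

Starting with large steps lets a vertex escape a poor initial position, while the decreasing magnitude refines the fit; a move is kept only if it improves the best objective value recorded so far.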

3. Experiments and results

3.1. Experiments

3.1.1. Subjects

Our training set included 40 healthy controls (18 males; mean age 33 ± 12 yrs) and 144 consecutive TLE patients (61 males; mean age 36 ± 11 yrs) referred to our hospital for the investigation of drug-resistant epilepsy. The lateralization of the seizure focus was based on a standard clinical evaluation including detailed history of seizure semiology, recording of seizures by means of video-EEG monitoring and radiological assessment of hippocampal sclerosis through visual estimation of atrophy and increased T2 signal. Based on the convergence of these exams, patients were classified into left TLE (LTLE; n = 73) and right TLE (RTLE; n = 71). None of the patients had a mass lesion (tumor, vascular malformation or malformation of cortical development), or traumatic brain injury.

The Ethics Committee of the Montreal Neurological Institute and Hospital approved the study, and written informed consent was obtained from all participants.

3.1.2. MRI acquisition and processing

MR images were acquired on a 1.5 T Gyroscan (Philips Medical Systems, Eindhoven, The Netherlands) using a 3D T1-fast field echo sequence (TR = 18 ms; TE = 10 ms; NEX = 1; flip angle = 30°; matrix size = 256 × 256; FOV = 256 mm; slice thickness = 1 mm), providing an isotropic voxel volume of 1 mm³. Prior to processing, images underwent automated correction for intensity non-uniformity (Sled et al., 1998). Then, intensities were normalized by linearly fitting the histogram to the MNI ICBM-152 (Fonov et al., 2011). MR images were registered into the MNI ICBM-152 nonlinear template (Fonov et al., 2011) using a nine-parameter linear transformation (Collins et al., 1994). The hippocampus was segmented manually according to our previously published protocol (Bernasconi et al., 2003).

3.1.3. Evaluation of template selection approaches

We evaluated the template selection using a leave-one-out strategy. For each test dataset, we selected optimal subsets and built the initial shape and feature models using both equal and adaptive weighting. A subset of 10 templates produced the best segmentation accuracy. Adaptive weighting outperformed the equal weighting strategy (mean Dice for all groups: 86.9 ± 2.8 vs. 85.7 ± 3.3; p < 0.02). We therefore adopted the former as reference.

3.1.4. Feature contribution

To investigate the contribution of individual features, we segmented the hippocampi of healthy controls (n = 80) iteratively using each texture feature separately. We calculated the Dice index at each iteration and compared differences using paired t-tests.

3.1.5. Performance evaluation

3.1.5.1. Comparison with volume-based single- and multi-template approaches.

FreeSurfer. In this single-template approach, the hippocampus is segmented using a nonlinear template matching (Fischl et al., 2002). After linearly registering the test image to the template, the algorithm estimates the nonlinear transformation between a given MRI and a probabilistic atlas of the hippocampus constructed from a cohort of 14 young and middle-aged subjects using a maximum likelihood criterion. Probabilistic atlas labels are warped back to the individual MRI using the inverse of this transform. The final segmentation is accomplished by maximizing the a posteriori probability in the Bayes formula at each voxel. Voxel-wise probabilistic labels and their predicted image intensities serve as the prior term, while the intensity similarity between the target image and the template serves as the likelihood term.

Multi-template segmentation based on ANIMAL registration (Collins and Pruessner, 2010), henceforth denoted Vol-multi. In brief, MR images of controls and patients were linearly registered to the MNI ICBM-152 nonlinear template (Fonov et al., 2011) using the transformation parameters generated for manual labeling. We created a template library including these images and corresponding manual labels. The performance was evaluated using a leave-one-out strategy in which we iteratively excluded the image to segment from the template library. For a given test image, we first nonlinearly warped each individual template to it using the ANIMAL registration (Collins et al., 1995). Then, we computed a similarity index between the warped template and the test image defined by the normalized mutual information within a volume of interest centered on the template hippocampus. We empirically selected the n most similar templates (n = 11) and obtained the final segmentation using label fusion based on a voting strategy (Aljabar et al., 2009; Heckemann et al., 2006; Rohlfing et al., 2004) in which segmentations were averaged and thresholded at 0.5.
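The voting step can be sketched in a few lines of Python, assuming that each warped template label is available as a flat list of 0/1 voxel values on a common grid. The strict inequality at the threshold is an assumption of this sketch; with an odd number of templates, such as n = 11, ties cannot occur.

```python
def fuse_labels(segmentations, threshold=0.5):
    """Average binary segmentations voxel-wise and threshold the result."""
    n = len(segmentations)
    fused = []
    for votes in zip(*segmentations):  # one tuple of votes per voxel
        fused.append(1 if sum(votes) / n > threshold else 0)
    return fused


# Three hypothetical warped template labels over four voxels.
fused = fuse_labels([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
])
```

Here the second voxel is kept because two of three templates label it as hippocampus, whereas the fourth voxel, supported by a single template, is discarded.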

3.1.5.2. Volume-based agreements. We performed correlation analyses between manual and automated segmentations, and constructed Bland–Altman plots (Bland and Altman, 1986), on which for each subject the difference between the two measurements is drawn against their mean.
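The quantities plotted in a Bland–Altman analysis can be computed as in the sketch below. The volumes are made-up illustrative numbers, not data from this study, and the 95% limits of agreement (bias ± 1.96 SD) are the conventional addition from Bland and Altman rather than something reported in the text.

```python
from statistics import mean, stdev


def bland_altman(manual, automated):
    """Per-subject means and differences, bias and 95% limits of agreement."""
    diffs = [a - m for m, a in zip(manual, automated)]
    means = [(a + m) / 2 for m, a in zip(manual, automated)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical hippocampal volumes in mm^3.
manual_vol = [3000, 3200, 2800]
auto_vol = [3100, 3150, 2900]
means, diffs, bias, limits = bland_altman(manual_vol, auto_vol)
```

Plotting `diffs` against `means` gives the Bland–Altman plot; a bias far from zero indicates systematic over- or under-segmentation, and a trend of the differences with the mean indicates a volume-dependent error.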

To quantify the accuracy of automated segmentations, we computed the Dice similarity index: $D = 2 \times v(M \cap A)/(v(M) + v(A))$, where $M$/$A$ are the voxels comprising manual/automated labels; $M \cap A$ are voxels in the intersection of $M$ and $A$; $v(\cdot)$ is the volume operator. To better understand the accuracy with respect to false positives and false negatives, we also calculated precision (Pr), the fraction of correctly segmented hippocampal voxels among all voxels comprised in the label, and recall (Rc), the fraction of correctly segmented hippocampal voxels among all the voxels that actually belong to the hippocampus (Morra et al., 2008).

$$ Pr = v(M \cap A)/v(A) = TP/(TP + FP) $$

$$ Rc = v(M \cap A)/v(M) = TP/(TP + FN) $$

where TP/FP are true/false positives and TN/FN true/false negatives, respectively.
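These three overlap indices follow directly from set operations on voxel coordinates, as in this minimal Python sketch. Representing each label as a set of (x, y, z) tuples of equal-sized voxels is an assumption made for illustration, so that volume reduces to a voxel count.

```python
def overlap_metrics(manual, automated):
    """Dice, precision and recall between two sets of voxel coordinates."""
    m, a = set(manual), set(automated)
    tp = len(m & a)                   # voxels present in both labels
    dice = 2 * tp / (len(m) + len(a))
    precision = tp / len(a)           # TP / (TP + FP)
    recall = tp / len(m)              # TP / (TP + FN)
    return dice, precision, recall


manual_label = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
auto_label = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1), (2, 2, 2)}
dice, precision, recall = overlap_metrics(manual_label, auto_label)
```

In this toy case three voxels overlap, giving a Dice index of 6/9, a precision of 3/5 and a recall of 3/4; the lower precision reflects the two false-positive voxels.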

For each automated algorithm, we compared indices between patient groups (i.e. LTLE and RTLE) and controls using Student's t-tests.

Table 1
Comparison between automated segmentation methods and manual labeling using quantitative indices (mean ± SD).

| Index | Group | SurfMulti | Vol-multi | FreeSurfer |
|---|---|---|---|---|
| Dice index (%) | Controls | 87.5 ± 2.6 | 84.5 ± 4.5^a | 72.1 ± 2.5^a |
| | Ipsilateral | 86.1 ± 2.9 | 81.2 ± 5.7^a | 65.8 ± 4.5^a |
| | Contralateral | 87.7 ± 2.7 | 82.7 ± 5.5^a | 70.0 ± 3.7^a |
| Precision (%) | Controls | 86.8 ± 2.2 | 87.5 ± 4.2 | 59.1 ± 3.8^a |
| | Ipsilateral | 84.8 ± 2.5 | 83.2 ± 5.3 | 51.8 ± 6.4^a |
| | Contralateral | 87.2 ± 2.2 | 86.7 ± 5.0 | 54.6 ± 5.1^a |
| Recall (%) | Controls | 88.2 ± 3.0 | 81.8 ± 6.2^a | 92.9 ± 2.7 |
| | Ipsilateral | 87.0 ± 3.4 | 79.9 ± 8.7^a | 93.0 ± 2.5 |
| | Contralateral | 88.1 ± 3.1 | 81.2 ± 7.9^a | 93.3 ± 2.4 |
| Hausdorff distance (mm) | Controls | 0.63 ± 0.11 | 0.82 ± 0.18^a | 1.87 ± 0.24^a |
| | Ipsilateral | 0.68 ± 0.16 | 1.1 ± 0.22^a | 2.38 ± 0.34^a |
| | Contralateral | 0.63 ± 0.13 | 0.97 ± 0.20^a | 2.10 ± 0.29^a |

Ipsilateral/contralateral refers to the epileptogenic hemisphere. For each algorithm, differences in accuracy between patients and controls are in bold.

^a Lower accuracy of the selected volume-based method compared to SurfMulti. Significances adjusted using Bonferroni correction.

1450 H. Kim et al. / Medical Image Analysis 16 (2012) 1445–1455

In a separate analysis, we assessed the ability of each automatic algorithm to detect atrophy in TLE groups relative to controls by computing Cohen’s d, i.e., (mean volume of controls − mean volume of TLE)/pooled SD, which measures the effect size of a between-group difference, and calculated the significance of the observed effect using t-tests.
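As an illustration of the effect-size computation, the sketch below evaluates Cohen's d for two groups of volumes. The text only specifies a "pooled SD"; the usual sample-size-weighted pooled variance is assumed here, and the function name `cohens_d` and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(controls, patients):
    """Cohen's d = (mean of controls - mean of patients) / pooled SD."""
    n1, n2 = len(controls), len(patients)
    # Assumed pooling: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(controls)
                  + (n2 - 1) * variance(patients)) / (n1 + n2 - 2)
    return (mean(controls) - mean(patients)) / sqrt(pooled_var)


# Toy volumes: smaller values in patients give a positive d (atrophy).
d = cohens_d([2, 4, 6], [1, 2, 3])
```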

Significances of all statistical tests were adjusted for multiple comparisons using Bonferroni correction.

3.1.5.3. Surface-based analysis of contour accuracy. To assess global contour difference between manual and automated segmentations, we computed the mean Hausdorff distance (Chupin et al., 2007; Yang and Duncan, 2004) as:

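$$H_m(S_M, S_A) = \max\bigl(h(S_M, S_A),\, h(S_A, S_M)\bigr), \quad \text{with} \quad h(S_x, S_y) = \frac{1}{N_{S_x}} \sum_{v_x \in S_x} d(v_x, S_y)$$

where $S_x$ and $v_x$ represent the surface boundary and its voxels for a given label $x$, $N_{S_x}$ is the number of voxels on $S_x$, and $d$ is the Euclidean distance.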
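A brute-force sketch of the mean Hausdorff distance is given below. It assumes that each boundary is supplied as a list of voxel coordinates in millimetres and that the distance from a voxel to a surface is the Euclidean distance to the nearest voxel of that surface; the function names are illustrative, and the quadratic search is only practical for small boundaries.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed_mean_distance(sx, sy):
    """h(Sx, Sy): mean over voxels of Sx of the distance to the nearest voxel of Sy."""
    return sum(min(dist(v, w) for w in sy) for v in sx) / len(sx)


def mean_hausdorff(sm, sa):
    """Hm(SM, SA): the larger of the two directed mean distances."""
    return max(directed_mean_distance(sm, sa),
               directed_mean_distance(sa, sm))


# Toy boundaries with two voxels each.
S_M = [(0, 0, 0), (1, 0, 0)]
S_A = [(0, 0, 0), (3, 0, 0)]
h_ma = directed_mean_distance(S_M, S_A)
h_am = directed_mean_distance(S_A, S_M)
hm = mean_hausdorff(S_M, S_A)
```

Taking the maximum of the two directed terms makes the measure symmetric, so that an automated contour lying far from the manual one is penalized regardless of which surface is taken as the reference.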

To localize potential systematic shape biases, manual and automated labels were converted to SPHARM-PDM surfaces (Styner et al., 2006). For each algorithm, we pooled controls and patients, computed the vertex-wise surface-normal component of the displacement vector between the automated and manual label, and performed t-tests on the differences (Morey et al., 2009). In addition, we mapped the SD of the displacement vector at each vertex. Differences in SD between SurfMulti and Vol-multi were assessed using F-tests. Results were thresholded for statistical significance using the False Discovery Rate (FDR) correction at q < 0.05 (Benjamini and Hochberg, 1995).
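The FDR thresholding of vertex-wise p-values can be illustrated with the Benjamini–Hochberg step-up procedure. The sketch below is a generic implementation of that published procedure, not the authors' code; the function name `benjamini_hochberg` and the toy p-values are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True where the test is significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # All tests ranked at or below k_max are declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Toy p-values for four vertices.
mask = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05)
```

Because the procedure is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value further down the ranking meets its threshold.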

3.1.5.4. Performance evaluation with respect to hippocampal atrophy and malrotation. The most representative indicators of hippocampal malrotation are medial positioning, vertical orientation, and increased depth of the collateral sulcus (Baulac et al., 1998; Bernasconi et al., 2005). To quantify these characteristics, 3D models of the left and right hippocampi in each subject were created from manual labels (Kim et al., 2008, 2012; Voets et al., 2011) to determine: (1) Sagittal translation (in mm), measuring the distance between the geometric centre of the hippocampus and the mid-sagittal plane. (2) Axial rotation, reflecting a medial-lateral deflection of the hippocampus relative to its geometric centre. (3) Longitudinal rotation, indicating a relative vertical deviation of the entire hippocampus from its normally horizontal orientation. Axial rotation and longitudinal rotation were measured in angles of deviation from the orientation of a reference average hippocampus. We chose the MNI ICBM-152 nonlinear template (Fonov et al., 2011), on which we manually segmented the hippocampus, as the reference for computing malrotation features. (4) Collateral sulcus depth, determined by calculating the shortest distance between the outer cortical surface and the voxel at the sulcal fundus. The collateral sulcus was automatically extracted using BrainVISA (Riviere et al., 2002). We determined the prevalence of hippocampal malrotation based on a 2 SD cutoff from the distribution of healthy controls.

We used linear models to investigate the effect of hippocampal atrophy and malrotation on variations of the Dice index and the surface-normal components of the displacement vector, respectively.

Fig. 2. Shape analysis of automated segmentation algorithms. Vertex-wise maps of mean displacement and standard deviation averaged across all subjects.

Fig. 3. Impact of hippocampal malrotation and atrophy on automated algorithms. Manual labels (purple), SurfMulti (blue), Vol-multi (green) and FreeSurfer (yellow) overlaid on MR sagittal and coronal views in a healthy control with normal hippocampal positioning (A), a patient with atypical positioning characterized by vertical orientation of the hippocampus and a deep collateral sulcus (B), and a patient with hippocampal atrophy (C). The parameterized surfaces of automatic labels (red) are overlaid on the manual tracing (wireframe). The horizontal dotted lines in the hippocampal body correspond to the level of the coronal sections shown in (A–C). The Dice index (in %) is indicated below each method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 2
Correlations between automatic segmentation accuracy and hippocampal volume/malrotation features.

| Method | Group | Volume | Sagittal translation | Axial rotation | Longitudinal rotation | Collateral sulcus depth |
|---|---|---|---|---|---|---|
| SurfMulti | Controls | −0.03 | 0.02 | 0.02 | 0.13 | −0.20 |
| | Ipsilateral | 0.16 | 0.03 | −0.10 | −0.19 | −0.02 |
| | Contralateral | 0.03 | −0.07 | 0.15 | −0.03 | −0.11 |
| Vol-multi | Controls | −0.17 | 0.27 | 0.11 | −0.30 | 0.05 |
| | Ipsilateral | 0.28 | −0.10 | −0.05 | −0.40 | −0.04 |
| | Contralateral | 0.12 | −0.26 | −0.26 | −0.28 | 0.03 |
| FreeSurfer | Controls | 0.55 | −0.25 | −0.04 | 0.06 | −0.05 |
| | Ipsilateral | 0.78 | 0.35 | −0.07 | −0.55 | −0.42 |
| | Contralateral | 0.77 | −0.13 | −0.03 | −0.59 | −0.13 |

Significances (in bold) are adjusted for multiple comparisons using Bonferroni correction. Ipsilateral/contralateral refers to the epileptogenic hemisphere.

The lateralization of the seizure focus is based on the convergence of clinical, electrographic and radiological data. To evaluate the yield of MRI volumetry alone to lateralize the seizure focus, we performed a linear discriminant analysis. For each set of hippocampal labels (i.e., manual and automated), we calculated an asymmetry ratio as 2(L − R)/(L + R), where L/R stands for the volume of the left/right hippocampus. This ratio was standardized using a z-transform relative to the distribution of controls. To determine the side of seizure focus in each patient, we input the standardized ratio into the classifier. To maximize specificity, we defined decision margins so that no classification fell within the asymmetry range of healthy controls. Cross-validation was performed using a leave-one-out approach. This procedure, by which an individual patient is classified on the basis of the remaining patients, allows an unbiased assessment of lateralization performance for previously unseen TLE cases.
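The asymmetry ratio, its standardization and the effect of the decision margins can be sketched as follows. This is a simplified stand-in, not the linear discriminant classifier used in the study: it assumes that the margins coincide with the minimum and maximum standardized asymmetry observed in controls, and it replaces the trained discriminant function by the sign of the asymmetry (a smaller left volume is read as LTLE). The function names and toy volumes are hypothetical.

```python
from statistics import mean, stdev


def asymmetry_ratio(left, right):
    """Asymmetry ratio 2(L - R)/(L + R) from left and right volumes."""
    return 2 * (left - right) / (left + right)


def lateralize(patient_lr, control_lrs):
    """Classify one patient as 'LTLE', 'RTLE' or 'unclassified'.

    Assumption: the decision margins are the extremes of the control
    z-scores, so no patient inside the control range is classified.
    """
    control_ratios = [asymmetry_ratio(l, r) for l, r in control_lrs]
    mu, sd = mean(control_ratios), stdev(control_ratios)
    control_z = [(a - mu) / sd for a in control_ratios]
    lower, upper = min(control_z), max(control_z)
    z = (asymmetry_ratio(*patient_lr) - mu) / sd
    if z < lower:
        return "LTLE", z   # left hippocampus relatively smaller
    if z > upper:
        return "RTLE", z   # right hippocampus relatively smaller
    return "unclassified", z


# Toy volumes in mm^3 as (left, right) pairs.
controls = [(3000, 3000), (3100, 3000), (3000, 3100), (3050, 3000)]
left_case, _ = lateralize((2200, 3000), controls)
right_case, _ = lateralize((3000, 2200), controls)
symmetric_case, _ = lateralize((3000, 3000), controls)
```

Leaving patients inside the control range unclassified trades sensitivity for specificity, which matches the stated goal of the decision margins.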

3.2. Results

3.2.1. Evaluation of feature contribution

The segmentation accuracy of SurfMulti when using Gabor energy (Dice = 83.8 ± 2.6), relative intensity (81.7 ± 3.3), normalized intensity (81.3 ± 3.4) and intensity gradient (77.1 ± 6.2) separately was lower than that obtained when using all features combined (87.5 ± 2.6, p < 0.0001; Supplementary Fig. 1). Moreover, intensity gradient yielded the poorest performance compared to normalized intensity and relative intensity (p < 0.001).

Table 3
Group differences.

| Group | Manual | SurfMulti | Vol-multi | FreeSurfer |
|---|---|---|---|---|
| Ipsilateral | −2.41 ± 1.85 (1.71)* | −1.78 ± 1.45 (1.60)* | −1.49 ± 1.40 (1.38)* | −1.35 ± 2.06 (0.91) |
| Contralateral | −0.65 ± 1.40 (0.56)* | −0.57 ± 1.41 (0.47) | −0.22 ± 1.29 (0.22) | −0.05 ± 1.91 (0.03) |

Hippocampal volume in z-scores is presented as mean ± SD; values in parentheses indicate Cohen’s d index, i.e. the effect size of hippocampal atrophy (0.2 indicates a small, 0.5 a medium, and >0.8 a large effect). Ipsilateral/contralateral refers to the epileptogenic hemisphere; group-wise significances in volumes (bold) and effect size (*) are adjusted for multiple comparisons using Bonferroni correction.

3.2.2. Volume-based assessment of segmentation accuracy

The Bland–Altman plots showed that volumes obtained with manual and automated procedures did not differ significantly (t < 1.5, p > 0.2; Supplementary Fig. 2).

Results of the comparison between automated segmentation methods and manual labeling using various quantitative indices are reported in Table 1.

The performance of SurfMulti (mean Dice 87.3 ± 2.7) was superior to that of the two volume-based approaches in all groups (vs. FreeSurfer: 68.8 ± 3.8, p < 10e-15; vs. Vol-multi: 82.5 ± 5.4, p < 0.0005, Table 1). Moreover, SurfMulti performed equally well in TLE patients and controls (p > 0.1), whereas volume-based methods showed lower accuracy in patients (FreeSurfer: p < 10e-6; Vol-multi: p = 0.02). The lower performance of Vol-multi was mainly driven by larger precision than recall (i.e., higher proportion of FN than FP, Pr = 85.2 ± 4.8 vs. Rc = 80.7 ± 7.7, p < 10e-8). On the other hand, for FreeSurfer recall was higher than precision (i.e., higher proportion of FP than FN, Pr = 53.9 ± 4.9 vs. Rc = 93.1 ± 2.5, p < 10e-17). SurfMulti showed larger recall than precision only in hippocampi ipsilateral to the focus (Pr = 84.8 ± 2.6 vs. Rc = 87.0 ± 3.4, p < 0.001).

Fig. 4. Seizure focus lateralization. For each segmentation method, the hippocampal asymmetry ratio was standardized using z-transformation relative to the distribution of healthy controls and then fed into a linear discriminant function classifier. Crosses denote individuals in controls (NC) and patients with left/right temporal lobe epilepsy (LTLE/RTLE). Circles below the crosses identify patients with hippocampal malrotation. To maximize the specificity of the classifier, we defined decision margins (broken lines) so that no classification fell within the asymmetry range of controls.

3.2.3. Shape analysis of segmentations

Analysis of contour revealed that in all groups the boundary description was more accurate using SurfMulti (mean $H_m$ = 0.65 ± 0.14 mm) than the volume-based methods (vs. Vol-multi: 0.98 ± 0.20 mm, p < 0.0001; vs. FreeSurfer: 2.19 ± 0.31 mm, p < 0.0001).

Fig. 2 shows shape differences between manual tracing and automated methods.

Comparing automated methods to manual labels showed that SurfMulti’s segmentation was most similar to manual labeling, with only sub-millimeter differences located mainly at the lateral border of the hippocampus (mean error: 0.6 ± 0.4 mm, FDR < 0.05). Vol-multi overestimated the lateral boundary (mean error: 0.9 ± 1.1 mm, FDR < 0.05) and underestimated large portions of the superior and infero-medial margin (mean error: −1.3 ± 0.9 mm, FDR < 0.01). Moreover, the SD of the displacement vectors was overall higher for Vol-multi than SurfMulti (0.9 mm vs. 0.4 mm, FDR < 0.0001). In contrast to the other two methods, FreeSurfer globally overestimated the volume (mean error: 1.7 ± 2.1 mm, FDR < 0.05) and showed increased SD of the displacement vectors (vs. SurfMulti: p < 10e-6; vs. Vol-multi: p = 0.002).

Fig. 3 illustrates the performance of the three segmentation algorithms in a healthy control and two patients, one with hippocampal malrotation and another with hippocampal atrophy. In all subjects, SurfMulti provided the most accurate segmentation, with Dice indices ranging from 85.7% to 91.1%.

3.2.4. Impact of malrotation and atrophy upon segmentation accuracy

Variations in hippocampal volume or malrotation did not impact the performance of SurfMulti (all groups, |r| < 0.20, p > 0.2). On the other hand, accuracies of volume-based methods were reduced in the presence of atrophy and malrotation. For Vol-multi, the Dice index in patients was influenced by atrophy (|r| > 0.28, p = 0.05) and longitudinal rotation (|r| > 0.28; p < 0.05). For FreeSurfer, the Dice index in controls was positively correlated with volume (r = 0.55, p < 0.001); in patients, it was impacted bilaterally by both atrophy (r > 0.77, p < 0.0001) and longitudinal rotation (|r| > 0.55; p < 0.001), and ipsilaterally by collateral sulcus depth (|r| = 0.42, p = 0.002). Results are shown in Table 2.

3.2.5. Ability of automated methods to lateralize the seizure focus in patients

3.2.5.1. Group analysis. Group-wise comparisons identified hippocampal atrophy ipsilateral to the seizure focus in TLE patients irrespective of the method, i.e., manual or automated (p < 0.05, Table 3). The effect size of atrophy detected using SurfMulti, nevertheless, was the closest to manual volumetry (Cohen’s d: manual = 1.71, t = 7.6; SurfMulti = 1.60, t = 7.0; Vol-multi = 1.38, t = 6.1; FreeSurfer = 0.91, t = 3.9).

3.2.5.2. Individual analysis. Results are shown in Fig. 4. Using decision margins that enforced non-classification of patients in whom asymmetry values fell within the range of healthy controls, manual volumetry correctly lateralized the side of the focus in 65% (93/144) of patients. SurfMulti lateralized 63% (91/144; Fisher’s exact test, p = 0.9), Vol-multi 60% (87/144, p = 0.5) and FreeSurfer 57% (82/144, p = 0.1) of patients. Vol-multi and FreeSurfer misclassified one LTLE patient as RTLE. This patient had multiple malrotation features (longitudinal rotation: z = 2.8, axial rotation: z = 2.3).

Among the 52 patients with malrotation, 36 (69%) were lateralized with manual volumetry, 34 (65%, p = 0.8) with SurfMulti, 32 (62%, p = 0.5) with Vol-multi and 27 (52%, p = 0.06) with FreeSurfer.

4. Discussion

Our novel automated hippocampal segmentation algorithm SurfMulti integrates deformable parametric surfaces and multiple templates in a unified framework. It achieved a level of accuracy in TLE patients virtually identical to healthy controls, with Dice indices of 86.9% and 87.5%, respectively. To the best of our knowledge, such performance has not yet been paralleled in epilepsy. Vertex-wise shape mapping showed that SurfMulti with adaptive weight had an excellent overlap with manual labels, with sub-millimeter precision. Although algorithms, imaging type and performance metrics vary across studies, to date agreements between manual labeling and automated segmentation have been rather low in TLE, with kappa indices ranging from 0.63 to 0.77 (Akhondi-Asl et al., 2011; Avants et al., 2010; Chupin et al., 2009b; Hammers et al., 2007; Heckemann et al., 2010; Pardoe et al., 2009). While volume-based single-template and multi-template approaches showed poorer performance in the presence of hippocampal malrotation and atrophy (Kim et al., 2012), SurfMulti was not influenced by either of these anomalies. Furthermore, its sensitivity to detect atrophy ipsilateral to the seizure focus was comparable to manual volumetry.

Multi-template approaches offer a suitable framework to account for structural variability by selecting from a database a subset of labels that best describe anatomical characteristics of the target structure (Aljabar et al., 2009; Collins and Pruessner, 2010; Hammers et al., 2007). Despite the inclusion of both controls and patients in the template library, however, the accuracy of Vol-multi (Collins and Pruessner, 2010) was lower in patients compared to controls, likely due to the non-linear intensity-based image registration that may perform sub-optimally in case of atypical hippocampal morphology. Indeed, multi-template algorithms use either sum of squared differences (Lotjonen et al., 2010; Sabuncu et al., 2010) or mutual information (Aljabar et al., 2009; Collins and Pruessner, 2010) as a measure of similarity between the target and the template. Differences in the distribution of intensities between the two images may thus negatively impact segmentation results. We believe that SurfMulti’s equally good results in healthy controls and patients stem from the integration of surface-based shape-inherent point-wise correspondences guaranteed by SPHARM-PDM and our vertex-wise sampling scheme with respect to the surface boundary. By computing higher order semantic features weighted with respect to the mean and SD of the optimal subset, our algorithm models intrinsic image characteristics. Gradient-based sampling has been commonly employed in previous surface-based segmentation methods (Koh et al., 2010; Taheri et al., 2010; Tsai et al., 2003). This procedure may not be ideal for the template selection as it provides low similarity when the initial position of a given template is not sufficiently close to that of the target hippocampus. Instead, we sampled separately the inner and outer regions of structures that are within several millimeters of the hippocampal boundary, allowing for a characterization of locoregional texture, and thus enhancing the description of anatomical variability. Our analysis of feature contribution showed that the segmentation result using each of the proposed textures indeed outperformed the result using the intensity gradient.

Compared to volume-based approaches (Aljabar et al., 2009; Collins and Pruessner, 2010; Hammers et al., 2007) in which the final segmentation is obtained by averaging selected atlases, we further deform the average template by iterative evolution while reconciling texture and shape information learned via statistical models. During this refinement process, instead of point-by-point deformation (Chupin et al., 2009a; Kim et al., 2005), we performed multi-scale elastic deformations to allow a hierarchical global-to-local warping, thus making the algorithm more robust against local minima.

The increased demand to study large cohorts has motivated the continuous quest for designing robust automated methods. Hippocampal volumetry is a widely used clinical tool for the detection and lateralization of mesial temporal lobe epilepsy (Cendes et al., 1993; Jackson et al., 1993; Bernasconi, 2006). Hippocampal volume loss is also prevalent in Alzheimer’s disease (Frisoni et al., 2010) and is a characteristic feature of a variety of neuropsychiatric disorders, including major depression (Videbech and Ravnkilde, 2004), post-traumatic stress disorder (Hedges and Woon, 2010) and schizophrenia (Spoletini et al., 2011). Equally high performance in controls and patients, regardless of atrophy and malrotation, strongly suggests that SurfMulti may become a standard technique for clinical assessment of hippocampal volume.

Acknowledgement

This work was supported by the Canadian Institutes of Health Research (CIHR MOP-57840, CIHR MOP-93815).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.media.2012.04.008.

References

Akhondi-Asl, A., Jafari-Khouzani, K., Elisevich, K., Soltanian-Zadeh, H., 2011. Hippocampal volumetry for lateralization of temporal lobe epilepsy: automated versus manual methods. Neuroimage 54, S218–S226.

Aljabar, P., Heckemann, R.A., Hammers, A., Hajnal, J.V., Rueckert, D., 2009. Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. Neuroimage 46, 726–738.

Artaechevarria, X., Munoz-Barrutia, A., Ortiz-de-Solorzano, C., 2009. Combination strategies in multi-atlas image segmentation: application to brain MR data. IEEE Trans. Med. Imaging 28, 1266–1277.

Avants, B.B., Yushkevich, P., Pluta, J., Minkoff, D., Korczykowski, M., Detre, J., Gee, J.C., 2010. The optimal template effect in hippocampus studies of diseased populations. Neuroimage 49, 2457–2466.

Baulac, M., De Grissac, N., Hasboun, D., Oppenheim, C., Adam, C., Arzimanoglou, A., Semah, F., Lehericy, S., Clemenceau, S., Berger, B., 1998. Hippocampal developmental changes in patients with partial epilepsy: magnetic resonance imaging and clinical aspects. Ann. Neurol. 44, 223–233.

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. 57, 289–300.

Bernasconi, A., 2006. Magnetic resonance imaging in intractable epilepsy: focus on structural image analysis. Adv. Neurol. 97, 273–278.

Bernasconi, N., Bernasconi, A., Caramanos, Z., Antel, S.B., Andermann, F., Arnold, D.L., 2003. Mesial temporal damage in temporal lobe epilepsy: a volumetric MRI study of the hippocampus, amygdala and parahippocampal region. Brain 126, 462–469.

Bernasconi, N., Kinay, D., Andermann, F., Antel, S., Bernasconi, A., 2005. Analysis of shape and positioning of the hippocampal formation: an MRI study in patients with partial epilepsy and healthy controls. Brain 128, 2442–2452.

Bland, J.M., Altman, D.G., 1986. Statistical methods for assessing agreement between two methods of clinical measurements. Lancet 327, 307–310.

Blumcke, I., Thom, M., Wiestler, O.D., 2002. Ammon’s horn sclerosis: a maldevelopmental disorder associated with temporal lobe epilepsy. Brain Pathol. 12, 199–211.

Cascino, G.D., 2004. Surgical treatment for epilepsy. Epilepsy Res. 60, 179–186.

Cascino, G.D., 2008. Neuroimaging in epilepsy: diagnostic strategies in partial epilepsy. Semin. Neurol. 28 (523), 532.

Cendes, F., Andermann, F., Gloor, P., Lopes-Cendes, I., Andermann, E., Melanson, D., Jones-Gotman, M., Robitaille, Y., Evans, A., Peters, T., 1993. Atrophy of mesial structures in patients with temporal lobe epilepsy: cause or consequence of repeated seizures? Ann. Neurol. 34, 795–801.

Chupin, M., Gérardin, E., Cuingnet, R., Boutet, C., Lemieux, L., Lehéricy, S., Benali, H., Garnero, L., Colliot, O., 2009a. Fully automatic hippocampus segmentation and classification in Alzheimer’s disease and mild cognitive impairment applied on data from ADNI. Hippocampus 19, 579–587.

Chupin, M., Hammers, A., Liu, R.S., Colliot, O., Burdett, J., Bardinet, E., Duncan, J.S., Garnero, L., Lemieux, L., 2009b. Automatic segmentation of the hippocampus and the amygdala driven by hybrid constraints: method and validation. Neuroimage 46, 55.

Chupin, M., Mukuna-Bantumbakulu, A.R., Hasboun, D., Bardinet, E., Baillet, S., Kinkingnehun, S., Lemieux, L., Dubois, B., Garnero, L., 2007. Anatomically constrained region deformation for the automated segmentation of the hippocampus and the amygdala: method and validation on controls and patients with Alzheimer’s disease. Neuroimage 34, 996–1019.

Collins, D.L., Holmes, C.J., Peters, T.M., Evans, A.C., 1995. Automatic 3-D model-based neuroanatomical segmentation. Hum. Brain Mapp. 3, 190–208.

Collins, D.L., Neelin, P., Peters, T.M., Evans, A.C., 1994. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J. Comput. Assist. Tomogr. 18, 192–205.

Collins, D.L., Pruessner, J.C., 2010. Towards accurate, automatic segmentation of the hippocampus and amygdala from MRI by augmenting ANIMAL with a template library and label fusion. Neuroimage 52, 1355–1366.

Coupé, P., Manjón, J.V., Fonov, V., Pruessner, J., Robles, M., Collins, D.L., 2011. Patch-based segmentation using expert priors: application to hippocampus and ventricle segmentation. Neuroimage 54, 940–954.

Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der, K.A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M., 2002. Whole brain segmentation. Automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355.

Fonov, V., Evans, A.C., Botteron, K., Almli, C.R., McKinstry, R.C., Collins, D.L., 2011. Unbiased average age-appropriate atlases for pediatric studies. Neuroimage 54, 313–327.

Frisoni, G.B., Fox, N.C., Jack, C.R., Scheltens, P., Thompson, P.M., 2010. The clinical use of structural MRI in Alzheimer disease. Nat. Rev. Neurol. 6, 67–77.

Grigorescu, S.E., Petkov, N., Kruizinga, P., 2002. Comparison of texture features based on Gabor filters. IEEE Trans. Image Process. 11, 1160–1167.

Hammers, A., Heckemann, R., Koepp, M.J., Duncan, J.S., Hajnal, J.V., Rueckert, D., Aljabar, P., 2007. Automatic detection and quantification of hippocampal atrophy on MRI in temporal lobe epilepsy: a proof-of-principle study. Neuroimage 36, 38–47.

Heckemann, R.A., Hajnal, J.V., Aljabar, P., Rueckert, D., Hammers, A., 2006. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. Neuroimage 33, 115–126.

Heckemann, R.A., Keihaninejad, S., Aljabar, P., Rueckert, D., Hajnal, J.V., Hammers, A., 2010. Improving intersubject image registration using tissue-class information benefits robustness and accuracy of multi-atlas based anatomical segmentation. Neuroimage 51, 221–227.

Hedges, D.W., Woon, F.L., 2010. Alcohol use and hippocampal volume deficits in adults with posttraumatic stress disorder: a meta-analysis. Biol. Psychol. 84, 163–168.

Jackson, G.D., Berkovic, S.F., Duncan, J.S., Connelly, A., 1993. Optimizing the diagnosis of hippocampal sclerosis using MR imaging. AJNR 14, 753–762.

Jackson, G.D., Berkovic, S.F., Tress, B.M., Kalnins, R.M., Fabinyi, G.C., Bladin, P.F., 1990. Hippocampal sclerosis can be reliably detected by magnetic resonance imaging. Neurology 40, 1869–1875.

Kelemen, A., Szekely, G., Gerig, G., 1999. Elastic model-based segmentation of 3-D neuroradiological data sets. IEEE Trans. Med. Imaging 18, 828–839.

Khan, A.R., Wang, L., Beg, M.F., 2008. FreeSurfer-initiated fully-automated subcortical brain segmentation in MRI using large deformation diffeomorphic metric mapping. Neuroimage 41, 735–746.

Kim, H., Bernasconi, N., Bernhardt, B., Colliot, O., Bernasconi, A., 2008. Basal temporal sulcal morphology in healthy controls and patients with temporal lobe epilepsy. Neurology 70, 2159–2165.

Kim, H., Bernasconi, N., Mansi, T., Bernasconi, A., 2011. Robust surface-based multi-template automated algorithm to segment healthy and pathological hippocampi. Med. Image. Comp. Comp. Assist. Interv., 6893.

Kim, H., Chupin, M., Colliot, O., Bernhardt, B.C., Bernasconi, N., Bernasconi, A., 2012. Automatic hippocampal segmentation in temporal lobe epilepsy: impact of developmental abnormalities. Neuroimage 59, 3178–3186.

Kim, J.S., Singh, V., Lee, J.K., Lerch, J., Ad-Dab’bagh, Y., MacDonald, D., Lee, J.M., Kim, S.I., Evans, A.C., 2005. Automated 3-D extraction and evaluation of the inner and outer cortical surfaces using a Laplacian map and partial volume effect classification. Neuroimage 27, 210–221.

Kiwiel, K.C., 2001. Convergence and efficiency of subgradient methods for quasiconvex minimization. Math. Program. 90, 1–25.

Klemencic, J., Pluim, J.P.W., Viergever, M.A., Schnack, H.G., Valencic, V., 2004. Non-rigid Registration Based Active Appearance Models for 3D Medical Image Segmentation. Society for Imaging Science and Technology, Springfield, VA, USA.

Koh, J., Kim, T., Chaudhary, V., Dhillon, G., 2010. Automatic segmentation of the spinal cord and the dural sac in lumbar MR images using gradient vector flow field. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2010, 3117–3120.

Konrad, C., Ukas, T., Nebel, C., Arolt, V., Toga, A.W., Narr, K.L., 2009. Defining the human hippocampus in cerebral magnetic resonance images – an overview of current segmentation protocols. Neuroimage 47, 1185–1195.

Kuzniecky, R., Bilir, E., Gilliam, F., Faught, E., Martin, R., Hugg, J., 1999. Quantitative MRI in temporal lobe epilepsy: evidence for fornix atrophy. Neurology 53, 496–501.


Lee, S., Wolberg, G., Shin, S.Y., 1997. Scattered data interpolation with multilevel B-splines. IEEE Trans. Visual Comput. Graphics 3, 228–244.

Leventon, M., Grimson, E., Faugeras, O., 2000. Statistical shape influence in geodesic active contours. IEEE Trans. Med. Imaging 21, 525–537.

Lotjonen, J.M., Wolz, R., Koikkalainen, J.R., Thurfjell, L., Waldemar, G., Soininen, H., Rueckert, D., 2010. Fast and robust multi-atlas segmentation of brain magnetic resonance images. Neuroimage 49, 2352–2365.

Morey, R.A., Petty, C.M., Xu, Y., Pannu Hayes, J., Wagner II, H.R., Lewis, D.V., LaBar, K.S., Styner, M., McCarthy, G., 2009. A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes. Neuroimage 45, 855–866.

Morra, J.H., Tu, Z., Apostolova, L.G., Green, A.E., Avedissian, C., Madsen, S.K., Parikshak, N., Hua, X., Toga, A.W., Jack Jr, C.R., Weiner, M.W., Thompson, P.M., 2008. Validation of a fully automated 3D hippocampal segmentation method using subjects with Alzheimer’s disease mild cognitive impairment, and elderly controls. Neuroimage 43, 59–68.

Pardoe, H.R., Pell, G.S., Abbott, D.F., Jackson, G.D., 2009. Hippocampal volume assessment in temporal lobe epilepsy: how good is automated segmentation? Epilepsia 50, 55.

Pitiot, A., Delingette, H., Thompson, P.M., Ayache, N., 2004. Expert knowledge-guided segmentation system for brain MRI. Neuroimage 23, S85–S96.

Riviere, D., Mangin, J.F., Papadopoulos-Orfanos, D., Martinez, J.M., Frouin, V., Regis, J., 2002. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Med. Image Anal. 6, 77–92.

Rohlfing, T., Brandt, R., Menzel, R., Maurer, C.R., 2004. Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. Neuroimage 21, 1428–1442.

Sabuncu, M.R., Yeo, B.T., Van Leemput, K., Fischl, B., Golland, P., 2010. A generative model for image segmentation based on label fusion. IEEE Trans. Med. Imaging 29, 1714–1729.

Schramm, J., Clusmann, H., 2008. The surgery of epilepsy. Neurosurgery 62 (Suppl. 2), 463–481 (discussion 481).

Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging 17, 87–97.

Sloviter, R.S., Kudrimoti, H.S., Laxer, K.D., Barbaro, N.M., Chan, S., Hirsch, L.J., Goodman, R.R., Pedley, T.A., 2004. “Tectonic” hippocampal malformations in patients with temporal lobe epilepsy. Epilepsy Res. 59, 123–153.

Spoletini, I., Cherubini, A., Banfi, G., Rubino, I.A., Peran, P., Caltagirone, C., Spalletta, G., 2011. Hippocampi, thalami, and accumbens microstructural damage in schizophrenia: a volumetry, diffusivity, and neuropsychological study. Schizophr. Bull. 37, 118–130.

Styner, M., Oguz, I., Xu, S., Brechbühler, C., Pantazis, D., Gerig, G., 2006. Statistical Shape Analysis of Brain Structures using SPHARM-PDM. MICCAI 2006 Opensource Workshop.

Taheri, S., Ong, S.H., Chong, V.F.H., 2010. Level-set segmentation of brain tumors using a threshold-based speed function. Image Vis. Comput. 28, 26–37.

Tsai, A., Yezzi Jr., A., Wells, W., Tempany, C., Tucker, D., Fan, A., Grimson, W.E., Willsky, A., 2003. A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. Med. Imaging 22, 137–154.

Videbech, P., Ravnkilde, B., 2004. Hippocampal volume and depression: a meta-analysis of MRI studies. Am. J. Psychiatry 161, 1957–1966.

Voets, N.L., Bernhardt, B.C., Kim, H., Yoon, U., Bernasconi, N., 2011. Increased temporolimbic cortical folding complexity in temporal lobe epilepsy. Neurology 76, 138–144.

Yang, J., Duncan, J.S., 2004. 3D image segmentation of deformable objects with joint shape-intensity prior models using level sets. Med. Image Anal. 8, 285–294.