
Communications in Statistics—Theory and Methods, 39: 2241–2263, 2010
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print / 1532-415X online
DOI: 10.1080/03610920903019920

Inferences for Mixtures of Distributions for Centrally Censored Data with Partial Identification

D. CAMPOS1, C. E. MARTÍNEZ2, A. CONTRERAS-CRISTÁN1, AND F. O'REILLY1

1Universidad Nacional Autónoma de México, Mexico
2Universidad Autónoma de la Ciudad de México, Mexico

In this article, several methods to make inferences about the parameters of a finite mixture of distributions in the context of centrally censored data with partial identification are revisited. These methods are an adaptation of the work in Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003) in the case of right censoring. The first method focuses on an asymptotic approximation to a suitably simplified likelihood using some latent quantities; the second method is based on the expectation-maximization (EM) algorithm. Both methods make explicit use of latent variables and provide computationally efficient procedures compared to non-Bayesian methods that deal directly with the full likelihood of the mixture, appealing to its asymptotic approximation. The third method, from a Bayesian perspective, uses data augmentation to work with an uncensored sample. This last method is related to the Bayesian method recently proposed in Baker, Mengersen, and Davis (2005). The three adapted methods are shown to provide similar inferential answers, thus offering alternative analyses.

Keywords: Bayesian inference; EM algorithm; Fiducial inference; Gibbs sampler; MCMC; Middle censoring; Profile likelihood.

Mathematics Subject Classification 62F10; 62F15.

1. Introduction

Consider the following setting: one is sampling from a mixture of $k$ populations in such a way that if the observed value lies either to the left of a lower censoring threshold $C_l$, or to the right of an upper censoring threshold $C_u$, then both its value and the identity of the population from which it came are registered. If, on the other hand, the observation lies within the interval $(C_l, C_u)$, then neither the observation nor the identity of the population is provided; only the fact that there was an observation in that interval is registered. The constants are fixed and known, so this is a type-I censoring scheme.

Received February 9, 2007; Accepted May 1, 2009. Address correspondence to F. O'Reilly, Estadística, Universidad Nacional Autónoma de México, Apdo. Postal 20-726, Del. A. Obregón 01000, Mexico; E-mail: [email protected]

In Baker, Mengersen, and Davis (2005), a similar scheme is studied in what they refer to as their Case 2, and a Bayesian procedure based on Markov chain Monte Carlo (MCMC) is given. Their procedure is illustrated for the mixture of two normal distributions. Some simulations are provided and a case study is given for which, unfortunately, the data are not supplied. In Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003), a right censoring scheme that also features identification of the population if the observation is not censored (partial identification) is studied. Three procedures are proposed and described: one based on the profile likelihood with the aid of latent variables, another based on the expectation-maximization (EM) algorithm, and also a Bayesian method based on a Gibbs sampler scheme. The procedures were compared on a well-known reliability data set (Mendenhall and Hader, 1958), modeled with the mixture of two extreme value distributions, and also on some simulated mixtures of three extreme value distributions.

In this manuscript, it is first shown that all three methods, the latent variable approach, the EM approach, and the Bayesian approach based on a Gibbs sampler, may be adapted to deal with the centrally censored problem and thus provide alternative solutions. The difference from the Bayesian procedure already proposed in Baker, Mengersen, and Davis (2005) is mainly in the choice of prior distributions. In Sec. 2 the analysis based on the profile likelihood with the aid of latent variables is described. An EM algorithm suitable for the problem described above is treated in Sec. 3 and, finally, our Bayesian method is described in detail in Sec. 4. In subsection 5.1 some data sets are considered and the resulting inferences provided by the three methods are shown. A comparison of the methods in terms of their coverage properties is assessed in subsection 5.2 by means of a small-scale simulation study, and an application to real data follows in subsection 5.3. In Sec. 6 some general comments are made in terms of the relative agreement between the different approaches, their numerical complexity, general characteristics, and relative merits.

Let $G(\cdot\,; w, \theta)$ be the distribution function corresponding to a mixture of $k$ distributions $F_j(\cdot\,; \theta_j)$, $j = 1, \ldots, k$, with $\theta_j$ unknown for each $j$. This is given by
$$G(\cdot\,; w, \theta) = \sum_{j=1}^{k} w_j F_j(\cdot\,; \theta_j),$$
where the components $w_j$ ($w_j > 0$, $j = 1, 2, \ldots, k$, $\sum_{j=1}^{k} w_j = 1$) of the vector $w$ are the unknown weights in the mixture and $\theta' = (\theta_1, \theta_2, \ldots, \theta_k)$. The information obtained from $N$ independent observations from $G$ can thus be summarized as
$$\{x_{1i}\}_{i=1}^{n_1},\ \{x_{2i}\}_{i=1}^{n_2},\ \ldots,\ \{x_{ki}\}_{i=1}^{n_k} \quad \text{and} \quad N - \sum_{j=1}^{k} n_j \equiv N - n,$$
where, for each $j = 1, 2, \ldots, k$, $x_{j1}, \ldots, x_{jn_j}$ denote the $n_j$ observations that are either below $C_l$ or above $C_u$ and identified as coming from distribution $F_j$. In the notation above, $N - n$ stands for the number of censored observations, those falling in the interval $(C_l, C_u)$.


We denote by $L(w, \theta; x)$ the likelihood function for $w$ and $\theta$. This function may be shown to be proportional to
$$\Big\{ \prod_{j=1}^{k} \prod_{i=1}^{n_j} w_j\, f_j(x_{ji}; \theta_j) \Big\} \times \big[ G(C_u; w, \theta) - G(C_l; w, \theta) \big]^{N-n}, \tag{1}$$

where $f_j(\cdot\,; \theta_j)$ is the density function corresponding to $F_j(\cdot\,; \theta_j)$, $j = 1, 2, \ldots, k$. It is noted that even though the observations are either to the left of $C_l$ or above $C_u$, it is not necessary to denote them differently in the last factor of the likelihood. If one needs to reason how the likelihood is built, it helps, however, to separate them, writing the conditional density of an observation of the $j$th population that fell below $C_l$ apart from the conditional density of an observation of the $j$th population that occurred above $C_u$. The product of these conditional densities is then multiplied by the probability of the $N$ original observations having been grouped into, say, the sets of $n_j^+$ and $n_j^-$ (adding to $n_j$), the ones larger than $C_u$ and the ones smaller than $C_l$, for all $j$, plus the set of $N - n$ censored ones. The associated multinomial probabilities incorporate terms $w_j^{n_j}$, which remain in the likelihood expression. Other terms in $F_j$ evaluated at $C_l$ raised to the power $n_j^-$, and in $(1 - F_j)$ evaluated at $C_u$ raised to the power $n_j^+$, cancel with the denominators of the products of the conditional densities.

A direct likelihood-based approach to making inferences on the parameters is likely to be overwhelmed by the dimension of the parameter space when the mixture is formed by more than two or three populations with $\theta_j$ of dimension two. With two populations and a location and a scale parameter for each, Fisher's matrix needed for asymptotic inference is a $5 \times 5$ matrix, and for three populations with location and scale parameters the required analysis is on an $8 \times 8$ matrix. These considerations are behind the proposal in the following section, which reduces the complexity of the problem by eliciting latent variables that allow us to factor the likelihood into separate parts.
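To fix ideas, the sketch below (ours, not part of the original analysis) evaluates the logarithm of the likelihood in Eq. (1) for a two-component normal mixture; the function name and the NumPy/SciPy parametrization are our own choices.

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(x1, x2, mu, sigma, w1, Cl, Cu, N):
    """Log of Eq. (1), k = 2 normal components.

    x1, x2: identified observations outside (Cl, Cu) from populations
    1 and 2; N is the total sample size, so N - (n1 + n2) observations
    fell, unidentified, inside (Cl, Cu)."""
    n = len(x1) + len(x2)
    w = np.array([w1, 1.0 - w1])
    # identified part: sum_j sum_i log(w_j f_j(x_ji))
    ll = np.sum(np.log(w[0] * norm.pdf(x1, mu[0], sigma[0])))
    ll += np.sum(np.log(w[1] * norm.pdf(x2, mu[1], sigma[1])))
    # censored part: (N - n) * log[G(Cu) - G(Cl)] under the mixture G
    P = np.sum(w * (norm.cdf(Cu, mu, sigma) - norm.cdf(Cl, mu, sigma)))
    return ll + (N - n) * np.log(P)
```

A direct maximization of this function over the five free parameters is exactly the route whose dimensionality the latent-variable proposal of the next section avoids.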

2. Profile Likelihood Analysis via Latent Variables

For each $j = 1, 2, \ldots, k$, let us denote by $r_j$ the number of unobserved values; that is, the unknown number of observations, corresponding to distribution $F_j$, that fell inside the censoring interval $(C_l, C_u)$. Note that $r_j \geq 0$ and $\sum_{j=1}^{k} r_j = N - n$. The integer quantities $r_1, r_2, \ldots, r_k$ are artificial (latent) variables that will enable us to perform an alternative and much simpler analysis for this problem.

If we knew the value of each $r_j$, $j = 1, 2, \ldots, k$, then the weights $w' = (w_1, w_2, \ldots, w_k)$ would be irrelevant and the likelihood function $L(\theta; x, r)$ would be such that
$$L(\theta; x, r) \propto \prod_{j=1}^{k} L_j(\theta_j; x_j, r_j), \tag{2}$$
where
$$L_j(\theta_j; x_j, r_j) \propto \Big\{ \prod_{i=1}^{n_j} f_j(x_{ji}; \theta_j) \Big\} \binom{n_j + r_j}{n_j} \big[ F_j(C_u; \theta_j) - F_j(C_l; \theta_j) \big]^{r_j}.$$


In other words, assuming that $r_1, r_2, \ldots, r_k$ are known, we can factorize the likelihood function for $\theta$ into a product of marginal likelihood functions, each $L_j$ containing all the necessary information to make inferences about the parameter $\theta_j$ appearing in it.

Let $\hat{\theta}_j(r_j)$ be the maximum likelihood estimate (MLE) of $\theta_j$ for fixed $r_j$. By studying the corresponding profile likelihood function for $r_1, r_2, \ldots, r_k$, which is given by
$$\prod_{j=1}^{k} L_j(\hat{\theta}_j(r_j); x_j, r_j), \tag{3}$$
we can search for the values $r_j^*$, $j = 1, 2, \ldots, k$, that maximize (3). The final estimates would then be given by
$$\hat{\theta}_j(r_j^*), \quad j = 1, 2, \ldots, k. \tag{4}$$

In what follows, we refer to this scheme as the latent variable profile likelihood analysis (LVPLA), as defined in Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003).

Having elicited the latent variables, the idea of this profile likelihood analysis is to use the standard asymptotics provided by Fisher's observed information matrix which, in the location-scale case, is best based on the usual parametrization in terms of the location $\mu$ and $\log(\sigma)$, hereafter termed $\delta$, the logarithm of the scale parameter. The expressions for the entries of Fisher's observed matrix are therefore needed, as well as a procedure to obtain, in this censoring case, the MLEs required for the normal approximations to the profile likelihoods; the elements of Fisher's matrix are then evaluated at these MLEs.

As a comment, we would like to mention that there has been some effort to derive general formulae to compute Fisher's observed matrix in general location-scale families under type I censoring (see Anaya and O'Reilly, 2000, and references therein), but we are unaware of a result for the present central censoring scheme. In the case of the normal distribution, which is the one used in this manuscript, after some algebra we obtained closed-form expressions for Fisher's matrix. In other location-scale families of distributions, such as the extreme value, similar expressions may be obtained. In Appendix C, one of the examples from subsection 5.1 is used to perform the LVPLA but using a mixture of two extreme value distributions.

Let $f(x; \mu, \delta)$ stand for the normal density with the parameters $\mu$ (location) and $\delta$ (log scale), and let $F(x; \mu, \delta)$ be the corresponding distribution function. Also let $\phi$ and $\Phi$ denote the corresponding density and distribution function of a standard normal; then
$$f(x; \mu, \delta) = e^{-\delta}\, \phi\big( (x - \mu) e^{-\delta} \big), \qquad F(x; \mu, \delta) = \Phi\big( (x - \mu) e^{-\delta} \big). \tag{5}$$

For the parameters $(\mu_j, \delta_j)$ corresponding to population $j$, the following expressions will be used throughout Secs. 2 and 3:
$$\phi^*(C_u, C_l; \mu_j, \delta_j) \equiv \phi\big( (C_u - \mu_j) e^{-\delta_j} \big) - \phi\big( (C_l - \mu_j) e^{-\delta_j} \big) \quad \text{and} \tag{6}$$
$$\Phi^*(C_u, C_l; \mu_j, \delta_j) \equiv \Phi\big( (C_u - \mu_j) e^{-\delta_j} \big) - \Phi\big( (C_l - \mu_j) e^{-\delta_j} \big).$$


For the $k$ different normal distributions in the mixture, under the centrally censored scheme described before, the likelihood function of the $j$th pair of parameters $\mu_j$ and $\delta_j$, having observed the $x_{ij}$'s with $i = 1, 2, \ldots, n_j$, denoted by $L_j$, is given by
$$L_j(\mu_j, \delta_j) \propto e^{-n_j \delta_j}\, e^{-\frac{1}{2} \sum_{i=1}^{n_j} (x_{ij} - \mu_j)^2 e^{-2\delta_j}}\, \big[ \Phi^*(C_u, C_l; \mu_j, \delta_j) \big]^{r_j},$$
where the $r_j$'s are given integers adding up to $N - n$.

One may simply find the MLEs by building a matrix whose entries are the values of the likelihood function at the points of the Cartesian product of a grid for $\mu_j$ and a grid for $\delta_j$, for sufficiently fine grids; or, as suggested in Appendix A, use crude grids to yield initial values for the MLEs and then use the equations given there to iterate and find the estimates with great precision.

Once the MLEs are found, they of course depend on the particular $r_j$'s, so, as mentioned before, the overall likelihood function evaluated at the obtained MLEs comprises a sort of profile likelihood of the $r_j$'s that must be maximized on the corresponding lattice. This may be done numerically, by enumerating all possible values of the $r_j$'s, and it is a trivial search for low values of $k$. For a mixture of two distributions, as in the case exemplified in this manuscript, one only needs to consider $r_1 = 0, 1, \ldots, N - n$ with corresponding $r_2 = N - n - r_1$ and select the $r_1^*$ that maximizes the profile likelihood for $(r_1, r_2)$.
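The enumeration just described is straightforward to code. Below is a minimal sketch (ours, not the authors' implementation) of the LVPLA search for $k = 2$ normal components: for each candidate $r_1$ it maximizes each $L_j$ of Eq. (2) numerically with SciPy, instead of using the grid-plus-iteration scheme of Appendix A, and it keeps the $r_1$ maximizing the profile (3). The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import norm

def comp_loglik(params, x, r, Cl, Cu):
    """log L_j of Eq. (2): observed values x outside (Cl, Cu), plus r
    censored values attributed to this component; delta = log(sigma)."""
    mu, delta = params
    s = np.exp(delta)
    ll = np.sum(norm.logpdf(x, mu, s))
    ll += r * np.log(norm.cdf(Cu, mu, s) - norm.cdf(Cl, mu, s))
    n = len(x)  # binomial coefficient C(n + r, n) from Eq. (2)
    ll += gammaln(n + r + 1) - gammaln(n + 1) - gammaln(r + 1)
    return ll

def lvpla_profile(x1, x2, N, Cl, Cu):
    """Maximize the profile likelihood (3) over r1 = 0, ..., N - n."""
    n = len(x1) + len(x2)
    best = (-np.inf, None, None)
    for r1 in range(N - n + 1):
        r2 = N - n - r1
        fits = [minimize(lambda p, x=x, r=r: -comp_loglik(p, x, r, Cl, Cu),
                         x0=[np.mean(x), np.log(np.std(x))],
                         method="Nelder-Mead")
                for x, r in ((x1, r1), (x2, r2))]
        prof = -(fits[0].fun + fits[1].fun)  # log of (3) at the MLEs
        if prof > best[0]:
            best = (prof, r1, [f.x for f in fits])
    return best  # (max log-profile, r1*, [(mu1, delta1), (mu2, delta2)])
```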

In our approach here, we are assuming that the values $r_j^*$ were "given" (even though we found them), so there is no need to consider the weight parameters $w_j$ of the mixture. But in some sense, a given $r_j^*$ implies that the ratio $(n_j + r_j^*)/N$ should be close to a sample estimate of $w_j$. In subsections 5.1 and 5.3, some comments are made on possible fiducial distributions induced on $w_j$ and their closeness to modal estimates from a Bayesian scheme.

The following quantities appear in the derivation of Fisher's information matrix:
$$A(\mu_j, \delta_j) = \frac{\phi^*(C_u, C_l; \mu_j, \delta_j)}{\Phi^*(C_u, C_l; \mu_j, \delta_j)},$$
$$B(\mu_j, \delta_j) = \frac{\phi\big( (C_u - \mu_j) e^{-\delta_j} \big)(C_u - \mu_j) - \phi\big( (C_l - \mu_j) e^{-\delta_j} \big)(C_l - \mu_j)}{\Phi^*(C_u, C_l; \mu_j, \delta_j)}.$$

Next, Fisher's observed information matrix is given, where the $r_j^*$'s have been obtained as before and are now taken as fixed or given. In the expressions, $l_j$ stands for the log-likelihood:
$$\frac{\partial^2 l_j(x_j; \mu_j, \delta_j)}{\partial \mu_j^2} = -n_j\, e^{-2\delta_j} - r_j^*\, e^{-\delta_j}\, \frac{\partial A(\mu_j, \delta_j)}{\partial \mu_j},$$
$$\frac{\partial^2 l_j(x_j; \mu_j, \delta_j)}{\partial \mu_j\, \partial \delta_j} = -2 \sum_{i=1}^{n_j} (x_{ij} - \mu_j)\, e^{-2\delta_j} + r_j^*\, e^{-\delta_j} A(\mu_j, \delta_j) - r_j^*\, e^{-\delta_j}\, \frac{\partial A(\mu_j, \delta_j)}{\partial \delta_j},$$
$$\frac{\partial^2 l_j(x_j; \mu_j, \delta_j)}{\partial \delta_j^2} = -2 \sum_{i=1}^{n_j} (x_{ij} - \mu_j)^2\, e^{-2\delta_j} + r_j^*\, e^{-\delta_j} B(\mu_j, \delta_j) - r_j^*\, e^{-\delta_j}\, \frac{\partial B(\mu_j, \delta_j)}{\partial \delta_j}.$$


Trying to keep the notation simple, the MLEs that would be denoted $\hat\mu_j(r_j^*)$ and $\hat\delta_j(r_j^*)$ will be denoted by $\hat\mu_j$ and $\hat\delta_j$.

The particular expressions for the partial derivatives of $A$ and $B$ also appear in the Appendix but, as may be seen, Fisher's information matrix is easily computed at the MLEs with $r_j^*$. This is to be done for each $j$, and with the asymptotic normal approximation for both $\hat\mu_j$ and $\hat\delta_j$ one may derive the usual confidence intervals stemming from the fiducial distributions for the parameters $\mu_j$ and $\delta_j$; namely, a normal centered at the MLE and whose variance is the corresponding diagonal element of the inverse of Fisher's matrix evaluated at the MLEs. The corresponding confidence intervals are thus the normal approximate intervals associated with the likelihood intervals, but now with assigned approximate confidence.
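As a sketch of this interval construction (our code; a numerical Hessian stands in for the closed-form entries derived above), one can differentiate the log-likelihood twice at the MLE and invert:

```python
import numpy as np

def observed_info(loglik, theta_hat, h=1e-4):
    """Numerical observed information: minus the Hessian of loglik at
    theta_hat, by central finite differences."""
    p = len(theta_hat)
    H = np.zeros((p, p))
    for a in range(p):
        for b in range(p):
            def f(da, db):
                t = np.array(theta_hat, dtype=float)
                t[a] += da
                t[b] += db
                return loglik(t)
            H[a, b] = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
    return -H

def wald_intervals(loglik, theta_hat, z=1.96):
    """95% normal-approximation intervals: MLE +/- z * sqrt(diag(I^{-1}))."""
    se = np.sqrt(np.diag(np.linalg.inv(observed_info(loglik, theta_hat))))
    return [(m - z * s, m + z * s) for m, s in zip(theta_hat, se)]
```

With `loglik` set to the per-component log-likelihood at the chosen $r_j^*$ (for instance, `comp_loglik` from the previous sketch), `wald_intervals` yields intervals of the kind reported in Sec. 5.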

Some sets of data are analysed with the above-mentioned procedure, as well as with the other two procedures, in Sec. 5.

3. An EM Algorithm

The EM algorithm is an iterative method for finding maximum likelihood estimates or locating posterior modes from incomplete data. The next description is concerned with the latter case. Each iteration consists of two basic steps: the E-step (expectation step) and the M-step (maximization step). Let $\theta^s$ denote the current guess of the mode of the posterior of interest $p(\theta \mid x)$ and let $p(\theta \mid x, z)$ denote the augmented posterior, where $x$ denotes the actual (observed) data and $z$ denotes the augmented (latent) data. Finally, let $p(z \mid x, \theta^s)$ denote the conditional predictive distribution of the latent data $z$ given the current guess of the posterior mode. We refer the reader to Redner and Walker (1984) and Tanner (1996, Ch. 4) for a detailed description and examples of the EM algorithm. The implementation presented here is based on the work of Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003), where the case of mixtures of extreme value distributions with right censoring and partial identification is analysed. See also Contreras-Cristán (2007).

Algorithm 3.1 (EM Algorithm).

• Input: a current guess of the mode of the posterior of interest, $\theta^s$.
• Output: the posterior mode or EM estimate, $\tilde\theta$.

E-Step. Compute
$$Q(\theta, \theta^s) = \int_Z p(z \mid x, \theta^s)\, \log p(\theta \mid x, z)\, dz,$$
i.e., the expectation of $\log p(\theta \mid x, z)$ with respect to $p(z \mid x, \theta^s)$.

M-Step. Maximize the function $Q(\theta, \theta^s)$ with respect to $\theta$ to obtain $\theta^{s+1}$; then return to the E-step.

The algorithm is iterated until $\|\theta^{s+1} - \theta^s\|$ or $|Q(\theta^{s+1}, \theta^s) - Q(\theta^s, \theta^s)|$ is sufficiently small.

3.1. Centrally Censored Mixtures of Normal Distributions

We specify in Appendix B, in the context of the problem and the notation stated in Sec. 1, the items $z$, $p(\theta \mid x, z)$, $p(z \mid x, \theta)$, $Q(\theta, \theta^s)$, and the computations required to carry out the E-step and the M-step above. In this subsection we only state the resulting estimates $\theta^{s+1}$ obtained at iteration $s + 1$ of the algorithm.

To this end, denote the parameter $\theta = (w', \eta')'$, where $\eta' = (\eta_1', \ldots, \eta_k')$ and $\eta_j' = (\mu_j, \delta_j)$, $j = 1, 2, \ldots, k$. We remind the reader that for each $j$, $\sigma_j = e^{\delta_j}$. The probability of an observation in the interval $(C_l, C_u)$,
$$P(C_u, C_l; \theta) = G(C_u; \theta) - G(C_l; \theta),$$
under the mixture distribution
$$G(\cdot\,; \theta) = \sum_{j=1}^{k} w_j F_j(\cdot\,; \mu_j, \delta_j), \tag{7}$$

will be used below.

At iteration $s + 1$, $\theta^{s+1}$ is given analytically by
$$\hat\mu_j = \frac{ \sum_{i=1}^{n_j} x_{ji} + (N-n)\, \dfrac{w_j^s \big[ \mu_j^s\, \Phi^*(C_u, C_l; \eta_j^s) - \sigma_j^s\, \phi^*(C_u, C_l; \eta_j^s) \big]}{P(C_u, C_l; \theta^s)} }{ n_j + (N-n)\, \dfrac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} }, \tag{8}$$
$$\hat\sigma_j = \sqrt{ \frac{ \sum_{i=1}^{n_j} (x_{ji} - \hat\mu_j)^2 + (N-n)\, \dfrac{w_j^s\, \chi(C_u, C_l; \eta_j^s, \hat\mu_j)}{P(C_u, C_l; \theta^s)} }{ n_j + (N-n)\, \dfrac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} } } \tag{9}$$
and
$$\hat{w}_j = \frac{ n_j + (N-n)\, \dfrac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} }{N}, \tag{10}$$
for each $j = 1, 2, \ldots, k$. The expression $\chi(C_u, C_l; \eta_j^s, \mu_j)$ in (9) is defined in Eq. (B4) of Appendix B. Equation (6) defines $\Phi^*(C_u, C_l; \eta_j^s)$ and $\phi^*(C_u, C_l; \eta_j^s)$ above.

Iteration of the E-step (compute Eq. (B3) in Appendix B) and the M-step (evaluate Eqs. (8)–(10)) of the EM algorithm will provide us with estimates $\tilde\theta$ for $\theta$.
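A compact sketch of one EM iteration is given below (our code, not the authors'); it implements Eqs. (8)–(10) through the truncated-normal moment identities that underlie (B4), with $\phi^*$ and $\Phi^*$ as in Eq. (6). The vectorized layout and names are our own.

```python
import numpy as np
from scipy.stats import norm

def em_step(x, n_cens, w, mu, sigma, Cl, Cu):
    """One iteration of Eqs. (8)-(10) for a centrally censored
    k-component normal mixture. x: list of k arrays of identified
    observations; n_cens = N - n; w, mu, sigma: current theta^s."""
    k = len(x)
    zl, zu = (Cl - mu) / sigma, (Cu - mu) / sigma
    Phi = norm.cdf(zu) - norm.cdf(zl)       # Phi* of Eq. (6), per component
    phi = norm.pdf(zu) - norm.pdf(zl)       # phi* of Eq. (6)
    P = np.sum(w * Phi)                     # P(Cu, Cl; theta^s)
    m = n_cens * w * Phi / P                # expected censored counts
    nj = np.array([len(xj) for xj in x], dtype=float)
    # Eq. (8): a censored value from j has conditional mean
    # mu_j - sigma_j * phi*_j / Phi*_j (truncated normal)
    mu_new = np.array([(x[j].sum() + m[j] * (mu[j] - sigma[j] * phi[j] / Phi[j]))
                       / (nj[j] + m[j]) for j in range(k)])
    # Eq. (9): conditional second moment about mu_new, i.e. chi/Phi* of (B4)
    Ex2 = sigma**2 * (1.0 - (zu * norm.pdf(zu) - zl * norm.pdf(zl)) / Phi)
    chi = Ex2 + (mu - mu_new)**2 - 2.0 * (mu - mu_new) * sigma * phi / Phi
    sigma_new = np.array([np.sqrt((np.sum((x[j] - mu_new[j])**2) + m[j] * chi[j])
                                  / (nj[j] + m[j])) for j in range(k)])
    w_new = (nj + m) / (nj.sum() + n_cens)  # Eq. (10)
    return w_new, mu_new, sigma_new
```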

3.2. Asymptotic Variance–Covariance Matrix for EM Estimates

Following Meng and Rubin (1991) and Tanner (1996), we can estimate the asymptotic variance–covariance matrix of $\tilde\theta$ as
$$V(\tilde\theta) = \Big\{ -\frac{\partial^2 Q(\theta, \tilde\theta)}{\partial \theta^2} \Big|_{\theta = \tilde\theta} \Big\}^{-1} \Big\{ I - \frac{\partial M(\theta)}{\partial \theta} \Big|_{\theta = \tilde\theta} \Big\}^{-1} \Big\{ \frac{\partial M(\theta)}{\partial \theta} \Big|_{\theta = \tilde\theta} \Big\} + \Big\{ -\frac{\partial^2 Q(\theta, \tilde\theta)}{\partial \theta^2} \Big|_{\theta = \tilde\theta} \Big\}^{-1}, \tag{11}$$
where $I$ stands for the $3k$-dimensional identity matrix, $\frac{\partial^2 Q(\theta, \tilde\theta)}{\partial \theta^2}$ is the Hessian matrix of $Q$, and $\frac{\partial M(\theta)}{\partial \theta}$ is a matrix obtained by numerically differentiating the EM mapping $M(\theta^s) = \theta^{s+1}$.


We refer to Tanner (1996, Ch. 4) or Meng and Rubin (1991) for details on the numerical approximation $\frac{\partial M(\theta)}{\partial \theta}$ of the derivative of $M(\theta)$. The results by Meng and Rubin are used in Sec. 5 in order to compute standard errors and confidence intervals for the EM estimates.
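A rough sketch (ours) of the supplemented-EM computation follows. It assumes the EM update has been wrapped as a map `M(theta) -> theta_next` on a flattened parameter vector (for instance, built from the `em_step` above) and that `Ioc`, the negative Q-Hessian at $\tilde\theta$, is available.

```python
import numpy as np

def sem_variance(M, theta_tilde, Ioc, h=1e-4):
    """Eq. (11): V = Ioc^{-1} (I - DM)^{-1} DM + Ioc^{-1}, where DM has
    entries dM_j / d theta_i, obtained by perturbing one coordinate of
    the EM fixed point theta_tilde at a time (M(theta_tilde) = theta_tilde)."""
    p = len(theta_tilde)
    DM = np.zeros((p, p))
    for i in range(p):
        t = np.array(theta_tilde, dtype=float)
        t[i] += h
        DM[i, :] = (np.asarray(M(t)) - np.asarray(theta_tilde)) / h
    Ioc_inv = np.linalg.inv(Ioc)
    return Ioc_inv @ np.linalg.inv(np.eye(p) - DM) @ DM + Ioc_inv
```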

4. A Solution Using Bayesian Ideas

The method described in this section is based on the work of Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003), where a Bayesian method for the statistical inference of mixtures with partially identifiable and right censored data was studied. We will extend this Bayesian approach to the context of centrally censored observations from mixtures of normal distributions, as described in Sec. 1.

To this end, recall that the likelihood function $L(\theta; x)$ for the parameter $\theta = (w', \eta')'$ is proportional to the quantity indicated in Eq. (1). In order to make the Bayesian analysis comparable with the EM algorithm and the LVPLA explained in the sections above, we will assume that the prior distribution $p(\theta)$ for $\theta$ is such that $p(\theta) \propto 1$. In such a case, by using Bayes' theorem, the posterior distribution satisfies
$$p(\theta \mid x) \propto L(\theta; x). \tag{12}$$
Because analytical inferences for $w$ and $\eta$ are not feasible in this case, we will use simulation-based methods in order to carry them out. More specifically, the Gibbs sampling scheme (Gelfand and Smith, 1990) is adequate for problems with censored data and missing observations (Gelman et al., 1995, p. 447).

4.1. A Gibbs Sampling Scheme Using Latent Variables

For each $j = 1, \ldots, k$, let $\tilde{x}_j' \equiv (\tilde{x}_{ji})_{i=n_j+1}^{n_j+r_j}$ be the additional data that would have been observed from the $j$th population had the samples not been censored. It is understood that $r_j$, $j = 1, 2, \ldots, k$, is the number of observations from the $j$th distribution that fell inside the censoring interval $(C_l, C_u)$. Thus,
$$r_1 + \cdots + r_k = N - n$$
and
$$(n_1 + r_1) + \cdots + (n_k + r_k) = N.$$

Since the $\tilde{x}_{ji}$ are unknown, a Bayesian analysis of this problem must involve the specification of a joint probability model $p(\theta, \tilde{x} \mid x)$ for both the parameter $\theta$ and the censored observations $\tilde{x}' \equiv (\tilde{x}_1', \tilde{x}_2', \ldots, \tilde{x}_k')$.

Consider a Gibbs sampler scheme alternating between $\theta$ and $\tilde{x}$; the full conditionals for $p(\theta, \tilde{x} \mid x)$,
$$p(\theta \mid x, \tilde{x}) \tag{13}$$
and
$$p(\tilde{x} \mid \theta, x) = p(\tilde{x} \mid \theta), \tag{14}$$


can be obtained as follows. For Eq. (13), suppose that all the samples were fully observed (i.e., there are no censored data). In such a case, the likelihood function can be written as
$$L(\theta; x, \tilde{x}) \propto \prod_{j=1}^{k} \Big\{ w_j^{n_j} \prod_{i=1}^{n_j} f_j(x_{ji}; \eta_j) \Big\} \prod_{j=1}^{k} \Big\{ w_j^{r_j} \prod_{i=n_j+1}^{n_j+r_j} f_j(\tilde{x}_{ji}; \eta_j) \Big\}. \tag{15}$$

Thus, assuming $f_j$ to be a normal density with parameter $\eta_j' = (\mu_j, \sigma_j)$, we obtain
$$L(\theta; x, \tilde{x}) \propto \prod_{j=1}^{k} \Big\{ w_j^{n_j+r_j}\, \sigma_j^{-(n_j+r_j)}\, e^{-\frac{1}{2\sigma_j^2}\big( \sum_{i=1}^{n_j} (x_{ji} - \mu_j)^2 + \sum_{i=n_j+1}^{n_j+r_j} (\tilde{x}_{ji} - \mu_j)^2 \big)} \Big\}. \tag{16}$$

By Eq. (12) we conclude that the posterior distribution $p(\theta \mid x, \tilde{x})$ is proportional to the right-hand side of (16). From this expression we derive, for the case of $k = 2$ distributions in the mixture, the following full conditional distributions for the components of $\theta$ that are required to produce samples from (13). In the next expressions we denote by $x_j$ the vector of the observations identified as coming from distribution $j$, $j = 1, 2$; also, we use the notation $x' = (x_1', x_2')$. Further, let $D_j' = (x_j', \tilde{x}_j')$, $j = 1, 2$. The full conditional distribution for $w_1$ is
$$p(w_1 \mid \mu_1, \sigma_1, \mu_2, \sigma_2, D_1, D_2) = p(w_1 \mid D_1, D_2) \propto w_1^{n_1+r_1} (1 - w_1)^{n_2+r_2}, \tag{17}$$

which is a Beta$(n_1 + r_1 + 1,\ n_2 + r_2 + 1)$ distribution.

The full conditional distribution for $\mu_1$ is
$$p(\mu_1 \mid w_1, \sigma_1, \mu_2, \sigma_2, D_1, D_2) = p(\mu_1 \mid \sigma_1, D_1, D_2) \propto e^{-\frac{1}{2\sigma_1^2}\big( \sum_{i=1}^{n_1} (x_{1i} - \mu_1)^2 + \sum_{i=n_1+1}^{n_1+r_1} (\tilde{x}_{1i} - \mu_1)^2 \big)} \propto e^{-\frac{n_1+r_1}{2\sigma_1^2} (\mu_1 - \bar{x}_1)^2}, \tag{18}$$
where $\bar{x}_1 \equiv \frac{1}{n_1+r_1} \big( \sum_{i=1}^{n_1} x_{1i} + \sum_{i=n_1+1}^{n_1+r_1} \tilde{x}_{1i} \big)$. Thus, conditionally on $D_1, D_2$ and $\sigma_1$, $\mu_1$ has a Normal$(\bar{x}_1,\ \sigma_1^2/(n_1+r_1))$ distribution.

Correspondingly, the full conditional distribution for $\sigma_1$ is
$$p(\sigma_1 \mid w_1, \mu_1, \mu_2, \sigma_2, D_1, D_2) = p(\sigma_1 \mid \mu_1, D_1, D_2) \propto \sigma_1^{-(n_1+r_1)}\, e^{-\frac{1}{2\sigma_1^2}\big( \sum_{i=1}^{n_1} (x_{1i} - \mu_1)^2 + \sum_{i=n_1+1}^{n_1+r_1} (\tilde{x}_{1i} - \mu_1)^2 \big)}, \tag{19}$$
which is a square-root inverted gamma distribution with parameters $\alpha = \frac{n_1 + r_1 - 1}{2}$ and $\beta = \frac{1}{2} \big\{ \sum_{i=1}^{n_1} (x_{1i} - \mu_1)^2 + \sum_{i=n_1+1}^{n_1+r_1} (\tilde{x}_{1i} - \mu_1)^2 \big\}$. The full conditional distributions for $\mu_2$ and $\sigma_2$ are analogous.

Let us denote
$$1_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A. \end{cases}$$


Now, in order to produce samples from (14), note that given $\theta$, and by the conditional independence of $x$ and $\tilde{x}$ (Gelman et al., 1995, p. 9), we have
$$p(\tilde{x} \mid \theta) = \prod_{j=1}^{k} \prod_{i=n_j+1}^{n_j+r_j} h_{ji}(\tilde{x}_{ji} \mid \theta),$$
where the density $h_{ji}(\cdot \mid \theta)$ of the censored observations for each of the $k$ populations is
$$h_{ji}(\tilde{x}_{ji} \mid \theta) = \frac{g(\tilde{x}_{ji}; \theta)}{G(C_u; \theta) - G(C_l; \theta)} \times 1_{(C_l, C_u)}(\tilde{x}_{ji}), \tag{20}$$
with $G(\cdot\,; \theta)$ given by the mixture distribution in (7) and $g$ its corresponding density function.

Note that for the case we shall study, the probability density function for $\tilde{x}_{ji}$ in Eq. (20) is a mixture of normal densities, which for $k = 2$ has weights $w_1$ and $1 - w_1$, and is truncated on the tails $(-\infty, C_l]$ and $[C_u, \infty)$.

If censoring is of type II, one replaces the constants $C_u$ and $C_l$ by the usual quantile estimators, as mentioned in Baker, Mengersen, and Davis (2005), in their Sec. 2.2.1. Their Case 2 in Sec. 2.2.2 is devoted to the case of centrally censored observations from a mixture of normal densities.
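One sweep of the sampler defined by (17)–(20) can be sketched as follows (our code, for $k = 2$ and the flat prior of this section); the latent $\tilde{x}$ is drawn from the truncated mixture (20) by simple rejection, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import beta, invgamma

rng = np.random.default_rng(0)

def sample_censored(w1, mu, sigma, Cl, Cu, m):
    """Draw m values from the mixture truncated to (Cl, Cu), Eq. (20),
    by rejection, keeping each value's component label."""
    val, lab = [], []
    while len(val) < m:
        j = rng.choice(2, p=[w1, 1.0 - w1])
        x = rng.normal(mu[j], sigma[j])
        if Cl < x < Cu:
            val.append(x)
            lab.append(j)
    return np.array(val), np.array(lab)

def gibbs_sweep(x1, x2, w1, mu, sigma, Cl, Cu, N):
    m = N - len(x1) - len(x2)
    xt, lab = sample_censored(w1, mu, sigma, Cl, Cu, m)    # from (14)/(20)
    D = [np.concatenate([x1, xt[lab == 0]]),
         np.concatenate([x2, xt[lab == 1]])]               # D_j = (x_j, x~_j)
    w1 = beta.rvs(len(D[0]) + 1, len(D[1]) + 1, random_state=rng)  # Eq. (17)
    for j in (0, 1):
        nr = len(D[j])
        b = 0.5 * np.sum((D[j] - mu[j]) ** 2)
        # Eq. (19): sigma_j^2 | mu_j is inverse gamma (square-root inverted
        # gamma for sigma_j); then Eq. (18): mu_j | sigma_j is normal
        sigma[j] = np.sqrt(invgamma.rvs((nr - 1) / 2.0, scale=b,
                                        random_state=rng))
        mu[j] = rng.normal(D[j].mean(), sigma[j] / np.sqrt(nr))
    return w1, mu, sigma
```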

We note that the likelihood expression in the last factor of Eq. (2.6) in Baker, Mengersen, and Davis (2005) does not have a term $w_k$ in the product corresponding to the densities of the non-censored observations. However, in light of the results they obtain from their MCMC simulation studies, we think this omission must be a typo.

5. Numerical Results

In this section we study numerical results of the proposed methods in different ways. First, in the next subsection we estimate the parameters from censored mixtures for two data sets of relatively small size ($N = 50$). A coverage study for different parameters in the mixture is performed in subsection 5.2, and some comparisons of the three methods are discussed. Finally, subsection 5.3 presents an application to real data corresponding to lifetimes of elderly persons in a retirement home.

5.1. A Couple of Examples

First we analyze a data set consisting of $N = 50$ observations that were censored in the interval $(C_l, C_u) = (-0.5, 0.5)$ and where only a total of $n = 35$ were observed. Of these, $n_1 = 15$ were known to be from the first (normal) population and the remaining 20 from the second one. The data are given in Table 1, and a comparison is made in Table 2 between the corresponding 95% confidence intervals produced by the LVPLA and the EM procedures. Table 2 also shows the highest posterior density regions obtained by using the Bayesian analysis using latent variables (BALV).

We observe that the first method, based on the profile likelihoods of each of the four parameters of interest, allows for a numerical evaluation of each of these profile likelihood functions, having used the values $r_1^*$ and $r_2^*$. These likelihoods are approximated using normal distributions with Fisher's observed matrix, as discussed in Sec. 2. The approximations are in excellent agreement.


Table 1. Example 1, sample values

Population 1    Population 2
−1.27768317     −0.55818043
 1.27647354      1.90113190
 1.19835022      2.97908776
 1.73313310      1.88861725
−2.18356764      3.73822479
 1.09502253      0.80331268
−1.08670065      3.03734089
−0.69020416     −0.60439868
−1.69043233      1.68213012
−1.84691089      0.83403768
−0.97762950      0.87192951
−0.77350705      1.39912493
−2.11793122      4.01091859
−0.56792487      0.74387265
 1.34264155      3.66470124
                 2.56981287
                −1.68603799
                 2.67940925
                −2.97812983
                −1.75590537

The upper panel of Fig. 1 shows the profile likelihood (continuous line) and its normal approximation (dotted line) corresponding to the location parameter of population 1. The lower panel of Fig. 1 shows the profile likelihood (continuous line) and its normal approximation (dotted line) corresponding to the log-scale parameter of population 1. Note that in the case of the location parameter, the plots almost overlap. Also, for population 1 in this first example, Fig. 2 shows plots of the density estimates corresponding to the posterior samples of the location parameter (upper panel) and the log-scale parameter (lower panel), obtained by using the BALV.

Table 2. Comparison of results for LVPLA, EM, and BALV, Example 1: upper and lower limits for 95% intervals and highest posterior density regions

        LVPLA               EM                  BALV
μ1      (−0.6966, 0.1561)   (−0.7104, 0.1613)   (−0.7904, 0.2533)
μ2      (0.3102, 1.718)     (0.2798, 1.714)     (0.2422, 1.821)
δ1      (−0.2071, 0.3467)   (−0.2155, 0.3698)   (−0.1323, 0.4894)
δ2      (0.3048, 0.8605)    (0.2943, 0.8591)    (0.3437, 1.002)


Figure 1. Normal approximations to the profile likelihoods corresponding to population 1 of the first data set. Upper panel: profile likelihood and its corresponding normal approximation for the location parameter. Lower panel: profile likelihood and its corresponding normal approximation for the log-scale parameter.

The four resulting approximating normals for the parameters $\mu_1$, $\delta_1$, $\mu_2$, and $\delta_2$ had the following means and variances, respectively: (−0.270250, 0.047324), (0.069782, 0.019957), (1.014046, 0.128944), and (0.582678, 0.020098).

In the LVPLA method there is a parameter $w_1$, and the assigned $r_1^* = 10$ may be taken, together with $n_1 = 15$, to have produced 25 observations out of the first population from a total of $N = 50$ Bernoulli trials. A fiducial, symmetric proposal suggests a mixture with equal weights of two beta distributions with parameters $(\alpha, \beta)$ of $(25, 26)$ and $(26, 25)$, respectively, which has mean $0.5$, coinciding with the ratio $(n_1 + r_1^*)/N = \hat{w}_1$. Corresponding to the BALV, Fig. 3 shows an estimate of the density produced with the posterior samples of the weight $w_1$.

A comparison of the intervals shown in Table 2 prompts us to conclude that the LVPLA and the EM methods produce very similar results. This is not the case for the highest posterior density regions yielded by the BALV, which are a bit larger; this feature is more apparent for the log-scale parameters.

the LVPLA and the EM methods produce very similar results. This is not the case


Figure 2. Estimates obtained from posterior samples corresponding to population 1 of the first data set. Upper panel: location parameter. Lower panel: log-scale parameter.

Figure 3. Estimate of the final distribution corresponding to the mixture weight for the first data set.


Table 3. Example 2, sample values

Population 1    Population 2
 2.48375727      1.71885324
−0.88203478      2.01618753
−0.79095798      1.54383349
−0.81216058     −0.53529982
−1.69555619
−0.68226427
 2.27630153
−1.91459549

−1�91459549

Next we analyze a data set consisting of $N = 50$ observations censored in the interval $(C_l, C_u) = (-0.5, 1.5)$, where only $n = 12$ were observed outside the interval, of which $n_1 = 8$ are from the first population. The data are given in Table 3, and a comparison is made in Table 4 between the corresponding 95% confidence intervals produced by the LVPLA and the EM procedures. Table 4 also shows the highest posterior density regions obtained from the BALV. Observe that, in contrast with the first example, censoring is rather heavy in this case. The assigned or imputed value for $r_1^*$ is 10, as in the first example, a mere coincidence.

For the parameters $\mu_1$, $\delta_1$, $\mu_2$, and $\delta_2$, the resulting approximating normals were $n(-0.270250, 0.047324)$, $n(0.069782, 0.019957)$, $n(1.014046, 0.128944)$, and $n(0.582678, 0.020098)$.

As in the first example, a small analysis is made of what $w_1$ could have been, having $r_1^* + n_1 = 18$ observations out of the first population. A fiducial, symmetric proposal suggests a mixture with equal weights of two beta distributions with parameters $(\alpha, \beta)$ of $(18, 33)$ and $(19, 32)$, respectively.

As before, the intervals produced by the LVPLA and the EM procedures are similar, and the highest posterior density regions produced by the BALV are a bit larger.

The first example was simulated using $w_1 = 0.5$, with a $n(0, 1)$ as the first population and a $n(1, 4)$ as the second. The second example was generated with $w_1 = 0.25$, where the first population is a $n(0, 1)$ and the second population is a $n(1, 0.25)$, with the large censoring interval already described.

Table 4. Comparison of results for LVPLA, EM, and BALV, Example 2: upper and lower limits for 95% intervals and highest posterior density regions

        LVPLA                 EM                    BALV
μ1      (−0.4602, 0.6996)     (−0.4158, 0.6907)     (−0.6448, 0.7580)
μ2      (0.3752, 1.009)       (0.3867, 1.004)       (0.3525, 1.085)
δ1      (−0.1705, 0.4946)     (−0.2218, 0.5024)     (−0.2105, 0.8544)
δ2      (−0.8109, −0.2243)    (−0.8134, −0.2061)    (−0.7543, −0.0561)


Table 5. True parameter values ($w_2 = 1 - w_1$)

Parameter    θ1      θ2
μ1           0.0     0.0
μ2           2.0     0.5
δ1           1.0     0.25
δ2           0.4     1.25
w1           0.333   0.7

The results show considerable agreement between the three methods and, as expected, the intervals from the LVPLA method are slightly smaller due to the assigned value of $r_1$; nevertheless, the LVPLA considerably simplifies the analyses, especially if one confronts larger numbers of populations in the mixture.

5.2. Studying the Coverage of the Proposed Methods

We have conducted a small simulation study to determine the coverage properties of each of the methods LVPLA, EM, and BALV. Let us denote the parameter of the mixture by $\theta' = (\mu_1, \delta_1, \mu_2, \delta_2, w_1)$, as in subsection 3.1. The two parameter values $\theta_1'$ and $\theta_2'$ shown in Table 5 were considered to simulate data sets, each of these with two percentages of central censoring (denoted $\lambda$), namely 30% and 50%. Thus, in the end, we studied four different cases by repeating the next steps 100 times for every case:

• A sample from the centrally censored mixture of normal distributions was simulated (see the sketch after this list). We used $N = 300$ as the sample size.
• An interval was computed for each of the components of $\theta$ at the 95% level.
• A record was kept of whether the true value of each component of $\theta$ belongs to the estimated interval.
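For the simulation step in the list above, one centrally censored sample with partial identification can be generated as in this sketch (our code; the interface is hypothetical, and the illustrative call uses the truth of Example 1 in subsection 5.1):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_censored_sample(w1, mu, sigma, Cl, Cu, N=300):
    """Simulate as in Sec. 1: values in (Cl, Cu) lose both their value
    and their label; the rest keep both."""
    labels = rng.choice(2, size=N, p=[w1, 1.0 - w1])
    x = rng.normal(np.asarray(mu)[labels], np.asarray(sigma)[labels])
    kept = (x <= Cl) | (x >= Cu)
    x1 = x[kept & (labels == 0)]
    x2 = x[kept & (labels == 1)]
    return x1, x2, N - kept.sum()   # identified samples and censored count

# e.g., Example 1's truth: w1 = 0.5, n(0, 1) and n(1, 4), (Cl, Cu) = (-0.5, 0.5)
x1, x2, n_cens = simulate_censored_sample(0.5, [0.0, 1.0], [1.0, 2.0], -0.5, 0.5)
```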

The coverage (the proportion of intervals containing the true value of the corresponding component of $\theta$) for the three methods is reported in Table 6 for a percentage $\lambda$ of centrally censored observations of 30%, and in Table 7 for $\lambda = 0.50$.

Table 6. Coverage of 95% intervals, considering λ = 0.30

        LVPLA          EM algorithm    BALV
        θ1     θ2      θ1     θ2       θ1     θ2
μ1      0.94   0.98    0.97   0.94     0.97   0.93
μ2      0.97   0.96    0.90   0.93     0.90   0.92
δ1      0.98   0.96    0.95   0.94     0.95   0.98
δ2      0.93   0.95    0.95   0.92     0.96   0.95


Table 7. Coverage of 95% intervals, considering λ = 0.50

        LVPLA          EM algorithm    BALV
        θ1     θ2      θ1     θ2       θ1     θ2
μ1      0.92   0.98    0.90   0.92     0.94   0.92
μ2      0.98   0.94    0.96   0.95     0.98   0.94
δ1      0.93   0.95    0.90   0.96     0.94   0.97
δ2      0.97   0.94    0.91   0.94     0.98   0.94

The three methods give results that seem alike. In more detail, if 30% of the observations are censored, the results from the EM algorithm and the BALV are closer to each other and the LVPLA appears to have better coverage properties. When the percentage of censored observations is larger ($\lambda = 0.50$), the LVPLA and BALV tend to produce similar results and the EM algorithm seems affected.

5.3. Application to a Real Case

In a rest home in Mexico City, a follow-up was done on a selected group of 80 residents between an initial date and the date when the last of these residents died. At the time of death the probable cause was catalogued as either "heart failure" (cause 1) or "other" (cause 2) according to some medical criterion. For some reason, the files corresponding to 8 of the residents who were known to have died between two intermediate dates were lost, so the date of death and the probable cause were not known. The only thing certain was that these 8 residents died during that interval of time.

A preliminary analysis with the available data allows us to assign 46 survival times (times of death) to cause 1 and 26 survival times to cause 2; the remaining 8 middle censored observations were assigned according to a mixture of two normal distributions. The decision to use normal distributions was based on probability plots, where the assigned censored observations were given the mid-value of the censoring interval. Application of the LVPLA yields the value $r_1^* = 3$. For this number, and using a mixture of beta distributions as a (fiducial distribution) appraisal for the proportion $w_1$, the corresponding mean value is 0.6111. Thus, about 61% of this population will have "heart failure" as the cause of death. The parameter estimates for $\mu_1$ and $\mu_2$ resulted in 2287 and 2635 (days) for populations 1 and 2, respectively. The difference seems to be significant, given the corresponding estimates for $\sigma_1$ and $\sigma_2$, which were approximately 578 and 403, respectively. We could conclude that those who have "heart failure" as the cause of death feature shorter lifetimes; however, the problem of verifying the significance is a Behrens–Fisher type problem complicated by the unclassified middle censored observations, and it is outside the scope of the present article. The data, of course, are available from the authors. We thought of this example because, although there was no censoring scheme per se, the loss of data over a period of time resulted in an equivalent setting.


6. Concluding Remarks

In this article we have discussed the use of latent variables in the analysis of centrally censored mixtures of distributions for data with partial identification. Although we have centered our discussion on the case of normal distributions, the methods exposed here can easily be implemented for other models; for instance, the distribution models treated in Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003). In contrast with the Bayesian approach, the LVPLA does not take into account the uncertainty involved in the latent variables. The numerical examples in Sec. 5 show that the LVPLA tends to produce shorter intervals than either the BALV or the EM.

The main advantage of the LVPLA and the BALV lies in their numerical efficiency. Both methods avoid the computation and inversion of matrices whose dimensions depend on the number of populations $k$ in the mixture. From Eq. (2), in the LVPLA, once the values $r_j^*$ have been obtained, the estimation of the asymptotic variance–covariance matrix corresponding to the estimates $\hat\eta_j$ for population $j$ is performed separately for each $j = 1, 2, \ldots, k$. In the case of the BALV, inference for population $j$ is performed simply by simulation from the full conditionals corresponding to $\eta_j$, $j = 1, 2, \ldots, k$, and from the full conditional for $\tilde{x}$. These full conditionals are given in Eqs. (18)–(20). Note that the full conditional distributions for $\eta_j$ do not depend on the parameter components corresponding to another population and that all full conditionals are straightforward to sample from. These features allow for ease of computation and for gains in efficiency as the number of populations increases.

Unfortunately, we cannot say the same about the EM algorithm. However, for this method the estimates $\hat\mu_j$ and $\hat\sigma_j$ are given (for the case of normal populations) explicitly in Eqs. (8) and (9); thus, no numerical search is required. This last feature somewhat balances the EM against the BALV and the LVPLA for the case of mixtures of normal distributions.

Case 2 in subsection 2.2.2 of Baker, Mengersen, and Davis (2005) presents a framework similar to the one we have worked with here. These authors proposed to tackle the problem by using an MCMC scheme that differs from the BALV in the specification of the prior distribution for $\theta$. Another difference is that the approach in Baker, Mengersen, and Davis (2005) uses a reparameterization that links the means $\mu_j$ of the populations in order to avoid convergence of the resulting Markov chain to one of the components in the mixture. It is our belief that the BALV approach results in a more straightforward scheme because it does not require such constraints; at the same time, the BALV produces results that are comparable with those of the LVPLA and the EM algorithm.

Appendix A

For an alternative derivation of the MLEs starting from a sufficiently close initial guess, the following equations are directly derived from equating the two first derivatives of $l_j$ to zero. For $r_j$ given, the MLEs are solutions of the following two equations:
$$\hat\mu_j = \frac{ \sum_{i=1}^{n_j} x_{ij} - r_j\, e^{\hat\delta_j} A(\hat\mu_j, \hat\delta_j) }{ n_j },$$
$$e^{\hat\delta_j} = \frac{ -r_j B(\hat\mu_j, \hat\delta_j) + \sqrt{ \big( r_j B(\hat\mu_j, \hat\delta_j) \big)^2 + 4 n_j \sum_{i=1}^{n_j} (x_{ij} - \hat\mu_j)^2 } }{ 2 n_j }.$$
The solution is found by successive numerical approximations until the zero of the system is reached. As a suggestion, these equations may be used after an initial search on a crude grid is made to get approximate maxima of the likelihood function to use as the initial guess.
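A minimal sketch (ours) of this successive-approximation scheme for one component follows; $A$ and $B$ are computed directly from their definitions in Sec. 2, and convergence from a crude initial guess is assumed rather than guaranteed.

```python
import numpy as np
from scipy.stats import norm

def mle_fixed_point(x, r, Cl, Cu, mu, delta, iters=200):
    """Iterate the two MLE equations above for one component:
    x = identified observations, r = censored count attributed to it,
    (mu, delta) = crude initial guess, delta = log(sigma)."""
    n = len(x)
    for _ in range(iters):
        s = np.exp(delta)
        zl, zu = (Cl - mu) / s, (Cu - mu) / s
        Phi = norm.cdf(zu) - norm.cdf(zl)
        A = (norm.pdf(zu) - norm.pdf(zl)) / Phi
        B = (norm.pdf(zu) * (Cu - mu) - norm.pdf(zl) * (Cl - mu)) / Phi
        mu = (np.sum(x) - r * s * A) / n                  # first equation
        S = np.sum((x - mu) ** 2)
        s = (np.sqrt((r * B) ** 2 + 4.0 * n * S) - r * B) / (2.0 * n)
        delta = np.log(s)                                 # second equation
    return mu, delta
```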

The expressions for the three partial derivatives of $A$ and $B$ that are needed in Sec. 2 for the entries of Fisher's matrix are
$$\frac{\partial A(\mu_j, \delta_j)}{\partial \mu_j} = \frac{ -e^{-\delta_j} \big[ \phi'\big( (C_u - \mu_j) e^{-\delta_j} \big) - \phi'\big( (C_l - \mu_j) e^{-\delta_j} \big) \big] }{ \Phi\big( (C_u - \mu_j) e^{-\delta_j} \big) - \Phi\big( (C_l - \mu_j) e^{-\delta_j} \big) } + e^{-\delta_j} A^2(\mu_j, \delta_j),$$
$$\frac{\partial A(\mu_j, \delta_j)}{\partial \delta_j} = \frac{ -e^{-\delta_j} \big[ \phi'\big( (C_u - \mu_j) e^{-\delta_j} \big)(C_u - \mu_j) - \phi'\big( (C_l - \mu_j) e^{-\delta_j} \big)(C_l - \mu_j) \big] }{ \Phi\big( (C_u - \mu_j) e^{-\delta_j} \big) - \Phi\big( (C_l - \mu_j) e^{-\delta_j} \big) } + e^{-\delta_j} A(\mu_j, \delta_j) B(\mu_j, \delta_j),$$
$$\frac{\partial B(\mu_j, \delta_j)}{\partial \delta_j} = \frac{ -e^{-\delta_j} \big[ \phi'\big( (C_u - \mu_j) e^{-\delta_j} \big)(C_u - \mu_j)^2 - \phi'\big( (C_l - \mu_j) e^{-\delta_j} \big)(C_l - \mu_j)^2 \big] }{ \Phi\big( (C_u - \mu_j) e^{-\delta_j} \big) - \Phi\big( (C_l - \mu_j) e^{-\delta_j} \big) } + e^{-\delta_j} B^2(\mu_j, \delta_j).$$

Appendix B

By assuming that the $i$th observation corresponding to population $j$ in the mixture follows a normal distribution, for $j = 1, 2, \ldots, k$ and $i = 1, 2, \ldots, n_j$, we have $f_j(x_{ji}; \eta_j) = f(x_{ji}; \mu_j, \delta_j)$ and $F_j(x_{ji}; \eta_j) = F(x_{ji}; \mu_j, \delta_j)$, with $f$ and $F$ given in Eq. (5). Now, consider as augmented data the pairs
$$z_1 = (x_{0\,1}, i_1), \ldots, z_{N-n} = (x_{0\,N-n}, i_{N-n}),$$
where the first component of each pair is an unobserved (censored) datum and the second component is an integer indicating the population to which the first component corresponds. Thus, the augmented likelihood $L(x, z \mid \theta)$ is proportional to
$$\prod_{j=1}^{k} \prod_{i=1}^{n_j} \big\{ w_j\, f_j(x_{ji}; \eta_j) \big\} \times \prod_{l=1}^{N-n} \big\{ w_{i_l}\, f_{i_l}(x_{0\,l}; \eta_{i_l}) \big\}. \tag{B1}$$

From Eqs. (1) and (B1) it follows that the predictive distribution for the augmented data is
$$p(z_1, \ldots, z_{N-n} \mid x, \theta) = \prod_{l=1}^{N-n} \Big\{ \frac{w_{i_l}\, f_{i_l}(x_{0\,l}; \eta_{i_l})}{P(C_u, C_l; \theta)} \Big\}. \tag{B2}$$
This distribution is supported on the set
$$\big\{ (z_1, \ldots, z_{N-n}) : z_l = (x_{0\,l}, i_l), \text{ where } i_l \in \{1, 2, \ldots, k\} \text{ and } x_{0\,l} \in (C_l, C_u),\ l = 1, 2, \ldots, N-n \big\}.$$


We assume a uniform prior distribution for $\theta$ ($p(\theta) \propto 1$); this specification is taken to make the results comparable with those obtained from the LVPLA method in Sec. 2 and the BALV method in Sec. 4. Note that changing the prior specification will cause a change in the following computations.

Under the assumption of a normal probability model, the augmented posterior density is then given by
$$p(\theta \mid x, z) \propto \prod_{j=1}^{k} \Big\{ \prod_{i=1}^{n_j} w_j\, \frac{1}{\sqrt{2\pi}\, \sigma_j}\, e^{-\frac{1}{2\sigma_j^2}(x_{ji} - \mu_j)^2} \Big\} \times \prod_{l=1}^{N-n} \Big\{ w_{i_l}\, \frac{1}{\sqrt{2\pi}\, \sigma_{i_l}}\, e^{-\frac{1}{2\sigma_{i_l}^2}(x_{0\,l} - \mu_{i_l})^2} \Big\}.$$

To perform the E-step we integrate $\log p(\theta \mid x, z)$ with respect to the augmented predictive density (B2), evaluated at a current value $\theta^s$ of $\theta$:
$$Q(\theta, \theta^s) \equiv E\big\{ \log p(\theta \mid x, z) \mid \theta^s, x \big\} = \int_{C_l}^{C_u} \cdots \int_{C_l}^{C_u} \sum_{i_1=1}^{k} \cdots \sum_{i_{N-n}=1}^{k} \log p(\theta \mid x, z) \times \prod_{l=1}^{N-n} \frac{w_{i_l}^s\, f_{i_l}(x_{0\,l}; \eta_{i_l}^s)}{P(C_u, C_l; \theta^s)} \, dx_{0\,1} \cdots dx_{0\,N-n}.$$

It is not difficult to obtain that
$$
\begin{aligned}
Q(\theta, \theta^s) \doteq{} & \sum_{j=1}^{k} \Big\{ n_j + (N-n)\, \frac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} \Big\} \log w_j \\
& - \sum_{j=1}^{k} \Big\{ n_j + (N-n)\, \frac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} \Big\} \log \sigma_j \\
& - \sum_{j=1}^{k} \frac{1}{2\sigma_j^2} \sum_{i=1}^{n_j} (x_{ji} - \mu_j)^2 - (N-n) \sum_{j=1}^{k} \Big\{ \frac{w_j^s}{2\sigma_j^2} \cdot \frac{\chi(C_u, C_l; \eta_j^s, \mu_j)}{P(C_u, C_l; \theta^s)} \Big\},
\end{aligned} \tag{B3}
$$
where $\doteq$ denotes equality up to an additive constant (with respect to $\theta$) and
$$
\begin{aligned}
\chi(C_u, C_l; \eta_j^s, \mu_j) ={} & (\sigma_j^s)^2 \Big\{ \Big( \frac{C_l - \mu_j^s}{\sigma_j^s} \Big) \phi\Big( \frac{C_l - \mu_j^s}{\sigma_j^s} \Big) - \Big( \frac{C_u - \mu_j^s}{\sigma_j^s} \Big) \phi\Big( \frac{C_u - \mu_j^s}{\sigma_j^s} \Big) \Big\} \\
& + (\sigma_j^s)^2\, \Phi^*(C_u, C_l; \eta_j^s) + (\mu_j^s - \mu_j)^2\, \Phi^*(C_u, C_l; \eta_j^s) \\
& - 2 (\mu_j^s - \mu_j)\, \sigma_j^s\, \phi^*(C_u, C_l; \eta_j^s).
\end{aligned} \tag{B4}
$$

In order to perform the M-step, note that $Q(\theta, \theta^s) = H_1(w_1, \ldots, w_k) + H_2(\mu_1, \ldots, \mu_k, \sigma_1, \ldots, \sigma_k)$, where
$$H_1(w_1, \ldots, w_k) = \sum_{j=1}^{k} \Big\{ n_j + (N-n)\, \frac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} \Big\} \log w_j$$
and
$$
\begin{aligned}
H_2(\mu_1, \ldots, \mu_k, \sigma_1, \ldots, \sigma_k) ={} & - \sum_{j=1}^{k} \Big\{ n_j + (N-n)\, \frac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} \Big\} \log \sigma_j \\
& - \sum_{j=1}^{k} \frac{1}{2\sigma_j^2} \sum_{i=1}^{n_j} (x_{ji} - \mu_j)^2 - (N-n) \sum_{j=1}^{k} \Big\{ \frac{w_j^s}{2\sigma_j^2} \cdot \frac{\chi(C_u, C_l; \eta_j^s, \mu_j)}{P(C_u, C_l; \theta^s)} \Big\}.
\end{aligned}
$$

Then, to maximize $Q$ with respect to $\theta$, we maximize each of the terms $H_1$ and $H_2$ separately.

Maximization of $H_2$ requires one to compute its partial derivatives with respect to $\mu_j$ and $\sigma_j$, for $j = 1, 2, \ldots, k$. These are given by
$$\frac{\partial H_2}{\partial \mu_j} = \frac{1}{2\sigma_j^2} \Big\{ \sum_{i=1}^{n_j} 2 (x_{ji} - \mu_j) \Big\} - (N-n)\, \frac{1}{2\sigma_j^2}\, w_j^s\, \frac{ \partial \chi(C_u, C_l; \eta_j^s, \mu_j) / \partial \mu_j }{ P(C_u, C_l; \theta^s) }$$
and
$$\frac{\partial H_2}{\partial \sigma_j} = -\Big( n_j + (N-n)\, \frac{w_j^s\, \Phi^*(C_u, C_l; \eta_j^s)}{P(C_u, C_l; \theta^s)} \Big) \frac{1}{\sigma_j} + \sum_{i=1}^{n_j} (x_{ji} - \mu_j)^2\, \frac{1}{\sigma_j^3} + (N-n)\, \frac{w_j^s\, \chi(C_u, C_l; \eta_j^s, \mu_j)}{P(C_u, C_l; \theta^s)}\, \frac{1}{\sigma_j^3}.$$

Thus, $H_2$ attains a maximum at the solution $\hat\eta' = ((\hat\mu_1, \hat\sigma_1), \ldots, (\hat\mu_k, \hat\sigma_k))$ of the equations
$$\frac{\partial H_2}{\partial \mu_j}(\hat\eta) = 0, \qquad \frac{\partial H_2}{\partial \sigma_j}(\hat\eta) = 0, \qquad j = 1, 2, \ldots, k.$$
The solution $\hat\eta$ is given analytically in Eqs. (8) and (9) of Sec. 3.

Regarding the maximization of $H_1(w_1, \ldots, w_k)$, by using Lagrange multipliers (with the linear restriction $\sum_{j=1}^{k} w_j = 1$), we obtain that $H_1$ attains a maximum at the point $\hat{w}_j$ given in Eq. (10) of Sec. 3.

Iteration of the E-step (compute Eq. (B3)) and the M-step (evaluate Eqs. (8)–(10)) of the EM algorithm will provide us with estimates $\tilde\theta$ for $\theta$.

Appendix C

The purpose of this Appendix is to redo the calculations with the LVPLA method for the first example in subsection 5.1, with exactly the same data set (which was simulated from normal distributions and then centrally censored), but now using another location-scale distribution, the extreme value distribution.


Figure C1. Normal approximations to the profile likelihoods corresponding to population 1 of the first data set (here using a mixture of extreme value distributions). Upper panel: profile likelihood and its normal approximation for the location parameter. Lower panel: profile likelihood and its normal approximation for the log-scale parameter.

In Sec. 2 and in Appendix A, formulas for the likelihood and the first derivatives of the log-likelihood function were explicitly derived for the case of the mixture of normals, and in subsection 5.1 the results of the LVPLA were shown and compared with the other two methods. It is essentially straightforward to obtain closed expressions for the first and second derivatives of the log-likelihood function when the mixture is of extreme value distributions with different location parameters $\mu_j$ and log-scale parameters $\delta_j$; since they may easily be derived, these expressions are not shown.

Upon performing the analysis, the normal approximation to the numerical profile likelihood (using Fisher's observed information matrix) was again very good. It is not as close as when the mixture of normals is considered, but still very good.

The result is illustrated only for the first population of the first example in Fig. C1. As may be expected, the profile likelihood obtained for, say, $\mu_1$ using the mixture of normals differs from the corresponding profile likelihood when we use the mixture of extreme values. In fact, the MLEs are also different, because a numerical value of the location parameter has a slightly different meaning in each family (the normal or the extreme value distribution). The MLEs for the parameters of the first population when using the mixture of extreme value distributions were $\hat\mu_1 = 0.2808$ and $\hat\delta_1 = -0.0186$, compared with $-0.2702$ and $0.0769$ obtained when the mixture of normals was used. For the other population similar results were obtained. It is also interesting to observe that the number of censored observations ascribed to the first population under the normal and extreme value models was 10 and 11, respectively; also alike.

Nevertheless, as shown in Fig. C2, the comparison of the normal density with the above MLEs as parameters and the extreme value density with the corresponding MLEs as parameters clearly shows agreement over the range in which both fits try to model the same data set. No attempt is made here to discuss the possibility of first doing some sort of model search or performing a goodness-of-fit test, as mentioned in subsection 5.3, where a rough probability plot was made with the rest-home survival data after assigning the 8 censored observations.

It is hoped that with this short exercise, changing the location-scale family from normal to extreme value, the LVPLA is shown to work for other location-scale distributions as well.

Figure C2. Comparison of the normal distribution with parameters given by the MLEs (obtained via LVPLA) and the extreme value distribution with parameters given by the MLEs (obtained via LVPLA).


Acknowledgments

The authors thank an anonymous referee for his helpful comments and suggestions, which led to improvements of our work. The research of Alberto Contreras-Cristán has been funded by PAPIIT grant IN109906 and by CONACyT grant J48538. Federico O'Reilly and Alberto Contreras-Cristán gratefully acknowledge support from the Sistema Nacional de Investigadores, Mexico.

References

Anaya, K., O'Reilly, F. (2000). Fisher's observed information matrix for location and scale parameters under type I censoring. Comm. Stat. Theor. Meth. 29(7):1527–1537.

Baker, P., Mengersen, K., Davis, G. (2005). A Bayesian solution to reconstructing centrally censored distributions. J. Agr. Biol. Environ. Stat. 10(1):61–84.

Bernardo, J. M., Smith, A. F. M. (1994). Bayesian Theory. New York: Wiley.

Contreras-Cristán, A. (2007). Using the EM algorithm for inference in a mixture of distributions with censored but partially identifiable data. Comput. Stat. Data Anal. 51:2769–2781.

Contreras-Cristán, A., Gutiérrez-Peña, E., O'Reilly, F. (2003). Inferences using latent variables for mixtures of distributions for censored data with partial identification. Comm. Stat. Theor. Meth. 32(4):749–774.

Dempster, A. P., Laird, N., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39:1–38.

Diebolt, J., Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. J. Roy. Stat. Soc. B 56:363–375.

Gamerman, D. (1997). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. London: Chapman and Hall.

Gelfand, A. E., Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85:398–409.

Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (1995). Bayesian Data Analysis. London: Chapman and Hall.

Jeffreys, H. (1961). Theory of Probability. 3rd ed. Oxford: Oxford University Press.

Mendenhall, W., Hader, R. J. (1958). Estimation of parameters of mixed exponentially distributed failure time distributions from censored life test data. Biometrika 45:504–520.

Meng, X. L., Rubin, D. B. (1991). Using EM to obtain asymptotic variance–covariance matrices: the SEM algorithm. J. Amer. Statist. Assoc. 86:899–909.

Nelder, J. A., Mead, R. (1965). A simplex method for function minimisation. Comput. J. 7:308–313.

Redner, R. A., Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26:195–239.

Tanner, M. A. (1996). Tools for Statistical Inference. 3rd ed. New York: Springer-Verlag.

Tanner, M. A., Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82:529–550.
