Inference with the lognormal distribution
Nicholas T. Longford
SNTL and UPF, Barcelona, Spain∗
Abstract
Several estimators of the expectation, median and mode of the lognormal distribution are
derived. They aim to be approximately unbiased, efficient, or have a minimax property in the
class of estimators we introduce. The small-sample properties of these estimators are assessed by
simulations and, when possible, analytically. Some of these estimators of the expectation are far
more efficient than the maximum likelihood or the minimum-variance unbiased estimator, even
for substantial sample sizes.
Keywords: χ2 distribution, efficiency, lognormal distribution, minimax estimator, Taylor expansion.
JEL classification: C13 — Estimation (Economic and Statistical Methods); C42 — Specific Distributions (Special Topics).
∗Departament d’Economia i Empresa, Universitat Pompeu Fabra, Ramon Trias Fargas 25–27, 08005 Barcelona,
Spain; [email protected]
Introduction
The lognormal distribution is used in a wide range of applications, where the multiplicative scale is appropriate and the log-transformation removes the skew and brings about symmetry of the data distribution (Limpert, Stahel and Abbt, 2001). Normality is the preferred distributional assumption in many contexts, and the logarithm is often the first transformation an analyst considers to promote it. Linear models are convenient to specify, and all the relevant moments are easy to calculate and operate with on the log scale. However, there are instances when moments, and the expectation in particular, are of interest on the original (exponential) scale. For example, the lognormal distribution is frequently
applied to variables in monetary units, such as companies’ assets, liabilities and profits, residential
property prices (Zabel, 1999) and household income (Longford and Pittau, 2006). The population
mean of such a variable may be a much more relevant target for inference than the population mean of
its logarithm. The sample mean is a suitable estimator for large samples, when asymptotics provide
a good approximation. In samples that are not large enough, and especially when the underlying
(normal-scale) variance is large, the sample mean is very inefficient. We explore several alternatives
and study their small-sample properties.
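The inefficiency of the sample mean can be illustrated with a short simulation. The sketch below is our own illustration, not part of the original study: it compares the Monte Carlo MSE of the sample mean with that of the log-scale plug-in estimator exp(µ̂ + ½σ̂²), for the arbitrarily chosen setting n = 50, µ = 0 and σ² = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 50, 2.0, 40_000            # illustrative values
theta = np.exp(sigma2 / 2)                   # target E(X) = exp(mu + sigma^2/2), with mu = 0

# draw all replications at once: each row is a sample of size n on the log scale
logx = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
x = np.exp(logx)

sample_mean = x.mean(axis=1)                 # naive (CLT) estimator
mu_hat = logx.mean(axis=1)
s2_hat = logx.var(axis=1, ddof=1)            # unbiased variance of the logs
plug_in = np.exp(mu_hat + 0.5 * s2_hat)      # log-scale plug-in estimator

mse_mean = np.mean((sample_mean - theta) ** 2)
mse_plug = np.mean((plug_in - theta) ** 2)
print(mse_plug / mse_mean)                   # noticeably below 1 in this setting
```

For larger σ² the advantage of the plug-in estimator grows further, at the cost of a small bias.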
Finney (1941) derived the minimum-variance unbiased estimator of the expectation and variance
of the lognormal distribution, but it involves the evaluation of an infinite series; see Thoni (1969) for
an application. Aitchison and Brown (1957) is a comprehensive reference for the lognormal distribu-
tion; see also Crow and Shimizu (1988). Royston (2001) considers the lognormal distribution as an
alternative basis for survival analysis, claiming robustness and convenience. He fits a linear model on
the log scale, but implies that the prediction obtained on the log scale can be transformed back to the
original scale straightforwardly. The confidence intervals can be, but the prediction as such cannot,
because the transformation is highly nonlinear. Toma (2003) derives estimators for the multivariate
lognormal distribution, but her focus is on large-sample properties. Zhou, Gao and Hui (1997) study
tests for comparing two lognormal samples, but consider only test statistics that resemble the t, which
include the likelihood ratio. We derive a closed-form estimator that is biased but is more efficient
than Finney’s estimator. With our approach, we also study estimation of the median and mode of
the lognormal distribution.
The remainder of this section introduces the notation and reviews the basic results. The next section derives estimators of the quantities exp(µ + aσ²), which include the expectation, median and mode of the lognormal distribution associated with the normal distribution with mean µ and variance σ². The following sections describe simulations of these estimators. The paper concludes with a discussion.
Let X = (X1, . . . , Xn)⊤ be a random sample from a lognormal distribution, generated by exponentiating a random sample from the normal distribution N(µ, σ²). Denote this distribution by LN(µ, σ²). Its respective expectation and variance are E(X) = exp(µ + ½σ²) and var(X) = {E(X)}² {exp(σ²) − 1}. The sample mean of the log-transformed observations, µ̂ = {log(X1) + · · · + log(Xn)}/n, is unbiased and efficient for µ, with sampling variance var(µ̂) = σ²/n. However, these desirable properties are lost by nonlinear transformations; exp(µ̂) is unbiased and efficient for neither exp(µ) nor exp(µ + ½σ²). We have
  E{exp(µ̂)} = exp{µ + σ²/(2n)} ,
  var{exp(µ̂)} = exp(2µ + σ²/n) {exp(σ²/n) − 1} .
We assume that there is an unbiased estimator σ̂² of σ², and that

  k σ̂²/σ² ∼ χ²_k ,

the χ² distribution with k degrees of freedom, defined by the density

  f(x) = {1/Γ(½k)} (½)^{½k} x^{½k − 1} exp(−½x) ,

on (0, +∞). In the introduced setting, k is equal to n − 1, but we consider generalisations in which more than one degree of freedom is lost. In the next section we study estimators of the form θ̂(b) = exp(µ̂ + bσ̂²). For evaluating their bias and MSE, we require expressions for the expectation and variance of exp(bσ̂²).
For a random variable Y with χ²_k distribution, we have

  E{exp(bσ̂²)} = E{exp(bσ²Y/k)}
    = {1/Γ(½k)} (½)^{½k} ∫₀^{+∞} x^{½k − 1} exp{−x(½ − bσ²/k)} dx
    = (½)^{½k} (½ − bσ²/k)^{−½k}
    = {k/(k − 2bσ²)}^{½k} ,   (1)
for b < k/(2σ²). By similar operations we obtain the identity

  var{exp(bσ̂²)} = E{exp(2bσ̂²)} − [E{exp(bσ̂²)}]²
    = {k/(k − 4bσ²)}^{½k} − {k/(k − 2bσ²)}^{k} ,   (2)

which holds as long as b < k/(4σ²); otherwise the variance is not defined.
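Identities (1) and (2) lend themselves to a direct Monte Carlo check. The following sketch is an illustration added here, not part of the original analysis; the values of k, σ² and b are arbitrary, subject to b < k/(4σ²).

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma2, b, reps = 20, 1.5, 0.3, 200_000
assert b < k / (4 * sigma2)                  # needed for a finite variance

# k * s2_hat / sigma2 ~ chi^2_k, so s2_hat = sigma2 * chi2_k / k
s2_hat = sigma2 * rng.chisquare(k, size=reps) / k
y = np.exp(b * s2_hat)

mean_exact = (k / (k - 2 * b * sigma2)) ** (k / 2)                   # identity (1)
var_exact = (k / (k - 4 * b * sigma2)) ** (k / 2) - mean_exact ** 2  # identity (2)

print(y.mean(), mean_exact)   # close agreement
print(y.var(), var_exact)
```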
The expectation, median and mode of LN(µ, σ²) are exp(µ + ½σ²), exp(µ) and exp(µ − σ²), respectively. This motivates the general problem of estimating the quantity θ(a) = exp(µ + aσ²), with a given constant a, by exploring estimators of the form θ̂(b_a) = exp(µ̂ + b_a σ̂²). In the next section, we seek such estimators that are unbiased, attain the minimum mean squared error (MSE), or have a minimax property.
Estimation
We seek first the constant b for which θ̂(b) = exp(µ̂ + bσ̂²) is unbiased for θ(a) = exp(µ + aσ²). As µ̂ and σ̂² are independent,

  E{θ̂(b)} = exp{µ + σ²/(2n)} {k/(k − 2bσ²)}^{½k} .   (3)
Therefore θ̂(b) is unbiased for θ(a) when

  (k − 2bσ²)/k = exp{−(2σ²/k)(a − 1/(2n))} ,

that is, for

  b*_{a,ub} = {k/(2σ²)} [1 − exp{−(2σ²/k)(a − 1/(2n))}] .   (4)
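Substituting (4) back into (3) recovers the target exp(µ + aσ²) exactly. The following check, added here for illustration with arbitrary parameter values, confirms this numerically.

```python
import numpy as np

mu, sigma2, a, n = 0.4, 2.0, 0.5, 50   # illustrative values
k = n - 1

# coefficient (4) that makes theta_hat(b) unbiased for theta(a)
b_ub = k / (2 * sigma2) * (1 - np.exp(-2 * sigma2 / k * (a - 1 / (2 * n))))

# expectation (3) of theta_hat(b) = exp(mu_hat + b * s2_hat) at b = b_ub
expectation = np.exp(mu + sigma2 / (2 * n)) * (k / (k - 2 * b_ub * sigma2)) ** (k / 2)

print(expectation, np.exp(mu + a * sigma2))   # identical up to rounding
```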
As b*_{a,ub} depends on σ², we have to estimate it. Its naive estimator, with σ̂² substituted for σ², is denoted by b̂*_{a,ub}. The dependence on σ² is avoided by the Taylor expansion, which yields the approximation

  b†_{a,ub} = a − 1/(2n) .

It can be interpreted as a multiplicative bias correction of the naive estimator exp(µ̂), which has the expectation exp{µ + σ²/(2n)}. For a = ½ it agrees with the linear term of the expansion of the function g(t) in the minimum-variance unbiased estimator exp(µ̂) g(½σ̂²), derived by Finney (1941);

  g(t) = 1 + Σ_{h=1}^{∞} {(n − 1)^{2h−1} t^h / (h! n^h)} Π_{m=1}^{h−1} 1/(n + 2m − 1) ,

with the convention that the product of no terms (for h = 1) is equal to unity. Unlike Finney's estimator, neither θ̂(b̂*_{a,ub}) nor θ̂(b†_{a,ub}) is unbiased, because the estimators b̂ are not linear in σ̂². Both estimators θ̂ turn out to be very inefficient for all three values of a.
The sampling variance of θ̂(b) is

  var{θ̂(b)} = E{exp(2µ̂)} E{exp(2bσ̂²)} − [E{exp(µ̂)} E{exp(bσ̂²)}]²
    = exp(2µ + 2σ²/n) {k/(k − 4bσ²)}^{½k} − exp(2µ + σ²/n) {k/(k − 2bσ²)}^{k} ,   (5)
and its bias for θ(a) is

  exp(µ) [exp{σ²/(2n)} {k/(k − 2bσ²)}^{½k} − exp(aσ²)] .
Hence the MSE of θ̂(b) in estimating θ(a) is

  m(b; a) = exp(2µ) [exp(2aσ²) − 2 exp{σ²/(2n) + aσ²} {k/(k − 2bσ²)}^{½k} + exp(2σ²/n) {k/(k − 4bσ²)}^{½k}] .   (6)
The minimum of this function of b is found as the root of its derivative

  ∂m/∂b = 2σ² exp(2µ) [exp(2σ²/n) {k/(k − 4bσ²)}^{½k + 1} − exp{aσ² + σ²/(2n)} {k/(k − 2bσ²)}^{½k + 1}] .
The solution is

  b*_{a,ms} = {k/(2σ²)} (D_a − 1)/(2D_a − 1) ,   (7)

where

  D_a = exp[{2σ²/(k + 2)} {a − 3/(2n)}] .
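The closed-form minimiser (7) can be checked against a numerical minimisation of the MSE (6). The sketch below is our own illustration, with arbitrarily chosen parameter values, evaluating (6) on a fine grid of b.

```python
import numpy as np

mu, sigma2, a, n = 0.0, 2.0, 0.5, 50   # illustrative values
k = n - 1

def mse(b):
    """MSE (6) of theta_hat(b) for theta(a); requires b < k/(4*sigma2)."""
    return np.exp(2 * mu) * (
        np.exp(2 * a * sigma2)
        - 2 * np.exp(sigma2 / (2 * n) + a * sigma2)
          * (k / (k - 2 * b * sigma2)) ** (k / 2)
        + np.exp(2 * sigma2 / n) * (k / (k - 4 * b * sigma2)) ** (k / 2)
    )

D_a = np.exp(2 * sigma2 / (k + 2) * (a - 3 / (2 * n)))
b_ms = k / (2 * sigma2) * (D_a - 1) / (2 * D_a - 1)   # closed form (7)

grid = np.linspace(0.0, 0.6, 60_001)
b_grid = grid[np.argmin(mse(grid))]                   # numerical minimiser
print(b_ms, b_grid)                                   # agree to the grid resolution
```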
Finiteness of the MSE (b < ¼k/σ²) implies the condition D_a > ½. The (linear) Taylor expansion of D_a,

  D_a ≈ 1 + {2σ²/(k + 2)} {a − 3/(2n)} ,

yields the approximation to b*_{a,ms}

  b†_{a,ms} = k(2an − 3) / {4kσ²(2an − 3) + 2(k + 2)n} .   (8)
Assuming that a is not very large (the values of particular interest are ½, 0 and −1), and k < n not extremely small, this approximation is good for small values of σ²/(k + 2) and when the denominator of b†_{a,ms} is distant from zero. Singularity occurs for σ² = −½(k + 2)n/{k(2an − 3)}, which raises a concern for a = 0 only for small n and large σ² (when σ² ≈ n/6), and for a = −1 when σ² ≈ 0.25. Beyond these points of singularity (σ² > n/6 for a = 0 and σ² > 0.25 for a = −1), the MSE is infinite (not defined). As σ² is estimated, the problem with evaluating θ̂(b†_{−1,ms}) arises whenever 0.25 is a plausible (unexceptional) value of the estimator σ̂².
If we managed to obtain b*_{a,ms}, we would attain with θ̂(b*_{a,ms}) the so-called ideal MSE

  m(b*_{a,ms}; a) = exp(2µ) [exp(2σ²/n) (2D_a − 1)^{½k} − 2 exp{σ²/(2n) + aσ²} {(2D_a − 1)/D_a}^{½k} + exp(2aσ²)] .   (9)

It is the lower bound for the MSE in the class of estimators θ̂(b) in which b is a constant. We regard m(b*_{a,ms}; a) as a reference against which we compare the MSEs of other (realisable) estimators of θ(a).
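Expression (9) is just (6) evaluated at b = b*_{a,ms}, and the equality can be confirmed numerically. This is our own check, with arbitrary illustrative values.

```python
import numpy as np

mu, sigma2, a, n = 0.0, 2.0, 0.5, 50   # illustrative values
k = n - 1
D = np.exp(2 * sigma2 / (k + 2) * (a - 3 / (2 * n)))
b = k / (2 * sigma2) * (D - 1) / (2 * D - 1)          # minimiser (7)

# general MSE (6) evaluated at b = b*_{a,ms}
m6 = np.exp(2 * mu) * (
    np.exp(2 * a * sigma2)
    - 2 * np.exp(sigma2 / (2 * n) + a * sigma2) * (k / (k - 2 * b * sigma2)) ** (k / 2)
    + np.exp(2 * sigma2 / n) * (k / (k - 4 * b * sigma2)) ** (k / 2)
)

# ideal MSE (9), written in terms of D_a only
m9 = np.exp(2 * mu) * (
    np.exp(2 * sigma2 / n) * (2 * D - 1) ** (k / 2)
    - 2 * np.exp(sigma2 / (2 * n) + a * sigma2) * ((2 * D - 1) / D) ** (k / 2)
    + np.exp(2 * a * sigma2)
)
print(m6, m9)   # equal up to rounding
```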
Minimax estimation
In the range in which it is well defined, the variance var{θ̂(b)} is an increasing function of σ². To see this, we differentiate the expression in (5):
  ∂var{θ̂(b)}/∂σ² = (1/n) exp(2µ) {2 exp(2σ²/n) (k/(k − 4bσ²))^{½k} − exp(σ²/n) (k/(k − 2bσ²))^{k}}
    + 2b exp(2µ) {exp(2σ²/n) (k/(k − 4bσ²))^{½k + 1} − exp(σ²/n) (k/(k − 2bσ²))^{k + 1}} .
The expression in the braces in the first line is positive, because by dropping the leading factor 2 we obtain the variance var{θ̂(b)} when µ = 0; see (2) and (5). The expression in the braces in the second line can also be related to this variance. Since k/(k − 4bσ²) > k/(k − 2bσ²) > 1, it is also positive. Therefore var{θ̂(b)} is an increasing function of σ² throughout the interval 0 < σ² < k/(4b), where it is well defined.
One can expect that the MSE m(b; a) is an increasing function of σ² for any reasonable choice of b. If we cannot find an estimator that is (uniformly) efficient for all σ² > 0, we might pay more attention to efficiency for greater values of σ², for which more is at stake. This motivates the following approach to estimating θ(a). Suppose we are certain (or very confident) that σ² does not exceed a specified value σ²_mx. Then we use the coefficient b_{a,mx} for which the estimator θ̂(b_{a,mx}) is efficient when σ² = σ²_mx,

  b_{a,mx} = {k/(2σ²_mx)} (D_{a,mx} − 1)/(2D_{a,mx} − 1) ,   (10)

as in (7), with implicitly defined D_{a,mx}, and apply the estimator θ̂(b_{a,mx}). Rigour would be enhanced by writing b_{a,mx} = b_{a,mx}(σ²_mx), but that would make the notation cumbersome. We rule out the setting with 0 < a < 3/(2n), so that D_a is an increasing function of σ² and D_a > 1 when a > 3/(2n). When a ≤ 0, D_a decreases with σ², and the condition that D_{a,mx} > ½ imposes an upper bound on the values of σ²_mx that can be declared.
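The minimax coefficient (10) is straightforward to compute. The sketch below is our own illustration, with arbitrary values and µ = 0: it evaluates the MSE (6) of θ̂(b_{a,mx}) at a fixed σ² for several declared bounds σ²_mx, and the values increase with the bound.

```python
import numpy as np

mu, sigma2, a, n = 0.0, 1.5, 0.5, 50   # illustrative values; sigma2 < all bounds below
k = n - 1

def mse(b):
    """MSE (6) of theta_hat(b) for theta(a) at the true sigma2."""
    return np.exp(2 * mu) * (
        np.exp(2 * a * sigma2)
        - 2 * np.exp(sigma2 / (2 * n) + a * sigma2) * (k / (k - 2 * b * sigma2)) ** (k / 2)
        + np.exp(2 * sigma2 / n) * (k / (k - 4 * b * sigma2)) ** (k / 2)
    )

def b_mx(s2mx):
    """Minimax coefficient (10): formula (7) evaluated at sigma2 = s2mx."""
    D = np.exp(2 * s2mx / (k + 2) * (a - 3 / (2 * n)))
    return k / (2 * s2mx) * (D - 1) / (2 * D - 1)

mses = [mse(b_mx(s)) for s in (2.0, 4.0, 8.0)]
print(mses)   # increasing: a sharper bound gives a more efficient estimator
```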
One might reasonably expect that the estimator θ̂(b_{a,mx}) based on a value σ²_mx = σ²₁ is more efficient for all σ² ∈ (0, σ²₁] than θ̂(b_{a,mx}) based on a value σ²₂ > σ²₁. That is, for a sharper (smaller) upper bound σ²_mx we should be rewarded by uniformly more efficient estimation, so long as this bound is justified. This conjecture is proved by differentiating the MSE of θ̂(b_{a,mx}) with respect to σ²_mx.
The MSE of the estimator θ̂(b_{a,mx}) based on a given value of σ²_mx is obtained directly from (6):

  m(b_{a,mx}; a) = exp(2µ) [exp(2σ²/n) {1 − (2(D_{a,mx} − 1)/(2D_{a,mx} − 1)) σ²/σ²_mx}^{−½k}
    − 2 exp{σ²/(2n) + aσ²} {1 − ((D_{a,mx} − 1)/(2D_{a,mx} − 1)) σ²/σ²_mx}^{−½k} + exp(2aσ²)] .   (11)
Let C_a = log(D_{a,mx})/σ²_mx, so that ∂D_{a,mx}/∂σ²_mx = C_a D_{a,mx}. Then

  ∂m(b_{a,mx}; a)/∂σ²_mx = exp(2µ + 2σ²/n) (kσ²/σ⁴_mx) {(2D_{a,mx} − 1)(D_{a,mx} − 1) − C_a D_{a,mx} σ²_mx}/(2D_{a,mx} − 1)²
    × [exp{σ²(a − 3/(2n))} {(2D_{a,mx} − 1)σ²_mx / ((2D_{a,mx} − 1)σ²_mx − (D_{a,mx} − 1)σ²)}^{½k + 1}
    − {(2D_{a,mx} − 1)σ²_mx / ((2D_{a,mx} − 1)σ²_mx − 2(D_{a,mx} − 1)σ²)}^{½k + 1}] .   (12)
The long fraction in the first row is positive because the inequality D_{a,mx} − 1 > C_a σ²_mx > 0, obtained from the Taylor expansion of D_{a,mx} around σ²_mx = 0, implies that

  (2D_{a,mx} − 1)(D_{a,mx} − 1) − C_a D_{a,mx} σ²_mx > (D_{a,mx} − 1)² .   (13)

The exponential in the second row is equal to D_a^{½k + 1}, and so the sign of the expression in (12) is the same as the sign of

  D_a {(2D_{a,mx} − 1)σ²_mx − 2(D_{a,mx} − 1)σ²} − {(2D_{a,mx} − 1)σ²_mx − (D_{a,mx} − 1)σ²}
    = (D_a − 1)(2D_{a,mx} − 1)σ²_mx − (2D_a − 1)(D_{a,mx} − 1)σ²
    = {F(σ²) − F(σ²_mx)} (2D_a − 1)(2D_{a,mx} − 1) σ² σ²_mx ,
where F(σ²) = σ^{−2} (D_a − 1)/(2D_a − 1). The function F is decreasing, since

  ∂F/∂σ² = {D_a C_a/(2D_a − 1)²} (1/σ²) − {(D_a − 1)/(2D_a − 1)} (1/σ⁴)
    = −{(D_a − 1)(2D_a − 1) − C_a D_a σ²} / {(2D_a − 1)² σ⁴} < 0 ,

using the same argument as in (13). Therefore F(σ²) > F(σ²_mx), and (12) is positive, whenever σ² < σ²_mx. This concludes the proof that a smaller upper bound σ²_mx results in a uniformly more efficient estimator θ̂(b_{a,mx}), so long as the bound is justified, that is, σ² < σ²_mx.
The simulations described in the following sections are conducted for σ² ∈ (0, 10). For orientation, a typical draw from LN(µ, 10) is about exp(√10) ≈ 24 times greater or smaller than the median exp(µ). The expectations and biases of all estimators have the multiplicative factor exp(µ), and the variances and MSEs the factor exp(2µ). Therefore we can reduce our attention to the targets θ(a) with µ = 0, so that σ² is the sole parameter of interest. Nevertheless, µ is estimated throughout. We study the relative biases and relative root-MSEs, defined as

  B_rel(θ̂) = E(θ̂)/θ(a) − 1 ,
  rMSE_rel(θ̂) = √[MSE(θ̂) / m(b*_{a,ms}; a)] .

They reduce the strong association of the biases and MSEs with σ².
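The simulation design can be sketched as follows. This is our own condensed version, not the original code: one (n, σ²) setting, two of the estimators (the ML estimator with b = a, and the estimator with the estimated minimum-MSE coefficient), and the ideal MSE (9) as the reference.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, a, reps = 50, 2.0, 0.5, 20_000   # illustrative setting
k = n - 1
theta = np.exp(a * sigma2)                  # target, with mu = 0

def ideal_mse(s2):
    """Ideal MSE (9) with mu = 0, as a function of sigma2."""
    D = np.exp(2 * s2 / (k + 2) * (a - 3 / (2 * n)))
    return (np.exp(2 * s2 / n) * (2 * D - 1) ** (k / 2)
            - 2 * np.exp(s2 / (2 * n) + a * s2) * ((2 * D - 1) / D) ** (k / 2)
            + np.exp(2 * a * s2))

logx = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
mu_hat = logx.mean(axis=1)
s2_hat = logx.var(axis=1, ddof=1)

# ML (Naive2): b = a;  MinMSEn: b*_{a,ms} with sigma2 replaced by its estimate
D_hat = np.exp(2 * s2_hat / (k + 2) * (a - 3 / (2 * n)))
b_hat = k / (2 * s2_hat) * (D_hat - 1) / (2 * D_hat - 1)
est = {"ML": np.exp(mu_hat + a * s2_hat),
       "MinMSEn": np.exp(mu_hat + b_hat * s2_hat)}

ref = ideal_mse(sigma2)
for name, t in est.items():
    b_rel = t.mean() / theta - 1
    rmse_rel = np.sqrt(np.mean((t - theta) ** 2) / ref)
    print(name, round(b_rel, 3), round(rmse_rel, 3))
```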
Estimating the mean
Table 1 lists the estimators of the expectation θ(½) = exp(µ + ½σ²). The relative biases and MSEs of these estimators with sample size n = 50 are plotted as functions of σ² in Figure 1. They are

Table 1: Estimators of the expectation of the lognormal distribution θ(½).

  Criterion     Name       b_a (or b̂_a)              Note (a = ½)
  Naive (CLT)   X̄                                    Very inefficient for moderate σ²
  Naive         Naive1     0                         Very inefficient for small and moderate σ²
  Naive         Naive2     a                         Maximum likelihood; inefficient
  No bias       UnbiasedN  b̂*_{a,ub}; see (4)        Small bias only for small σ²
  No bias       UnbiasedA  b†_{a,ub} = a − 1/(2n)    Biased¹
  Minimum MSE   MinMSEn    b̂*_{a,ms}; see (7)        Efficient
  Minimum MSE   MinMSEa    b†_{a,ms}; see (8)        Very inefficient for moderate σ²¹
  Finney        Finney                               Unbiased, but inefficient
  Minimax       MinMax     b_{a,mx}; see (10)        Used with σ²_mx = 2, 4 and 8
  Minimum MSE   Ideal      b*_{a,ms}; see (7)        Used with the value of σ². The reference.

  Note: ¹ Not represented in Figures 1 and 2.
based on simulations with 10 000 replications, and some of the plotted curves are mildly smoothed. The estimators θ̂(b†_{0.5,ub}) and θ̂(b†_{0.5,ms}) are omitted because for all the sample sizes we consider, n = 10, 25, 50, 100 and 250 (and k = n − 1), they are uniformly less efficient than θ̂(b̂*_{0.5,ub}) and θ̂(b̂*_{0.5,ms}), respectively. The diagram shows that the estimator θ̂(b̂*_{a,ms}), intended to be efficient, delivers on the promise, even though it is biased throughout. It is uniformly more efficient than even the ideal estimator θ̂(b*_{a,ms}). This is counterintuitive: not knowing the value of σ² is 'rewarded' by reduced variance of θ̂. This is not a contradiction, however, because m(b*_{a,ms}; a) is the lower bound for the MSEs of estimators only in the class θ̂(b), with a constant b; θ̂(b̂*_{a,ms}) does not belong to this class. Finney's estimator is unbiased, but it is less efficient than the ideal, especially for large σ². The estimators intended to be approximately unbiased, θ̂(b̂*_{a,ub}) and θ̂(b†_{a,ub}), have substantial negative biases for all σ², and θ̂(b̂*_{a,ub}) approaches efficiency only for very large values of σ². The pursuit of unbiasedness results in a substantial MSE inflation.

Despite using a smooth (analytical) reference root-MSE, some of the empirical relative root-MSEs are far from smooth even with 10 000 replications. The estimated root-MSEs have relatively large sampling variances, especially for large σ².

The biases of the minimax estimators, drawn by thick solid lines in the intervals of plausible values of σ², are large and positive for small σ² and steeply decrease with σ². At the upper bounds σ²_mx, the biases are negative. With σ²_mx = 8, the minimax estimator is nearly as efficient as the ideal estimator even for σ² = 10, but for small values of σ² it is perceptibly inefficient. In contrast, when σ²_mx is set to 2 or 4, the loss of efficiency with respect to the ideal is only slight throughout (0, σ²_mx), but the loss
Figure 1: The relative biases and relative root-MSEs of estimators of the expectation of the lognormal distribution, as functions of the variance σ². Sample size n = 50, with k = 49 degrees of freedom for estimating σ². (Estimators plotted: sample mean, Naive1, Naive2 (ML), UnbiasedN, MinMSEn, Finney, Minimax.)
is substantial for large values of σ². The minimax estimators have relatively small variances, and the biases are substantial contributors to the MSE for most values of σ². With increasing σ², the relative biases converge to −1, the lower bound for any nonnegative estimator.

Figure 2 summarises the biases and MSEs of the discussed estimators for a selection of other sample sizes, using the same layout as Figure 1. Estimator θ̂(b̂*_{a,ms}) is slightly more efficient than the ideal, but uniformly so, for all sample sizes. Finney's estimator is uniformly less efficient than the ideal, but its inefficiency decreases with sample size. The inefficiency of the ML estimator also decreases with sample size, but at n = 250 it is still not competitive, except when σ² is very small. With increasing sample size, the root-MSE inflation when σ²_mx is understated becomes less severe. Efficiency is associated with substantial negative bias even for n = 250. We conclude that θ̂(b̂*_{0.5,ms}) is uniformly most efficient for θ(½) among the estimators we explored for sample sizes in the range 10–250.
Estimating the median
The median corresponds to the setting a = 0. The two naive estimators coincide for a = 0, so we use the label Naive for the common estimator. We also exclude the estimator θ̂(b†_{a,ms}) because for larger values of σ² (e.g., for σ² > 5 when n = 50) it attains very large values with nontrivial probabilities.
Figure 2: The relative biases and relative root-MSEs of estimators of the expectation of the lognormal distribution, as functions of the variance σ², for sample sizes n = 10, 25, 100 and 250. See the legend in Figure 1 to identify the estimators.
Figure 3: The relative biases and relative root-MSEs of estimators of the median of the lognormal distribution, as functions of the variance σ². Sample size n = 50. (Estimators plotted: sample median, Naive, UnbiasedN, UnbiasedA, MinMSEn, Minimax.)
The results for sample size n = 50 are presented in Figure 3. The biases of the estimators are much more moderate than for estimating the expectation θ(½). The sample median is uniformly less efficient than any other estimator. The minimax estimator and the estimator θ̂(b̂*_{0,ms}) are about as efficient as the ideal estimator θ̂(b*_{0,ms}), and they are uniformly more efficient than the sample and naive estimators, as well as the estimators intended to be unbiased.
Some properties of θ̂(b̂*_{0,ms}) and θ̂(b_{0,mx}) can be inferred from the expression

  D₀ = exp{−3σ²/((k + 2)n)} .

The dependence of D₀ on σ², and hence of θ̂(b₀) on σ̂², is very weak for all but very small k and n. Of course, the sample median does not depend on σ̂² at all, but the weak influence of σ̂² on θ̂(b₀) is sufficient to make it an efficient estimator. For small bσ² (or bσ̂²), exp(bσ̂²) ≈ 1 + bσ̂² and var{exp(bσ̂²)} ≈ 2b²σ⁴/k. That is why the root-MSEs of the estimators θ̂(b) are approximately proportional to σ².
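The small-b approximation var{exp(bσ̂²)} ≈ 2b²σ⁴/k can be checked against the exact identity (2). This is our own numerical check, with arbitrary small values of b and σ².

```python
import numpy as np

k, sigma2, b = 49, 1.0, 0.01   # illustrative values with small b*sigma2

exact = ((k / (k - 4 * b * sigma2)) ** (k / 2)
         - (k / (k - 2 * b * sigma2)) ** k)      # identity (2)
approx = 2 * b ** 2 * sigma2 ** 2 / k            # linearisation for small b*sigma2

print(exact, approx)   # close agreement
```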
Figure 4 summarises the results for the other sample sizes. The sample median is uniformly the least efficient of the estimators we study, except for n = 10 and σ² > 6, because estimator θ̂(b̂*_{0,ms}) has a breakdown at around σ² = 6, indicated by the vertical dashes. With increasing sample size, the relative biases of all the estimators converge to zero. The minimax estimators are very forgiving; their inefficiency when σ² > σ²_mx is appreciable only for n = 10. We conclude by recommending the estimator θ̂(b̂*_{0,ms}) when n > 20, although it is indistinguishable from θ̂(b_{0,mx}) based on a liberally chosen value of σ²_mx. For smaller sample sizes, θ̂(b_{0,mx}) does not have the drawback of a sudden MSE inflation.

Figure 4: The relative biases and relative root-MSEs of estimators of the median of the lognormal distribution, as functions of the variance σ², for sample sizes n = 10, 25, 100 and 250.

Figure 5: The relative biases and relative root-MSEs of estimators of the mode of the lognormal distribution, as functions of the variance σ². Sample size n = 50. (Estimators plotted: UnbiasedN, MinMSEn, Minimax.)
Estimating the mode
The mode of a continuous distribution does not have a natural sample or naive estimator. We could select a 'window' width w and define the estimator of the mode as the centre of the interval of width w (or the mean of such centres) that contains the largest number of observations. However, such an estimator is bound to be very inefficient.
The relative biases and root-MSEs of the estimators intended to be unbiased and efficient for θ(−1) with sample size n = 50 are plotted in Figure 5. Both estimators are severely biased, except when σ² is very small. The biases and MSEs of θ̂(b†_{−1,ms}) and θ̂(b†_{−1,ub}) are extremely large; they are off the scale in both panels of the diagram for most values of σ², and are therefore not plotted at all. Estimator θ̂(b̂*_{−1,ms}) is only slightly inefficient for σ² < 5, but breaks down at about σ² = 7.9; the graphs of its relative bias and root-MSE are discontinued in the diagram at that value. With a well-informed setting of σ²_mx, the minimax estimator is quite efficient. For example, θ̂(b_{−1,mx}) with σ²_mx = 8 is only slightly less efficient than the ideal even for σ² = 10, and is more efficient than θ̂(b̂*_{−1,ms}) for σ² > 4.1. It is relatively very inefficient only when σ² is very small. The estimators θ̂(b_{−1,mx}) with σ²_mx = 2 and 4 are quite efficient for very small values of σ², but for large values of σ² they are very inefficient.
Some insight into the breakdown of the estimators θ̂(b̂*_{−1,ms}) and θ̂(b†_{−1,ms}) can be gained directly from the expressions for the respective coefficients b. The singularity of the coefficient b*_{−1}, caused by the value D_{−1} = ½, corresponds to σ² = log(2)(k + 2)n/(2n + 3), that is, for n = 50 and k = 49, to σ²† = 17.16. In b̂*_{−1,ms}, we substitute σ̂² for σ² in D_{−1}, so very large or very small values of b̂*_{−1} are obtained whenever values around 17 are plausible for σ̂². For n = 50 this occurs for σ² > 7.9. With the approximation to D_{−1} by the Taylor expansion, the denominator of b†_{−1,ms} vanishes when σ² = (k + 2)n/{2k(2n + 3)}, which is close to 0.25 for all but very small k and n. Hence the breakdown of θ̂(b†_{a,ms}) for σ̂² around 0.25; it occurs also for other sample sizes.
The relative biases and root-MSEs are plotted in Figure 6 for sample sizes n = 10, 25, 100 and 250. Some of the curves are discontinued at breakdowns, where they suddenly diverge. For n = 10, D_{−1} < ½ for σ² > 3.3, and the variance of the reference estimator θ̂(b*_{−1,ms}) is not defined. Breakdown occurs even for the minimax estimators with σ²_mx = 4 and 8. For sample sizes n = 100 and 250, only the estimator θ̂(b̂*_{−1,ub}) performs well throughout the range of values of σ². The minimax estimators are useful with small samples only when we can narrow down the range of plausible values of σ² and σ² is not very large. The penalty for understating σ²_mx is quite harsh, except when n ≥ 250.
We conclude by suggesting that estimation of the mode be based on at least n = 50 observations. Estimator θ̂(b̂*_{−1,ms}) is nearly efficient for small values of σ², and is a relatively safe choice otherwise. It breaks down for large values of σ², but the breakdown point increases with the sample size. If the value of σ² can be narrowed down, say, to an interval of length 2.0 or shorter, then the minimax estimator is quite efficient.
MSE estimation
In this section we consider estimation of the MSE of θ̂(b̂_a), with a focus on a = ½ and b̂_{0.5} = b̂*_{0.5,ms}, for which the estimator is more efficient than the ideal. We estimate MSE{θ̂(b̂*_{0.5,ms}); θ(½)} by the MSE of the ideal evaluated at σ² = σ̂², that is,

  m̂(b̂*_{0.5,ms}; ½) = MSE{θ̂(b*_{0.5,ms}); θ(½)} evaluated at σ² = σ̂² .
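The plug-in MSE estimator amounts to evaluating the ideal MSE (9) at σ̂². The sketch below is our own implementation of this substitution, with µ = 0 and an arbitrarily chosen sample.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, a = 50, 2.0, 0.5   # illustrative values
k = n - 1

def ideal_mse(s2):
    """Ideal MSE (9) with mu = 0, as a function of sigma2."""
    D = np.exp(2 * s2 / (k + 2) * (a - 3 / (2 * n)))
    return (np.exp(2 * s2 / n) * (2 * D - 1) ** (k / 2)
            - 2 * np.exp(s2 / (2 * n) + a * s2) * ((2 * D - 1) / D) ** (k / 2)
            + np.exp(2 * a * s2))

# one observed sample: estimate sigma2 from the logs, then plug in
logx = rng.normal(0.0, np.sqrt(sigma2), size=n)
s2_hat = logx.var(ddof=1)
mse_hat = ideal_mse(s2_hat)      # estimate of MSE{theta_hat(b*); theta(1/2)}
rmse_hat = np.sqrt(mse_hat)
print(mse_hat, rmse_hat)
```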
We have to consider estimation of the MSE and the root-MSE separately, because these targets are related nonlinearly. For example, if an estimator m̂ is unbiased for m, then √m̂ need not be unbiased for √m. The relative biases, defined as m̄/m − 1 and √m̄/√m − 1, where m̄ denotes the average of m̂ over the replications, are plotted for n = 50, 100 and 250 in Figure 7. Both m̂ and √m̂ overestimate their respective targets, the MSE and root-MSE, except for very small values of σ². On the multiplicative scale, the extent of overestimation is smaller for the root-MSE than for the MSE, and is smaller for larger sample sizes. For n = 10 and n = 25, the estimators are useful only for very small σ²; for n = 250 the bias of the root-MSE estimator is very small even for σ² = 10.
Figure 6: The relative biases and relative root-MSEs of estimators of the mode of the lognormal distribution, as functions of the variance σ², for sample sizes n = 10, 25, 100 and 250.
Figure 7: The biases of the estimators of the MSE and root-MSE of the estimator θ̂(b̂*_{0.5,ms}) of the expectation of the lognormal distribution, for n = 50, 100 and 250.
The substantial bias of these estimators for small sample sizes should be judged in the context
of large variance of the data as well as of the distribution of the MSE and rMSE estimators. For
example, the relative biases of the MSE and root-MSE for σ2 = 5 and n = 10 are 3.45 and 0.32,
respectively, but these figures are associated with standard deviations (over the replications) of 35.6
and 1.11, respectively. We smoothed the empirical values of the biases only slightly, to indicate the
uncertainty about them that is present in 10 000 replications. The same set of random numbers was
used for each sample size n.
Conclusion
Estimating the expectation, median and mode of the lognormal distribution provides examples of the failure of maximum likelihood, and of the inapplicability of asymptotic theory, at sample sizes that would be sufficiently large for many other commonly encountered distributions. We derived estimators
that are much more efficient than their naive alternatives, and explored how information about σ2
can be incorporated in (minimax) estimation. Although biased, our estimator of the expectation is
much more efficient than Finney’s estimator, especially for large variances σ2. Finney’s estimator is
minimum-variance unbiased, a clearly formulated optimality property, whereas our estimator has no
(universal) optimality properties. Our results indicate that unbiasedness and efficiency (small MSE)
are conflicting inferential goals when estimating the location of a lognormal distribution. Insisting
on unbiasedness when pursuing efficiency is an unaffordable luxury. Instead of the MSE, the criterion var(θ̂) + ρ{E(θ̂) − θ}² could be minimised for a specified constant ρ, but a rationale for any particular value of ρ > 1 is difficult to formulate.
The counterintuitive result that θ̂(b̂*_{0.5,ms}) is (slightly) more efficient than θ̂(b*_{0.5,ms}) for estimating the expectation exp(µ + ½σ²) can be exploited for estimating the MSE of θ̂(b̂*_{0.5,ms}), by substituting σ̂² for σ² in the expression (9) for MSE{θ̂(b*_{0.5,ms}); θ(½)}. The resulting MSE (and root-MSE) estimator has a positive bias which, for a given σ², declines with sample size and, for a given sample size, increases with σ².
Our estimators rely on the functional form of the target, exp(µ+aσ2), and so their robustness might
be questioned. Robustness can be assessed by simulations, for instance, using the exponential of a
distribution that differs slightly from the normal. The t-distributions with few degrees of freedom (and
some noncentrality) are not suitable for this purpose because their exponentials (log-t distributions)
do not have expectations for any number of degrees of freedom. In any sensitivity study, the naive
estimators start with a considerable handicap which is unlikely to be overcome for moderate departures
from lognormality.
Acknowledgements
Research for this manuscript was supported by the Grant SEJ2006–13537 from the Spanish Ministry
of Science and Technology. Suggestions made by Omiros Papaspiliopoulos are acknowledged.
References
Aitchison, J., and Brown, J. A. C. (1957). The Lognormal Distribution. Cambridge University Press, Cambridge, UK.
Crow, E. L., and Shimizu, K. (Eds.) (1988). Lognormal Distributions: Theory and Applications. M. Dekker, New York.
Finney, D. J. (1941). On the distribution of a variate whose logarithm is normally distributed. Journal
of the Royal Statistical Society, Supplement 7, 155–161.
Limpert, E., Stahel, W. A., and Abbt, M. (2001). Log-normal distributions across the sciences: keys and clues. BioScience 51, 341–352.
Longford, N. T., and Pittau, M. G. (2006). Stability of household income in European countries in
the 1990’s. Computational Statistics and Data Analysis 51, 1364–1383.
Royston, P. (2001). The lognormal distribution as a model for survival time in cancer, with an
emphasis on prognostic factors. Statistica Neerlandica 55, 89–104.
Thoni, H. (1969). A table for estimating the mean of a lognormal distribution. Journal of the
American Statistical Association 64, 632–636.
Toma, A. (2003). Robust estimators of the parameters of multivariate lognormal distribution. Com-
munications in Statistics. Theory and Methods 32, 1405–1417.
Zabel, J. (1999). Controlling for quality in house price indices. Journal of Real Estate Finance and
Economics 13, 223–241.
Zhou, X.-H., Gao, S., and Hui, S. L. (1997). Methods for comparing the means of two independent
log-normal samples. Biometrics 53, 1129–1135.