Kernel Density Estimation and Metropolis-Hastings Sampling in Process Capability Analysis of Unknown Distributions

Proceedings of the ASME 2012 International Manufacturing Science and Engineering Conference (MSEC2012), June 4-8, 2012, Notre Dame, Indiana, USA. Paper MSEC2012-7299. Copyright © 2012 by ASME.



Wenzhen Huang (1), Ankit Pahwa
Mechanical Engineering Department, University of Massachusetts Dartmouth, North Dartmouth, MA, USA
(1) Contact author

Zhenyu Kong
School of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK, USA

ABSTRACT

A strong normality assumption is associated with widely used process capability indices such as Cp, Cpk, and violation of the assumption misleads their interpretation in applications. A nonparametric method is proposed for density estimation of any unknown distribution. Kernels are used for the density estimation, and the Metropolis-Hastings (M-H) algorithm is adopted to generate samples from the estimated density. M-H sampling provides a tool that accommodates different kernel functions and offers flexibility for future extension to multivariate cases. Conformity (yield) based indices (Yp, Ypk) are adopted to replace Cp, Cpk. These indices can be conveniently assessed by the proposed kernel density based M-H algorithm (K-M-H). The method is validated by several simulation case studies.

INTRODUCTION

Process capability indices such as Cp, Cpk are widely used in industry for manufacturing process evaluation and quality control. Cp, Cpk essentially use the specification limits together with the estimated process mean and standard deviation to indicate the potential and actual capability of a process to conform to specifications. The initial introduction and interpretation of these indices were strongly tied to the normality assumption. More and more modified indices have been introduced to accommodate non-normality, off-center processes, better performance, and different interpretation requirements (e.g., quality loss). The bewildering diversity of index versions has made their interpretation and comprehension per se an important issue and has attracted the attention of the academic and industrial communities. Early comprehensive interpretations of these indices were given by Kane (1986), Sullivan et al. (1984), and Kotz et al. (1993). More recent reviews and discussions include Tsui (1997) and Palmer and Tsui (1999). Kotz and Johnson (2002) gave an exhaustive summary and review of process capability index definition, interpretation, and new development.

Since non-normal populations are common in manufacturing quality control and geometric tolerancing (GD&T), as stated by Bisgaard et al. (1997), Kotz et al. (2002), and Kong et al. (2009), improved methods and indices for non-normal cases have attracted growing interest. To target non-normality and to align with the 6σ dispersion of the normal case, Clements (1989) introduced a percentile-range alternative, i.e., replacing 6σ in the index definition with the range between the upper and lower 0.135 percentage points of the non-normal quality characteristic. This led to similar capability indices. Along the same line, a variety of parametric models were developed for percentile range assessment, such as the Pearson system (Clements, 1989), the Johnson system, Weibull, lognormal, generalized lambda (Pal, 2005), t, gamma, etc. (Kotz et al., 2002). These efforts broadened the generality and applicability of the initial indices, but they also complicate assessment, comparison, and interpretation. For instance, given the same value of an index Cp (or Cpk), it may not be straightforward to tell whether two processes have the same capability in terms of nonconformity. These parametric techniques assume a functional form for the density of the quality characteristic, so more involved model (distribution family) assumptions, parameter estimation, and model adequacy checking are required.

With the bewildering diversity of indices, confusion, incoherence, and misleading conclusions are inevitable in application. It is well known that Cp, Cpk generally do not provide the proportion of nonconforming products; only for the normal distribution can this proportion be easily derived from Cp, Cpk.





Several authors prefer going back to the basic concept of what process capability really means, i.e., replacing Cp, Cpk with more transparent and coherent indices such as the nonconformity or average fallout rate (the probability of fallout, p). Carr (1991) was probably the first to propose the estimated fallout rate p as an index. Yeh et al. (1998, 2001) proposed the ratio of the expected (desired) fallout rate to the observed one as an index for both univariate and multivariate processes. Tsui (1997) also preferred the yield or conforming rate (1 - p) for capability evaluation, and proposed alternatives, i.e., quality yield indices, to accommodate conformity and quality loss evaluation. Flaig (1999, 2002) likewise suggested the use of the conforming rate (1 - p) as an index.

Parametric models were suggested for nonconformity assessment by almost all the authors who preferred nonconformity indices. If the population is believed to follow a certain family of distributions, the related parameters are estimated from the sample data, and the nonconforming probability p is then calculated with the estimated distribution model. However, no available distribution family can accommodate all possible behaviors of actual data densities, so density estimation becomes a prerequisite. A nonparametric approach has also been proposed to estimate the density for process capability analysis, with a Gaussian kernel adopted for the density estimation (Polansky, 1998, 2000). Nonparametric kernel density estimation (KDE) is purely data driven, requiring no model training process and no prior model knowledge or assumptions about the population density, i.e., "letting the data speak for themselves". The extension to multivariate KDE is straightforward. Thus it can be a promising approach to accommodate any distribution for capability analysis. Two factors are crucial for KDE: the bandwidth and the kernel function. The Gaussian kernel is one of the most commonly used kernels and can simplify conformity assessment (Polansky, 1998). Exploring other kernel alternatives may increase flexibility and improve the performance of density estimation for truncated distributions. With KDE the nonconforming probability p can be assessed. One way to do this, instead of randomly sampling the actual process, is to generate random samples from the estimated density, the idea behind the bootstrap; p can then be estimated by the ratio of fallouts to the total sample size.

In this paper, we adopt yield based process capability indices. Several kernel functions are applied to increase model flexibility, and a Markov chain Monte Carlo (MCMC) sampling algorithm (Metropolis-Hastings sampling) is introduced to draw random samples for assessing the nonconforming probability and yield. The integration of KDE and MCMC may provide a novel tool for nonconformity based process capability evaluation.

The paper is organized as follows. The yield based indices are introduced and the proposed assessment strategy is presented in the next section. Section 3 deals with the kernel density estimation method, including kernel functions and bandwidth selection. Section 4 presents Metropolis-Hastings sampling and a step-by-step implementation procedure. A simulation case study is presented in Section 5, covering four different types of distributions, with comparisons among the benchmark, the proposed method, and the commonly used approach. Section 6 gives a summary and conclusions.

CONFORMITY INDICES

The widely used process capability indices Cp, Cpk usually require the normality assumption. These indices indicate the ratio of the specification range to the process dispersion, where 6σ represents the process dispersion (equivalent to a 99.73% conformity range). Alternatives have been proposed for when normality is violated, using the range between distribution quantiles (0.00135 and 0.99865) as the measure of process dispersion. These indices can be interpreted through the corresponding conformity, indicated by defective parts per million (ppm). However, the conformity interpretation of Cp, Cpk will be significantly different, and misleading, when the normality assumption is even moderately or slightly violated.

In this paper we assume the process under study is in control, which can be assessed by control chart techniques with subgroup samples. We further assume the process quality characteristic (individual sample) can be characterized by some known or unknown statistical model. We propose directly using yield, or conformity, to characterize process capability. This gives a coherent and unified index for any distribution and prevents the confusion and misunderstanding caused by vagueness in the existing indices (Cp, Cpk). The yield is defined as the probability of the quality characteristic(s) falling within the specification range or region (for multivariate processes). Thus,

Yield = Prob[LSL ≤ x ≤ USL] (for a univariate process)   (1)

Yield = Prob[X ∈ ΩS] (for a multivariate process)   (2)

where x and X are the scalar quality characteristic and the vector of quality characteristics, respectively, and ΩS denotes a specification region. ΩS usually represents a hypercube but can be a complex irregular region. Examples of such irregular regions can be found in many applications, such as semiconductor and automotive manufacturing (Huang et al., 2008). An important example is evaluating the capability of a multivariate process manufacturing geometrical features with interrelated GD&T tolerances, in which position/orientation and form tolerances are embedded and therefore not independent, creating an irregular specification space (Huang et al., 2010). For instance, a composite tolerance on two dimensional variables creates an irregular ΩS. USL and LSL are the upper and lower specification limits. In a univariate case, the conformity is expressed as

$$Y_{pk} = \int_{LSL}^{USL} f(x)\,dx, \qquad Y_p = \max_{\mu} Y_{pk} = \max_{\mu} \int_{LSL}^{USL} f(x)\,dx \qquad (3)$$



where Ypk and Yp can be interpreted as the actual and potential conformity (yield), respectively; f(x) is the density function of the quality characteristic x, and μ is the mean of x. In general Ypk ≤ Yp, with Ypk = Yp only if the process is correctly positioned (for a symmetric distribution this means well centered). The ppm can thus be conveniently expressed by

$$ppm_{actual} = (1 - Y_{pk}) \times 10^{6}, \qquad ppm_{potential} = (1 - Y_p) \times 10^{6}$$

In multivariate cases, a general expression of the conformity is

$$Y_{pk} = \int_{\Omega_S} f(\mathbf{X})\,d\mathbf{X}, \qquad Y_p = \max_{\boldsymbol{\mu}} Y_{pk} = \max_{\boldsymbol{\mu}} \int_{\Omega_S} f(\mathbf{X})\,d\mathbf{X} \qquad (4)$$

where μ denotes the mean vector of X. Ypk and Yp clearly define the capability of a process to produce quality products. The interpretation is coherent and independent of model assumptions. However, the applicability of Eqs. (1) and (2) relies on how easily and accurately Ypk and Yp can be obtained from process data. To this end, two issues must be resolved: i) estimation of the density f(x); ii) calculation of Ypk and Yp. We propose kernel density estimation and Metropolis-Hastings sampling in the next two sections to attack these issues. The strategy is summarized in Fig. 1 below.

FIGURE 1 - PROCEDURE OF PROCESS CAPABILITY EVALUATION USING KERNEL DENSITY ESTIMATION AND SAMPLING

[Flowchart: data collection xi, i = 1, 2, ..., n -> kernel density estimate f̂(x) -> M-H sampling x(t) ~ f̂(x), t = N+1, ... -> Ypk, Yp estimation (Eqs. (1), (3) and Sect. 4 Eqs. (18), (19)) -> "Is the process in-control?" check, with process correction looping back if not.]

KERNEL DENSITY ESTIMATION

In process capability analysis the accuracy of the Cp, Cpk results is essential for interpretation and decision making. The model (normality) assumption is one of the most critical factors, detrimentally affecting the quality of the results if the distribution is incorrectly assumed. When normality is moderately or even slightly violated but still accepted, the conformity of a process represented by Cp, Cpk can be very misleading, as shown in Polansky (1998) and in Section 5. Such non-normal cases are common in practice. For example, as presented by Bisgaard et al. (1997), when a tolerance is subject to the maximum material condition (MMC) the process is deliberately biased in machining: the distribution of a hole radius tends to be skewed toward its lower tolerance boundary, and that of a shaft radius toward its upper tolerance boundary, because one cannot add material to a part in a machining process. Thus the incurable mistakes of an out-of-spec oversized hole or undersized shaft are avoided.

The parametric method is appealing because of its simplicity. However, it lacks the flexibility to accommodate these non-normal distributions. Nonparametric density estimation is a well established method. Its most appealing advantage is the flexibility to accommodate any unknown distribution, i.e., "letting the data speak for themselves", with much less dependence on modeling experience and assumptions.

We assume the process is in a stable state but that the quality characteristic x is not necessarily normally distributed. The distribution of x is denoted by f(x). If we collect n samples from the process as a training data set, denoted xi, i = 1, ..., n, the density function can be expressed as the smooth Parzen estimate:

$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) = \alpha(n,h)\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (5)$$

where K(z) is a kernel. Any smooth function satisfying

$$K(z) \ge 0, \qquad \int K(z)\,dz = 1, \qquad \int zK(z)\,dz = 0, \qquad \int z^{2}K(z)\,dz \ne 0 \text{ and } < \infty$$

can be used as a kernel. h represents the window width of the kernel function, controlling the smoothness of f̂(x). The kernel K is a local weighting function: it gives more weight to the training points xi that are close to the evaluation point x, and the weight decreases as the distance from xi increases. α(n,h) is a function of n and h that ensures ∫f̂(x)dx = 1 (a proper density function); for simplification, α(n,h) can be ignored in sample generation, as shown in the next section.

Eq. (5) is essentially a regression or function estimation technique, requiring very little model training. Another appealing feature is that important distribution patterns (skewness, multimodality, thick tails, etc.) are preserved.

There are several widely used kernels in the literature. The choice of kernel is less important than the choice of h in terms of the behavior of f̂(x) (Marron et al., 1988). To accommodate different distributions, in this paper we propose using the Gaussian, tri-cube, and Epanechnikov quadratic kernels. The Gaussian kernel has infinite support, whereas the others have finite support, which is desirable for bounded (truncated) distributions. These kernels are:

Gaussian:




$$K\!\left(\frac{x - x_i}{h}\right) = \frac{1}{h\sqrt{2\pi}}\exp\!\left(-\frac{1}{2}\left(\frac{x - x_i}{h}\right)^{2}\right) \qquad (6)$$

Tri-cube:

$$K\!\left(\frac{x - x_i}{h}\right) = \begin{cases} \left(1 - \left|\dfrac{x - x_i}{h}\right|^{3}\right)^{3}, & \text{if } \left|\dfrac{x - x_i}{h}\right| \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

Epanechnikov quadratic:

$$K\!\left(\frac{x - x_i}{h}\right) = \begin{cases} \dfrac{3}{4}\left(1 - \left(\dfrac{x - x_i}{h}\right)^{2}\right), & \text{if } \left|\dfrac{x - x_i}{h}\right| \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
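To make Eqs. (5)-(8) concrete, a minimal sketch of the estimator is given below. Python is used purely for illustration (the paper's own implementation was in Matlab m-files), and all function names are ours. Note that, as printed, Eq. (6) carries a 1/h factor that the 1/(nh) prefactor of Eq. (5) already supplies; the sketch applies it only once, inside kde().

```python
import numpy as np

def gaussian_kernel(z):
    # Eq. (6) without the 1/h factor; kde() below applies 1/(n*h) once
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def tricube_kernel(z):
    # Eq. (7): (1 - |z|^3)^3 on |z| <= 1, zero outside (finite support)
    return np.where(np.abs(z) <= 1.0, (1.0 - np.abs(z) ** 3) ** 3, 0.0)

def epanechnikov_kernel(z):
    # Eq. (8): (3/4)(1 - z^2) on |z| <= 1, zero outside (finite support)
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def kde(x, data, h, kernel=gaussian_kernel):
    """Parzen estimate of Eq. (5): f_hat(x) = (1/(n*h)) * sum_i K((x - x_i)/h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - np.asarray(data, dtype=float)[None, :]) / h
    return kernel(z).sum(axis=1) / (len(data) * h)
```

The tri-cube kernel of Eq. (7), as printed, is unnormalized (its integral is 81/70 rather than 1). This is harmless for the M-H sampling of the next section, which uses only density ratios, exactly the α(n,h) remark after Eq. (5); multiply by 70/81 if a properly normalized density value is needed.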

The bandwidth parameter h is the more important factor in density estimation, controlling the smoothness of f̂(x). When h is too large, important features of the underlying distribution (e.g., multimodality) are smoothed away. When h is too small, f̂(x) tends to be too wiggly, overfitting the sample randomness rather than the true patterns. Various techniques have been developed for selecting an optimal bandwidth, where "optimal" means achieving a tradeoff between the bias and the variance of f̂(x) and thus a minimum estimation error. A common measure of the estimation error is the mean integrated squared error (MISE):

$$MISE(h) = E\int \left(\hat{f}_h(x) - f(x)\right)^{2} dx \qquad (9)$$

A review of optimal bandwidth selection is given in Jones, Marron, and Sheather (1996). These techniques are especially beneficial for automatic bandwidth determination when many estimates are required, or in dimensionality reduction, where manual selection is impractical. In this paper we propose to use the "quick and dirty" rule-of-thumb method for bandwidth selection (Turlach, 1993; Jones et al., 1996), which can be used interactively with a visual choice of bandwidth. In process capability analysis one usually does not have many processes to estimate simultaneously, because manufacturing process data are costly and the careful planning and experimental setup are time consuming. Another reason is simplicity of implementation, which makes the approach more appealing and desirable for practitioners. If we take K as the Gaussian kernel, the rule-of-thumb optimal h can be derived from the asymptotic mean integrated squared error (AMISE) (Silverman, 1986):

$$AMISE(h) = (nh)^{-1}R(K) + \tfrac{1}{4}h^{4}\mu_2^{2}(K)R(f^{(2)}) \qquad (10)$$

Minimizing AMISE with respect to h leads to the following optimal bandwidth:

$$h_o = \left[\frac{R(K)}{\mu_2^{2}(K)\,R(f^{(2)})}\right]^{1/5} n^{-1/5} \qquad (11)$$

where R(K) = ∫K²(x)dx and μ2(K) = ∫x²K(x)dx are independent of the bandwidth h, and f^(2) denotes the second derivative of f.
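(The step from Eq. (10) to Eq. (11) is the usual first-order condition:

$$\frac{d\,AMISE(h)}{dh} = -\frac{R(K)}{nh^{2}} + h^{3}\mu_2^{2}(K)R(f^{(2)}) = 0 \;\Longrightarrow\; h_o^{5} = \frac{R(K)}{n\,\mu_2^{2}(K)\,R(f^{(2)})}$$

which rearranges to Eq. (11).)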

By using the standard normal distribution as a reference distribution to replace the unknown f in Eq. (11), the rule of thumb yields the estimate

$$h_o = 1.06\,\hat{\sigma}\,n^{-1/5} \qquad (12)$$

where σ̂² is the sample variance of the xi. For other, non-Gaussian kernels, an equivalent h can be obtained by rescaling (Marron et al., 1988). Suppose we have estimated an unknown density f using the Gaussian kernel KA and bandwidth ho; to estimate f with a different kernel KB, the appropriate bandwidth hB to use with KB is

$$h_B = \frac{\delta_o^{B}}{\delta_o^{A}}\,1.06\,\hat{\sigma}\,n^{-1/5} \qquad (13)$$

where the scale factors δo (the so-called canonical bandwidths) can be found in the literature; for the Gaussian kernel,

$$\delta_o^{A} = \left(\frac{1}{4\pi}\right)^{1/10} = 0.7764 \qquad (14)$$
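As a worked example of Eqs. (13)-(14), take the Epanechnikov quadratic kernel as KB; its canonical bandwidth δoB = 15^(1/5) ≈ 1.7188 is taken from the canonical-kernel literature (Marron and Nolan, 1988), not from this paper. With σ̂ = 1 and n = 100, the Gaussian rule of thumb gives ho = 1.06 × 100^(-1/5) ≈ 0.42, and the rescaling of Eq. (13) gives hB ≈ (1.7188/0.7764) × 0.42 ≈ 0.93.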

The ho of Eq. (12) is sensitive to outliers in the xi, which may produce a too-large estimate of σ and hence a too-large ho, resulting in an oversmoothed f̂. A more robust alternative for estimating σ is to use the estimated quantile range R̂ = x0.75n - x0.25n, where x0.25n and x0.75n denote the 25% and 75% quantile points of the xi. It can be shown that the robust estimate of σ is σ̂ = R̂/1.34. This yields a better (robust) rule-of-thumb estimate of ho:

$$h_o = 1.06\,\min\!\left(\hat{\sigma},\,\frac{\hat{R}}{1.34}\right) n^{-1/5} \qquad (15)$$
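A minimal sketch of this quick-and-dirty selection, combining Eq. (12) with the robust guard of Eq. (15) (the function name is ours):

```python
import numpy as np

def rule_of_thumb_bandwidth(data, robust=True):
    """Rule-of-thumb bandwidth of Eq. (12), or the robust variant of Eq. (15)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    sigma_hat = data.std(ddof=1)                    # sample standard deviation
    if robust:
        q25, q75 = np.percentile(data, [25, 75])    # quartiles for R_hat
        scale = min(sigma_hat, (q75 - q25) / 1.34)  # Eq. (15)
    else:
        scale = sigma_hat                           # Eq. (12)
    return 1.06 * scale * n ** (-0.2)
```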

With the selected kernel and bandwidth ho, the Yp, Ypk can be estimated by plugging f̂(x) into Eq. (3) with the proposed numerical method presented below.

The statistical properties of the kernel estimate reveal the effects of the kernel function and the factor h on how well f̂(x) approximates the true f(x). The bias of the estimate is

$$\mathrm{Bias}(\hat{f}(x)) = \frac{h^{2}}{2}\,f^{(2)}(x)\,\mu_2(K) + o(h^{2}), \qquad h \to 0 \qquad (16)$$

The variance of the estimate is

$$\mathrm{Var}(\hat{f}(x)) = \frac{1}{nh}\,R(K)\,f(x) + o\!\left(\frac{1}{nh}\right), \qquad nh \to \infty \qquad (17)$$

The optimal bandwidth ho is set to balance these two estimation errors. In conformity analysis we are more interested in the tail bias, since it directly affects the nonconformity estimate. Thus, once ho is determined, μ2(K) = ∫x²K(x)dx is another independent control factor that determines how the weight is allocated along the tails: a larger μ2(K) implies that more weight is put on the tails, as shown below.



Numerical Monte Carlo integration was conducted to investigate the μ2(K) effect, which depends only on the kernel K. The results are summarized in Table 1.

Table 1 - μ2(K) evaluation for three kernel functions*

Kernel                    Equation    μ2(K)
Epanechnikov quadratic    Eq. (8)     0.19841
Tri-cube                  Eq. (7)     0.16639
Gaussian                  Eq. (6)     0.98934

* 10,000 Monte Carlo samples were used for the μ2(K) integration.

The results in Table 1 show that the Gaussian kernel puts more weight on the tails than the other two kernels. For the same ho, the Gaussian kernel may therefore cause more bias error at the tails. A similar observation is made in the simulation case study of Section 5.
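A sketch of the Monte Carlo integration behind Table 1, estimating μ2(K) = ∫x²K(x)dx by uniform sampling (our illustration; it reuses the kernel functions sketched after Eq. (8)):

```python
import numpy as np

def mu2_monte_carlo(kernel, support=(-1.0, 1.0), m=10_000, seed=0):
    """Estimate mu_2(K) = integral of x^2 * K(x) dx by uniform Monte Carlo."""
    rng = np.random.default_rng(seed)
    a, b = support
    x = rng.uniform(a, b, m)
    return (b - a) * np.mean(x ** 2 * kernel(x))   # interval width * mean integrand

# The finite-support kernels integrate over [-1, 1]; for the Gaussian kernel a
# wide interval such as (-6, 6) captures essentially all of the integrand.
```

The analytic values, 1/5 for the Epanechnikov quadratic kernel, 1/6 for the unnormalized tri-cube of Eq. (7), and 1 for the Gaussian, are consistent with the Monte Carlo figures reported in Table 1.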

METROPOLIS-HASTINGS (M-H) SAMPLING AND CONFORMITY ESTIMATION

Monte Carlo integration provides a generic and flexible method for evaluating Eq. (3). For the Gaussian kernel density estimate, Polansky (1998, 2000) gave closed form expressions for the process conformity. In this paper an alternative method, Metropolis-Hastings (M-H) sampling, is proposed. The idea is to draw a large number (M) of random samples that follow the distribution f̂(x) and then use them to estimate the conformity probability. In comparison, the M-H method is more flexible, easy to implement, and independent of the kernel function and the type of specification region. It is therefore more desirable for multivariate applications, where the design spaces (specification regions) can be irregular, making direct integration infeasible. Since the training data collected from a manufacturing process are usually costly and limited (e.g., N ≤ 100), the computation cost of M-H density evaluation at M target points is linear, i.e., O(M). The M-H based estimation involves the following steps: i) generating random samples from f̂(x); ii) counting the fall-out (nonconforming) samples; iii) calculating the yield as the ratio of the number of conforming samples to the total number of samples. The one-sided fall-outs can also be counted separately, which provides off-center information useful for estimating the optimal position of the process. In addition, the sampling method accommodates both regular and complex irregular specification regions in multivariate processes.

Since the estimated density f̂(x) can be complex, there is no straightforward way to draw samples directly from f̂(x), as there is in Monte Carlo simulation with well known parametric models such as the normal, uniform, gamma, etc. The Metropolis-Hastings sampling algorithm is therefore adopted to draw samples from f̂(x).

The M-H algorithm requires a target density and a proposal density. The former, for evaluating Eq. (1), is defined as f̂(x). The latter, a conditional density denoted q(·|x), is selected to be easy to simulate and explicitly available. Commonly used proposals include the uniform (U(a,b)) and the normal. The uniform proposal is independent of x, i.e., q(y|x) = q(y), and the normal proposal

$$q(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y - x)^{2}}{2\sigma^{2}}\right)$$

is symmetric. The procedure of Metropolis-Hastings sampling is as follows (Robert and Casella, 2006). With an initial point x(0):

Step 1: Repeat for t = 1, 2, ..., N with current x(t):
Step 2: Draw a new sample Yt from q(y|x(t)).
Step 3: Draw a sample u ~ U(0,1).
Step 4: Calculate

$$a_1 = \frac{\hat{f}(Y_t)}{\hat{f}(x^{(t)})}, \qquad a_2 = \frac{q(x^{(t)}\,|\,Y_t)}{q(Y_t\,|\,x^{(t)})}, \qquad a = a_1 \cdot a_2 \qquad (18)$$

Step 5: Update the current sample: if a ≥ 1, set x(t+1) = Yt, i.e., accept the new sample; otherwise, if a > u, set x(t+1) = Yt, again accepting the new sample; if a < u, set x(t+1) = x(t), rejecting the new sample.

In effect, Steps 4-5 update the sample with acceptance probability ρ(x, y), i.e., take

$$x^{(t+1)} = \begin{cases} Y_t & \text{with probability } \rho(x^{(t)}, Y_t) \\ x^{(t)} & \text{with probability } 1 - \rho(x^{(t)}, Y_t) \end{cases} \qquad (19)$$

where

$$\rho(x, y) = \min\!\left\{\frac{\hat{f}(y)}{\hat{f}(x)}\,\frac{q(x|y)}{q(y|x)},\; 1\right\}$$

The uniform and normal proposals are symmetric, i.e., q(x|y) = q(y|x), so q(x|y)/q(y|x) ≡ 1 and ρ(x, y) = min{f̂(y)/f̂(x), 1}.

The above M-H algorithm produces a series of samples x(t) that forms a Metropolis-Hastings Markov chain with stationary distribution f̂(x). After a sufficiently long initial burn-in period, the x(t) with t > NB can be taken as random samples from f̂(x).
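A minimal sketch of this sampler for the KDE target (our own names; it assumes the kde helper sketched earlier and a symmetric normal random-walk proposal, so the q-ratio a2 of Eq. (18) cancels):

```python
import numpy as np

def mh_sample_kde(data, h, kernel, n_keep, burn_in=10_000, prop_sd=None, seed=0):
    """Random-walk Metropolis-Hastings chain targeting f_hat (Eqs. (18)-(19))."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if prop_sd is None:
        prop_sd = data.std()               # proposal scale: a tuning choice
    x = float(rng.choice(data))            # start at a data point (f_hat > 0)
    fx = kde(x, data, h, kernel)[0]
    chain = np.empty(burn_in + n_keep)
    for t in range(burn_in + n_keep):
        y = x + prop_sd * rng.standard_normal()   # symmetric proposal draw
        fy = kde(y, data, h, kernel)[0]
        # symmetric q => a = f_hat(y)/f_hat(x); accept with probability min(a, 1)
        if fy >= fx or rng.uniform() < fy / fx:
            x, fx = y, fy
        chain[t] = x
    return chain[burn_in:]                 # discard the burn-in samples
```

Because only the ratio f̂(y)/f̂(x) enters, the normalizing factor α(n,h) of Eq. (5) indeed cancels, as noted there.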

As shown in Fig. 1, the Yp, Ypk are estimated by counting the nonconforming M-H samples x(t):

$$Y_{pk} = 1 - \frac{\#\ \text{nonconforming samples}}{\#\ \text{total samples after burn-in}} \qquad (20)$$

We can count the numbers of nonconforming samples beyond the USL and below the LSL separately. Denote by I>USL the number of nonconforming samples beyond the upper specification limit and by I<LSL the number of nonconforming samples below the lower specification limit. We designed the following procedure to estimate the potential capability (yield) Yp:

If I>USL > I<LSL, shift the process x(t) up and update I>USL, I<LSL iteratively until I>USL ≈ I<LSL; or, if I>USL < I<LSL, shift the process x(t) down and update I>USL, I<LSL iteratively until I>USL ≈ I<LSL.

For convenience, instead of shifting the process up or down in the calculation, we shift the relative position of the specification limits and record the amount of shift required. With the updated I>USL, I<LSL, we have

$$Y_p = 1 - \frac{I_{>USL} + I_{<LSL}}{\#\ \text{total samples after burn-in}} \qquad (21)$$
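A sketch of the counting of Eq. (20) and the limit-shifting estimate of Eq. (21) (our illustration; the paper balances I>USL and I<LSL iteratively, which is implemented here by bisection on the shift, one possible reading, since the imbalance is monotone in the shift):

```python
import numpy as np

def yields_from_chain(chain, lsl, usl):
    """Estimate Y_pk (Eq. (20)) and Y_p (Eq. (21)) from the post-burn-in chain."""
    chain = np.asarray(chain, dtype=float)
    m = len(chain)
    y_pk = 1.0 - np.sum((chain < lsl) | (chain > usl)) / m     # Eq. (20)

    def imbalance(shift):
        # shifting the process up by `shift` == shifting both limits down
        return np.sum(chain > usl - shift) - np.sum(chain < lsl - shift)

    lo, hi = lsl - chain.max(), usl - chain.min()   # brackets the balance point
    for _ in range(60):                             # bisection: I_>USL ~ I_<LSL
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0:
            hi = mid
        else:
            lo = mid
    shift = 0.5 * (lo + hi)
    fallout = np.sum(chain > usl - shift) + np.sum(chain < lsl - shift)
    return y_pk, 1.0 - fallout / m, shift           # Y_pk, Y_p, recorded shift
```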

This procedure can be conducted by retrieving the recorded M-H samples x(t); no extra M-H simulation is needed. The resultant total amount of shift can be recorded and used to guide process adjustment, similar to centering a process.

SIMULATION CASE STUDY

Two Beta distributions, a mixture of two normal distributions representing a bimodal case, and a standard normal distribution were selected to simulate typical distributions found in applications. The Beta distributions were selected because they can be skewed and have finite support, representing truncated and skewed distributions. The mixture of normal distributions represents multimodal processes and has infinite support (infinite tails). The standard normal pdf can be taken as an extreme case for Gaussian kernel density estimation, i.e., one kernel function fitting all data points. For brevity we compare only the conformities of the benchmark, the kernel based M-H sampling approach (K-M-H), and the parametric model method, i.e., Cp, Cpk analysis. The Cp, Cpk values are translated into the corresponding conformities using Eqs. (22)-(24) below. The benchmarks were set by generating 100 million samples directly from Beta(2,1), Beta(2,6), a bi-normal mixture, and the standard normal N(0,1); Monte Carlo simulations were conducted and the conformities were checked by counting the nonconforming samples (Eqs. (20)-(21)). The benchmark distributions are shown in Fig. 2, and the conformity results in Table 2. For comparison, we use traditional process capability analysis to estimate

$$C_p = \frac{USL - LSL}{6\hat{\sigma}}, \qquad C_{pk} = \min\!\left\{\frac{USL - \hat{\mu}}{3\hat{\sigma}},\;\frac{\hat{\mu} - LSL}{3\hat{\sigma}}\right\} \qquad (22)$$

The underlying assumption is normality of the quality characteristic. The equivalent potential conformity (associated with Cp) is

$$Y_p = 2\,\Phi(3C_p) - 1 \qquad (23)$$

and the actual capability (yield) is

$$Y_{pk} = \Phi\!\left(\frac{USL - \hat{\mu}}{\hat{\sigma}}\right) - \Phi\!\left(\frac{LSL - \hat{\mu}}{\hat{\sigma}}\right) \qquad (24)$$

where μ̂ and σ̂ are estimated from N = 100 Monte Carlo samples generated from the specified distributions.

These same samples were also used for the kernel density estimation and M-H sampling. We used the four distributions to generate N = 100 samples xi, i = 1, 2, ..., N, simulating the individual quality characteristic data from these processes. The training data size corresponds to a sampling plan of 25 subgroups with n = 4 samples per subgroup for the phase I SPC charting process (to ensure the process is in control). The procedure in Fig. 1 was then followed to estimate the density and the yield. In the M-H sampling, the initial 10,000 samples were taken as burn-in. The subsequent samples were split into 30 sample sets, each containing 40,000 samples, and the conformity (yield) was estimated from each set. The average of these 30 conformities gives the conformity results in Table 2 below. The last column of Table 2 gives the relative errors, i.e.,

$$\varepsilon\% = \frac{\left|\text{Estimated Conformity} - \text{Benchmark Conformity}\right|}{\text{Benchmark Conformity}} \times 100\%$$
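A sketch of this batching for the means and 95% confidence intervals reported in Table 2 (our illustration; the paper does not state how the interval is formed from the 30 sets, so a normal approximation across the batch estimates is assumed here):

```python
import numpy as np

def batched_yield_ci(chain, lsl, usl, n_batches=30, batch_size=40_000):
    """Mean conformity and a 95% CI from batch estimates of Eq. (20)."""
    ys = np.array([
        1.0 - np.mean((b < lsl) | (b > usl))       # yield of one batch
        for b in np.split(chain[:n_batches * batch_size], n_batches)
    ])
    half = 1.96 * ys.std(ddof=1) / np.sqrt(n_batches)   # normal approximation
    return ys.mean(), (ys.mean() - half, ys.mean() + half)
```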

Table 2 - Comparison of the process capability indices Ypk, Yp obtained with different approaches

Approach                   Pdf        USL/LSL    Ypk / Yp           95% CI of Ypk       Errors ε% (Ypk/Yp)
Benchmark via simulation   Beta(2,1)  0.97/0.03  0.9432 / 0.9984    -                   0% / 0%
                           Beta(2,6)  0.97/0.03  0.986 / 0.9999     -                   0% / 0%
                           Bi-normal  9.5/-3     0.9987 / 0.9999    -                   0% / 0%
                           N(0,1)     3/-3       0.9973 / 0.9973    -                   0% / 0%
Epanechnikov quadratic     Beta(2,1)  0.97/0.03  0.9534 / 0.9999    [0.9451, 0.9617]    1.08% / 0.16%
kernel                     Beta(2,6)  0.97/0.03  0.9835 / 1.0       [0.9775, 0.9862]    0.25% / 0%
                           Bi-normal  9.5/-3     0.9981 / 0.9997    [0.9975, 0.9987]    0.06% / 0.02%
                           N(0,1)     3/-3       1.0 / 1.0          [1.0, 1.0]          0.27% / 0.27%
Tri-cube kernel            Beta(2,1)  0.97/0.03  0.9510 / 0.9944    [0.9428, 0.9593]    0.83% / 0.4%
                           Beta(2,6)  0.97/0.03  0.98775 / 1.0      [0.9838, 0.9917]    0.18% / 0%
                           Bi-normal  9.5/-3     0.99986 / 0.9999   [0.9997, 1.000]     0.12% / 0%
                           N(0,1)     3/-3       1.0 / 1.0          [1.0, 1.0]          0.27% / 0.27%
Gaussian kernel            Beta(2,1)  0.97/0.03  0.9563 / 0.9852    [0.9503, 0.9623]    1.39% / 1.32%
                           Beta(2,6)  0.97/0.03  0.9614 / 1.0       [0.9547, 0.9681]    2.49% / 0%
                           Bi-normal  9.5/-3     0.97939 / 0.9903   [0.9775, 0.9813]    1.93% / 0.96%
                           N(0,1)     3/-3       0.9967 / 0.9946    [0.9957, 0.9978]    0.06% / 0.27%
Parametric model via       Beta(2,1)  0.97/0.03  0.8920 / 0.9532    -                   5.43% / 4.52%
Cp, Cpk analysis           Beta(2,6)  0.97/0.03  0.9210 / 0.9987    -                   6.59% / 0.13%
                           Bi-normal  9.5/-3     0.9669 / 0.9737    -                   3.18% / 2.62%
                           N(0,1)     3/-3       0.9983 / 0.9985    -                   0.1% / 0.12%

The 95% confidence intervals were estimated from the 30 sample sets. From the results and the computational experience it was observed that:

1. Among the three K-M-H algorithms, the Epanechnikov quadratic and tri-cube kernels give better results on all non-normally distributed processes; the Gaussian kernel tends to overestimate the nonconformity, i.e., it is too conservative. This spillover effect is caused by its relatively large μ2(K), which overemphasizes the tails, as presented in Table 1 of Section 3. However, the Gaussian kernel does produce smoother density estimates.
2. The conformity results of all three K-M-H algorithms are superior to the traditional parametric normal model method except for the perfectly normal distribution. For the non-normal scenarios the conformity estimates differ significantly.
3. The quick & dirty bandwidth selection works well in the case study.
4. The computation cost is trifling. It depends on the sample size; in most cases about 30 seconds suffice for roughly 600,000 kernel function evaluations in the M-H sampling.
5. The estimated Yp, Ypk are random in the sense that they depend on the actual sampling process, i.e., on the samples xi drawn directly from the unknown process model, and also on the M-H sampling process.
6. Another advantage over the traditional approach is that M-H is effectively bootstrap sampling from the nonparametric kernel density model; hence confidence intervals can be estimated simultaneously.

FIGURE 2 - DISTRIBUTIONS: (a) BETA(2,1) DISTRIBUTION; (b) BETA(2,6) DISTRIBUTION; (c) MIXTURE OF N(0,1) AND N(5,1.5); (d) NORMAL N(0,1)

In the following figures (Figs. 3-11), panel (a) shows f̂(x), (b) the M-H sample distribution, (c) the training samples (n = 100), and (d) the 30 estimated yields.

FIGURE 3 - BETA(2,1) WITH EPAN: YPK = 0.9534, 95% C.I. = [0.94511, 0.9617]

FIGURE 4 - BETA(2,1) WITH GAUSSIAN: YPK = 0.9563, 95% C.I. = [0.95034, 0.9623]

FIGURE 5 - BETA(2,1) WITH TRI: YPK = 0.9510, 95% C.I. = [0.9428, 0.9593]

FIGURE 6 - BETA(2,6) WITH EPAN: YPK = 0.98185, 95% C.I. = [0.97745, 0.98624]


FIGURE 7 - BETA(2,6) WITH GAUSSIAN: YPK = 0.9614, 95% C.I. = [0.95473, 0.96814]

FIGURE 8 - BETA(2,6) WITH TRI: YPK = 0.98775, 95% C.I. = [0.98383, 0.99167]

FIGURE 9 - BI-NORMAL WITH EPAN: YPK = 0.9981, 95% C.I. = [0.99753, 0.99867]

FIGURE 10 - BI-NORMAL WITH GAUSSIAN: YPK = 0.97939, 95% C.I. = [0.97751, 0.98128]

FIGURE 11 - BI-NORMAL WITH TRI: YPK = 0.99986, 95% C.I. = [0.9997, 1]

Matlab m-file codes were developed, and the simulations were conducted on a Genuine Intel(R) CPU T2500 @ 2.0 GHz with 2046 MB of RAM under Windows XP. Computation is not a serious concern: most of the calculations complete within one minute.

SUMMARY

The process capability indices Cp, Cpk have been widely accepted and used in industry for process capability evaluation. However, Cp, Cpk analysis may produce ambiguous and misleading interpretations when the normality assumption is even slightly or moderately violated. A new method for process capability evaluation is proposed based on a nonparametric model and a Markov chain Monte Carlo technique. The new method directly defines conformity, or yield, as the process capability index, which avoids ambiguity and misleading interpretation. Kernel density estimation and Metropolis-Hastings sampling, a popular Markov chain Monte Carlo algorithm, were adopted for kernel pdf model checking and conformity analysis.


Yield computation in conformity analysis involves a multivariate integral computation. If prior experience or knowledge provides sufficient fidelity for the kernel pdf model, the model checking by M-H sampling can be avoided; Monte Carlo or quasi-Monte Carlo [25] methods, as well as other space filling techniques [26], can then be applied for highly efficient conformity (yield) computation. Four distributions, representing truncated, skewed, multimodal, and perfectly normal densities, were used in the simulation for validation. The results show that the proposed K-M-H approach to conformity estimation has the following features:

1. Assumption-free: the kernel estimation strategy is subject to no prior model assumption, i.e., it lets the data speak for themselves. This is very appealing to practitioners who are concerned about capability index interpretation and how much violation of normality should be tolerated.
2. The M-H sampling method serves both model checking and conformity analysis; given sufficient model fidelity, other techniques can provide more efficient yield calculation algorithms.
3. The method can easily be extended to multivariate processes.
4. Conformity estimation is more accurate than with the currently used parametric model method, especially when normality is violated.
5. The conformity indices Yp, Ypk give a coherent and unambiguous interpretation of process capability.
6. The Epanechnikov quadratic and tri-cube kernels were superior to the Gaussian kernel for conformity estimation of non-normal distributions in the simulation case study.
7. The quick and dirty bandwidth selection approach works well in the simulation case study.
8. Coding and implementation are simple: once ho is determined, the kernel density expression is straightforward, and the M-H sampling code is extremely simple.
9. The computation cost of process capability estimation is trifling.

Despite being assumption-free, the method can still benefit from prior knowledge about the process density. For instance, if process knowledge suggests a truncated distribution, a kernel with finite support, such as the Epanechnikov quadratic and tri-cube kernels, is the better choice to prevent spillover effects, which lead to overestimated nonconformities. The quick & dirty selection of ho is preferred because it avoids computationally intensive searching for the optimal ho and deeper mathematical involvement. In addition, the contribution of the bias and variance errors from the tails is much smaller, because f(x) and f^(2)(x) in Eqs. (16) and (17) are normally very small at the tails compared with the central region (e.g., with peaks and valleys) of f(x). Thus the tedious effort of minimizing the MISE or AMISE of Eqs. (9) and (10) may yield only a trifling improvement in the density tails, making it not worthwhile.

The proposed technique can find applications in most multivariate quality control problems, such as those in the semiconductor, process, pharmaceutical, and general manufacturing industries. One specific application area is manufacturing quality control under geometric tolerance (GD&T) requirements, in which the interrelated tolerances and the complexity of the multivariate statistical model prevent current tolerancing techniques from being a viable option.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the financial support of the National Science Foundation, award numbers CMMI-0928609 and CMMI-0927557.

REFERENCES

[1] Polansky, A. M., (1998). "A smooth nonparametric approach to process capability", Qual. Reliab. Engng. Int., 14, pp. 43-48.
[2] Polansky, A. M., (2000). "An algorithm for computing a smooth nonparametric capability estimate", J. of Qual. Tech., 32, pp. 284-289.
[3] Wand, M. P. and M. C. Jones, Kernel Smoothing, Vol. 60 of Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1995.
[4] Kotz, S. and N. L. Johnson, Process Capability Indices, Chapman and Hall, London, 1993.
[5] Kotz, S. and N. L. Johnson, (2002). "Process capability indices - a review, 1992-2000", J. Qual. Tech., 34(1), pp. 2-19.
[6] Robert, C. P. and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, 2006.
[7] Palmer, K. and K. Tsui, (1999). "A review and interpretations of process capability indices", Annals of Operational Research, 87, pp. 31-47.
[8] Silverman, B. W., Density Estimation for Statistics and Data Analysis, Vol. 26 of Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1986.
[9] Turlach, B. A., (1993). "Bandwidth selection in kernel density estimation: A review", Discussion Paper 9307, Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin.
[10] Tsui, K., (1997). "Interpretation of process capability indices and some alternatives", Qual. Engng, 9(4), pp. 587-596.
[11] Clements, J. A., (1989). "Process capability calculation for non-normal distributions", Qual. Progress, 22, pp. 95-100.
[12] Flaig, J. J., (2002). "Process capability optimization", Quality Engineering, 15(2), pp. 233-242.
[13] Flaig, J. J., (1999). "Process capability sensitivity analysis", Quality Engineering, 11, pp. 587-592.
[14] Marron, J. S. and D. Nolan, (1988). "Canonical kernels for density estimation", Statistics & Probability Letters, 7(3), pp. 195-199.
[15] Kane, V. E., (1986). "Process capability indices", Journal of Quality Technology, 18, pp. 41-52.
[16] Sullivan, L. P., (1984). "Reducing variability: A new approach to quality", Quality Progress, 17(7), pp. 15-21.
[17] Yeh, A. B. and S. Bhattacharya, (1998). "A robust capability index", Communications in Statistics - Simulation and Computation, 27, pp. 565-589.
[18] Yeh, A. B. and H. Chen, (2001). "A nonparametric multivariate process capability index", International Journal of Modeling & Simulation, 21(8), pp. 218-223.
[19] Carr, W. E., (1991). "A new process capability index: parts per million", Quality Progress, 24(2), pp. 152-154.
[20] Pal, S., (2005). "Evaluation of nonnormal process capability indices using generalized lambda distribution", Quality Engineering, 17, pp. 77-85.
[21] Huang, W., T. Phoomboplab, and D. Ceglarek, (2008). "Process capability surrogate model based tolerance synthesis of multi station manufacturing systems (MMS)", IIE Transactions, special issue on Quality Control and Improvement for Multistage Systems, 41, pp. 309-322.
[22] Jones, M. C., J. S. Marron, and S. J. Sheather, (1996). "Progress in data-based bandwidth selection for kernel density estimation", Computational Statistics, 11, pp. 337-381.
[23] Bisgaard, S. and S. Graves, (1997). "A negative process capability index from assembling good components? A problem in statistical tolerancing", Qual. Engng, 10(2), pp. 409-414.
[24] Huang, W., B. R. Konda, and Z. Kong, (2010). "Geometric tolerance simulation model for rectangular and circular planar features", Trans. of NAMRI/SME, 38.
[25] Huang, W., D. Ceglarek, and Z. G. Zhou, (2004). "Using number-theoretical net method (NT-net) in tolerance analysis", International Journal of Flexible Manufacturing Systems, 6(1), pp. 65-90.
[26] Huang, W., Z. Kong, and A. Chennamaraju, (2010). "Fixture robust design by sequential space filling methods in multi-station manufacturing systems", ASME Trans. Journal of Computing & Information Science in Engineering (JCISE), 10, pp. 041001-1 to 041001-11.
[27] Kong, Z., W. Huang, and A. Oztekin, (2009). "Stream of variation analysis for multiple station assembly process with consideration of GD&T factors", ASME Trans. Journal of Manufacturing Science and Engineering, 131, pp. 51010-51020.
