Multiple Regression Methods
Hildebrand & Ott, Statistical Thinking for Managers, 4th edition, Chapter 12
Hildebrand, Ott & Gray, Basic Statistical Ideas for Managers, 2nd edition, Chapter 12
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Chapter 12: Multiple Regression Methods
Hildebrand, Ott and Gray, Basic Statistical Ideas for Managers
Second Edition
Learning Objectives for Ch. 12
• The Multiple Linear Regression Model
• How to interpret a slope coefficient in the multiple regression model
• The reason for using the adjusted coefficient of determination in multiple regression
• The meaning of multicollinearity and how to detect it
• How to test the overall utility of the predictors
• How to test the additional value of a single predictor
• How to test the significance of a subset of predictors in multiple regression
• The meaning of extrapolation
Section 12.1: The Multiple Regression Model
12.1 The Multiple Regression Model
Example (with two predictors):
• Y = Sales revenue per region (tens of thousands of dollars)
• x1 = advertising expenditures (thousands of dollars)
• x2 = median household income (thousands of dollars)

The data follow:
12.1 The Multiple Regression Model
• The data are:

Region  Sales  Adv Exp  Income
A       1      1        32
B       1      2        38
C       2      1        42
D       2      3        35
E       3      2        41
F       3      4        43
G       4      3        46
H       4      5        44
I       5      5        48
J       5      6        45
• A graphical representation follows.
12.1 The Multiple Regression Model
• Objective: Fit a plane through the points.

[3D scatterplot of Sales vs. Income vs. Adv Exp]
12.1 The Multiple Regression Model
• Population Model:

E(Y) = β0 + β1 x1 + ... + βk xk

or

Y = β0 + β1 x1 + ... + βk xk + ε

where ε is the error term.

• Interpretation of any βj: the change in Y per unit change in xj, when all other independent variables are held constant.
• βj is called the partial slope, j = 1, 2, ..., k.
12.1 The Multiple Regression Model
• First-order model ⇒ no higher-order terms or interaction terms.

• An interaction term is the product of two predictors: x1 x2.

• With an interaction term, the change in E(Y) per unit change in x1 depends on the value of x2.
Section 12.2: Estimating Multiple Regression Coefficients
12.2 Estimating Multiple Regression Coefficients
• Criterion used to estimate the β's: the Method of Least Squares ⇒ minimize the sum of squared residuals.

• Symbolically: min Σ (yi − ŷi)²

• We will use software to do the calculations.
12.2 Estimating Multiple Regression Coefficients
Example (Y = Sales, x1 = Advertising Expenditures, x2 = Median Household Income):
The Minitab output follows.

Regression Analysis: Sales versus Adv Exp and Income

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predictor     Coef     SE Coef   T      P      VIF
Constant     -5.091    1.720    -2.96   0.021
Adv Exp       0.4158   0.1367    3.04   0.019  1.8
Income        0.1633   0.0475    3.44   0.011  1.8
S = 0.541142 R-Sq = 89.8% R-Sq(adj) = 86.8%
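The estimates in this output can be reproduced from the data table by the method of least squares. A minimal sketch in pure Python (not part of the original slides), solving the two-predictor normal equations:

```python
# Least-squares fit for Sales on Adv Exp and Income, using
# centered sums of squares and cross-products (pure Python)
sales  = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
adv    = [1, 2, 1, 3, 2, 4, 3, 5, 5, 6]
income = [32, 38, 42, 35, 41, 43, 46, 44, 48, 45]

def mean(v):
    return sum(v) / len(v)

def s_xy(x, y):
    """Centered sum of cross-products Σ(x - x̄)(y - ȳ)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

s11, s22, s12 = s_xy(adv, adv), s_xy(income, income), s_xy(adv, income)
s1y, s2y = s_xy(adv, sales), s_xy(income, sales)

# Solve the 2x2 normal equations:
#   s11*b1 + s12*b2 = s1y
#   s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(sales) - b1 * mean(adv) - b2 * mean(income)

print(round(b0, 3), round(b1, 4), round(b2, 4))  # -5.091 0.4158 0.1633
```

The solution agrees with the Minitab coefficients to rounding.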
12.2 Estimating Multiple Regression Coefficients
• The fitted model is:

Ŷ = −5.09 + 0.416 x1 + 0.163 x2

vs. the two simple linear regressions

Ŷ = 0.681 + 0.725 x1  and  Ŷ = −7.69 + 0.258 x2

“The coefficient of an independent variable in a multiple regression equation does not, in general, equal the coefficient that would apply to that variable in a simple linear regression. In multiple regression, the coefficient refers to the effect of changing that variable while other independent variables stay constant. In simple linear regression, all other potential independent variables are ignored.” (Hildebrand, Ott and Gray)
12.2 Estimating Multiple Regression Coefficients
• Interpretation of 0.416: an additional unit (an increase of $1,000) of Advertising Expenditures leads to a 0.416-unit increase in Sales when Median Household Income is held fixed, i.e., regardless of whether x2 is 32 or 48.

• Does this seem reasonable? If Advertising Expenditures are increased by 1 unit, do you expect Sales to increase by 0.416 units regardless of whether the region has income of $32,000 or $48,000?
12.2 Estimating Multiple Regression Coefficients
• The output also gives the estimate of σε, both directly and indirectly.

• Indirectly, σε can be estimated as follows:

sε = sqrt(Sum of Squared Residuals / dfError) = sqrt(MS(Residual Error))

where dfError = n − (k + 1) = n − k − 1.

• The estimate of σε can also be read directly from “s” on the output.
12.2 Estimating Multiple Regression Coefficients
Example (Sales vs. Adv. Exp. and Income): The Minitab output follows.

Regression Analysis: Sales versus Adv Exp and Income

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

S = 0.541142   R-Sq = 89.8%   R-Sq(adj) = 86.8%

Analysis of Variance

Source          DF   SS       MS      F      P
Regression       2   17.9502  8.9751  30.65  0.000
Residual Error   7   2.0498   0.2928
Total            9   20.0000
12.2 Estimating Multiple Regression Coefficients
• Use the output to locate the estimate of σε.

From the output, sε = 0.5411.

Or, sε = sqrt(MS(Error)) = sqrt(0.2928) = 0.541.
12.2 Estimating Multiple Regression Coefficients
• Coefficient of Determination (R², or R²y•x1 x2 ... xk)

• Concept: “…we define the coefficient of determination as the proportional reduction in the squared error of Y, which we obtain by knowing the values of x1, x2, ..., xk.” (Hildebrand, Ott and Gray)

• As in simple regression,

R² = SSR/SST = 1 − SSE/SST
12.2 Estimating Multiple Regression Coefficients
Example (Sales vs. Adv. Exp. and Income):
From the output, “R-Sq = 89.8%”
Interpretation: 89.8% of the variation in Sales is explained by a multiple regression model with Adv. Exp. and Income as predictors.
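R-Sq, R-Sq(adj), and S all follow from SSE = 2.0498 and SST = 20.0 in the ANOVA table; a quick check in Python (a sketch, not part of the slides):

```python
import math

sse, sst = 2.0498, 20.0   # from the ANOVA table
n, k = 10, 2              # 10 regions, 2 predictors

r2 = 1 - sse / sst                                   # R-Sq
r2_adj = 1 - (n - 1) / (n - (k + 1)) * (sse / sst)   # R-Sq(adj)
s = math.sqrt(sse / (n - (k + 1)))                   # residual std. error, df = n - k - 1

print(round(r2, 3), round(r2_adj, 3), round(s, 4))   # 0.898 0.868 0.5411
```

These match the 89.8%, 86.8%, and S = 0.541142 printed by Minitab.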
12.2 Estimating Multiple Regression Coefficients
• Adjusted Coefficient of Determination (Ra²)

• SSE and SST are each divided by their degrees of freedom:

Ra² = 1 − [SSE / (n − (k + 1))] / [SST / (n − 1)]
    = 1 − [(n − 1) / (n − (k + 1))] (SSE/SST)

• Since (n − 1) / (n − (k + 1)) > 1 ⇒ Ra² < R²
12.2 Estimating Multiple Regression Coefficients
• Why use Ra²?

• SST is fixed, regardless of the number of predictors.

• SSE decreases when more predictors are used ⇒ R² increases when more predictors are used.

• However, Ra² can decrease when another predictor is added to the fitted model, even though R² increases.

• Why? The decrease in SSE is offset by the loss of a degree of freedom in [n − (k + 1)] for SSE.
12.2 Estimating Multiple Regression Coefficients
• The following example illustrates this.

Example: For a fitted model with 10 observations, suppose SST = 50. When k = 2, SSE = 5. When k = 3, SSE = 4.5.

k = 2:  R² = 1 − 5/50 = .90    Ra² = 1 − (9/7)(5/50) = .871
k = 3:  R² = 1 − 4.5/50 = .91   Ra² = 1 − (9/6)(4.5/50) = .865

• Even though there has been a modest increase in R², Ra² has decreased.
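The computation above can be wrapped in a small helper to see the effect directly (a Python sketch; the function name is illustrative):

```python
def r2_and_adj(sse, sst, n, k):
    """Return (R^2, adjusted R^2) for a model with k predictors on n observations."""
    r2 = 1 - sse / sst
    r2_adj = 1 - (n - 1) / (n - (k + 1)) * (sse / sst)
    return r2, r2_adj

# n = 10 observations, SST = 50 in both models
r2_2, adj_2 = r2_and_adj(sse=5.0, sst=50.0, n=10, k=2)
r2_3, adj_3 = r2_and_adj(sse=4.5, sst=50.0, n=10, k=3)

# Adding the third predictor raises R^2 (.90 -> .91)
# but lowers adjusted R^2 (.871 -> .865)
print(round(r2_2, 2), round(adj_2, 3))
print(round(r2_3, 2), round(adj_3, 3))
```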
12.2 Estimating Multiple Regression Coefficients
• Sequential Sum of Squares (SEQ SS)
• Concept: the incremental contributions to SS (Regression) when the predictors enter the model in the order specified by the user.

Example (Sales vs. Adv. Exp. and Income): The Minitab output follows for when Adv Exp is entered first.

Analysis of Variance
Source       DF   SS       MS      F      P
Regression    2   17.9502  8.9751  30.65  0.000

Source    DF  Seq SS
Adv Exp    1  14.4928
Income     1   3.4574
12.2 Estimating Multiple Regression Coefficients
• SS (Regression using x1 and x2) ≡ SSR(x1, x2) = 17.9502

• SS (Regression using x1 only) ≡ SSR(x1) = 14.4928

• SS (Regression for x2 when x1 is already in the model) ≡ SSR(x2 | x1) = 3.4574
12.2 Estimating Multiple Regression Coefficients
Example (Sales vs. Adv. Exp. and Income): The Minitab output follows for when Income is entered first.

Analysis of Variance
Source       DF   SS       MS      F      P
Regression    2   17.9502  8.9751  30.65  0.000

Source    DF  Seq SS
Income     1  15.2408
Adv Exp    1   2.7093
12.2 Estimating Multiple Regression Coefficients
• SS (Regression using x1 and x2) ≡ SSR(x1, x2) = 17.9502 {unchanged}

• SS (Regression using x2 only) ≡ SSR(x2) = 15.2408

• SS (Regression for x1 when x2 is already in the model) ≡ SSR(x1 | x2) = 2.7093

• Regardless of which predictor is entered first, the sequential sums of squares, when added, equal SS (Regression).
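This additivity is easy to verify from the two Minitab runs (a Python sketch using the printed Seq SS values):

```python
# Sequential sums of squares from the two Minitab runs
seq_adv_first    = [14.4928, 3.4574]   # Adv Exp entered first, then Income
seq_income_first = [15.2408, 2.7093]   # Income entered first, then Adv Exp
ssr = 17.9502                          # SS (Regression) from either run

# The increments differ with the entry order,
# but each set sums to SS (Regression) up to rounding
assert abs(sum(seq_adv_first) - ssr) < 1e-3
assert abs(sum(seq_income_first) - ssr) < 1e-3
print(sum(seq_adv_first), sum(seq_income_first))
```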
Section 12.3: Inferences in Multiple Regression
12.3 Inferences in Multiple Regression
• Objective: Build a parsimonious model ⇒ as few predictors as necessary.

• We must now assume the errors in the population model are normally distributed.

• F-test for the overall model:

H0: β1 = β2 = ... = βk = 0  vs.  Ha: at least one βj ≠ 0
12.3 Inferences in Multiple Regression
• Test Statistic: F = MS (Regression) / MS (Residual Error)

• Concept: “If SS (Regression) is large relative to SS (Residual), the indication is that there is real predictive value in [some of] the independent variables x1, x2, ..., xk.” (Hildebrand, Ott and Gray)

• Decision Rule: Reject H0 if F > Fα, k, n−k−1, or reject H0 if p-value < α.
12.3 Inferences in Multiple Regression
Example (Sales vs. Adv. Exp. and Income):
The Minitab output follows:
Analysis of Variance
Source          DF   SS       MS      F      P
Regression       2   17.9502  8.9751  30.65  0.000
Residual Error   7   2.0498   0.2928
Total            9   20.0000
12.3 Inferences in Multiple Regression
Test H0: β1 = β2 = 0 vs. Ha: at least one βj ≠ 0, at the 5% level.

Since F = 30.65 > F.05, 2, 7 = 4.74, reject H0 at the 5% level.

Or since p-value = 0.000 < .05, reject H0 at the 5% level.

Implication: At least one of the x's has some predictive power.
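The F statistic and decision can be checked from the ANOVA table (a Python sketch; the critical value 4.74 is taken from the F table, as on the slide):

```python
# Overall F test from the ANOVA table
ms_regression, ms_error = 8.9751, 0.2928
f_stat = ms_regression / ms_error
print(round(f_stat, 2))  # 30.65, matching the Minitab F column

assert f_stat > 4.74     # F(.05, 2, 7) = 4.74, so reject H0: beta1 = beta2 = 0
```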
12.3 Inferences in Multiple Regression
• t-test for Significance of an Individual Predictor

H0: βj = 0 vs. Ha: βj ≠ 0, j = 1, 2, ..., k

• H0 implies that xj has no additional predictive value when added last to a model that contains all the other predictors.

• Test Statistic: t = (β̂j − 0) / sβ̂j

where sβ̂j is the estimated standard error of β̂j.
12.3 Inferences in Multiple Regression
• In Minitab notation, T = (Coef) / (SE Coef).

• Decision Rule: Reject H0 if |t| > tα/2, n−k−1, or reject H0 if p-value < α.

• Warning: Limit the number of t-tests to avoid a high overall Type I error rate.
12.3 Inferences in Multiple Regression
Example (Sales vs. Adv. Exp. and Income):
The Minitab output follows:
Predictor Coef SE Coef T P VIFConstant -5.091 1.720 -2.96 0.021Adv Exp 0.4158 0.1367 3.04 0.019 1.8Income 0.1633 0.0475 3.44 0.011 1.8
Test H0: β1 = 0 vs. Ha: β1 ≠ 0 at the 5% level.
12.3 Inferences in Multiple Regression
Since |t| = 3.04 > t.025, 7 = 2.365, reject H0: β1 = 0 at the 5% level.

Or since p-value = .019 < .05, reject H0: β1 = 0 at the 5% level.

Implication: Advertising Expenditures provides additional predictive value to a model having Income as a predictor.
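Both t statistics follow directly from the Coef and SE Coef columns (a Python sketch; t.025, 7 = 2.365 as on the slide):

```python
# t statistics from the output: T = Coef / SE Coef
t_adv    = 0.4158 / 0.1367   # Adv Exp
t_income = 0.1633 / 0.0475   # Income
t_crit   = 2.365             # t(.025, 7) from the t table

print(round(t_adv, 2), round(t_income, 2))  # 3.04 3.44, matching the T column
assert abs(t_adv) > t_crit and abs(t_income) > t_crit  # both predictors significant
```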
12.3 Inferences in Multiple Regression
• Multicollinearity
Concept: high correlation between at least one pair of predictors, e.g., x1 and x2.
⇒ Correlated x's provide no new information.
Example: In predicting heights of adults using the length of the right leg, the length of the left leg would be of little value.

• Symptoms of Multicollinearity
• Wrong signs for the β̂'s
• A t-test isn't significant even though you believe the predictor is useful and should be in the fitted model.
12.3 Inferences in Multiple Regression
• Detection of Multicollinearity

• Rj² is the coefficient of determination obtained by regressing xj on the remaining (k − 1) predictors, denoted by R²xj•x1 ... xj−1 xj+1 ... xk.

• If Rj² > 0.9, this is a signal that multicollinearity is present.

• This criterion can be expressed in a different way.
12.3 Inferences in Multiple Regression
• Let VIFj denote the Variance Inflation Factor of the jth predictor:

VIFj = 1 / (1 − Rj²)

• If VIFj > 10, this is a signal that multicollinearity is present.
12.3 Inferences in Multiple Regression
• Why is VIFj called the variance inflation factor for the jth predictor?

• The estimated standard error of β̂j in a multiple regression is:

sβ̂j = sε / sqrt( Σ(xij − x̄j)² (1 − Rj²) )

or

sβ̂j = sε sqrt( VIFj / Σ(xij − x̄j)² )
12.3 Inferences in Multiple Regression
• If VIFj is large, so is sβ̂j, which leads to a t-test that is not statistically significant.

• The VIF “…measures how much the variance (square of the standard error) of a coefficient is increased because of collinearity.” (Hildebrand, Ott and Gray)
12.3 Inferences in Multiple Regression
Example (Sales vs. Adv. Exp. and Income):
The Minitab output follows.

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predictor     Coef     SE Coef   T      P      VIF
Constant     -5.091    1.720    -2.96   0.021
Adv Exp       0.4158   0.1367    3.04   0.019  1.8
Income        0.1633   0.0475    3.44   0.011  1.8
Since both VIFs = 1.8 < 10, multicollinearity between Advertising Expenditures and Median Household Income is not a problem.
12.3 Inferences in Multiple Regression
• The Minitab output for regressing Adv Exp on Income follows:
The regression equation is
Adv Exp = - 6.26 + 0.229 Income
S = 1.39955 R-Sq = 43.2%
Since R2 =.432, VIF = 1/(1 - .432) = 1.8, as shown.
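The link between the R² of this auxiliary regression and the VIF column can be checked directly (a Python sketch):

```python
# VIF from the auxiliary regression R-squared
r2_j = 0.432              # R-Sq from regressing Adv Exp on Income
vif = 1 / (1 - r2_j)
print(round(vif, 1))      # 1.8, matching the VIF column in the earlier output

assert vif < 10           # no multicollinearity signal for this pair
```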
12.3 Inferences in Multiple Regression
• To illustrate multicollinearity, consider Exercise 12.19.

• Exercise 12.19: A study of demand for imported subcompact cars consists of data from 12 metropolitan areas. The variables are:

Demand: imported subcompact car sales as a percentage of total sales
Educ: average number of years of schooling completed by adults
Income: per capita income
Popn: area population
Famsize: average size of intact families
12.3 Inferences in Multiple Regression
The Minitab output follows:

The regression equation is
Demand = - 1.3 + 5.55 Educ + 0.89 Income + 1.92 Popn - 11.4 Famsize

Predictor     Coef      SE Coef   T      P      VIF
Constant     -1.32     57.98     -0.02   0.982
Educ          5.550     2.702     2.05   0.079   8.8
Income        0.885     1.308     0.68   0.520   4.0
Popn          1.925     1.371     1.40   0.203   1.6
Famsize     -11.389     6.669    -1.71   0.131  12.3
S = 2.68628 R-Sq = 96.2% R-Sq(adj) = 94.1%
12.3 Inferences in Multiple Regression
• Is there a multicollinearity (MC) problem?

Since VIF = 12.3 > 10 for the variable “Famsize”, there is an MC problem.

• Note that the p-value for the F-test = 0.000, indicating that at least one of the x's has predictive value. However, the smallest p-value for any t-test is 0.079, indicating that no individual x is significant given the others; this contradiction is a classic symptom of multicollinearity.

• What is the source of the MC problem? A matrix plot, in conjunction with the correlations, could be useful. This exercise will be revisited in Section 13.1.
12.3 Inferences in Multiple Regression
• Remedies if multicollinearity is a problem:

• Eliminate one or more of the collinear predictors.

• Form a new predictor that is a surrogate for the collinear predictors.

• Multicollinearity could occur if one of the predictors is x². This can be eliminated by using (x − x̄)² as the predictor instead.
Section 12.4: Testing a Subset of the Regression Coefficients
12.4 Testing a Subset of the Regression Coefficients
• To illustrate the concept, consider Exercise 13.55.

Exercise 13.55: A bank that offers charge cards to customers studies the yearly purchase amount (in thousands of dollars) on the card as related to the age, income (in thousands of dollars), home ownership, and years of education of the cardholder. The variable Owner equals 1 if the cardholder owns a home and 0 if the cardholder rents a home. The other variables are self-explanatory. The original data set has information on 160 cardholders. Upon further examination of the data, you decide to remove the data for cardholder 129 because this is an older individual who has a high income from having saved early in life and having invested successfully. This cardholder travels extensively and frequently uses her/his charge card.
12.4 Testing a Subset of the Regression Coefficients
Problem to be investigated:
The income and education predictors measure the economic well-being of a cardholder.
Do these predictors have any predictive value given the age and home ownership variables?
The null hypothesis is that the β’s corresponding to these predictors are simultaneously equal to 0.
12.4 Testing a Subset of the Regression Coefficients
• General Case

Complete model: E(Y) = β0 + β1 x1 + ... + βg xg + βg+1 xg+1 + ... + βk xk

Null hypothesis: H0: βg+1 = ... = βk = 0

Reduced model: E(Y) = β0 + β1 x1 + ... + βg xg
12.4 Testing a Subset of the Regression Coefficients
• Exercise 13.55

Complete model: E(Y) = β0 + β1(Age) + β2(Owner) + β3(Income) + β4(Educn)

Null hypothesis: H0: βIncome = βEducn = 0

Reduced model: E(Y) = β0 + β1(Age) + β2(Owner)
12.4 Testing a Subset of the Regression Coefficients
• The test statistic is called the Partial F statistic:

Partial F = [(SSEreduced − SSEcomplete) / (dfreduced − dfcomplete)] / [SSEcomplete / dfcomplete]

• Rationale:
• SSE decreases as new terms are added to the model.
• If the x's from (g + 1) to k have predictive ability, then SSEcomplete should be much smaller than SSEreduced.
• Their difference [SSEreduced − SSEcomplete] should be large.
12.4 Testing a Subset of the Regression Coefficients
Note: dfreduced - dfcomplete = k – g; dfcomplete = n – (k + 1)
Note: SSEcomplete/dfcomplete = MSEcomplete
Note: Other versions of the partial F-test are in H,O&G.
• Decision criterion: Reject H0 if Partial F > Fα,k-g,n-k-1
12.4 Testing a Subset of the Regression Coefficients
Exercise 13.55: Test H0: βIncome = βEducn = 0 (from the Minitab output that follows).

Partial F = [(SSEreduced − SSEcomplete) / (dfreduced − dfcomplete)] / [SSEcomplete / dfcomplete]
          = [(1.288 − 1.1937) / 2] / 0.0078 = 6.045

Since 6.045 > F.05, 2, 154 = 3.055, reject H0. Either Income or Educn adds predictive value to a model that contains Age and Owner.
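The partial F computation can be verified from the two ANOVA tables that follow (a Python sketch; dividing by the unrounded SSEcomplete/dfcomplete gives about 6.08 rather than 6.045, which used the rounded MSE 0.0078):

```python
# Partial F: do Income and Educn add to a model already containing Age and Owner?
sse_reduced,  df_reduced  = 1.288,  156   # reduced model: Age and Owner only
sse_complete, df_complete = 1.1937, 154   # complete model: all four predictors

partial_f = ((sse_reduced - sse_complete) / (df_reduced - df_complete)) \
            / (sse_complete / df_complete)
print(round(partial_f, 2))  # about 6.08 (6.045 on the slide via the rounded MSE)

assert partial_f > 3.055    # F(.05, 2, 154) = 3.055, so reject H0
```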
12.4 Testing a Subset of the Regression Coefficients
Regression Analysis: Purch_1 versus Age_1, Income_1, Owner_1, Educn_1
The regression equation is
Purch_1 = - 0.797 + 0.0336 Age_1 + 0.00927 Income_1 + 0.112 Owner_1 + 0.00928 Educn_1
S = 0.08804 R-Sq = 95.0% R-Sq(adj) = 94.8%
Analysis of Variance
Source           DF    SS       MS      F       P
Regression        4    22.4678  5.6170  724.62  0.000
Residual Error  154     1.1937  0.0078
Total           158    23.6616
12.4 Testing a Subset of the Regression Coefficients
Regression Analysis: PURCH_1 versus AGE_1, OWNER_1
The regression equation is
PURCH_1 = - 0.602 + 0.0402 AGE_1 + 0.220 OWNER_1
S = 0.09085 R-Sq = 94.6% R-Sq(adj) = 94.5%
Analysis of Variance
Source DF SS MS F P
Regression 2 22.374 11.187 1355.48 0.000
Residual Error 156 1.288 0.008
Total 158 23.662
Section 12.5: Forecasting Using Multiple Regression
12.5 Forecasting Using Multiple Regression
• A major purpose of regression is to make predictions using the fitted model.
• In simple regression, we could obtain a confidence interval for E(Y) or a prediction interval for an individual Y.
• In both cases, the danger of extrapolation must be considered.
• Extrapolation occurs when using values of x far outside the range of x-values used to build the fitted model.
12.5 Forecasting Using Multiple Regression
• In regressing Sales on Advertising Expenditures, Advertising Expenditures ranged from 1 to 6.
It would be incorrect to obtain a Confidence Interval for E(Y) or a Prediction Interval for Y far outside this range.
We don’t know if the fitted model is valid outside this range.
• In multiple regression, one must consider not only the range of each predictor but the set of values of the predictors taken together.
12.5 Forecasting Using Multiple Regression
• Consider the following example:

Example:
Y = sales revenue per region (tens of thousands of dollars)
x1 = advertising expenditures (thousands of dollars)
x2 = median household income (thousands of dollars)

The values for x1 and x2 are:

Region  A   B   C   D   E   F   G   H   I   J
x1      1   2   1   3   2   4   3   5   5   6
x2      32  38  42  35  41  43  46  44  48  45
12.5 Forecasting Using Multiple Regression
• The scatterplot for x1 vs. x2 follows.
12.5 Forecasting Using Multiple Regression
• Extrapolation occurs when using the fitted model to predict outside the elbow-shaped region of observed (x1, x2) pairs.

• This would occur when Advertising Expenditures is 5 and Income is 35.

• The Minitab output follows.
12.5 Forecasting Using Multiple Regression
Regression Analysis: Sales versus Adv Exp, Income
The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predicted Values for New Observations
New Obs  Fit    SE Fit  95% CI          95% PI
1        2.703  0.530   (1.451, 3.956)  (0.913, 4.494)  X
X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations
New Obs  Adv Exp  Income
1        5.00     35.0

Minitab indicates that this set of values for x1 and x2 is an outlier.
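Minitab's Fit of 2.703 is just the fitted equation evaluated at the new point; the X flag is the real warning (a Python sketch):

```python
# Point prediction at Adv Exp = 5, Income = 35 using the fitted equation
b0, b1, b2 = -5.091, 0.4158, 0.1633
fit = b0 + b1 * 5.0 + b2 * 35.0
print(fit)  # approximately 2.70, matching Minitab's Fit of 2.703

# Each coordinate lies inside its own observed range (x1: 1-6, x2: 32-48),
# yet the pair (5, 35) falls outside the joint region of the data,
# so Minitab flags the prediction as an extrapolation.
```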
Keywords: Chapter 12
• Multiple regression model
• Partial slopes
• First-order model
• Adjusted coefficient of determination, Ra²
• Multicollinearity
• Variance inflation factor (VIF)
• Overall F-test
• t-test
• Complete model
• Reduced model
• Partial F-test
• Extrapolation
Summary of Chapter 12
• The Multiple Linear Regression Model
• Interpreting the slope coefficient of a single predictor in a multiple regression model
• Understanding the difference between the coefficient of determination (R²) and the adjusted coefficient of determination (Ra²)
• The detection of multicollinearity and its impact
• Using the F statistic to test the overall utility of the predictors
• Using the t-test to test the additional value of a single predictor
• Using the partial F test for assessing the significance of a subset of predictors
• The meaning of extrapolation in multiple regression