Multiple Regression Methods


Transcript of Multiple Regression Methods

Hildebrand & Ott, Statistical Thinking for Managers, 4th edition, Chapter 12
Hildebrand, Ott & Gray, Basic Statistical Ideas for Managers, 2nd edition, Chapter 12

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chapter 12: Multiple Regression Methods

Hildebrand, Ott and Gray, Basic Statistical Ideas for Managers

Second Edition


Learning Objectives for Ch. 12

• The Multiple Linear Regression Model
• How to interpret a slope coefficient in the multiple regression model
• The reason for using the adjusted coefficient of determination in multiple regression
• The meaning of multicollinearity and how to detect it
• How to test the overall utility of the predictors
• How to test the additional value of a single predictor
• How to test the significance of a subset of predictors in multiple regression
• The meaning of extrapolation



Section 12.1: The Multiple Regression Model


12.1 The Multiple Regression Model

Example (with two predictors):
• Y = sales revenue per region (tens of thousands of dollars)
• x1 = advertising expenditures (thousands of dollars)
• x2 = median household income (thousands of dollars)

The data follow:


12.1 The Multiple Regression Model

• The data are:

Region  Sales  Adv Exp  Income
A         1       1       32
B         1       2       38
C         2       1       42
D         2       3       35
E         3       2       41
F         3       4       43
G         4       3       46
H         4       5       44
I         5       5       48
J         5       6       45

• A graphical representation follows.


12.1 The Multiple Regression Model

• Objective: Fit a plane through the points

[3D scatterplot of Sales vs Income vs Adv Exp]


12.1 The Multiple Regression Model

• Population Model:

E(Y) = β0 + β1x1 + … + βkxk

or

Y = β0 + β1x1 + … + βkxk + ε

where ε is the error term.

• Interpretation of any βj:
• Change in Y per unit change in xj, when all other independent variables are held constant.
• βj is called the partial slope, j = 1, 2, …, k


12.1 The Multiple Regression Model

• First-order model ⇒ No higher-order terms or interaction terms

• An interaction term is the product of two predictors: x1x2

• With an interaction term, the change in E(Y) per unit change in x1 depends on the value of x2.


Section 12.2: Estimating Multiple Regression Coefficients


12.2 Estimating Multiple Regression Coefficients

• Criterion used to estimate the β's: Method of Least Squares
⇒ minimize the sum of squared residuals

• Symbolically: min Σ (yi − ŷi)²

• We will use software to do the calculations.


12.2 Estimating Multiple Regression Coefficients

Example (Y = Sales, x1 = Advertising Expenditures, x2 = Median Household Income):

The Minitab output follows.

Regression Analysis: Sales versus Adv Exp, Income

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predictor     Coef  SE Coef      T      P    VIF
Constant    -5.091    1.720  -2.96  0.021
Adv Exp     0.4158   0.1367   3.04  0.019  1.8
Income      0.1633   0.0475   3.44  0.011  1.8

S = 0.541142   R-Sq = 89.8%   R-Sq(adj) = 86.8%
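The fit can be reproduced outside Minitab. A minimal sketch (assuming numpy is available) that applies ordinary least squares to the ten regions listed earlier:

```python
import numpy as np

# Data for the ten regions (Sales, Adv Exp, Income) from Section 12.1
sales  = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)
adv    = np.array([1, 2, 1, 3, 2, 4, 3, 5, 5, 6], dtype=float)
income = np.array([32, 38, 42, 35, 41, 43, 46, 44, 48, 45], dtype=float)

# Design matrix with an intercept column; lstsq minimizes the sum of
# squared residuals, i.e. the least-squares criterion of Section 12.2
X = np.column_stack([np.ones_like(sales), adv, income])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(b)  # approximately [-5.091, 0.4158, 0.1633], matching the output
```

The three entries of `b` are the estimates of β0, β1, and β2.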


12.2 Estimating Multiple Regression Coefficients

• The fitted model is:

ŷ = −5.09 + 0.416x1 + 0.163x2

vs. the two simple linear regressions:

ŷ = 0.681 + 0.725x1   and   ŷ = −7.69 + 0.258x2

“The coefficient of an independent variable xj in a multiple regression equation does not, in general, equal the coefficient that would apply to that variable in a simple linear regression. In multiple regression, the coefficient refers to the effect of changing that variable while other independent variables stay constant. In simple linear regression, all other potential independent variables are ignored.” (Hildebrand, Ott and Gray)


12.2 Estimating Multiple Regression Coefficients

• Interpretation of 0.416: An additional unit (an increase of $1,000) of Advertising Expenditures leads to a 0.416 increase in Sales when Median Household Income is held fixed, i.e., regardless of whether x2 is 32 or 48.

• Does this seem reasonable?
• If Advertising Expenditures are increased by 1 unit, do you expect Sales to increase by 0.416 units regardless of whether the region has income of $32,000 or $48,000?


12.2 Estimating Multiple Regression Coefficients

• The output also gives the estimate of σε, both directly and indirectly.

• Indirectly, σε can be estimated as follows:

sε = sqrt[(Sum of Squared Residuals) / df Error] = sqrt[MS(Residual Error)]

where df Error = n − (k + 1) = n − k − 1

• The estimate of σε can also be read directly from “S” on the output.


12.2 Estimating Multiple Regression Coefficients

Example (Sales vs. Adv. Exp. and Income): The Minitab output follows.

Regression Analysis: Sales versus Adv Exp, Income

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

S = 0.541142   R-Sq = 89.8%   R-Sq(adj) = 86.8%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  17.9502  8.9751  30.65  0.000
Residual Error   7   2.0498  0.2928
Total            9  20.0000


12.2 Estimating Multiple Regression Coefficients

• Use the output to locate the estimate of σε.

From the output, sε = 0.5411

Or, sε = sqrt[MS(Error)] = sqrt(0.2928) = 0.541
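As a quick arithmetic check, a sketch using only the ANOVA numbers from the output above:

```python
import math

sse, n, k = 2.0498, 10, 2    # Residual SS, observations, predictors
df_error = n - (k + 1)       # 10 - 3 = 7 error degrees of freedom
mse = sse / df_error         # MS(Residual Error)
s_eps = math.sqrt(mse)       # estimate of sigma_epsilon

print(round(mse, 4), round(s_eps, 4))  # 0.2928 and 0.5411, matching "S"
```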


12.2 Estimating Multiple Regression Coefficients

• Coefficient of Determination (R² or R²y·x1x2…xk)

• Concept: “…we define the coefficient of determination as the proportional reduction in the squared error of Y, which we obtain by knowing the values of x1, x2, …, xk.” (Hildebrand, Ott and Gray)

• As in simple regression,

R² = SSR/SST = 1 − SSE/SST


12.2 Estimating Multiple Regression Coefficients

Example (Sales vs. Adv. Exp. and Income):

From the output, “R-Sq = 89.8%”

Interpretation: 89.8% of the variation in Sales is explained by a multiple regression model with Adv. Exp. and Income as predictors.


12.2 Estimating Multiple Regression Coefficients

• Adjusted Coefficient of Determination (Ra²)

• SSE and SST are each divided by their degrees of freedom:

Ra² = 1 − [SSE/(n − (k + 1))] / [SST/(n − 1)] = 1 − [(n − 1)/(n − (k + 1))] (SSE/SST)

• Since (n − 1)/(n − (k + 1)) > 1 ⇒ Ra² < R²


12.2 Estimating Multiple Regression Coefficients

• Why use Ra²?

• SST is fixed, regardless of the number of predictors.

• SSE decreases when more predictors are used.
⇒ R² increases when more predictors are used.

• However, Ra² can decrease when another predictor is added to the fitted model, even though R² increases.

• Why? The decrease in SSE is offset by the loss of a degree of freedom in [n − (k + 1)] for SSE.


12.2 Estimating Multiple Regression Coefficients

• The following example illustrates this.

Example: For a fitted model with 10 observations, suppose SST = 50. When k = 2, SSE = 5. When k = 3, SSE = 4.5.

K = 2: R² = 1 − 5/50 = .90;  Ra² = 1 − (9/7)(5/50) = .871

K = 3: R² = 1 − 4.5/50 = .91;  Ra² = 1 − (9/6)(4.5/50) = .865

• Even though there has been a modest increase in R², Ra² has decreased.
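The two cases can be checked directly from the definitions. A sketch:

```python
def r2(sse, sst):
    # Coefficient of determination: proportional reduction in squared error
    return 1 - sse / sst

def adj_r2(sse, sst, n, k):
    # Adjusted R^2: SSE and SST each divided by their degrees of freedom
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

n, sst = 10, 50.0
print(r2(5.0, sst), adj_r2(5.0, sst, n, 2))   # 0.90 and about 0.871
print(r2(4.5, sst), adj_r2(4.5, sst, n, 3))   # 0.91 and about 0.865
```

R² rises from .90 to .91 while Ra² falls from .871 to .865, exactly the behavior described above.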


12.2 Estimating Multiple Regression Coefficients

• Sequential Sum of Squares (SEQ SS)
• Concept: The incremental contributions to SS(Regression) when the predictors enter the model in the order specified by the user.

Example (Sales vs. Adv. Exp. and Income): The Minitab output follows for when Adv Exp is entered first.

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  17.9502  8.9751  30.65  0.000

Source    DF   Seq SS
Adv Exp    1  14.4928
Income     1   3.4574


12.2 Estimating Multiple Regression Coefficients

• SS(Regression using x1 and x2) ≡ SSR(x1, x2) = 17.9502

• SS(Regression using x1 only) ≡ SSR(x1) = 14.4928

• SS(Regression for x2 when x1 is already in the model) ≡ SSR(x2 | x1) = 3.4574


12.2 Estimating Multiple Regression Coefficients

Example (Sales vs. Adv. Exp. and Income): The Minitab output follows for when Income is entered first.

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  17.9502  8.9751  30.65  0.000

Source    DF   Seq SS
Income     1  15.2408
Adv Exp    1   2.7093


12.2 Estimating Multiple Regression Coefficients

• SS(Regression using x1 and x2) ≡ SSR(x1, x2) = 17.9502 {Unchanged}

• SS(Regression using x2 only) ≡ SSR(x2) = 15.2408

• SS(Regression for x1 when x2 is already in the model) ≡ SSR(x1 | x2) = 2.7093

• Regardless of which predictor is entered first, the sequential sums of squares, when added, equal SS(Regression).
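Whichever ordering is used, the sequential sums of squares must add back to SSR(x1, x2). A quick check with the numbers from the two Minitab runs:

```python
ssr_total = 17.9502                      # SSR(x1, x2) from the ANOVA table
order_adv_first    = [14.4928, 3.4574]   # Adv Exp entered first
order_income_first = [15.2408, 2.7093]   # Income entered first

for seq in (order_adv_first, order_income_first):
    # Each ordering's Seq SS column sums to SS(Regression), up to rounding
    print(round(sum(seq), 4))
```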


Section 12.3: Inferences in Multiple Regression


12.3 Inferences in Multiple Regression

• Objective: Build a parsimonious model
⇒ as few predictors as necessary

• Must now assume the errors in the population model are normally distributed.

• F-test for the overall model:

H0: β1 = β2 = … = βk = 0   vs.   Ha: at least one βj ≠ 0


12.3 Inferences in Multiple Regression

• Test Statistic: F = MS(Regression) / MS(Residual Error)

• Concept: “If SS(Regression) is large relative to SS(Residual), the indication is that there is real predictive value in [some of] the independent variables x1, x2, …, xk.” (Hildebrand, Ott and Gray)

• Decision Rule: Reject H0 if F > Fα, k, n−k−1, or reject H0 if p-value < α.


12.3 Inferences in Multiple Regression

Example (Sales vs. Adv. Exp. and Income):

The Minitab output follows:

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  17.9502  8.9751  30.65  0.000
Residual Error   7   2.0498  0.2928
Total            9  20.0000


12.3 Inferences in Multiple Regression

Test H0: β1 = β2 = 0 vs. Ha: at least one βj ≠ 0 at the 5% level.

Since F = 30.65 > F.05, 2, 7 = 4.74, reject H0 at the 5% level.

Or since p-value = 0.000 < .05, reject H0 at the 5% level.

Implication: At least one of the x’s has some predictive power.
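The F statistic can be recomputed from the ANOVA table. A sketch (the critical value 4.74 is taken from the slide, not computed here):

```python
ms_regression = 8.9751   # MS for Regression, df = 2
ms_error      = 0.2928   # MS for Residual Error, df = 7

f_stat = ms_regression / ms_error
print(round(f_stat, 2))  # 30.65, as on the output

# Compare with the tabled critical value F(.05, 2, 7) = 4.74
assert f_stat > 4.74     # reject H0: beta1 = beta2 = 0
```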


12.3 Inferences in Multiple Regression

• t-test for Significance of an Individual Predictor

• H0: βj = 0 vs. Ha: βj ≠ 0, j = 1, 2, …, k

• H0 implies that xj has no additional predictive value as the last predictor added to a model that contains all the other predictors.

• Test Statistic: t = (β̂j − 0) / sβ̂j

where sβ̂j is the estimated standard error of β̂j


12.3 Inferences in Multiple Regression

• In Minitab notation, T = (Coef) / (SE Coef)

• Decision Rule: Reject H0 if |t| > tα/2, n−k−1, or reject H0 if p-value < α.

• Warning: Limit the number of t-tests to avoid a high overall Type I error rate.


12.3 Inferences in Multiple Regression

Example (Sales vs. Adv. Exp. and Income):

The Minitab output follows:

Predictor     Coef  SE Coef      T      P    VIF
Constant    -5.091    1.720  -2.96  0.021
Adv Exp     0.4158   0.1367   3.04  0.019  1.8
Income      0.1633   0.0475   3.44  0.011  1.8

Test H0: β1 = 0 vs. Ha: β1 ≠ 0 at the 5% level.
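Minitab's T column is just Coef divided by SE Coef. A sketch for the two predictors:

```python
# (Coef, SE Coef) pairs read from the output above
coefs = {"Adv Exp": (0.4158, 0.1367), "Income": (0.1633, 0.0475)}

for name, (coef, se) in coefs.items():
    t = coef / se              # T = (Coef) / (SE Coef)
    print(name, round(t, 2))   # 3.04 and 3.44, matching the T column
```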


12.3 Inferences in Multiple Regression

Since t = 3.04 > t.025, 7 = 2.365, reject H0: β1 = 0 at the 5% level.

Or since p-value = .019 < .05, reject H0: β1 = 0 at the 5% level.

Implication: Advertising Expenditures provides additional predictive value to a model having Income as a predictor.


12.3 Inferences in Multiple Regression

• Multicollinearity
Concept: High correlation between at least one pair of predictors, e.g., x1 and x2
⇒ Correlated x's provide no new information.
Example: In predicting heights of adults using the length of the right leg, the length of the left leg would be of little value.

• Symptoms of Multicollinearity
• Wrong signs for the β̂'s
• A t-test isn't significant even though you believe the predictor is useful and should be in the fitted model.


12.3 Inferences in Multiple Regression

• Detection of Multicollinearity

• Rj² ≡ R²xj·x1…xj−1 xj+1…xk is the coefficient of determination obtained by regressing xj on the remaining (k − 1) predictors.

• If Rj² > 0.9, this is a signal that multicollinearity is present.

• This criterion can be expressed in a different way.


12.3 Inferences in Multiple Regression

• Let VIFj denote the Variance Inflation Factor of the jth predictor:

VIFj = 1 / (1 − Rj²)

• If VIFj > 10, this is a signal that multicollinearity is present.


12.3 Inferences in Multiple Regression

• Why is VIFj called the variance inflation factor for the jth predictor?

• The estimated standard error of β̂j in a multiple regression is:

sβ̂j = sε / sqrt[ Σ(xij − x̄j)² (1 − Rj²) ]

or

sβ̂j = sε sqrt[ VIFj / Σ(xij − x̄j)² ]


12.3 Inferences in Multiple Regression

• If VIFj is large, so is sβ̂j, which leads to a t-test that is not statistically significant.

• The VIF “…measures how much the variance (square of the standard error) of a coefficient is increased because of collinearity.” (Hildebrand, Ott and Gray)


12.3 Inferences in Multiple Regression

Example (Sales vs. Adv. Exp. and Income):

The Minitab output follows.

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predictor     Coef  SE Coef      T      P    VIF
Constant    -5.091    1.720  -2.96  0.021
Adv Exp     0.4158   0.1367   3.04  0.019  1.8
Income      0.1633   0.0475   3.44  0.011  1.8

Since both VIFs = 1.8 < 10, multicollinearity between Advertising Expenditures and Median Household Income is not a problem.


12.3 Inferences in Multiple Regression

• The Minitab output for regressing Adv Exp on Income follows:

The regression equation is

Adv Exp = - 6.26 + 0.229 Income

S = 1.39955 R-Sq = 43.2%

Since R2 =.432, VIF = 1/(1 - .432) = 1.8, as shown.
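The VIF calculation is easy to verify. A sketch using the R² from regressing Adv Exp on Income:

```python
def vif(r_squared):
    # Variance inflation factor for a predictor with the given R_j^2
    return 1.0 / (1.0 - r_squared)

print(round(vif(0.432), 1))     # 1.8, matching the VIF column on the output

# The two rules of thumb agree: R_j^2 = 0.9 corresponds to VIF = 10
assert abs(vif(0.9) - 10) < 1e-6
```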


12.3 Inferences in Multiple Regression

• To illustrate multicollinearity, consider Exercise 12.19.
• Exercise 12.19: A study of demand for imported subcompact cars consists of data from 12 metropolitan areas. The variables are:

Demand: Imported subcompact car sales as a percentage of total sales
Educ: Average number of years of schooling completed by adults
Income: Per capita income
Popn: Area population
Famsize: Average size of intact families


12.3 Inferences in Multiple Regression

The Minitab output follows:

The regression equation is
Demand = - 1.3 + 5.55 Educ + 0.89 Income + 1.92 Popn - 11.4 Famsize

Predictor      Coef  SE Coef      T      P    VIF
Constant      -1.32    57.98  -0.02  0.982
Educ          5.550    2.702   2.05  0.079   8.8
Income        0.885    1.308   0.68  0.520   4.0
Popn          1.925    1.371   1.40  0.203   1.6
Famsize     -11.389    6.669  -1.71  0.131  12.3

S = 2.68628   R-Sq = 96.2%   R-Sq(adj) = 94.1%


12.3 Inferences in Multiple Regression

• Is there a multicollinearity (MC) problem?

Since the VIF = 12.3 > 10 for the variable “Famsize”, there is a MC problem.

• Note that the p-value for the F test = 0.000, indicating that at least one of the x’s has predictive value. However, the smallest p-value for any t-test is 0.079, indicating that no individual x adds significant predictive value given the others — a classic sign of multicollinearity.

• What is the source of the MC problem?
A matrix plot, in conjunction with the correlations, could be useful. This exercise will be revisited in Section 13.1.


12.3 Inferences in Multiple Regression

• Remedies if multicollinearity is a problem:

• Eliminate one or more of the collinear predictors.

• Form a new predictor that is a surrogate for the collinear predictors.

• Multicollinearity could occur if one of the predictors is x².

• This can be eliminated by using (x − x̄)² as the predictor.


Section 12.4: Testing a Subset of the Regression Coefficients


12.4 Testing a Subset of the Regression Coefficients

• To illustrate the concept, consider Exercise 13.55.

Exercise 13.55: A bank that offers charge cards to customers studies the yearly purchase amount (in thousands of dollars) on the card as related to the age, income (in thousands of dollars), home ownership, and years of education of the cardholder. The variable Owner equals 1 if the cardholder owns a home and 0 if the cardholder rents a home. The other variables are self-explanatory. The original data set has information on 160 cardholders. Upon further examination of the data, you decide to remove the data for cardholder 129 because this is an older individual who has a high income from having saved early in life and having invested successfully. This cardholder travels extensively and frequently uses her/his charge card.


12.4 Testing a Subset of the Regression Coefficients

Problem to be investigated:

The income and education predictors measure the economic well-being of a cardholder.

Do these predictors have any predictive value given the age and home ownership variables?

The null hypothesis is that the β’s corresponding to these predictors are simultaneously equal to 0.


12.4 Testing a Subset of the Regression Coefficients

• General Case

Complete Model: E(Y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk

Null hypothesis: H0: βg+1 = … = βk = 0

Reduced Model: E(Y) = β0 + β1x1 + … + βgxg


12.4 Testing a Subset of the Regression Coefficients

• Exercise 13.55

Complete Model: E(Y) = β0 + β1(Age) + β2(Owner) + β3(Income) + β4(Educn)

Null hypothesis: H0: βIncome = βEducn = 0

Reduced Model: E(Y) = β0 + β1(Age) + β2(Owner)


12.4 Testing a Subset of the Regression Coefficients

• The test statistic is called the Partial F statistic:

Partial F = { [SSEreduced − SSEcomplete] / [dfreduced − dfcomplete] } / { SSEcomplete / dfcomplete }

• Rationale
• SSE decreases as new terms are added to the model.
• If the x’s from (g + 1) to k have predictive ability, then SSEcomplete should be much smaller than SSEreduced.
• Their difference [SSEreduced − SSEcomplete] should be large.


12.4 Testing a Subset of the Regression Coefficients

Note: dfreduced - dfcomplete = k – g; dfcomplete = n – (k + 1)

Note: SSEcomplete/dfcomplete = MSEcomplete

Note: Other versions of the partial F-test are in H,O&G.

• Decision criterion: Reject H0 if Partial F > Fα,k-g,n-k-1


12.4 Testing a Subset of the Regression Coefficients

Exercise 13.55: Test H0: βIncome = βEducn = 0.

Partial F = { [SSEreduced − SSEcomplete] / [dfreduced − dfcomplete] } / { SSEcomplete / dfcomplete }
          = { [1.288 − 1.1937] / 2 } / 0.0078
          = 6.045

(from the Minitab output that follows)

Since 6.045 > F.05, 2, 154 = 3.055, reject H0. Either Income or Educn adds predictive value to a model that contains Age and Owner.
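The partial F works out the same way from the SSE and df values of the two fits. A sketch (the slide's 6.045 comes from using the rounded MSE 0.0078; the unrounded ratio is about 6.08):

```python
def partial_f(sse_red, df_red, sse_comp, df_comp):
    # Partial F statistic for testing the coefficients dropped
    # when going from the complete model to the reduced model
    return ((sse_red - sse_comp) / (df_red - df_comp)) / (sse_comp / df_comp)

# Complete model (Age, Owner, Income, Educn): SSE = 1.1937, df = 154
# Reduced model (Age, Owner):                 SSE = 1.288,  df = 156
f = partial_f(1.288, 156, 1.1937, 154)
print(round(f, 2))

assert f > 3.055   # exceeds F(.05, 2, 154) = 3.055, so reject H0
```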


12.4 Testing a Subset of the Regression Coefficients

Regression Analysis: Purch_1 versus Age_1, Income_1, Owner_1, Educn_1

The regression equation is
Purch_1 = - 0.797 + 0.0336 Age_1 + 0.00927 Income_1 + 0.112 Owner_1 + 0.00928 Educn_1

S = 0.08804   R-Sq = 95.0%   R-Sq(adj) = 94.8%

Analysis of Variance

Source          DF       SS      MS       F      P
Regression       4  22.4678  5.6170  724.62  0.000
Residual Error 154   1.1937  0.0078
Total          158  23.6616


12.4 Testing a Subset of the Regression Coefficients

Regression Analysis: PURCH_1 versus AGE_1, OWNER_1

The regression equation is
PURCH_1 = - 0.602 + 0.0402 AGE_1 + 0.220 OWNER_1

S = 0.09085 R-Sq = 94.6% R-Sq(adj) = 94.5%

Analysis of Variance

Source DF SS MS F P

Regression 2 22.374 11.187 1355.48 0.000

Residual Error 156 1.288 0.008

Total 158 23.662


Section 12.5 Forecasting Using Multiple Regression


12.5 Forecasting Using Multiple Regression

• A major purpose of regression is to make predictions using the fitted model.

• In simple regression, we could obtain a confidence interval for E(Y) or a prediction interval for an individual Y.

• In both cases, the danger of extrapolation must be considered.

• Extrapolation occurs when using values of x far outside the range of x-values used to build the fitted model.
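As a quick illustration of the interval arithmetic, the confidence interval for E(Y) is Fit ± t(.025, df_error) × SE Fit. A sketch using the Fit and SE Fit values from the Minitab output later in this section (n = 10 regions, 3 coefficients, so df_error = 7 and t(.025, 7) = 2.365):

```python
# 95% CI for E(Y) at a new observation: Fit +/- t(.025, df_error) * SE Fit.
# Fit and SE Fit are copied from the Minitab output in this section.
fit, se_fit, t = 2.703, 0.530, 2.365

lo, hi = fit - t * se_fit, fit + t * se_fit
print(round(lo, 3), round(hi, 3))  # ~ (1.450, 3.956); Minitab prints (1.451, 3.956)
```

The small discrepancy is rounding in the printed Fit and SE Fit; the prediction interval for an individual Y is wider because it also includes the error variance s² under the square root.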


12.5 Forecasting Using Multiple Regression

• In regressing Sales on Advertising Expenditures, Advertising Expenditures ranged from 1 to 6.

It would be incorrect to obtain a Confidence Interval for E(Y) or a Prediction Interval for Y far outside this range.

We don’t know if the fitted model is valid outside this range.

• In multiple regression, one must consider not only the range of each predictor but the set of values of the predictors taken together.


12.5 Forecasting Using Multiple Regression

• Consider the following example:

Y  = sales revenue per region (tens of thousands of dollars)
x1 = advertising expenditures (thousands of dollars)
x2 = median household income (thousands of dollars)

The values for x1 and x2 are:

Region   A    B    C    D    E    F    G    H    I    J
x1       1    2    1    3    2    4    3    5    5    6
x2      32   38   42   35   41   43   46   44   48   45


12.5 Forecasting Using Multiple Regression

• The scatterplot for x1 vs. x2 follows.


12.5 Forecasting Using Multiple Regression

• Extrapolation occurs when using the fitted model to predict outside the elbow-shaped region.

• This would occur, for example, at Advertising Expenditures = 5 and Income = 35: each value lies within the range of its own predictor, but the pair lies outside the region covered by the data.

• The Minitab output follows


12.5 Forecasting Using Multiple Regression

Regression Analysis: Sales versus Adv Exp, Income

The regression equation is
Sales = - 5.09 + 0.416 Adv Exp + 0.163 Income

Predicted Values for New Observations
New Obs    Fit  SE Fit          95% CI          95% PI
1        2.703   0.530  (1.451, 3.956)  (0.913, 4.494)  X
X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations
New Obs  Adv Exp  Income
1           5.00    35.0

Minitab flags this set of values for x1 and x2 as an outlier in the predictors.
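The "X" flag reflects high leverage: h0 = v′(X′X)⁻¹v for the new predictor vector v = (1, 5, 35). Minitab's exact flagging rule is not shown in the output, so as a sketch this uses the common 3p/n rule of thumb (here 3·3/10 = 0.9), with the ten regions' data from this section:

```python
# Leverage of the new point (x1 = 5, x2 = 35) for the ten-region example.
# A leverage above 3p/n (= 0.9 here) marks an outlier in the predictors.
x1 = [1, 2, 1, 3, 2, 4, 3, 5, 5, 6]
x2 = [32, 38, 42, 35, 41, 43, 46, 44, 48, 45]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept

# X'X (3x3) and the new predictor vector v
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
v = [1.0, 5.0, 35.0]

# Solve (X'X) w = v by Gauss-Jordan elimination with partial pivoting,
# then h0 = v . w, which equals v' (X'X)^{-1} v.
A = [row[:] + [v[i]] for i, row in enumerate(XtX)]
for c in range(3):
    p = max(range(c, 3), key=lambda r: abs(A[r][c]))
    A[c], A[p] = A[p], A[c]
    for r in range(3):
        if r != c:
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
w = [A[i][3] / A[i][i] for i in range(3)]

h0 = sum(vi * wi for vi, wi in zip(v, w))
print(round(h0, 3))  # ~0.959, well above 0.9: very high leverage
```

Even though x1 = 5 and x2 = 35 are each inside their own ranges, the combination sits outside the elbow-shaped region of the data, which is exactly what the leverage measures.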


Keywords: Chapter 12

• Multiple regression model
• Partial slopes
• First-order model
• Adjusted Coefficient of Determination, Ra²
• Multicollinearity
• Variance Inflation Factor
• Overall F test
• t-test
• Complete model
• Reduced model
• Partial F test
• Extrapolation



Summary of Chapter 12

• The Multiple Linear Regression Model
• Interpreting the slope coefficient of a single predictor in a multiple regression model
• Understanding the difference between the coefficient of determination (R²) and the adjusted coefficient of determination (Ra²)
• The detection of multicollinearity and its impact
• Using the F statistic to test the overall utility of the predictors
• Using the t-test to test the additional value of a single predictor
• Using the partial F test for assessing the significance of a subset of predictors
• The meaning of extrapolation in multiple regression
