1_regresi Model Building Methodology

Post on 28-Oct-2014


Ch. 1 Regression: "Model Building Methodology"

Setyo Tri Wahyudi

Introduction

Correlation: a measure of the strength of the relationship between two variables, for example X1 and X2. The coefficient lies between -1 and 1; in absolute value, the closer it is to 0 the weaker the (linear) relationship, and a value of 1 indicates perfect correlation.

Regression: the process of building a mathematical model or function that can be used to predict or determine one variable from other variables.

Types of Regression

Linear Regression

Linear regression is a form of relationship in which both the independent variable X and the dependent variable Y enter as terms raised to the first power.

Linear regression is divided into:

1) Simple linear regression, with the functional form:

   $Y = a + bX + e$

2) Multiple linear regression, with the functional form:

   $Y = b_0 + b_1 X_1 + \dots + b_k X_k + e$

Functions 1) and 2) describe a straight line (simple linear) and a plane (multiple linear), respectively.

Non-Linear Regression

Non-linear regression is a form of relationship or function in which the variable X and/or the variable Y can enter as terms raised to some power. Some forms of non-linear regression are:

1) Polynomial regression: regression with a single independent variable entering as terms of successive powers.

   $Y = a + bX + cX^2$ (quadratic function)
   $Y = a + bX + cX^2 + dX^3$ (cubic function)
   $Y = a + bX + cX^2 + dX^3 + eX^4$ (quartic function)
   $Y = a + bX + cX^2 + dX^3 + eX^4 + fX^5$ (quintic function), and so on.

2) Hyperbolic regression (reciprocal function): the independent variable X or the dependent variable Y can act as a denominator, so this regression is also called regression with a fractional or reciprocal function. It has a functional form such as:

   $1/Y = a + bX$

3) Exponential regression: regression in which the independent variable X acts as a power or exponent. Its functional form is:

   $Y = a e^{bX}$

Simple vs. Multiple Regression

• Simple: the model contains two variables
  – dependent variable, the variable to be predicted, usually called Y
  – independent variable, the predictor or explanatory variable, usually called X

  $Y = \beta_0 + \beta_1 X_1 + \varepsilon$

• Multiple: the model contains two or more independent variables

  $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$

Evaluating Regression Model

Testing the Overall Model

$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
$H_a:$ At least one of the regression coefficients is $\neq 0$

Significance Tests for Individual Regression Coefficients

$H_0: \beta_1 = 0$    $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$    $H_a: \beta_2 \neq 0$
$H_0: \beta_3 = 0$    $H_a: \beta_3 \neq 0$
...
$H_0: \beta_k = 0$    $H_a: \beta_k \neq 0$

Testing the Overall Model (F test)

$H_0: \beta_1 = \beta_2 = 0$
$H_a:$ At least one of the regression coefficients is $\neq 0$

$MSR = \frac{SSR}{k}$,  $MSE = \frac{SSE}{n - k - 1}$,  $F = \frac{MSR}{MSE}$

ANOVA
                    df         SS        MS       F      p
Regression           2   8189.723   4094.86   28.63   .000
Residual (Error)    20   2861.017    143.1
Total               22  11050.74

$F_{.01,2,20} = 5.85$

$F_{Cal} = 28.63 > 5.85$, reject $H_0$.
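The F computation can be checked numerically. The following is a minimal Python sketch using the SSR, SSE, n and k from the ANOVA table; the function name `f_statistic` is only illustrative, and the critical value 5.85 is the one quoted on the slide, not computed here.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k predictors and n observations."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean square error
    return msr, mse, msr / mse


# Values taken from the ANOVA table on the slide
msr, mse, f_cal = f_statistic(ssr=8189.723, sse=2861.017, n=23, k=2)

f_critical = 5.85              # F(.01, 2, 20) as quoted on the slide
reject_h0 = f_cal > f_critical

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_cal:.2f}, reject H0: {reject_h0}")
```

With k = 2 and n = 23 the error degrees of freedom are 20, which matches the df column of the table.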

Significance Test of the Regression Coefficients (t test)

$H_0: \beta_1 = 0$    $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$    $H_a: \beta_2 \neq 0$

$t_{.025,20} = 2.086$

$t_{Cal} = 5.63 > 2.086$, reject $H_0$.

Residuals and Sum of Squares Error

Residual: $Y - \hat{Y}$

$SSE = \sum (Y - \hat{Y})^2$

SSE and Standard Error of the Estimate

$S_e = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{2861}{23 - 2 - 1}} = 11.96$

where: n = number of observations
       k = number of independent variables

ANOVA
                    df        SS       MS       F      P
Regression           2    8189.7   4094.9   28.63   .000
Residual (Error)    20    2861.0    143.1
Total               22   11050.7

Coefficient of Determination ($R^2$)

$R^2 = \frac{SSR}{SS_{YY}} = \frac{8189.723}{11050.74} = .741$

$R^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{2861.017}{11050.74} = .741$

$SS_{YY} = SSR + SSE$

Adjusted $R^2$

$R^2_{adj} = 1 - \frac{SSE / (n - k - 1)}{SS_{YY} / (n - 1)} = 1 - \frac{2861.017 / (23 - 2 - 1)}{11050.74 / (23 - 1)} = 1 - .285 = .715$
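The three summary measures (standard error of the estimate, R² and adjusted R²) all follow from the same ANOVA quantities. A short Python sketch, assuming only the SSR, SSE, n and k values shown on the slides; the variable names are illustrative:

```python
import math

# Values from the ANOVA table on the slides
ssr, sse, n, k = 8189.723, 2861.017, 23, 2
ss_yy = ssr + sse                          # total sum of squares

s_e = math.sqrt(sse / (n - k - 1))         # standard error of the estimate
r2 = 1 - sse / ss_yy                       # coefficient of determination
adj_r2 = 1 - (sse / (n - k - 1)) / (ss_yy / (n - 1))

print(f"Se = {s_e:.2f}, R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Adjusted R² is smaller than R² here because it penalises the two predictors through the degrees of freedom.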

Model-Building

• Stepwise Regression
• Forward Selection
• Backward Elimination
• All Possible Regressions

Stepwise Regression

• Perform k simple regressions; and select the best as the initial model
• Evaluate each variable not in the model
  – If none meet the criterion, stop
  – Add the best variable to the model; re-evaluate previous variables, and drop any which are not significant
• Return to previous step

Forward Selection

Like stepwise, except variables are not re-evaluated after entering the model

Backward Elimination

• Start with the "full model" (all k predictors)
• If all predictors are significant, stop
• Otherwise, eliminate the most non-significant predictor; return to previous step
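The backward elimination loop can be sketched in plain Python. This is only an illustration under stated assumptions: "significant" is taken to mean |t| at or above a fixed threshold `t_crit` (a hypothetical choice, not from the slides), the helpers `invert`, `ols` and `backward_eliminate` are made-up names, and the data are synthetic.

```python
import math
import random


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]


def ols(X, y):
    """OLS with an intercept; returns (coefficients, t-statistics)."""
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    t = [b / math.sqrt(mse * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t


def backward_eliminate(X, y, t_crit=2.0):
    """Drop the least significant predictor until all remaining have |t| >= t_crit."""
    keep = list(range(len(X[0])))          # start with the full model
    while keep:
        _, t = ols([[row[j] for j in keep] for row in X], y)
        # t[0] belongs to the intercept; predictor i sits at t[i + 1]
        worst = min(range(len(keep)), key=lambda i: abs(t[i + 1]))
        if abs(t[worst + 1]) >= t_crit:
            break                          # all predictors significant: stop
        keep.pop(worst)                    # eliminate and refit
    return keep


# Synthetic data: y depends on column 0 only; column 1 is unrelated noise
rng = random.Random(0)
x_signal = [float(i) for i in range(1, 31)]
x_noise = [rng.uniform(0, 10) for _ in range(30)]
y = [3 + 2 * a + rng.gauss(0, 1) for a in x_signal]
kept = backward_eliminate([[a, b] for a, b in zip(x_signal, x_noise)], y)
print("columns kept:", kept)
```

Forward selection and stepwise regression would reuse the same `ols` helper, adding the candidate with the largest |t| instead of removing the smallest.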

Data for Multiple Regression

Y    World Crude Oil Production
X1   U.S. Energy Consumption
X2   U.S. Nuclear Generation
X3   U.S. Coal Production
X4   U.S. Dry Gas Production
X5   U.S. Fuel Rate for Autos

   Y     X1      X2       X3     X4      X5
55.7   74.3    83.5    598.6   21.7   13.30
55.7   72.5   114.0    610.0   20.7   13.42
52.8   70.5   172.5    654.6   19.2   13.52
57.3   74.4   191.1    684.9   19.1   13.53
59.7   76.3   250.9    697.2   19.2   13.80
60.2   78.1   276.4    670.2   19.1   14.04
62.7   78.9   255.2    781.1   19.7   14.41
59.6   76.0   251.1    829.7   19.4   15.46
56.1   74.0   272.7    823.8   19.2   15.94
53.5   70.8   282.8    838.1   17.8   16.65
53.3   70.5   293.7    782.1   16.1   17.14
54.5   74.1   327.6    895.9   17.5   17.83
54.0   74.0   383.7    883.6   16.5   18.20
56.2   74.3   414.0    890.3   16.1   18.27
56.7   76.9   455.3    918.8   16.6   19.20
58.7   80.2   527.0    950.3   17.1   19.87
59.9   81.3   529.4    980.7   17.3   20.31
60.6   81.3   576.9   1029.1   17.8   21.02
60.2   81.1   612.6    996.0   17.7   21.69
60.2   82.1   618.8    997.5   17.8   21.68
60.6   83.9   610.3    945.4   18.2   21.04
60.9   85.6   640.4   1033.5   18.9   21.48

Stepwise: Step 1 - Simple Regression Results for Each Independent Variable

Dependent    Independent
Variable     Variable       t-Ratio     R2
Y            X1              11.77     85.2%
Y            X2               4.43     45.0%
Y            X3               3.91     38.9%
Y            X4               1.08      4.6%
Y            X5               3.54     34.2%
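Step 1 of the stepwise procedure is just a set of simple regressions. The sketch below shows one of them (Y on X1) in plain Python, using the Y and X1 columns as they appear in this transcript. The function name `simple_regression` is illustrative, and because the transcript lists only 22 rows, the printed t-ratio and R² need not match the table above.

```python
import math


def simple_regression(x, y):
    """Return (intercept, slope, t-ratio of the slope, R^2) for y = b0 + b1*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = syy - b1 * sxy                    # sum of squared residuals
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b0, b1, b1 / se_b1, 1 - sse / syy


# Y (world crude oil production) and X1 (U.S. energy consumption) from the transcript
Y = [55.7, 55.7, 52.8, 57.3, 59.7, 60.2, 62.7, 59.6, 56.1, 53.5, 53.3,
     54.5, 54.0, 56.2, 56.7, 58.7, 59.9, 60.6, 60.2, 60.2, 60.6, 60.9]
X1 = [74.3, 72.5, 70.5, 74.4, 76.3, 78.1, 78.9, 76.0, 74.0, 70.8, 70.5,
      74.1, 74.0, 74.3, 76.9, 80.2, 81.3, 81.3, 81.1, 82.1, 83.9, 85.6]

b0, b1, t_ratio, r2 = simple_regression(X1, Y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {t_ratio:.2f}, R^2 = {r2:.3f}")
```

In a simple regression the two columns of the table are tied together by t² = R²(n − 2)/(1 − R²), so ranking the candidates by |t| or by R² picks the same initial variable.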

All Possible Regressions with Five Independent Variables

Functional Forms of Regression

The term linear in a simple regression model means that the model is linear in the parameters; the variables in the regression model may or may not be linear.

True model is non-linear

[Figure: income (Y) against age (X) between ages 15 and 60, showing the non-linear PRF and a fitted linear SRF.]

Running the wrong (linear) regression model leads to wrong predictions.

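Linear vs. Nonlinear

Examples of Linear Statistical Models

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

$\ln(Y_i) = \beta_0 + \beta_1 X_i + \varepsilon_i$

$Y_i = \beta_0 + \beta_1 \ln(X_i) + \varepsilon_i$

$Y_i = \beta_0 + \beta_1 X_i^2 + \varepsilon_i$

Examples of Non-linear Statistical Models

$Y_i = \beta_0 + \beta_1 X_i^{\beta_2} + \varepsilon_i$

$Y_i = \beta_0 + \beta_1 X_i + \exp(\beta_2 X_i) + \varepsilon_i$

$Y_i^{\beta_2} = \beta_0 + \beta_1 X_i + \varepsilon_i$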

Different Functional Forms

1. Linear
2. Log-Log
3. Semilog
   • Linear-Log or Log-Linear
4. Polynomial
5. Reciprocal (or inverse)

Pay attention to each form's slope and elasticity.

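Functional Forms of Regression models

2. Log-log model:

$Y = \beta_0 X^{\beta_1} e^{\varepsilon_i}$

This is a non-linear model.

Transform into linear log-form:

$\ln Y = \ln \beta_0 + \beta_1 \ln X + \varepsilon_i$

==> $\ln Y = \beta_0^* + \beta_1 \ln X + \varepsilon_i$, where $\beta_0^* = \ln \beta_0$

==> $Y^* = \beta_0^* + \beta_1^* X^* + \varepsilon_i$, where $Y^* = \ln Y$, $X^* = \ln X$ and $\beta_1^* = \beta_1$

$\beta_1^* = \frac{dY^*}{dX^*} = \frac{d \ln Y}{d \ln X} = \frac{dY / Y}{dX / X}$ = elasticity coefficient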
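The log transformation of the log-log model can be illustrated with a small Python sketch. The data are hypothetical and noise-free (generated from Y = 2·X^1.5), and `fit_line` is an illustrative helper; the point is only that an ordinary straight-line fit on (ln X, ln Y) recovers the elasticity as the slope.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical data from Y = beta0 * X**beta1 with beta0 = 2 and beta1 = 1.5
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]

# Regress ln Y on ln X: the slope is the elasticity, the intercept is ln(beta0)
b0_star, elasticity = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
beta0 = math.exp(b0_star)

print(f"elasticity = {elasticity:.3f}, beta0 = {beta0:.3f}")
```

With real data the error term would make the recovered values estimates, not exact matches.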

Functional Forms of Regression models

[Figures: quantity demanded (Y) against price (X) for the model $Y = \beta_0 X^{\beta_1}$, each paired with the corresponding straight line $\ln Y = \ln \beta_0 + \beta_1 \ln X$ plotted with axes ln X and ln Y.]

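Functional Forms of Regression models

3. Semi-log model: log-lin model or lin-log model

$\ln Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$   (log-lin)

or

$Y_i = \beta_0 + \beta_1 \ln X_i + \varepsilon_i$   (lin-log)

Log-lin: $\beta_1 = \frac{\text{relative change in } Y}{\text{absolute change in } X} = \frac{d \ln Y}{dX} = \frac{dY / Y}{dX} = \frac{1}{Y} \frac{dY}{dX}$

Lin-log: $\beta_1 = \frac{\text{absolute change in } Y}{\text{relative change in } X} = \frac{dY}{d \ln X} = \frac{dY}{dX / X}$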

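Functional Forms of Regression models (Cont.)

4. Polynomial: quadratic term to capture the nonlinear pattern

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$

[Figures: $Y_i$ against $X_i$ for the case $\beta_1 > 0, \beta_2 < 0$ and for the case $\beta_1 < 0, \beta_2 > 0$.]

5. Reciprocal (or inverse) transformations

$Y_i = \beta_0 + \beta_1 \left(\frac{1}{X_i}\right) + \varepsilon_i$

==> $Y_i = \beta_0 + \beta_1 X_i^* + \varepsilon_i$, where $X_i^* = \frac{1}{X_i}$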

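Some features of reciprocal model

$Y = \beta_0 + \beta_1 \frac{1}{X}$

[Figures: Y against X for different sign combinations of $\beta_0$ and $\beta_1$; the curve approaches $\beta_0$ as X grows, and where it crosses the X axis the crossing is at $X = -\beta_1 / \beta_0$. The sign conditions of the individual panels are not recoverable from the transcript.]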

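Two conditions for transforming a nonlinear, non-additive equation:

1. A transformation of the variables must exist.

2. The sample must provide sufficient information.

Example 1: Suppose

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1 X_2 + \varepsilon$

Transforming $X_2^* = X_1^2$ and $X_3^* = X_1 X_2$, rewrite as

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2^* + \beta_3 X_3^* + \varepsilon$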
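Example 1 can be made concrete with a short Python sketch: once the transformed regressors $X_2^* = X_1^2$ and $X_3^* = X_1 X_2$ are built, the model is linear in the parameters and ordinary least squares applies. The data below are hypothetical and noise-free, and the helpers `solve` and `ols` are illustrative names, not part of the slides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n] for row in m]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 0.5*X1^2 - 1.5*X1*X2 (no noise)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * a + 0.5 * a * a - 1.5 * a * b for a, b in zip(x1, x2)]

# Design matrix with the transformed regressors X2* = X1^2 and X3* = X1*X2
design = [[1.0, a, a * a, a * b] for a, b in zip(x1, x2)]
beta = ols(design, y)
print([round(b, 4) for b in beta])
```

The same approach fails when a transformed regressor itself depends on an unknown parameter, since the corresponding design column cannot be computed.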

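Example 2:

$Y = \beta_0 + \frac{\beta_1}{X_1 + \beta_2} + \varepsilon$

Transforming $X_1^* = \frac{1}{X_1 + \beta_2}$, rewrite as

$Y = \beta_0 + \beta_1 X_1^* + \varepsilon$

However, $X_1^*$ cannot be computed, because $\beta_2$ is unknown.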

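Application of functional form regression

1. Cobb-Douglas production function:

$Y = \beta_0 L^{\beta_1} K^{\beta_2} e^{\varepsilon}$

Transforming:

$\ln Y = \ln \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$

==> $\ln Y = \beta_0^* + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$

$\beta_1 = \frac{d \ln Y}{d \ln L}$: elasticity of output w.r.t. labor input

$\beta_2 = \frac{d \ln Y}{d \ln K}$: elasticity of output w.r.t. capital input

$\beta_1 + \beta_2 \gtreqless 1$: information about the returns to scale.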
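Why $\beta_1 + \beta_2$ carries the returns-to-scale information can be seen by scaling both inputs by the same factor c: output is multiplied by $c^{\beta_1 + \beta_2}$. A minimal Python sketch with made-up parameter values (the function names are illustrative):

```python
import math


def cobb_douglas(labor, capital, b0, b1, b2):
    """Deterministic part of the Cobb-Douglas function Y = b0 * L**b1 * K**b2."""
    return b0 * labor ** b1 * capital ** b2


def returns_to_scale(b1, b2):
    """Classify returns to scale from the sum of the two elasticities."""
    total = b1 + b2
    if math.isclose(total, 1.0):
        return "constant"
    return "increasing" if total > 1 else "decreasing"


# Hypothetical parameters: b0 = 1.2, labor elasticity 0.7, capital elasticity 0.4
b0, b1, b2 = 1.2, 0.7, 0.4
base = cobb_douglas(10.0, 20.0, b0, b1, b2)
doubled = cobb_douglas(20.0, 40.0, b0, b1, b2)

print(returns_to_scale(b1, b2), doubled / base, 2 ** (b1 + b2))
```

Here doubling both inputs more than doubles output, because the elasticities sum to 1.1.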

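2. Polynomial regression model: marginal cost function or total cost function

i.e.

$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$   (MC)

or

$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$   (TC)

[Figures: costs against output y, showing the MC curve and the TC curve.]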

Linear model

$\widehat{GNP} = 100.304 + 1.5325 \, M_2$
(1.368) (39.20)

Lin-log model

$\widehat{GNP} = -16329.21 + 2584.78 \ln M_2$
(-23.44) (27.48)

Log-lin model

$\widehat{\ln GNP} = 6.8612 + 0.00057 \, M_2$
(100.38) (15.65)

Log-log model

$\widehat{\ln GNP} = 0.5529 + 0.9882 \ln M_2$
(3.194) (42.29)
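The four GNP equations have slopes with different units, so they cannot be compared directly. The Python sketch below applies the usual approximate readings of each functional form to the slope values read from these slides; the variable names are illustrative, and the percentage readings are first-order approximations.

```python
# Slope coefficients read from the four fitted GNP models on the slides
linear_slope = 1.5325      # GNP on M2
lin_log_slope = 2584.78    # GNP on ln(M2)
log_lin_slope = 0.00057    # ln(GNP) on M2
log_log_slope = 0.9882     # ln(GNP) on ln(M2)

# Linear: absolute change in GNP per one-unit change in M2
gnp_change_per_unit_m2 = linear_slope

# Lin-log: absolute change in GNP for a 1% change in M2 (slope / 100)
gnp_change_per_pct_m2 = lin_log_slope / 100

# Log-lin: percentage change in GNP per one-unit change in M2 (slope * 100)
gnp_pct_change_per_unit_m2 = log_lin_slope * 100

# Log-log: the slope is itself the elasticity (% change in GNP per 1% change in M2)
gnp_elasticity = log_log_slope

print(gnp_change_per_unit_m2, gnp_change_per_pct_m2,
      gnp_pct_change_per_unit_m2, gnp_elasticity)
```

Read this way, the log-log fit suggests that GNP moves almost proportionally with M2, since the elasticity is close to 1.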

[Figure: wage (y) against unemployment (x) with the fitted SRF; the intercept is marked at 10.43.]

$\widehat{wage} = 10.343 - 3.808 \, (unemploy)$
(4.862) (-2.66)

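Reciprocal Model

[Figure: wage (y) against 1/unemploy (1/x) with the fitted SRF; the level -1.428 and the point $u_N$ are marked. $u_N$: natural rate of unemployment.]

$\widehat{Wage} = -1.4282 + 8.7243 \left(\frac{1}{x}\right)$
(-.0690) (3.063)

$\beta_0$ is statistically insignificant. Therefore, -1.428 is not reliable.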

$\widehat{\ln wage} = 1.9038 - 1.175 \ln(unemploy)$
(10.375) (-2.618)

$\widehat{\ln wage} = 1.9038 + 1.175 \ln\left(\frac{1}{X}\right)$
(10.37) (2.618)

Antilog(1.9038) = 6.7113, which gives a more meaningful and statistically significant floor for the minimum wage.

Antilog(1.175) = 3.238. Because the model is log-log, the slope 1.175 is an elasticity: a 1% increase in X (unemployment) goes with roughly a 1.175% decrease in wage.

MWD Test for the functional form (Wooldridge, p. 203)

(MacKinnon, White, Davidson)

Procedures:

1. Run OLS on the linear model, obtain $\hat{Y}$

   $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$

2. Run OLS on the log-log model and obtain $\widehat{\ln Y}$

   $\widehat{\ln Y} = \hat{\beta}_0 + \hat{\beta}_1 \ln X_1 + \hat{\beta}_2 \ln X_2$

3. Compute $Z_1 = \ln(\hat{Y}) - \widehat{\ln Y}$

4. Run OLS on the linear model by adding $Z_1$

   $\hat{Y} = \hat{\beta}_0' + \hat{\beta}_1' X_1 + \hat{\beta}_2' X_2 + \hat{\beta}_3' Z_1$

   and check the t-statistic of $\hat{\beta}_3'$

   If $t^*_{\hat{\beta}_3'} > t_c$ ==> reject $H_0$: linear model

   If $t^*_{\hat{\beta}_3'} < t_c$ ==> not reject $H_0$: linear model

MWD test for the functional form (Cont.)

5. Compute $Z_2 = \text{antilog}(\widehat{\ln Y}) - \hat{Y}$

6. Run OLS on the log-log model by adding $Z_2$

   $\widehat{\ln Y} = \hat{\beta}_0' + \hat{\beta}_1' \ln X_1 + \hat{\beta}_2' \ln X_2 + \hat{\beta}_3' Z_2$

   and check the t-statistic of $\hat{\beta}_3'$

   If $t^*_{\hat{\beta}_3'} > t_c$ ==> reject $H_0$: log-log model

   If $t^*_{\hat{\beta}_3'} < t_c$ ==> not reject $H_0$: log-log model
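The six MWD steps can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the procedure as implemented in any particular package: the helpers `invert`, `ols` and `mwd_test` are made-up names, the data are synthetic (generated from a linear model with seeded noise), and the comparison of each t-statistic with a critical value is left to the reader.

```python
import math
import random


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]


def ols(X, y):
    """OLS with an intercept; returns (coefficients, t-statistics, fitted values)."""
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    t = [b / math.sqrt(mse * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t, fitted


def mwd_test(x1, x2, y):
    """Return the t-statistics of Z1 (H0: linear) and Z2 (H0: log-log)."""
    lin_X = [[a, b] for a, b in zip(x1, x2)]
    log_X = [[math.log(a), math.log(b)] for a, b in zip(x1, x2)]
    ln_y = [math.log(v) for v in y]
    _, _, y_hat = ols(lin_X, y)                                  # step 1
    _, _, lny_hat = ols(log_X, ln_y)                             # step 2
    z1 = [math.log(f) - g for f, g in zip(y_hat, lny_hat)]       # step 3
    _, t_lin, _ = ols([r + [z] for r, z in zip(lin_X, z1)], y)   # step 4
    z2 = [math.exp(g) - f for f, g in zip(y_hat, lny_hat)]       # step 5
    _, t_log, _ = ols([r + [z] for r, z in zip(log_X, z2)], ln_y)  # step 6
    return t_lin[-1], t_log[-1]


# Synthetic, strictly positive data generated from a linear model
rng = random.Random(0)
x1 = [float(i) for i in range(1, 21)]
x2 = [rng.uniform(1, 10) for _ in range(20)]
y = [5 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

t_z1, t_z2 = mwd_test(x1, x2, y)
print(f"t(Z1) = {t_z1:.3f}, t(Z2) = {t_z2:.3f}")
```

Each t-statistic is then compared with the critical value for n minus the number of estimated coefficients degrees of freedom, as in the decision rules above.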

MWD TEST: Testing the Functional Form of Regression

Example: (Table 7.3)

Step 1: Run the linear model and obtain $\hat{Y}$

[Regression output with regressors C, X1, X2]

$CV_1 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{1583.279}{24735.33} = 0.064$

Step 2: Run the log-log model and obtain the fitted (estimated) $\widehat{\ln Y}$

[Regression output with regressors C, LNX1, LNX2]

$CV_2 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{0.07481}{10.09653} = 0.0074$

MWD TEST

Step 4: $H_0$: true model is linear

[Regression output with regressors C, X1, X2, Z1]

$t_c(0.05, 11) = 1.796$
$t_c(0.10, 11) = 1.363$

$t^* < t_c$ at 5% => not reject $H_0$
$t^* > t_c$ at 10% => reject $H_0$

MWD Test

Step 6: $H_0$: true model is log-log model

[Regression output with regressors C, LNX1, LNX2, Z2]

$t_c(0.025, 11) = 2.201$
$t_c(0.05, 11) = 1.796$
$t_c(0.10, 11) = 1.363$

Since $t^* < t_c$ => not reject $H_0$

Comparing the C.V.: $\frac{C.V._1}{C.V._2} = \frac{0.064}{0.0074}$

Criterion for comparing two different functional models:

The coefficient of variation:

$C.V. = \frac{\hat{\sigma}}{\bar{Y}}$

It measures the average error of the sample regression function relative to the mean of Y.

Linear, log-linear, and log-log equations can be meaningfully compared.

The smaller the C.V. of a model, the more preferred the equation (functional model).

Compare two different functional form models:

Model 1: linear model
Model 2: log-log model

Coefficient of Variation (C.V.)

$\frac{\hat{\sigma} / \bar{Y} \text{ of model 1}}{\hat{\sigma} / \bar{Y} \text{ of model 2}} = \frac{2.1225 / 89.612}{0.0217 / 4.4891} = \frac{0.0236}{0.0048} = 4.916$

This means that model 2 is better.
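The C.V. comparison is a two-line calculation. A small Python sketch using the standard errors and means quoted on the slide; the function name `coefficient_of_variation` is illustrative. The ratio computed from unrounded C.V. values comes out near 4.90, slightly below the slide's 4.916, which is obtained from the rounded values 0.0236 and 0.0048.

```python
def coefficient_of_variation(sigma_hat, y_mean):
    """C.V. = standard error of the regression divided by the mean of the dependent variable."""
    return sigma_hat / y_mean


# Values quoted on the slide
cv_linear = coefficient_of_variation(2.1225, 89.612)    # model 1: linear
cv_loglog = coefficient_of_variation(0.0217, 4.4891)    # model 2: log-log

ratio = cv_linear / cv_loglog
preferred = "model 2 (log-log)" if cv_loglog < cv_linear else "model 1 (linear)"

print(f"C.V.1 = {cv_linear:.4f}, C.V.2 = {cv_loglog:.4f}, ratio = {ratio:.3f}, preferred: {preferred}")
```

A ratio above 1 means the first model has the larger relative error, so the second model is preferred.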

INDIVIDUAL ASSIGNMENT:
1. Find any data set (book, web)
2. Specify the initial model (based on theory): a linear model and a log-linear model
3. Run the MWD test
4. Interpret the results

Submission:
- Next week (17/09/2012)
- Print out