Post on 28-Oct-2014
Ch. 1 Regression: “Model Building Methodology”
Setyo Tri Wahyudi
Introduction
Correlation: a measure of the strength of the relationship between two variables, for example X1 and X2. The coefficient lies between -1 and 1, so its absolute value lies between 0 and 1; the closer it is to 0, the weaker the linear relationship (uncorrelated), and an absolute value of 1 means perfect correlation.
Regression: the process of building a mathematical model or function that can be used to predict or determine one variable from other variables.
Types of Regression
Linear Regression
Linear regression is a form of relationship in which both the independent variable X and the dependent variable Y enter as first-power terms.
Linear regression is divided into:
1) Simple linear regression, with the functional form:
Y = a + bX + e
2) Multiple linear regression, with the functional form:
Y = b0 + b1 X1 + . . . + bk Xk + e
Functions 1) and 2) describe a straight line (simple linear) and a flat plane (multiple linear), respectively.
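As a minimal sketch of how the simple linear form Y = a + bX + e can be estimated, the snippet below uses the ordinary least squares formulas for the slope and intercept. The function name `fit_simple_ols` and the toy data are illustrative assumptions, not part of the slides.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_simple_ols(xs, ys)
print(a, b)
```

With real data the points will not lie exactly on the line, and the gap between each Y and its fitted value is the residual e.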
Non-Linear Regression
Non-linear regression is a form of relationship or function in which the variable X and/or the variable Y can enter as a factor or variable raised to a given power. Some forms of non-linear regression are:
1) Polynomial regression: regression with a single independent variable entering as factors of successive powers.
Y = a + bX + cX^2 (quadratic function)
Y = a + bX + cX^2 + dX^3 (cubic function)
Y = a + bX + cX^2 + dX^3 + eX^4 (quartic function)
Y = a + bX + cX^2 + dX^3 + eX^4 + fX^5 (quintic function), and so on.
2) Hyperbolic regression (reciprocal function): in hyperbolic regression the independent variable X or the dependent variable Y can act as a denominator, so it is also called regression with a fractional or reciprocal function. It has a functional form such as:
1/Y = a + bX
3) Exponential regression: regression in which the independent variable X acts as the power or exponent. Its functional form is:
Y = a e^(bX)
Simple vs Multiple Regression
• Simple: the model contains two variables
– dependent variable, the variable to be predicted, usually called Y
– independent variable, the predictor or explanatory variable, usually called X
$Y = \beta_0 + \beta_1 X_1 + \varepsilon$
• Multiple: the model contains two or more independent variables
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + \varepsilon$
Evaluating Regression Model
Testing the Overall Model
$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$
$H_a$: At least one of the regression coefficients is $\neq 0$
Significance Tests for Individual Regression Coefficients
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$
$H_0: \beta_3 = 0$, $H_a: \beta_3 \neq 0$
...
$H_0: \beta_k = 0$, $H_a: \beta_k \neq 0$
Testing the Overall Model (F test)
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one of the regression coefficients is $\neq 0$
$MSR = \frac{SSR}{k}$, $MSE = \frac{SSE}{n - k - 1}$, $F = \frac{MSR}{MSE}$
ANOVA
Source             df   SS          MS        F       p
Regression          2   8189.723    4094.86   28.63   .000
Residual (Error)   20   2861.017    143.1
Total              22   11050.74
$F_{.01,2,20} = 5.85$
$F_{Cal} = 28.63 > 5.85$, reject $H_0$.
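The F statistic above can be reproduced from the sums of squares in the ANOVA table. The sketch below does only that arithmetic; the function name `f_statistic` is an illustrative assumption, and the critical value 5.85 is copied from the slide rather than computed.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k predictors and n observations."""
    msr = ssr / k                # mean square regression, df = k
    mse = sse / (n - k - 1)      # mean square error, df = n - k - 1
    return msr, mse, msr / mse


# Values taken from the ANOVA table: n = 23 observations, k = 2 predictors.
msr, mse, f_cal = f_statistic(ssr=8189.723, sse=2861.017, n=23, k=2)

F_CRITICAL = 5.85                # F(.01; 2, 20), as quoted on the slide
reject_h0 = f_cal > F_CRITICAL
print(round(msr, 2), round(mse, 2), round(f_cal, 2), reject_h0)
```

Because the calculated F exceeds the critical value, the overall model is judged significant at the 1% level, which matches the conclusion on the slide.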
Significance Test of the Regression Coefficients (t test)
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$, $H_a: \beta_2 \neq 0$
$t_{.025,20} = 2.086$
$t_{Cal} = 5.63 > 2.086$, reject $H_0$.
Residuals and Sum of Squares Error
Residual: $Y - \hat{Y}$
$SSE = \sum (Y - \hat{Y})^2$
SSE and Standard Error of the Estimate
$S_e = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{2861}{23 - 2 - 1}} = 11.96$
where: n = number of observations
k = number of independent variables
ANOVA
Source             df   SS        MS       F       P
Regression          2   8189.7    4094.9   28.63   .000
Residual (Error)   20   2861.0    143.1
Total              22   11050.7
Coefficient of Determination (R²)
$SSY = SSR + SSE$
$R^2 = \frac{SSR}{SSY} = \frac{8189.723}{11050.74741} = .741$
$R^2 = 1 - \frac{SSE}{SSY} = 1 - \frac{2861.017}{11050.74741} = .741$
Adjusted R²
$R^2_{adj} = 1 - \frac{SSE / (n - k - 1)}{SSY / (n - 1)} = 1 - \frac{2861.017 / (23 - 2 - 1)}{11050.74 / (23 - 1)} = 1 - .285 = .715$
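The three goodness-of-fit measures (standard error of the estimate, R², and adjusted R²) all follow from SSR, SSE, n, and k. The sketch below recomputes them from the ANOVA figures; the function name `fit_summary` is an illustrative assumption.

```python
import math


def fit_summary(ssr, sse, n, k):
    """Return (S_e, R^2, adjusted R^2) from the regression sums of squares."""
    ssy = ssr + sse                                   # total sum of squares
    se = math.sqrt(sse / (n - k - 1))                 # standard error of the estimate
    r2 = 1 - sse / ssy                                # share of variation explained
    adj_r2 = 1 - (sse / (n - k - 1)) / (ssy / (n - 1))  # penalised for k predictors
    return se, r2, adj_r2


# n = 23 observations and k = 2 predictors, as in the ANOVA table.
se, r2, adj_r2 = fit_summary(ssr=8189.723, sse=2861.017, n=23, k=2)
print(round(se, 2), round(r2, 3), round(adj_r2, 3))
```

Adjusted R² is always at most R², because dividing by degrees of freedom penalises each additional predictor.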
Model-Building
• Stepwise Regression
• Forward Selection
• Backward Elimination
• All Possible Regressions
Stepwise Regression
• Perform k simple regressions, and select the best as the initial model
• Evaluate each variable not in the model
– If none meet the criterion, stop
– Add the best variable to the model; re-evaluate previous variables, and drop any which are not significant
• Return to previous step
Forward Selection
Like stepwise, except variables are not re-evaluated after entering the model
Backward Elimination
• Start with the “full model” (all k predictors)
• If all predictors are significant, stop
• Otherwise, eliminate the most non-significant predictor; return to previous step
Data for Multiple Regression
Y World Crude Oil Production
X1 U.S. Energy Consumption
X2 U.S. Nuclear Generation
X3 U.S. Coal Production
X4 U.S. Dry Gas Production
X5 U.S. Fuel Rate for Autos
Y      X1     X2      X3       X4     X5
55.7   74.3   83.5    598.6    21.7   13.30
55.7   72.5   114.0   610.0    20.7   13.42
52.8   70.5   172.5   654.6    19.2   13.52
57.3   74.4   191.1   684.9    19.1   13.53
59.7   76.3   250.9   697.2    19.2   13.80
60.2   78.1   276.4   670.2    19.1   14.04
62.7   78.9   255.2   781.1    19.7   14.41
59.6   76.0   251.1   829.7    19.4   15.46
56.1   74.0   272.7   823.8    19.2   15.94
53.5   70.8   282.8   838.1    17.8   16.65
53.3   70.5   293.7   782.1    16.1   17.14
54.5   74.1   327.6   895.9    17.5   17.83
54.0   74.0   383.7   883.6    16.5   18.20
56.2   74.3   414.0   890.3    16.1   18.27
56.7   76.9   455.3   918.8    16.6   19.20
58.7   80.2   527.0   950.3    17.1   19.87
59.9   81.3   529.4   980.7    17.3   20.31
60.6   81.3   576.9   1029.1   17.8   21.02
60.2   81.1   612.6   996.0    17.7   21.69
60.2   82.1   618.8   997.5    17.8   21.68
60.6   83.9   610.3   945.4    18.2   21.04
60.9   85.6   640.4   1033.5   18.9   21.48
Stepwise: Step 1 - Simple Regression Results for Each Independent Variable
Dependent Variable   Independent Variable   t-Ratio   R2
Y                    X1                     11.77     85.2%
Y                    X2                     4.43      45.0%
Y                    X3                     3.91      38.9%
Y                    X4                     1.08      4.6%
Y                    X5                     3.54      34.2%
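Step 1 of the stepwise procedure can be sketched as a loop of simple regressions, one per candidate predictor, keeping the one with the largest R² (equivalently, the largest absolute t-ratio). The data below are hypothetical toy values, not the oil-production data, and the names `simple_regression`, `candidates`, and `best` are illustrative assumptions.

```python
import math


def simple_regression(xs, ys):
    """Return (t_ratio, r_squared) for the simple regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    sse = syy - slope * sxy                      # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    return slope / se_slope, 1 - sse / syy


# Hypothetical toy data (assumption): y is close to 2 * X1.
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
candidates = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [3, 1, 4, 1, 5, 9],
    "X3": [2, 2, 2, 3, 3, 3],
}
results = {name: simple_regression(xs, y) for name, xs in candidates.items()}
best = max(results, key=lambda name: results[name][1])  # largest R^2 enters first
for name, (t_ratio, r2) in results.items():
    print(name, round(t_ratio, 2), round(r2, 3))
print("initial model uses", best)
```

In a simple regression the t-ratio and R² are tied together by t² = R²(n − 2)/(1 − R²), so ranking by either gives the same first variable.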
All Possible Regressions with Five Independent Variables
Functional Forms of Regression
The term linear in a simple regression model means that the model is linear in the parameters; the variables in the regression model may or may not be linear.
[Figure: income (Y) against age (X), ages 15 to 60, showing the PRF and the SRF.]
The true model is non-linear, but running the wrong linear regression model leads to wrong predictions.
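Linear vs. Nonlinear
Examples of Linear Statistical Models
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\ln(Y_i) = \beta_0 + \beta_1 X_i + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 \ln(X_i) + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 X_i^2 + \varepsilon_i$
Examples of Non-linear Statistical Models
$Y_i = \beta_0 + \beta_1 X_i^{\beta_2} + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 X_i + \exp(\beta_2 X_i) + \varepsilon_i$
$Y_i^{\beta_2} = \beta_0 + \beta_1 X_i + \varepsilon_i$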
Different Functional Forms
1. Linear
2. Log-Log
3. Semilog
• Linear-Log or Log-Linear
4. Polynomial
5. Reciprocal (or inverse)
Pay attention to each form’s slope and elasticity.
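Functional Forms of Regression models
2. Log-log model:
$Y_i = \beta_0 X_i^{\beta_1} e^{\varepsilon_i}$
This is a non-linear model.
Transform into linear log-form:
$\ln Y_i = \ln \beta_0 + \beta_1 \ln X_i + \varepsilon_i$
==> $\ln Y_i = \beta_0^* + \beta_1 \ln X_i + \varepsilon_i$, where $\beta_0^* = \ln \beta_0$
==> $Y_i^* = \beta_0^* + \beta_1^* X_i^* + \varepsilon_i$, where $Y_i^* = \ln Y_i$, $X_i^* = \ln X_i$, $\beta_1^* = \beta_1$
$\beta_1^* = \frac{dY^*}{dX^*} = \frac{d \ln Y}{d \ln X} = \frac{dY / Y}{dX / X}$: elasticity coefficient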
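The log transformation can be sketched in code: take logs of both variables, fit a straight line to the logged data, and read the slope as the elasticity. The function name `fit_log_log` and the toy data (generated from Y = 3·X^0.5 without an error term) are illustrative assumptions.

```python
import math


def fit_log_log(xs, ys):
    """Fit ln Y = b0* + b1 ln X by least squares; return (beta0, beta1)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    lx_bar = sum(lx) / n
    ly_bar = sum(ly) / n
    sxy = sum((u - lx_bar) * (v - ly_bar) for u, v in zip(lx, ly))
    sxx = sum((u - lx_bar) ** 2 for u in lx)
    beta1 = sxy / sxx                    # slope in logs = elasticity
    beta0_star = ly_bar - beta1 * lx_bar
    return math.exp(beta0_star), beta1   # undo the log on the intercept


# Hypothetical data generated exactly from Y = 3 * X^0.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 0.5 for x in xs]
beta0, elasticity = fit_log_log(xs, ys)
print(beta0, elasticity)
```

An elasticity of 0.5 here means that a 1% increase in X goes with roughly a 0.5% increase in Y, regardless of the level of X.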
Functional Forms of Regression models
[Figure: quantity demanded (Y) against price (X) for $Y = \beta_0 X^{\beta_1}$, and the same relation in logs, $\ln Y = \ln \beta_0 + \beta_1 \ln X$, plotted as ln Y against ln X.]
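Functional Forms of Regression models
3. Semi-log model:
Log-lin model or lin-log model:
$\ln Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
or
$Y_i = \beta_0 + \beta_1 \ln X_i + \varepsilon_i$
Log-lin: $\beta_1 = \frac{d \ln Y}{dX} = \frac{dY / Y}{dX} = \frac{1}{Y} \frac{dY}{dX} = \frac{\text{relative change in } Y}{\text{absolute change in } X}$
Lin-log: $\beta_1 = \frac{dY}{d \ln X} = \frac{dY}{dX / X} = \frac{\text{absolute change in } Y}{\text{relative change in } X}$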
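Functional Forms of Regression models (Cont.)
4. Polynomial: Quadratic term to capture the nonlinear pattern
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
[Figure: two panels of $Y_i$ against $X_i$, one for $\beta_1 > 0$, $\beta_2 < 0$ and one for $\beta_1 < 0$, $\beta_2 > 0$.]
5. Reciprocal (or inverse) transformations
$Y_i = \beta_0 + \beta_1 \left(\frac{1}{X_i}\right) + \varepsilon_i$
==> $Y_i = \beta_0 + \beta_1 X_i^* + \varepsilon_i$, where $X_i^* = \frac{1}{X_i}$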
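Some features of reciprocal model
$Y = \beta_0 + \beta_1 \left(\frac{1}{X}\right)$
[Figure: panels of Y against X for different sign combinations of $\beta_0$ and $\beta_1$. As X grows, Y approaches the asymptote $\beta_0$; where the curve crosses the X axis, it does so at $X = -\beta_1 / \beta_0$.]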
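Two conditions for transforming a nonlinear, non-additive equation:
1. A transformation of the variable must exist.
2. The sample must provide sufficient information.
Example 1:
Suppose
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1 X_2 + \varepsilon$
transforming $X_2^* = X_1^2$, $X_3^* = X_1 X_2$
rewrite $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2^* + \beta_3 X_3^* + \varepsilon$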
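The transformation in Example 1 only builds new regressors from the observed ones; the parameters still enter linearly. A minimal sketch of that construction follows, with hypothetical observations and an illustrative helper name `transform_row`.

```python
def transform_row(x1, x2):
    """Map (X1, X2) to the regressors (X1, X2*, X3*), with X2* = X1^2 and X3* = X1 * X2."""
    return (x1, x1 ** 2, x1 * x2)


# Hypothetical observations of (X1, X2).
rows = [(1.0, 2.0), (2.0, 0.5), (3.0, 4.0)]
design = [transform_row(x1, x2) for x1, x2 in rows]
for regressors in design:
    print(regressors)
```

Every transformed column can be computed from the data alone, which is why this model can be estimated by ordinary multiple regression on X1, X2*, and X3*.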
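Example 2:
$Y = \beta_0 + \frac{\beta_1}{X + \beta_2} + \varepsilon$
transforming $X_1^* = \frac{1}{X + \beta_2}$
rewrite $Y = \beta_0 + \beta_1 X_1^* + \varepsilon$
However, $X_1^*$ cannot be computed, because $\beta_2$ is unknown.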
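Application of functional form regression
1. Cobb-Douglas Production function:
$Y = \beta_0 L^{\beta_1} K^{\beta_2} e^{\varepsilon}$
Transforming:
$\ln Y = \ln \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$
==> $\ln Y = \beta_0^* + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$, where $\beta_0^* = \ln \beta_0$
$\frac{d \ln Y}{d \ln L} = \beta_1$: elasticity of output w.r.t. labor input
$\frac{d \ln Y}{d \ln K} = \beta_2$: elasticity of output w.r.t. capital input
$\beta_1 + \beta_2 > 1$, $= 1$, or $< 1$: information about returns to scale.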
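Why the sum of the two elasticities carries the information about returns to scale can be seen from a short derivation. The scaling factor $\lambda$ is notation introduced here for illustration; it does not appear on the slides.

```latex
% Scale both inputs by the same factor \lambda > 0:
\begin{aligned}
Y(\lambda L, \lambda K)
  &= \beta_0 (\lambda L)^{\beta_1} (\lambda K)^{\beta_2} e^{\varepsilon} \\
  &= \lambda^{\beta_1 + \beta_2} \, \beta_0 L^{\beta_1} K^{\beta_2} e^{\varepsilon} \\
  &= \lambda^{\beta_1 + \beta_2} \, Y(L, K).
\end{aligned}
% \beta_1 + \beta_2 > 1 : increasing returns to scale
% \beta_1 + \beta_2 = 1 : constant returns to scale
% \beta_1 + \beta_2 < 1 : decreasing returns to scale
```

For example, doubling both inputs ($\lambda = 2$) with $\beta_1 + \beta_2 = 1$ exactly doubles output.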
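2. Polynomial regression model: Marginal cost function or total cost function
i.e.
$Y = \beta_0 + \beta_1 X + \beta_2 X^2$ (MC)
or
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$ (TC)
[Figures: costs against output y, showing the MC curve and the TC curve.]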
Linear model
$\widehat{GNP} = 100.304 + 1.5325 \, M_2$
(1.368) (39.20)
Lin-log model
$\widehat{GNP} = -16329.21 + 2584.78 \ln M_2$
(-23.44) (27.48)
Log-lin model
$\widehat{\ln GNP} = 6.8612 + 0.00057 \, M_2$
(100.38) (15.65)
Log-log model
$\widehat{\ln GNP} = 0.5529 + 0.9882 \ln M_2$
(3.194) (42.29)
[Figure: wage (y) against unemployment (x), with the fitted line (SRF).]
$\widehat{wage} = 10.343 - 3.808 \, (unemploy)$
(4.862) (-2.66)
Reciprocal Model
[Figure: y against 1/x (1/unemploy) with the SRF; the intercept is -1.428 and $u^N$ marks the natural rate of unemployment.]
$\widehat{Wage} = -1.4282 + 8.7243 \left(\frac{1}{x}\right)$
(-.0690) (3.063)
$\beta_0$ is statistically insignificant; therefore, -1.428 is not reliable.
$\widehat{\ln wage} = 1.9038 - 1.175 \ln(unemploy)$
(10.375) (-2.618)
$\widehat{\ln wage} = 1.9038 + 1.175 \ln\left(\frac{1}{X}\right)$
(10.37) (2.618)
Antilog(1.9038) = 6.7113; therefore it is a more meaningful and statistically significant bottom line for the minimum wage.
Antilog(1.175) = 3.238. Because both variables are in logs, the slope is an elasticity: a 1% increase in X (unemployment) is associated with a decrease of about 1.175% in the wage.
MWD Test for the functional form (MacKinnon, White, Davidson) (Wooldridge, pp.203)
Procedures:
1. Run OLS on the linear model, obtain $\hat{Y}$
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$
2. Run OLS on the log-log model and obtain $\widehat{\ln Y}$
$\widehat{\ln Y} = \hat{\beta}_0 + \hat{\beta}_1 \ln X_1 + \hat{\beta}_2 \ln X_2$
3. Compute $Z_1 = \ln(\hat{Y}) - \widehat{\ln Y}$
4. Run OLS on the linear model by adding $Z_1$
$\hat{Y} = \hat{\beta}_0' + \hat{\beta}_1' X_1 + \hat{\beta}_2' X_2 + \hat{\beta}_3' Z_1$
and check the t-statistic of $\hat{\beta}_3'$
If $t^*_{\beta_3} > t_c$ ==> reject $H_0$: linear model
If $t^*_{\beta_3} < t_c$ ==> do not reject $H_0$: linear model
MWD test for the functional form (Cont.)
5. Compute $Z_2 = \text{antilog}(\widehat{\ln Y}) - \hat{Y}$
6. Run OLS on the log-log model by adding $Z_2$
$\widehat{\ln Y} = \hat{\beta}_0' + \hat{\beta}_1' \ln X_1 + \hat{\beta}_2' \ln X_2 + \hat{\beta}_3' Z_2$
and check the t-statistic of $\hat{\beta}_3'$
If $t^*_{\beta_3} > t_c$ ==> reject $H_0$: log-log model
If $t^*_{\beta_3} < t_c$ ==> do not reject $H_0$: log-log model
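The six MWD steps can be sketched end to end in plain Python. This is a minimal illustration under several assumptions: it uses a single regressor X instead of X1 and X2, the data are hypothetical, and `ols` and `mwd_test` are helper names invented here. The OLS helper solves the normal equations by Gauss-Jordan inversion, which is adequate for a toy example but not a substitute for a statistics package.

```python
import math


def ols(columns, y):
    """OLS with an intercept. Returns (coefficients, t_statistics, fitted values)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s2 = sse / (n - p)                       # residual variance
    t_stats = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(p)]
    return beta, t_stats, fitted


def mwd_test(x, y):
    """Return the t-statistics of Z1 (linear model) and Z2 (log-log model)."""
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    _, _, y_hat = ols([x], y)                                   # step 1
    _, _, ln_y_hat = ols([ln_x], ln_y)                          # step 2
    z1 = [math.log(f) - g for f, g in zip(y_hat, ln_y_hat)]     # step 3
    _, t_lin, _ = ols([x, z1], y)                               # step 4
    z2 = [math.exp(g) - f for f, g in zip(y_hat, ln_y_hat)]     # step 5
    _, t_log, _ = ols([ln_x, z2], ln_y)                         # step 6
    return t_lin[-1], t_log[-1]


# Hypothetical data (assumption): roughly Y = 10 * sqrt(X) plus small noise.
x_data = [float(i) for i in range(1, 11)]
y_data = [10.2, 13.9, 17.6, 19.7, 22.8, 24.1, 26.9, 28.0, 30.4, 31.3]
t_z1, t_z2 = mwd_test(x_data, y_data)
print("t-statistic of Z1 in the linear model:", t_z1)
print("t-statistic of Z2 in the log-log model:", t_z2)
```

Each t-statistic is then compared with the critical value $t_c$ for the chosen significance level and n − p degrees of freedom, as in steps 4 and 6. Step 3 takes the log of the fitted values from the linear model, so the sketch assumes those fitted values are all positive.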
MWD TEST: TESTING the Functional form of regression
Example: (Table 7.3)
Step 1: Run the linear model and obtain $\hat{Y}$
[Regression output with regressors C, X1, X2.]
$CV_1 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{1583.279}{24735.33} = 0.064$
Step 2: Run the log-log model and obtain the fitted (estimated) $\widehat{\ln Y}$
[Regression output with regressors C, LNX1, LNX2.]
$CV_2 = \frac{\hat{\sigma}}{\bar{Y}} = \frac{0.07481}{10.09653} = 0.0074$
MWD TEST
Step 4: $H_0$: true model is linear
[Regression output with regressors C, X1, X2, Z1.]
$t^c_{0.05, 11} = 1.796$
$t^c_{0.10, 11} = 1.363$
$t^* < t_c$ at 5% => do not reject $H_0$
$t^* > t_c$ at 10% => reject $H_0$
MWD Test
Step 6: $H_0$: true model is log-log model
[Regression output with regressors C, LNX1, LNX2, Z2.]
$t^c_{0.025, 11} = 2.201$
$t^c_{0.05, 11} = 1.796$
$t^c_{0.10, 11} = 1.363$
Since $t^* < t_c$ => do not reject $H_0$
Comparing the C.V.: $\frac{C.V._1}{C.V._2} = \frac{0.064}{0.0074}$
Criterion for comparing two different functional models:
The coefficient of variation:
$C.V. = \frac{\hat{\sigma}}{\bar{Y}}$
It measures the average error of the sample regression function relative to the mean of Y.
Linear, log-linear, and log-log equations can be meaningfully compared.
The smaller the C.V. of a model, the more preferred the equation (functional model).
Coefficient of Variation (C.V.)
Compare two different functional form models:
Model 1: linear model
Model 2: log-log model
$\frac{\hat{\sigma} / \bar{Y} \text{ of model 1}}{\hat{\sigma} / \bar{Y} \text{ of model 2}} = \frac{2.1225 / 89.612}{0.0217 / 4.4891} = \frac{0.0236}{0.0048} = 4.916$, which means that model 2 is better.
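The comparison can be checked with a few lines of arithmetic. The sketch below uses the figures quoted on the slide; the function and variable names are illustrative assumptions.

```python
def coefficient_of_variation(sigma_hat, y_bar):
    """Standard error of the regression relative to the mean of the dependent variable."""
    return sigma_hat / y_bar


# Figures quoted on the slide for the two models.
cv_linear = coefficient_of_variation(2.1225, 89.612)   # model 1: linear
cv_loglog = coefficient_of_variation(0.0217, 4.4891)   # model 2: log-log
ratio = cv_linear / cv_loglog
preferred = "log-log" if cv_loglog < cv_linear else "linear"
print(round(cv_linear, 4), round(cv_loglog, 4), round(ratio, 2), preferred)
```

The slide's 4.916 comes from dividing the rounded values 0.0236 and 0.0048; with unrounded values the ratio is about 4.90, and the conclusion that the log-log model has the smaller C.V. is unchanged.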
INDIVIDUAL ASSIGNMENT:
1. Find any data set (book, web)
2. Specify the initial model (based on theory): a linear model and a log-linear model
3. Carry out the MWD test
4. Interpret the results
Submission:
- Next week (17/09/2012)
- Printout