Brief Introduction SEEMINGLY UNRELATED REGRESSION

(SUR) APPLICATION and DEVELOPMENT

ALDON MHP SINAGA

Yogyakarta, 2015

CONTENTS

INTRODUCTION
    About SUR
    SUR Model Specification and Estimation
IMPLEMENTATION OF SUR ON RESEARCH
    SUR in Environmental Research
    SUR in Animal Husbandry Research
    SUR in Agricultural Production Research
    SUR in Agricultural Economics Research
    Applied SUR Overview
        Application Approaches
        Application Benefit
END NOTES AND CONCLUSION
    Development of SUR Model
    Conclusion
REFERENCES

INTRODUCTION

About SUR

In empirical research we often deal with several equations that share some of the same regressors, where those regressors are known to influence different regressands. Suppose we want to study investment at two large companies, say Westinghouse and General Electric, as illustrated by Zellner (1962). We describe each company's investment with its own equation and assume that, in each equation, real gross investment is determined by the value of outstanding shares at the beginning of the period and the opening value of the company's capital stock. Because the data for both equations come from the same periods, we can expect the error terms of the two equations to be contemporaneously correlated (Zellner, 1962).

The error terms are correlated because both companies face common market conditions and because similar omitted factors are likely to enter the error term of each regression. For instance, if the error term of the first equation reflects some unobservable variable that also affects the error term of the other equation, the error terms of the two equations will be correlated. This is why the two equations are only "seemingly" unrelated.

James Thornton (2013) notes that Seemingly Unrelated Regression (SUR) is one of four types of equation system that contain two or more related equations. Besides SUR, there are (1) simultaneous equation systems, (2) recursive equation systems, and (3) block recursive systems. SUR resembles a simultaneous equation system in several ways, but differs fundamentally in how the equations are related to one another.

In a SUR system, the equations are related in one or both of the following ways:

1. The error terms of the equations are correlated. This happens when common unobserved factors influence the dependent variables of the equations.

2. The parameters of the different equations are related. This occurs if (a) the same parameter appears in more than one equation, or (b) a parameter in one equation is a linear or nonlinear function of the parameters in other equations.

(Thornton, 2013)

Mark Beasley (2008) notes that SUR models are often applied when we face several equations that appear to be unrelated but may in fact be related because:

1. Some coefficients are assumed to be the same or zero;

2. The disturbances are known to be correlated across equations; and/or

3. A subset of the right-hand-side variables (independent variables / regressors) is the same across equations.

Many economic relationships are best described by a Seemingly Unrelated Regression equation system. Examples include: (1) investment demand equations for several firms in the same industry; (2) consumer demand equations implied by utility-maximizing behavior; and (3) input demand equations implied by a firm's cost-minimizing and profit-maximizing behavior (Thornton, 2013).

SUR Model Specification and Estimation

Against this background, Thornton (2013) describes the specification and estimation of SUR. Assume there are M equations that are related to each other because their error terms are correlated. These M seemingly unrelated regression equations can be written in matrix form:

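$$
\begin{aligned}
y_1 &= X_1\beta_1 + \varepsilon_1\\
y_2 &= X_2\beta_2 + \varepsilon_2\\
y_3 &= X_3\beta_3 + \varepsilon_3\\
&\;\;\vdots\\
y_M &= X_M\beta_M + \varepsilon_M
\end{aligned}
$$

or, more compactly,

$$y_i = X_i\beta_i + \varepsilon_i, \qquad i = 1, 2, 3, \ldots, M.$$

If the number of time periods is T, then in matrix form:

$y_i$ is the T x 1 column vector of observations on the ith dependent variable;

$X_i$ is the T x K matrix of observations on the K-1 explanatory variables (plus a constant column);

$\beta_i$ is the K x 1 column vector of parameters for the ith equation; and

$\varepsilon_i$ is the T x 1 column vector of disturbances for the ith equation.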

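Now we can combine these M equations into one large equation, written in matrix form as follows:

$$
\underbrace{\begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_M \end{bmatrix}}_{(MT)\times 1}
=
\underbrace{\begin{bmatrix}
X_1 & 0 & 0 & \cdots & 0\\
0 & X_2 & 0 & \cdots & 0\\
0 & 0 & X_3 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & X_M
\end{bmatrix}}_{(MT)\times(MK)}
\underbrace{\begin{bmatrix} \beta_1\\ \beta_2\\ \beta_3\\ \vdots\\ \beta_M \end{bmatrix}}_{(MK)\times 1}
+
\underbrace{\begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ \vdots\\ \varepsilon_M \end{bmatrix}}_{(MT)\times 1}
$$

This stacked system can be written simply as

$$y = X\beta + \varepsilon$$

where:

$y$ = (M.T) x 1 column vector of observations on the regressands of the M equations

$X$ = (M.T) x (M.K) matrix of observations on the regressors

$\beta$ = (M.K) x 1 column vector of parameters for the M equations

$\varepsilon$ = (M.T) x 1 column vector of disturbances for the M equations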
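
The stacking step can be sketched in a few lines of NumPy. This is only an illustration: the helper name `stack_sur` and the toy numbers are assumptions made here, not part of Thornton's exposition.

```python
import numpy as np

def stack_sur(ys, Xs):
    """Stack M equations into one system: y is (M*T,), X is block-diagonal."""
    T = ys[0].shape[0]
    ks = [Xi.shape[1] for Xi in Xs]          # number of regressors per equation
    y = np.concatenate(ys)                   # y_1 on top of y_2 on top of ...
    X = np.zeros((len(ys) * T, sum(ks)))
    col = 0
    for i, Xi in enumerate(Xs):
        X[i * T:(i + 1) * T, col:col + ks[i]] = Xi   # X_i on the i-th diagonal block
        col += ks[i]
    return y, X

# Hypothetical toy data: M = 2 equations, T = 3 periods, K = 2 columns each.
X1 = np.column_stack([np.ones(3), [1.0, 2.0, 3.0]])
X2 = np.column_stack([np.ones(3), [4.0, 5.0, 6.0]])
y1 = np.array([1.0, 2.0, 3.0])
y2 = np.array([2.0, 4.0, 6.0])

y, X = stack_sur([y1, y2], [X1, X2])
print(y.shape, X.shape)   # (6,) (6, 4)
```

The off-diagonal blocks stay zero, which is what makes the equations look unrelated; any link between them enters only through the error covariance.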

The specification above rests on the following assumptions:

1. The functional form of the SUR equation is linear in parameters.

$$y = X\beta + \varepsilon$$

2. The mean of the error term in the SUR equation is zero.

$$E(\varepsilon) = 0$$

3. The errors of the SUR equation have the variance-covariance matrix

$$\mathrm{Cov}(\varepsilon) = E(\varepsilon\varepsilon^T) = W = \Sigma \otimes I,$$

where $\Sigma$ is the M x M matrix of contemporaneous error covariances and $I$ is the T x T identity matrix. This requires that:

a. The error variance within each individual equation of the SUR system is constant (no heteroscedasticity).

b. The error variance may differ between individual equations.

c. The errors within each individual equation are uncorrelated with one another (no autocorrelation).

d. The errors of different equations are contemporaneously correlated, in the following pattern:

For time series data, errors in different equations at the same time are correlated, while errors in different equations at different times are not correlated.

For cross-section data, errors in different equations for the same decision-making unit (sampled respondent) are correlated, while errors in different equations for different decision-making units are not correlated.

4. The error term of the SUR equation is normally distributed.

$$\varepsilon \sim N(0, W)$$

5. The error term of the SUR equation is uncorrelated with every independent variable in the SUR equation.

$$\mathrm{Cov}(\varepsilon, X) = 0$$
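
The covariance structure in assumption 3 can be made concrete with a small numerical sketch. The values in `sigma` are hypothetical, and the errors are assumed to be stacked equation by equation, as in the stacked system above.

```python
import numpy as np

T = 3                                  # number of periods (toy value)
sigma = np.array([[1.0, 0.5],          # hypothetical M = 2 error covariance:
                  [0.5, 2.0]])         # variances 1 and 2, covariance 0.5

# W = Sigma kron I_T is the (M*T) x (M*T) covariance of the stacked errors.
W = np.kron(sigma, np.eye(T))
print(W.shape)                         # (6, 6)

# Same period, different equations -> correlated (0.5);
# different periods -> uncorrelated, within or across equations (0.0).
print(W[0, 3], W[0, 4], W[0, 1])       # 0.5 0.0 0.0

# The Kronecker structure means W never has to be inverted directly.
W_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
```

Only the small M x M matrix is inverted, which is why the GLS formulas for SUR are usually written with the inverse of Sigma rather than the inverse of W.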

Finally, to estimate the parameters ($\beta_i$) of the SUR model, we need to choose an estimator. There are four estimators we can use to estimate $\beta_i$:

1. Ordinary least squares (OLS). The estimator follows the rule

$$\hat{\beta} = (X^T X)^{-1} X^T y,$$

which can easily be applied to estimate the (M.K) x 1 parameter vector of the single stacked equation. The OLS estimator is unbiased but inefficient, because the errors of the SUR equation are non-spherical and OLS fails to use this information about the errors when estimating the parameters ($\beta_i$).

2. Generalized least squares (GLS). GLS estimates the parameters ($\beta_i$) by the rule

$$\hat{\beta}_{GLS} = (X^T W^{-1} X)^{-1} X^T W^{-1} y = [X^T (\Sigma^{-1} \otimes I) X]^{-1} X^T (\Sigma^{-1} \otimes I) y.$$

The rule can be applied directly to the SUR equation, where W is the (M.T) x (M.T) error variance-covariance matrix of the SUR equation. The GLS estimator is unbiased and efficient, and under the normality assumption it is also the maximum likelihood estimator. GLS is more precise than OLS because it uses the information about the disturbances contained in W when estimating the parameters. Its major limitation is that it is not a feasible estimator: in practice the elements of the error variance-covariance matrix W are unknown.

3. Feasible generalized least squares (FGLS). FGLS makes the GLS estimator feasible. It follows the rule

$$\hat{\beta}_{FGLS} = (X^T \hat{W}^{-1} X)^{-1} X^T \hat{W}^{-1} y = [X^T (\hat{\Sigma}^{-1} \otimes I) X]^{-1} X^T (\hat{\Sigma}^{-1} \otimes I) y.$$

As mentioned above, the limitation of GLS is that W is unknown. To overcome this, we estimate W from the sample data and replace W with the estimate $\hat{W}$, which gives the FGLS estimator. The most popular method for estimating W is Zellner's method. FGLS is regarded as an unbiased, consistent and asymptotically efficient estimator. Some studies also suggest that FGLS has a smaller variance than the OLS estimator.

4. Iterated feasible generalized least squares (ITGLS). This method is an alternative to FGLS. Here the estimate $\hat{W}$ is obtained with Zellner's iterated SUR (ISUR) method: the FGLS residuals are used to re-estimate W, and the estimation is repeated until the estimates converge. Like FGLS, ITGLS delivers unbiased, consistent and asymptotically efficient results. Although this is still debated, the method is also reported to yield better estimates in small samples.
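
The estimators above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not Thornton's or Zellner's own code: the function name `sur_fgls`, the simulated data, and the choice of T (rather than a degrees-of-freedom-corrected value) as the divisor for the residual covariance are all assumptions made here. With `iterations=1` the function performs two-step FGLS; a larger value repeats the covariance update in the spirit of ITGLS, using a fixed number of passes instead of a convergence check.

```python
import numpy as np

def sur_fgls(ys, Xs, iterations=1):
    """FGLS for a SUR system; iterations > 1 repeats the covariance update."""
    M, T = len(ys), ys[0].shape[0]
    ks = [Xi.shape[1] for Xi in Xs]
    # Stack the system: y is (M*T,), X is block-diagonal (M*T, sum(ks)).
    y = np.concatenate(ys)
    X = np.zeros((M * T, sum(ks)))
    col = 0
    for i, Xi in enumerate(Xs):
        X[i * T:(i + 1) * T, col:col + ks[i]] = Xi
        col += ks[i]
    # Step 1: equation-by-equation OLS gives the starting coefficients.
    betas = [np.linalg.lstsq(Xi, yi, rcond=None)[0] for Xi, yi in zip(Xs, ys)]
    for _ in range(iterations):
        # Step 2: estimate Sigma from the current residuals (T x M matrix).
        resid = np.column_stack([yi - Xi @ b for yi, Xi, b in zip(ys, Xs, betas)])
        sigma_hat = resid.T @ resid / T
        # Step 3: GLS with W_hat^{-1} = Sigma_hat^{-1} kron I_T.
        W_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(T))
        A = X.T @ W_inv @ X
        beta = np.linalg.solve(A, X.T @ W_inv @ y)
        betas = np.split(beta, np.cumsum(ks)[:-1])   # back to per-equation vectors
    return betas, sigma_hat, np.linalg.inv(A)        # inv(A): estimated Cov(beta_hat)

# Simulated two-equation system (all numbers are made up for illustration).
rng = np.random.default_rng(0)
T = 200
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=T)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + errors[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errors[:, 1]

betas_fgls, sigma_hat, cov = sur_fgls([y1, y2], [X1, X2])
betas_iter, _, _ = sur_fgls([y1, y2], [X1, X2], iterations=10)
print(betas_fgls, betas_iter, sigma_hat, sep="\n")
```

Forming the full (M.T) x (M.T) matrix `W_inv` keeps the code close to the formulas but becomes expensive for large T; an implementation for real data would more likely exploit the Kronecker structure or use an established econometrics package.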

IMPLEMENTATION OF SUR ON RESEARCH

As described above, many research problems and phenomena are best described by SUR. When two or more equations look unrelated but are linked through their error terms, SUR can estimate them accurately and efficiently. The studies below have employed SUR and can be used to learn about and describe its implementation.

SUR in Environmental Research

Zaman et al. (2011) employed SUR in their study of the impact of population on environmental degradation in South Asia. The study examined the empirical relationship between population and environmental degradation over 1985-2009 in three countries: India, Pakistan and Sri Lanka.

The researchers used a balanced panel of the three countries over the 25 years from 1985 to 2009. The data were taken from the World Development Indicators published by the World Bank (2009). The study used four variables: arable land (AL), carbon dioxide emissions (CO2), population growth (PG) and population density (PD).

To study the implications of the population indicators (growth and density) for the environment, the researchers used the two environmental indicators (CO2 and AL) as separate dependent variables over the study period. They thus estimated two simple nonlinear population-environment models, specified as follows:

$$\log AL_{it} = \alpha + \beta_{1r} \log PG_{it} + \beta_{2r} \log PD_{it} + \varepsilon_{it}$$

$$\log CO2_{it} = \omega + \gamma_{1r} \log PG_{it} + \gamma_{2r} \log PD_{it} + \mu_{it}$$

where AL is the area of arable land (hectares); CO2 is the level of carbon dioxide emissions (kt); PG is the rate of population growth (annual %); PD is population density (people per km2); t = 1, 2, ..., 25 indexes the periods; and i = 1, ..., 3 indexes the countries.
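
One way to read these two specifications is sketched below for the arable land equation, with the intercept and slopes written as $a$, $b_1$, $b_2$ and the error as $u_{it}$ (names chosen here for illustration only, and assuming natural logarithms). Because both sides are in logarithms, each slope is an elasticity, and the model is nonlinear in the levels of the variables but linear in its parameters.

```latex
\begin{aligned}
\log AL_{it} &= a + b_1 \log PG_{it} + b_2 \log PD_{it} + u_{it} \\
\Longleftrightarrow\quad AL_{it} &= \exp(a)\, PG_{it}^{\,b_1}\, PD_{it}^{\,b_2}\, \exp(u_{it}) \\
\frac{\partial \log AL_{it}}{\partial \log PG_{it}} &= b_1
\end{aligned}
```

Read this way, a 1% increase in population growth is associated with a change of about $b_1$ percent in arable land, holding population density fixed. The same reading applies to the CO2 equation.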

To analyze the empirical model described above, Zaman et al. carried out the following sequence of procedures:

1. Panel unit root test, used to check the stationarity of the data. This is a common procedure when working with time series data. The researchers employed the Im-Pesaran-Shin (IPS) test (2003).

2. Panel cointegration test, used to check for a long-run cointegrating relationship between the variables. This test indicates whether it is appropriate to employ SUR estimation.

3. SUR estimation, used to estimate the relationship between the population and environmental indicators.

The results reveal that population density significantly increases arable land in all three countries. Population growth and population density both increase carbon dioxide emissions in Pakistan, while only population growth significantly increases carbon dioxide emissions in India and Sri Lanka.

Apart from describing the population-environment models as "non-linear" (the log-log models are linear in their parameters), the researchers conducted the study properly and applied each analysis procedure appropriately.

SUR in Animal Husbandry Research

The study titled "The Use of Seemingly Unrelated Regression (SUR) to Predict the Carcass Composition of Lambs" shows a different type of SUR application. It aimed to develop and evaluate models for predicting the carcass composition of lambs (Cadavez and Henningsen, 2011).

The study specified five equations:

$$
\begin{aligned}
LMP &= \alpha_0 + \alpha_1\,HCW + \alpha_2\,C12 + \alpha_3\,E2 + \varepsilon_1 &\quad (1)\\
SFP &= \beta_0 + \beta_1\,HCW + \beta_2\,C12 + \beta_3\,E2 + \varepsilon_2 &\quad (2)\\
IFP &= \gamma_0 + \gamma_1\,HCW + \gamma_2\,C12 + \gamma_3\,E2 + \varepsilon_3 &\quad (3)\\
BP &= \delta_0 + \delta_1\,HCW + \delta_2\,C12 + \delta_3\,E2 + \varepsilon_4 &\quad (4)\\
KCFP &= \theta_0 + \theta_1\,HCW + \theta_2\,C12 + \theta_3\,E2 + \varepsilon_5 &\quad (5)
\end{aligned}
$$

where LMP = lean meat proportion, SFP = subcutaneous fat proportion, IFP = intermuscular fat proportion, BP = bone plus remainders proportion, and KCFP = kidney knob and channel fat proportion. Each is assumed to be determined by HCW = hot carcass weight. The equations also include C12 = the subcutaneous fat thickness (mm) measured between the 12th and 13th ribs, and E2 = the total breast bone tissue thickness (mm), taken with a sharpened steel rule in the middle of the 2nd sternebra.

The study compared the OLS results for the five equations above with SUR results generated through generalized least squares (GLS). The results of the analysis are shown below.

The results show that the SUR technique can generate results that are relevant for implementing objective carcass classification systems. An important finding is that hot carcass weight (HCW) and total breast bone tissue thickness (E2) are the most relevant predictors of the carcass tissue indicators, although the performance of the predictors may differ for other lamb populations (other breeds or production systems).

Statistically, compared with OLS, the SUR estimator gives the lowest standard errors of the estimated parameters and thus the highest precision of estimation.

SUR in Agricultural Production Research

O.O. Olubanjo and O. Oyebanjo (2005) conducted a study titled "Determinants of profitability in rain-fed paddy rice production in Ikenne Agricultural Zone, Ogun State, Nigeria". The study analyzed the effect of farm input use on the profitability of rain-fed paddy rice production in the Ikenne Agricultural Zone of Ogun State, Nigeria.

The study fitted a Cobb-Douglas type profit function to the data. SUR was used to estimate the factors that determine the profitability of paddy production across 6 different rice production communities. The profit function was specified as:

$$\ln \Pi_i^* = \ln A_i^* + \mu_{i1}^* \ln p + \mu_{i2}^* \ln r + \mu_{i3}^* \ln w + b_{i1}^* \ln K + b_{i2}^* \ln T + b_{i3}^* \ln D$$

where:

Πi* = the UOP profit (i.e. revenue less variable costs, divided by the unit price of output),
p = price per kg of seed,
r = price per kg of fertiliser,
w = wage rate per man-day,
K = interest on fixed capital per farm,
T = farm size cultivated, in hectares per farm,
D = dummy for the variety of seed cultivated (0 = local, 1 = improved), and
i = 1, 2, 3, ..., 6 (the different rice production communities).

The results of this study are presented in the table below. They show that profit responds positively to the quantity of fertiliser applied and to the farm size cultivated, and negatively to higher labor costs and seed expenses. Farmers also tend to increase fertiliser use, since this enhances productivity and raises the profitability of rain-fed paddy rice production.

SUR in Agricultural Economics Research

Another study comes from Nidhiya Menon (2007), who examined rainfall uncertainty and occupational choice in agricultural households of rural Nepal. The study aimed to explain why households strive to diversify their sources of income even though agriculture is the main occupation in Nepal.

Menon (2007) modeled household members' occupational choice with the equation described below:

where:

Pijw = the probability that non-head member i in household j in ward (region) w is engaged in the same occupation as his/her household head.

X1ijw and X2jw = exogenous variables, specific to member i and household j, that influence the choice of occupation. These variables are: age, gender, land ownership, number of males and females over 10 years of age in the household, and the total number of dependents.

X3w = variables specific to the ward that affect occupational choice.

COVw = coefficient of variation of rain.

The researcher split P into the probability for household workers classified as self-employed (Pse) and the probability for household workers classified as wage-employed (Pwe). Because both dependent variables depend on the same regressors, Menon found the two equations suitable for Zellner's seemingly unrelated regression (SUR) model.

The study found that occupational choice is highly correlated with the uncertainty associated with historical rainfall patterns. Where the head is employed in agriculture, other family members are less likely to choose agriculture as an occupation in districts where rain is more uncertain. Estimates indicate that for a 1% increase in the coefficient of variation of rain, there is a 0.61% decrease in the probability of choosing the same occupation as the household head, where the head is classified as self-employed in agriculture. The negative effect of rainfall uncertainty on occupational choice is less evident in households that have access to credit, and in households with relatively high levels of human capital.

A related study comes from Feng and Heerink (2008). It examined the factors that determine the participation of farm households in land renting and migration, and investigated whether participation in land renting and participation in migration influence each other. The study assumed that land renting and migration are determined by household characteristics, other fixed factors (such as number of dependants, number of durable assets, number of cattle, age of household head, age of adults, education of household head, education of adults, female-male ratio, irrigated land per adult, possession of land contracts, and land transfer rights), labor endowments, and household land endowments, and specified the models as below:

where:

R = dummy variable for land renting (1 if the household rented land)
M = dummy variable for migration (1 if at least one household member was involved in migration)
ZH = a vector of household characteristics
ZF = a vector of fixed factors
L = household labour endowment
A = household land endowment
w = wage rate
r = land rent
Z = a vector of institutional factors affecting land renting and migration
ε = error terms

Because the dependent variables are binary qualitative variables, the models above can be called seemingly unrelated bivariate probit models. In this study, the model was able to capture household decision making over land and labor and its dependence on several independent variables.

This study is a classic application of SUR. Many studies of the same topic have failed to find the interrelation between reforms in rural China and the development of land and labor markets. Feng and Heerink (2008) found that the error terms of the land renting equation and the migration equation were strongly correlated, and that there is a negative relationship between land renting and migration.

Applied SUR Overview

Application Approaches

Most of the studies described above used SUR without significant modification. They tell us several things about applied SUR.

1. Applications of SUR differ according to the type of dependent variable in the SUR equations. The first type uses quantitative dependent variables and the second uses qualitative dependent variables. Few adjustments to the SUR approach are needed when the dependent variables are quantitative. When they are qualitative, SUR is combined with bivariate probit models (Feng and Heerink, 2008).

2. SUR is used for both time series data and cross-section data. With time series data, more procedures are involved. A panel cointegration test is used to check for a long-run cointegrating relationship between variables, and is needed to show whether it is appropriate to employ SUR estimation (Zaman et al., 2011; Menon, 2007).

3. SUR is also used to estimate coefficients from panel data sets (Zaman et al., 2011).

Application Benefit

SUR delivers particular benefits depending on the research objective, and the benefit is not the same from one study to another. Most of the studies above showed that the standard errors from SUR were smaller than the standard errors generated by OLS.

The study by Rumelan and Setiawan (2009) specifically aimed to compare whether OLS or SUR delivers the better estimators for household food security. The results are given in the following table:

These results are also supported by Zaman et al. (2011) and Cadavez and Henningsen (2011), described above. They show that SUR is a more efficient method of regression than OLS, owing to its smaller standard errors.
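
The efficiency argument can be illustrated numerically. The sketch below is not taken from any of the cited studies: the regressors and the error covariance `sigma` are made up, and `sigma` is treated as known, so it compares OLS with GLS rather than with FGLS.

```python
import numpy as np

# Hypothetical two-equation system with known error covariance.
rng = np.random.default_rng(1)
T = 50
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
X = np.block([[X1, np.zeros_like(X2)],
              [np.zeros_like(X1), X2]])          # block-diagonal stacked regressors
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
W = np.kron(sigma, np.eye(T))                    # Cov(eps) = Sigma kron I_T

# True sampling covariance of OLS when the errors are non-spherical.
XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ W @ X @ XtX_inv

# Sampling covariance of GLS (the SUR estimator with known Sigma).
var_gls = np.linalg.inv(X.T @ np.linalg.inv(W) @ X)

se_ols = np.sqrt(np.diag(var_ols))
se_gls = np.sqrt(np.diag(var_gls))
print(np.round(se_ols, 4))
print(np.round(se_gls, 4))
```

With a known covariance matrix, the GLS standard errors are never larger than the OLS ones. With an estimated covariance matrix (FGLS) this ordering is an asymptotic result and need not hold exactly in small samples. The gain also disappears when the errors are uncorrelated across equations or when every equation has identical regressors.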

Another, more advanced benefit of SUR is shown by Feng and Heerink (2008). Their study untangled the relationship among the vectors of household characteristics that determine whether a farmer decides to rent land or to migrate. Where many previous studies failed to reveal this interaction, Feng and Heerink found that the error terms of the two equations were correlated. Although the equations are seemingly unrelated, the SUR model brought out this underlying relationship and also found significant effects of several regressors.

END NOTES AND CONCLUSION

Development of SUR Model

Like many statistical models, SUR has undergone modification and development. Most of the modifications aim to improve the application of SUR to particular situations. Some of these developments are:

1. Dynamic Seemingly Unrelated Regression (DSUR). This parameter estimator was proposed by Nelson C. Mark, Masao Ogaki and Donggyu Sul (2004). DSUR has been shown to be feasible especially for balanced panel data in which the number of cointegrating equations (N) is substantially smaller than the number of time series observations (T). DSUR is applicable whether or not the cointegrating vectors are homogeneous across equations. The model has been used to estimate the relation between investment rates and saving rates in European countries (Mark et al., 2004).

2. Sparse Seemingly Unrelated Regression (SSUR). Because the conventional, unconstrained SUR model is known to be over-parameterized, Wang (2010) introduced a Bayesian analysis of SSUR. Its main innovations include:

a. Inference via Markov chain Monte Carlo (MCMC) simulation for specific constraints on regression coefficients and errors,

b. Evaluation of the marginal likelihoods of restrictions using coupled Candidate's formula approximations, and

c. The extension of sparse modelling to dynamic SUR models.

Conclusion

After discussing the studies and articles above, we can conclude that:

o Seemingly Unrelated Regression is known to be unbiased and efficient, and to satisfy the maximum likelihood requirement, when applied to multiple equations that contain similar regressors.

o With some adjustments for the research at hand, SUR can be properly applied to a wide range of research.

o Seemingly Unrelated Regression has also undergone development in model specification and estimation. Dynamic SUR and Sparse SUR are two such modifications.

REFERENCES

Cadavez, V.A.P. and A. Henningsen, 2011. The Use of Seemingly Unrelated Regression (SUR) to Predict the Carcass Composition of Lambs. Meat Science 92, 2012, p. 548–553. DOI: 10.1016/j.meatsci.2012.05.0257.

Feng, S. and N. Heerink, 2008. Are farm households' land renting and migration decisions inter-related in rural China?. NJAS - Wageningen Journal of Life Sciences Volume 55 issue 4 May 2008 pp: 345-362.

Beasley, Mark, 2008. Seemingly Unrelated Regression (SUR) Models as a Solution to Path Analytic Models with Correlated Errors. Multiple Linear Regression Viewpoints, 2008, Vol. 34(1). University of Alabama at Birmingham.

Menon, Nidhiya, 2007. Rainfall Uncertainty and Occupational Choice in Agricultural Households of Rural Nepal. 2006 North East Universities Development Conference at Cornell University. November 1st, 2006.

Olubanjo O.O. and O. Oyebanjo, 2005. Determinants of profitability in rain-fed paddy rice production in Ikenne Agricultural Zone, Ogun State, Nigeria. African Crop Science Conference Proceedings, Vol. 7. pp. 901-903. ISSN 1023-070X/2005.

Rumelan and Setiawan, 2009. Pemodelan Ketahanan Pangan Rumah Tangga Indonesia dengan Pendekatan Seemingly Unrelated Regression. Repository - Institut Teknologi Sepuluh Nopember – Surabaya.

Thornton, James, 2013. Econometrics – SUR Models Text Files. Eastern Michigan University, USA. Downloaded from http://people.emich.edu/jthornton/text-files/Econ515_out_sur_model.doc. July 7th, 2015.

Wang, H., 2010. Sparse seemingly unrelated regression modelling: Applications in finance and econometrics. Computational Statistics and Data Analysis 54 (2010) 2866–2877. DOI: 10.1016/j.csda.2010.03.028.

Zaman K., Himayatullah Khan, Muhammad Mushtaq Khan, Zohra Saleem, and Muhammad Nawaz, 2011. The impact of population on environmental degradation in South Asia: application of seemingly unrelated regression equation model. Environmental Economics, Volume 2, Issue 2, 2011.

Zellner, Arnold, 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57, 348-368