Stata Tutorial∗
Sheng Dai †
January 5, 2020
I have prepared this tutorial to give you an introduction to some of the basic features of Stata. It covers examples of inputting, importing and editing data, as well as the instructions for obtaining summary statistics and doing a basic regression analysis. We also make some basic tables and graphs. Lastly, I introduce a few ways to save your results and keep track of what you have done. At the end of the tutorial, miscellaneous tips related to a number of subjects are given. Over the course I also provide most of the commands you need in the problem sets, but it is still worthwhile to read this tutorial.
In this tutorial, as an example, we use the test_data.dta dataset, which can be found on the course webpage. Open this dataset in Stata before doing anything else.¹
Some general tips: As an econometrics program for a basic course in econometrics, Stata has the good property that everything can be done through menus, so it is not necessary to memorize any written commands, although later on it may be faster to work only with commands. In most places in this tutorial I only provide the commands, but where it is necessary, I also give you the menu structure.
First of all, I advise you to use the “Help” menu in Stata if you encounter problems. You can search there by command (Help → Stata Command) or by keywords (Help → Search · · · ).² The instructions in Stata are quite readable. Google also helps a lot in case of an emergency: typing a description of the problem and including Stata as one of the keywords usually produces relevant pages.
This tutorial covers the Stata basics, but there are many Stata tutorials online:
• A very good internet guide to Stata
• A web book for regression analysis with Stata
• Stata Cheat Sheets
∗ The present tutorial was prepared for the 30C00200-ECONOMETRICS course at Aalto University BIZ. Dr. Antti Saastamoinen created the original version of this tutorial on 4 September 2014, and the previous version was modified by Dr. Xun Zhou on 8 January 2019.
† Sheng Dai: Management Science Unit, Aalto University School of Business, Ekonominaukio 1, 02150 Espoo, Finland; email: [email protected]
¹ Stata is available on all computers in the lab R102 at Vare and U351 at Otakaari 1.
² Note that italic font with double quotes denotes a Stata menu command, whereas Courier New font denotes a Stata programming command.
Contents
1 Overview & Basics
  1.1 Basic syntax
2 Data
  2.1 Enter data in Stata
  2.2 Excel and Comma separated value files (*.csv)
  2.3 Copy-Paste tactics
3 Variables
  3.1 Generating variables
  3.2 Missing values and observation id-variable
  3.3 Change a variable name and removing a variable
4 Summary statistics and examining your data
  4.1 Summary statistics
  4.2 Counting, grouping/sorting and listing your data
5 Tables and Graphs
  5.1 Tables
  5.2 Graphs
6 Regression analysis
  6.1 Basic regression analysis
  6.2 Regression analysis to subsets in the data
  6.3 Regression analysis to subsets in the data
  6.4 Postestimation
  6.5 Other types of regressions
  6.6 Drawing a regression line in 2D case
  6.7 Residuals vs. fitted plot
7 Management of an analysis project
  7.1 How to store your model
  7.2 Do-files
  7.3 Log-files
  7.4 The practical ways
8 Time series/Panel Data
  8.1 Lag and difference operator
  8.2 Time series line
  8.3 Panel data
  8.4 Interaction expansion and its application to a fixed effect regression
9 Miscellaneous tips
  9.1 Data description
  9.2 Summation
10 Data sources
1 Overview & Basics
The main Stata window consists of the numbered parts described below. You can adjust the color preferences from “Edit” → “Preferences” → “General Preferences” if you don’t like the current color scheme.
1. Variable list. Lists the variables in the data set. Clicking a variable will ‘write’ it to
the command window.
2. Review window. Shows the commands you have used, including the improper ones. This is quite a convenient window, since you can quickly reuse a command from it.
3. Command window. Place to write your commands (if you use them).
4. Output window. Shows the results.
5. Two most important menus. As the names suggest, “Graphics” is where all the graphs are made, and behind “Statistics” we find everything that is covered in this course (and well beyond).
6. The “Save” button. Clicking this button will overwrite the current *.dta file. Notice that all the new variables you have created will be saved, but no estimation results are saved by default unless you save them separately (see later). A *.dta file is mainly a data file.
1.1 Basic syntax
As said, although Stata can be fully used through menus, it is usually more convenient to resort to commands. The basic syntax of a Stata command is as follows (the square brackets only separate the different blocks of the syntax; they are not part of it):
[command] [variables] [conditions, e.g. if-rule etc], [options]
For example:
sum myownvar if myownvar>500, detail
This gives detailed summary statistics of the variable myownvar, using only the observations for which the value of myownvar is above 500. The command itself is always required and most commands also take variables; the conditions and options are optional.
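To make the if-rule concrete, here is a minimal pure-Python sketch (not Stata code) of the basic statistics that sum myownvar if myownvar>500 reports without the detail option: the statistics are computed only over the observations that satisfy the condition. The values of myownvar and the helper name summarize_if are hypothetical; the standard deviation uses the n − 1 divisor, as Stata's summarize does.

```python
from statistics import mean, stdev


def summarize_if(values, condition):
    """Mimic `sum var if <condition>`: keep matching observations, then describe them."""
    subset = [v for v in values if condition(v)]
    return {
        "Obs": len(subset),
        "Mean": mean(subset),
        "Std. Dev.": stdev(subset),  # sample standard deviation (n - 1 divisor)
        "Min": min(subset),
        "Max": max(subset),
    }


# Hypothetical values of myownvar (not the course data).
myownvar = [120, 480, 510, 650, 700, 990]

# Equivalent of: sum myownvar if myownvar>500
stats = summarize_if(myownvar, lambda v: v > 500)
print(stats)
```

Only 4 of the 6 hypothetical observations enter the statistics, which is exactly what the if-rule is for.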
2 Data
Generally there are two ways to get data into Stata: either you enter it in Stata itself, or you import it from some external file, usually an Excel sheet. In this course, however, I provide the datasets directly in the Stata format, so we don’t need to import anything. But you may need this later.
2.1 Enter data in Stata
1. Choose the Data Editor. Notice that there are Edit and Browse modes for the data screen; in Browse mode you are only allowed to browse the data, not change it.
Notice how the data is organized in the editor (using the example data): the variables are in the columns and the observations are in the rows. For example, here we have 12 observations of the variables y and x1-x3 (in addition to a few other variables, which will be introduced later). I advise you not to pay much attention yet to the variables under the red cross. At this point we can think of our data as cross-sectional (i.e. a single time period), consisting of 12 firms with production y and “inputs” x1 to x3.
2. Enter data into the cells (press Enter after inputting a value). Finnish users are warned against using commas as decimal separators, as Stata uses the period.
3. Leave the editor.
The dataset can then be saved normally through the “File” menu in the main Stata window. Note that the Stata data format is *.dta. If you have ready-made data in another format, you can import it as follows.
2.2 Excel and Comma separated value files (*.csv)
Often it is easier to work with your data in Excel first, before importing it to Stata. Recent Stata versions (12 and later) can read Excel files directly with the import excel command, but a simple and robust alternative is to save the data as a *.csv file before importing it to Stata. In Stata you then choose:
“File” ⇒ “Import” ⇒ “Text data (delimited, *.csv, · · · )”
This opens the import dialog window.
IMPORTANT NOTE: If you are using Excel to edit your data first (before importing it to Stata), again avoid using commas as decimal separators (as Finns usually do). Stata accepts only periods as decimal separators. To avoid unnecessary work, it is preferable to change the decimal separator used in Excel to a period. This is done in Excel's options, under “Advanced”.
2.3 Copy-Paste tactics
Of course, provided that your data is fine in Excel, you can usually just copy the column you want and paste it into the Stata Data Editor. You must paste it into the first cell of the column. Stata then asks how to deal with the first line of your data; if you copied the variable name from Excel, you should choose the second option, “Treat as variable names”.
3 Variables
3.1 Generating variables
Of course you can generate variables within Stata itself. This is generally done with the gen command. If you generate a variable from scratch, it must always be assigned some initial value (usually, for example, 0).
gen testvar=0
This generates a variable named testvar with all values initially set to 0. The variable appears in the Variables window. After this you can go to the Data Editor and start replacing the zeros with your own values.
You can also replace some of the values of a given variable, usually by specifying a condition (rule) on which the replacing is based. This is done with the replace command. For example:
replace testvar=1 if x2>=5
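This replaces the values of testvar with 1 for all those observations for which the value of x2 is equal to or greater than 5. Here you see an example of an if-rule, which is very important to master: an if-rule is an easy way to apply a command only to a subset of observations. If-rules are commonly based on logical or relational operators, which are listed below along with the usual arithmetic operators:
Arithmetic: + (addition), - (subtraction), * (multiplication), / (division), ^ (power)
Logical: & (and), | (or), ! (not)
Relational: > (greater than), < (less than), >= (greater than or equal), <= (less than or equal), == (equal), != (not equal)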
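The gen/replace pair works column by column: gen gives every observation the same starting value, and replace ... if changes only the observations for which the if-rule is true. A minimal pure-Python sketch (not Stata code) of gen testvar=0 followed by replace testvar=1 if x2>=5, on hypothetical values of x2:

```python
# Hypothetical values of x2, one per observation (not the course data).
x2 = [3, 5, 7, 1, 5]

# gen testvar=0 : a new column with the same initial value for every observation
testvar = [0] * len(x2)

# replace testvar=1 if x2>=5 : change only the observations that satisfy the if-rule
for i, value in enumerate(x2):
    if value >= 5:
        testvar[i] = 1

print(testvar)
```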
Of course you can generate variables from other variables. For example,
gen x13=x1+x3
gen ln_y=ln(y)
In the first, x13 is the sum of the variables x1 and x3; in the second, we have created the natural logarithm of y.
PROBLEM (that you can practice yourself): Generate a variable that takes the value 1 if y is equal to or above its average and 0 otherwise.
Another command that can be used to generate variables is egen (see Help and search for the command egen). It is an extension of the gen command that can handle more sophisticated functions when generating variables (see Help for ‘Mathematical functions’). For example, in our test data:
egen ymax=max(y)
This creates a variable named ymax that takes the same constant value, the maximum of y, for every observation; in our data it should be 1000.
Another example:
egen rank_x1=rank(x1)
This gives you the ranks of the observations with respect to the variable x1. If you try to use some of the egen functions with the gen command, you will encounter an error.
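Unlike most gen expressions, the egen functions max() and rank() look at all observations of a variable at once. A minimal pure-Python sketch (not Stata code) of the two examples above, assuming a hypothetical variable without missing values and assuming the default treatment of ties in rank(), where tied values share their average rank. The helper names egen_max and egen_rank are hypothetical.

```python
def egen_max(values):
    """egen ymax=max(y): the same maximum repeated for every observation."""
    m = max(values)
    return [m] * len(values)


def egen_rank(values):
    """egen r=rank(x): rank 1 for the smallest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + 1 + end + 1) / 2  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


y = [400, 1000, 650]      # hypothetical
x1 = [2.0, 5.0, 3.0, 5.0]  # hypothetical, with one tie
print(egen_max(y))
print(egen_rank(x1))
```

The two observations with x1 = 5.0 would occupy ranks 3 and 4, so both receive 3.5.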
Of course you can name the generated variables however you like, so you don’t need to
follow my naming conventions.
Variables can of course also be string variables (letters). Often, for example, you have the names of the firms or countries as one ‘variable’. String variables are shown in red in the editor. For example, we could have:
Here the variable Firm contains the firms’ names, if we consider the observations to be firms.
3.2 Missing values and observation id-variable
Often your data includes missing values, i.e. you don’t have the value of a certain variable for a specific observation. For example, it might be that we don’t have the value of x1 for the second observation in our example dataset. Thus:
Stata indicates a missing value with a dot, and the dot is also used to refer to missing values in commands. For example, if we came to know that the value is actually 3.45, we could do the following (of course, you could also type the value directly into the editor):
replace x1=3.45 if x1==.
This, however, would replace all the missing values in x1 (if there are several) with the value 3.45, which is not necessarily what we want. If you want to refer to a specific observation, it is a good idea to generate an observation id variable.
gen id=_n
Here the expression _n is Stata’s way of referring to the observation number.
Now what we could do is
replace x1=3.45 if id==2
I will later explain what the panel id variable is.
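The difference between replacing by a rule on the value (x1==.) and by the observation id (id==2) can be seen in a minimal pure-Python sketch (not Stata code). Here None stands in for Stata's missing value, the x1 values are hypothetical, and the id is built 1-based like Stata's _n.

```python
MISSING = None  # stands in for Stata's "."

x1 = [1.2, MISSING, 2.5, MISSING]  # hypothetical

# gen id=_n : observation numbers start at 1
obs_id = [n + 1 for n in range(len(x1))]

# replace x1=3.45 if x1==. : fills EVERY missing value
all_filled = [3.45 if v is MISSING else v for v in x1]

# replace x1=3.45 if id==2 : changes only observation 2
only_second = [3.45 if i == 2 else v for i, v in zip(obs_id, x1)]

print(all_filled)
print(only_second)
```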
3.3 Change a variable name and removing a variable
The easiest way to rename variables in Stata is to use command rename
rename testvar newvar
This renames the variable testvar to newvar.
Variables can be removed with drop command.
drop newvar
This removes the variable newvar.
Drop command can also be used to drop observations. For example:
drop in 10
This would drop observation number 10. Sometimes dropping observations may be easier through the Data Editor: just select the data you want to remove, right-click and choose “Drop Selected Data”. This works for both variables (columns) and observations (rows).
Stata also has a good Variables Manager (“Data Editor” ⇒ “Tools” ⇒ “Variables Manager”), where you can, for example, label your variables and values and make notes concerning a certain variable. An example of labels is given below.
For example, we can draw a scatterplot (more on graphs later) of the variables y and x1 and label each point with the firm’s name taken from the variable Firm. Labels are good in tables and graphs, but we don’t really need to handle them during this course. Labels can also be assigned to variables; then the labels appear in place of the variable names, for example in graphs. This is useful, as it is good to keep variable names relatively short and use labels to describe the variables better.
The scatterplot is obtained with the following command line, where the markers are labeled by the Firm variable. These are marker labels; value labels, in contrast, are labels given to observations with a given “value” of a specific variable.
twoway (scatter y x1, mlabel(Firm)), title(SCATTERPLOT)
Of course, putting labels for each marker is not advised if you have a lot of observations.
It gets messy easily.
4 Summary statistics and examining your data
Now that we know how to input and manipulate data in Stata, let’s do something with it. We will use the same data as before.
4.1 Summary statistics
Summary statistics of our data are probably the first thing you should look at in your analysis. To get summary statistics for a certain variable, write:
sum y, detail
Of course, you can also obtain summary statistics through the Statistics menu, but it is more tedious than just writing the command. The option detail is optional. What you should see is:
Notice that I have given y the label “Production”, which is then nicely printed in the output. Of course, you can obtain summary statistics for multiple variables at the same time.
sum y x1
Note that we left the , detail option out this time, so we got a bit less information.
To get summary statistics for a long list of variables, you can write:
sum x1-x3
Note that this range notation requires the variables to be next to each other in the dataset (in the order shown in the editor). If we had more variables, for example x4, x5 and x6, we could write:
sum y x4-x6
This would give you the summary statistics of the variables y, x4, x5 and x6.
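Because a range such as x4-x6 follows the dataset's variable order rather than the variable names, its meaning depends on how the variables are stored. A minimal pure-Python sketch (not Stata code) with a hypothetical variable order and a hypothetical helper expand_range:

```python
def expand_range(dataset_order, first, last):
    """Mimic a Stata varlist range `first-last`: every variable stored from first to last."""
    start = dataset_order.index(first)
    stop = dataset_order.index(last)
    return dataset_order[start:stop + 1]


# Hypothetical variable order, as it would appear in the Data Editor.
order = ["y", "x1", "x2", "x3", "x4", "x5", "x6"]
print(["y"] + expand_range(order, "x4", "x6"))  # sum y x4-x6

# If the variables were stored as y x1 x3 x2, then x1-x3 would cover only x1 and x3.
reordered = ["y", "x1", "x3", "x2"]
print(expand_range(reordered, "x1", "x3"))
```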
4.2 Counting, grouping/sorting and listing your data
Sometimes you would like to know how many observations meet some specified criteria. With a large dataset this is usually impossible just by looking at the data. Our test data is very small, but let’s use the trick on it anyway.
count if y==500
This counts the number of observations with y equal to 500. Note that in an if-rule the equality test needs two = signs (==); a single = is used for assignment, as in gen and replace.
Sometimes data need to be sorted into some order. You can first use the menu “Data” ⇒ “Sort” ⇒ · · · to see the commands. For example:
sort y
This sorts the data in ascending order of the variable y. More sophisticated sorting is also possible. For example, let’s first sort the data by the variable x2, which is a 0/1 type of variable (a dummy variable).
sort x2
by x2:tabulate y
What we have done here is first sort the data into the two groups defined by x2, and then tabulate the values of y in ascending order in both groups (tabulate automatically puts the values in ascending order). We could obtain, for example, summary statistics for y in a similar manner:
by x2: sum y
The summary statistics are then reported separately for each subgroup.
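Obtaining summary statistics of y by the groups of x2 means splitting the observations by the value of x2 and describing each part separately. A minimal pure-Python sketch (not Stata code) with hypothetical data and a hypothetical helper summarize_by:

```python
from collections import defaultdict
from statistics import mean


def summarize_by(group, values):
    """Mimic `by group: sum values`: collect observations per group, then describe each group."""
    buckets = defaultdict(list)
    for g, v in zip(group, values):
        buckets[g].append(v)
    return {g: {"Obs": len(vs), "Mean": mean(vs), "Min": min(vs), "Max": max(vs)}
            for g, vs in sorted(buckets.items())}


x2 = [0, 1, 0, 1, 1]            # hypothetical dummy
y = [400, 700, 500, 600, 800]   # hypothetical
groups = summarize_by(x2, y)
print(groups)
```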
You may also want to list all the observations that meet some specific criteria. Use the list command:
list y if x2==0
This lists all values of y for which x2 is zero. Multiple if-rules can also be combined with & (and) and | (or), everywhere where if-rules are allowed.
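Both count and list apply an if-rule first and then either count or display the matching observations. A minimal pure-Python sketch (not Stata code) on hypothetical data, where observation numbers start at 1 as with _n:

```python
# Hypothetical data: one value per observation.
y = [500, 420, 430, 780, 500]
x2 = [0, 1, 0, 1, 0]

# count if y==500
n_500 = sum(1 for v in y if v == 500)

# list y if x2==0  -> (observation number, value of y)
listed = [(i + 1, v) for i, (v, d) in enumerate(zip(y, x2)) if d == 0]

# list y if x2==0 & y>450  -> two conditions combined with "and" (&)
listed_both = [(i + 1, v) for i, (v, d) in enumerate(zip(y, x2)) if d == 0 and v > 450]

print(n_500, listed, listed_both)
```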
5 Tables and Graphs
5.1 Tables
In many instances it is necessary to make some kind of table of your data; we already saw an example in the previous chapter. This is especially so if you want to summarize your data with respect to some categorical variable. For example, in our test data, the variable x2 can be viewed as a binary variable with two categories. Below is a brief overview of different types of tabulations.
Cum. refers to the cumulative percentage and Freq. to the frequency.
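In a one-way frequency table such as tabulate x2, Freq. is the count of each value, Percent its share of all observations, and Cum. the running total of the percentages. A minimal pure-Python sketch (not Stata code) with hypothetical values of x2 and a hypothetical helper tabulate:

```python
from collections import Counter


def tabulate(values):
    """Rows of (value, Freq., Percent, Cum.) for a one-way frequency table."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


x2 = [0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical 0/1 variable
table = tabulate(x2)
for row in table:
    print(row)
```

The last Cum. value is always 100, up to rounding.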
5.2 Graphs
Graphs are a good way to examine your data. Scatter plots and histograms are among the most important graphical tools for starting your data analysis. To add a graph to, for instance, your Word document, just right-click the image and copy it to the clipboard.
Let’s make a scatter plot of y vs. x1 and then draw a histogram of y.
scatter y x1
histogram y, percent
In histogram, the number of bins is determined automatically; this can be changed with the bin() option.
Figure: scatterplot (y vs. x1) and histogram (y).
A scatterplot plots the observations as pairs of values of two variables. A histogram illustrates the distribution of a given variable. For example, here we see that 50% of our observations of y are in the range 400 to 700. Note that in Stata the upper limit of a bin does not belong to that bin, so to be precise, 50% of the observations satisfy 400 ≤ y < 700. See also the menu “Graphics” ⇒ “Histogram”.
The histogram command without the percent option instead draws the bars on a density scale, giving an estimate of the density function of y.
histogram y
If you calculate the combined area of the bars, you get 1, as you should, since it is a density estimate.
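The relation between the percent and density scales can be checked with a small computation. A minimal pure-Python sketch (not Stata code) with hypothetical data and a hypothetical helper histogram: it assumes, as the text states, that a bin contains its lower limit but not its upper limit (with the maximum value placed in the last bin), and it takes the starting point and bin width as given, whereas Stata chooses them itself unless told otherwise.

```python
def histogram(values, start, width, bins):
    """Bar heights as percent and as density; bin k covers [start + k*width, start + (k+1)*width)."""
    counts = [0] * bins
    for v in values:
        k = int((v - start) // width)
        k = min(k, bins - 1)  # the maximum value goes into the last bin
        counts[k] += 1
    n = len(values)
    percent = [100 * c / n for c in counts]
    density = [c / (n * width) for c in counts]  # scaled so the bars' total area is 1
    return percent, density


y = [150, 320, 410, 450, 560, 690, 700, 720, 880, 1000]  # hypothetical
percent, density = histogram(y, start=100, width=300, bins=3)
area = sum(h * 300 for h in density)
print(percent, area)
```

Note that y = 700 lands in the bin [700, 1000], not in [400, 700).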
Of course Stata is able to draw all the basic bar and pie charts etc. All of the graphs can
also be drawn for only a subset of observations.
6 Regression analysis
6.1 Basic regression analysis
Regression analysis is the cornerstone of this course, and it is important that you master the basic Stata commands involved in a regression analysis.
6.2 Regression analysis to subsets in the data
Doing a basic regression analysis is very easy in most econometrics programs, and Stata is no exception: you basically have to define the dependent variable and the independent variables. However, when doing the assignments, don’t get fooled by how easy this is in Stata. Just presenting the output that Stata gives won’t do you much good if you don’t know how to interpret the results.
Assume that we want to know the effect of x1 on y (simple regression).
reg y x1
What you should obtain is something like:
You may compare this to how it is done in Excel; you will probably find Stata a bit easier. Multiple regression requires no further hurdles:
reg y x1 x2 x3
The menu structure for regression analysis is: “Statistics” ⇒ “Linear models and related” ⇒ “Linear regression”. In this course you generally don’t need to fiddle much with the advanced options in the menus, although sometimes it might be useful to restrict the observations used. As an example, you can adjust the confidence intervals from the “Reporting” tab (e.g. from 95% to 99%); the command-line equivalent is the level() option, for example reg y x1, level(99).
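For the simple regression reg y x1, the coefficients Stata reports are the ordinary least squares estimates b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal pure-Python sketch (not Stata code) with hypothetical data and a hypothetical helper simple_ols; Stata additionally reports standard errors, t-statistics, confidence intervals and R², which the sketch omits.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of y = b0 + b1*x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


x1 = [1, 2, 3, 4, 5]  # hypothetical
y = [2, 4, 5, 4, 5]   # hypothetical
b0, b1 = simple_ols(x1, y)
print(b0, b1)
```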
6.3 Regression analysis to subsets in the data
Now that we know how to sort data, we can run separate regressions for separate groups.
sort x2
by x2: reg y x1
This runs a linear regression separately for the groups ‘0’ and ‘1’ defined by the variable x2. We have used the by prefix here, which requires the data to be sorted by x2 first (bysort x2: reg y x1 does both steps at once). We would do something like this to examine whether the regression equations differ between the groups.
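A by-group regression amounts to splitting the sample by x2 and running the same OLS regression within each part. A minimal pure-Python sketch (not Stata code) with hypothetical data and hypothetical helpers simple_ols and regress_by; the two hypothetical groups get clearly different slopes (2 and 1), which is the kind of difference the separate regressions are meant to reveal.

```python
from statistics import mean


def simple_ols(x, y):
    """OLS intercept and slope for a single regressor."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1


def regress_by(group, x, y):
    """Mimic `by group: reg y x`: one separate regression per group value."""
    results = {}
    for g in sorted(set(group)):
        xs = [a for a, k in zip(x, group) if k == g]
        ys = [b for b, k in zip(y, group) if k == g]
        results[g] = simple_ols(xs, ys)
    return results


x2 = [0, 0, 0, 1, 1, 1]        # hypothetical groups
x1 = [1, 2, 3, 1, 2, 3]
y = [2, 4, 6, 10, 11, 12]
fits = regress_by(x2, x1, y)   # {group: (intercept, slope)}
print(fits)
```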
6.4 Postestimation
Postestimation commands are important, since they enable us to evaluate, for example, the residuals and fitted values of a given model. In Stata, the postestimation commands are found through the menus as: “Statistics” ⇒ “Postestimation” ⇒ “Predictions”.
In Stata it is important to notice that the postestimation commands apply to the latest model estimated, i.e. the latest model that Stata has in its memory. (There is a way in Stata to store your estimation results and call them later, so that you don’t need to re-estimate the model; see Section 7.1. In this course, however, the estimations are so straightforward and fast that we have no real need for this option.) Postestimation commands can differ between different types of models, and thus the postestimation menu may not look the same all the time. The example below relates to the simple linear OLS regression.
reg y x1
If you now use the menu, you need to specify a name for the new variable and then specify what you want to be done. For example, to get the fitted values (predictions of the model), you would choose “Linear prediction (xb)”.
If you use commands, after estimating the model you would write:
predict fit, xb
If you want to obtain residuals, you would write:
predict res, residuals
Note that res and fit are just my names for the variables that predict generates; you could name them differently. Residuals are, of course, the difference between the actual values and the fitted values.
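What the two predict calls produce can be written out directly: the fitted value is b0 + b1·x1 and the residual is the actual y minus the fitted value. A minimal pure-Python sketch (not Stata code) with hypothetical data:

```python
from statistics import mean

x1 = [1, 2, 3, 4, 5]  # hypothetical
y = [2, 4, 5, 4, 5]   # hypothetical

# OLS coefficients (what reg y x1 estimates)
x_bar, y_bar = mean(x1), mean(y)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x1, y)) / sum((a - x_bar) ** 2 for a in x1)
b0 = y_bar - b1 * x_bar

# predict fit, xb : fitted values b0 + b1*x1
fit = [b0 + b1 * a for a in x1]

# predict res, residuals : actual minus fitted
res = [b - f for b, f in zip(y, fit)]
print(fit, res)
```

With a constant in the model, the OLS residuals sum to zero (up to rounding error), which is a quick sanity check on the sign convention.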
Naturally, there are a number of other postestimation commands, but we will not cover them here. For example, the menu “Postestimation” ⇒ “Tests, contrasts, and comparisons of parameter estimates” contains a number of statistical tests, which will be necessary in the later stages of your econometric adventures.
6.5 Other types of regressions
Naturally, Stata is capable of many other types of estimation than just simple linear OLS regression. In this course we need only a fraction of these other features. As examples, I show you instrumental variable regression and logistic regression; these methods are covered during the course.
Instrumental variable regression (an explanatory variable is not independent of the error term):
ivreg y (x1=x2)
Here x1 is endogenous and is instrumented with x2. In Stata 10 and later, ivreg has been superseded by ivregress 2sls y (x1=x2). Through the menus: “Statistics” ⇒ “Endogenous Covariates” ⇒ “Linear regression with endogenous covariates” · · ·
Logistic regression (dependent variable is binary)
logit x2 y
Or through menus: “Statistics” ⇒ “Binary Outcomes” ⇒ “Logistic Regression”
6.6 Drawing a regression line in 2D case
Now that you know how to do a linear regression, you probably want to draw the regression line into your scatterplot. This can be done with the command line below:
scatter y x1 || lfit y x1
The result is the scatterplot of y against x1 with the fitted regression line drawn on top.
Through the menus:
“Graphics” ⇒ “Twoway graph (scatter, line, etc.)”
In the window, create a new plot. To draw the scatter plot, first choose Basic plots and Scatter (the default), and choose your Y and X variables. To add the fit line to the picture, create another plot, but choose Fit plots and Linear prediction. The Y and X variables are the same as in the scatter plot. Nonlinear fit lines are also possible.
6.7 Residuals vs. fitted plot
One important graph is the residuals-versus-fitted values plot. With it you can also make some judgements about the goodness of your model; mainly it concerns situations where we are dealing with heteroscedasticity.
Given the residuals res and the fitted values fit that we predicted earlier in Section 6.4, making a scatter plot of residuals vs. fitted values is straightforward.
scatter res fit
However the same plot can be called directly by:
rvfplot
We can also modify the rvfplot command, for example to add a zero line:
rvfplot, yline(0, lwidth(medthick) lcolor(purple))
The reason we first did it the ‘hard’ way was to recap how to use the predict command.
You could also check whether the residuals are normally distributed:
histogram res, bin(7) normal
We have changed the default number of bins and added the normal density plot (blue
line). These residuals do not appear very normally distributed, but of course, there are only
12 observations.
7 Management of an analysis project
Before any longer project, it is probably a good idea to start keeping some kind of record of what you have done with Stata. Of course, to make tables look especially good, you should put their information into Excel and edit the tables there. Below are a couple of ways to keep track of things.
7.1 How to store your model
After estimating some model, the model estimates can be stored. If we have estimated the
model y on x1 (the first we estimated), then the results of it can be saved in the following
way.
reg y x1
est store model1
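Now, if you estimate something else in between, the results of model1 (just a name for the model) can be recalled later with
est replay model1
or
est restore model1
The first displays the stored results again; the second also makes model1 the active model, so that postestimation commands apply to it.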
It is important to realize that the model you last have estimated is the active model and
any model you estimate next overwrites the previous one.
Notice, however, that estimates saved with the store command are kept only for the duration of the session: as soon as you close Stata, these stored estimation results vanish. To store your results permanently, you should use:
est save model1
which writes the results to a file (model1.ster). Alternatively, you can do as advised in subsections 7.3 or 7.4.
7.2 Do-files
One way to use Stata is to write *.do files, which are basically just text files containing all the commands that you want Stata to perform. Since *.dta files contain only the data you have (variables, predictions) and not the commands you have used, *.do files can be quite useful. This is how it works: open the Do-file Editor from its button and create a new *.do file. Let’s name it test_do.do. Start writing commands into the do-file. To run the do-file later, just open it in Stata from the “File” ⇒ “Do · · ·” menu. In the Do-file Editor you can also run it by pressing “Execute”. To add notes between your commands, start the line with a * sign. An example of a do-file is shown below.
NOTE that before you open/run the do-file, you need to have the *.dta file to which the do-file refers opened in Stata.
7.3 Log-files
Making logs of all the output is a very good way to keep backups of the results obtained. A log file basically records the output seen in the output window. Before giving any commands, go to “File” ⇒ “Log” ⇒ “Begin”. This creates a new log file. Then, as before, you can write commands. Now let’s run the above do-file while the log is on.
1. Create the log file test_log.smcl
2. Run the test_do.do file
3. End making log from “File” ⇒ “Log” ⇒ “Close”
Now to view the log file, just go “File” ⇒ “Log” ⇒ “View”.
To edit the log file, it must be translated to a text file. This is done from “File”⇒ “Log”
⇒ “Translate”.
If you now open the text file, you can see exactly the same output that Stata showed you earlier.
7.4 The practical ways
The way I usually store my results is not very fancy: I just copy my results from the Stata output window to a *.txt file and pick whatever I need from there. I also copy the commands to the file, so that I can replicate my results rapidly by just copying the commands from there. Note that copying directly to Word usually won’t work well; see the next paragraph for how to put your results into Word. It is also possible to copy tables to Excel as such, through the menu “Edit” ⇒ “Copy Table”. This keeps the figures pretty much in line, but some formatting in Excel is needed anyway.
If you want to add your results to Word without editing your tables in Excel, there are two options. Firstly, the quickest way is to copy your results from the Stata output window and paste them into Word. Then select the pasted results in Word, change the font to Courier (a fixed-width font keeps the columns aligned) and preferably set the size below 11. This should do the trick.
Secondly, you can insert the abovementioned *.txt file straight into your Word document. In Word 2003 this is done from the “Insert” ⇒ “File” · · · menu. There may be some problems fitting the text file into the document (since Stata output is rather wide), but usually it fits well if you extend the margins of your Word document. For the purposes of this course, both of these ways are more than sufficient, but later on I suggest you make nice-looking tables with Excel or some other program.
When you have added a *.txt file to your Word document, it should look something like below. You may have to adjust the margins in Word a bit to get it right.
8 Time series/Panel Data
If you are working with time series or panel data, some preparations are needed first. Let’s now add a time variable to our test data. Assume that each observation is from a different year, e.g. 2000–2011. I have also masked some of the other variables in the picture, as they really relate to a different situation. Now the situation is more like that of a single firm observed through time (not 12 firms in the same period).
Before you can work with time series data you need to specify a time variable.
tsset time, yearly
The menu would look like:
The name of our time variable here is time, and the option yearly indicates that we have yearly data. This can also be done from the menu “Statistics” ⇒ “Time Series” ⇒ “Setup and Utilities” ⇒ “Declare dataset to be time-series data”. In this menu you can also set the panel identifying variable, which is needed when working with panel data (it identifies the different units). A couple of much-needed operations in time series are differencing and lagging.
8.1 Lag and difference operator
Now to make a one period lagged variable of x1, write:
gen lag1_x1=L.x1
Now you should see a variable that, for every observation, has the value of x1 from the previous period. Naturally, the value is missing for the first period, since there is no earlier period.
Now it would be straightforward, for example, to estimate an AR(1) model:
reg x1 lag1_x1
The lag operator can also be used directly within commands, so you could have done it as:
reg x1 L.x1
The difference operator simply takes the difference between observations from two subsequent periods (periods t and t-1). To use the difference operator, simply write:
gen dif1_x1=D.x1
Lag and difference operators can be used in a sequence, so that the two-period lag is obtained with:
gen lag2_x1=L.L.x1
(The shorthand L2.x1 gives the same result.)
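The lag and difference operators shift a series by one period. A minimal pure-Python sketch (not Stata code) with hypothetical helpers lag and diff, assuming the observations are already sorted by time with no gaps between periods; with tsset data, Stata's operators use the time variable itself, so a gap in the time variable also produces a missing value. None stands in for Stata's missing value.

```python
MISSING = None  # plays the role of Stata's "."


def lag(series, k=1):
    """L.x (Lk.x): the value from k periods earlier; the first k periods are missing."""
    return ([MISSING] * k + list(series))[:len(series)]


def diff(series):
    """D.x: x(t) - x(t-1); missing whenever either value is missing."""
    previous = lag(series)
    return [MISSING if p is MISSING or c is MISSING else c - p
            for c, p in zip(series, previous)]


x1 = [10, 12, 15, 14]   # hypothetical yearly values, sorted by time
lag1_x1 = lag(x1)       # gen lag1_x1=L.x1
dif1_x1 = diff(x1)      # gen dif1_x1=D.x1
lag2_x1 = lag(lag(x1))  # gen lag2_x1=L.L.x1 (same as L2.x1)
print(lag1_x1, dif1_x1, lag2_x1)
```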
8.2 Time series line
The following is how you plot a basic time series line.
twoway tsline y
8.3 Panel data
Panel data is a somewhat different type of time data, as it also has a cross-sectional dimension: we observe multiple units over multiple periods. In our example, we could assume that we have 4 firms that we observe over the course of 3 years as:
What you need to do before working with panel data is to specify not only the time variable but also the panel identifying variable.
tsset panel_id paneltime, yearly
The variable panel_id identifies the firm and paneltime marks the period.
8.4 Interaction expansion and its application to a fixed effect regression
First, if you already know your panel data basics, this next thing is great. In the panel setting we often run a fixed effects model to take into account the individual time-invariant characteristics. This is analogous to including individual dummies and running a normal regression, which in some cases would involve creating a lot of dummy variables.
With interaction expansion, you can avoid creating a huge number of separate dummy variables by hand for each individual, group or year. Let’s see how it works.
Consider a regression model where we explain y with x1, and we’ll do this in a panel data context.
tsset panel_id paneltime, yearly
Now with the interaction expansion of panel_id and normal OLS:
xi: reg y x1 i.panel_id
(In Stata 11 and later, factor-variable notation also works without the xi: prefix: reg y x1 i.panel_id.)
And with a basic fixed effects regression (examine your results!):
xtreg y x1, fe
You get exactly the same coefficient for x1! Of course, there are reasons why the dummy variable approach is considered inferior to the fixed effects approach. After the fixed effects regression, the individual effects can be obtained with postestimation (predict with the u option).
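The claim that the dummy-variable regression and the fixed effects regression give the same coefficient for x1 can be checked numerically. Below is a minimal pure-Python sketch (not Stata code) on a hypothetical panel: lsdv_slope mimics the dummy approach (written with one dummy per firm and no common constant, which spans the same columns as a constant plus all but one dummy), and within_slope computes the fixed effects (within) estimator by demeaning y and x1 inside each firm. All helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsdv_slope(panel, x, y):
    """Dummy-variable OLS: y on x plus one dummy per unit (no common constant)."""
    units = sorted(set(panel))
    rows = [[xv] + [1.0 if p == u else 0.0 for u in units] for p, xv in zip(panel, x)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)[0]  # coefficient on x


def within_slope(panel, x, y):
    """Fixed effects (within) slope: demean x and y inside each unit, then OLS without constant."""
    totals = {}
    for p, xv, yv in zip(panel, x, y):
        sx, sy, n = totals.get(p, (0.0, 0.0, 0))
        totals[p] = (sx + xv, sy + yv, n + 1)
    dx = [xv - totals[p][0] / totals[p][2] for p, xv in zip(panel, x)]
    dy = [yv - totals[p][1] / totals[p][2] for p, yv in zip(panel, y)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical panel: 3 firms (panel_id) observed in 3 periods each.
panel_id = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x1 = [1, 2, 3, 2, 4, 6, 1, 3, 2]
y = [5, 7, 10, 3, 8, 11, 0, 5, 2]

b_dummy = lsdv_slope(panel_id, x1, y)
b_within = within_slope(panel_id, x1, y)
print(b_dummy, b_within)
```

Both functions return the same slope (26/12, about 2.17, for these numbers); this equality is an instance of the Frisch–Waugh–Lovell theorem.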
9 Miscellaneous tips
In this chapter I present some miscellaneous tips that come to my mind during the course, so I might update this section of the tutorial during the course.
9.1 Data description
The command describe gives an overall description of the data, and it is probably a good idea to run it first. Since the example data does not mean much, I provide an example with one of Stata’s built-in example files.
Basically, storage types and display formats are things you don’t need to worry about in this course. The variable labels on the right-hand side of the table can also be edited.
9.2 Summation
To calculate the total sum of a variable, use the total command. For example, in our test data, take the sum of the variable y:
total y
So the sum should be 6000. Of course, the summation can also be restricted with an if-condition, for example total y if x2==1.
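A minimal pure-Python sketch (not Stata code) of a plain total and of the same total restricted by an if-condition (here x2==1), with hypothetical data; it assumes, as for Stata's estimation commands, that observations with a missing y (None here) are left out.

```python
# Hypothetical data; None plays the role of a missing value.
y = [400, 700, None, 500, 600]
x2 = [0, 1, 1, 0, 1]

# total y : sum over the non-missing observations
total_y = sum(v for v in y if v is not None)

# total y if x2==1 : the same sum restricted by the if-condition
total_y_if = sum(v for v, d in zip(y, x2) if d == 1 and v is not None)
print(total_y, total_y_if)
```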
10 Data sources
To get you going with the program, here are a couple of good places to find data and start playing with Stata. One is, of course, the webpage of the course book, Introduction to Econometrics, where much of the data that we use in the course is located. Another is the site of one of the best-known econometrics books, William H. Greene’s Econometric Analysis.
https://global.oup.com/uk/orc/busecon/economics/dougherty5e/
http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm
There are also some example datasets within Stata itself. These are found from “File” ⇒ “Example datasets”.
HOPE YOU ENJOYED THE COURSE AND LEARNED SOMETHING USEFUL FROM IT!