Stata Tutorial - MyCourses

Stata Tutorial∗

Sheng Dai †

January 5, 2020

I have prepared this tutorial to give you an introduction to some of the basic properties of Stata. Examples of inputting, importing and editing data are covered. Instructions for getting summary statistics and doing a basic regression analysis are also provided. We also make some basic tables and graphs. Finally, I introduce a few ways to save your results and keep track of what you have done. At the end of the tutorial, miscellaneous tips related to a number of subjects are given. Although I provide most of the needed commands in the problem sets over the course, it is still worthwhile reading this tutorial.

In this tutorial, as an example, we use the test_data.dta dataset that can be found on the course webpage. Open this data set in Stata before doing anything else.¹

Some general tips: As an econometric program for a basic course in econometrics, Stata has the convenient property that everything can be done through menus, so it is not necessary to memorize any written commands. Later on, however, it may be faster to work only with commands. In most places in this tutorial I only provide the commands, but where necessary I also give the menu structure.

First of all, I advise you to use the “Help” menu in Stata if you encounter problems. You can search there by command (Help → Stata Command) or by keyword (Help → Search · · · ).² The instructions in Stata are quite readable. Google also helps a lot in an emergency: typing a description of the problem and including Stata as one of the keywords usually produces relevant pages.

This tutorial covers the Stata basics, but there are many Stata tutorials online:

• A very good internet guide to Stata: http://wlm.userweb.mwn.de/Stata/

• A web book for regression analysis with Stata: https://stats.idre.ucla.edu/stata/webbooks/reg/

• Stata Cheat Sheets: https://geocenter.github.io/StataTraining/portfolio/01_resource/

∗The present tutorial is prepared for the 30C00200-ECONOMETRICS course at Aalto University BIZ. Dr. Antti Saastamoinen created an original version of this tutorial on 4 September 2014, and the previous version was modified by Dr. Xun Zhou on 8 January 2019.

†Sheng Dai: Management Science Unit, Aalto University School of Business, Ekonominaukio 1, 02150 Espoo, Finland; email: [email protected]

¹ Stata is available on all computers in the lab R102 at Vare and U351 at Otakaari 1.

² Note that italic font with double quotes marks a Stata menu command, whereas Courier New font marks a Stata programming command.

Course data file (test_data.dta): https://mycourses.aalto.fi/pluginfile.php/1165319/mod_folder/content/0/test_data.dta?forcedownload=1

Contents

1 Overview & Basics
    1.1 Basic syntax

2 Data
    2.1 Enter data in Stata
    2.2 Excel and Comma separated value files (*.csv)
    2.3 Copy-Paste tactics

3 Variables
    3.1 Generating variables
    3.2 Missing values and observation id-variable
    3.3 Change a variable name and removing a variable

4 Summary statistics and examining your data
    4.1 Summary statistics
    4.2 Counting, grouping/sorting and listing your data

5 Tables and Graphs
    5.1 Tables
    5.2 Graphs

6 Regression analysis
    6.1 Basic regression analysis
    6.2 Regression analysis to subsets in the data
    6.3 Regression analysis to subsets in the data
    6.4 Postestimation
    6.5 Other types of regressions
    6.6 Drawing a regression line in 2D case
    6.7 Residuals vs. fitted plot

7 Management of an analysis project
    7.1 How to store your model
    7.2 Do-files
    7.3 Log-files
    7.4 The practical ways

8 Time series/Panel Data
    8.1 Lag and difference operator
    8.2 Time series line
    8.3 Panel data
    8.4 Interaction expansion and its application to a fixed effect regression

9 Miscellaneous tips
    9.1 Data description
    9.2 Summation

10 Data sources

1 Overview & Basics

This is how Stata generally looks. If you don’t like the current color scheme, you can adjust the color preferences from “Edit” → “Preferences” → “General Preferences”.

1. Variable list. Lists the variables in the data set. Clicking a variable will ‘write’ it to the command window.

2. Review window. Shows the commands you have used, including the improper ones. This is a convenient window since you can quickly replicate a command.

3. Command window. The place to write your commands (if you use them).

4. Output window. Shows the results.

5. The two most important menus. As the names suggest, “Graphics” is where all the graphs are made, and behind “Statistics” we find everything that is covered in this course (and well beyond).

6. The “Save” button. Clicking this button will overwrite the current *.dta file. Notice that all the new variables you have created will be saved, but estimation results are not saved by default unless you save them separately (see later). A *.dta file is mainly a data file.

1.1 Basic syntax

As said, although Stata can be used fully through menus, it is usually more convenient to resort to commands. The basic syntax of a Stata command is as follows (the square brackets only separate the different blocks of the syntax; they are not part of it):

[command] [variables] [conditions, e.g. if-rule etc], [options]

For example:

sum myownvar if myownvar>500, detail

gives detailed summary statistics of variable myownvar for the observations in which the value of myownvar is above 500. The command and the variables are the essential fields; the others are optional.
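To try this syntax without the course file, here is a minimal sketch on a made-up variable. The four values of myownvar are hypothetical. The comment above the command marks which part of the line belongs to which block.

```stata
* Hypothetical toy data, only to illustrate the syntax blocks
clear
input myownvar
200
450
600
900
end

* [command]  [variables]  [if-rule]          , [options]
summarize    myownvar     if myownvar > 500  , detail

* summarize leaves its results in r(); only 600 and 900 pass the if-rule
display "observations used: " r(N) "   mean: " r(mean)
```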

2 Data

Generally there are two ways to get data into Stata: either you enter it in Stata itself, or you import it from an external file, usually an Excel sheet. In this course, however, I provide the datasets directly in the Stata format, so we don’t need to import anything. But you may need this later.

2.1 Enter data in Stata

1. Choose Data editor. Notice that there are Edit and Browse modes for the data screen. In Browse mode you are only allowed to browse the data.

The editor itself looks as seen below (using the example data):

Notice how the data is organized: the variables are in the columns and the observations are in the rows. For example, here we have 12 observations of variables y and x1–x3 (in addition to a few other variables, which will be introduced later). I advise you not to pay much attention yet to the variables under the red cross. At this point we can think of our data as cross-sectional (i.e. a single time period), consisting of 12 firms with production y and “inputs” x1 to x3.

2. Enter data into the cells (press Enter after inputting a value). Finnish users are warned against using commas as decimal separators, as Stata uses a period.

3. Leave the editor.

The dataset can then be saved normally through the “File” menu in the main Stata window. Note that the Stata data format is *.dta. If you have ready-made data in another format, you can import it as described next.
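Instead of clicking through the Data editor, the same kind of data can also be typed in a do-file with the input command. Here is a sketch that assumes a hypothetical three-firm dataset and a hypothetical file name, my_first_data.dta.

```stata
* Typing data with a command instead of the Data editor.
* The firm names and numbers are made up for illustration.
clear
input str8 firm y x1
"A" 500 3.2
"B" 650 4.1
"C" 720 2.9
end

* Decimals are typed with a period, never a comma
list

* Save in Stata's own format (*.dta); replace overwrites an existing file
save "my_first_data.dta", replace
```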

2.2 Excel and Comma separated value files (*.csv)

Often it is easier to work with your data in Excel first, before importing it to Stata. Stata can also read Excel workbooks directly (“File” ⇒ “Import” ⇒ “Excel spreadsheet”), but when your data is in Excel format it is often simplest to save it as a *.csv file before importing it to Stata. In Stata you then choose:

“File” ⇒ “Import” ⇒ “Text data (delimited, *.csv, · · · )”

Then you see the window below.

IMPORTANT NOTE: If you are using Excel to edit your data before importing it to Stata, again avoid using commas as decimal separators (as Finnish users usually do). Stata accepts only periods as decimal separators. To avoid unnecessary work, it is preferable to change the decimal separator in Excel to a period. To do this, follow:
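The menu route above corresponds to the import delimited command. So that the sketch runs on its own, it first writes a tiny CSV file, toy_data.csv, with made-up contents, and then imports it. With real data you would only need the import delimited line, with your own file name.

```stata
* Write a tiny hypothetical CSV file so that the example is self-contained
file open fh using "toy_data.csv", write text replace
file write fh "firm,y,x1" _n
file write fh "A,500,3.2" _n
file write fh "B,650,4.1" _n
file close fh

* Command equivalent of "File" => "Import" => "Text data (delimited, *.csv, ...)"
* varnames(1): the first row holds the variable names
import delimited using "toy_data.csv", varnames(1) clear
describe
list
```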

2.3 Copy-Paste tactics

Of course, provided that your data is fine in Excel, you can usually just take the column you want and copy-paste it into the Stata Data editor. You must paste it into the first cell of the column. Stata then asks how to deal with the first line of your data. If you copied the variable name from Excel, you should choose the second option, “Treat as variable names”.

3 Variables

3.1 Generating variables

Of course, you can generate variables within Stata itself. This is generally done with the gen command. If you generate a variable from scratch, it must always be assigned some initial value (for example 0).

gen testvar=0

This generates a variable named testvar with all values initially set to 0. The variable appears in the Variables window. After this you can go to the Data editor again and start replacing the zeros with your own values.

You can also replace some of the values of a given variable, usually by specifying a condition/rule on which the replacing is based. This is done with the replace command. For example:

replace testvar=1 if x2>=5

replaces the value of testvar with 1 for all observations for which the value of x2 is equal to or greater than 5. Here you see an example of an if-rule, which is very important to master. An if-rule is an easy way to apply your commands only to a subset of observations. If-rules are commonly based on logical or relational operators, which are given below along with the usual arithmetic operators.
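The operator table did not survive in this version, so here is a short sketch of the most common operators on hypothetical toy values. A true condition evaluates to 1 and a false one to 0, which is why a condition can also be used directly inside gen.

```stata
* Hypothetical toy data
clear
input x1 x2
1 2
4 5
6 7
end

* Relational operators: ==  !=  >  >=  <  <=
* Logical operators:    &  (and)   |  (or)   !  (not)
* Arithmetic operators: +  -  *  /  ^ (power)

gen testvar = 0
replace testvar = 1 if x2 >= 5 & x1 > 3   // both conditions must hold

gen either  = (x1 == 1 | x2 == 7)         // 1 if at least one condition holds
gen notfive = (x2 != 5)                   // 1 unless x2 equals 5
gen x1sq    = x1^2                        // arithmetic: x1 squared
list
```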

Of course, you can generate variables from other variables. For example,

gen x13=x1+x3

gen ln_y=ln(y)

In the first, x13 is the sum of variables x1 and x3; in the second, we have created the natural logarithm of y.

PROBLEM (that you can practice yourself): Generate a variable that takes the value 1 if y is equal to or above the average and 0 otherwise.

Another command that can be used to generate variables is egen (see Help and search for the command egen). It is an extension of the gen command that can handle more sophisticated functions when generating variables (see Help for ‘Mathematical functions’). For example, in our test data:

egen ymax=max(y)

This creates a variable named ymax that holds the same value, the maximum of y, in every observation. In our data it should be 1000.

Another example:

egen rank_x1=rank(x1)

This gives the ranks of the observations with respect to variable x1. If you try to use some of the egen functions with the gen command, you will encounter an error.

Of course, you can name the generated variables however you like; you don’t need to follow my naming conventions.
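The difference between gen and egen is easiest to see side by side. In this sketch, which uses toy values rather than the course data, egen max() works down the column, whereas gen's max() compares the values within each observation.

```stata
* Hypothetical toy data
clear
input y x1 x2
400  3 1
1000 2 5
600  5 4
end

* egen works down a column: every observation gets the maximum of y
egen ymax = max(y)

* gen's max() works across its arguments within each observation (row-wise)
gen rowmax = max(x1, x2)

* egen rank(): 1 = smallest value of x1
egen rank_x1 = rank(x1)
list
```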

Variables can of course also be string variables (letters). Often, for example, you have the names of the firms or countries as one ‘variable’. String variables are shown in red in the editor. For example, we could have:

Here variable Firm shows the firms’ names if we consider the observations to be firms.

3.2 Missing values and observation id-variable

Often your data includes missing values, i.e. you don’t have the value of a certain variable for a specific observation. For example, it might be that we don’t have the value of x1 for the second observation in our example dataset. Thus:

Stata indicates a missing value with a dot, and a missing value is also referred to with a dot in commands. For example, if you came to know that the value is actually 3.45, you could do the following (of course, you could also type the value directly into the editor):

replace x1=3.45 if x1==.

This, however, would replace all the missing values in x1 (if there are several) with the value 3.45, which is not necessarily what we want. If you want to refer to a specific observation, it is a good idea to generate an observation id variable.

gen id=_n

Here the expression _n is Stata’s way of referring to the observation number.

Now what we could do is

replace x1=3.45 if id==2

I will explain later what the panel id variable is.
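Here is a sketch of the missing-value steps above on hypothetical toy values. It also shows a pitfall worth knowing: Stata treats a missing value as larger than any number, so a condition like x1 > 3 silently includes missing observations unless you exclude them.

```stata
* Hypothetical toy data; "." marks a missing value
clear
input x1
2.5
.
4.1
.
end

gen id = _n                        // observation number: 1, 2, 3, 4

count if missing(x1)               // same as: count if x1 == .
display r(N)

* Fill in only observation 2, not every missing value
replace x1 = 3.45 if id == 2

* Caution: missing counts as larger than any number,
* so "x1 > 3" also picks up the remaining missing value
count if x1 > 3
count if x1 > 3 & !missing(x1)
```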

3.3 Change a variable name and removing a variable

The easiest way to rename variables in Stata is to use the rename command:

rename testvar newvar

This renames the variable testvar to newvar.

Variables can be removed with the drop command.

drop newvar

This removes the variable newvar.

The drop command can also be used to drop observations. For example:

drop in 10

This would drop observation number 10. Sometimes dropping observations may be easier through the Data Editor: just select the data you want to remove, right-click and choose “Drop Selected Data”. This works for both variables (columns) and observations (rows).

Stata also has a good variable manager (“Data Editor” ⇒ “Tools” ⇒ “Variable Manager”) where you can, for example, label your variables and values and make notes concerning a certain variable. An example of value labels is seen below:

Above we have drawn a scatterplot (more on graphs later) of variables y and x1, using a label based on variable Firm, which holds the firms’ names. Labels are good in tables and graphs, but we don’t really need to handle them during this course. Labels can also be assigned to variables; they then appear in place of the variable names, for example in graphs. This is useful, as it is good to keep variable names relatively short and then use labels to identify the variables better.
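Labels can also be set with commands instead of the variable manager. Here is a minimal sketch with made-up values. The label texts and the value-label name x2lab are hypothetical.

```stata
* Hypothetical toy data
clear
input y x2
500 0
650 1
720 1
end

* Variable label: shown in output and graphs instead of the name
label variable y "Production"

* Value labels: attach text to the values 0/1 of x2
label define x2lab 0 "Group zero" 1 "Group one"
label values x2 x2lab

describe
tabulate x2
```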

The scatterplot was obtained with the following command line, where the markers have been labeled by the Firm variable. These are marker labels; value labels generally refer to labels that are given to observations with a given “value” of a specific variable.

twoway (scatter y x1, mlabel(Firm)), title(SCATTERPLOT)

Of course, putting a label on each marker is not advised if you have a lot of observations; it gets messy easily.

4 Summary statistics and examining your data

Now that we know how to input and manipulate data in Stata, let’s do something with it. We will use the same data as up to this point.

4.1 Summary statistics

Summary statistics of your data are probably the first thing you should look at in your analysis. To get summary statistics for a certain variable, write:

sum y, detail

Of course, you can also obtain summary statistics through the Statistics menu, but it is more tedious than just writing the command. The option detail is optional. What you should see is:

Notice that I have given y the label “Production”, which then nicely prints out in the Stata output. Of course, you can obtain summary statistics for multiple variables at the same time.

sum y x1

Note that we left the , detail option out this time, so we get a bit less information.

To get summary statistics for a range of variables, you can write:

sum x1-x3

Note that the variables need to be next to each other (in consecutive order) in the editor. If we had more variables, for example x4, x5 and x6, we could write:

sum y x4-x6

This would give you the summary statistics for variables y, x4, x5 and x6.
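summarize also leaves its results in memory, so they can be reused in later commands. Type return list to see them. Here is a sketch with made-up values of y.

```stata
* Hypothetical toy data
clear
input y
400
500
900
end

summarize y
return list                 // shows r(N), r(mean), r(sd), r(min), r(max), r(sum), ...

* Reuse a stored result: deviation of each observation from the mean
gen y_dev = y - r(mean)
list
```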

4.2 Counting, grouping/sorting and listing your data

Sometimes you would like to know how many observations meet some specified criteria. If you have a large dataset, this is usually impossible to see just by looking at the data. Our test data is very small, but let’s use the trick on it anyway.

count if y==500

This counts the number of observations having y equal to 500. Note that in an if-rule you need two = signs (==) to test for equality.

Sometimes data needs to be sorted in some order. You can first use the menu “Data” ⇒ “Sort” ⇒ · · · to learn the commands. For example:

sort y

This sorts the data in ascending order with respect to variable y. More sophisticated sorting is also possible. For example, let’s first sort the data with respect to the variable x2, which is a 0/1 type of variable (dummy variable).

sort x2

by x2: tabulate y

What has been done here is that we first sorted the data according to the two groups defined by x2. Subsequently, we tabulated the values of y in ascending order in both groups (tabulate automatically puts the values in ascending order). We could also obtain, for example, summary statistics for y in a similar manner:

Below, the summary statistics have been obtained by subgroup.

You may also want to list all the observations that meet some specific criteria. Use the list command.

We have listed all values of y with x2 being zero. Multiple if-rules are also possible (everywhere you may use them).

It is rather self-evident now what we have done here.
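The subgroup summaries and listings above appeared only as screenshots. The commands were presumably similar to the ones in this sketch, which runs on hypothetical toy data. The combined if-rules are my own examples.

```stata
* Hypothetical toy data
clear
input y x1 x2
500 3 0
650 4 1
500 2 1
720 5 0
end

* Summary statistics of y by group; bysort = sort + by in one step
bysort x2: summarize y

* List the observations that meet a condition
list y if x2 == 0

* Several conditions combined with & (and) or | (or)
list y x1 if x2 == 0 & x1 > 3

* Counting works with combined conditions too (two equals signs test equality)
count if y == 500 & x2 == 1
```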

5 Tables and Graphs

5.1 Tables

In many instances it is necessary to make some kind of table of your data. We already saw an example in the previous chapter. This is especially so if you want to summarize your data with respect to some categorical variable. For example, in our test data the variable x2 can be viewed as a binary variable with two categories. Below is a brief overview of different types of tabulations.

Cum. refers to the cumulative percentage and Freq. to the frequency.
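The tabulations themselves appeared only as screenshots. Here is a sketch of commands that produce this kind of table, run on hypothetical toy data in which x2 and x3 are categorical.

```stata
* Hypothetical toy data
clear
input y x2 x3
500 0 1
650 1 1
720 1 2
480 0 2
610 1 2
end

tabulate x2                                     // one-way: Freq., Percent, Cum.
tabulate x2, summarize(y)                       // mean, std. dev. and frequency of y by x2
tabstat y, by(x2) statistics(count mean min max)
tabulate x2 x3                                  // two-way cross-tabulation
```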

5.2 Graphs

Graphs are a good way to examine your data. Scatter plots and histograms are among the most important graphical ways to start your data analysis. To add a graph, for instance, to your Word document, just right-click the image and copy it to the clipboard.

Let’s make a scatter plot of variables y vs. x1 and then draw a histogram of variable y.

scatter y x1

histogram y, percent

In a histogram, the number of bins is determined automatically. This can be changed with the bin() or width() option.

[Figure: Scatterplot | Histogram]

A scatterplot plots the observations as pairs of values of two variables. A histogram illustrates the distribution of the given variable. For example, here we see that 50% of our observations of y are concentrated in the range 400 to 700. Note that in Stata the upper limit of a bin is not included in the class, so in fact 50% belong to the range 400 ≤ y < 700. See also the menu “Graphics” ⇒ “Histogram”.

The histogram command without the percent option gives an illustration of the density function of y.

histogram y

If you calculate the combined area of the bars, you get 1, as you should, since it is a density estimate.

Of course, Stata can draw all the basic bar and pie charts etc. All of the graphs can also be drawn for a subset of observations only.

6 Regression analysis

6.1 Basic regression analysis

Regression analysis is the cornerstone of this course, and it is important that you master the basic Stata commands involved in a regression analysis.

6.2 Regression analysis to subsets in the data

Doing a basic regression analysis is very easy in most econometric programs, and Stata is no exception. You basically have to define the dependent variable and the independent variables. However, when doing assignments, don’t get fooled by the apparent ease of doing this in Stata: just presenting the output that Stata gives won’t do you much good if you don’t know how to interpret the results.

Assume that we want to know the effect of x1 on y (simple regression).

reg y x1

What you should obtain is something like:

You may compare this to how it is done in Excel, and you will probably find it a bit easier. Multiple regression does not require any further hurdles.

reg y x1 x2 x3

The menu structure for regression analysis is: “Statistics” ⇒ “Linear models and related” ⇒ “Linear regression”. In this course you generally don’t need to fiddle much with the advanced options you may find in the menus, although it might sometimes be useful to restrict the observations used. As an example, you can adjust the confidence intervals from the “Reporting” tab (e.g. from 95% to 99%).
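Estimation results are also stored, so you can reuse the coefficients after reg. Here is a sketch on toy data chosen so that the OLS line is exactly y = 0.8 + 2·x1. The last line is the command counterpart of changing the confidence level in the “Reporting” tab.

```stata
* Hypothetical toy data (chosen so that the OLS line is y = 0.8 + 2*x1)
clear
input y x1
3  1
5  2
6  3
9  4
11 5
end

regress y x1                  // "reg" is the usual abbreviation
display _b[x1]                // slope estimate, stored after estimation
display _b[_cons]             // intercept
display e(r2)                 // R-squared

* Same model with 99% instead of 95% confidence intervals
regress y x1, level(99)
```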

6.3 Regression analysis to subsets in the data

Now that we know how to sort data, we can run separate regressions for separate groups.

sort x2

by x2: reg y x1

This runs a linear regression separately for groups ‘0’ and ‘1’ defined by variable x2. We have used the by prefix here. We do something like this because we would like to examine whether the regression equations for these groups differ.
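Here is a sketch of the by-group regressions on hypothetical toy data. The second part is one common way to check formally whether the slopes differ: a single regression with an interaction between the group dummy and x1. This is my own addition, not part of the course material.

```stata
* Hypothetical toy data: two groups defined by the dummy x2
* (group 0 slope = 1.5, group 1 slope = 3.5)
clear
input y x1 x2
2  1 0
5  2 0
5  3 0
4  1 1
6  2 1
11 3 1
end

* Separate regressions, as in the text (bysort = sort + by)
bysort x2: regress y x1

* One alternative: a single model whose intercept and slope may differ by x2.
* The coefficient on 1.x2#c.x1 is the difference between the two slopes.
regress y i.x2##c.x1
test 1.x2#c.x1                // H0: the slope is the same in both groups
```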

6.4 Postestimation

Postestimation commands are important since they enable us to evaluate, for example, the residuals and fitted values of a given model. In Stata, the postestimation commands are found through the menus as: “Statistics” ⇒ “Postestimation” ⇒ “Predictions”.

In Stata it is important to notice that the postestimation commands apply to the latest model estimated.³ This is the model that Stata has in its memory. Postestimation commands can differ between different types of models, so the postestimation menu may not look the same all the time. The example below relates to the simple linear OLS regression.

reg y x1

Now, if you use the menu:

you need to specify a name for the new variable and then specify what you want to be done. For example, to get the fitted values (predictions of the model) you would choose “Linear prediction (xb)”.

If you use commands, after estimating the model you would write:

predict fit, xb

If you want to obtain the residuals, you would write:

predict res, residuals

Note that res and fit are just my names for the variables: predict generates variables with these names, but you could name them differently. The residuals are, of course, the differences between the actual values and the fitted values.

³ There is a way in Stata to store your estimation results and call them later (you don’t need to re-estimate the model then). But in this course the estimations are so straightforward (fast) that we have no need for this option.

Naturally, there are a number of other postestimation commands, but we will not cover them here. For example, under the menu “Postestimation” ⇒ “Tests, contrasts, and comparisons of parameter estimates” there are a number of statistical tests that will be necessary in the later stages of your econometric adventures.
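Here is a sketch of the predict steps on toy data. It also checks numerically that each residual equals the actual value minus the fitted value.

```stata
* Hypothetical toy data
clear
input y x1
3  1
5  2
6  3
9  4
11 5
end

regress y x1
predict fit, xb               // fitted values (linear prediction)
predict res, residuals        // residuals

* Check the definition: residual = actual - fitted (zero up to rounding)
gen check = y - fit - res
list y fit res check

* OLS residuals from a model with a constant sum to (about) zero
summarize res
```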

6.5 Other types of regressions

Naturally, Stata is capable of many other types of estimation than simple linear OLS regression. In this course we need only a fraction of these features. As examples, I show you instrumental variable regression and logistic regression. These methods are covered during the course.

Instrumental variable regression (an explanatory variable is not independent of the error term)

ivreg y (x1=x2)

Here x1 is endogenous and is instrumented with x2. Through menus: “Statistics” ⇒ “Endogenous covariates” ⇒ “Linear regression with endogenous covariates” · · ·

Logistic regression (dependent variable is binary)

logit x2 y

Or through menus: “Statistics” ⇒ “Binary Outcomes” ⇒ “Logistic Regression”
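In newer Stata versions the instrumental variable command is called ivregress, and ivregress 2sls corresponds to the older ivreg syntax above. Here is a sketch on made-up data. With a single instrument, the IV slope equals cov(x2, y)/cov(x2, x1), which is 19/17 for these numbers.

```stata
* Hypothetical toy data: x2 serves as the instrument for x1
clear
input y x1 x2
1 2 1
4 3 2
4 5 3
7 6 4
9 9 5
end

* Two-stage least squares; older syntax from the text: ivreg y (x1=x2)
ivregress 2sls y (x1 = x2)

* Just-identified case: slope = cov(x2,y)/cov(x2,x1) = 19/17
display _b[x1]
```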

6.6 Drawing a regression line in 2D case

Now that you know how to do a linear regression, you probably want to draw a regression line in your scatterplot. This can be done with the command line below:

scatter y x1 || lfit y x1

Then you should have something like this:

Through the menus:

“Graphics” ⇒ “Twoway graph (scatter, line, etc.)”

In the window, create a new plot. To draw the scatter plot, first choose Basic plots and Scatter (the default). Choose your Y and X variables. To add the fit line to the picture, create another plot, but choose Fit plots and Linear prediction. The Y and X variables are the same as in the scatter plot. It is also possible to make nonlinear fit lines.

6.7 Residuals vs. fitted plot

One important graph is the residuals-versus-fitted-values plot. With it you can also draw some conclusions about the goodness of your model. Mainly, it concerns situations in which we are dealing with the issue of heteroscedasticity.

Given the residuals res and the fitted values fit that we predicted earlier in section 6.4, making a scatter plot of residuals vs. fitted values is straightforward.

scatter res fit

However, the same plot can be obtained directly with:

rvfplot

Above, we have actually modified the rvfplot command by adding the zero line:

rvfplot, yline(0, lwidth(medthick) lcolor(purple))

The reason why we did it the ‘hard’ way was to recap how to use the predict command.

You could also try to check whether the residuals are normally distributed.

histogram res, bin(7) normal

We have changed the default number of bins and added the normal density plot (blue line). These residuals do not appear very normally distributed, but of course, there are only 12 observations.

7 Management of an analysis project

Before any longer project, it is probably a good idea to start keeping some kind of record of what you have done with Stata. Of course, to make tables in particular look good, you should transfer the information to Excel and edit your tables there. Below are a couple of ways to keep track of things.

7.1 How to store your model

After estimating a model, the model estimates can be stored. If we have estimated the model of y on x1 (the first one we estimated), its results can be saved in the following way.

reg y x1

est store model1

Now, if you estimate something else in between, the results of model1 (just a name for the model) can be called later on as:

est replay model1

or

est restore model1

It is important to realize that the model you have estimated last is the active model, and any model you estimate next replaces it as the active model.

Notice, however, that estimates saved using the store command are kept only for the duration of the session. As soon as you close Stata, these stored estimation results vanish. To store your results permanently, you should use:

est save model1

Alternatively, you can do as advised in subsections 7.3 or 7.4.
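Here is a sketch of the full store, compare and save cycle on toy data. Note that estimates save writes the currently active results, so the sketch restores model1 before saving it. The names model2 and model1.ster are hypothetical.

```stata
* Hypothetical toy data
clear
input y x1 x2
3  1 0
5  2 1
6  3 0
9  4 1
11 5 1
end

regress y x1
estimates store model1          // kept in memory for this session only

regress y x1 x2                 // a second model is now the active one
estimates store model2

estimates table model1 model2   // compare the stored models side by side

estimates restore model1        // model1 becomes the active model again
estimates save model1, replace  // saves the *active* results to model1.ster
```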

7.2 Do-files

One way to use Stata is to write *.do files, which are basically just text files containing all the commands that you want Stata to perform. Since *.dta files contain only the data you have (variables, predictions) and not the commands you have used, *.do files can be quite useful. This is how it works. Open the Do-file editor from its button and create a new *.do file. Let’s name it test_do.do. Start writing commands into the do-file. To call the do-file later, just open it in Stata from the “File” ⇒ “Do · · ·” menu. In the Do-file editor you can also run it by pressing “Execute”. To add notes between your commands, start the line with a *-sign. An example of a do-file is shown below.

NOTE that before you open/run the do-file, you need to have the *.dta file to which the do-file refers opened in Stata.
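The example do-file appeared only as a screenshot. A hypothetical test_do.do might look like the sketch below. To make it runnable anywhere, it loads Stata's built-in auto dataset. With the course data you would start with use "test_data.dta", clear instead, which also removes the need to open the data by hand before running the do-file.

```stata
* test_do.do: a hypothetical example do-file
* Lines starting with * are comments; // starts a comment at the end of a line

clear all
sysuse auto, clear          // stand-in for: use "test_data.dta", clear

/* A multi-line comment:
   summary statistics first, then a simple regression */
summarize price mpg
regress price mpg
predict fit, xb
```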

7.3 Log-files

Making logs of all the output is a very good way to keep backups of the results you obtain. A log file basically records the output seen in the Output window. Before giving any commands, go to “File” ⇒ “Log” ⇒ “Begin”. This creates a new log file. Then, as before, you can write commands. Now, let’s call the above do-file while the log is on.

1. Create the log file test_log.smcl

2. Call the test_do.do file

3. Stop logging from “File” ⇒ “Log” ⇒ “Close”

Now, to view the log file, just go to “File” ⇒ “Log” ⇒ “View”.

To edit the log file, it must be translated into a text file. This is done from “File” ⇒ “Log” ⇒ “Translate”.

If you now open the text file, you can see exactly the same output that Stata showed you earlier.
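The same logging steps can be written as commands, for example at the top and bottom of a do-file. This sketch assumes hypothetical file names. It gives the log its own name (testlog) so that closing it does not affect any other open log.

```stata
* Command equivalents of the log menus (file names are hypothetical)
capture log close testlog                          // close it if left open earlier
log using "test_log.smcl", replace name(testlog)   // "File" => "Log" => "Begin"

sysuse auto, clear          // any commands; their output is recorded
summarize price

log close testlog                                  // "File" => "Log" => "Close"

* "File" => "Log" => "Translate": SMCL log to a plain-text file
translate "test_log.smcl" "test_log.txt", replace
```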

7.4 The practical ways

The way I usually store my results is not very fancy. Generally, I just copy my results from the Stata output screen to a *.txt file and pick whatever I need from there. I also copy the commands to the file, so that I can replicate my results rapidly by copying the commands from there. Note that copying directly to Word usually won’t work well; see the next paragraph for how to put your results into Word. It is also possible to copy tables to Excel as such, through the menu “Edit” ⇒ “Copy Table”. This keeps the figures pretty much aligned, but some formatting in Excel is needed anyway.

If you want to add your results to Word without editing your tables in Excel, there are two options. First, the quickest way is to just copy your results from the Stata output screen and paste them into Word. Then select your results in Word, change the font to Courier and preferably set the size below 11. This should do the trick.

Second, you can insert the above-mentioned *.txt file straight into your Word document. In Word 2003 this is done from the “Insert” ⇒ “File” · · · menu. There may be some problems fitting the text file in the document (since Stata output is rather wide), but usually it fits well if you extend the margins of your Word document. For the purposes of this course both of these ways are more than sufficient, but later on I suggest you make nice tables with Excel or some other program.

When you have added a *.txt file to your Word document, it should look something like the example below. You may have to adjust the margins in Word a bit to get it right.

8 Time series/Panel Data

If you are working with time series or panel data, some preparations are needed first. Let’s now include a time variable in our test data. Assume that each observation is from a different year, e.g. 2000–2011. I have also masked (in the picture) some of the other variables, as they relate to a different situation. Now the situation is more like observing a single firm through time (not 12 firms in the same period).

Before you can work with time series data, you need to specify a time variable.

tsset time, yearly

The menu would look like:

The name of our time variable here is time, and the option yearly indicates that we have yearly data. This can also be done from the menu “Statistics” ⇒ “Time series” ⇒ “Setup and utilities” ⇒ “Declare dataset to be time-series data”. In this menu you can also set the panel identifying variable, which is needed when working with panel data (it identifies the different units). Two much-needed operations in time series are differencing and lagging.

8.1 Lag and difference operator

Now, to make a one-period lagged variable of x1, write:

gen lag1_x1=L.x1

Now you should see a variable that holds the value from the previous period for every observation. Naturally, there is a missing value for the first period, since there is no earlier period.

Now it would be straightforward, for example, to estimate an AR(1) model.

reg x1 lag1_x1

The lag operator can be used within commands, so you could have done it as:

reg x1 L.x1

The difference operator simply takes the difference between observations from two subsequent periods (periods t and t−1). To use the difference operator, simply write:

gen dif1_x1=D.x1

Lag and difference operators can be used in sequence, so the two-period lag is obtained with:

gen lag2_x1=L.L.x1
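Here is a sketch of the lag and difference operators on a made-up five-year series, using the same variable names as above. L2. is shorthand for L.L.

```stata
* Hypothetical toy time series: five years of x1
clear
input time x1
2000 10
2001 12
2002 15
2003 14
2004 18
end

tsset time, yearly

gen lag1_x1 = L.x1      // previous year's value; missing in 2000
gen lag2_x1 = L2.x1     // same as L.L.x1; missing in 2000 and 2001
gen dif1_x1 = D.x1      // x1 - L.x1
list
```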

8.2 Time series line

The following is how you plot a basic time series line.

twoway tsline y

8.3 Panel data

Panel data is a somewhat different type of time data, as it also has a cross-sectional dimension: we observe multiple units over multiple periods. In our example, we could assume that we have 4 firms that we observe over the course of 3 years as:

Before working with panel data, you need to specify not only the time variable but also the panel identifying variable.

tsset panel_id paneltime, yearly

The variable panel_id identifies the firm, and paneltime marks the period.
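Once a panel is declared, the time-series operators respect the panel structure, so a lag never reaches back into another firm's data. Here is a sketch with a hypothetical two-firm panel.

```stata
* Hypothetical toy panel: 2 firms observed in 3 years
clear
input panel_id paneltime y
1 2000 5
1 2001 7
1 2002 6
2 2000 9
2 2001 8
2 2002 12
end

tsset panel_id paneltime, yearly    // xtset panel_id paneltime declares the same

* Lags are taken within each firm: the first year of every firm is missing
gen lag1_y = L.y
list, sepby(panel_id)
```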

8.4 Interaction expansion and its application to a fixed effect regression

If you already know your panel data basics, this next trick is great. In the panel setting we often run a fixed effects model to take into account individual time-invariant characteristics. This is analogous to including individual dummies and running a normal regression, which in some cases would involve creating a lot of dummy variables.

With interaction expansion you can avoid creating a huge number of separate dummy variables by hand for each individual, group or year. Let’s see how it works.

Consider a regression model in which we explain y with x1, and let’s do this in a panel data context.

tsset panel_id paneltime, yearly

Now with the interaction expansion of panel_id and normal OLS:

xi: reg y x1 i.panel_id

And with a basic fixed effects regression (examine your results!):

xtreg y x1, fe

You get exactly the same coefficient for x1! Of course, there are reasons why the dummy variable approach is considered inferior to the fixed effects approach. After the fixed effects regression, the individual effects can be obtained through postestimation.
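Since Stata 11, factor-variable notation (i.panel_id) creates the dummies on the fly, so the xi: prefix is no longer needed. The sketch below uses hypothetical toy panel data. It checks that the dummy-variable regression and xtreg, fe give the same slope for x1.

```stata
* Hypothetical toy panel: 3 firms observed in 3 years
clear
input panel_id paneltime x1 y
1 2000 2 10
1 2001 3 13
1 2002 5 15
2 2000 1 4
2 2001 4 9
2 2002 6 12
3 2000 3 20
3 2001 2 18
3 2002 7 27
end

xtset panel_id paneltime

* OLS with one dummy per firm (least-squares dummy variables)
regress y x1 i.panel_id
scalar b_dummies = _b[x1]

* Fixed effects (within) estimator
xtreg y x1, fe
display "dummy-variable slope: " scalar(b_dummies) "   FE slope: " _b[x1]
```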

9 Miscellaneous tips

In this chapter I present some random tips that may occur to me during the course, so I might update this section of the tutorial as the course goes on.

9.1 Data description

The command describe gives an overall description of the data. It is probably good to run it first. Since the example data does not mean much, I provide an example with one of Stata’s internal example files.

Basically, storage types and display formats are things you don’t need to worry about in this course. The variable labels on the right-hand side of the table can also be edited.
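The internal example file is presumably auto.dta, which ships with Stata. Here is a sketch of describing it and editing one variable label. The new label text is hypothetical.

```stata
* auto.dta ships with Stata ("File" => "Example datasets")
sysuse auto, clear
describe

* Labels shown in describe's output can be edited, for example:
label variable price "Price in US dollars"
describe price
```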

9.2 Summation

To calculate the total sum of a variable, you can use the total command. For example, in our test data, take the sum of variable y:

total y

The sum should be 6000. Of course, summation can also be combined with, for example, an if-condition.
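Here are a few ways to get sums, sketched on toy data. total reports the sum together with a standard error, summarize stores the sum in r(sum), and egen total() puts it into a new variable.

```stata
* Hypothetical toy data
clear
input y x2
400  0
1000 1
600  1
end

total y                          // estimation command: sum with std. error

egen y_total = total(y)          // new variable holding the column sum, 2000 here

summarize y if x2 == 1           // sum over a subset via an if-condition
display r(sum)                   // 1600
```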

10 Data sources

To get you going with the program, here are a couple of good places to find data and start playing with it. One is, of course, the webpage of the course book, Introduction to Econometrics, where many of the datasets that we use in the course are located. Another is the site for one of the best-known econometrics books, William H. Greene’s Econometric Analysis.

https://global.oup.com/uk/orc/busecon/economics/dougherty5e/

http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm

https://aalto.finna.fi/Record/alli.574538

There are also some example datasets within Stata itself. These are found under “File” ⇒ “Example datasets”.

HOPE YOU ENJOYED THE COURSE AND LEARNED SOMETHING USEFUL FROM IT!
