/*
Things to cover:
1- General look of Stata:
    - Output window, command window, variables window, etc.;
    - Do-file, browse data, inspect data, stop buttons;
2- Organizing your work
    - global variables for directories
        - original data
        - do-files
        - output (logs, graphs)
        - modified data
    - do-files & comments
    - log-files
*/

//UNDERSTANDING DO-FILES AND COMMENTS
/*
You are actually reading a do-file. Do-files are text files with the ".do" extension, which tells Stata that they contain commands to be interpreted. Each command performs one step of a statistical analysis and will be covered in detail in future laboratories.

The first thing you should know is that symbols like /* */ indicate to Stata that the text in between is not a command, but text you write for your own purposes. These are generally comments (like this text) that help the reader understand the commands. The principal advantage of the "slashstar" (/* */) is that comments can run over multiple lines.
*/
//Another way to mark a comment is the double slash (//).
//Comments with double slashes cannot be longer than one line.
//Otherwise, there will be an error.

//ORGANISING YOUR WORK
/*
It is useful to organise your work in a coherent set of directories. You should think of a folder, say ECON452, where you have 4 directories:
    - ORIGINALDATA
    - MODIFIEDDATA
    - DOFILES
    - OUTPUT
The last thing you want is to destroy or modify your original dataset, since you would then need to download or access it again. Hence, it is good practice to save it in a dedicated folder to avoid confusion. Once you have performed some transformations or added new variables that are ready for analysis, you should save the data in another folder (say "MODIFIEDDATA").
The folder DOFILES will contain all the do-files containing commands.
The folder OUTPUT will contain all the output that the do-files produce.
Teams might choose different folder organisations; the essential feature is to protect the original data.
*/

//GLOBAL VARIABLES TO ORGANIZE YOUR DO-FILES
/*
GLOBAL variables (Stata calls them global macros) contain information that lasts for the whole Stata session.
LOCAL variables last only for the duration of a do-file (or shorter).
A good practice is to store your master directory in a global variable:
*/
global DIRECTORY "/Users/pabsta/Documents/2-Enseignement/ECON452/tutorial1/"
/*
(This command needs to be changed in order to work in DUN350 or on Windows computers, where the path uses the "C:/" structure.)

With this syntax, the path is written only once. If the path changes (for example, if you change computers), it needs to be changed in only one place (at the beginning of the do-file).

The following command tells Stata to change its current directory to the one defining the project:
*/
chdir "$DIRECTORY"
/*
Note that global variables are referenced with a dollar sign $ to indicate to Stata that it is actually a variable. Otherwise, Stata would only see an unrecognizable string of characters.
Local variables are referenced by enclosing them between the backtick ` and the tick '. So if DIRECTORY were a local variable, its content would be referenced with the syntax `DIRECTORY'

For Mac/Unix users, note that the command cd does the same thing as the command chdir.
*/

//UNDERSTANDING LOG-FILES
/*
As you construct do-files with a lot of commands in them, the Stata screen will not be able to display all the output. LOG-FILES allow you to store all the output of a Stata session in a text file, so that you can consult it after the do-file has been executed.

To open a log-file, use the following command:
*/
log using OUTPUT/tutorial1LogFile.txt, replace text

/*
3- General commands
    - set obs, clear, set mem
    - gen, egen = exp, conditional expressions
    - macros: local vs global variables
*/
set mem 15m
/*
Sets the RAM memory that Stata can use on the computer to 15 megabytes. Not of practical concern unless you use large datasets (50 000 obs or more).
This command needs to be added to do-files, though, as otherwise Stata allocates only 1.00 MB. (In Stata 12 and later, memory is managed automatically and set mem is ignored.)
*/
set obs 1000
/*
Sets the number of observations in the dataset to 1000. In practice, this is done automatically when you load a real dataset (we will see how later), so this is useful only for this tutorial.
*/

//We will build a theoretical AR(1) of the form: y_t = 0.5 * y_(t-1) + white_noise_t
//First generate the white noise with variance 4 (standard deviation 2).
gen white_noise = invnormal(uniform())*2
//This creates a variable called "white_noise" and assigns it values drawn from an N(0, 4) distribution.
//To see this, let's draw a histogram:
hist white_noise
//Now, we must indicate time to Stata:
gen t = _n
/*
Since the white_noise values have been generated in no particular order (they are independent), we simply generate a variable called "t" and use Stata's internal observation index (_n) as a time index. Now we sort the data by time:
*/
sort t
/*
This command sorts the data in ascending order of the variable t. In this example, it changes nothing since t has been generated from the internal index. It is however necessary to let Stata know that the data are sorted, which is required before using the by prefix.

The following command generates an empty variable:
*/
gen y = .
/*
The period (.) is Stata's convention for missing (empty) observations. It is not a numerical value. Hence, the previous command generates a variable with 1000 empty observations.
The following command sets the first observation to be the white_noise only:
*/
replace y = white_noise if(t == 1)
//Literally: replace y by the value of white_noise for all observations where t is equal to one
//(i.e., the first observation)
/*
Now, generating all the other observations requires a loop. Basically, the following code says "for time going from 2 to 1000 (total observations), build the AR(1) model we specified":
*/
forvalues i = 2/1000 {
    replace y = 0.5*y[`i'-1] + white_noise[`i'] if(t == `i')
}
//We can now plot the data we just generated to see how it looks:
graph twoway line y t
//(This command plots y against time)
/*
We hardly see anything! Maybe a subsample will be more useful. Let's try the first 100 observations:
*/
graph twoway line y t if(t<=100)
//(This is better)

//USING HELP FILES
/*
We have seen a lot of commands so far. To help you understand in detail how they work, Stata has a comprehensive help system. To use it, simply write help followed by the name of the command:
*/
help graph
//(Displays all the information about the command graph)
//Typing help alone will bring up the general help menu.

//IMPORTING DATA
/*
There are basically three different sources of data:
    - those directly available from the web (such as the Federal Reserve)
    - those in an Excel/.csv file
    - those already in a .dta file (like ODESI)
The first category is the easiest one:
*/
clear //clears everything generated so far.
webuse regress //downloads directly from Stata's website
count //Counts the number of observations (148)
graph twoway scatter y x1
//One can also specify a dataset at a given address:
clear
use http://www.pabsta.qc.ca/files/ECON452/DATA/someAR1.dta
/*
Notice that the command is now "use" instead of "webuse". Of course, one must know where the data are located on the web. A good program to install for this purpose is FREDUSE, to use the US Federal Reserve database.
*/

//EXCEL SHEETS
/*
Download the following file into your ORIGINALDATA directory:
http://www.pabsta.qc.ca/files/ECON452/DATA/businessCycles.csv
One can then use the following command:
*/
insheet using "ORIGINALDATA/businessCycles.csv", delimiter(";")
//Note that it is also possible to do this through the menus in Stata.
//Problem: depending on regional settings, Excel may write "," as the decimal separator, while Stata expects points (.)
//How to fix: replace directly in the .csv file OR:
destring y_trend k_trend n_trend i_trend u_data u_trend g_trend w_data c_trend, replace dpcomma
/*
Everything is numeric now. This data set contains a representation of the business cycles in Canada (recessions, expansions, etc.)
*/
graph twoway line y_data y_trend year
//One line is the actual GDP, the other line is its (estimated) long-run trend
//You can now save this (modified) data set in Stata format:
save MODIFIEDDATA/businessCycles.dta, replace

//USING .dta FILES
/*
Using .dta files is fairly easy:
*/
clear
use MODIFIEDDATA/businessCycles.dta
//That's it!

//INSTALLING NEW COMMANDS
//This command will be considerably useful for writing reports:
help outreg
//Follow the instructions (and read the help file to know what it does!)
//(Note: you might not have the permission to install it in DUN350. Let me know if the command is not installed.)

/*
6- Some data sources
    - E-STAT
    - ODESI
    - Yahoo Finance
    - Canadian Institute for Health Information
      https://secure.cihi.ca/estore/productFamily.htm?locale=en&pf=PFC1671
*/

log close //closing the log file
macro drop DIRECTORY //Deleting the global variable: you might not want to do this at the end of each do-file.
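
//SKETCH: LOCAL VARIABLES AND THE AR(1) RECURSION
/*
The following is a minimal sketch that revisits two points from this tutorial: how the content of a local variable is referenced, and how the AR(1) series can be built. The names rho, nobs, e and z, and the seed value, are assumptions chosen for illustration only. The sketch relies on replace processing observations in order, so that z[_n-1] has already been filled in when observation _n is computed; this can replace the explicit forvalues loop.
Run these lines together (not one at a time from the do-file editor), since local variables disappear once the executed block ends.
*/
clear
set seed 12345                      // arbitrary seed (assumption) so the draws can be reproduced
local rho = 0.5                     // AR(1) coefficient stored in a local variable
local nobs = 1000                   // number of observations
set obs `nobs'                      // the local is referenced between a backtick and a tick
gen t = _n                          // time index
gen e = invnormal(uniform())*2      // white noise with variance 4, as earlier in the tutorial
gen z = e if t == 1                 // the first observation is the white noise only
quietly replace z = `rho'*z[_n-1] + e in 2/l   // fills observations 2 to last, in order
summarize z
graph twoway line z t if(t<=100)
//The prefix quietly suppresses the "real changes made" message in the output window.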