clear
set mem 15m // only needed in Stata 11 and earlier; ignored from Stata 12 on
cd "/Users/pabsta/documents/2-enseignement/ECON452/tutorial3/OUTPUT"
log using analysis, replace text
use "http://www.pabsta.qc.ca/sites/default/files/threeSeries.dta"
tset t

/* Basics of the Box-Jenkins methodology:
1- Is the process stationary? If no --> we don't know what to do yet.
   If yes --> proceed to find candidate orders p and q for an ARMA(p,q) representation.
2- Analyze the PAC and AC graphs to find upper bounds for p and q. Look for particular patterns.
3- Proceed to ARIMA estimation and check whether statistical tests let us pin down the process.
4- Choose plausible models based on their ability to forecast and on other factors (not seen in class yet).
*/

/* S1: is the process stationary? */
tsline s1 if(t<100) // t < 100 so that the plot is readable
//It looks like it oscillates around a stable mean -> indication of stationarity.

//Proceed to the PAC and AC analysis:
ac s1
//Two things to deduce from this graph:
/* First, only the second lag seems to "spike out", i.e. to be significantly
different from zero. This indicates that there is probably an MA(2) component
in the process.
Second, the autocorrelations seem to oscillate (going up and down) across lags.
This indicates that there might be a hidden AR process in there. */
pac s1
/* Pretty much the same story here, but reversed. There is a spike at the second
lag, suggesting an AR(2) component in the process. There also seems to be an
oscillation pattern, which might hide an MA process.
HENCE, we should start our broad estimation with an ARMA(2,2) process. */

//Same information as both previous graphs, but as a table in the Results window:
corrgram s1

//First estimation of an ARMA(2,2) model:
/* The command below asks Stata to "perform a maximum likelihood estimation of
the following model:
s1 = cons + a1 L.s1 + a2 L2.s1 + err + b1 L.err + b2 L2.err
and give us the estimates". */
arima s1, ar(1,2) ma(1,2)
/* None of the coefficients is individually significant (zero lies inside every
confidence interval), but the Wald chi2 test of the joint restriction that they
are all equal to zero (reported in the upper-right corner of the output) has a
p-value close to zero, which indicates that the coefficients are jointly
significant.
Let's test whether the MA component is jointly insignificant: */
test ([ARMA]L.ma = 0) ([ARMA]L2.ma = 0)
/* The p-value of this test is 0.89, so we cannot reject the null hypothesis that
both coefficients are equal to zero.
Before we estimate the process without an MA component, let's test whether the
AR component is jointly equal to zero. */
test ([ARMA]L.ar = 0) ([ARMA]L2.ar = 0)
/* The p-value is 0.24 (higher than 5%), so we also fail to reject H0.
The two tests seem to contradict each other: one suggests that we should drop the
AR component, while the other says we should drop the MA component.
We need additional tests before going further. */
test ([ARMA]L.ar = 0) ([ARMA]L.ma = 0) //Test whether the process has only the second AR lag and the second MA lag
test ([ARMA]L.ar = 0) ([ARMA]L.ma = 0) ([ARMA]L2.ma = 0) //Test whether the process is solely an AR with only the second lag
test ([ARMA]L.ar = 0) ([ARMA]L.ma = 0) ([ARMA]L2.ar = 0) //Test whether the process is solely an MA with only the second lag
/* These tests suggest that we might have either an AR(2) process or an MA(2)
process (in each case, with only the second lag). */

//This model is a candidate solution to the process at hand.
arima s1, ar(2) // note: ar(2) includes only the second AR lag, not lags 1 and 2
//Here a new variable res1_1 is built; it contains the residuals (estimated error terms):
predict res1_1, r
//Statistical test of whether the residuals are white noise (we have not seen this test yet):
wntestq res1_1
/* High p-value: we cannot reject H0 that the residuals are white noise. */

//This model is also a candidate solution to the process at hand.
arima s1, ma(2)
predict res1_2, r
wntestq res1_2

/* We still have to pick one of the two models found so far. We do not know yet
how to do this formally, but if a model forecasts better, we can take this as an
indication that it is "better".
To do this, we compare the variances of the residuals: the higher the variance,
the larger the forecasting errors, and thus the worse the model performs. */
sum res1_1 res1_2
/* The first model has a slightly smaller variance, so perhaps we should keep it. */

//Second series
tsline s2 if(t<100)
ac s2
pac s2
corrgram s2
/* The PAC graph gives a clear picture: both the first and the second lags spike
out, while the remaining lags seem non-significant, without any distinguishable
pattern. This indicates that the process at hand is an AR(2), with both lags
(first and second) being important.
The AC graph also suggests that the AR part dominates, since it shows clear
oscillation patterns. Nonetheless, we include up to four MA lags in our first
estimation, simply to cover all possibilities.
Both graphs die out quickly, which also suggests stationarity. */
arima s2, ar(1,2) ma(1,2,3,4)
test ([ARMA]L3.ma = 0) ([ARMA]L2.ma = 0) //Can MA lags 2 and 3 be dropped jointly?
//Drop the insignificant MA lags step by step:
arima s2, ar(1,2) ma(1,4)
arima s2, ar(1,2) ma(1)
arima s2, ar(1,2) //Most compact representation.
predict res2, r
wntestq res2

//Third series
tsline s3 if(t<100)
/* Whoa! No way this is stationary! The process keeps increasing over time! */
ac s3 //No exponential decay.
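/* A formal check of the graphical evidence (a sketch, not covered in class yet):
the augmented Dickey-Fuller test, Stata's -dfuller- command, with H0: s3 has a
unit root. The -trend- option adds a linear time trend to the test regression,
which we assume is appropriate because s3 keeps increasing over time. With the
default lags(0), this is the plain Dickey-Fuller test. If the test statistic is
not more negative than the critical value (large p-value), we cannot reject the
unit root. */
dfuller s3, trend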
pac s3 //Huge spike at the first lag, suggesting an AR(1) with a coefficient equal (or very close) to one.
//This is clearly not a stationary process.
corrgram s3

//We take the first difference of the series to see whether it is stationary:
gen diff_s3 = D.s3
tsline diff_s3 if(t<100)
ac diff_s3
pac diff_s3
//It seems stationary. It also seems that no lag helps explain the difference.
/* This means that the first difference of the series is essentially white noise. */
wntestq diff_s3 //This test points the same way.
//The summary statistics are:
sum diff_s3
/* This suggests that the model is given by
y = 1.09 + L.y + u, where u ~ N(0, 1)
i.e. a random walk with drift. */
log close
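/* To see what the fitted model implies, here is an illustrative simulation
(a sketch, not part of the original analysis) of the random walk with drift
y_t = 1.09 + y_(t-1) + u_t, u_t ~ N(0,1), using the estimates above.
The seed (452), the sample size (500) and the starting value y_1 = 1.09 + u_1
are arbitrary assumptions. preserve/restore keeps the threeSeries data intact.
Because this runs after -log close-, its output is not written to the log. */
preserve
clear
set obs 500
set seed 452
gen t = _n
tsset t
gen y = 1.09 + rnormal() in 1 // first observation: drift plus a N(0,1) shock
replace y = 1.09 + y[_n-1] + rnormal() in 2/l // replace runs in order, so each value builds on the updated previous one
tsline y // should drift upward, like s3
restore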