Large Sample Properties and Highly Persistent Data
The time series assumptions from Chapter 10 (like strict exogeneity) are often violated in the real world.
Models like the AR(1) model must violate strict exogeneity:
y_t = β_0 + β_1 y_{t-1} + u_t
The error u_t affects y_t, and y_t is the explanatory variable in the next period's equation. So u_t is correlated with future values of the explanatory variable (y_t, y_{t+1}, etc.). This violates strict exogeneity (TS.3).
When our assumptions for unbiasedness fail, we rely on asymptotic properties (consistency and asymptotic normality) which hold as the sample size grows.
For our large sample results to hold, a time series must be weakly dependent.
This means that observations in the distant past are not too strongly correlated with observations in the present. As the time between them grows, the correlation between them approaches zero.
Example: in a stable AR(1) process y_t = ρ y_{t-1} + e_t with |ρ| < 1, the correlation between y_t and y_{t-h} is ρ^h, which goes to zero as h → ∞.
When ρ = 1, shocks are permanent and the series is highly persistent (not weakly dependent).
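The ρ^h decay can be checked numerically. The sketch below is only an illustration under assumptions that are not in the notes: standard normal shocks, ρ = 0.8, a fixed seed, and the hypothetical helper names simulate_ar1 and sample_autocorr. It prints the sample correlation at a few lags next to ρ^h.

```python
import numpy as np

def simulate_ar1(rho, n, rng, burn_in=500):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal shocks (assumed)."""
    e = rng.standard_normal(n + burn_in)
    y = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        y[t] = rho * y[t - 1] + e[t]
    # Drop the burn-in so the arbitrary start value y_0 = 0 no longer matters.
    return y[burn_in:]

def sample_autocorr(y, h):
    """Sample correlation between y_t and y_{t-h}."""
    return np.corrcoef(y[h:], y[:-h])[0, 1]

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
rho = 0.8
y = simulate_ar1(rho, n=50_000, rng=rng)

for h in (1, 2, 5, 10):
    print(f"h={h:2d}  sample corr={sample_autocorr(y, h):.3f}  rho^h={rho ** h:.3f}")
```

With a long simulated series the two columns should be close, and both shrink toward zero as h grows.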
Under a new, weaker set of assumptions (TS.1' to TS.5') that require weak dependence and contemporaneous exogeneity (E(u_t | x_t) = 0), we get our key results:
1. Consistency: The OLS estimators converge to the true population values as n gets large.
2. Asymptotic Normality: The OLS estimators are approximately normally distributed in large samples.
The Big Takeaway: We can still use our usual t-stats, F-stats, and confidence intervals for inference, but now they are only justified in large samples. We don't need strict exogeneity or normality!
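A small Monte Carlo sketch can make the "biased but consistent" idea concrete for the AR(1) model. Everything specific here is an assumption for illustration: true slope 0.5, zero intercept, standard normal errors, the sample sizes, the seed, and the hypothetical function name ols_slope_ar1. It compares the average OLS slope estimate in short and long samples.

```python
import numpy as np

def ols_slope_ar1(rho, n, reps, rng, burn_in=100):
    """OLS slope from regressing y_t on y_{t-1} (with intercept), one estimate per replication."""
    e = rng.standard_normal((reps, n + burn_in))
    y = np.zeros((reps, n + burn_in))
    for t in range(1, n + burn_in):
        y[:, t] = rho * y[:, t - 1] + e[:, t]
    y = y[:, burn_in:]                   # keep the last n observations
    lag, cur = y[:, :-1], y[:, 1:]       # regressor y_{t-1}, dependent variable y_t
    lag_d = lag - lag.mean(axis=1, keepdims=True)
    cur_d = cur - cur.mean(axis=1, keepdims=True)
    return (lag_d * cur_d).sum(axis=1) / (lag_d ** 2).sum(axis=1)

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily
rho = 0.5
small_mean = ols_slope_ar1(rho, n=25, reps=2000, rng=rng).mean()
large_mean = ols_slope_ar1(rho, n=1000, reps=2000, rng=rng).mean()

print(f"true slope: {rho}")
print(f"mean estimate, n=25:   {small_mean:.3f}")
print(f"mean estimate, n=1000: {large_mean:.3f}")
```

The short-sample average should sit noticeably below the true slope (OLS is biased because strict exogeneity fails), while the long-sample average should be very close to it (OLS is consistent).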
Many economic time series (like asset prices, GDP, exchange rates) are not weakly dependent. They are better described as random walks.
y_t = y_{t-1} + e_t
A process like this is called integrated of order one, or I(1). Weakly dependent series are I(0).
A stable AR(1) process (blue) tends to return to its mean. A random walk (red) does not; shocks have permanent effects.
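One way to see the difference without a plot is to simulate many paths of each process and look at how spread out they are at different dates. The sketch below assumes standard normal shocks, a start value of zero, ρ = 0.5 for the stable process, and an arbitrary seed; none of these come from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
paths, T = 5000, 400
e = rng.standard_normal((paths, T))

# Random walk: y_t = y_{t-1} + e_t with y_0 = 0, so y_t is the running sum of shocks.
random_walk = np.cumsum(e, axis=1)

# Stable AR(1): y_t = rho * y_{t-1} + e_t, so old shocks die out geometrically.
rho = 0.5
ar1 = np.zeros((paths, T))
ar1[:, 0] = e[:, 0]
for t in range(1, T):
    ar1[:, t] = rho * ar1[:, t - 1] + e[:, t]

for t in (100, 400):
    rw_var = random_walk[:, t - 1].var()   # variance across paths at date t
    ar_var = ar1[:, t - 1].var()
    print(f"t={t}: random walk variance={rw_var:.1f}, AR(1) variance={ar_var:.2f}")
```

Because every shock stays in a random walk forever, its variance across paths grows roughly in proportion to t. The AR(1) variance settles near 1 / (1 - ρ^2) and stays there.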
Regressing one I(1) time series on another unrelated I(1) time series will often produce a high R^2 and a statistically significant t-statistic, even though there is no real relationship between them.
This is called a spurious regression. The variables appear related only because they are both trending together by chance.
These two random walks were generated independently. But if you regressed one on the other, you'd likely find a "significant" relationship!
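The spurious regression problem is easy to reproduce by simulation. The sketch below is an illustration only: the sample size (200), the number of replications, the standard normal shocks, the seed, and the hypothetical helper slope_t_and_r2 are all assumptions. It regresses one independent random walk on another many times and counts how often the usual t-test rejects at the 5% level.

```python
import numpy as np

def slope_t_and_r2(y, x):
    """Usual OLS slope t statistic and R^2 for y = a + b*x + u, one regression per row."""
    n = y.shape[1]
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd ** 2).sum(axis=1)
    b = (xd * yd).sum(axis=1) / sxx
    ssr = ((yd - b[:, None] * xd) ** 2).sum(axis=1)
    sst = (yd ** 2).sum(axis=1)
    se_b = np.sqrt(ssr / (n - 2) / sxx)   # standard error under the usual assumptions
    return b / se_b, 1 - ssr / sst

rng = np.random.default_rng(7)  # fixed seed, chosen arbitrarily
reps, n = 1000, 200

# Two random walks per replication, generated independently of each other.
y = np.cumsum(rng.standard_normal((reps, n)), axis=1)
x = np.cumsum(rng.standard_normal((reps, n)), axis=1)

t_stats, r2 = slope_t_and_r2(y, x)
reject_rate = np.mean(np.abs(t_stats) > 1.96)

print(f"share of regressions with |t| > 1.96: {reject_rate:.2f}")
print(f"average R^2: {r2.mean():.2f}")
```

If the t-test worked as advertised, the rejection share would be about 0.05, since the series are unrelated by construction. With random walks in levels it should come out far higher.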
Given the danger of spurious regression, what is the fix when we suspect our variables have a unit root?
The first difference of an I(1) process is I(0) (weakly dependent). If y_t is a random walk, then:
Δy_t = y_t - y_{t-1} = e_t
Since the shocks e_t are independent over time, and therefore weakly dependent, we can safely use first differences in our regressions. This is the standard way to avoid spurious regressions.
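To see the fix at work, the following sketch runs the same kind of experiment in levels and in first differences. As before, the sample size, replication count, standard normal shocks, seed, and the hypothetical helper slope_t_stats are assumptions made only for this illustration.

```python
import numpy as np

def slope_t_stats(y, x):
    """Usual OLS slope t statistic for y = a + b*x + u, one regression per row."""
    n = y.shape[1]
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd ** 2).sum(axis=1)
    b = (xd * yd).sum(axis=1) / sxx
    ssr = ((yd - b[:, None] * xd) ** 2).sum(axis=1)
    return b / np.sqrt(ssr / (n - 2) / sxx)

rng = np.random.default_rng(11)  # fixed seed, chosen arbitrarily
reps, n = 2000, 200

# Independent random walks in levels.
y = np.cumsum(rng.standard_normal((reps, n)), axis=1)
x = np.cumsum(rng.standard_normal((reps, n)), axis=1)

# First differences: np.diff recovers the underlying shocks e_t.
dy = np.diff(y, axis=1)
dx = np.diff(x, axis=1)

levels_rate = np.mean(np.abs(slope_t_stats(y, x)) > 1.96)
diff_rate = np.mean(np.abs(slope_t_stats(dy, dx)) > 1.96)

print(f"rejection rate in levels:            {levels_rate:.2f}")
print(f"rejection rate in first differences: {diff_rate:.2f}")
```

In first differences the rejection rate should fall back to roughly the nominal 5%, which is what we expect when two series really are unrelated.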
Large sample properties are essential for most time series applications, where the strict classical assumptions are unlikely to hold.