Inference
In Chapter 3, we learned how to estimate the parameters (βj) of our model using OLS. But an estimate from one sample is just that—one estimate. How confident can we be in it? How do we test our economic theories?
To perform inference, we need to know the sampling distribution of our OLS estimators. We add one final assumption to the Gauss-Markov assumptions:
Assumption MLR.6 (Normality):
The population error term u is independent of the explanatory variables and is normally distributed with mean 0 and variance σ².
Why? This assumption implies that the OLS estimators (β̂j) are also normally distributed. This is the foundation that lets us use the t-test and F-test.
This full set of six assumptions is called the Classical Linear Model (CLM) assumptions.
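In symbols, the key consequence of adding MLR.6 (conditional on the explanatory variables) is the standard result that

$$
\hat{\beta}_j \sim \mathrm{Normal}\big(\beta_j,\ \mathrm{Var}(\hat{\beta}_j)\big)
\qquad\text{and}\qquad
\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{\,n-k-1},
$$

which is exactly the sampling distribution that the t-tests and confidence intervals below rely on.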
The most common test we run is whether a variable xj has a statistically significant effect on y. We are testing a hypothesis about the unknown population parameter βj.
The null hypothesis (H0): this is the "boring" case, the theory we want to test against. Most often:
H0: βj = 0
(xj has no ceteris paribus effect on y)
The alternative hypothesis (H1): what you believe if the null is false. It can be two-sided (H1: βj ≠ 0) or one-sided (H1: βj > 0, or H1: βj < 0).
The t-statistic: this tells us how many standard errors our estimate is from the hypothesized value (usually 0).
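Written out, for the usual null H0: βj = 0 the t-statistic is simply the estimate divided by its standard error:

$$
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}
$$

(For a null of the form H0: βj = aj, subtract aj in the numerator.)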
We compare our t-statistic to a critical value from the t-distribution. If our t-stat is "extreme" enough, we reject the null hypothesis.
For a 5% two-sided test, we reject H0 if |t-statistic| > critical value. This happens if the t-stat falls in either of the 2.5% tails.
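As a quick illustration of where that critical value comes from, here is a minimal sketch using scipy (not part of the text; the degrees of freedom are an illustrative placeholder):

```python
from scipy import stats

df_resid = 120                       # n - k - 1; illustrative placeholder
alpha = 0.05
# Two-sided test: put alpha/2 = 2.5% in each tail
crit = stats.t.ppf(1 - alpha / 2, df_resid)
print(round(crit, 3))                # roughly 1.98 for 120 degrees of freedom
```

With more degrees of freedom the critical value shrinks toward the familiar 1.96 from the standard normal distribution.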
The p-value is a more informative way to summarize the evidence against the null hypothesis.
"The p-value is the smallest significance level at which we could reject the null hypothesis. It's the probability of observing a t-statistic as extreme as we did, if the null hypothesis were true."
Small p-value (e.g., < 0.05): Strong evidence against H0. We say the result is "statistically significant".
Large p-value (e.g., > 0.10): Weak evidence against H0. We "fail to reject" the null.
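The same logic in code, again as a sketch with placeholder numbers rather than results from the text: the two-sided p-value is the probability mass in both tails beyond the observed |t|.

```python
from scipy import stats

t_stat = 2.30                        # placeholder t-statistic
df_resid = 120                       # placeholder degrees of freedom (n - k - 1)
# Two-sided p-value: area in both tails beyond |t_stat| under the null
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)
print(round(p_value, 3))             # about 0.02 -> significant at 5%, not at 1%
```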
Your regression output for the effect of `education` on `log(wage)` shows:
Coefficient = 0.092, Standard Error = 0.007
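Working this through with the numbers above: t = 0.092 / 0.007 ≈ 13.1. That is far beyond any conventional critical value (roughly 1.96 for a 5% two-sided test, about 2.58 at 1%), so the estimated return to education is statistically significant at any standard level and the p-value is essentially zero.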
Instead of just a point estimate, a confidence interval gives us a plausible range of values for the true population parameter βj.
Interpretation: "If we drew many random samples and constructed a 95% CI for each, we would expect 95% of those intervals to contain the true population parameter βj."
A useful shortcut: A 95% CI is roughly the point estimate plus or minus two standard errors.
If the 95% CI does not contain 0, it's equivalent to rejecting H0: βj = 0 at the 5% level against a two-sided alternative.
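Applying this to the education example above, the two-standard-error shortcut gives 0.092 ± 2(0.007), or roughly [0.078, 0.106]. Zero is nowhere near this interval, so we reject H0: βeduc = 0 at the 5% level, which matches the very large t-statistic computed earlier.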
What if we want to test if a group of variables has an effect on y? For example, in a salary regression, do performance metrics as a group matter?
We can't just check their t-stats individually. We need a joint hypothesis test.
The unrestricted model: the full model with all the variables.
log(sal) = β0 + β1yrs + β2games + β3bavg + β4hruns + u
The restricted model: the model with the null hypothesis (H0: β3 = 0, β4 = 0) imposed.
log(sal) = β0 + β1yrs + β2games + u
The F-test checks if the R-squared increases enough when we move from the restricted to the unrestricted model to justify adding the variables.
The F-statistic is calculated from the R-squareds of the two models (see the formula after this list), where:
R²ur = R-squared from the unrestricted model (the bigger one).
R²r = R-squared from the restricted model.
q = Number of restrictions (variables dropped).
n - k - 1 = Degrees of freedom in the unrestricted model.
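Putting these pieces together, the standard R-squared form of the F-statistic is

$$
F = \frac{(R^2_{ur} - R^2_{r})\,/\,q}{(1 - R^2_{ur})\,/\,(n - k - 1)},
$$

which, under H0 and the CLM assumptions, follows an F distribution with (q, n − k − 1) degrees of freedom.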
A large F-statistic provides evidence against the null hypothesis, suggesting the variables are "jointly significant."
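As a hedged sketch of how such a joint test can be run in practice, the snippet below uses statsmodels (not referenced in the text) with simulated stand-in data for the salary example, so the printed numbers are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the salary data (illustrative only, not the real data set)
rng = np.random.default_rng(0)
n = 353
data = pd.DataFrame({
    "yrs": rng.uniform(1, 20, n),
    "games": rng.uniform(200, 2500, n),
    "bavg": rng.normal(260, 30, n),
    "hruns": rng.poisson(60, n),
})
data["lsal"] = 11 + 0.07 * data["yrs"] + 0.0002 * data["games"] + rng.normal(0, 0.6, n)

unrestricted = smf.ols("lsal ~ yrs + games + bavg + hruns", data=data).fit()
restricted = smf.ols("lsal ~ yrs + games", data=data).fit()

# F-test of H0: the coefficients on bavg and hruns are both zero (q = 2)
f_stat, p_value, q = unrestricted.compare_f_test(restricted)
print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}, restrictions = {q:.0f}")
```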
You run a regression and find that three variables (x1, x2, x3) are individually not statistically significant (their t-stats are small). However, the F-test for their joint significance (H0: β1=β2=β3=0) has a very small p-value.
How is this possible?
This is a classic symptom of multicollinearity. The variables x1, x2, and x3 are likely highly correlated with each other.
Because they move together, it's hard for OLS to disentangle their individual effects, leading to high standard errors and insignificant t-statistics. However, the F-test shows that as a group, they still have significant explanatory power.
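A toy simulation (not from the text) makes the pattern concrete: three regressors that are nearly copies of one underlying factor come out individually insignificant but jointly very significant.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
common = rng.normal(size=n)
# Three regressors that are almost identical copies of the same underlying factor
x1 = common + 0.05 * rng.normal(size=n)
x2 = common + 0.05 * rng.normal(size=n)
x3 = common + 0.05 * rng.normal(size=n)
y = 1 + 0.5 * common + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
res = sm.OLS(y, X).fit()

print(res.tvalues[1:])   # individual t-statistics: typically all small
print(res.f_pvalue)      # p-value of the joint F-test: typically near zero
```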
The distinction between statistical significance and economic (practical) significance is one of the most important lessons in econometrics. The two concepts are not the same!
Statistical significance: determined by the size of the t-statistic (or p-value). It tells us how confident we are that an effect is not zero.
It is heavily influenced by sample size; with a huge sample, even tiny effects can become "statistically significant".
Economic (practical) significance: determined by the size and sign of the coefficient (β̂j). It tells us whether the variable's effect is large enough to be important in the real world.
This requires subject-matter knowledge and judgment.
Always discuss both! An effect can be statistically significant but too small to matter, or economically large but estimated too imprecisely to be statistically significant (especially in small samples).