What Happens When the Error Variance Is Not Constant?
One of the Gauss-Markov assumptions (MLR.5) is homoskedasticity. It means the variance of the error term `u` is constant for any value of the explanatory variables.
Var(u | x1, ..., xk) = σ²
Heteroskedasticity is the violation of this assumption. The variance of the error term changes with the values of the x's.
The classic graph plots savings against income: the errors are homoskedastic if the spread around the regression line is the same at every income level, but heteroskedastic if the spread widens as income rises.
If we have heteroskedasticity, what breaks?
The Bottom Line: The OLS coefficient estimates remain unbiased and consistent, but the usual standard errors are invalid, so you can't trust your statistical inference (t tests, F tests, and confidence intervals).
Fortunately, there's a straightforward fix. We can compute heteroskedasticity-robust standard errors (often called White standard errors).
Modern Practice: Many researchers now report robust standard errors by default in cross-sectional regressions, because the resulting inference is valid whether or not heteroskedasticity is present, at little cost in large samples.
Even if we use robust standard errors, it's good practice to test for heteroskedasticity. The basic idea, formalized in the Breusch-Pagan test, is to see whether the squared OLS residuals can be predicted by the explanatory variables.
H0: Var(u | x) = σ² (Homoskedasticity)
The White test is a more general version of the Breusch-Pagan test that also includes squares and interactions of the x's in the auxiliary regression.
You estimate a wage equation and find that the robust standard error for `education` is much larger than the usual OLS standard error. What does this tell you?
It's a strong sign that heteroskedasticity is present and is important in your model. The usual OLS standard error was likely too small, making you overly confident in the precision of your estimate.
You should definitely use the robust standard error for hypothesis testing and confidence intervals, as the usual OLS inference is unreliable.
If heteroskedasticity is present, OLS is no longer the best linear unbiased estimator. If we know the form of the heteroskedasticity, say Var(u | x) = σ²h(x) for a known function h, we can use Weighted Least Squares (WLS), which downweights the high-variance observations, to get a more efficient estimator.
Caution: WLS is only better than OLS if you correctly specify the form of the variance. If you get it wrong, WLS can be worse than OLS. This is why robust standard errors are a more popular "catch-all" solution.
Heteroskedasticity is a common issue in cross-sectional data, but thankfully, we have good tools to deal with it.