What Happens When the Error Variance Is Not Constant?
One of the Gauss-Markov assumptions (MLR.5) is homoskedasticity. It means the variance of the error term `u` is constant for any value of the explanatory variables.
Var(u | x1, ..., xk) = σ²
Heteroskedasticity is the violation of this assumption. The variance of the error term changes with the values of the x's.
The classic graph plots savings against income: the errors are homoskedastic if the spread around the regression line is the same at every income level, but heteroskedastic if the spread widens as income rises.
If we have heteroskedasticity, what breaks?
The Bottom Line: The OLS coefficient estimates remain unbiased and consistent, but the usual standard errors are invalid, so you can't trust your statistical inference (t tests, F tests, and confidence intervals).
Fortunately, there's a straightforward fix. We can compute heteroskedasticity-robust standard errors (often called White standard errors).
Modern Practice: Many researchers now report robust standard errors by default in cross-sectional regressions, because the resulting inference is valid whether or not heteroskedasticity is present, at little cost in large samples.
Even if we use robust standard errors, it's good practice to test for heteroskedasticity. The basic idea, formalized in the Breusch-Pagan test, is to see whether the squared OLS residuals can be predicted by the explanatory variables.
H0: Var(u | x) = σ² (Homoskedasticity)
The White test is a more general version of the Breusch-Pagan test that also includes squares and interactions of the x's in the auxiliary regression.
You estimate a wage equation and find that the robust standard error for `education` is much larger than the usual OLS standard error. What does this tell you?
It's a strong sign that heteroskedasticity is present and is important in your model. The usual OLS standard error was likely too small, making you overly confident in the precision of your estimate.
You should definitely use the robust standard error for hypothesis testing and confidence intervals, as the usual OLS inference is unreliable.
If heteroskedasticity is present, OLS is no longer the best linear unbiased estimator. If we know the form of the heteroskedasticity, say Var(u | x) = σ²h(x) for a known function h, we can use Weighted Least Squares (WLS), which downweights the high-variance observations, to get a more efficient estimator.
Caution: WLS is only better than OLS if you correctly specify the form of the variance. If you get it wrong, WLS can be worse than OLS. This is why robust standard errors are a more popular "catch-all" solution.
Heteroskedasticity is a common issue in cross-sectional data, but thankfully, we have good tools to deal with it.