Chapter 6: Further Issues

Data Scaling, Functional Form, and Prediction

Effects of Data Scaling

What happens if we change the units of our variables (e.g., from dollars to thousands of dollars)?

If you multiply the dependent variable `y` by `c`:

All coefficients (β̂j) and their standard errors are multiplied by `c`. The t-stats, F-stats, and R² remain unchanged.

If you multiply an independent variable xj by `c`:

Its coefficient (β̂j) and standard error are divided by `c`. All other stats are unchanged.

The key takeaway is that the fundamental results of the regression (significance, goodness-of-fit) are not affected by simple rescaling. It's often done for cosmetic reasons to make coefficients easier to read.
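A minimal sketch illustrating this (simulated data and statsmodels assumed; the dollars/thousands framing is just for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data; variable names are illustrative only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)        # y measured in "dollars"
X = sm.add_constant(x)

res_dollars = sm.OLS(y, X).fit()
res_thousands = sm.OLS(y / 1000, X).fit()        # same y, now in "thousands of dollars"

# Coefficients and SEs shrink by a factor of 1000; t-stats and R² do not change.
print(res_dollars.params, res_thousands.params)
print(res_dollars.tvalues, res_thousands.tvalues)    # identical
print(res_dollars.rsquared, res_thousands.rsquared)  # identical
```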

Beta Coefficients (Standardized Coefficients)

To compare the "importance" of explanatory variables that are measured in different units, we can standardize them. A beta coefficient is the result of a regression where both the dependent and all independent variables have been converted to z-scores.

Interpretation: "A one standard deviation increase in xj is associated with a β̂j(beta) standard deviation change in y."

Check Your Understanding

Your regression output shows a beta coefficient for `education` of 0.5 and for `experience` of 0.2. What does this suggest?

Answer:

It suggests that in terms of standard deviation units, education has a larger effect on the dependent variable than experience does. A one standard deviation increase in education is associated with a half standard deviation increase in y, compared to only a 0.2 standard deviation increase for experience.

Functional Form I: Quadratics

We can model relationships that are not straight lines by adding quadratic terms (e.g., x²) to our regression. This is great for capturing diminishing or increasing marginal effects.

y = β0 + β1x + β2x² + u

Consider `wage` as a function of `experience`. Initially, experience has a strong positive effect, but this effect diminishes and eventually turns negative after the "turning point". With β̂1 > 0 and β̂2 < 0, the turning point is at x* = −β̂1 / (2β̂2).
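As a quick sketch (simulated data, illustrative names, statsmodels assumed), here is how you might fit a quadratic in experience and locate the turning point:

```python
import numpy as np
import statsmodels.api as sm

# Simulated wage/experience data with a built-in quadratic shape.
rng = np.random.default_rng(2)
exper = rng.uniform(0, 40, size=500)
wage = 5 + 0.6 * exper - 0.01 * exper**2 + rng.normal(size=500)

X = sm.add_constant(np.column_stack([exper, exper**2]))
res = sm.OLS(wage, X).fit()

b1, b2 = res.params[1], res.params[2]
turning_point = -b1 / (2 * b2)   # x* = -β̂1/(2β̂2): where the effect of exper switches sign
print(turning_point)             # should be near 0.6 / 0.02 = 30 years
```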

Functional Form II: Interaction Terms

Sometimes the effect of one variable depends on the level of another. We can model this by including an interaction term (the product of two variables).

y = β0 + β1x1 + β2x2 + β3(x1 × x2) + u

Interpretation: The partial effect of x1 on y is now β1 + β3x2. It's no longer constant but depends on the value of x2!
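A minimal sketch (simulated data, statsmodels assumed) showing that the estimated partial effect of x1 changes with the value of x2:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with a true interaction effect.
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = 1 + 0.5 * x1 - 0.3 * x2 - 0.2 * x1 * x2 + rng.normal(size=500)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
res = sm.OLS(y, X).fit()
b1, b3 = res.params[1], res.params[3]

# Partial effect of x1 evaluated at different values of x2: β̂1 + β̂3*x2
for x2_val in (-1.0, 0.0, 1.0):
    print(x2_val, b1 + b3 * x2_val)
```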

Check Your Understanding

In a model explaining student test scores, you interact `class_size` with `%_poor_students`. You find a negative coefficient on the interaction term. What does this imply?

Answer:

It implies that the negative effect of larger class sizes is even worse in schools with a higher percentage of poor students. The two factors reinforce each other's negative impact.

A Better Goodness-of-Fit Measure: Adjusted R²

A problem with the standard R² is that it always increases when you add a new variable, even if that variable is useless. This can tempt us to build overly complex models.

The Adjusted R-squared (R̄²)

  • It modifies the R² formula to penalize the inclusion of extra variables (see the small helper below).
  • It only increases if the added variable has a t-statistic greater than 1 (in absolute value).
  • It's useful for comparing nonnested models (models where neither is a special case of the other).
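For reference, the usual formula is R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of slope regressors. A tiny Python helper (statsmodels reports the same quantity as `rsquared_adj`):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope regressors (plus an intercept).
    Penalizes the inclusion of extra variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: a useless extra regressor can nudge R² up while adjusted R² goes down.
print(adjusted_r2(0.250, n=100, k=3))   # ≈ 0.2266
print(adjusted_r2(0.252, n=100, k=4))   # ≈ 0.2205
```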

Important: You can't use R̄² to compare models with different dependent variables (e.g., `wage` vs. `log(wage)`).

Prediction and Residual Analysis

We can use our estimated OLS model to make predictions. But it's important to know what we're predicting.

Confidence Interval for the Prediction

This is a range for the average value of y for a given set of x's. For example, "What is the average GPA for all students with a 1200 SAT?".

This interval is relatively narrow because it only reflects the sampling uncertainty in the estimated coefficients.

Prediction Interval

This is a range for a single, specific outcome of y. For example, "What is the GPA for a specific student, Jane, who had a 1200 SAT?".

This interval is always wider because it must also account for the unobserved error term `u` for that specific person.
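A minimal sketch (simulated SAT/GPA data, illustrative names) of how statsmodels reports both intervals from the same fitted model:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: GPA as a linear function of SAT plus noise.
rng = np.random.default_rng(4)
sat = rng.uniform(900, 1500, size=300)
gpa = 0.5 + 0.002 * sat + rng.normal(scale=0.3, size=300)

res = sm.OLS(gpa, sm.add_constant(sat)).fit()

# Predict at SAT = 1200 (the leading 1.0 is the constant term).
pred = res.get_prediction(np.array([[1.0, 1200.0]]))
frame = pred.summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper"]])   # CI for the average GPA
print(frame[["obs_ci_lower", "obs_ci_upper"]])             # wider interval for one student
```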

Predicting `y` from a `log(y)` Model

If your dependent variable is `log(y)`, your regression predicts the log of y, not y itself. Simply exponentiating the prediction, exp(log(ŷ)), gives a biased estimate that systematically underpredicts y.

A Simple Correction

A better prediction for `y` can be made by including a correction factor based on the model's residuals:

  1. Estimate your model and get the fitted values, log(ŷ)i.
  2. Exponentiate them: m̂i = exp(log(ŷ)i).
  3. Run a simple regression of the original yi on m̂i (with no intercept). The slope coefficient from this regression, α̃0, is your correction factor.
  4. Your final prediction is ŷ = α̃0 · exp(log(ŷ)).

This method also allows you to create a comparable R-squared to decide whether a `y` or `log(y)` model fits the data better.
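A minimal sketch of the four steps above (simulated data, statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data where the true model is linear in log(y).
rng = np.random.default_rng(5)
x = rng.normal(size=400)
logy = 1.0 + 0.4 * x + rng.normal(scale=0.5, size=400)
y = np.exp(logy)
X = sm.add_constant(x)

# Step 1: fit the log(y) model and get the fitted values of log(y).
logy_hat = sm.OLS(logy, X).fit().fittedvalues

# Step 2: exponentiate the fitted values.
m_hat = np.exp(logy_hat)

# Step 3: regress y on m_hat with no intercept; the slope is the correction factor.
alpha0 = sm.OLS(y, m_hat).fit().params[0]

# Step 4: corrected predictions of y (the naive np.exp(logy_hat) would underpredict).
y_hat = alpha0 * m_hat

# A comparable R-squared: the squared correlation between y and the corrected predictions.
print(alpha0, np.corrcoef(y, y_hat)[0, 1] ** 2)
```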

Chapter 6 Summary

This chapter covered several practical issues that make multiple regression a flexible and powerful tool.

  • Data Scaling changes coefficients in predictable ways but does not alter fundamental conclusions.
  • Logarithms, Quadratics, and Interactions are powerful tools for modeling nonlinear relationships.
  • The Adjusted R² penalizes model complexity and is useful for comparing nonnested models.
  • We must be careful not to "over control" by adding variables that we shouldn't be holding fixed.
  • We can construct confidence intervals for our predictions, but must distinguish between predicting an average value and predicting an individual outcome.