Data Scaling, Functional Form, and Prediction
What happens if we change the units of our variables (e.g., from dollars to thousands of dollars)?
If the dependent variable y is multiplied by a constant `c`: all coefficients (β̂j) and their standard errors are multiplied by `c`. The t-stats, F-stats, and R² remain unchanged.
If an independent variable xj is multiplied by `c`: its coefficient (β̂j) and standard error are divided by `c`. All other stats are unchanged.
The key takeaway is that the fundamental results of the regression (significance, goodness-of-fit) are not affected by simple rescaling. It's often done for cosmetic reasons to make coefficients easier to read.
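As a concrete check, here is a minimal sketch using simulated data and statsmodels (the variable names, seed, and scaling factor are illustrative, not from the original): rescaling a regressor by c = 1000 divides its coefficient and standard error by 1000 while leaving the t statistic and R² untouched.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(50, 10, n)              # e.g., income in thousands of dollars
y = 3 + 0.5 * x + rng.normal(0, 2, n)  # dependent variable

res_original = sm.OLS(y, sm.add_constant(x)).fit()

# Rescale the regressor: dollars instead of thousands of dollars (c = 1000)
res_rescaled = sm.OLS(y, sm.add_constant(x * 1000)).fit()

# The slope and its standard error are divided by 1000 ...
print(res_original.params[1], res_rescaled.params[1] * 1000)
# ... but t statistics and R-squared are identical
print(res_original.tvalues[1], res_rescaled.tvalues[1])
print(res_original.rsquared, res_rescaled.rsquared)
```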
To compare the "importance" of explanatory variables that are measured in different units, we can standardize them. A beta coefficient (standardized coefficient) is the coefficient from a regression in which both the dependent variable and all independent variables have been converted to z-scores.
Interpretation: "A one standard deviation increase in xj is associated with a β̂j standard deviation change in y," where β̂j is the beta coefficient.
Your regression output shows a beta coefficient for `education` of 0.5 and for `experience` of 0.2. What does this suggest?
It suggests that in terms of standard deviation units, education has a larger effect on the dependent variable than experience does. A one standard deviation increase in education is associated with a half standard deviation increase in y, compared to only a 0.2 standard deviation increase for experience.
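A minimal sketch of computing beta coefficients by hand, using simulated data (the variable names and parameter values below are illustrative assumptions, not from the original): every variable is converted to a z-score before running OLS.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
education = rng.normal(13, 2, n)
experience = rng.normal(10, 5, n)
y = 1 + 0.8 * education + 0.1 * experience + rng.normal(0, 3, n)

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Standardize y and all regressors; after centering, the intercept is zero,
# so it can be omitted.
Z = np.column_stack([zscore(education), zscore(experience)])
beta = sm.OLS(zscore(y), Z).fit()

# Coefficients are now in standard-deviation units and directly comparable
print(beta.params)
```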
We can model relationships that are not straight lines by adding quadratic terms (e.g., x²) to our regression. This is great for capturing diminishing or increasing marginal effects.
y = β0 + β1x + β2x² + u
In the classic example where `wage` is a function of `experience`, experience initially has a strong positive effect, but this effect diminishes and eventually turns negative after the turning point. The turning point is at x* = -β̂1 / (2β̂2).
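A short sketch with simulated data (the parameter values are illustrative) showing a quadratic fit and the turning-point formula x* = -β̂1 / (2β̂2):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
exper = rng.uniform(0, 40, n)
wage = 5 + 0.6 * exper - 0.012 * exper**2 + rng.normal(0, 2, n)

# Include both experience and experience squared as regressors
X = sm.add_constant(np.column_stack([exper, exper**2]))
res = sm.OLS(wage, X).fit()

b1, b2 = res.params[1], res.params[2]
turning_point = -b1 / (2 * b2)   # x* = -beta1_hat / (2 * beta2_hat)
print(turning_point)             # should be near 0.6 / (2 * 0.012) = 25 years
```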
Sometimes the effect of one variable depends on the level of another. We can model this by including an interaction term (the product of two variables).
y = β0 + β1x1 + β2x2 + β3(x1 × x2) + u
Interpretation: The partial effect of x1 on y is now β1 + β3x2. It's no longer constant but depends on the value of x2!
In a model explaining student test scores, you interact `class_size` with `%_poor_students`. You find a negative coefficient on the interaction term. What does this imply?
It implies that the negative effect of larger class sizes is even worse in schools with a higher percentage of poor students. The two factors reinforce each other's negative impact.
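A sketch of the class-size example with simulated data (all coefficient values are made up for illustration), showing how the partial effect β̂1 + β̂3·x2 becomes more negative as the share of poor students rises:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
class_size = rng.uniform(15, 35, n)
pct_poor = rng.uniform(0, 60, n)
score = (80 - 0.3 * class_size - 0.1 * pct_poor
         - 0.01 * class_size * pct_poor + rng.normal(0, 3, n))

# Regressors: class_size, pct_poor, and their interaction
X = sm.add_constant(np.column_stack([class_size, pct_poor, class_size * pct_poor]))
res = sm.OLS(score, X).fit()

# Partial effect of class_size depends on pct_poor: beta1_hat + beta3_hat * pct_poor
b1, b3 = res.params[1], res.params[3]
for p in (0, 30, 60):
    print(f"effect of one more student when %_poor = {p}: {b1 + b3 * p:.3f}")
```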
A problem with the standard R² is that it never decreases (and usually increases) when you add a new variable, even if that variable is useless. This can tempt us to build overly complex models. The adjusted R-squared, R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), imposes a penalty for each additional regressor: when a single variable is added, R̄² rises only if that variable's t statistic exceeds 1 in absolute value.
Important: You can't use R̄² to compare models with different dependent variables (e.g., `wage` vs. `log(wage)`).
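A quick simulated illustration (data and variable names are illustrative) of how adding a pure-noise regressor affects R² versus R̄²:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
noise = rng.normal(size=n)               # a useless regressor
y = 2 + 1.5 * x1 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x1)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit()

# R-squared can never fall when a variable is added ...
print(small.rsquared, big.rsquared)
# ... but adjusted R-squared penalizes the useless regressor and will usually
# fall here (it rises only if the noise variable's |t| happens to exceed 1)
print(small.rsquared_adj, big.rsquared_adj)
```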
We can use our estimated OLS model to make predictions. But it's important to know what we're predicting.
The confidence interval for the average value of y: a range for the expected value of y for a given set of x's. For example, "What is the average GPA for all students with a 1200 SAT?".
This interval is relatively narrow.
The prediction interval: a range for a single, specific outcome of y. For example, "What is the GPA for a specific student, Jane, who had a 1200 SAT?".
This interval is always wider because it must also account for the unobserved error term `u` for that specific person.
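A sketch of both intervals using statsmodels' `get_prediction` (simulated data; this assumes a statsmodels version whose `summary_frame` exposes `mean_ci_*` and `obs_ci_*` columns):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 150
sat = rng.uniform(900, 1500, n)
gpa = 0.5 + 0.002 * sat + rng.normal(0, 0.3, n)

res = sm.OLS(gpa, sm.add_constant(sat)).fit()

# Predict at SAT = 1200 (row of [constant, sat])
new_X = np.array([[1.0, 1200.0]])
frame = res.get_prediction(new_X).summary_frame(alpha=0.05)

print(frame[["mean_ci_lower", "mean_ci_upper"]])  # CI for the *average* GPA (narrower)
print(frame[["obs_ci_lower", "obs_ci_upper"]])    # PI for one student's GPA (wider)
```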
If your dependent variable is `log(y)`, your regression predicts the log of y, not y itself. Simply exponentiating the prediction, exp(log(ŷ)), will give a biased (under)estimate of y.
A better prediction for `y` can be made by multiplying exp(log(ŷ)) by a correction factor estimated from the model's residuals: α̂ = (1/n) Σ exp(ûᵢ), so the corrected prediction is ŷ = α̂ · exp(log(ŷ)).
This method also allows you to create a comparable R-squared to decide whether a `y` or `log(y)` model fits the data better.
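A sketch of this correction with simulated wage data (the numbers are illustrative), comparing naive and corrected predictions and computing a comparable R² as the squared correlation between the corrected predictions and the actual y:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
educ = rng.uniform(8, 20, n)
wage = np.exp(0.5 + 0.09 * educ + rng.normal(0, 0.4, n))

# Regression with log(wage) as the dependent variable
X = sm.add_constant(educ)
res = sm.OLS(np.log(wage), X).fit()

# Naive prediction: exponentiate the fitted log values -- systematically too small
naive = np.exp(res.fittedvalues)

# Correction factor: average of exp(residuals)
alpha_hat = np.mean(np.exp(res.resid))
corrected = alpha_hat * naive

print(wage.mean(), naive.mean(), corrected.mean())

# Comparable R-squared for the log model
print(np.corrcoef(corrected, wage)[0, 1] ** 2)
```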
This chapter covered several practical issues that make multiple regression a flexible and powerful tool.