7.4 Mean Centering Predictors
In this case, as in many others, the intercept does not have a useful interpretation for the empirical example: it is the expected Grade 2 verbal score for a student with a Grade 1 verbal score of 0, and no student in the sample had a Grade 1 verbal score equal to 0.
Therefore, if we want to make the intercept more meaningful, we need to rescale the Grade 1 verbal score so that its 0 point is meaningful. A common way to do this in regression analysis is to center the predictor variables.
For example, we create a centered variable, \(x^{*}_{1i}\), by subtracting the sample mean, \(\bar{x_1}\), from each observation:
\[ x^{*}_{1i} = x_{1i} - \bar{x_1} \]
Our model becomes
\[ y_i = b_0(1_i) + b_1(x^{*}_{1i}) + \epsilon_i \]
We can sample-mean center \(verb_{1i}\) in R as follows:
#calculate the mean centered variable
wiscsub$verb1_star <- wiscsub$verb1 - mean(wiscsub$verb1, na.rm = TRUE)
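As a quick check on the centering step, the sketch below applies the same subtraction to a small made-up data frame and confirms that the centered variable has a mean of zero. The data frame `toy` and its values are hypothetical stand-ins for `wiscsub`, and the comparison with base R's `scale()` is only one possible alternative, not necessarily the approach used elsewhere in this chapter.

```r
# Hypothetical toy data standing in for wiscsub$verb1 (values are made up)
toy <- data.frame(verb1 = c(12, 18, 20, NA, 25, 30))

# Sample-mean centering, ignoring missing values when computing the mean
toy$verb1_star <- toy$verb1 - mean(toy$verb1, na.rm = TRUE)

# The centered variable should have a mean of (numerically) zero
mean(toy$verb1_star, na.rm = TRUE)

# Base R's scale() can do the same centering; it returns a one-column matrix,
# so as.numeric() converts the result back to a plain vector
toy$verb1_star2 <- as.numeric(scale(toy$verb1, center = TRUE, scale = FALSE))

# Both approaches should agree (missing values stay missing)
all.equal(toy$verb1_star, toy$verb1_star2)
```

Note that `na.rm = TRUE` only affects how the mean is computed; an observation with a missing score remains missing after centering.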
Then we can fit a new model using \(verb^{*}_{1i}\), such that
\[ verb_{2i} = b_0(1_i) + b_1(verb^{*}_{1i}) + \epsilon_i \]
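The summary printed next reports its own `Call:` line, so the fitting step can be sketched from it. In the sketch below, the object name `model_centered` is an assumption, and the simulated stand-in data are hypothetical: they are created only if `wiscsub` is not already loaded, so that the code runs on its own, and they will not reproduce the numbers in the output that follows.

```r
# Stand-in data so the sketch runs on its own; skipped if wiscsub already exists.
# These simulated values are arbitrary and are NOT the WISC data.
if (!exists("wiscsub")) {
  set.seed(1)
  wiscsub <- data.frame(verb1 = rnorm(204, mean = 20, sd = 6))
  wiscsub$verb2 <- 10 + 0.75 * wiscsub$verb1 + rnorm(204, mean = 0, sd = 4)
}

# Mean center the Grade 1 score (same step as in the text)
wiscsub$verb1_star <- wiscsub$verb1 - mean(wiscsub$verb1, na.rm = TRUE)

# Fit the regression with the centered predictor, using the call
# reported in the summary output (model_centered is an assumed name)
model_centered <- lm(verb2 ~ 1 + verb1_star,
                     data = wiscsub,
                     na.action = na.exclude)

# Print coefficients, standard errors, and fit statistics
summary(model_centered)
```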
##
## Call:
## lm(formula = verb2 ~ 1 + verb1_star, data = wiscsub, na.action = na.exclude)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.5305  -3.0362   0.2526   2.7147  12.5020
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.41534    0.29831   85.20   <2e-16 ***
## verb1_star   0.75495    0.05149   14.66   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.261 on 202 degrees of freedom
## Multiple R-squared: 0.5156, Adjusted R-squared: 0.5132
## F-statistic: 215 on 1 and 202 DF, p-value: < 2.2e-16
Note: Mean centering should be used to aid interpretation. Historically, it has been suggested that mean centering reduces multicollinearity; however, this is not the case. See for more information.
7.4.1 Interpreting Model Parameters
Note that the estimate for the slope \(b_1\) stays the same, but the estimate for the intercept changes. This is because the variable ‘verb1_star’ equals 0 when a child has an average Grade 1 verbal score. Therefore, the expected Grade 2 verbal score for a child with an average Grade 1 verbal score is 25.41534.
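The reason the slope is unchanged while the intercept shifts can be seen by substituting \(x_{1i} = x^{*}_{1i} + \bar{x_1}\) into the uncentered model. In the sketch below, \(b_0^{raw}\) is notation introduced here (an assumption, not used elsewhere in the chapter) for the intercept of the uncentered model.

```latex
\[ y_i = b_0^{raw}(1_i) + b_1(x_{1i}) + \epsilon_i \]
\[ y_i = b_0^{raw}(1_i) + b_1(x^{*}_{1i} + \bar{x_1}) + \epsilon_i \]
\[ y_i = (b_0^{raw} + b_1\bar{x_1})(1_i) + b_1(x^{*}_{1i}) + \epsilon_i \]
```

Comparing the last line with the centered model gives \(b_0 = b_0^{raw} + b_1\bar{x_1}\), with the same \(b_1\) in both models. Because a least-squares line with an intercept passes through the point of sample means, the centered intercept also equals the sample mean of the outcome, provided the mean used for centering is computed on the same cases that enter the regression.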
7.4.2 Plotting Regression Line
library(ggplot2)
ggplot(data = wiscsub, aes(x = verb1_star, y = verb2)) +
  geom_point(size = 2, shape = 19) +
  #linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, colour = "red", linewidth = 2) +
  labs(x = "Sample-Centered Verbal Ability Grade 1", y = "Verbal Ability Grade 2") +
  xlim(-20, 20) +
  ylim(0, 50) +
  #theme with white background
  theme_bw() +
  #eliminate background, gridlines, and chart border
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank()
  ) +
  #draws x and y axis line
  theme(axis.line = element_line(color = 'black')) +
  #set size of axis labels and titles
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))
## `geom_smooth()` using formula = 'y ~ x'
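Note the change on the x-axis: 0 now corresponds to the sample mean of the Grade 1 verbal scores, so the intercept is the height of the regression line at x = 0.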