7.4 Mean Centering Predictors

In this example, as in many others, the intercept does not have a useful interpretation. It is the expected Grade 2 verbal score for a student whose Grade 1 verbal score is 0, and no student in the sample had a Grade 1 verbal score equal to 0.

Therefore, to make the intercept more meaningful, we need to rescale the Grade 1 verbal score so that 0 is a meaningful value. A common way to do this in regression analysis is to center the predictor variables.

For example, we create a centered variable, \(x^{*}_{1i}\), by subtracting the sample mean, \(\bar{x}_1\), from each observation:

\[ x^{*}_{1i} = x_{1i} - \bar{x}_1 \]

Our model becomes

\[ y_i = b_0(1_i) + b_1(x^{*}_{1i}) + \epsilon_i \]

We can sample-mean center \(verb_{1i}\) in R as follows:

#calculate the mean centered variable
wiscsub$verb1_star <- wiscsub$verb1 - mean(wiscsub$verb1, na.rm = TRUE)
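
As a side note, base R's `scale()` function can do the same centering. The sketch below uses a short vector of made-up scores (an assumption for illustration, not the actual `wiscsub` data) to check that the two approaches agree, including when a value is missing.

```r
# Hypothetical scores standing in for wiscsub$verb1 (note the missing value)
verb1 <- c(12.5, 18.0, 21.3, NA, 25.1, 16.4)

# Manual centering, as in the text
centered_manual <- verb1 - mean(verb1, na.rm = TRUE)

# Centering with scale(); scale = FALSE means we do not divide by the SD.
# scale() returns a one-column matrix, so as.numeric() drops the extra attributes.
centered_scale <- as.numeric(scale(verb1, center = TRUE, scale = FALSE))

# The two versions should match, and the centered mean should be
# zero up to floating-point error
all.equal(centered_manual, centered_scale)
mean(centered_manual, na.rm = TRUE)
```

Subtracting the mean by hand keeps the result a plain numeric vector, which is one reason to prefer it when adding a column to a data frame.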

Then we can fit a new model using \(verb^{*}_{1i}\), such that

\[ verb_{2i} = b_0(1_i) + b_1(verb^{*}_{1i}) + \epsilon_i \]

model3 <- lm(verb2 ~ 1 + verb1_star,
              data = wiscsub,
              na.action = na.exclude)
summary(model3)

## 
## Call:
## lm(formula = verb2 ~ 1 + verb1_star, data = wiscsub, na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5305  -3.0362   0.2526   2.7147  12.5020 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 25.41534    0.29831   85.20   <2e-16 ***
## verb1_star   0.75495    0.05149   14.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.261 on 202 degrees of freedom
## Multiple R-squared:  0.5156, Adjusted R-squared:  0.5132 
## F-statistic:   215 on 1 and 202 DF,  p-value: < 2.2e-16

Note: Mean centering should be used to aid interpretation. Historically, mean centering has been suggested as a way to reduce multicollinearity; however, this is not the case. See Olvera & Kroc (2018) for more information.

7.4.1 Interpreting Model Parameters

Compared with the uncentered model, the estimate for the slope \(b_1\) stays the same, but the estimate for the intercept is different. This is because the variable ‘verb1_star’ equals 0 for a child whose 1st grade verbal score equals the sample mean. Therefore, the expected 2nd grade verbal score for a child with an average 1st grade verbal score is 25.41534.
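
The relationship between the two sets of estimates can be worked out by substituting the centered variable back into the model. Writing \(b^{*}_0\) for the intercept of the centered model (notation introduced here only to keep the two intercepts apart), we have

\[ b^{*}_0 + b_1(x_{1i} - \bar{x}_1) = (b^{*}_0 - b_1\bar{x}_1) + b_1 x_{1i} \]

so the slope is unchanged and \(b^{*}_0 = b_0 + b_1\bar{x}_1\). The following is a minimal sketch that checks this numerically. It uses simulated data, and the data frame `sim` and its generating values are assumptions for illustration, not the `wiscsub` scores.

```r
# Simulated stand-in for the wiscsub data (hypothetical values, not the real scores)
set.seed(123)
n <- 200
sim <- data.frame(verb1 = rnorm(n, mean = 19, sd = 6))
sim$verb2 <- 10 + 0.75 * sim$verb1 + rnorm(n, sd = 4)

# Sample-mean center the predictor
sim$verb1_star <- sim$verb1 - mean(sim$verb1)

# Fit the uncentered and the centered model
fit_raw <- lm(verb2 ~ 1 + verb1, data = sim)
fit_centered <- lm(verb2 ~ 1 + verb1_star, data = sim)

# Extract (intercept, slope) without names so they can be compared directly
b_raw <- unname(coef(fit_raw))
b_centered <- unname(coef(fit_centered))

# The slopes agree
all.equal(b_raw[2], b_centered[2])

# The centered intercept equals b0 + b1 * mean(x) ...
all.equal(b_centered[1], b_raw[1] + b_raw[2] * mean(sim$verb1))

# ... which is also the sample mean of the outcome
all.equal(b_centered[1], mean(sim$verb2))
```

Each comparison should return `TRUE`. The last check relies on the mean being computed over the same cases that are used to fit the model; with missing data on the outcome, the centered intercept and the outcome mean can differ slightly.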

7.4.2 Plotting Regression Line

ggplot(data=wiscsub, aes(x=verb1_star,y=verb2)) +
  geom_point(size = 2, shape=19) +
  #linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, colour = "red", linewidth = 2) +
  labs(x= "Sample-Centered Verbal Ability Grade 1", y= "Verbal Ability Grade 2") +
  xlim(-20,20) +
  ylim(0,50) +
  #theme with white background
  theme_bw() +
  #eliminate background, gridlines, and chart border
  theme(
    plot.background = element_blank()
    ,panel.grid.major = element_blank()
    ,panel.grid.minor = element_blank()
    ,panel.border = element_blank()
  ) +
  #draws x and y axis line
  theme(axis.line = element_line(color = 'black')) +
  #set size of axis labels and titles
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=14))

## `geom_smooth()` using formula = 'y ~ x'

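Note the change of scale on the x-axis: the centered scores now take both negative and positive values, with 0 marking the sample mean of the Grade 1 verbal score.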