7.2 Intercept-Only Model
For our first example, we focus on verbal ability at Grade 2 as an outcome (verb2 in the data frame wiscsub). We begin by examining the distribution of verb2.
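One way to obtain these descriptives (a sketch assuming the psych package, whose describe() output matches the format below):
library(psych)
describe(wiscsub$verb2)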
##    vars   n  mean   sd median trimmed  mad  min   max range  skew kurtosis   se
## X1    1 204 25.42 6.11  25.98    25.4 6.57 5.95 39.85  33.9 -0.06    -0.34 0.43
library(ggplot2)  # load ggplot2 (in case it is not already loaded)
ggplot(data=wiscsub, aes(x=verb2)) +
geom_histogram(binwidth=2.5, fill="white", color="black", boundary=0) +
xlab("Verbal Ability Grade 2") + ylab("Count") +
xlim(0,50) +
theme_classic()
7.2.1 Intercept-Only Equation
The simplest model is an intercept-only model. In this case, we would fit the model
\[ verb_{2i} = b_0 + \epsilon_{i}\]
Written out explicitly with the “silent” 1 in it, we get
\[ verb_{2i} = b_01_i + \epsilon_{i}\]
This is helpful for explicit translation into the R code, specifically the formula within the lm() function.
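We can make this silent 1 visible by inspecting the design matrix; here is a quick check using base R's model.matrix():
head(model.matrix(verb2 ~ 1, data = wiscsub))  # a single (Intercept) column of 1s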
7.2.2 Intercept-Only Model in R
We fit the model using the following code. Note that the code has the ‘1’ predictor variable stated explicitly.
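model1 <- lm(verb2 ~ 1, data = wiscsub, na.action = na.exclude)  # object name 'model1' is ours
summary(model1)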
##
## Call:
## lm(formula = verb2 ~ 1, data = wiscsub, na.action = na.exclude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.4653 -4.6403 0.5647 4.2822 14.4347
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.4153 0.4275 59.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.106 on 203 degrees of freedom
Note that we used na.exclude instead of na.omit (the default). Practically speaking, both drop cases with missing data when fitting the model; with na.exclude, however, the residuals and fitted values are padded with NA wherever there were missing values, so they line up with the rows of the original data frame.
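To see the difference concretely, here is a minimal sketch using a hypothetical toy data frame (not wiscsub) with one missing value:
toy <- data.frame(y = c(1, 2, NA, 4))
residuals(lm(y ~ 1, data = toy, na.action = na.omit))     # length 3: the NA case is dropped
residuals(lm(y ~ 1, data = toy, na.action = na.exclude))  # length 4: NA retained in position 3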
The output indicates that \(b_0\) = 25.4153, and its standard error = 0.4275.
The intercept reflects the expected value of the outcome variable when all of the predictor variables, \(\left\{ x_{1i}, ..., x_{qi}\right\}\), equal 0.
So, in the absence of any additional information other than the descriptive statistics of \(verb_{2i}\), what is our best guess for a person’s \(verb_{2i}\) score?
It is the mean of \(verb_{2i}\). The regression above confirms this notion; regressing the outcome on a vector of 1s allows us to ‘recover’ the mean.
7.2.3 Intercept as Mean of Outcome
Notice we can confirm this finding using the algebra of expectations, as well.
\[ \mathbb{E}(verb_{2i}) = \mathbb{E}(b_01_i + \epsilon_{i}) \]
From the properties of expectation, we have \(\mathbb{E}(X+Y)=\mathbb{E}(X) + \mathbb{E}(Y)\).
\[ \mathbb{E}(verb_{2i}) = \mathbb{E}(b_01_i) + \mathbb{E}(\epsilon_{i}) \]
Another property of expectation relates to taking the expectation of a constant, \(\mathbb{E}(c)=c\), thus \(\mathbb{E}(b_01_i)=b_0\), implying
\[ \mathbb{E}(verb_{2i}) = b_0 + \mathbb{E}(\epsilon_{i}). \] Remembering Assumption 1, \(\mathbb{E}(\epsilon_{i})=0\), we have
\[ \mathbb{E}(verb_{2i}) = b_0 \] We can confirm this by computing the mean of the Grade 2 verbal scores.
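mean(wiscsub$verb2, na.rm = TRUE)  # sample mean of verb2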
## [1] 25.41534
7.2.4 Intercept-Only Model \(R^2\)
Yes, we recovered the mean, but we did not attempt to explain any of the variance. Let's take a look at the variance explained (\(R^2\)) for Model 1.
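summary(model1)$r.squared  # R-squared from the fit above (assuming the object is model1)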
## [1] 0
It thus makes sense that we get 0 for the \(R^2\). The fitted value for every person is the constant \(b_0\), and from the properties of variance we know that \(\mathbb{V}(c)=0\). There is no variability due to the regression model because there are no predictors, only a constant.
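We can verify this directly from the model object (again assuming model1 from above): every fitted value equals \(b_0\), so the fitted values have zero variance.
var(fitted(model1), na.rm = TRUE)  # 0: the fitted values are all the constant b0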