8.2 Introduction to GLMs

Earlier we noted that linear regression is typically applied to continuous outcome variables. The ubiquity of categorical data leads us to a modeling framework better suited to handling a wide range of categorical outcomes: the Generalized Linear Model (GLM).

In the GLM, the response variable \(y_{i}\) is assumed to follow an exponential-family distribution with mean \(\mu_{i}\), which is itself a (possibly nonlinear) function of \(x^{'}_{i}\beta\). We can think of \(\mu\) as the mean of the conditional response distribution at a given point in the covariate space.

There are three important components to the GLM:

  1. A random component: The random component of the GLM contains the response variable \(\mathbf{Y}\) and its probability distribution (e.g. the binomial distribution of \(\mathbf{Y}\) in the binary regression model).
  2. A Linear Predictor: The linear predictor typically takes the form \(\mathbf{X}\boldsymbol{\beta}\), where \(\mathbf{X}\) is an \(n \times q\) design matrix of observed covariates and \(\boldsymbol{\beta}\) is a \(q \times 1\) column vector of coefficients.
  3. Link Function: The link function, typically specified as \(g()\), is used to relate each component of \(\mathbb{E}(\mathbf{Y})\) to the linear predictor, \(g[\mathbb{E}(\mathbf{Y})]=\mathbf{X}\boldsymbol{\beta}\).
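
To make the three components concrete, here is a minimal sketch in Python using the statsmodels library (a software choice of ours; the text does not prescribe any package) with simulated data and hypothetical variable names. Each component maps onto one piece of the model specification:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: a binary response y and an n x q design matrix X
n = 200
x1 = rng.normal(size=n)
X = sm.add_constant(x1)      # intercept column plus one covariate
y = rng.binomial(1, 0.4, size=n)

# 1. Random component: the family argument fixes the distribution of Y
# 2. Linear predictor: the design matrix X supplies X @ beta
# 3. Link function: each family carries a default link g; here the
#    binomial family's logit link is also spelled out explicitly
family = sm.families.Binomial(link=sm.families.links.Logit())
model = sm.GLM(y, X, family=family)
print(model.fit().params)    # estimated beta
```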

8.2.1 Linear Regression as GLM

Linear regression can be formulated in the GLM framework as follows:

\[ \mu_{i} = \beta_{0} + \beta_{1}x_{1i} \]

  1. A random component: We specify \(\mathbf{Y} \sim \mathcal{N}(\mu, \sigma^2)\).
  2. A Linear Predictor: \(\mathbf{X}\) are the continuous or discrete explanatory variables. The way we think about the structural component here doesn’t really differ from how we think about it with standard linear models; in fact, that’s one of the nice advantages of the GLM.
  3. Link Function: For linear regression we use the identity link (e.g. \(\eta=g[\mathbb{E}(\mathbf{Y})]=\mathbb{E}(\mathbf{Y})\)).
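
To see the equivalence in practice, here is a minimal sketch (again in Python with statsmodels, our assumed tooling) that fits the same simulated data with ordinary least squares and with a Gaussian-family GLM under the identity link; the coefficient estimates coincide:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulate y_i = beta0 + beta1 * x_1i + noise
n = 200
x1 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)
X = sm.add_constant(x1)

# Gaussian family; its default link is the identity, g(E[Y]) = E[Y]
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()
ols_fit = sm.OLS(y, X).fit()

print(glm_fit.params)   # same estimates from both fits
print(ols_fit.params)
```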

8.2.2 Logistic Regression as GLM

Let’s also take a look at binary logistic regression formulated as a GLM.

\[ \mathrm{logit}(\pi_i) = \log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0} + \beta_{1}x_{1i} \]

  1. A random component: The distribution of \(\mathbf{Y}\) is assumed to be binomial with success probability \(\pi\), so \(\mathbb{E}(\mathbf{Y})=\pi\).
  2. A Linear Predictor: \(\mathbf{X}\) are the continuous or discrete explanatory variables.
  3. Link Function: For logistic regression we use the log-odds (or logit) link (e.g. \(\eta_{i}=g(\pi_{i})=\log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)\)), where \(\eta_{i}\) is the transformed outcome.
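
Here is a minimal sketch of this model in Python with statsmodels (our assumed tooling), using data simulated from a known logit relationship so the fitted coefficients can be checked against the truth. Note that the estimates live on the log-odds scale:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulate a binary outcome with logit(pi_i) = -1.0 + 0.8 * x_1i
n = 500
x1 = rng.normal(size=n)
pi = 1 / (1 + np.exp(-(-1.0 + 0.8 * x1)))   # inverse logit of eta
y = rng.binomial(1, pi)

X = sm.add_constant(x1)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Estimates are on the log-odds scale and should be near (-1.0, 0.8)
print(fit.params)
```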

8.2.3 Poisson Regression as GLM

Poisson regression can also be formulated as a GLM:

\[ \mathrm{log}(\lambda_i) = \beta_{0} + \beta_{1}x_{1i} \]

  1. A random component: The distribution of \(\mathbf{Y}\) is assumed to be Poisson with mean \(\lambda\), so \(\mathbb{E}(\mathbf{Y})=\lambda\).
  2. A Linear Predictor: \(\mathbf{X}\) are the continuous or discrete explanatory variables.
  3. Link Function: For Poisson regression we use the log link (e.g. \(\eta_{i}=g(\lambda_{i})=\log(\lambda_{i})\)).
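
A parallel sketch for the Poisson case (Python with statsmodels, our assumed tooling), again simulating from a known relationship; here the coefficients live on the log scale:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulate counts with log(lambda_i) = 0.3 + 0.6 * x_1i
n = 500
x1 = rng.normal(size=n)
lam = np.exp(0.3 + 0.6 * x1)    # inverse log link: exp(eta)
y = rng.poisson(lam)

X = sm.add_constant(x1)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Estimates are on the log scale and should be near (0.3, 0.6)
print(fit.params)
```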

8.2.4 Additional Remarks

When the outcome data \(\mathbf{Y}\) are not normally distributed, we can apply a transformation to change their scale. Such transformations are typically expressed through link functions, denoted \(g(\cdot)\), so that we work with \(g(\mathbf{Y})\). If we denote the transformed outcome as \(\boldsymbol{\eta}\), we can write:

\[g(\mathbf{Y})=\boldsymbol{\eta}\]

From a conceptual point of view, the link function \(g(\cdot)\) transforms \(\mathbf{Y}\) into a normally behaved outcome. Note that we are simplifying notation somewhat: we are really modeling an expectation of \(\mathbf{Y}\), not \(\mathbf{Y}\) itself, but we will keep writing \(\mathbf{Y}\). That is, the link is applied to the parameter governing the response distribution, not to the observed response data. Link functions thus formalize that we model the conditional expectation of \(\mathbf{Y}\): conditional because the expected value of \(Y\) depends on the level of the predictors and the chosen link.

Each link function also has an inverse, \(h(\cdot)=g^{-1}(\cdot)\), which allows us to define

\[\mathbf{Y}=g^{-1}(\boldsymbol{\eta})=h(\boldsymbol{\eta})\]

The inverse of a link function maps the linear predictor back onto the scale of the original outcome.
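
As a quick numeric check, here is a minimal sketch using scipy (our choice of tool), taking the logit link as the example: applying the inverse link \(h(\cdot)\) to the transformed values recovers the original probabilities.

```python
import numpy as np
from scipy.special import expit, logit

# g is the logit link; h = g^{-1} is the expit (logistic) function
pi = np.array([0.1, 0.5, 0.9])
eta = logit(pi)        # g(pi)  = log(pi / (1 - pi))
print(expit(eta))      # h(eta) recovers pi: [0.1 0.5 0.9]
```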