8.3 Binary Logistic Regression
8.3.1 Overcoming the LPM
To avoid the problems of the LPM we’d like a model where
\[P(\text{Event Occurs}|x_{1},\dots,x_{q})\]
is forced to be within the range of \(0\) to \(1\). One way to do this is to transform the probability above into the odds metric,
\[ \mathrm{Odds}(\mathbf{Y})= \frac{P(\text{Outcome = 1}|x_{1},\dots,x_{q})}{P(\text{Outcome = 0}|x_{1},\dots,x_{q})} = \frac{P(\text{Outcome = 1}|x_{1},\dots,x_{q})}{1 - P(\text{Outcome = 1}|x_{1},\dots,x_{q})} \]
which has a range of \(0\) to \(\infty\), so we are halfway there. By taking the log of the odds (the logit) we extend the range to \(-\infty\) to \(\infty\): probabilities between \(0\) and \(1\) are mapped to log odds spanning the entire real line.
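The probability-to-odds-to-logit mapping can be sketched numerically. This is a minimal illustration (the book does not supply code for this step), using only the definitions above:

```python
import math

def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Convert a probability to log odds (the logit)."""
    return math.log(odds(p))

# Probabilities near 0 map to large negative log odds, 0.5 maps to
# exactly 0, and probabilities near 1 map to large positive log odds.
for p in [0.01, 0.5, 0.99]:
    print(f"p = {p}: odds = {odds(p):.4f}, logit = {logit(p):.4f}")
```

Note the symmetry: \(p = 0.01\) and \(p = 0.99\) give log odds of equal magnitude and opposite sign, reflecting that the logit is centered at \(p = 0.5\).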
This is one example of why the logit link is used for logistic regression. Now we can seamlessly model the probability of an event occurring, given the explanatory variables, \(x_{1},\dots,x_{q}\).
We denote this probability as \(\pi(x_{1},\dots,x_{q})\) , or equivalently, \(P(\text{Event Occurs}|x_{1},\dots,x_{q})\).
Often you will simply see \(\pi\) for convenience, but it is important to remember that this probability is conditional on the explanatory variables in the model.
8.3.2 Model
The binary logistic regression model is expressed as
\[ \log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0} + \beta_{1}x_{1i}, \]
where \(\frac{\pi_{i}}{1-\pi_{i}}\) is the odds of an event occurring and \(\log\) is the natural logarithm. Therefore, the parameter estimates from a generalized linear model using the logit link function are scaled in log-odds (logit) units.
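Because the coefficients live on the log-odds scale, exponentiating \(\beta_{1}\) gives the multiplicative change in the odds for a one-unit increase in \(x_{1}\). A quick check of this property, using hypothetical coefficient values chosen purely for illustration:

```python
import math

# Hypothetical coefficients on the log-odds scale (illustrative only,
# not from any fitted model):
beta0, beta1 = -1.0, 0.5

def odds_at(x):
    """Odds of the event at predictor value x, under the model above."""
    return math.exp(beta0 + beta1 * x)

# A one-unit increase in x multiplies the odds by exp(beta1),
# regardless of where on the x scale the increase occurs.
ratio = odds_at(3) / odds_at(2)
print(ratio, math.exp(beta1))  # the two quantities agree
```

This constant odds ratio, \(e^{\beta_{1}}\), is the standard way coefficients from a logistic regression are interpreted.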
We can also rewrite the model above, solving for \(\pi_{i}\), as
\[ \pi_{i}=\frac{\exp(\beta_{0}+\beta_{1}x_{1i})}{1+\exp(\beta_{0}+\beta_{1}x_{1i})}. \]
This is the inverse of the logit link, also called the logistic function, \(h(\cdot) = \frac{e^{(\cdot)}}{1+e^{(\cdot)}}\). In practice, this transformation is used to convert the fitted linear predictor back into a probability, and it is what gives logistic regression its name.
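The back-transformation can be sketched directly from the equation above. The coefficient values here are assumptions chosen for illustration, not estimates from any data:

```python
import math

def inv_logit(eta):
    """Logistic (inverse-logit) function: maps any real number into (0, 1)."""
    return math.exp(eta) / (1 + math.exp(eta))

# Hypothetical coefficients, for illustration only:
beta0, beta1 = -1.5, 0.8

for x in [-2, 0, 2, 5]:
    eta = beta0 + beta1 * x   # linear predictor on the log-odds scale
    pi = inv_logit(eta)       # back-transformed to a probability
    print(f"x = {x}: log-odds = {eta:.2f}, pi = {pi:.3f}")
```

However extreme the linear predictor becomes, the resulting \(\pi_{i}\) stays strictly between \(0\) and \(1\), which is exactly the constraint the LPM failed to enforce.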