7.4 Statistical Control Gone Wrong
In certain fields, it has become common practice to include as many covariates as possible.
Unfortunately, it is not true that simply adding more covariates will improve the estimate of a causal effect.
There are two types of variables that researchers should not control for without taking into account potential negative side effects: colliders and mediators.
Whereas confounders causally affect the independent variable of interest, colliders and mediators are causally affected by the independent variable.
7.4.1 Collider Bias
A collider for a certain pair of variables is any variable that is causally influenced by both of them.
Controlling for, or conditioning analysis on, such a variable (or any of its descendants) can introduce a spurious (i.e., noncausal) association between its causes.
In DAG terminology, a collider is the variable in the middle of an inverted fork, for example, variable B in A → B ← C.
The collider variable normally blocks the path, but when one controls for it, a spurious association between A and C can arise.
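The inverted fork can be checked with a small simulation. The following is a minimal sketch, not a definitive analysis: the effect sizes, the sample size, the seed, and the choice to condition by splitting at B > 0 are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_collider(n=20000, seed=0):
    """Return corr(A, C) overall, among cases with B > 0, and among B <= 0."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    c = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of A
    # Assumed structure A -> B <- C: B is caused by both, plus noise.
    b = [ai + ci + rng.gauss(0, 1) for ai, ci in zip(a, c)]

    high = [(ai, ci) for ai, ci, bi in zip(a, c, b) if bi > 0]
    low = [(ai, ci) for ai, ci, bi in zip(a, c, b) if bi <= 0]
    return pearson(a, c), pearson(*zip(*high)), pearson(*zip(*low))


r_all, r_high_b, r_low_b = simulate_collider()
print(f"all cases: r(A, C) = {r_all:+.2f}")
print(f"B > 0:     r(A, C) = {r_high_b:+.2f}")
print(f"B <= 0:    r(A, C) = {r_low_b:+.2f}")
```

Under these assumptions the overall correlation should be close to zero, while both strata of B should show a negative correlation even though A and C were generated independently.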
7.4.2 Conditioning on a Collider
Thought Experiment
Imagine we are interested in the effect of child maltreatment on personality features, such as extroversion.
For this thought experiment, let us assume that there is actually no causal effect of child maltreatment on extroversion.
To investigate the association, we look at all individuals with claims of maltreatment substantiated by Child Protective Services (CPS).
We find a sizable negative association: those who experienced more maltreatment show less extroversion and vice versa.
Suppose we then realize bias might be an issue and conduct a follow-up study on individuals who self-report experiencing maltreatment but do not have CPS involvement.
Again we find a sizable negative association.
By assessing substantiated and unsubstantiated cases separately, we have stratified, or conditioned, our analyses by CPS involvement.
However, let's assume that both exposure to child maltreatment and extroversion affect the likelihood of CPS contact.
In the simplest case, both have a positive effect:
- With increasing child maltreatment, the likelihood of CPS involvement increases.
- With increasing extroversion, the likelihood of CPS involvement increases.
In our thought experiment, there is no association between child maltreatment and extroversion if all individuals, with and without CPS involvement, are considered together without statistical control of the collider or any of its descendants.
![Collider Bias](imgs/collider_bias.jpg)
The spurious negative correlation emerges only when the joint outcome of the two variables of interest is controlled for.
This observation generalizes to similar situations in which selection into a group is based on multiple desirable features:
- Group membership is a collider variable, and conditioning analysis on it will introduce or exaggerate trade-offs between desirable features.
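The thought experiment can be mimicked with simulated data. This is a sketch under stated assumptions: the logistic model for CPS involvement, its coefficient of 2, the standardized scores, and the names `simulate_cps` and `pearson` are hypothetical choices, not part of the original example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_cps(n=20000, seed=1):
    """Correlate maltreatment and extroversion overall and within CPS strata."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        maltreatment = rng.gauss(0, 1)
        extroversion = rng.gauss(0, 1)  # independent: no causal effect assumed
        # Assumption: both variables raise the probability of CPS involvement.
        p_cps = 1 / (1 + math.exp(-2 * (maltreatment + extroversion)))
        rows.append((maltreatment, extroversion, rng.random() < p_cps))

    def corr(subset):
        m, e = zip(*[(row[0], row[1]) for row in subset])
        return pearson(m, e)

    return {
        "all": corr(rows),
        "cps": corr([row for row in rows if row[2]]),
        "no_cps": corr([row for row in rows if not row[2]]),
    }


for group, r in simulate_cps().items():
    print(f"{group:>6}: r = {r:+.2f}")
```

With these settings the pooled sample should show essentially no association, whereas each CPS stratum analyzed on its own should show a negative one, matching the pattern described in the thought experiment.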
7.4.3 Avoiding Collider Bias
Avoiding collider bias requires two steps.
First, one must be aware of the collider variable; this may entail using a DAG to identify colliders between an independent variable and an outcome.
Second, one must be able to run analyses that are not conditional on the collider, which means not controlling for the collider when examining the main effect of interest.
Thought Experiment
In our thought experiment, we must include individuals both with and without CPS involvement.
Outside of thought experiments, one might often be unaware of collider variables or collect data in such a way that collider bias is built in.
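The same problem arises when the collider is entered as a covariate in a regression rather than used to split the sample. The sketch below compares the unadjusted slope with the slope adjusted for a collider. The data-generating model and the helper names (`slope`, `residualize`, `adjusted_slope`) are illustrative assumptions; the adjustment is computed by residualizing both variables on the covariate (the Frisch-Waugh-Lovell approach).

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after regressing it on z."""
    b = slope(z, v)
    mv, mz = mean(v), mean(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def adjusted_slope(x, y, z):
    """Slope of y on x with z as a covariate, via residualization."""
    return slope(residualize(x, z), residualize(y, z))


def simulate_adjustment(n=20000, seed=2):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # assumed: x has no effect on y
    # Assumed collider: caused by both x and y, plus noise.
    z = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return slope(x, y), adjusted_slope(x, y, z)


unadjusted, adjusted = simulate_adjustment()
print(f"slope without collider: {unadjusted:+.2f}")
print(f"slope with collider:    {adjusted:+.2f}")
```

Under this assumed model the unadjusted slope should sit near the true value of zero, while the collider-adjusted slope should be around -0.5, a bias created entirely by the added covariate.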
7.4.4 Variations on Collider Bias: Nonresponse Bias
Nonresponse bias occurs if, for example, a researcher analyzes only completed questionnaires, and the variables of interest are associated with questionnaire completion.
Assume that we are interested in the association between grit and intelligence, and our assessment ends up being very burdensome.
- Both grit and intelligence make it easier for respondents to push through and complete the assessment.
- Questionnaire completion is thus a collider between grit and intelligence.
- Although there might be no association between grit and intelligence in the population, we might find a spurious negative association if we analyze only completed questionnaires:
  - Completers low on intelligence tend to have high levels of grit.
  - Completers low on grit tend to be high on intelligence.
  - Respondents low on both variables are less likely to finish and are therefore underrepresented among completers.
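The completion pattern described above can be tabulated in a simulation. This is a rough sketch: the completion rule (grit plus intelligence plus noise exceeding zero), the median split into "high" and "low", and the function name `simulate_nonresponse` are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_nonresponse(n=20000, seed=3):
    """Return completion rates by (grit, intelligence) group and r among completers."""
    rng = random.Random(seed)
    counts = {}  # (grit level, intelligence level) -> [completed, total]
    completers = []
    for _ in range(n):
        grit = rng.gauss(0, 1)
        intelligence = rng.gauss(0, 1)  # independent of grit by construction
        # Assumption: both traits help respondents finish the assessment.
        completed = grit + intelligence + rng.gauss(0, 1) > 0
        key = ("high" if grit > 0 else "low",
               "high" if intelligence > 0 else "low")
        tally = counts.setdefault(key, [0, 0])
        tally[0] += completed
        tally[1] += 1
        if completed:
            completers.append((grit, intelligence))
    rates = {key: done / total for key, (done, total) in counts.items()}
    return rates, pearson(*zip(*completers))


rates, r_completers = simulate_nonresponse()
for (grit_level, iq_level), rate in sorted(rates.items()):
    print(f"grit {grit_level:>4}, intelligence {iq_level:>4}: {rate:.0%} complete")
print(f"r among completers: {r_completers:+.2f}")
```

Respondents low on both traits should complete least often, which thins out one corner of the scatterplot and leaves a negative correlation among completers.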
7.4.5 Controlling for Mediators
Overcontrol bias is another example of statistical control hurting instead of helping:
- If mediating variables are controlled for, the very processes of interest are controlled away.
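Overcontrol can also be shown numerically. The sketch below assumes a simple chain X → M → Y with no direct path from X to Y; the path coefficients (0.8 and 0.5), the generic variable names, and the helper functions are hypothetical and chosen only to make the effect visible.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after regressing it on z."""
    b = slope(z, v)
    mv, mz = mean(v), mean(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def simulate_mediation(n=20000, seed=4):
    """Return the slope of Y on X without and with the mediator as covariate."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed chain: X affects M, and M affects Y; no direct X -> Y path.
    m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.5 * mi + rng.gauss(0, 1) for mi in m]
    total = slope(x, y)
    controlled = slope(residualize(x, m), residualize(y, m))
    return total, controlled


total, controlled = simulate_mediation()
print(f"effect of X on Y, mediator not controlled: {total:+.2f}")
print(f"effect of X on Y, mediator controlled:     {controlled:+.2f}")
```

In this assumed model the total effect is 0.8 × 0.5 = 0.4, and the unadjusted slope should recover roughly that value. Once M is controlled for, the estimate should drop to about zero, because the only route from X to Y has been held constant.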
Consider our previous example, now slightly modified: