7.4 Statistical Control Gone Wrong

In certain fields, it has become common practice to include as many covariates as possible.

Unfortunately, it is not true that simply adding more covariates will improve the estimate of a causal effect.

There are two types of variables that researchers should not control for without taking into account potential negative side effects: colliders and mediators.

Whereas confounders causally affect the independent variable of interest, colliders and mediators are causally affected by the independent variable.

7.4.1 Collider Bias

A collider for a certain pair of variables is any variable that is causally influenced by both of them.

Controlling for, or conditioning analysis on, such a variable (or any of its descendants) can introduce a spurious (i.e., noncausal) association between its causes.

In DAG terminology, a collider is the variable in the middle of an inverted fork, for example, variable B in A → B ← C.

The collider variable normally blocks the path, but when one controls for it, a spurious association between A and C can arise.
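A small simulation can make the A → B ← C pattern concrete. The sketch below is illustrative only: the sample size, the seed, the unit coefficients, and the helper name ols_slopes are assumptions introduced here, not part of the original material. It regresses C on A with and without the collider B as a covariate.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 100_000

# A and C are drawn independently, so neither causes the other.
a = rng.normal(size=n)
c = rng.normal(size=n)
# B is a collider: both A and C cause it (unit coefficients are assumed).
b = a + c + rng.normal(size=n)


def ols_slopes(y, *predictors):
    """Return ordinary least squares slopes of y on the predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


slope_unadjusted = ols_slopes(c, a)[0]   # C ~ A
slope_adjusted = ols_slopes(c, a, b)[0]  # C ~ A + B, i.e., controlling for the collider

print(f"C ~ A:     slope on A = {slope_unadjusted:.3f}")
print(f"C ~ A + B: slope on A = {slope_adjusted:.3f}")
```

Under these assumed coefficients, the unadjusted slope should sit near zero, whereas the adjusted slope should be clearly negative (about -0.5 in expectation), even though A has no effect on C by construction.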

7.4.2 Conditioning on a Collider

Thought Experiment

Imagine we are interested in the effect of child maltreatment on personality features, such as extroversion.

For this thought experiment, let us assume that there is actually no causal effect of child maltreatment on extroversion.

To investigate the association, we look at all individuals with claims of maltreatment substantiated by Child Protective Services (CPS).

We find a sizable negative association: those who experienced more maltreatment show less extroversion and vice versa.

Suppose we then realize bias might be an issue and conduct a follow-up study on individuals who self-report experiencing maltreatment but do not have CPS involvement.

Again we find a sizable negative association.

By assessing substantiated and unsubstantiated cases separately, we have stratified, or conditioned, our analyses by CPS involvement.

However, let’s assume that both exposure to child maltreatment and extroversion affect the likelihood of CPS contact.

In the simplest case, both have a positive effect:

  • With increasing child maltreatment, the likelihood of CPS involvement increases.
  • With increasing extroversion, the likelihood of CPS involvement increases.

In our thought experiment, there is no association between child maltreatment and extroversion if all individuals, with and without CPS involvement, are considered together without statistical control for the collider or any of its descendants.

Collider Bias

The spurious negative correlation emerges only when the joint outcome of the two variables of interest is controlled for.

This observation generalizes to similar situations in which selection into a group is based on multiple desirable features:

  • Group membership is a collider variable, and conditioning analysis on it will introduce or exaggerate trade-offs between desirable features.
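The thought experiment above can be sketched as a simulation. Everything numeric here is an assumption made for illustration: the standardized scores, the logistic selection model, and the coefficient of 1.5 are not taken from any data. The sketch draws maltreatment and extroversion independently, lets both raise the probability of CPS involvement, and then compares the correlation in the full sample with the correlation inside each CPS stratum.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 200_000

maltreatment = rng.normal(size=n)
extroversion = rng.normal(size=n)  # independent of maltreatment by construction

# Assumed selection model: both variables raise the log-odds of CPS involvement.
log_odds = 1.5 * maltreatment + 1.5 * extroversion
p_cps = 1.0 / (1.0 + np.exp(-log_odds))
cps = rng.random(n) < p_cps  # True = CPS involvement


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


r_all = corr(maltreatment, extroversion)
r_cps = corr(maltreatment[cps], extroversion[cps])
r_no_cps = corr(maltreatment[~cps], extroversion[~cps])

print(f"all individuals:     r = {r_all:.3f}")
print(f"CPS involvement:     r = {r_cps:.3f}")
print(f"no CPS involvement:  r = {r_no_cps:.3f}")
```

In this setup the pooled correlation should be close to zero, while both strata should show a negative correlation, mirroring the pattern described in the two hypothetical studies.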

7.4.3 Avoiding Collider Bias

Avoiding collider bias requires two steps.

  • One must be aware of the collider variable, which may entail using a DAG to identify colliders that exist between an independent variable and the outcome.

  • One must be able to run analyses that are not conditional on the collider. This entails not controlling for the collider when examining the main effect of interest.
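As a rough sketch of the first step, a DAG can be written down as a list of directed edges and searched for variables that should not be conditioned on. The function names, the edge-list representation, and the node "case_file" (a hypothetical descendant of CPS involvement) are assumptions made for illustration. For simplicity, the sketch treats only direct common effects of the two variables as colliders.

```python
def parents(dag, node):
    """Return the set of direct causes of `node` in an edge-list DAG."""
    return {src for src, dst in dag if dst == node}


def descendants(dag, node):
    """Return every variable reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, dst in dag:
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found


def variables_to_avoid(dag, x, y):
    """Return (colliders of x and y, descendants of those colliders)."""
    nodes = {n for edge in dag for n in edge}
    colliders = {n for n in nodes if {x, y} <= parents(dag, n)}
    downstream = set().union(*(descendants(dag, c) for c in colliders))
    return colliders, downstream


# Edges read "cause -> effect"; "case_file" is a made-up descendant of the collider.
dag = [
    ("maltreatment", "cps"),
    ("extroversion", "cps"),
    ("cps", "case_file"),
]

colliders, downstream = variables_to_avoid(dag, "maltreatment", "extroversion")
print("colliders:", colliders)
print("descendants of colliders:", downstream)
```

Any variable returned by either set would be left out of the adjustment set, and the sample would not be restricted on it, when estimating the effect of interest.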

Thought Experiment

  • In our thought experiment, we must include individuals both involved and not involved with CPS.

  • Outside of thought experiments, one might often be unaware of collider variables or collect data in such a way that collider bias is built in.

7.4.4 Variations on Collider Bias: Nonresponse Bias

Nonresponse bias occurs if, for example, a researcher analyzes only completed questionnaires, and the variables of interest are associated with questionnaire completion.

Assume that we are interested in the association between grit and intelligence, and our assessment ends up being very burdensome.

  • Both grit and intelligence make it easier for respondents to push through and complete the assessment.
  • Questionnaire completion is thus a collider between grit and intelligence.
  • Although there might be no association between grit and intelligence in the population, we might find a spurious negative association if we analyze only completed questionnaires.
    • Completers who are low on intelligence tend to have high levels of grit.
    • Completers who are low on grit tend to be high on intelligence.
    • Respondents who are low on both variables are less likely to finish and are therefore underrepresented among completers.
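The grit and intelligence example can be sketched in the same spirit. The completion model below (a logistic function with an intercept of -1 to mimic a burdensome assessment) and the split of each trait at zero are assumptions for illustration only. The sketch reports the completion rate in each combination of low and high trait levels, along with the correlation in the full population and among completers.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 200_000

grit = rng.normal(size=n)
intelligence = rng.normal(size=n)  # independent of grit by construction

# Assumed completion model: either trait makes finishing more likely.
p_complete = 1.0 / (1.0 + np.exp(-(grit + intelligence - 1.0)))
completed = rng.random(n) < p_complete

r_population = np.corrcoef(grit, intelligence)[0, 1]
r_completers = np.corrcoef(grit[completed], intelligence[completed])[0, 1]

# Completion rate per quadrant, splitting each trait at the population mean (zero).
high_grit = grit > 0
high_iq = intelligence > 0
completion_rate = {
    ("low grit", "low intelligence"): completed[~high_grit & ~high_iq].mean(),
    ("low grit", "high intelligence"): completed[~high_grit & high_iq].mean(),
    ("high grit", "low intelligence"): completed[high_grit & ~high_iq].mean(),
    ("high grit", "high intelligence"): completed[high_grit & high_iq].mean(),
}

print(f"population r = {r_population:.3f}, completers r = {r_completers:.3f}")
for group, rate in completion_rate.items():
    print(group, f"completion rate = {rate:.2f}")
```

Because respondents low on both traits rarely finish, the completer sample is thinned out in exactly that corner, which is what produces the negative correlation among completers.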

7.4.5 Controlling for Mediators

Overcontrol bias is another example of statistical control hurting instead of helping:

  • If mediating variables are controlled for, the very processes of interest are controlled away.
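Overcontrol can be illustrated with a generic chain X → M → Y in which X affects Y only through the mediator M. The variable names, the path coefficients (0.6 and 0.5), and the helper ols_slopes are assumptions chosen for this sketch. It compares the slope on X with and without M as a covariate.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 100_000

x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)  # X causes the mediator M (assumed coefficient)
y = 0.5 * m + rng.normal(size=n)  # M causes Y; X has no direct path to Y


def ols_slopes(outcome, *predictors):
    """Return ordinary least squares slopes of outcome on predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(outcome), *predictors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1:]


total_effect = ols_slopes(y, x)[0]      # Y ~ X
after_control = ols_slopes(y, x, m)[0]  # Y ~ X + M, i.e., controlling for the mediator

print(f"Y ~ X:     slope on X = {total_effect:.3f}")
print(f"Y ~ X + M: slope on X = {after_control:.3f}")
```

Under these assumptions the unadjusted slope should recover the total effect of roughly 0.6 × 0.5 = 0.3, whereas the slope after controlling for M should shrink to about zero: the effect has been controlled away.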

Consider our previous example, now slightly modified:
