8.1 Categorical Data in the Social Sciences

Linear regression is a workhorse procedure of modern statistics. Our introduction to regression in this class was framed around the idea of a continuous dependent (outcome) variable. However, categorical data is extremely common in many health, behavioral and social science applications.

8.1.1 Examples of Categorical Data

  • Binary variables have two categories and are often used to indicate that an event has occurred or that a characteristic is present. Are you sick? Did you vote in the last election? Are you married?

  • Ordinal variables have categories that can be ranked. Surveys often ask respondents to indicate their agreement with a statement, how frequently they engage in a behavior, or their educational attainment.

  • Nominal variables occur when there are multiple outcomes that cannot be ordered. Examples include handedness (left or right) and occupation.

  • Censored variables occur when the value of a variable is unknown over some range of the variable. For example, observed hourly wages may be bounded at the lower end by minimum wage laws.

  • Counts indicate the number of times that some event has occurred. How many drinks last week? How many people living in a house? How many years of education?

Censored and count variables are often lumped in with more traditional categorical variables under the umbrella of limited dependent variables.
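
To make the five variable types above concrete, here is a minimal sketch of how each might be coded in a small data set. Everything in it is an assumption made for illustration: the survey records, the field names, the agreement scale, the helper functions, and the wage floor of 7.25 are all hypothetical. It uses plain Python with no external libraries.

```python
# Hypothetical survey records; all names and values are invented for illustration.
respondents = [
    {"voted": "yes", "agreement": "agree", "occupation": "teacher",
     "hourly_wage": 7.25, "drinks_last_week": 0},
    {"voted": "no", "agreement": "strongly disagree", "occupation": "nurse",
     "hourly_wage": 18.50, "drinks_last_week": 3},
    {"voted": "yes", "agreement": "neutral", "occupation": "clerk",
     "hourly_wage": 7.25, "drinks_last_week": 1},
]

# Ordinal scale: the order of this list defines the ranking (assumed scale).
AGREEMENT_LEVELS = [
    "strongly disagree", "disagree", "neutral", "agree", "strongly agree",
]

# Assumed lower bound on observed wages (hypothetical value).
WAGE_FLOOR = 7.25


def encode_binary(answer):
    """Binary: 1 if the event occurred, 0 otherwise."""
    return 1 if answer == "yes" else 0


def encode_ordinal(answer, levels):
    """Ordinal: rank of the answer within the ordered levels, starting at 1."""
    return levels.index(answer) + 1


def encode_nominal(answer, categories):
    """Nominal: one 0/1 indicator per category, with no implied order."""
    return {category: int(answer == category) for category in categories}


def at_floor(value, floor):
    """Censored: flag values that sit at the lower bound."""
    return value <= floor


# Collect the distinct occupations seen in the data (alphabetical for stability).
occupations = sorted({r["occupation"] for r in respondents})

encoded = [
    {
        "voted": encode_binary(r["voted"]),
        "agreement": encode_ordinal(r["agreement"], AGREEMENT_LEVELS),
        "occupation": encode_nominal(r["occupation"], occupations),
        "wage_at_floor": at_floor(r["hourly_wage"], WAGE_FLOOR),
        "drinks_last_week": int(r["drinks_last_week"]),  # count: non-negative integer
    }
    for r in respondents
]

for row in encoded:
    print(row)
```

Note that the integer codes for the ordinal variable preserve the ranking of the categories but say nothing about the distance between them, while the nominal indicators deliberately carry no ordering at all.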