Course notes for HDFS 523
1
About This Book
1.1
Why this book?
1.2
Code Folding
1.3
Acknowledgements
2
Data Cleaning
2.1
Example Data
2.2
Reading in Repeated Measures Data
2.3
Familiarize Yourself with the Data
2.4
Look for Duplicated IDs
2.5
Using table() to Spot Irregularities
2.6
Missing Data
2.6.1
Generating Example Data
2.6.2
Recoding Values with NA
2.6.3
Missing Data Visualization
2.7
Exporting Data
2.8
Reshaping Repeated Measures Data
2.8.1
Reshape Wide to Long
2.8.2
Reshape Long to Wide
3
Describing Longitudinal Data
3.1
Example Data
3.2
Describing Means and Variances
3.2.1
Verbal Ability (All Persons and Occasions)
3.2.2
Verbal Ability (Across Time)
3.3
Describing Covariances
3.4
Individual-Level Descriptives
3.5
References
4
Matrix Algebra
4.1
Types of matrices
4.1.1
Square
4.1.2
Symmetric
4.1.3
Diagonal
4.1.4
Identity
4.2
Operations on Matrices
4.2.1
Matrix Transpose
4.2.2
Matrix Trace
4.2.3
Addition
4.2.4
Subtraction
4.2.5
Matrix Multiplication
4.2.6
Matrix Division
4.3
References
5
Ordinary Least Squares
5.1
Linear Regression Model
5.2
Ordinary Least Squares (OLS)
5.3
Assumptions of OLS
5.3.1
Assumption 1. \(\mathbb{E}(\epsilon_{i}) = 0\)
5.3.2
Assumption 2. Homoscedasticity
5.3.3
3. \(\mathbb{E}(\epsilon_{i}\epsilon_{j}) = 0\)
5.3.4
4. No Perfect Collinearity
5.3.5
5. \(\mathbb{C}(\epsilon_{i},x_{ki}) = 0\)
5.4
Properties of the OLS Estimator
5.4.1
1. Consistency of \(\boldsymbol{\beta}\)
5.4.2
2. Asymptotic Normality
5.4.3
Variance of \(\hat{\beta}\)
5.5
Failure to Meet Assumptions
5.5.1
Failure of Assumption 1.
5.5.2
Failure of Assumption 2 or 3.
5.5.3
Failure of Assumption 5.
5.6
Regression and Matrix Notation
5.6.1
An Intercept-Only Model
5.6.2
Intercept-Only Model in Matrix Form
5.6.3
Simple Regression in Matrix Form
5.6.4
Multiple Regression in Matrix Form
5.7
Solving the Regression Equation
5.7.1
Matrix Multiplication and Transpose
5.7.2
Matrix Inverse
5.8
The Linear Probability Model
5.8.1
Advantages of the LPM
5.8.2
Disadvantages of the LPM
6
Statistical Control
6.1
Statistical Control
6.2
Directed Acyclic Graphs (DAGs)
6.2.1
Introduction to DAGs
6.2.2
Introduction to DAGs: Paths
6.2.3
Introduction to DAGs: Chains
6.2.4
Introduction to DAGs: Descendants and Ancestors
6.2.5
Introduction to DAGs: Forks
6.2.6
Introduction to DAGs: Inverted Forks
6.2.7
Introduction to DAGs: Acyclicity
6.3
Statistical Control Done Right
6.3.1
Building a DAG
6.3.2
Building a DAG: Back-Door Paths
6.4
Statistical Control Gone Wrong
6.4.1
Collider Bias
6.4.2
Conditioning on a Collider
6.4.3
Avoiding Collider Bias
6.4.4
Variations on Collider Bias: Nonresponse Bias
6.4.5
Controlling for Mediators
7
Linear Regression
7.1
Example Data
7.2
Intercept-Only Model
7.2.1
Intercept-Only Equation
7.2.2
Intercept-Only Model in R
7.2.3
Intercept as Mean of Outcome
7.2.4
Intercept-Only Model \(R^2\)
7.3
Simple Linear Regression
7.3.1
Regression Equation and Model Fitting
7.3.2
Path Diagram
7.3.3
Interpreting Model Parameters
7.3.4
Plotting Regression Line
7.4
Mean Centering Predictors
7.4.1
Interpreting Model Parameters
7.4.2
Plotting Regression Line
7.5
Multiple Linear Regression
7.5.1
Regression Equation
7.5.2
Fit Model in R
7.5.3
Path Diagram
7.5.4
A Note on Interpretation
7.6
Categorical Variable Interaction
7.6.1
Interaction as Moderation
7.6.2
Moderation by Categorical Variable
7.6.3
Interpretation
7.6.4
Fit Regression Model in R
7.6.5
Path Diagram
8
Logistic Regression
8.1
Categorical Data in the Social Sciences
8.1.1
Examples of Categorical Data
8.2
Introduction to GLMs
8.2.1
Linear Regression as GLM
8.2.2
Logistic Regression as GLM
8.2.3
Poisson Regression as GLM
8.2.4
Additional Remarks
8.3
Binary Logistic Regression
8.3.1
Overcoming LPM
8.3.2
Model
8.4
Example Data
8.4.1
Variables
8.5
Intercept-Only Model
8.5.1
Intercept-Only Model in R
8.5.2
Interpretation
8.6
Single Predictor Model
8.6.1
Overdispersion
8.6.2
Coefficients
8.7
Marginal Effects
8.7.1
A Definition of Marginal Effects
8.7.2
A Few Observations
8.7.3
Types of Marginal Effects
8.7.4
Example Model
8.7.5
Marginal Effects at Representative Values (MER)
9
Poisson Regression
9.1
Poisson Regression
9.1.1
Review of GLM
9.1.2
Poisson Regression as GLM
9.2
Poisson Distribution
9.3
Notes on Interpretation
9.3.1
One Predictor Model
9.3.2
Similarity to Logistic Regression
9.3.3
Percent Change
9.4
Example Data
9.4.1
Dependent variable
9.4.2
Explanatory Variables
9.5
Single Predictor Model
9.5.1
Read in Data
9.5.2
Single Predictor Model in GLM
9.5.3
Deviance and Goodness of Fit
9.5.4
Overdispersion?
9.5.5
Interpretation of Single Predictor Model
9.6
Multiple Predictor Model
9.7
Revisiting Overdispersion
9.7.1
Quasi-Poisson Family
9.7.2
Marginal Effects Examples
9.7.3
References
10
Marginal Effects
10.1
Reintroducing Marginal Effects
10.2
Data from Ferraro et al. (2016)
10.3
Fitting a Quasi-Poisson Model
10.4
marginaleffects: Slopes
10.5
marginaleffects: Predictions
10.6
References
11
Two-Occasion Change
11.1
Introduction
11.1.1
A Thought Experiment
11.2
Example Data I
11.3
Two Occasions of Change
11.4
Autoregressive Model
11.5
Difference Score Model
11.6
Critiques and Comparisons
11.7
Adding Explanatory Variables
11.8
Lord’s Paradox
11.9
Example Data II
11.9.1
Generate Some Data According to (Castro-Schilo and Grimm 2018)
11.9.2
Plot Data
11.9.3
Fit Residualized Change Model
11.9.4
Fit Difference Score Model
11.10
Example Data III
11.10.1
Generate Some Data According to (Castro-Schilo and Grimm 2018) Example B
11.10.2
Plot Data
11.10.3
Fit Residualized Change Model
11.10.4
Fit Difference Score Model
11.11
Closing Thoughts
11.12
References
12
Introduction to Growth
12.1
Introduction
12.1.1
What is a multilevel model?
12.1.2
Two Faces of MLM
12.1.3
Two-Level Longitudinal Data
12.2
Example Data
12.2.1
Data Preparation and Description
12.2.2
Sample Moments
12.3
A General Model
12.4
Unconditional Means Model
12.4.1
Level 1
12.4.2
Level 2
12.4.3
Single Equation
12.4.4
Model Elaboration
12.4.5
Estimated Quantities
12.4.6
More Notation
12.4.7
Unconditional Means Model in R
12.4.8
Intra-Class Correlation
12.4.9
Model-Implied Moments
12.4.10
Model Residuals
12.5
Repeated Measures ANOVA
12.5.1
Intra-Class Correlation
12.5.2
Model-Implied Mean Vector
12.5.3
Model-Implied Covariance Matrix
12.6
Repeated Measures MANOVA
12.6.1
Model-Implied Mean Vector
12.6.2
Model-Implied Covariance Matrix
12.7
Repeated Measures MANOVA (Unstructured)
12.7.1
Model-Implied Mean Vector
12.7.2
Model-Implied Covariance Matrix
12.7.3
References
13
Growth Curve Modeling
13.1
Introduction
13.2
Data Preparation and Description
13.2.1
Loading libraries used in this script.
13.3
Individual Growth Models
13.3.1
Visualizing Individual Change
13.3.2
Multiple Individuals
13.4
Unconditional Means Model
13.4.1
Predicted Trajectories
13.5
Linear Growth Model
13.5.1
Random Intercept Model
13.5.2
Random Intercept and Slopes Model
13.5.3
Model Comparison
13.5.4
MLM and Individual Models
13.6
Quadratic Growth Model
13.7
Conditional Growth Model
13.7.1
Conditional Growth Equation
13.7.2
Conditional Growth Model 1
13.7.3
Conditional Growth Model 1
13.8
Alternative Time Metrics
13.8.1
Recentering time metrics
13.8.2
Rescaling time metric
13.8.3
Remapping Time
13.8.4
Compare Growth Metrics
13.9
Interpreting Interactions
14
Nonlinear Growth Curves
14.1
Review of Linear Growth
14.1.1
Theory of Linear Growth
14.1.2
Characteristics of Linear Growth
14.1.3
No Growth Model
14.1.4
Random Intercept Model
14.1.5
Linear Growth Model
14.1.6
Quadratic Growth Model
14.1.7
Quadratic Growth Equations
14.2
Introducing Nonlinear Growth
14.2.1
Types of Nonlinearity
14.2.2
Flexibility of Nonlinear Growth
14.2.3
Utility of Nonlinear Growth
14.2.4
Some Nonlinear Growth Models
14.2.5
Need for Nonlinear Models
14.3
Example Data (Ram & Grimm, 2007)
14.3.1
Read in Cortisol Data
14.3.2
Reshaping Data
14.3.3
Descriptives
14.3.4
Density Plots
14.3.5
Individual-level Trajectories
14.4
Linear Growth (Cortisol)
14.4.1
Equation
14.4.2
Fit Model
14.4.3
Predicted Trajectories
14.4.4
Interpretation
14.5
Quadratic Growth (Cortisol)
14.5.1
Equation
14.5.2
Fit Model
14.5.3
Predicted Trajectories
14.5.4
Interpretation
14.5.5
Interpretational Caution
14.5.6
Nonlinear or Linear Model?
14.6
Latent Basis (Cortisol)
14.6.1
Equation
14.6.2
Fit Model
14.6.3
Predicted Trajectories
14.6.4
Interpretation
14.7
Exponential Growth (Cortisol)
14.7.1
Fit Model
14.7.2
Predicted Trajectories
14.7.3
Interpretation
14.7.4
Nonlinear or Linear Model?
14.8
Multiphase Growth (Cortisol)
14.8.1
Equation
14.8.2
Fit Model
14.8.3
Predicted Trajectories
14.8.4
Interpretation
14.9
Bilinear Spline (Cortisol)
14.9.1
Equation
14.9.2
Fit Model
14.9.3
Predicted Trajectories
14.9.4
Interpretation
14.9.5
Fit Model
14.10
References
15
Dyadic Data Analysis
15.1
Introduction
15.1.1
Interpersonal Phenomena
15.1.2
Dyadic Measurement
15.1.3
Discussion Question
15.2
Interdependence
15.2.1
Definition
15.2.2
Ignoring Interdependence
15.2.3
Linkage Types
15.2.4
Sources of Interdependence
15.2.5
Discussion Question
15.3
Basic Definitions
15.3.1
Distinguishability
15.3.2
Variable Types
15.4
Dyadic Designs
15.4.1
Standard Dyadic Design
15.4.2
Social Relations Model
15.4.3
One-with-many Design
15.4.4
Discussion Question
15.5
Actor Partner Interdependence Model (APIM)
15.5.1
Model
15.5.2
Conceptual Interpretations
15.5.3
Actor-Partner Interactions
15.6
Longitudinal APIM
15.6.1
Model
15.6.2
Estimated Parameters
15.6.3
Parameter Covariation
15.7
Data Example
15.7.1
Preliminaries
15.7.2
Modeling Scenario
15.7.3
Descriptives
15.7.4
Dyadic Data Prep
15.7.5
APIM
15.7.6
Null Model
15.7.7
Full APIM
15.8
Data Example 2
15.8.1
Overview
15.8.2
Outline
15.8.3
The Modeling Enterprise
15.8.4
The Data
15.8.5
Plotting the Data
15.8.6
The Multilevel Model
15.8.7
Fit Male Model
15.8.8
Fit Female Model
15.8.9
Fit Full Model
15.8.10
Conclusion
15.9
Reference
16
Cluster Analysis
16.1
Introduction
16.1.1
Supervised Learning
16.1.2
Unsupervised Learning
16.2
Cluster Analysis
16.3
Clustering Algorithms
16.4
Hierarchical Clustering
16.4.1
Distances
16.4.2
Distance Between Clusters
16.4.3
Linkages Applied Example
16.4.4
Dendrograms
16.5
K-Means Clustering
16.5.1
Within-Cluster Variation
16.5.2
K-Means Algorithm
16.5.3
K-means Example
16.5.4
Choosing K
16.6
DBSCAN
16.6.1
Example of DBSCAN
16.7
Issues in Clustering
16.7.1
Recommendations
16.8
Applied Examples
16.8.1
Preliminaries
16.8.2
Preparing Data
16.8.3
Scaling
16.8.4
Plotting
16.8.5
Distances
16.8.6
K-Means
16.8.7
Hierarchical Clustering
16.8.8
Two-step Approach
16.8.9
Final Thoughts
16.9
Reference
17
Vector Autoregressive Models
17.1
Introduction
17.2
The N=1 AR Model
17.2.1
Random Shocks
17.2.2
Autoregressive Parameter
17.2.3
AR(1) Model Example
17.2.4
Fitting an AR(1) Model in R
17.2.5
AR Coefficients Example
17.3
The N=1 VAR Model
17.3.1
Fitting a VAR(1) Model in R
17.3.2
Stationarity and Non-Stationarity
17.3.3
A Note on Path Diagrams
17.4
Introducing the Multi-level VAR
17.4.1
Model
17.5
Applied Example
17.5.1
Loading Data
17.5.2
Temporal Effects
17.5.3
Contemporaneous Effects
17.5.4
Between-Subjects Effects
17.6
Reference
HDFS 523: Strategies for Data Analysis in Developmental Research