Overview

We will be using data from the same study (PANDAS) as from last week’s challenge to practice making longitudinal multilevel models. As a reminder, we are not focusing on running models yet. These data include a variable for ID and gender, and then assess self-worth (SPPA_SWORTH), communication quality with mothers (MPACS), and generalized anxiety (SCARED_GAD). Self-worth, communication and GAD were assessed every 6-months at 6 different timepoints. The csv file is the same as the lesson mlm_datacleaning.csv and can be found here: https://columbia-sipps.github.io/workshops/coding_advanced/

Our question for today will be how did self worth change over time? We’ll use longitudinal multilevel models in rstanarm to approach this

Load in packages and data

library(tidyverse)
library(rstanarm)
library(bayesplot)
library(tidybayes)

d = read_csv('https://columbia-sipps.github.io/workshop_files/coding-adv-04/mlm-data-cleaning.csv')

Setup

Filter data for only IDs up to 100 (just for demo purposes of a smaller dataset)

d <- d %>% 
  filter(ID <= 100)

Select only the variables we will be plotting

Being: id, gender, and all of the self worth scores

d <- d %>% 
  select(ID, contains("SPPA"))

Rename your variables

Rename them so that they are easier to understand

d <- d %>% 
# rename vectors
  rename(
    id = 1,                   
    sworth_1 = 2,   
    sworth_2 = 3,   
    sworth_3 = 4,   
    sworth_4 = 5,   
    sworth_5 = 6,   
    sworth_6 = 7
    )

Transform our data from wide format to long format

Hint: use the pivot longer function. Note, we want there to be one column for time, and another column for self worth scores. There should be one row of data for each individual’s self worth score at each timepoint (e.g., 6 rows per person if no missing data)

d_long <- d %>% 
  tidyr :: pivot_longer(., cols = contains("sworth"), names_to = "time", values_to = "self_worth")

Clean your data in the long format

Drop the rows with NA values (the observations where there is no self worth score). Then recode time into a factor and rename the variables so that they are only numeric values that range from time 0 to time 5 instead of time 1 to time 6.

d_long <- d_long %>%
  dplyr::filter(!is.na(self_worth)) %>%
  dplyr::mutate(time = as.numeric(gsub('sworth_', '', time)))

Plot the data

ggplot(data = d_long, aes(x = time, y = self_worth)) +
  geom_line(aes(group = id), color = "black") + geom_point(aes(group = id), size = .4) +
  facet_wrap("id") + # Group variable gives text labels to panels
  labs(x = "Time",
       y = "Self-worth (self-reported)",
       title = "")

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

1. Model changes in self worth quality over time with random intercepts only

Make a random intercept model with rstanarm

(1|id) gives us random intercepts - intercepts for each person

m1 = rstanarm::stan_glmer(data = d_long, self_worth ~ time + (1|id), cores = 4)

Interpret the ‘fixed’ effects: what is the average change in self worth over time?

# point estimates
fixef(m1)

## (Intercept)        time 
##  3.46465180 -0.08943226

# summary of full model
m1_summary = summary(m1, probs = c(.025, .975))
m1_summary[1:10, ]

##                            mean         mcse         sd         2.5%
## (Intercept)          3.46388003 0.0014967630 0.06467047  3.338117858
## time                -0.08942295 0.0002135406 0.01524420 -0.119868637
## b[(Intercept) id:1] -0.16016513 0.0025823953 0.16985973 -0.492089150
## b[(Intercept) id:2] -0.76832437 0.0028386596 0.20350379 -1.165416272
## b[(Intercept) id:3]  0.34686894 0.0026190080 0.17195948  0.009442167
## b[(Intercept) id:4]  0.43773540 0.0028000295 0.18139829  0.086202225
## b[(Intercept) id:5] -0.32916274 0.0031829013 0.22428210 -0.773475638
## b[(Intercept) id:6]  0.29266237 0.0025694379 0.17440615 -0.043970021
## b[(Intercept) id:7] -0.89665622 0.0028051782 0.20525850 -1.292143626
## b[(Intercept) id:8] -0.76718368 0.0029469372 0.19643870 -1.148060354
##                           97.5% n_eff      Rhat
## (Intercept)          3.58842888  1867 1.0003305
## time                -0.05893875  5096 0.9994887
## b[(Intercept) id:1]  0.17402994  4326 0.9998193
## b[(Intercept) id:2] -0.36703143  5139 0.9995735
## b[(Intercept) id:3]  0.69155946  4311 0.9994173
## b[(Intercept) id:4]  0.79989640  4197 1.0001505
## b[(Intercept) id:5]  0.10494675  4965 1.0000878
## b[(Intercept) id:6]  0.62290630  4607 1.0002792
## b[(Intercept) id:7] -0.49704225  5354 0.9992195
## b[(Intercept) id:8] -0.36809148  4443 0.9998679

Woah! Why so much output here? Well, all of the b[(Intercept) id:1] parameters in the model are the intercepts for each different person (each id), so there are a lot of parameterss in this model!

Plot the model predictions for the group average self worth quality (fixed effect) over time

Note that we have to add the re_formula = NA argument in the add_fitted_draws() function here to tell the model we want just the fixed effects (not predictions for specific participants)

# define a data frame of timepoints to make predictions for for the group average
pred_frame = data.frame(time = 1:6)

# get predictions using tidybayes! 
# again this dataframe looks a little confuisng at first, but the outcome (self worth) is always in the .value column
fixed_preds_m1 = tidybayes::add_fitted_draws(m1, newdata = pred_frame, re_formula = NA)

Make a plot of predicted self worth over time

ggplot(fixed_preds_m1, aes(x = time, y= .value)) +
  tidybayes::stat_lineribbon(alpha = .4) +
  labs(x = 'Time', y = 'Self-reported self worth')

a little nicer looking…

ggplot(fixed_preds_m1, aes(x = time, y= .value)) +
  tidybayes::stat_lineribbon(alpha = .4) +
  labs(x = 'Time', y = 'Self-reported self worth') +
  scale_fill_brewer()

Now, add ‘spaghetti plot’ lines for each participant’s predicted communicaaion quality over time to your plot

We can get predictions for each point in the original dataset by specificying the original dataset d_long as newdata

We also use the median_qi function to summarize and get the median and a posterior interval for each prediction. Now the .value column will be the median model prediction for each datapoint. We won’t use .upper and .lower yet but we could later

participant_preds_m1 = tidybayes::add_fitted_draws(model = m1, newdata = d_long) %>%
  median_qi(.value, .width = .95)

## Warning: `combine()` was deprecated in dplyr 1.0.0.
## Please use `vctrs::vec_c()` instead.

Now, add participant ‘spaghetti’ to our plot. We can see all the lines are parallel – they have different intercepts (random intercepts!) but the same exact slope as the fixed effect. This is because there are no random slopes – slopes cannot vary for different participants under this model.

ggplot(fixed_preds_m1, aes(x = time, y= .value)) +
  geom_line(data = participant_preds_m1, aes(group = id), alpha = .5) +
  tidybayes::stat_lineribbon(alpha = .4) +
  labs(x = 'Time', y = 'Self-reported self worth') +
  scale_fill_brewer()

2. Model changes in self worth quality over time with random intercepts AND random slopes

Make a random slope + intercept model with rstanarm

m2 = rstanarm::stan_glmer(data = d_long, self_worth ~ time + (time|id), cores = 4)

Interpret the ‘fixed’ effects: what is the average change in self worth over time?

# point estimates
fixef(m1)

## (Intercept)        time 
##  3.46465180 -0.08943226

# summary of full model (a good way to shorten output)
g = summary(m2, probs = c(.025, .975))
g[1:5,]

##                            mean         mcse         sd       2.5%       97.5%
## (Intercept)          3.45436219 0.0009549072 0.06268023  3.3325389  3.57788559
## time                -0.08559143 0.0004424263 0.02061516 -0.1252687 -0.04494921
## b[(Intercept) id:1] -0.01269774 0.0030898036 0.23607912 -0.4845167  0.45622626
## b[time id:1]        -0.04951025 0.0009222081 0.06387882 -0.1822634  0.07523683
## b[(Intercept) id:2] -0.56752480 0.0047533576 0.27912209 -1.1455183 -0.05639859
##                     n_eff      Rhat
## (Intercept)          4309 1.0011004
## time                 2171 1.0029477
## b[(Intercept) id:1]  5838 0.9992595
## b[time id:1]         4798 0.9995458
## b[(Intercept) id:2]  3448 0.9996078

Now, we have parameters like b[time id:2] for each participant! These are the random slopes!

Plot the model predictions for the group average self worth quality (fixed effect) over time

Just like with the previous model

fixed_preds_m2 = tidybayes::add_fitted_draws(m2, newdata = pred_frame, re_formula = NA)

ggplot(fixed_preds_m2, aes(x = time, y= .value)) +
  tidybayes::stat_lineribbon(alpha = .4) +
  labs(x = 'Time', y = 'Self-reported self worth') +
  scale_fill_brewer()

Now, add ‘spaghetti plot’ lines for each participant’s predicted self worth quality over time to your plot

participant_preds_m2 = tidybayes::add_fitted_draws(model = m2, newdata = d_long) %>%
  median_qi(.value, .width = .95)

ggplot(fixed_preds_m2, aes(x = time, y= .value)) +
  geom_line(data = participant_preds_m2, aes(group = id), alpha = .5) +
  tidybayes::stat_lineribbon(alpha = .4) +
  labs(x = 'Time', y = 'Self-reported self worth') +
  scale_fill_brewer()

Now, we can see the lines for the model predictions for different participants are not parallel! Random intercepts AND random slopes

Create indiviudal panel plots to check our model fits

Now, we might want to make a panel plot to see how well our model fits each participant, comparing the model predicitons with the raw data

First we plot the raw data

Then we add in our predictions from m2 & m1

panel_plt = ggplot(data = d_long, aes(x = time, y = self_worth)) +
  geom_line(aes(group = id, color = "Raw Daata")) + geom_point(aes(group = id), size = .4) + 
  geom_line(data = participant_preds_m1 , aes(x = time, y = .value, color = 'Random Intercept Model Prediction')) + 
  geom_line(data = participant_preds_m2, aes(x = time, y = .value, color = 'Random Slope Model Prediction')) + 
  facet_wrap("id") + # Group variable gives text labels to panels
  labs(x = "Time",
       y = "Self-reported self worth",
       title = "") 


# save to a file since it's a little hard to see inline
ggsave(panel_plt, file = 'panel_plt.png', height = 8, width = 15)

It’s a little hard to see because the panels are small, but we can look to visually inspect which model seems to fit better!

Multilevel Longitudinal Models

Overview

Load in packages and data

Setup

Filter data for only IDs up to 100 (just for demo purposes of a smaller dataset)

Select only the variables we will be plotting

Rename your variables

Transform our data from wide format to long format

Clean your data in the long format

Plot the data

1. Model changes in self worth quality over time with random intercepts only

Make a random intercept model with rstanarm

Interpret the ‘fixed’ effects: what is the average change in self worth over time?

Plot the model predictions for the group average self worth quality (fixed effect) over time

Make a plot of predicted self worth over time

Now, add ‘spaghetti plot’ lines for each participant’s predicted communicaaion quality over time to your plot

2. Model changes in self worth quality over time with random intercepts AND random slopes

Make a random slope + intercept model with rstanarm

Interpret the ‘fixed’ effects: what is the average change in self worth over time?

Plot the model predictions for the group average self worth quality (fixed effect) over time

Now, add ‘spaghetti plot’ lines for each participant’s predicted self worth quality over time to your plot

Create indiviudal panel plots to check our model fits