An experimental dataset

Here, we’ll look at some pilot data from an experiment! In this experiment, the researchers manipulated the type of music that participants were exposed to – a familiar condition where songs were by artists that participants had previously indicated they knew well, an unfamiliar condition where songs were by artists that participants had reported not knowing, and a control condition where the clips were not songs at all but instead weather and traffic clips. For today’s purposes, we’ll just focus on the familiar and unfamiliar conditions.

In particular, we’ll use multilevel models here because the data are nested! This is a within-participants study where each participant completed multiple listening trials in each condition. Because of this, a multilevel model will help us take into account that each person might respond differently to the manipulation (while still estimating the group-wide response)!

Our question

Here we’ll ask the question: did clips in the familiar condition evoke more positive emotions than clips in the unfamiliar condition?

Participants rated how each clip made them feel immediately after listening, on a Likert-type scale from 1 (extremely negative) to 7 (extremely positive); these ratings are in the musicValence column.

library(tidyverse)
library(rstanarm)
library(tidybayes)

Data cleaning (you don’t have to change anything here)

df = read_csv('https://raw.githubusercontent.com/pab2163/amfm_public/master/pilot_data/pilot_responses.csv') %>%
  dplyr::filter(condition != 'C', is.na(Skipped)) %>%
  dplyr::select(Year, Title, Artist, Genre, condition, musicValence, familiarityRating, participantID) %>%
  dplyr::mutate(condition = dplyr::recode(condition, 'F' = 'Familiar', 'U' = 'Unfamiliar'))


head(df, 3)
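
Since the multilevel model below leans on the nested structure described earlier, here’s a quick optional check that each participant really did contribute multiple trials per condition:

# number of listening trials per participant within each condition
df %>% dplyr::count(participantID, condition)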

1. Model the effect of condition on evoked emotions (musicValence)

Make a multilevel model using rstanarm::stan_glmer() with random slopes and intercepts for each participant

vm = rstanarm::stan_glmer(data = df, musicValence ~ condition + (condition | participantID))
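
Before interpreting the model, it can be worth a quick look at convergence. One optional sketch, using rstanarm’s plot() method (plotfun takes the name of a bayesplot MCMC plotting function):

# trace plots for the two population-level coefficients
plot(vm, plotfun = 'trace', pars = c('(Intercept)', 'conditionUnfamiliar'))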

2. Interpret the model. How would you describe the effect of condition on musicValence?

m_summary = summary(vm, probs = c(.025, .975))
m_summary[1:5,]
##                                                      mean        mcse        sd
## (Intercept)                                    5.72406510 0.011839586 0.4197725
## conditionUnfamiliar                           -0.76073525 0.009579719 0.3617231
## b[(Intercept) participantID:pilot007]         -0.05638515 0.012626759 0.4721663
## b[conditionUnfamiliar participantID:pilot007]  0.62669272 0.011705870 0.4751995
## b[(Intercept) participantID:pilot008]          0.50418256 0.012054689 0.4638101
##                                                     2.5%       97.5% n_eff
## (Intercept)                                    4.8224795  6.57335992  1257
## conditionUnfamiliar                           -1.4766210 -0.02750033  1426
## b[(Intercept) participantID:pilot007]         -1.0230932  0.86263653  1398
## b[conditionUnfamiliar participantID:pilot007] -0.1524269  1.67313737  1648
## b[(Intercept) participantID:pilot008]         -0.3833722  1.50394494  1480
##                                                    Rhat
## (Intercept)                                   1.0005199
## conditionUnfamiliar                           1.0013919
## b[(Intercept) participantID:pilot007]         0.9995995
## b[conditionUnfamiliar participantID:pilot007] 1.0014827
## b[(Intercept) participantID:pilot008]         1.0016159

Looks like on average, musicValence is lower by about 0.76 points (estimate: -0.7607353, 95% PI [-1.476621, -0.0275003]) for the unfamiliar condition as compared to the familiar condition. So, with 95% posterior probability, our model thinks that the unfamiliar condition causes musicValence to be lower relative to the familiar condition.
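
If you’d rather pull these numbers out directly than read them off the summary table, one sketch using rstanarm’s posterior_interval():

# central 95% posterior interval for just the condition effect
rstanarm::posterior_interval(vm, prob = 0.95, pars = 'conditionUnfamiliar')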

3. Use spread_draws() to get the full posterior distribution for the conditionUnfamiliar term, and plot the posterior distribution for this effect

Hint: stat_halfeye() is a good way to go here

condition_posterior = vm %>% spread_draws(conditionUnfamiliar)

ggplot(condition_posterior, aes(x = conditionUnfamiliar)) +
  stat_halfeye(.width = c(0.5, 0.95)) +
  labs(x = 'Estimated Difference in Evoked Emotion\nUnfamiliar - Familiar Condition', y = 'Relative Posterior Density') +
  theme_bw() +
  geom_vline(xintercept = 0, lty = 2)

We can see that the 95% posterior interval comes pretty close to 0 but doesn’t overlap it here. What does it mean if it comes close to 0? What if we plotted the 97% posterior interval instead and used that to make a decision? (One way to compute that is sketched below.)
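
As a sketch of that second question, tidybayes::median_qi() can compute whatever interval width we like from the draws extracted above:

# point estimate + 97% posterior interval for the condition effect
condition_posterior %>% median_qi(conditionUnfamiliar, .width = 0.97)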

4. Plot the model predictions for musicValence for each condition

First plot the predictions for musicValence in each condition for the group average, but then also overlay predictions for each participant

Bonus: connect each participant’s mean (or median) prediction for each condition to make ‘slopes’. Which participant has the steepest slope?

pred_frame = data.frame(condition = c('Unfamiliar', 'Familiar')) 
pred_frame_participants = expand.grid(condition = c('Unfamiliar', 'Familiar'), participantID = unique(df$participantID), stringsAsFactors = FALSE)

# use re_formula = NA so that the random effects aren't considered
vm_predictions = tidybayes::add_fitted_draws(newdata = pred_frame, model = vm, re_formula = NA)


# now consider the random effects
vm_predictions_participant = tidybayes::add_fitted_draws(model = vm, newdata = pred_frame_participants) %>%
  median_qi()
## Warning: `combine()` was deprecated in dplyr 1.0.0.
## Please use `vctrs::vec_c()` instead.
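
A side note: newer versions of tidybayes deprecate add_fitted_draws() in favor of add_epred_draws(), which names the draws column .epred rather than .value. If you see a deprecation warning, a sketch of the equivalent call for the group-level predictions:

# newer tidybayes equivalent (downstream code would use .epred instead of .value)
vm_predictions = tidybayes::add_epred_draws(newdata = pred_frame, object = vm, re_formula = NA)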
ggplot(vm_predictions, aes(x = condition, y = .value)) +
  stat_halfeye() +
  geom_point(data = vm_predictions_participant, 
             aes(x = condition, y = .value, color = participantID), 
             position = position_dodge(.3)) +
  geom_errorbar(data = vm_predictions_participant, 
                aes(x = condition, y = .value, ymin = .lower, ymax = .upper, color = participantID), 
                position = position_dodge(.3),
                width = 0) +
  geom_line(data = vm_predictions_participant, 
            aes(x = condition, y = .value, color = participantID, group = participantID), 
            position = position_dodge(.3)) +
  labs(y = 'Evoked Emotion', x = 'Condition') +
  theme_bw()
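
For the bonus, the plot answers the question visually, but you can also compute each participant’s ‘slope’ directly from the participant-level predictions. A sketch:

# per-participant difference between conditions; the largest absolute
# difference corresponds to the steepest slope in the plot
vm_predictions_participant %>%
  dplyr::ungroup() %>%
  dplyr::select(participantID, condition, .value) %>%
  tidyr::pivot_wider(names_from = condition, values_from = .value) %>%
  dplyr::mutate(slope = Unfamiliar - Familiar) %>%
  dplyr::arrange(slope)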

5. Compare the model predictions to the raw data by making a panel plot

Make 1 panel for each participant with condition on the x-axis and musicValence on the y-axis. Plot both the model predictions and the raw data (or participant-level means and standard errors for musicValence).

Is the model making good predictions for each participant? Are there any participants where the model is predicting particularly well or poorly?

ggplot(data = df, aes(x = condition, y = musicValence)) +
  geom_point() +
  stat_summary(fun.data = mean_se) +
  geom_point(data = vm_predictions_participant, 
             aes(x = condition, y = .value, color = participantID), 
             position = position_dodge(.3)) +
  facet_grid(~participantID) +
  labs(y = 'Evoked Emotion', color = 'Model Predictions') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
## Warning: Removed 1 rows containing non-finite values (stat_summary).
## Warning: Removed 1 rows containing missing values (geom_point).

  • The model predictions are not exactly the same as the raw data for all participants because the model is partially pooling – since the model assumes that all participants’ random slopes and intercepts each come from normal distributions, it pools people to look more like one another.
  • For example, because 5/6 participants show more positive evoked emotion in the familiar condition, the model is pulling the estimates for pilot007 to show an effect in this direction (red points) even though the raw data for this participant actually go in the opposite direction (more positive emotion in the unfamiliar condition). However, the model’s effect for pilot007 isn’t fully pooled – the estimated effect for this participant is weaker than for the other participants, meaning the model is taking into account both the participant-level data and group-level data simultaneously (a sketch comparing raw means to pooled predictions follows below).
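
To see that partial pooling numerically, one sketch is to line up the raw (no-pooling) participant means next to the model’s partially pooled predictions:

# raw per-participant condition means, with no pooling across participants
raw_means = df %>%
  dplyr::group_by(participantID, condition) %>%
  dplyr::summarise(raw_mean = mean(musicValence, na.rm = TRUE), .groups = 'drop')

# join the model's pooled predictions for a side-by-side comparison
raw_means %>%
  dplyr::left_join(vm_predictions_participant %>%
                     dplyr::ungroup() %>%
                     dplyr::select(participantID, condition, .value),
                   by = c('participantID', 'condition'))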

6. Run a posterior predictive check using pp_check()

Does the distribution of the model’s predictions for musicValence seem to match the actual distribution of musicValence?

pp_check(vm)

Not really; the model is predicting a normal distribution, while the distribution of the actual musicValence responses is more ‘lumpy’.

pp_check(vm, plotfun = 'ppc_hist', nreps = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  • Now we can see the model isn’t doing so well at matching the distribution because musicValence responses are ordinal, on a Likert scale from 1-7. In fact, the only responses in the real data were in [2,3,4,5,6,7].
  • Since the model is a linear regression, it assumes the outcome variable is normally distributed. This can’t really be true, since Likert-type data are bounded (they have a minimum and a maximum) and no responses can be given between integers. Next time, we can talk about some models that might make better assumptions given Likert-type data (one possible direction is sketched below)!
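
As a preview of where we might go: one natural option for Likert-type outcomes is an ordinal (cumulative) multilevel model. rstanarm’s stan_polr() fits ordinal regressions but doesn’t allow random effects, so this sketch uses the brms package instead (assuming it is installed; not run here):

library(brms)

# treat musicValence as an ordered factor and fit a cumulative ('ordered logit')
# model with the same random slopes + intercepts structure as before
ordinal_model = brms::brm(musicValence ~ condition + (condition | participantID),
                          data = dplyr::mutate(df, musicValence = ordered(musicValence)),
                          family = cumulative())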