An experimental dataset

Here, we’ll look at some pilot data from an experiment! In this experiment, the researchers manipulated the type of music that participants were exposed to – either a familiar condition where songs were by artists that participants had previously indicated they knew well, an unfamiliar condition where songs were by artists that participants had reported not knowing, and a control condition where the clips were not songs at all but instead weather and traffic clips. For today’s purposes, we’ll just focus on the familiar and unfamiliar conditions.

In particular, we’ll use multilevel models here because the data are nested! This is a within-participants study where each participant completed multiple listening trials in each condition. Because of this, a multilevel model will help us take into account that each person might respond differently to the manipulation (while still estimating the group-wide response)!

Our question

Here we’ll ask the question: did clips in the familiar condition evoke more positive emotions than clips in the unfamiliar condition?

Participants rated how each clip made them feel immediately after listening on a likert-type scale from 1 (extremely negative) to 7 (extremely positive), in the musicValence column

library(tidyverse)
library(rstanarm)
library(tidybayes)

Data cleaning (you don’t have to change anything here)

df = read_csv('https://raw.githubusercontent.com/pab2163/amfm_public/master/pilot_data/pilot_responses.csv') %>%
  dplyr::filter(condition != 'C', is.na(Skipped)) %>%
  dplyr::select(Year, Title, Artist, Genre, condition, musicValence, familiarityRating, participantID) %>%
  dplyr::mutate(condition = dplyr::recode(condition, 'F' = 'Familiar', 'U' = 'Unfamiliar'))


head(df, 3)

1. Model the effect of condition on evoked emotions (musicValence)

Make a multilevel model using rstanarm::stan_glm() with random slopes and intercepts for each participant

2. Interpret the model. How would you describe the effect of condition on musicValence?

3. Use spread_draws() to get the full posterior distribution for the conditionUnfamilar term, and plot the posterior distribution for this effect

Hint: stat_halfeye() is a good way to go here

4. Plot the model predictions for musicValence for each condition

First plot the predictions for musicValence in each condition for the group average, but then also overlay predictions for each participant

Bonus: connect each participant’s mean (or median) prediction for each condition to make ‘slopes’. Which participant has the steepest slope?

5. Compare the model predictions to the raw data by making a panel plot

Make 1 panel for each participant with condition on the x-axis and musicValence on the y-axis. Plot both the model predictions and the raw data (or participant-level means and standard errors for musicValence).

Is the model making good predictions for each participant? Are there any participants where the model is predicting particularly well or poorly?

6. Run a posterior predictive check using pp_check()

Does the distribution of the model’s predictions for musicValence seem to match the actual distribution of musicValence?