Here, we’ll look at some pilot data from an experiment! In this experiment, the researchers manipulated the type of music that participants were exposed to: either a familiar condition where songs were by artists that participants had previously indicated they knew well, an unfamiliar condition where songs were by artists that participants had reported not knowing, or a control condition where the clips were not songs at all but instead weather and traffic clips. For today’s purposes, we’ll just focus on the familiar and unfamiliar conditions.
In particular, we’ll use multilevel models here because the data are nested! This is a within-participants study where each participant completed multiple listening trials in each condition. Because of this, a multilevel model will help us take into account that each person might respond differently to the manipulation (while still estimating the group-wide response)!
Here we’ll ask the question: did clips in the familiar condition evoke more positive emotions than clips in the unfamiliar condition?
Participants rated how each clip made them feel immediately after listening on a Likert-type scale from 1 (extremely negative) to 7 (extremely positive), recorded in the musicValence column.
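As a preview, the idea that each person might respond differently to the manipulation (while still estimating the group-wide response) maps directly onto the lme4-style formula syntax we’ll give rstanarm below: a group-level effect of condition plus participant-specific intercepts and slopes.
# group-level effect of condition, plus a random intercept and condition slope for each participant
# (this is exactly the model fit further down)
musicValence ~ condition + (condition | participantID)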
library(tidyverse)
library(rstanarm)
library(tidybayes)
# load the pilot data, keeping the Familiar/Unfamiliar conditions and trials that weren't skipped
df = read_csv('https://raw.githubusercontent.com/pab2163/amfm_public/master/pilot_data/pilot_responses.csv') %>%
dplyr::filter(condition != 'C', is.na(Skipped)) %>%
dplyr::select(Year, Title, Artist, Genre, condition, musicValence, familiarityRating, participantID) %>%
dplyr::mutate(condition = dplyr::recode(condition, 'F' = 'Familiar', 'U' = 'Unfamiliar'))
head(df, 3)
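Before modeling, a quick optional check of the nesting (just a sketch using the df created above): how many non-skipped clips did each participant rate in each condition?
# optional sanity check: trials per participant in each condition
df %>%
  dplyr::count(participantID, condition)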
Make a multilevel model using rstanarm::stan_glmer() predicting musicValence, with random slopes and intercepts for each participant.
# multilevel model: group-level effect of condition, plus random intercepts and condition slopes by participant
vm = rstanarm::stan_glmer(data = df, musicValence ~ condition + (condition | participantID))
What is the estimated effect of condition on musicValence?
m_summary = summary(vm, probs = c(.025, .975))
m_summary[1:5,]
## mean mcse sd
## (Intercept) 5.72406510 0.011839586 0.4197725
## conditionUnfamiliar -0.76073525 0.009579719 0.3617231
## b[(Intercept) participantID:pilot007] -0.05638515 0.012626759 0.4721663
## b[conditionUnfamiliar participantID:pilot007] 0.62669272 0.011705870 0.4751995
## b[(Intercept) participantID:pilot008] 0.50418256 0.012054689 0.4638101
## 2.5% 97.5% n_eff
## (Intercept) 4.8224795 6.57335992 1257
## conditionUnfamiliar -1.4766210 -0.02750033 1426
## b[(Intercept) participantID:pilot007] -1.0230932 0.86263653 1398
## b[conditionUnfamiliar participantID:pilot007] -0.1524269 1.67313737 1648
## b[(Intercept) participantID:pilot008] -0.3833722 1.50394494 1480
## Rhat
## (Intercept) 1.0005199
## conditionUnfamiliar 1.0013919
## b[(Intercept) participantID:pilot007] 0.9995995
## b[conditionUnfamiliar participantID:pilot007] 1.0014827
## b[(Intercept) participantID:pilot008] 1.0016159
Looks like on average, musicValence is lower by about 0.7607353 points (95% PI [-1.476621, -0.0275003]) for the unfamiliar condition as compared to the familiar condition. So, based on the 95% posterior interval, our model thinks that the unfamiliar condition causes musicValence to be lower relative to the familiar condition.
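If you’d rather pull these numbers out programmatically than read them off the summary table, here’s a quick sketch using rstanarm’s fixef() and posterior_interval() on the vm model above:
# point estimate and 95% posterior interval for the condition effect
fixef(vm)['conditionUnfamiliar']
rstanarm::posterior_interval(vm, prob = 0.95, pars = 'conditionUnfamiliar')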
Use spread_draws() to get the full posterior distribution for the conditionUnfamiliar term, and plot the posterior distribution for this effect. Hint: stat_halfeye() is a good way to go here.
condition_posterior = vm %>% spread_draws(conditionUnfamiliar)
ggplot(condition_posterior, aes(x = conditionUnfamiliar)) +
stat_halfeye(.width = c(0.5, 0.95)) +
labs(x = 'Estimated Difference in Evoked Emotion\nUnfamiliar - Familiar Condition', y = 'Relative Posterior Density') +
theme_bw() +
geom_vline(xintercept = 0, lty = 2)
We can see that the 95% posterior interval comes pretty close to 0 but doesn’t include it here. What does it mean if it comes close to 0? What if we plotted the 97% posterior interval instead and used that to make a decision?
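One way to check (a quick sketch using the condition_posterior draws from above):
# 97% posterior interval for the condition effect
condition_posterior %>%
  tidybayes::median_qi(conditionUnfamiliar, .width = 0.97)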
Plot the model’s predictions for musicValence for each condition. First plot the predictions for musicValence in each condition for the group average, but then also overlay predictions for each participant.
Bonus: connect each participant’s mean (or median) prediction for each condition to make ‘slopes’. Which participant has the steepest slope? (One way to check this numerically is sketched after the plotting code below.)
pred_frame = data.frame(condition = c('Unfamiliar', 'Familiar'))
pred_frame_participants = expand.grid(condition = c('Unfamiliar', 'Familiar'), participantID = unique(df$participantID), stringsAsFactors = FALSE)
# use re_formula = NA so that the random effects aren't considered
vm_predictions = tidybayes::add_fitted_draws(newdata = pred_frame, model = vm, re_formula = NA)
# now consider the random effects
vm_predictions_participant = tidybayes::add_fitted_draws(model = vm, newdata = pred_frame_participants) %>%
median_qi()
## Warning: `combine()` was deprecated in dplyr 1.0.0.
## Please use `vctrs::vec_c()` instead.
ggplot(vm_predictions, aes(x = condition, y = .value)) +
stat_halfeye() +
geom_point(data = vm_predictions_participant,
aes(x = condition, y = .value, color = participantID),
position = position_dodge(.3)) +
geom_errorbar(data = vm_predictions_participant,
aes(x = condition, y = .value, ymin = .lower, ymax = .upper, color = participantID),
position = position_dodge(.3),
width = 0) +
geom_line(data = vm_predictions_participant,
aes(x = condition, y = .value, color = participantID, group = participantID),
position = position_dodge(.3)) +
labs(y = 'Evoked Emotion', x = 'Condition') +
theme_bw()
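For the bonus question, one way to put numbers on those ‘slopes’ (a sketch using the vm_predictions_participant summary built above) is to compute each participant’s predicted difference between conditions:
# widen to one row per participant, then compute the predicted condition difference
vm_predictions_participant %>%
  dplyr::ungroup() %>%
  dplyr::select(participantID, condition, .value) %>%
  tidyr::pivot_wider(names_from = condition, values_from = .value) %>%
  dplyr::mutate(slope = Unfamiliar - Familiar) %>%
  dplyr::arrange(slope) # most negative = steepest predicted drop from familiar to unfamiliar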
Make 1 panel for each participant with condition on the x-axis and musicValence on the y-axis. Plot both the model predictions and the raw data (or participant-level means and standard errors for musicValence).
Is the model making good predictions for each participant? Are there any participants where the model is predicting particularly well or poorly?
ggplot(data = df, aes(x = condition, y = musicValence)) +
geom_point() +
stat_summary(fun.data = mean_se) +
geom_point(data = vm_predictions_participant,
aes(x = condition, y = .value, color = participantID),
position = position_dodge(.3)) +
facet_grid(~participantID) +
labs(color = 'Model Predictions') +
theme_bw() +
labs(y = 'Evoked Emotion') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## Warning: Removed 1 rows containing non-finite values (stat_summary).
## Warning: Removed 1 rows containing missing values (geom_point).
- The model predictions are not exactly the same as the raw data for all participants because the model is partially pooling – since the model treats all participants’ random slopes and intercepts as coming from common normal distributions, it pulls people’s estimates to look more like one another.
- For example, because 5/6 participants show more positive evoked emotion in the familiar condition, the model is pulling the estimates for pilot007 to show an effect in this direction (red points), even though the raw data for this participant actually go in the opposite direction (more positive emotion in the unfamiliar condition). However, the model’s effect for pilot007 isn’t fully pooled – the estimated effect for this participant is weaker than for the other participants, meaning the model is taking into account both the participant-level and group-level data simultaneously. (The sketch below shows one way to see this pooling directly.)
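To see the pooling directly, one option (just a sketch, using df and vm from above) is to line up each participant’s raw condition difference against the model’s participant-specific slope from coef():
# raw per-participant difference in mean musicValence (Unfamiliar - Familiar)
raw_diffs = df %>%
  dplyr::group_by(participantID, condition) %>%
  dplyr::summarise(mean_valence = mean(musicValence, na.rm = TRUE), .groups = 'drop') %>%
  tidyr::pivot_wider(names_from = condition, values_from = mean_valence) %>%
  dplyr::mutate(raw_diff = Unfamiliar - Familiar)
# model-based participant-specific slopes (fixed effect + each participant's random slope)
model_slopes = coef(vm)$participantID %>%
  tibble::rownames_to_column('participantID') %>%
  dplyr::select(participantID, model_diff = conditionUnfamiliar)
# the partially pooled model_diff values should sit between each raw_diff and the group-level effect
dplyr::left_join(raw_diffs, model_slopes, by = 'participantID')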
Use pp_check() to look at the model’s posterior predictions. Does the distribution of the model’s predictions for musicValence seem to match the actual distribution of musicValence?
pp_check(vm)
Not really – the model is predicting a normal distribution, while the distribution of the actual musicValence responses is more ‘lumpy’.
pp_check(vm, plotfun = 'ppc_hist', nreps = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
- Now we can see the model isn’t doing so well at matching the distribution, because musicValence responses are ordinal on a Likert scale from 1-7. In fact, the only responses in the real data were in [2, 3, 4, 5, 6, 7].
- Since the model is a linear regression, it assumes the outcome variable is normally distributed. This can’t really be true, since Likert-type data are bounded (they have a minimum and maximum) and no responses can be given between integers. Next time, we can talk about some models that might make better assumptions for Likert-type data!
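As one example of the kind of model meant here (not covered in this tutorial – just a sketch that assumes the brms package is installed), a multilevel cumulative-logit (‘ordinal’) model treats the 1-7 ratings as ordered categories instead of as a normally distributed outcome:
# sketch only: multilevel ordinal (cumulative logit) model for the Likert ratings
library(brms)
df_ord = df %>%
  dplyr::mutate(musicValence_ord = ordered(musicValence))
ordinal_model = brms::brm(musicValence_ord ~ condition + (condition | participantID),
                          data = df_ord,
                          family = cumulative(link = 'logit'))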