Bayesian Regression Challenge 2 Answers

Overview

The questions and exercises below all include examples using a dataset put together in the Tottenham Lab of many Billboard Hot 100 songs from 1946-1983. Today in this challenge you’ll want to work to answer several questions:

Did pop and rock songs on the Billboard charts tend to get louder on averaage from 1946-1983?
From the 1946-1983 Billboard charts, which were louder on average – pop or rock songs?
Did changes in average loudness from 1946-1983 differ for pop and rock songs?

To answer these questions, please use a combination of Bayesian regression models with rstanarm::stan_glm(), and plotting with ggplot()
If anything is confusing or you get stuck, please use the previous lesson plans and challenge answer keys as a reference, or reach out to an instructor!
Remember, we do not expect you to memorize syntax, nor is this a crucial part of coding or stats. Use google, stack overflow, documentation, previous lessons and challenges, colleagues, and instructors as a reference – this is what ‘real-life’ coding is like!

0. Setup

You don’t have to write code in this section, but it will be helpful to read through to understand the data a little better.

Load the needed packages (install if necessary and ASK AN INSTRUCTOR if you’re getting stuck here

library(tidyverse)
library(car)
library(rstanarm)
library(tidybayes)
library(bayesplot)

Now, we can load in the data from Github using read_csv()

music = read_csv('../../../DANLAB/Projects/AMFM/music_databases/familiar_music_database.csv')

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   title = col_character(),
##   artist = col_character(),
##   id = col_character(),
##   uri = col_character(),
##   genre = col_character(),
##   cover = col_logical(),
##   start_time = col_time(format = "")
## )

## See spec(...) for full column specifications.

names(music)

##  [1] "year"             "title"            "artist"           "id"              
##  [5] "uri"              "rank"             "genre"            "cover"           
##  [9] "danceability"     "energy"           "key"              "loudness"        
## [13] "speechiness"      "acousticness"     "instrumentalness" "liveness"        
## [17] "valence"          "tempo"            "mode"             "time_signature"  
## [21] "duration_ms"      "start_time"

We have a lot of columns here we could look at, but for today’s challenge just focus on genre, loudnesss, and year. We’ll also only look at the Pop and Rock genres for today

# select just the relevant columns
music = dplyr::select(music, title, artist, year, genre, loudness) %>%
  # filter the data to only include Pop/Rock genre
  dplyr::filter(genre %in% c('Pop', 'Rock')) %>%
  # scale/center loudness
  dplyr::mutate(loudness = as.vector(scale(loudness, center = TRUE, scale = TRUE)))

head(music)

## # A tibble: 6 x 5
##   title                                         artist       year genre loudness
##   <chr>                                         <chr>       <dbl> <chr>    <dbl>
## 1 I Can't Begin To Tell You - Single Version    Bing Crosby  1946 Pop     -2.96 
## 2 Let It Snow! Let It Snow! Let It Snow!        Vaughn Mon…  1946 Pop      1.24 
## 3 Doctor, Lawyer, Indian Chief                  Betty Hutt…  1946 Pop     -0.461
## 4 Personality                                   Johnny Mer…  1946 Pop     -1.68 
## 5 Oh! What It Seemed to Be                      Frankie Ca…  1946 Pop     -1.13 
## 6 Prisoner of Love (with Russ Case & His Orche… Perry Como   1946 Pop     -0.495

1. Did pop and rock songs on the Billboard charts tend to get louder on averaage from 1946-1983?

Build a Bayesian regression model using rstanarm::stan_glm() with 1 continuous predictor to understand relationships betweenyear and loudness in the music dataset!
Interpret your model parameters: what can you conclude about average changes in loudness between 1946-1983? how strong is your evidence?
Graph your model predictions using tidybayes and ggplot()
- Bonus: graph model predictions and the raw data on the same plot!

Fit the model

year_loud_model = rstanarm::stan_glm(data = music, loudness ~ year)

# use probs = c(.025, .975) to get the middle 95% of the posterior
# use digits=3 to see more digits after the decimal for each term
year_loud_summary = summary(year_loud_model, probs = c(.025, .975), digits = 3)
year_loud_summary

## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      loudness ~ year
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 2135
##  predictors:   2
## 
## Estimates:
##               mean    sd      2.5%    97.5%
## (Intercept) -43.769   4.219 -52.002 -35.474
## year          0.022   0.002   0.018   0.026
## sigma         0.976   0.015   0.947   1.006
## 
## Fit Diagnostics:
##            mean   sd     2.5%   97.5%
## mean_PPD  0.000  0.030 -0.059  0.058 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##               mcse  Rhat  n_eff
## (Intercept)   0.064 0.999 4356 
## year          0.000 0.999 4356 
## sigma         0.000 1.000 3623 
## mean_PPD      0.000 1.000 3830 
## log-posterior 0.029 1.003 1912 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

On average, the music was get louder in later years. On average, each year was associated with an average loudness increase of 0.0222496, 95% PI [0.018034, 0.0264386] (in standardized units)

# Make a dataframe of years, then get model predictions for average loudness each year in the dataset
year_df = data.frame(year = unique(music$year))

# get model predictions for each year
year_loudness_predictions = tidybayes::add_fitted_draws(model = year_loud_model, newdata = year_df)


head(year_loudness_predictions)

## # A tibble: 6 x 6
## # Groups:   year, .row [1]
##    year  .row .chain .iteration .draw .value
##   <dbl> <int>  <int>      <int> <int>  <dbl>
## 1  1946     1     NA         NA     1 -0.445
## 2  1946     1     NA         NA     2 -0.469
## 3  1946     1     NA         NA     3 -0.492
## 4  1946     1     NA         NA     4 -0.438
## 5  1946     1     NA         NA     5 -0.412
## 6  1946     1     NA         NA     6 -0.519

What’s going on in this dataframe of predictions?

Tidybayes has pulled out 4000 estimates of predicted loudness for each year – these estimates are in the .value column. In general with this add_fitted_draws() function, the .value column is the outcome variable in your regression model
We don’t have to pay much attention to the .row, .chain, and .iteration columns. The .draw column is helpful though, because it helps us see that there are exactly 4000 ‘draws’ (estimate ‘drawn’ from the posterior distribution) for each value of year. This will be true in general - tidybayes will generate 4000 draws for every combination of predictors in the dataframe given as the newdata argument

**Now, let’s plot the predictions, using tidybayes::stat_lineribbon(), which gives us a nice ‘ribbon band’ that summarizes the 4000 predictions at different value for year and makes a smooth ‘ribbon’ for different levels of posterior uncertainty. Here, the darker inner band shows the 80% posterior interval and the outer lighter band shows the 95% posterior interval

ggplot(data = year_loudness_predictions, aes(x = year, y = .value)) +
  tidybayes::stat_lineribbon(.width = c(.8, .95)) +
  scale_fill_brewer() +
  labs(fill = 'Posterior Interval Level', x = 'Year', y = 'Song Loudness (standardized)') +
  theme_bw()

Now, with points for raw data too. This helps us see how large any changes in average volume are on the scale of the distribution of all songs

ggplot(data = year_loudness_predictions, aes(x = year, y = .value)) +
  geom_point(data = music, aes(x = year, y = loudness), alpha = .4) + 
  tidybayes::stat_lineribbon(.width = c(.8, .95)) +
  scale_fill_brewer() +
  labs(fill = 'Posterior Interval Level', x = 'Year', y = 'Song Loudness (standardized)') +
  theme_bw()

2. From the 1946-1983 Billboard charts, which were louder on average – pop or rock songs?

Build a Bayesian regression model using rstanarm::stan_glm() with 1 categorical predictor to understand relationships between genre and loudness in the music dataset!
Interpret your model parameters: what can you conclude about average difference in loudness between rock versus pop songs? how strong is your evidence?
Graph your model predictions using tidybayes and ggplot()
- Bonus: graph model predictions and the raw data on the same plot!

genre_loud_model = rstanarm::stan_glm(data = music, loudness ~ genre)

# use probs = c(.025, .975) to get the middle 95% of the posterior
# use digits=3 to see more digits after the decimal for each term
genre_loud_summary = summary(genre_loud_model, probs = c(.025, .975), digits = 3)
genre_loud_summary

## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      loudness ~ genre
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 2135
##  predictors:   2
## 
## Estimates:
##               mean   sd     2.5%   97.5%
## (Intercept) -0.116  0.028 -0.168 -0.062 
## genreRock    0.286  0.044  0.201  0.370 
## sigma        0.991  0.015  0.962  1.020 
## 
## Fit Diagnostics:
##            mean   sd     2.5%   97.5%
## mean_PPD  0.001  0.030 -0.057  0.062 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##               mcse  Rhat  n_eff
## (Intercept)   0.000 1.000 3919 
## genreRock     0.001 1.000 3664 
## sigma         0.000 1.000 4156 
## mean_PPD      0.000 1.000 3990 
## log-posterior 0.028 1.001 1950 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

On average, the rock songs were louder than pop. This is indicated by the genreRock term which describes differencess in loudness for rock songs relative to pop songs. On average, each rock was associated with an average loudness increase of 0.2863652, 95% PI [0.2009516, 0.3702407] (in standardized units) compared to pop music

# Make a dataframe of years, then get model predictions for average loudness each genre in the dataset
# We use stringsAsFactors=FALSE because the genre variable was treated as a string, not as a factor, in the raw data fed into the model
genre_df = data.frame(genre = c('Rock', 'Pop'), stringsAsFactors = FALSE)

# get model predictions for each genre
genre_loudness_predictions = tidybayes::add_fitted_draws(model = genre_loud_model, newdata = genre_df)


head(genre_loudness_predictions)

## # A tibble: 6 x 6
## # Groups:   genre, .row [1]
##   genre  .row .chain .iteration .draw .value
##   <chr> <int>  <int>      <int> <int>  <dbl>
## 1 Rock      1     NA         NA     1 0.171 
## 2 Rock      1     NA         NA     2 0.181 
## 3 Rock      1     NA         NA     3 0.195 
## 4 Rock      1     NA         NA     4 0.0819
## 5 Rock      1     NA         NA     5 0.114 
## 6 Rock      1     NA         NA     6 0.220

Like with the first model, we get 4000 posterior predictions for each level of the predictors in newdata. Here, the dataframe is 8000 rows – 4000 predictions in the .value column for the averaage loudness of Rock and Pop songs, respectively.

One way of plotting this – we use tidybayes::stat_halfeye() to get plots of the full posterior distributions plus points (posterior mean) and eror bars (we use the .width parameter to set here that we want an 80% (thicker bar) and 95% (thinner bar) posterior interval)

ggplot(data = genre_loudness_predictions, aes(x = .value, fill = genre)) +
  tidybayes::stat_halfeye(.width = c(.8, .95)) +
  labs(y = 'Relative Posterior Density', 
       x = 'Estimated Loudness (standardized)', 
       fill = 'Song Genre') +
  theme_bw()

Or we could flip the axes the other way

ggplot(data = genre_loudness_predictions, aes(x = genre, y = .value, fill = genre)) +
  tidybayes::stat_halfeye(.width = c(.8, .95)) +
  labs(y = 'Estimated Loudness (standardized)', 
       x = 'Song Genre', 
       fill = 'Song Genre') +
  theme_bw() +
  theme(legend.position = 'none')

We can also add the raw data in as jittered points, or a boxplot.

ggplot(data = genre_loudness_predictions, aes(x = genre, y = .value, fill = genre)) +
  geom_jitter(data = music, aes(x = genre, y = loudness), width = .4, alpha = .2) +
  tidybayes::stat_halfeye(.width = c(.8, .95)) +
  labs(y = 'Estimated Loudness (standardized)', 
       x = 'Song Genre', 
       fill = 'Song Genre') +
  theme_bw() +
  theme(legend.position = 'none')

Hmm, looks kinda weird - why?

The average differences between pop/rock song volume are pretty small, and we’re also pretty sure about them, so the posterior distributions now look very narrow and spiky once we add in all the raw datapoints (check out how the y axis has changed from the previous plot)
The 80% and 95% posterior intervals for the estimates of average loudness for pop/rock songs are small enough that we can’t even really see them on this plot – they blend in with the point estimates.

With a boxplot instead

ggplot(data = genre_loudness_predictions, aes(x = genre, y = .value, fill = genre)) +
  geom_boxplot(data = music, aes(x = genre, y = loudness), width = .3, alpha = .2, position = position_nudge(-.3)) +
  tidybayes::stat_halfeye(.width = c(.8, .95)) +
  labs(y = 'Estimated Loudness (standardized)', 
       x = 'Song Genre', 
       fill = 'Song Genre') +
  theme_bw() +
  theme(legend.position = 'none')

3. Did changes in average loudness from 1946-1983 differ for pop and rock songs?

Build a Bayesian regression model using rstanarm::stan_glm() with 1 categorical predictor, 1 continuous predictor, and an interaction term between them, to understand loudness as a function of genre, year, and their interaction.
Interpret your model parameters: how do changes in loudness over time from 1946-1983 differ for pop versus rock songs? how strong is your evidence?
- And – how do the results you are seeing here add more nuance to your interpretations from parts 1 & 2.
Graph your model predictions using tidybayes and ggplot()
- Bonus: graph model predictions and the raw data on the same plot!

# here we center year first to make the interaction term more interpretable
music = dplyr::mutate(music, year_center = as.vector(scale(year, center = TRUE, scale = FALSE)))

# the mean year in the dataset, so year_scale=0 represents this year
mean(music$year)

## [1] 1967.181

interaction_model = rstanarm::stan_glm(data = music, loudness ~ year_center*genre)

# use probs = c(.025, .975) to get the middle 95% of the posterior
# use digits=3 to see more digits after the decimal for each term
interaction_summary = summary(interaction_model, probs = c(.025, .975), digits = 3)
interaction_summary

## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      loudness ~ year_center * genre
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 2135
##  predictors:   4
## 
## Estimates:
##                         mean   sd     2.5%   97.5%
## (Intercept)           -0.048  0.027 -0.102  0.006 
## year_center            0.028  0.003  0.023  0.033 
## genreRock              0.225  0.045  0.136  0.312 
## year_center:genreRock -0.030  0.005 -0.039 -0.021 
## sigma                  0.964  0.015  0.936  0.994 
## 
## Fit Diagnostics:
##            mean   sd     2.5%   97.5%
## mean_PPD  0.000  0.030 -0.059  0.056 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                       mcse  Rhat  n_eff
## (Intercept)           0.000 1.001 4641 
## year_center           0.000 1.000 2961 
## genreRock             0.001 1.001 4050 
## year_center:genreRock 0.000 0.999 3196 
## sigma                 0.000 1.000 4554 
## mean_PPD              0.000 1.000 4227 
## log-posterior         0.036 1.001 1759 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

From this model, the year_center term now describes the estimated difference in loudness associated with 1 year specifically for pop songs. The genreRock term describes the estimated difference in rock > pop music loudness at year = 0 (1967, since we centered year), and the year_center:genreRock term describes how different the associations between year and loudness are for rock songs, compared to pop songs. In other words, this interaction term describes how the slope of year (where x is year, y = loudness) differs for rock > pop music. For each of these terms, the 95% posterior interval excludes 0, so we might interpret these as consistent associations in the dataset.

Overall, the estimates indicate: * that pop songs tended to be louder in later years as indicated by the year_center parameter * At year_center=0, rock songs tended to be louder than pop songs * rock songs had a more negative association between year and loudness compared to pop songs

However, this is a little hard to parse from the model output alone, so let’s plot!

First, make a dataframe of years AND genre using expand.grid() – now our dataframe is ALL combinations of genre and year present in the dataset

interaction_df = expand.grid(genre = c('Rock', 'Pop'), 
                       year_center = unique(music$year_center),
                       stringsAsFactors = FALSE) 

# for plotting purposes we'll want the actual year back in
interaction_df$year = interaction_df$year_center + mean(music$year)

head(interaction_df)

##   genre year_center year
## 1  Rock    -21.1808 1946
## 2   Pop    -21.1808 1946
## 3  Rock    -20.1808 1947
## 4   Pop    -20.1808 1947
## 5  Rock    -19.1808 1948
## 6   Pop    -19.1808 1948

# get model predictions for each combindation of genre and year
interaction_predictions = tidybayes::add_fitted_draws(model = interaction_model, newdata = interaction_df)
head(interaction_predictions)

## # A tibble: 6 x 8
## # Groups:   genre, year_center, year, .row [1]
##   genre year_center  year  .row .chain .iteration .draw .value
##   <chr>       <dbl> <dbl> <int>  <int>      <int> <int>  <dbl>
## 1 Rock        -21.2  1946     1     NA         NA     1  0.210
## 2 Rock        -21.2  1946     1     NA         NA     2  0.135
## 3 Rock        -21.2  1946     1     NA         NA     3  0.249
## 4 Rock        -21.2  1946     1     NA         NA     4  0.144
## 5 Rock        -21.2  1946     1     NA         NA     5  0.114
## 6 Rock        -21.2  1946     1     NA         NA     6  0.165

Great, now we have 4000 posterior predictions for loudness for every combination of genre and year_center, with the predictions stored in the .value column. We can make a plot now very similar to the first one we made with tidybayes::stat_lineribbon() only this time we’ll set fill=genre

ggplot(data = interaction_predictions, aes(x = year, y = .value, fill = genre)) +
  tidybayes::stat_lineribbon(.width = c(.95), alpha = .8) +
  labs(fill = 'Song Genre', x = 'Year', y = 'Song Loudness (standardized)') +
  theme_bw()

This gives us a nice clear visualization showing that while pop songs tended to be louder in later years, this wasn’t true for rock songs. However, rock songs in the earlier years were louder than pop, with pop songs eventually catching up in loudness around 1975. Now let’s add the raw data in too

ggplot(data = interaction_predictions, aes(x = year, y = .value, fill = genre)) +
  geom_point(data = music, aes(x = year, y = loudness, color = genre)) +
  tidybayes::stat_lineribbon(.width = c(.95), alpha = .8) +
  labs(fill = 'Song Genre', x = 'Year', y = 'Song Loudness (standardized)') +
  theme_bw() +
  guides(color = FALSE)

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Apart from contextualizing these trends on the scaale of the data, this graph also reveals something else important when we see the raw data – there really weren’t many rock songs in the dataset at all before the the late 1950s! So, how might this impact our conclusions if we hoped to say that pop music increased in loudness over time while rock did not? We might have to scale back our inference and dig a little deeper into what kinds of songs were released over time, and what processes lead to different songs being labeled as ‘Rock’ versus ‘Pop’ in the first place (i.e. record label and radio promotion strategies, geographical locationsand identities of the musical artists, who was doing the genre ‘labeling’).

4. Wrap-up

Congrats! You’ve finished the challenge!

Before you move on, we recommend that you take 5 minutes to go back and comment your code – as a gift to yourself when you come back later to run similar analyses! Commenting and documenting right away is almost always going to ultimately save you time and energy.
For any concepts or pieces of code that don’t make sense yet, write down your questions! Be sure to either ask a friend or an instructor right away, or ask in slack so you can learn.