June 7, 2018

discogger (day one)

I’ve been working on a new R package. discogger is an open-source effort, at an experimental stage of development, providing straightforward tools to help out with Discogs’ API v2.0. Discogs is a crowd-sourced music database and marketplace, a proper treasure trove for record diggers and music lovers to learn more about what they’re jamming to. A better route into this data for R users should yield some nice results.

discogger 101

First up, you’ll need to register a Discogs application via https://www.discogs.com/settings/developers. Doing so will grant you a “personal access token” (accessible at the same link above) which you can store as an environment variable (DISCOGS_API_TOKEN) using the discogs_api_token() function. All functions in the discogger package will automatically look for your token and prompt entry if it isn’t found.
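
For a feel of what the environment variable approach boils down to, here's a minimal base R sketch. It assumes the token sits in DISCOGS_API_TOKEN as described above; read_discogs_token() is a hypothetical helper written for this illustration, not a discogger function, and the token value is a placeholder.

```r
# Hypothetical helper (not part of discogger): read the token or fail loudly
read_discogs_token <- function(env_var = "DISCOGS_API_TOKEN") {
  token <- Sys.getenv(env_var, unset = "")
  if (!nzchar(token)) {
    stop("No token found in ", env_var,
         "; set it with Sys.setenv() or in your .Renviron file")
  }
  token
}

# Placeholder value for illustration only; use your real token in practice
Sys.setenv(DISCOGS_API_TOKEN = "placeholder-token")
read_discogs_token()
```

Keeping the token in an environment variable means it never has to be typed into a script that might end up shared or committed.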

The discogger interface to Discogs data currently spans Database (releases, artists, labels) and User Collection API requests, with a view to growing functionality in these spaces and eventually expanding into Marketplace (inventory, orders). The next section demos how these functions can be put to use and chained together for quietly impressive analyses.

Diggin’ in to Dance Mania

Dance Mania is about as iconic as it gets when it comes to Chicago House record labels (you hopefully got a flavour by clicking on the name…). discogger is well equipped to retrieve record label information, with discogs_label() first up.

# load packages
library(discogger)
library(tidyverse)

# get DM label info
discogs_label(label_id = 314)
## <Discogs labels/314>
## List of 11
##  $ profile     : chr "American house music label. \r\n\r\nChicago label, founded by [a=Jesse Saunders] in 1985. Started off as Dance "| __truncated__
##  $ releases_url: chr "https://api.discogs.com/labels/314/releases"
##  $ name        : chr "Dance Mania"
##  $ contact_info: chr "Barney's One Stop Records \r\n3145 West Roosevelt Road \r\nChicago \r\nIllinois 60612 \r\nUSA \r\n\r\ntel: +1 7"| __truncated__
##  $ uri         : chr "https://www.discogs.com/label/314-Dance-Mania"
##  $ sublabels   :List of 4
##   ..$ :List of 3
##   .. ..$ resource_url: chr "https://api.discogs.com/labels/163742"
##   .. ..$ id          : int 163742
##   .. ..$ name        : chr "Dance Mania Digital"
##   ..$ :List of 3
##   .. ..$ resource_url: chr "https://api.discogs.com/labels/6507"
##   .. ..$ id          : int 6507
##   .. ..$ name        : chr "Freak Mode"
##   ..$ :List of 3
##   .. ..$ resource_url: chr "https://api.discogs.com/labels/211818"
##   .. ..$ id          : int 211818
##   .. ..$ name        : chr "Subterranean Playhouse LLC."
##   ..$ :List of 3
##   .. ..$ resource_url: chr "https://api.discogs.com/labels/5868"
##   .. ..$ id          : int 5868
##   .. ..$ name        : chr "Subterranean Playhouse Series"
##  $ urls        :List of 3
##   ..$ : chr "http://dancemaniarecords.com"
##   ..$ : chr "http://www.facebook.com/DanceManiaRecords"
##   ..$ : chr "http://www.myspace.com/dance_mania_records"
##  $ images      :List of 4
##   ..$ :List of 6
##   .. ..$ uri         : chr "https://img.discogs.com/8PGEb_C6As0EN_OHd4Pu1MlcUQw=/fit-in/175x49/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ height      : int 49
##   .. ..$ width       : int 175
##   .. ..$ resource_url: chr "https://img.discogs.com/8PGEb_C6As0EN_OHd4Pu1MlcUQw=/fit-in/175x49/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ type        : chr "primary"
##   .. ..$ uri150      : chr "https://img.discogs.com/KxBBy6laZLchayYbKLD79hdDeVQ=/fit-in/150x150/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##   ..$ :List of 6
##   .. ..$ uri         : chr "https://img.discogs.com/vMlkxiCuOQClKfkSkbmTw_ZqUmg=/fit-in/428x183/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##   .. ..$ height      : int 183
##   .. ..$ width       : int 428
##   .. ..$ resource_url: chr "https://img.discogs.com/vMlkxiCuOQClKfkSkbmTw_ZqUmg=/fit-in/428x183/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##   .. ..$ type        : chr "secondary"
##   .. ..$ uri150      : chr "https://img.discogs.com/aVpJTY8XlnvI_HEWE5NEv09Jxd4=/fit-in/150x150/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##   ..$ :List of 6
##   .. ..$ uri         : chr "https://img.discogs.com/2GtH1TZaxGaGTfHrudDk6LOiURA=/fit-in/326x86/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ height      : int 86
##   .. ..$ width       : int 326
##   .. ..$ resource_url: chr "https://img.discogs.com/2GtH1TZaxGaGTfHrudDk6LOiURA=/fit-in/326x86/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ type        : chr "secondary"
##   .. ..$ uri150      : chr "https://img.discogs.com/2YB6vAViUfMMtBMfcVEGCMIIGck=/fit-in/150x150/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##   ..$ :List of 6
##   .. ..$ uri         : chr "https://img.discogs.com/EbovJSq-DmlEbQStSDYjUTkUk-o=/fit-in/175x49/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ height      : int 49
##   .. ..$ width       : int 175
##   .. ..$ resource_url: chr "https://img.discogs.com/EbovJSq-DmlEbQStSDYjUTkUk-o=/fit-in/175x49/filters:strip_icc():format(jpeg):mode_rgb():"| __truncated__
##   .. ..$ type        : chr "secondary"
##   .. ..$ uri150      : chr "https://img.discogs.com/kLTNU4xpYKI-bOg60VXiW-2BKUk=/fit-in/150x150/filters:strip_icc():format(jpeg):mode_rgb()"| __truncated__
##  $ resource_url: chr "https://api.discogs.com/labels/314"
##  $ id          : int 314
##  $ data_quality: chr "Correct"
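
Individual pieces of a nested list like this can be flattened by hand. Below is a rough base R sketch that turns the sublabels element into a data frame. It works on a small mock list (label_content, a name made up for this example) that copies two of the sublabel entries printed above, so it is an assumption about shape rather than a live API call.

```r
# Mock list mirroring part of the label output shown above (no API call made)
label_content <- list(
  name = "Dance Mania",
  sublabels = list(
    list(resource_url = "https://api.discogs.com/labels/163742",
         id = 163742L, name = "Dance Mania Digital"),
    list(resource_url = "https://api.discogs.com/labels/6507",
         id = 6507L, name = "Freak Mode")
  )
)

# Pull one field at a time out of each sublabel entry
sublabels_df <- data.frame(
  id   = vapply(label_content$sublabels, function(x) x$id, integer(1)),
  name = vapply(label_content$sublabels, function(x) x$name, character(1)),
  stringsAsFactors = FALSE
)

sublabels_df
```

This only tidies one small corner of the label response; the rest stays as a nested list.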

Hopefully you learnt something new from that output, but it’s not particularly ready for any analysis. I’m quite interested in getting more detailed metadata around Dance Mania’s 12" releases, which can be done with discogs_label_releases().

# get DM releases
dm_releases <- discogs_label_releases(label_id = 314)
  
# extract request content
dm_releases_df <- dm_releases$content %>%
  # return 12" releases only
  filter(grepl('12"', format))

dm_releases_df
## # A tibble: 581 x 10
##    status  thumb     format  title catno  year resource_url artist      id
##    <chr>   <chr>     <chr>   <chr> <chr> <int> <chr>        <chr>    <int>
##  1 Accept… https://… "12\""  Hous… 14040  1987 https://api… The Ho… 6.40e4
##  2 Accept… https://… "12\""  What… 5100…  1985 https://api… The Br… 1.09e6
##  3 Accept… https://… "12\""  Hous… B.C.…  1987 https://api… The Ho… 4.21e5
##  4 Accept… https://… "12\""  Hous… B.C.…  1987 https://api… The Ho… 1.97e5
##  5 Accept… https://… "12\""  Frea… D.J.…  1996 https://api… D.J. F… 6.75e4
##  6 Accept… https://… "12\",… This… D.M.…  1988 https://api… Mello … 5.17e6
##  7 Accept… https://… "12\""  This… D.M.…  1988 https://api… Mello … 4.27e5
##  8 Accept… https://… "12\""  Hous… DM 0…  1986 https://api… The Ho… 7.30e3
##  9 Accept… https://… "12\""  Hard… DM 0…  1987 https://api… Duane … 1.40e3
## 10 Accept… https://… "12\",… Hard… DM 0…  1987 https://api… Duane … 6.52e6
## # ... with 571 more rows, and 1 more variable: label_id <dbl>
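
A quick note on the grepl() filter used above: it keeps any row whose format string contains 12" anywhere, which is why entries with extra descriptors after the size also come through. Here's a small base R sketch of that behaviour on made-up data (the titles and format strings are illustrative, not taken from Discogs).

```r
# Mock data standing in for the releases tibble (values are illustrative)
releases <- data.frame(
  title  = c("Release A", "Release B", "Release C", "Release D"),
  format = c('12"', '12", Promo', "CD, Comp", '2x12"'),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats the pattern as a literal string rather than a regex
is_twelve_inch <- grepl('12"', releases$format, fixed = TRUE)

# Keep only the rows whose format mentions 12"
twelve_inches <- releases[is_twelve_inch, ]
twelve_inches$title
```

Because this is a substring match, anything containing 12" is kept, so it's worth eyeballing the distinct format values before trusting the row count.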

The tidy tibble format returned is much better suited to analysis in R, but the content I’m after is not quite there. Turns out community metrics, such as the number of users who own a record or have marked it as one they “want”, are available from a record’s release page (not its label release listing). discogs_release() can be used, in conjunction with map() (from the purrr package), to iterate through releases and collect this information.

# get DM release info
dm_releases_info <- map(dm_releases_df$id, discogs_release)

# extract release content
dm_releases_content <- map(dm_releases_info, "content")
  
# extract fields relevant for community metrics analysis
dm_community_metrics <- tibble(
  title = map_chr(dm_releases_content, "title"),
  artist = map_chr(dm_releases_content, "artists_sort"),
  year = map_int(dm_releases_content, "year"),
  have = map_int(dm_releases_content, c("community", "have")),
  want = map_int(dm_releases_content, c("community", "want"))
  ) %>%
  # for each record...
  group_by(title, artist) %>%
  # keep the one most users have...
  filter(have == max(have),
         # ...and rm releases with missing release year
         year >= 1985)

dm_community_metrics
## # A tibble: 317 x 5
## # Groups:   title, artist [317]
##    title                      artist                      year  have  want
##    <chr>                      <chr>                      <int> <int> <int>
##  1 Freaky Style Take: 2       DJ Funk                     1996   170   168
##  2 This X-Mas Rap             Mello D.                    1988     3    23
##  3 This Christmas (Rap)       Mello D.                    1988    30    72
##  4 House Nation               Housemaster Boyz, The And…  1986   967  1113
##  5 Hard Core (On The One)     Duane & Co.                 1987   401   261
##  6 Jack My Body               Yellow House                1987   321   297
##  7 Frequency (Out Of Control) Lil' Louis                  1987     3   379
##  8 Insane                     Suburban Boyz               1988   253   480
##  9 The Original Video Clash   Lil' Louis                  1988   893  1275
## 10 I Want Your Love           Victor Romeo & The Move F…  1988   280   210
## # ... with 307 more rows
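
If the nested extraction and the grouped filter above feel a bit magic, here's a base R sketch of the same two ideas on mock data. Passing c("community", "have") to map_int() walks down the list one name at a time, and the grouped filter keeps the most-owned pressing per title and artist. The records below are invented, and using year 0 to stand in for a missing year is an assumption made for the example.

```r
# Mock release content (invented values; no API call made)
releases_content <- list(
  list(title = "Record A", artists_sort = "Artist X", year = 1987L,
       community = list(have = 10L, want = 25L)),
  list(title = "Record A", artists_sort = "Artist X", year = 1987L,
       community = list(have = 3L, want = 8L)),
  list(title = "Record B", artists_sort = "Artist Y", year = 0L,
       community = list(have = 40L, want = 5L))
)

# Base R take on map_int(x, c("community", "have")): index one level at a time
metrics <- data.frame(
  title  = vapply(releases_content, function(x) x[["title"]], character(1)),
  artist = vapply(releases_content, function(x) x[["artists_sort"]], character(1)),
  year   = vapply(releases_content, function(x) x[["year"]], integer(1)),
  have   = vapply(releases_content, function(x) x[["community"]][["have"]], integer(1)),
  want   = vapply(releases_content, function(x) x[["community"]][["want"]], integer(1)),
  stringsAsFactors = FALSE
)

# Largest have count within each title/artist pair
key <- paste(metrics$title, metrics$artist, sep = " | ")
group_max <- ave(metrics$have, key, FUN = max)

# Keep the most-owned pressing per record and drop rows with an implausible year
kept <- metrics[metrics$have == group_max & metrics$year >= 1985, ]
kept
```

In this toy case only the first pressing of Record A survives: the second pressing has fewer owners, and Record B falls out on the year check.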

With this output, a visualisation can be knocked up to explore which Dance Mania 12-inches Discogs users want or own.

# load viz helpers
library(scico)
library(hrbrthemes)
library(ggrepel)

# plot
ggplot(dm_community_metrics, aes(x=have, y=want)) +
  geom_point(aes(colour=year)) +
  geom_text_repel(aes(label=paste0(artist, " - ", title)), size=2,
                  data = subset(dm_community_metrics, have >= 800 | want >= 1000),
                  family = "IBM Plex Sans") +
  labs(title="Dance Mania 12\"s on Discogs", 
       subtitle="How many Discogs users own/want Dance Mania 12-inches (1985 - 2018).",
       x = "Own it", y = "Want it",
       caption="source: Discogs | made by @ewen_") +
  theme_ipsum_ps(grid = FALSE) +
  scico::scale_colour_scico(palette = "lajolla") +
  guides(col = guide_colourbar(direction = "horizontal", title = "Release year",
                               barheight = 0.5, barwidth = 10, title.vjust = 1)) +
  theme(legend.position = "bottom")

Close

For more on installation and development status, or to raise issues and make (very welcome) contributions, check in at the GitHub repo. I’ll look to list the features I intend to introduce there, so feel free to beat me to the punch on any of those.

“Music is a language, you see, a universal language.” (Sun Ra)

© Ewen Henderson 2017-18