Discogger (Day One)
I’ve been working on a new R package.
discogger
is an open-source effort,
at an experimental stage of development, providing straightforward tools to help
out with Discogs’ API v2.0.
Discogs is a crowd-sourced music database and
marketplace, a proper treasure trove for record diggers and music lovers to
learn more about what they’re jamming to. A better route into this data for R
users should yield some nice results.
discogger 101
First up, you’ll need to register a Discogs application via
https://www.discogs.com/settings/developers. Doing so will grant you a
“personal access token” (accessible at the same link above) which you can store
as an environment variable (DISCOGS_API_TOKEN) using the
discogs_api_token()
function. All functions in the discogger
package will automatically look for
your token and prompt entry if it isn’t found.
The discogger
interface to Discogs data currently spans
Database (releases,
artists, labels) and
User Collection API
requests, with a view to growing functionality in these spaces and eventually
expanding into
Marketplace (inventory,
orders). The next section will demo how these functions can be put in to use,
and chained together for quietly impressive analyses.
Diggin’ in to Dance Mania
Dance Mania is about as iconic as it gets when
it comes to Chicago House record labels (you hopefully got a flavour by clicking
on the name…). discogger
is well equipped to retrieve record label
information, with
discogs_label()
first up.
# load packages
library(discogger)
library(tidyverse)
# get DM label info
discogs_label(label_id = 314)
#> <Discogs labels/314>
#> Preview: 1 of 12 results.
#> List of 1
#> $ id: int 314
Hopefully you learnt something new from that output, but it’s not particularly
ready for any analysis. I’m quite interested in getting more detailed metadata
around Dance Mania’s 12" releases, which can be done with
discogs_label_releases()
.
# get DM releases
dm_releases <- discogs_label_releases(label_id = 314)
# extract request content
dm_releases_df <- dm_releases$content %>%
# return 12" releases only
dplyr::filter(grepl('12"', format)) %>%
as_tibble()
dm_releases_df
#> # A tibble: 599 × 14
#> status format catno thumb resource_url title id year artist
#> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int> <chr>
#> 1 Accepted "12\"" 14040 https:/… https://api.… Hous… 6.40e4 1987 The Hou…
#> 2 Accepted "12\"" 51002… https:/… https://api.… What… 1.09e6 1985 The Bro…
#> 3 Accepted "12\"" B.C. … https:/… https://api.… Hous… 4.21e5 1987 The Hou…
#> 4 Accepted "12\"" B.C. … https:/… https://api.… Hous… 1.97e5 1987 The Hou…
#> 5 Accepted "12\"" BASIC… https:/… https://api.… Akce… 3.64e4 1998 DJ Deeon
#> 6 Accepted "12\"" D.J. … https:/… https://api.… Frea… 6.75e4 1996 D.J. Fu…
#> 7 Accepted "12\", TP" D.M. … https:/… https://api.… This… 5.17e6 1988 Mello D.
#> 8 Accepted "12\"" D.M. … https:/… https://api.… This… 4.27e5 1988 Mello D.
#> 9 Accepted "12\"" DM 003 https:/… https://api.… Hous… 7.30e3 1986 The Hou…
#> 10 Accepted "12\"" DM 004 https:/… https://api.… Hard… 1.4 e3 1987 Duane &…
#> # … with 589 more rows, and 5 more variables:
#> # stats.community.in_wantlist <int>, stats.community.in_collection <int>,
#> # stats.user.in_wantlist <int>, stats.user.in_collection <int>,
#> # label_id <dbl>
The tidy tibble format returned is much better suited to analysis in R, but the
content I’m after is not quite there. Turns out community metrics, such as the
number of users who own or have marked a record as one they “want”, are
available from a records release page (not it’s label release listing).
discogs_release()
can be used, in conjunction with
map()
(from the
purrr
package), to iterate through releases
and collect this information.
# get DM release info
dm_releases_info <- map(dm_releases_df$id, discogs_release)
# extract release content
dm_releases_content <- map(dm_releases_info, "content")
# extract fields relevant for community metrics analysis
dm_community_metrics <- tibble(
title = map_chr(dm_releases_content, "title"),
artist = map_chr(dm_releases_content, "artists_sort"),
year = map_int(dm_releases_content, "year"),
have = map_int(dm_releases_content, c("community", "have")),
want = map_int(dm_releases_content, c("community", "want"))
) %>%
# for each record...
group_by(title, artist) %>%
filter(
# keep the one most users have...
have == max(have),
# ...and rm releases with missing release year
year >= 1985
) %>%
ungroup()
dm_community_metrics
#> # A tibble: 313 × 5
#> title artist year have want
#> <chr> <chr> <int> <int> <int>
#> 1 What's That Browns, The 1985 86 276
#> 2 Akceier 8 DJ Deeon 1998 214 82
#> 3 Freaky Style Take: 2 DJ Funk 1996 221 257
#> 4 This X-Mas Rap Mello D. 1988 3 36
#> 5 This Christmas (Rap) Mello D. 1988 41 98
#> 6 House Nation Housemaster Boyz, The And Rude … 1986 1381 1494
#> 7 Hard Core (On The One) Duane & Co. 1987 538 344
#> 8 Jack My Body Yellow House 1987 402 359
#> 9 Frequency (Out Of Control) Lil' Louis 1987 4 460
#> 10 Insane Suburban Boyz 1988 340 608
#> # … with 303 more rows
With this output, a visualisation can be knocked up to explore which Dance Mania 12-inches Discogs users want or own.
# load viz helpers
library(scico)
library(ewenthemes)
library(ggrepel)
# plot
ggplot(dm_community_metrics, aes(x=have, y=want)) +
geom_point(aes(colour=year)) +
geom_text_repel(aes(label=paste0(artist, " - ", title)), size=3,
data = subset(dm_community_metrics, have >= 800 | want >= 1000),
family = "IBM Plex Sans") +
labs(title="Dance Mania 12\"s on Discogs",
subtitle="How many Discogs user's own/want Dance Mania (1985 - 2018) 12-inches",
x = "Own it", y = "Want it",
caption="source: Discogs | made by @ewen_") +
theme_ewen_rs(grid = FALSE, subtitle_size = 11, subtitle_margin = 20) +
scico::scale_colour_scico(palette = "lajolla") +
guides(col = guide_colourbar(direction = "horizontal", title = "Release year",
barheight = 0.5, barwidth = 10, title.vjust = 1)) +
theme(legend.position = "bottom")

Close
For more on installation, development status and for (very welcome) contributions or issues, check in at the Github repo. I’ll look to list features I intend to introduce there - feel free to beat me to the punch on any of those.
Music is a language, you see, a universal language. Sun Ra