December 10, 2018

understatr

understat is a treasure trove of football statistics and advanced metrics. All of your “old man yells at cloud”-conjuring household favourites are here. Democratized access to xG for casuals everywhere, so we can do unholy things like this.

While the site itself is easy to navigate and features nice off-the-shelf exploratory viz tools (à la the above radar), many (myself included) can get a bit fidgety without access to all this stuff in a structured format for analysis.

So, I made a thing to grab understat data and tidy it up, so you can get on with the interesting stuff. Say 👋 to understatr, an R package that makes pulling xG numbers as easy as 🥧

# load package
library(understatr)

# get dortmund playing squad data for this season
dortmund <- get_team_squad_stats(team_name = "Borussia Dortmund", year = 2018)

dortmund
## # A tibble: 23 x 19
##    team_name  year player_name player_id games  time goals     xG assists
##  * <chr>     <int> <chr>           <int> <int> <int> <int>  <dbl>   <int>
##  1 Borussia…  2018 Paco Alcác…      2380     9   354    10 4.11         0
##  2 Borussia…  2018 Marco Reus        319    14  1215     9 7.72         5
##  3 Borussia…  2018 Jadon Sanc…      6345    14   810     5 2.72         6
##  4 Borussia…  2018 Jacob Bruu…      5355    11   775     2 2.34         1
##  5 Borussia…  2018 Mahmoud Da…       205     7   501     1 0.972        0
##  6 Borussia…  2018 Lukasz Pis…       310    10   900     1 0.297        3
##  7 Borussia…  2018 Julian Wei…       372     3   154     1 0.0980       0
##  8 Borussia…  2018 Mario Götze       422     7   320     1 1.09         1
##  9 Borussia…  2018 Christian …      2662     9   393     1 0.787        2
## 10 Borussia…  2018 Marius Wolf      2693     4   260     1 0.234        0
## # ... with 13 more rows, and 10 more variables: xA <dbl>, shots <int>,
## #   key_passes <int>, yellow_cards <int>, red_cards <int>, position <chr>,
## #   npg <int>, npxG <dbl>, xGChain <dbl>, xGBuildup <dbl>

You can use this in conjunction with the tidyverse to get from data to viz lightning-quick.

library(tidyverse)
library(ewenthemes) # (a personal chart themes pkg)

dortmund %>% 
  # remove players w/ zero xG
  filter(xG > 0) %>% 
  # chart xG in desc order
  ggplot(aes(x = reorder(player_name, xG), y = xG)) +
  # make it a bar chart
  geom_col() +
  # flip the bars
  coord_flip() +
  # add some labels (making sure to credit understat!)
  labs(title = "Expected goals contributions", subtitle = "Borussia Dortmund, 2018/19",
       x = NULL, caption = "source: understat.com | @ewen_") +
  # add my personal chart theme
  theme_ewen_ws(grid = "X", axis = FALSE, axis_text_size = 9) +
  theme(axis.text.y = element_text(family = "Work Sans Light"))
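The same tibble also works for quick per-90 numbers. Here's a minimal sketch, assuming the `time` column is minutes played; the `xG_per90` and `goals_minus_xG` column names are made up for illustration and are not part of understatr. The three rows are copied by hand from the printed output above so it runs without hitting understat.

```r
library(dplyr)

# three rows copied by hand from the dortmund output above
# (with the real thing you'd just start from `dortmund`)
squad <- tibble::tribble(
  ~player_name,  ~time, ~goals,   ~xG,
  "Marco Reus",   1215,      9,  7.72,
  "Mario Götze",   320,      1,  1.09,
  "Marius Wolf",   260,      1, 0.234
)

per90 <- squad %>%
  mutate(
    # xG per 90 minutes (assumes `time` is minutes played)
    xG_per90 = xG / time * 90,
    # goals scored above (or below) expectation
    goals_minus_xG = goals - xG
  ) %>%
  # highest xG per 90 first
  arrange(desc(xG_per90))

per90
```

Swap `squad` for the full `dortmund` tibble and it's probably worth filtering out players with very few minutes first, since per-90 rates get noisy on small samples.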

Peep the project’s GitHub page for updates, for now at least. As noted in the readme, while non-commercial use of the data is fine for now, I don’t own it and similarly don’t control changes to the website. I say this to say that understatr may break (or become illegal) in future, so enjoy it while you can 🎈 I’d be psyched to hear about people using it, or even contributing to it. I'm especially here for the hottest of hot-take Messi radar trolls; please cite the project in all of those.