December 10, 2018

understatr

understat is a treasure trove of football statistics and advanced metrics. All the household favourites that conjure “old man yells at cloud” reactions are here. It has democratized access to xG for casuals everywhere, so we can do unholy things like this.

While the site itself is easy to navigate and features nice off-the-shelf exploratory viz tools (à la the above radar), many (myself included) can get a bit fidgety without access to all this stuff in a structured format for analysis.

So, I made a thing to grab understat data and tidy it up, so you can get on with the interesting stuff. Say 👋 to understatr, an R package that makes pulling xG numbers as easy as 🥧

# load package
library(understatr)

# get dortmund playing squad data for this season
dortmund <- get_team_players_stats(team_name = "Borussia Dortmund", year = 2018)

dortmund
## # A tibble: 23 x 19
##    player_id player_name games  time goals     xG assists    xA shots
##        <int> <chr>       <int> <int> <int>  <dbl>   <int> <dbl> <int>
##  1       319 Marco Reus     19  1665    13 12.4         6 4.56     52
##  2      2380 Paco Alcác…    18   750    12  6.39        0 1.30     36
##  3      6345 Jadon Sanc…    23  1536     8  4.08       10 5.74     23
##  4       422 Mario Götze    16   894     4  2.93        4 3.10     16
##  5      3059 Axel Witsel    23  1973     3  1.95        1 1.84     29
##  6      3455 Raphael Gu…    14   908     2  2.37        5 2.60     20
##  7      5197 Achraf Hak…    17  1524     2  1.56        4 2.33     17
##  8      5355 Jacob Bruu…    16   864     2  2.26        1 1.77     15
##  9       205 Mahmoud Da…    10   604     1  0.972       0 0.896    10
## 10       310 Lukasz Pis…    17  1513     1  0.382       6 3.00     10
## # … with 13 more rows, and 10 more variables: key_passes <int>,
## #   yellow_cards <int>, red_cards <int>, position <chr>, team_name <chr>,
## #   npg <int>, npxG <dbl>, xGChain <dbl>, xGBuildup <dbl>, year <int>
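Once the data is in a tibble, the usual dplyr verbs apply. Here is a minimal sketch of the sort of thing you might do next: comparing goals against xG and scaling xG to a per-90-minutes rate. To keep it runnable without hitting the website, it uses three rows typed in by hand from the printed output above (so the values are rounded as displayed), and the names `squad_sample`, `goals_minus_xG` and `xG_per_90` are my own for illustration, not columns that understatr returns.

```r
library(dplyr)

# three rows copied by hand from the printed tibble above
# (values are rounded as displayed, so treat results as approximate)
squad_sample <- tibble(
  player_name = c("Marco Reus", "Mario Götze", "Axel Witsel"),
  time  = c(1665L, 894L, 1973L),
  goals = c(13L, 4L, 3L),
  xG    = c(12.4, 2.93, 1.95)
)

xg_summary <- squad_sample %>%
  mutate(
    # positive = scoring more than the chances suggest
    goals_minus_xG = goals - xG,
    # xG scaled to a full 90 minutes of playing time
    xG_per_90 = xG / time * 90
  ) %>%
  # highest xG rate first
  arrange(desc(xG_per_90))

xg_summary
```

With the real `dortmund` tibble you would swap `squad_sample` for `dortmund` and keep the rest as is, assuming the column names match the output shown above.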

You can use this in conjunction with the tidyverse to get from data to viz lightning-quick.

library(tidyverse)
library(ewenthemes) # (a personal chart themes pkg)

dortmund %>% 
  # remove players w/ zero xG
  filter(xG > 0) %>% 
  # chart xG in desc order
  ggplot(aes(x = reorder(player_name, xG), y = xG)) +
  # make it a bar chart
  geom_col() +
  # flip the bars
  coord_flip() +
  # add some labels (making sure to credit understat!)
  labs(title = "Expected goals contributions", subtitle = "Borussia Dortmund, 2018/19",
       x = NULL, caption = "source: understat.com | @ewen_") +
  # add my personal chart theme
  theme_ewen_ws(grid = "X", axis = FALSE, axis_text_size = 9) +
  theme(axis.text.y = element_text(family = "Work Sans Light"))
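The chart above leans on my personal theme package and a custom font, which you probably don't have installed. As a rough sketch of the same idea with nothing but ggplot2, you could swap in the built-in `theme_minimal()`. The three-row `squad_sample` data frame here is typed in by hand from the output printed earlier (a stand-in, not something understatr returns), and `xg_chart` is just an illustrative name.

```r
library(ggplot2)

# hand-copied stand-in for the understatr output (rounded values)
squad_sample <- data.frame(
  player_name = c("Marco Reus", "Mario Götze", "Axel Witsel"),
  xG = c(12.4, 2.93, 1.95)
)

xg_chart <- ggplot(squad_sample, aes(x = reorder(player_name, xG), y = xG)) +
  # one bar per player
  geom_col() +
  # flip so the names read horizontally
  coord_flip() +
  # labels, still crediting understat
  labs(title = "Expected goals contributions",
       subtitle = "Borussia Dortmund, 2018/19",
       x = NULL, caption = "source: understat.com") +
  # built-in theme in place of the personal one
  theme_minimal()

xg_chart
```

It won't look as polished, but it needs no extra packages or fonts.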

Peep the project’s GitHub page for updates, for now at least. As noted in the readme, while non-commercial use of the data is fine for now, I don’t own it, and I don’t control changes to the website either. All of which is to say that understatr may break (or become illegal) in future, so enjoy it while you can 🎈 I’d be psyched to hear about people using it, or even contributing to it. I’m especially here for the hottest of hot-take Messi radar trolls; please cite the project in all of those.