March 8, 2019

...xPoints?

To anyone paying attention over the last couple of years, some “advanced” football metrics (or expected goals, at least) seem to have successfully navigated the five stages of grief (read: haranguing) from the media charged with upholding the “magic” of the game. Imagine peak Lawrenson and Hansen in this scene:

Back in August 2017, StatsBomb’s Ted Knutson predicted that the relentless xG circus would:

Create an entire new generation of highly educated fans and coaches who view the game itself in a more knowledgeable light.

Fantasy Premier League (FPL) is the official fantasy football game of the Premier League, with more than five million “managers” and counting (plus many thousands of passive-aggressive workplace league rivalries). FPL managers, particularly those taking it at least semi-seriously, probably represent an early-adopter population in Ted’s prophecy above - fans with a vested interest in gaining a competitive advantage. It definitely shows… Michael Caley chatting to the FML FPL podcast recently, for one.

I think there’s some appetite around for this stuff. What’s also interesting to me is that, as an application of metrics like expected goals (xG), fantasy football player evaluation is a far simpler optimisation problem than the real thing. Especially for attacking players, FPL player evaluation is predominantly about individual goal contributions (goals and assists) because they result in points (three for an assist, while a goal is worth five points for midfielders and four for forwards). Other contributions to general play can be glossed over for the most part, even the “pre-assist”, which is why the original FPL valuation for N’Golo Kanté (£5.0m) was so much lower than Sadio Mané’s (£9.5m) in the current (2018/19) season.

Therefore, a useful FPL evaluation of attacking players could be made using metrics that quantify activity at the very sharp end of attacking moves, namely variations of:

  • xG: How many goals should a player have scored on average, given the shots they took?
  • xA: How many assists should a player have provided on average, given the passes they made?

These can subsequently be combined to form an expected goals contributions measure, and thus a shorthand for understanding which attacking FPL assets to consider. So, what’s the hold-up?
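As a rough illustration of how these measures add up, here is a minimal sketch on made-up data. It assumes each shot (and each pass leading to a shot) already carries a modelled scoring probability. The `shots` and `key_passes` tables and their columns are hypothetical, not understat's actual schema.

```r
library(dplyr)

# Hypothetical shot-level data: one row per shot, with its scoring probability
shots <- data.frame(
  player = c("A", "A", "A", "B", "B"),
  shot_xg = c(0.10, 0.35, 0.05, 0.50, 0.25),
  stringsAsFactors = FALSE
)

# Hypothetical chances created: the xG of the shot each pass led to
key_passes <- data.frame(
  player = c("A", "B", "B"),
  pass_xa = c(0.20, 0.10, 0.30),
  stringsAsFactors = FALSE
)

# xG and xA are just sums of the underlying probabilities per player
xg <- shots %>% group_by(player) %>% summarise(xG = sum(shot_xg))
xa <- key_passes %>% group_by(player) %>% summarise(xA = sum(pass_xa))

# Expected goal contributions: xG plus xA
contributions <- xg %>%
  left_join(xa, by = "player") %>%
  mutate(xGA = xG + xA)

contributions
```

In this toy case player B comes out ahead on expected contributions despite taking fewer shots, because those shots were better chances.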

Some helpers I made earlier 💁

Turns out that getting at the data should be quite simple - I’ve already authored a pair of R packages ready for the job.

Both can be installed from GitHub using another library, remotes:

remotes::install_github(c("ewenme/fplr", "ewenme/understatr"))

fplr helps get data on FPL players…

library(fplr)

fpl_data <- fpl_get_players()

fpl_data
## # A tibble: 606 x 60
##       id photo web_name team_code status   code first_name second_name
##    <int> <chr> <chr>        <int> <chr>   <int> <chr>      <chr>      
##  1     1 1133… Cech             3 Avail…  11334 Petr       Cech       
##  2     2 8020… Leno             3 Avail…  80201 Bernd      Leno       
##  3     3 5150… Kosciel…         3 Avail…  51507 Laurent    Koscielny  
##  4     4 9874… Bellerín         3 Injur…  98745 Héctor     Bellerín   
##  5     5 3841… Monreal          3 Avail…  38411 Nacho      Monreal    
##  6     6 1560… Holding          3 Injur… 156074 Rob        Holding    
##  7     7 6914… Mustafi          3 Avail…  69140 Shkodran   Mustafi    
##  8     8 1114… Kolasin…         3 Avail… 111457 Sead       Kolasinac  
##  9    10 2339… Mavropa…         3 Avail… 233963 Konstanti… Mavropanos 
## 10    11 2733… Lichtst…         3 Avail…  27335 Stephan    Lichtstein…
## # … with 596 more rows, and 52 more variables: squad_number <int>,
## #   news <chr>, now_cost <dbl>, news_added <chr>,
## #   chance_of_playing_this_round <int>,
## #   chance_of_playing_next_round <int>, value_form <dbl>,
## #   value_season <dbl>, cost_change_start <dbl>, cost_change_event <dbl>,
## #   cost_change_start_fall <int>, cost_change_event_fall <int>,
## #   in_dreamteam <lgl>, dreamteam_count <int>, selected_by_percent <dbl>,
## #   form <dbl>, transfers_out <int>, transfers_in <int>,
## #   transfers_out_event <int>, transfers_in_event <int>, loans_in <int>,
## #   loans_out <int>, loaned_in <int>, loaned_out <int>,
## #   total_points <int>, event_points <int>, points_per_game <dbl>,
## #   ep_this <dbl>, ep_next <dbl>, special <lgl>, minutes <int>,
## #   goals_scored <int>, assists <int>, clean_sheets <int>,
## #   goals_conceded <int>, own_goals <int>, penalties_saved <int>,
## #   penalties_missed <int>, yellow_cards <int>, red_cards <int>,
## #   saves <int>, bonus <int>, bps <int>, influence <dbl>,
## #   creativity <dbl>, threat <dbl>, ict_index <dbl>, ea_index <int>,
## #   element_type <int>, team <int>, team_name <chr>, position <chr>

…and understatr can help fetch understat data.

library(understatr)

# get EPL team data
epl_team_stats <- get_league_teams_stats(league_name = "EPL", year = 2018)

# get EPL player data
epl_player_stats <- purrr::map_dfr(unique(epl_team_stats$team_name), get_team_players_stats, year = 2018)

epl_player_stats
## # A tibble: 493 x 19
##    player_id player_name games  time goals    xG assists    xA shots
##        <int> <chr>       <int> <int> <int> <dbl>   <int> <dbl> <int>
##  1       714 Gylfi Sigu…    29  2364    11 8.68        3 4.43     65
##  2      6026 Richarlison    27  2084    10 8.12        1 1.03     58
##  3      5555 Dominic Ca…    27  1060     5 3.40        1 0.660    26
##  4       503 Theo Walco…    28  1932     3 5.19        2 2.78     33
##  5      1823 Lucas Digne    26  2160     3 0.931       3 6.30     26
##  6      6477 Cenk Tosun     21   925     2 3.67        3 0.438    31
##  7       585 Seamus Col…    22  1934     1 0.536       1 2.11     10
##  8       935 Kurt Zouma     24  1893     1 0.880       2 0.443    14
##  9      1653 Michael Ke…    26  2340     1 1.77        2 1.18     25
## 10      2383 André Gomes    21  1548     1 1.14        1 0.553    14
## # … with 483 more rows, and 10 more variables: key_passes <int>,
## #   yellow_cards <int>, red_cards <int>, position <chr>, team_name <chr>,
## #   npg <int>, npxG <dbl>, xGChain <dbl>, xGBuildup <dbl>, year <int>

There isn’t a common identifier for joining across datasets, so a bit of prep is required first (e.g. standardising team/player names). Names are annoying, so I skipped recoding some nondescript (read: few minutes) attacking players.[1] Sorry.
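The prep code isn't shown in the post, so the following is only a sketch of one way the name standardising could go, not the approach actually used. The toy tables, the lookup vectors and the particular spellings are assumptions for illustration. An `anti_join()` at the end flags any FPL players still left without a match.

```r
library(dplyr)

# Toy stand-ins for the two sources; names and spellings are illustrative only
fpl_toy <- data.frame(
  web_name = c("Son", "Richarlison", "Sigurdsson"),
  team_name = c("Spurs", "Everton", "Everton"),
  stringsAsFactors = FALSE
)
understat_toy <- data.frame(
  player_name = c("Son Heung-Min", "Richarlison", "Gylfi Sigurdsson"),
  team_name = c("Tottenham", "Everton", "Everton"),
  xG = c(9.0, 8.1, 8.7),
  stringsAsFactors = FALSE
)

# Hypothetical lookups mapping understat spellings onto FPL ones
team_lookup <- c("Tottenham" = "Spurs")
player_lookup <- c("Son Heung-Min" = "Son", "Gylfi Sigurdsson" = "Sigurdsson")

# Values missing from a lookup are left unchanged by recode()
understat_clean <- understat_toy %>%
  mutate(
    team_name = recode(team_name, !!!team_lookup),
    web_name = recode(player_name, !!!player_lookup)
  )

# FPL rows with no understat match still need recoding (or get dropped)
unmatched <- anti_join(fpl_toy, understat_clean, by = c("web_name", "team_name"))
unmatched
```

Checking `unmatched` after each round of recoding makes it obvious which names are still falling through the join.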

Once that’s done, the data can be merged and the attackers with reasonable minutes (at least 900) isolated. At the same time, we can put together the metrics of interest discussed above.

library(dplyr)

# join understat/fpl data
player_stats <- left_join(
  select(fpl_data, id, web_name, status, now_cost, cost_change_start, 
         total_points, minutes, team_name, position), 
  select(epl_player_stats, -year, -yellow_cards:-position), 
  by = c("web_name", "team_name") 
  ) %>%
  # select midfielders/forwards w/ good mins
  filter(position %in% c("Midfielder", "Forward"), minutes >= 900, 
         status != "Unavailable") %>% 
  # metric calculations
  mutate_at(c("goals", "xG", "assists", "xA", "shots", "npg", "npxG"),
            list(`90` = ~(./minutes)*90)) %>% 
  mutate(xGA_90 = xG_90 + xA_90)

player_stats
## # A tibble: 149 x 31
##       id web_name status now_cost cost_change_sta… total_points minutes
##    <int> <chr>    <chr>     <dbl>            <dbl>        <int>   <int>
##  1    13 Özil     Avail…      7.9             -0.6           70    1185
##  2    14 Ramsey   Avail…      7.1             -0.4           73    1051
##  3    15 Iwobi    Avail…      5.4             -0.1           76    1519
##  4    17 Xhaka    Avail…      5.2             -0.3           69    2062
##  5    18 Mkhitar… Avail…      6.8             -0.2           85    1156
##  6    22 Lacazet… Avail…      9.5              0            138    1884
##  7    23 Aubamey… Avail…     10.9             -0.1          157    2119
##  8   450 Torreira Suspe…      4.8             -0.2           72    2072
##  9   451 Guendou… Avail…      4.4             -0.1           38    1688
## 10    35 Surman   Avail…      4.6             -0.4           33    1348
## # … with 139 more rows, and 24 more variables: team_name <chr>,
## #   position <chr>, player_id <int>, player_name <chr>, games <int>,
## #   time <int>, goals <int>, xG <dbl>, assists <int>, xA <dbl>,
## #   shots <int>, key_passes <int>, npg <int>, npxG <dbl>, xGChain <dbl>,
## #   xGBuildup <dbl>, goals_90 <dbl>, xG_90 <dbl>, assists_90 <dbl>,
## #   xA_90 <dbl>, shots_90 <dbl>, npg_90 <dbl>, npxG_90 <dbl>, xGA_90 <dbl>

Let’s get some of these numbers on the board, already. I’ll start with a simple scatter plot of xG90 vs xA90, and also mark out players’ FPL positions (PS - the chart styles come courtesy of another personal pkg, ewenthemes).
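The plotting code is omitted from the post, so here is a rough ggplot2 sketch of that kind of scatter, without the ewenthemes styling. The `plot_data` rows are made up. In practice the `player_stats` table built above would take their place.

```r
library(ggplot2)

# Toy rows standing in for player_stats; the values are made up
plot_data <- data.frame(
  web_name = c("Player A", "Player B", "Player C", "Player D"),
  position = c("Forward", "Forward", "Midfielder", "Midfielder"),
  xG_90 = c(0.65, 0.40, 0.30, 0.10),
  xA_90 = c(0.15, 0.10, 0.35, 0.20),
  stringsAsFactors = FALSE
)

# One point per player, coloured by FPL position and labelled by name
p <- ggplot(plot_data, aes(x = xG_90, y = xA_90, colour = position)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_text(aes(label = web_name), vjust = -1, show.legend = FALSE) +
  labs(x = "xG per 90", y = "xA per 90", colour = "FPL position")

p
```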

This is starting to look useful for an FPL manager. It’s possible to spot players who are regularly creating and getting goalscoring opportunities. A bit more work is needed to actually identify who the best picks are, based on these metrics.

Speaking the same language 💬

First, it’s probably more helpful to express these expected contributions in terms that are familiar to FPL managers, and take account of the different rewards based on player position. Remember, goals turn into different points for midfielders (5) and forwards (4). Expected assists can similarly be expressed as a rate of FPL points. How does the rate of attacking points scoring stack up?
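The conversion can be sketched as follows, using the points values quoted above (five per midfielder goal, four per forward goal, three per assist). The toy data and the `goal_points` and `xPts_90` column names are my own labels for illustration.

```r
library(dplyr)

assist_points <- 3

# Toy per-90 figures; values are made up
toy <- data.frame(
  web_name = c("Mid A", "Fwd B"),
  position = c("Midfielder", "Forward"),
  xG_90 = c(0.50, 0.60),
  xA_90 = c(0.20, 0.10),
  stringsAsFactors = FALSE
)

xpts <- toy %>%
  mutate(
    # goals are worth more to midfielders than to forwards
    goal_points = if_else(position == "Midfielder", 5, 4),
    # expected attacking FPL points per 90 minutes
    xPts_90 = xG_90 * goal_points + xA_90 * assist_points
  )

xpts
```

Here the midfielder ends up with the higher expected points rate despite the lower xG, purely because of the extra point per goal.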

Now it’s easier to differentiate a Mo’ and a Kun, for example. Still, crucial information has been left out thus far - player price. Each FPL player is attributed a cost in £millions. By adjusting for this, players who represent the best value can be unearthed.
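One simple way to make the price adjustment is to divide the expected points rate by the player's cost. This is a sketch under that assumption, and the exact adjustment behind the chart may differ. `now_cost` is the FPL price in £m as in the data above, while `xPts_90` and `xPts_90_per_m` are hypothetical column names.

```r
library(dplyr)

# Toy players; values are made up
toy <- data.frame(
  web_name = c("Premium", "Budget"),
  now_cost = c(13.0, 6.0),
  xPts_90 = c(3.9, 2.4),
  stringsAsFactors = FALSE
)

value <- toy %>%
  # expected attacking points per 90, per £1m of price
  mutate(xPts_90_per_m = xPts_90 / now_cost) %>%
  arrange(desc(xPts_90_per_m))

value
```

The cheaper player ranks first here even though the premium pick has the higher raw rate, which is the sort of reshuffle a value metric is meant to surface.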

Ashley Barnes and Danny Ings, Turf Moor behemoths present and past, top the charts here.

Reprise 🔮

This was a fairly quick run-through of how advanced football metrics can be expressed in fantasy football terms, using some bits of kit I’ve built and shared that I felt guilty about not using much. There are things I think could be explored to make this better:

  • A linearly optimized fantasy team (h/t Martin Eastwood), but an expected points variant.
  • Predicting expected clean sheet rate
    • Clean sheets are v important in FPL. Modelling teams’ expected clean sheet rate would allow for an expected points linear optimization for an entire squad.
  • Account for penalties
    • The xG figures in here include penalties. Preferable would be non-penalty goals with pens captured differently.
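To give a flavour of the first idea, here is a toy expected-points selection written as a binary linear programme with the lpSolve package. It is a sketch only: the four players, the budget and the squad size are made up, and real FPL constraints (positions, the three-per-club limit) are left out.

```r
library(lpSolve)

# Toy candidates; expected points and prices are made up
players <- data.frame(
  web_name = c("P1", "P2", "P3", "P4"),
  xPts = c(3.0, 2.5, 2.0, 1.5),
  now_cost = c(12, 6, 5, 4),
  stringsAsFactors = FALSE
)

budget <- 15      # hypothetical budget in £m
squad_size <- 2   # hypothetical number of players to pick

# Row 1: total cost within budget; row 2: exactly squad_size players picked
constraints <- rbind(players$now_cost, rep(1, nrow(players)))

fit <- lp(
  direction = "max",
  objective.in = players$xPts,
  const.mat = constraints,
  const.dir = c("<=", "=="),
  const.rhs = c(budget, squad_size),
  all.bin = TRUE  # each player is either picked (1) or not (0)
)

picked <- players[fit$solution == 1, ]
picked
```

With these numbers the most expensive player cannot be paired with anyone inside the budget, so the solver settles on the two mid-priced options.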

  1. To keep the post concise I don’t show all of the code, especially code that generates figures. But you can find all the code here.