May 30, 2017

Tracking London's Pub & Bar Landscape with geofacet

Introducing geofacet

The geofacet package dropped recently (I want to say last week?). It's a ggplot2 extension that lets the user facet plots in a way that retains an underlying geographical dimension. It perhaps goes a little further towards abstraction than tilegrams, but the visualizations we produce can be a bit more complex as well. This post documents my first foray into this package's functionality.[1]

Getting to grips with grids

If we think about faceting in ggplot2 (facet_wrap, facet_grid), we are already familiar with the idea of placing a sequence of plots into a number of panels, arranged according to one or more variables of interest.
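
As a quick refresher, here is a minimal sketch of ordinary faceting. It uses the mpg dataset that ships with ggplot2, which is my choice for illustration and has nothing to do with the pubs data later on.

```r
library(ggplot2)

# One panel per vehicle class; facet_wrap lays the panels out in reading order,
# with no notion of where anything sits geographically
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

p
```

The panel order here is just the order of the class values, which is exactly the part geofacet swaps out for a geographical layout.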

geofacet is based on the assumption that we want to arrange our sequence of plots into a grid that preserves some known geographical orientation. These known geographical orientations are stored as grids, which provide us with row and column values corresponding to a panel’s position in a faceted plot. Take this example, which provides grid positions for US states:

head(geofacet::us_state_grid1)
##   row col code       name
## 1   6   7   AL    Alabama
## 2   7   2   AK     Alaska
## 3   5   2   AZ    Arizona
## 4   5   5   AR   Arkansas
## 5   4   1   CA California
## 6   4   3   CO   Colorado

This grid also contains geographical data (state codes/names). We can simply join our data, which sits nicely within this geographical framework, to this grid and start ‘geofaceting’.
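
To make that concrete, here is a small sketch using made-up numbers (the toy data frame and its value column are invented purely for illustration). When the faceting variable holds the grid's codes or names, facet_geo does the matching against the grid for us.

```r
library(geofacet)
library(ggplot2)

# Made-up values for illustration only: one row per state code and year
set.seed(1)
toy <- expand.grid(state = us_state_grid1$code, year = 2001:2005,
                   stringsAsFactors = FALSE)
toy$value <- runif(nrow(toy))

# facet_geo places each state's panel at the row/col given by the grid
p <- ggplot(toy, aes(x = year, y = value)) +
  geom_line() +
  facet_geo(~ state, grid = "us_state_grid1")

p
```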

There is a function in the package, grid_submit, which lets users submit their own grids (as a dataframe, like in the above example) as a GitHub issue and hopefully get them incorporated into the package proper. Having kicked off with just the US states, there are now grids for EU countries, South African provinces, Australian states and London boroughs. Nice - my business is with the latter…

Geofaceting London

A couple of months ago, a dataset was published containing the number of public houses (pubs) and bars by local authority (borough) in London from 2001 through 2016. We’ll use this to demonstrate how this package can help us visualize trends in data without losing the geography along the way.

Below is a map of London’s boroughs, in case you aren’t familiar.

With that cleared up, let’s load up this data and get it into shape.

library(geofacet)
library(tidyverse)
library(gdata)
library(stringr)
library(viridis)
library(scales)
library(hrbrthemes)

#read/clean pubs data
pubs <- read.xls("http://files.datapress.com/london/dataset/the-number-of-public-houses-and-bars-in-london-2001-2016/2017-04-13T14:37:50.73/numberofpublichousesandbarsinlondon2001to2016.xls",
                 sheet = "Number of workplaces by LA", skip=2, stringsAsFactors=FALSE) %>%
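  # drop rows with a blank name and the row labelled "London"
  filter(!`X.1` %in% c("", "London")) %>%
  # reshape from one column per year to one row per borough-year
  gather(key=year, value=count, -X, -`X.1`) %>%
  rename(code_ons=X, area_name=`X.1`) %>%
  # year columns come through prefixed with "X"; strip it to recover the year
  mutate(count=as.numeric(count), year=as.numeric(str_replace(year, "X", ""))) %>%
  # add year-on-year pct change (each borough's first year is compared with itself, giving 0)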
  arrange(area_name) %>%
  group_by(area_name) %>%
  mutate(prev=lag(count, order_by=year, default=first(count)),
         yoy_chg=(count-prev)/prev)

glimpse(pubs)
## Observations: 528
## Variables: 6
## $ code_ons  <chr> "E09000002", "E09000002", "E09000002", "E09000002", ...
## $ area_name <chr> "Barking and Dagenham", "Barking and Dagenham", "Bar...
## $ year      <dbl> 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009...
## $ count     <dbl> 45, 40, 45, 40, 40, 40, 30, 30, 25, 25, 25, 30, 20, ...
## $ prev      <dbl> 45, 45, 40, 45, 40, 40, 40, 30, 30, 25, 25, 25, 30, ...
## $ yoy_chg   <dbl> 0.00000000, -0.11111111, 0.12500000, -0.11111111, 0....
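
If the lag step looks a bit opaque, here is the same calculation on a tiny made-up table (the boroughs "A" and "B" and their counts are invented for illustration). Note that because the default for the lag is the first count in each group, the first year gets a change of 0 rather than NA.

```r
library(dplyr)

# Made-up counts for two hypothetical boroughs
toy <- tibble(
  area_name = rep(c("A", "B"), each = 3),
  year      = rep(2001:2003, times = 2),
  count     = c(100, 90, 99, 50, 50, 40)
)

toy_chg <- toy %>%
  group_by(area_name) %>%
  # previous year's count within each borough; the first year falls back to itself
  mutate(prev    = lag(count, order_by = year, default = first(count)),
         yoy_chg = (count - prev) / prev) %>%
  ungroup()

toy_chg$yoy_chg
# borough A: 0, -0.1, 0.1; borough B: 0, 0, -0.2
```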

So, we have a tidy dataframe with yearly counts and year-on-year change in pub numbers by borough. Let’s peek at the existing geofacet grid for London boroughs.

glimpse(london_boroughs_grid)
## Observations: 33
## Variables: 4
## $ row      <int> 4, 4, 2, 5, 3, 6, 3, 6, 3, 1, 5, 3, 4, 2, 2, 3, 3, 4,...
## $ col      <int> 5, 8, 4, 8, 3, 6, 4, 5, 2, 5, 7, 6, 2, 5, 3, 8, 1, 1,...
## $ code_ons <chr> "E09000001", "E09000002", "E09000003", "E09000004", "...
## $ name     <chr> "City of London", "Barking and Dagenham", "Barnet", "...

We’re ready to join these on the ONS code field. I’ll also put in some line breaks to the borough names - I think some of these might be too long for a densely paneled plot, otherwise.

#clean boro strings in pubs df
pubs$area_name <- str_replace(pubs$area_name, " ", "\n")

#clean boro strings in geofacet grid df
london_boroughs_grid$name <- str_replace(london_boroughs_grid$name, " ", "\n")

# join pubs w/ grid
pubs <- inner_join(pubs, london_boroughs_grid, by ="code_ons") %>%
  select(-name)
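
One thing worth knowing about the string cleaning above: str_replace only swaps the first match, so a three-word borough name gets a single line break. A small sketch of the difference, using a few borough names as sample input:

```r
library(stringr)

boro <- c("Barking and Dagenham", "Tower Hamlets", "Camden")

# str_replace touches only the first space in each string
first_only <- str_replace(boro, " ", "\n")

# str_replace_all would break at every space instead
all_spaces <- str_replace_all(boro, " ", "\n")
```

Here first_only gives "Barking\nand Dagenham" while all_spaces gives "Barking\nand\nDagenham". Which one reads better in a small panel is a matter of taste.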

Ready to plot. All we need to do is call facet_geo in the same way we might have faceted by borough using facet_wrap previously. This has to be accompanied by a grid argument, specifying the grid we want to facet with. We can go on to tweak the plot using familiar ggplot2 commands.

Here’s my first effort:

# geofaceted plot of year-on-year pubs % change 
ggplot(data = pubs, aes(x=year, y=yoy_chg, fill=yoy_chg)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  # pass the grid object modified above, so the line-broken names match area_name
  facet_geo( ~ area_name, grid = london_boroughs_grid) +
  scale_x_continuous(breaks = c(2001, 2016), 
                     labels = NULL) +
  scale_y_percent(limits=c(-0.35,0.25), breaks=c(-0.3, 0, 0.25), labels = NULL) +
  labs(title = "The Evolution of London's Pubs & Bars",
       subtitle = "No. of Public Houses and Bars by London borough, 2001-2016",
       caption = "Source: data.london.gov.uk", x = "", y = "") +
  theme_minimal(base_family = "Arial Narrow", base_size = 14) +
  scale_fill_viridis(option="D",limits=c(-0.35,0.25), name="Year-on-year\n% change",
                     labels=percent) +
  theme(legend.position=c(.9, .95))

Close

The package author, Ryan Hafen, accepts that the readability of some of these plots relies on some pre-existing knowledge of the underlying geography (Ryan also says to watch out for some ideas to deal with this). Bearing this in mind, there are opportunities to make eye-catching plots on the occasions when this assumption holds (see this New York Times example). I reckon I’m gonna have fun with it…


  1. To keep the post concise I don’t show all of the code, especially code that generates figures. But you can find the full code here.

© Ewen Henderson 2017-18