Milos Popovic

Using API to make maps in R

January 30, 2022 | Milos Popovic

There’s been a new kid in programming town and their name is API. In reality, API goes by the terrifying name of Application Programming Interface. It’s a web interface used to communicate with servers and access data for free in raw format. APIs allow you to retrieve data in a quick and safe way without banging your head against brick wall with poorly formatted csv files. This makes it an ideal solution for data analysis, visualization or app development. Here you can find literally thousands of public APIs from which you can retrieve raw data on, for example, NBA stats, weather conditions or travel feeds.

The good news is that APIs are built around the HTTP protocol, which means that we can use programming language like R to fetch some cool data 😎. So, in this tutorial, we’ll dive into the World Health Organization API, Global Health Observatory (GHO), the host to over 2k indicators on environment, health and safety. Incredible! 😲

APIs in R

We start off by loading usual culprits for data wrangling and visualization with two newcomers: httr and jsonlite. We’ll use the former to make a request to the WHO server and retrieve info about the available indicators, while the latter will come in handy to parse info from the returned JSON file.

if(!require("httr")) install.packages("httr") 
if(!require("jsonlite")) install.packages("jsonlite")  
if(!require("sf")) install.packages("sf")
if(!require("tidyverse")) install.packages("tidyverse")  
if(!require("grid")) install.packages("grid")  
if(!require("gridExtra")) install.packages("gridExtra") 
if(!require("giscoR")) install.packages("giscoR") 
if(!require("classInt")) install.packages("classInt")   

library(httr, quietly=T) #send request to API
library(jsonlite, quietly=T) #JSON parser and generator    
library(tidyverse, quietly=T) #data wrangling
library(sf, quietly=T) #geospatial analysis
library(grid, quietly=T) #make grid
library(gridExtra, quietly=T) #make grid
library(giscoR, quietly=T) #shapefile of Europe
library(classInt, quietly=T) #bins

windowsFonts(georg = windowsFont('Georgia'))

To make a request, we’ll use GET() followed by an url of the the GHO portal. We received a file with info on all the indicators, including the date of access as well as data type and size.

res <- GET("https://ghoapi.azureedge.net/api/Indicator")
res

Response [https://ghoapi.azureedge.net/api/Indicator]
  Date: 2022-01-30 09:57
  Status: 200
  Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
  Size: 295 kB

At this stage, we only have a list of indicators. The list is composed of two elements and one of them, value, seems to store the indicator code and name.

indicators <- fromJSON(rawToChar(res$content))
str(indicators)

List of 2
 $ @odata.context: chr "https://ghoapi.azureedge.net/api/$metadata#Indicator"
 $ value         :'data.frame':	2251 obs. of  3 variables:
  ..$ IndicatorCode: chr [1:2251] "AIR_10" "AIR_11" "AIR_12" "AIR_13" ...
  ..$ IndicatorName: chr [1:2251] "Ambient air pollution  attributable DALYs per 100'000 children under 5 years" "Household air pollution attributable deaths" "Household air pollution attributable deaths in children under 5 years" "Household air pollution attributable deaths per 100'000 capita" ...
  ..$ Language     : chr [1:2251] "EN" "EN" "EN" "EN" ...

Let’s extract the value component and store it into a new object data. Note that data is not a list anymore, but a dataframe. This provides a neat overview of the indicators avaiable at the GHO Portal.

data <- indicators$value
head(data)
>   IndicatorCode
1        AIR_10
2        AIR_11
3        AIR_12
4        AIR_13
5        AIR_14
6        AIR_15
                                                                    IndicatorName
1    Ambient air pollution  attributable DALYs per 100'000 children under 5 years
2                                     Household air pollution attributable deaths
3           Household air pollution attributable deaths in children under 5 years
4                  Household air pollution attributable deaths per 100'000 capita
5 Household air pollution  attributable deaths per 100'000 children under 5 years
6                                      Household air pollution attributable DALYs
  Language
1       EN
2       EN
3       EN
4       EN
5       EN
6       EN

There are 2251 indicators and we’ll explore traffic death rate, as it’s the hottest topic in road safety. We use grep to filter indicators that include “road traffic death” in their name. Then we select rows in our data that match this condition. We’ll be using the Estimated road traffic death rate (per 100 000 population) dataframe from the list of available dataframes.

rtd <- grep("road traffic death", data$IndicatorName, perl=T)
sel_rows <- data[rtd,]
sel_rows

IndicatorCode                                                IndicatorName
1084        RS_196                      Estimated number of road traffic deaths
1085        RS_198   Estimated road traffic death rate (per 100 000 population)
1089        RS_208            Attribution of road traffic deaths to alcohol (%)
1104        RS_246 Distribution of road traffic deaths by type of road user (%)
     Language
1084       EN
1085       EN
1089       EN
1104       EN

In the final step, we go back to the beginning of this tutorial and request the traffic data using GET() followed by an url of the indicator.

You may wonder why we took all those intermediary steps only to return to the beginning. The aim was to show you how to access and filter the whole list of available GHO indicators programmatically in R instead of going through the glossary on the website. Naturally, next time you might want to select a specific indicator at the outset and the previous chunks will help you do so in a straightforward way. You might also find more detailed instructions on how to use the GHO API here.

traffic_deaths <- GET("https://ghoapi.azureedge.net/api/RS_198")
traffic_d <- fromJSON(rawToChar(traffic_deaths$content))

Data wrangling

We just need to make a few more transformations before map-making. First, let’s grab the value element from our traffic_d list and inspect the dataframe.

td <- traffic_d$value
head(td)
        Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
1 25257658        RS_198        COUNTRY        AFG        YEAR    2017      SEX
2 25257659        RS_198        COUNTRY        AFG        YEAR    2004      SEX
3 25257660        RS_198        COUNTRY        AFG        YEAR    2000      SEX
4 25257661        RS_198        COUNTRY        AFG        YEAR    2011      SEX
5 25257662        RS_198        COUNTRY        AFG        YEAR    2017      SEX
6 25257663        RS_198        COUNTRY        AFG        YEAR    2002      SEX
  Dim1 Dim2Type Dim2 Dim3Type Dim3 DataSourceDimType DataSourceDim
1 FMLE       NA   NA       NA   NA                NA            NA
2  MLE       NA   NA       NA   NA                NA            NA
3  MLE       NA   NA       NA   NA                NA            NA
4  MLE       NA   NA       NA   NA                NA            NA
5  MLE       NA   NA       NA   NA                NA            NA
6  MLE       NA   NA       NA   NA                NA            NA
             Value NumericValue      Low     High Comments
1    6.0 [5.2-6.8]      5.99219  5.17048  6.81715       NA
2 22.8 [19.0-26.5]     22.76968 19.01817 26.51513       NA
3 22.3 [18.3-26.2]     22.26858 18.34848 26.19614       NA
4 21.9 [18.6-25.3]     21.90351 18.55693 25.25520       NA
5 23.8 [20.5-27.0]     23.76755 20.50830 27.03967       NA
6 22.7 [18.8-26.5]     22.65267 18.80891 26.48972       NA
                           Date TimeDimensionValue        TimeDimensionBegin
1 2021-02-09T16:20:04.763+01:00               2017 2017-01-01T00:00:00+01:00
2 2021-02-09T16:20:04.793+01:00               2004 2004-01-01T00:00:00+01:00
3  2021-02-09T16:20:04.81+01:00               2000 2000-01-01T00:00:00+01:00
4  2021-02-09T16:20:04.84+01:00               2011 2011-01-01T00:00:00+01:00
5 2021-02-09T16:20:04.857+01:00               2017 2017-01-01T00:00:00+01:00
6 2021-02-09T16:20:04.887+01:00               2002 2002-01-01T00:00:00+01:00
           TimeDimensionEnd
1 2017-12-31T00:00:00+01:00
2 2004-12-31T00:00:00+01:00
3 2000-12-31T00:00:00+01:00
4 2011-12-31T00:00:00+01:00
5 2017-12-31T00:00:00+01:00
6 2002-12-31T00:00:00+01:00

Ooops, it seems there are a bunch of columns and that the data is organized on the country-sex-year level. We only need info on both sexes for every country and most recent year.

So let’s first filter the dataframe to include figures for both sexes denoted as “BTSX”. Then we’ll select only ISO3 (SpatialDim), year (SpatialDim), mean point estimate of traffic deaths (NumericValue) as well as lower and upper bounds. Next, we’ll group by ISO3 code and take only the last available year via slice(which.max(TimeDim)). Finally, we’ll rename the columns to more familiar names.

td <- td %>%
	  filter(Dim1 == "BTSX") %>%
	  select(SpatialDim, TimeDim, NumericValue, Low, High) %>%
	  group_by(SpatialDim) %>% 
	  slice(which.max(TimeDim)) %>%
	  rename(ISO3 = SpatialDim,
         year = TimeDim,
         value = NumericValue)

And here is the dataframe!

head(td)
# A tibble: 6 x 5
# Groups:   ISO3 [6]
  ISO3   year value   Low  High
  <chr> <int> <dbl> <dbl> <dbl>
1 AFG    2019 15.9  13.7   18.0
2 AFR    2019 27.2  21.9   32.6
3 AGO    2019 26.1  19.9   32.4
4 ALB    2019 11.7  10.8   12.6
5 AMR    2019 15.3  12.7   18.2
6 ARE    2019  8.91  7.65  10.1

Mapping

Let’s fetch the national map of Europe using giscoR

europe <- giscoR::gisco_get_countries(
  year = "2016",
  epsg = "4326",
  resolution = "10",
  region = "Europe"
)

Next thing is to define the WGS84 and Lambert projections. We’ll use the WGS84 projection to create the bounding box and then convert it to Lambert projection, which is more suitable for mapping Europe.

crsLONGLAT <- "+proj=longlat +datum=WGS84 +no_defs"
crsLAEA <- "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +datum=WGS84 +units=m +no_defs"


bb <- st_sfc(
  st_polygon(list(cbind(
    c(-10.6600, 33.00, 33.00, -10.6600, -10.6600),
    c(32.5000, 32.5000, 71.0500, 71.0500, 32.5000) 
    ))),
  crs = crsLONGLAT)

laeabb <- st_transform(bb, crs = crsLAEA)
b <- st_bbox(laeabb)

In the following chunk, we combine the traffic accident dataframe with our Europe spatial object.

c <- right_join(td, europe, by=c("ISO3"="ISO3_CODE")) %>%
     st_as_sf() %>%
     st_transform(crs = crsLAEA)

As usual, we’ll define our breaks using Jenks breaks and provide a customized palette.

# bins
brk <- round(classIntervals(c$value, 
              n = 6, 
              style = 'equal')$brks, 0)

# define the color palette
cols = rev(c('#05204d', '#004290', '#6458ae', '#dd98d1', '#eab0a2'))
newcol <- colorRampPalette(cols)
ncols <- 7
cols2 <- newcol(ncols)
vmin <- min(c$value, na.rm=T)
vmax <- max(c$value, na.rm=T)

We’re almost ready to make our map! But there is one more thing left and that’s annotating the country polygons with traffic death rate figures. Ideally, we want the figures to be placed in the middle of each country. We run a simple function st_centroid() that calculates the centroids.

cents <- c %>% st_centroid() %>%  
  as_Spatial() %>%                  
  as.data.frame()

However, France and Russia have a much larger territory and their centroids end up misplaced on the map. So, we got to calculate their centroids using the coordinates of a more convenient place, i.e. their capitals. Both Paris and Moscow will be visible in the map. We first declare the WGS84 coordinates for the two capitals and then transform them into Lambert projection. Finally, we replace the old coordinate values for Paris and Moscow in the centroid dataframe.

moscow <- data.frame(long = 37.61556, lat = 55.75222)
paris <- data.frame(long = 2.3488, lat = 48.85341)

mc <- st_as_sf(x = moscow,                         
           coords = c("long", "lat"),
           crs = 4326) %>%
     st_transform(crs = crsLAEA) %>%  
     as_Spatial() %>%                  
     as.data.frame()

pa <- st_as_sf(x = paris,                         
           coords = c("long", "lat"),
           crs = 4326) %>%
     st_transform(crs = crsLAEA) %>%  
     as_Spatial() %>%                  
     as.data.frame()

#plug into centroids
cn <- cents %>% 
  mutate(coords.x1=ifelse(CNTR_ID=="RU", mc$coords.x1,coords.x1),
         coords.x2=ifelse(CNTR_ID=="RU", mc$coords.x2,coords.x2)) %>%
  mutate(coords.x1=ifelse(CNTR_ID=="FR", pa$coords.x1,coords.x1),
         coords.x2=ifelse(CNTR_ID=="FR", pa$coords.x2,coords.x2))       

In the chunk below, we use the spatial data with info on traffic road death rate and limit the view to a bounding box. We transform the spatial polygons into Lambert projection. After providing the colors and breaks we move to printing the values for every country.

p1 <- 
ggplot() +
geom_sf(data = c, 
  aes(fill=value), 
  color="white", 
  size=0.25) +
coord_sf(crs = crsLAEA, 
  xlim = c(b["xmin"], b["xmax"]), 
  ylim = c(b["ymin"], b["ymax"])) +
labs(y="", 
    subtitle="",
    x = "",
    title="Estimated road traffic death rate (2019)",
    caption="©2022 Milos Popovic https://milospopovic.net\nSource: WHO")+
scale_fill_gradientn(name="deaths per 100,000 people",
                       colours=cols2,
                       breaks=brk,
                       labels=brk,
					   limits = c(min(brk),max(brk)))+
geom_text(data = cn, 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      color="grey20",
      family="georg") + 
guides(fill=guide_legend(
            direction = "horizontal",
            keyheight = unit(1.15, units = "mm"),
            keywidth = unit(12, units = "mm"),
            title.position = 'top',
            title.hjust = 0.5,
            label.hjust = .5,
            nrow = 1,
            byrow = T,
            reverse = F,
            label.position = "bottom"
          )
    ) +
theme_minimal() +
theme(text = element_text(family = "georg"),
panel.background = element_blank(), 
legend.background = element_blank(),
legend.position = c(.45, .02),
panel.border = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size=16, color='#6458ae', hjust=0.5, vjust=0),
plot.subtitle = element_text(size=14, color='#ac63a0', hjust=0.5, vjust=0),
plot.caption = element_text(size=7, color="grey60", hjust=0.5, vjust=10),
axis.title.x = element_text(size=10, color="grey20", hjust=0.5, vjust=-6),
legend.text = element_text(size=9, color="grey20"),
legend.title = element_text(size=10, color="grey20"),
strip.text = element_text(size=12),
plot.margin     =   unit(c(t=1, r=-2, b=-1, l=-2),"lines"),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank())

And, here is the output! The values are printed and they show up as envisaged. Except for Albania, Croatia, and Greece, which we’ll fix in a moment. We also notice that the values higher than 9 are barely visible. So, let’s fix the misplaced labels and increase the visibility of other labels.

photo2

We’ll overwrite our previous ggplot2 object and start off with a quick fix for Albania, Croatia, and Greece. The reason why we first modify the labels for these individual countries is that filtering by value will include these countries even if we explicitly declare that we want them out by country code or name. Toying with the value for vjust and hjust allows us to move the value labels.

p1 <- 
ggplot() +
geom_sf(data = c, 
  aes(fill=value), 
  color="white", 
  size=0.25) +
coord_sf(crs = crsLAEA, 
  xlim = c(b["xmin"], b["xmax"]), 
  ylim = c(b["ymin"], b["ymax"])) +
labs(y="", 
    subtitle="",
    x = "",
    title="Estimated road traffic death rate (2019)",
    caption="©2022 Milos Popovic https://milospopovic.net\nSource: WHO")+
scale_fill_gradientn(name="deaths per 100,000 people",
                       colours=cols2,
                       breaks=brk,
                       labels=brk,
					   limits = c(min(brk),max(brk)))+
geom_text(data = filter(cn, CNTR_ID=="HR"), 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      vjust=-1.5,
      color="grey20",
      family="georg") + 
geom_text(data = filter(cn, CNTR_ID=="EL"), 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      hjust=1.5,
      color="grey20",
      family="georg") +
geom_text(data = filter(cn, CNTR_ID=="AL"), 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      hjust=1.5,
      color="grey20",
      family="georg") +      
guides(fill=guide_legend(
            direction = "horizontal",
            keyheight = unit(1.15, units = "mm"),
            keywidth = unit(12, units = "mm"),
            title.position = 'top',
            title.hjust = 0.5,
            label.hjust = .5,
            nrow = 1,
            byrow = T,
            reverse = F,
            label.position = "bottom"
          )
    ) +
theme_minimal() +
theme(text = element_text(family = "georg"),
panel.background = element_blank(), 
legend.background = element_blank(),
legend.position = c(.45, .02),
panel.border = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size=16, color='#6458ae', hjust=0.5, vjust=0),
plot.subtitle = element_text(size=14, color='#ac63a0', hjust=0.5, vjust=0),
plot.caption = element_text(size=7, color="grey60", hjust=0.5, vjust=10),
axis.title.x = element_text(size=10, color="grey20", hjust=0.5, vjust=-6),
legend.text = element_text(size=9, color="grey20"),
legend.title = element_text(size=10, color="grey20"),
strip.text = element_text(size=12),
plot.margin     =   unit(c(t=1, r=-2, b=-1, l=-2),"lines"),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank())

print(p1)

photo3

It looks much better! Now let’s make the labels more visible for the countries with high values and create the final version.

p2 <- 
p1 +
geom_text(data = filter(cn, !CNTR_ID%in%c("AL", "HR", "EL") & value < 9), 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      color="grey20",
      family="georg") + 
geom_text(data = filter(cn, !CNTR_ID%in%c("AL", "HR", "EL") & value >= 9), 
    aes(coords.x1, coords.x2, 
      label=round(value, 1)),
      size=2.75,
      color="grey80",
      family="georg")

print(p2)

photo3

In this tutorial, I showed you a few data tricks that would hopefully help you use APIs to kick off your own dataviz projects. First, I showed you how to easily fetch a dataframe from the GHO portal without downloading and loading a csv file into R. Second, you’ve learned to position labels on your map using geom_text.

Feel free to check the full code here, clone the repo and reproduce, reuse and modify the code as you see fit. The road from here is wide open as I showed you how to map only 1 out of over 2000 indicators from the GHO Portal. With a slight modification of this code, you’ll be able to produce a myriad of maps and all that programmatically in R instead of spending your precious time on manually downloading the data.

I’d be happy to hear your view on how this map could be improved or extended to other geographic realms. To do so, please follow me on Twitter, Instagram or Facebook.