Get Miles: using treemap to visualise running distances

By 30th September 2022, I had clocked up over 2000 km of running for the year. This milestone was a good opportunity to look at how I got to this point.

The code is shown below. First, a histogram shows how often runs of each distance occur.

Standard ggplot histogram to look at the frequency of run distances

This plot makes it clear that my runs this year include a lot of 4-5 km runs and then a chunk of 21 km plus. This is because my run commute is ~5 km (5.5 km, with a shorter summer-only route of 4.4 km), which I do a lot, and I also do a weekly long run of at least 21.1 km.

A histogram like this counts runs, so it obscures how much distance each kind of run contributes to the total: a single 10 km run counts once, yet it covers as much distance as two 5 km runs. We need a better way to visualise this.
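To make the difference concrete, here is a minimal base-R sketch comparing the number of runs in each 5 km bin with the total distance those runs cover. The distances are made-up illustrative values (a few commutes plus two long runs), not my actual data.

```r
# Hypothetical run distances in km: six commutes plus two long runs
distances <- c(4.4, 4.4, 5.5, 5.5, 5.5, 5.5, 21.1, 22.0)

# Same 5 km bins as the treemap code (intervals closed on the right)
bins <- cut(distances, breaks = seq(from = 0, to = 45, by = 5))

# What the histogram shows: the number of runs in each bin
counts <- table(bins)

# What the treemap shows: the total distance covered in each bin
totals <- tapply(distances, bins, sum, default = 0)

# Side-by-side comparison, keeping only the bins that contain runs
summary_df <- data.frame(runs = as.vector(counts),
                         total_km = as.vector(totals),
                         row.names = levels(bins))
print(summary_df[summary_df$runs > 0, ])
```

With these toy values, the (20,25] bin holds only two runs but 43.1 km, which is more than the 22 km covered by the four 5.5 km commutes in (5,10].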

Enter the treemap, which shows this information more clearly.

Treemap

Treemap of 2022 runs so far

This visualisation shows the total distance in each category as an area. The runs are sorted into 1 km bins, and those bins are then grouped into 5 km intervals.
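Each tile's area is the summed distance of the runs in its 1 km bin, and each 5 km group is the sum of its tiles. The treemap() function sums the vSize variable within each index level by default. Here is a base-R sketch of that nested aggregation, using a handful of hypothetical distances:

```r
# Hypothetical run distances in km
runs <- data.frame(Distance = c(4.4, 4.4, 5.5, 5.5, 21.1, 24.0))

# Nested bins: 5 km groups that contain 1 km bins
runs$km5 <- cut(runs$Distance, breaks = seq(0, 45, by = 5))
runs$km1 <- cut(runs$Distance, breaks = seq(0, 45, by = 1))

# One row per tile: total distance for each (5 km group, 1 km bin) pair
# (aggregate() only returns combinations that actually occur)
tiles <- aggregate(Distance ~ km5 + km1, data = runs, FUN = sum)

# One row per group: total distance for each 5 km group
groups <- aggregate(Distance ~ km5, data = runs, FUN = sum)

print(tiles)
print(groups)
```

Here the (20,25] group contains two tiles, (21,22] and (23,24], and its area equals their combined 45.1 km.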

Although there were far fewer runs of 20-25 km, together they cover more distance than the runs in the 5-10 km bracket. This was not easy to see in the histogram.

The code

```r
library(treemap)
library(ggplot2)
library(dplyr)

# load the data (output from process_data() within a timeframe of interest)
all_data <- read.csv("Output/Data/alldata_2022-01-01_2022-12-31.txt", sep = "\t")

# make histogram of running distances
p <- ggplot(all_data, aes(x = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Runs")
ggsave("Output/Plots/distanceHist.png", p)

# bin the data at 5 km and 1 km resolution
all_data <- all_data %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5)),
         km1 = cut(Distance, breaks = seq(from = 0, to = 45, by = 1)))

# two functions to rename the categories
# e.g. "(0,5]" becomes "0 - 5 km"
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}
# e.g. "(4,5]" becomes "4" (the lower edge of the 1 km bin)
rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# rename the categories to give nice labels
all_data$labelkm5 <- rename_km5(all_data$km5)
all_data$labelkm1 <- rename_km1(all_data$km1)

# PNG device
png("Output/Plots/tremap.png", width = 800, height = 800)
# vSize = "Distance": each tile's area is the summed distance of its runs
treemap(all_data,
        index = c("labelkm5", "labelkm1"),
        vSize = "Distance",
        type = "index",
        align.labels = list(
          c("left", "top"),
          c("center", "center")
        ),
        palette = "Set2",
        overlap.labels = 1,
        title = "")
dev.off()
```

A few comments on the code for anyone interested in replicating the plot. The data loaded in are runs within a timeframe of interest. I generated the file to load using some code I wrote previously. All that is needed is a data frame of runs with a column called Distance (in km).
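If you don't have a file like mine, a stand-in data frame should work. Below is a minimal sketch with hypothetical values. It also checks the two things the code relies on: a numeric Distance column, and distances within the (0, 45] km range covered by the breaks. That range matters because cut() returns NA for any value outside the breaks.

```r
# Hypothetical stand-in for the loaded file: one row per run, Distance in km
all_data <- data.frame(Distance = c(5.5, 4.4, 5.5, 21.1, 5.5, 10.0))

# The plotting code only needs a numeric Distance column
stopifnot("Distance" %in% names(all_data), is.numeric(all_data$Distance))

# cut() with breaks = seq(0, 45, ...) gives NA for runs of 0 km or over 45 km,
# so check that every run falls inside the binned range
stopifnot(all(all_data$Distance > 0 & all_data$Distance <= 45))
```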

Binning the data can be done with mutate() and cut(). This turns the distances into a factor with the chosen bin widths. Unfortunately, the default bin names (in interval notation) don’t look great on the plot, so I made two functions to reformat them into something nicer. For example, (0,5] turns into 0 - 5 km.
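An alternative (not the approach used in the code above) is to skip the renaming step and pass ready-made labels straight to cut() through its labels argument. Here is a minimal sketch that builds the same label text from the break points:

```r
breaks5 <- seq(from = 0, to = 45, by = 5)
breaks1 <- seq(from = 0, to = 45, by = 1)

# One label per interval: "0 - 5 km", "5 - 10 km", ..., "40 - 45 km"
labels5 <- paste0(head(breaks5, -1), " - ", tail(breaks5, -1), " km")

# 1 km bins are labelled by their lower edge: "0", "1", ..., "44"
labels1 <- as.character(head(breaks1, -1))

# Example distances in km (hypothetical)
d <- c(4.4, 5.5, 21.1)

km5 <- cut(d, breaks = breaks5, labels = labels5)
km1 <- cut(d, breaks = breaks1, labels = labels1)

print(as.character(km5))
print(as.character(km1))
```

This produces the same text as rename_km5() and rename_km1(). Because the labels are built from the breaks, they stay in sync if you change the bin width.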

There are several ways to customise the treemap, and I didn’t go crazy optimising it. The Set2 palette looked good to me, and setting type = "index" (colours assigned by the index variables, i.e. the distance bins) worked well for my needs.

The post title comes from “Get Miles” by Gomez from their debut LP “Bring It On”.

5 thoughts on “Get Miles: using treemap to visualise running distances”

    1. Great suggestion.

      ```r
      p2 <- ggplot(all_data, aes(x = Distance, weight = Distance)) +
        geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
        labs(x = "Distance (km)", y = "Total (km)")
      ```

      This is a good way to show the total distance in each bin: with weight = Distance, each run adds its distance to the bar height instead of a count of one.
