Your first D3 visualisation with {r2d3} and Scooby-Doo
Get the code for this blog on GitHub
What is this tutorial and who is it for?
This tutorial is aimed mainly at R users who want to learn a bit of D3, and specifically those who are interested in how you can incorporate D3 into your existing workflows in RStudio. It will gloss over a lot of the fundamentals of D3 and related topics (JavaScript, CSS, and HTML) to fast-forward the process of creating your first D3.js visualisation. It will therefore be far from a comprehensive guide. I’ve tried to include what I think is important, but if you have absolutely no experience with any of those topics you will almost definitely be left with some questions. Hopefully, the satisfaction of creating your first plot will inspire you to break and tweak the code I have provided to learn more.
What is D3?
D3.js, or just D3 as it’s more often referred to, is a JavaScript library used for creating interactive data visualisations optimised for the web. D3 stands for Data-Driven Documents. It is commonly used by those who enjoy making creative or otherwise unusual visualisations as it offers you a great deal of freedom as well as options for interactivity such as animated transitions and plot zooming.
Why should I care?
One benefit of D3 is its aforementioned creative control. Another benefit is that rather than creating raster images (e.g. PNG, JPEG) like a lot of plotting libraries it renders your figures as SVGs (scalable vector graphics), which stay crisp no matter how far you zoom in and are generally faster to load (note: when there are many data points, an SVG may be slower than a raster image, learn more about which image file type to use in our blog post on image formats). If you are an R user, you should also care because the {r2d3} package lets you easily incorporate D3 visualisations into your R workflow, and use them in e.g. R Markdown reports or R Shiny dashboards.
Is learning D3 worth the effort?
The short answer is: it depends. It can be quite tricky and time-consuming to learn D3 and all associated skills (JavaScript, HTML, CSS) if you have no previous experience. On the other hand, learning D3 can be a fun way to take your first steps into web development technologies. Furthermore, you may be perfectly happy with available plotting libraries in R, e.g. {ggplot2}, as what they offer is indeed highly flexible and suitable for interactivity. You can even save ggplot plots as SVG with ggsave() and svglite. Therefore, I don’t think learning D3 is a necessity for data visualisation, but it can be an addition to your skill set and can be a great first step into creative coding or web development.
What is {r2d3}?
If you are still with me, let’s get into {r2d3}.
{r2d3} is an R package that lets you create D3 visualisations with R. One way it enhances this process is by being able to translate between R objects and D3-friendly data structures. This means that you can clean your data in R, and then just plot it using D3 without having to go near any data wrangling using JavaScript. Another cool feature is that you can create D3-rendering chunks in an R Markdown file that will preview inline, so you can easily incorporate a D3 visualisation in your reports. You can also easily add a D3 visualisation to a Shiny app using the renderD3() and d3Output() functions. If you need help with a Shiny Application, we can help.
The basics
OK, let’s get set up to create our first D3 visualisation in RStudio. We’re gonna be using this fun dataset on Scooby-Doo manually aggregated by user plummeye. We are gonna make a line chart that shows the cumulative total number of monsters caught by each member of Mystery Incorporated. Then we will add some unique D3 flair to it to make an unusually painful line chart worth it.
First, you’ll need to install the {r2d3} package as usual.
install.packages("r2d3")
This allows you to write D3 in RStudio in two main ways:
- D3 chunks in an .Rmd file
- A D3 script - a .js file with some autopopulated D3 code
For this blog post, we will write our code in a separate .js file, but run it from an R Markdown chunk to preview it. It is also possible to preview your code directly from the script, but this way will hopefully show you how easily you can include D3 visualisations in an R Markdown report.
So, we will start by creating two files:
- An R Markdown document: scoobydoo.Rmd
- A D3 script: scoobydoo.js
To ensure that the files are able to interact with each other, I recommend working in an RStudio project (File > New Project) with both files at the .Rproj level.
Data cleaning in R
You will need to install some packages for the cleaning steps, which you can install with this line of code:
install.packages(c("dplyr", "lubridate", "r2d3",
"stringr", "tidyr", "tidytuesdayR"))
In your .Rmd file, you can copy the following steps to load the necessary packages, read in the data, and clean it in preparation for our D3 visualisation. We won’t go through these steps as this blog post assumes you know R and some basic Tidyverse already! If you don’t, we offer courses to help you get started! You can download the data we will be using manually from here if you prefer reading it in from a CSV file.
# in scoobydoo.Rmd
library("dplyr")
library("tidyr")
library("stringr")
library("lubridate")
# load data from tidytuesday
tuesdata = tidytuesdayR::tt_load(2021, week = 29)
scoobydoo = tuesdata$scoobydoo
# wrangling data into nice shape
monsters_caught = scoobydoo %>%
  select(date_aired, starts_with("caught")) %>%
  mutate(across(starts_with("caught"), ~ as.logical(.))) %>%
  pivot_longer(cols = caught_fred:caught_not,
               names_to = "character",
               values_to = "monsters_caught") %>%
  drop_na() %>%
  filter(!(character %in% c("caught_not", "caught_other"))) %>%
  mutate(year = year(date_aired), .keep = "unused") %>%
  group_by(character, year) %>%
  summarise(caught = sum(monsters_caught),
            .groups = "drop_last") %>%
  mutate(
    cumulative_caught = cumsum(caught),
    character = str_remove(character, "caught_"),
    character = str_to_title(character),
    character = recode(character, "Daphnie" = "Daphne")
  )
I recommend investigating the resulting columns of the data by printing monsters_caught at this stage, as it will help you better understand the D3 code later on. You will see that there are four columns: character, which contains the names of our Mystery Inc. members (Daphne, Fred, Scooby, Shaggy, and Velma); year, which contains years between 1969 and 2021 obtained from when the episode was aired; caught, which contains how many monsters were caught by each member in each year; and cumulative_caught, which is the cumulative sum of monsters caught by each member.
We are going to add a final column which will contain a unique colour for each character, so that our line chart will look a bit nicer. The colours are represented by hex codes obtained from official artwork of the characters.
# setting up colors for each character
character_hex = tribble(
  ~ character, ~ color,
  "Fred", "#76a2ca",
  "Velma", "#cd7e05",
  "Scooby", "#966a00",
  "Shaggy", "#b2bb1b",
  "Daphne", "#7c68ae"
)
monsters_caught = monsters_caught %>%
  inner_join(character_hex, by = "character")
We will also add a new chunk which includes the following code:
library("r2d3")
r2d3(data = monsters_caught,
     script = "scoobydoo.js",
     d3_version = "5")
The r2d3() function lets you communicate with our scoobydoo.js script using the monsters_caught tibble that we’ve created in R. As our script is currently empty, nothing shows up when you run this line. After we add some new code to our scoobydoo.js script we can go back to scoobydoo.Rmd and re-run this line to view the output. We are specifying our D3 version as 5 to ensure our code will continue to work despite potentially breaking updates to D3.
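To make the hand-off from R to D3 a bit more concrete, here is a minimal sketch of the shape the data takes inside the script: one object per row, which is why the later code can write things like d.year or d.cumulative_caught. The rowsFromColumns helper and the numbers below are made up purely for illustration; they are not part of {r2d3} and are not the real Scooby-Doo counts.

```javascript
// Hypothetical helper (not part of {r2d3}): turns column-oriented data,
// which is how a data frame is laid out in R, into row-oriented objects,
// which is the shape the D3 code in this post iterates over.
function rowsFromColumns(columns) {
  const names = Object.keys(columns);
  const nRows = columns[names[0]].length;
  const rows = [];
  for (let i = 0; i < nRows; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

// Made-up values, only to show the shape
const exampleColumns = {
  character: ["Fred", "Fred", "Velma"],
  year: [1969, 1970, 1969],
  cumulative_caught: [2, 5, 1]
};

const exampleRows = rowsFromColumns(exampleColumns);
console.log(exampleRows[1]);
```

Each element of exampleRows plays the role of the d argument in the anonymous functions used later on.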
Your first lines of D3
Okay, let’s add some code to our D3 script. We are defining some variables as constants that set up the size of our margins, plot width and height, and some font and line sizes for later on. Defining our constants at the top makes them easy to find and change if we want to change the sizes throughout our script.
Note: Comments in JavaScript are denoted by //, and variable names are often written in camelCase.
Another important concept introduced in the code below is attributes. An SVG element has a number of properties, and these can be set as attributes. For example, here we are setting the width attribute of the SVG as the width of our (upcoming) plot plus the left and the right margin (white space around the plot). Finally, we set up a group that will represent the plot inside our SVG element, and then move this plot to start where the left and top margin end using the “transform” attribute.
// in scoobydoo.js
// set up constants used throughout script
const margin = {top: 80, right: 100, bottom: 40, left: 60}
const plotWidth = 800 - margin.left - margin.right
const plotHeight = 400 - margin.top - margin.bottom
const lineWidth = 3
const mediumText = 18
const bigText = 28
// set width and height of svg element (plot + margin)
svg.attr("width", plotWidth + margin.left + margin.right)
  .attr("height", plotHeight + margin.top + margin.bottom)
// create plot group and move it
let plotGroup = svg.append("g")
  .attr("transform",
        "translate(" + margin.left + "," + margin.top + ")")
If we run our r2d3() line in R Markdown again, the output is still empty, but if we right-click on the space below our chunk and click “Inspect Element”, we can now see that there is indeed an SVG element (everything inside the SVG tags <svg> </svg>), with the width and height that we’ve provided in the SVG attributes. Getting comfortable with using either the RStudio Developer Tools to inspect the element, or inspecting it in a browser, will help you more easily understand D3 visualisations.
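If the arithmetic behind the margins feels abstract, the following sketch works through it in plain JavaScript, without D3 or an SVG element. It uses the same numbers as the constants above; the names svgWidth, svgHeight and plotTransform are only introduced here for illustration.

```javascript
// Same numbers as the constants in scoobydoo.js
const margin = {top: 80, right: 100, bottom: 40, left: 60};
const plotWidth = 800 - margin.left - margin.right;
const plotHeight = 400 - margin.top - margin.bottom;

// Full SVG size: plot plus the margins around it
const svgWidth = plotWidth + margin.left + margin.right;
const svgHeight = plotHeight + margin.top + margin.bottom;

// The string that ends up in the "transform" attribute of the plot group
const plotTransform = "translate(" + margin.left + "," + margin.top + ")";

console.log(svgWidth, svgHeight, plotWidth, plotHeight, plotTransform);
// prints: 800 400 640 280 translate(60,80)
```

So the plot group is a 640 by 280 area that starts 60 pixels from the left edge and 80 pixels from the top edge of the SVG.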
Adding axes
Next, let’s create some axes. At the bottom of scoobydoo.js, add the following lines, which define two functions, xAxis and yAxis. These will be used to scale our data to a coordinate system.
// x-axis values to year range in data
// x-axis goes from 0 to width of plot
let xAxis = d3.scaleLinear()
  .domain(d3.extent(data, d => { return d.year; }))
  .range([ 0, plotWidth ]);
// y-axis values to cumulative caught range
// y-axis goes from height of plot to 0
let yAxis = d3.scaleLinear()
  .domain(d3.extent(data, d => { return d.cumulative_caught; }))
  .range([ plotHeight, 0]);
We set the limits of the x- and y-axes to be between the min and max of the respective columns (returned by d3.extent with an anonymous function returning the values from the respective column). We then define the actual length of our axes to be our full plot width and plot height. Notice that the range of the y-axis runs from the plot height to 0: SVG y coordinates increase downwards, so this puts the smallest value at the bottom of the plot and the largest at the top.
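To see what a linear scale does with a domain and a range, here is a small hand-rolled sketch. The linearScale function is a hypothetical stand-in written for this post, not D3’s implementation, and the y domain of 0 to 100 is a made-up example rather than the real range of the data.

```javascript
// Hypothetical stand-in for d3.scaleLinear().domain(...).range(...):
// maps a value in the domain to a proportional position in the range.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const plotWidth = 640;
const plotHeight = 280;

// years map onto pixels from left (0) to right (plotWidth)
const x = linearScale([1969, 2021], [0, plotWidth]);
// made-up domain; note the flipped range (plotHeight down to 0)
const y = linearScale([0, 100], [plotHeight, 0]);

console.log(x(1969), x(1995), x(2021)); // prints: 0 320 640
console.log(y(0), y(50), y(100));       // prints: 280 140 0
```

The flipped y range is what makes larger values land closer to the top of the plot.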
Then, let’s add these axes to the plot. We move the x-axis to the bottom of the plot, and create the axes with the built-in D3 functions d3.axisBottom (a bottom horizontal axis) and d3.axisLeft (a left vertical axis), each of which requires a scale (which we created with d3.scaleLinear in our xAxis and yAxis functions). We also set stroke widths and font sizes for both axes.
// add x-axis to plot
// move x axis to bottom of plot (height)
// format tick values as date (no comma in e.g. 2,001)
// set stroke width and font size
plotGroup.append("g")
  .attr("transform", "translate(0," + plotHeight + ")")
  .call(d3.axisBottom(xAxis).tickFormat(d3.format("d")))
  .attr("stroke-width", lineWidth)
  .attr("font-size", mediumText);
// add y-axis to plot
// set stroke width and font size
plotGroup.append("g")
  .call(d3.axisLeft(yAxis))
  .attr("stroke-width", lineWidth)
  .attr("font-size", mediumText);
Adding lines
Now, we need to reformat our data slightly to be able to create a line chart with multiple lines. Each line will represent a Mystery Inc. member, so we want to create a hierarchical tree structure with the data for each character nested inside a separate key.
// turns data into nested structure for multiple line chart
// d3.nest() no longer available in D3 v6 and above hence version set to 5
let nestedData = d3.nest()
  .key(d => { return d.character;})
  .entries(data);
Here, d => {return d.character} defines an anonymous function which takes each row of our data as an input and returns its character value, so that key() can create a separate key for each character. We then supply the data whose values should be grouped under those keys inside entries(). You can investigate the structure of the nested data by running nestedData in the JavaScript console when in “Inspect Element” mode.
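If you would rather see the nested structure without opening the console, the sketch below builds the same kind of output by hand: an array with one object per character, each holding a key and a values array. The nestByKey function is a hypothetical stand-in for the d3.nest() call above, and the rows are made-up values.

```javascript
// Hypothetical stand-in for d3.nest().key(keyFn).entries(rows)
function nestByKey(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(row);
  }
  // one {key, values} object per group, in order of first appearance
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Made-up rows, only to show the shape
const rows = [
  {character: "Fred", year: 1969, cumulative_caught: 2},
  {character: "Fred", year: 1970, cumulative_caught: 5},
  {character: "Velma", year: 1969, cumulative_caught: 1}
];

const nested = nestByKey(rows, d => d.character);
console.log(nested.map(d => d.key));  // prints: [ 'Fred', 'Velma' ]
console.log(nested[0].values.length); // prints: 2
```

This is why the later code reaches into d.values to get at the rows for one character.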
Then, we create a path element which will have a new class defined by us called drawn_lines (we can create a new class called whatever we want in the class attribute) so that we can access this specific path element later on. We define another anonymous function to color the line by the hex codes in our color column. Finally, we define how we want the path to use our data: it will be a line (d3.line) whose x position is determined by our year column, and y position by our cumulative_caught column.
let path = plotGroup.selectAll(".drawn_lines")
  .data(nestedData)
  .enter()
  .append("path")
  // set up class so only this path element can be removed
  .attr("class", "drawn_lines")
  .attr("fill", "none")
  // color of lines from hex codes in data
  .attr("stroke", d => {return d.values[0].color})
  .attr("stroke-width", lineWidth)
  // draw line according to data
  .attr("d", d => {
    return d3.line()
      .x(d => { return xAxis(d.year);})
      .y(d => { return yAxis(d.cumulative_caught);})
      (d.values)
  })
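It can help to know what ends up in the “d” attribute of each path: a string of drawing commands, where M moves to the first point and L draws a straight line to each following point. The sketch below builds such a string by hand. The linePath function is a hypothetical stand-in for a d3.line() with straight segments, and the points are made-up pixel coordinates.

```javascript
// Hypothetical stand-in for a d3.line() with straight segments:
// "M" moves to the first point, "L" draws a line to each later point.
function linePath(points, xFn, yFn) {
  return points
    .map((p, i) => (i === 0 ? "M" : "L") + xFn(p) + "," + yFn(p))
    .join("");
}

// Made-up points, already in pixel coordinates
const pixels = [
  {px: 0, py: 280},
  {px: 320, py: 140},
  {px: 640, py: 0}
];

const dAttr = linePath(pixels, p => p.px, p => p.py);
console.log(dAttr); // prints: M0,280L320,140L640,0
```

In the real script the x and y accessors also pass each value through xAxis and yAxis, so the data is converted to pixels on the way in.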
Adding text
Now we will add a plot title. Create a text element for the plot title, defining where it is anchored, the x and y position of the anchor, what the actual text says, and its color, font size and font weight. We append the text to the whole svg, rather than just the plot, so that the title sits above the tallest point of the y axis (the top of the plotGroup).
// create plot title
svg.append("text")
  .attr("text-anchor", "start")
  .attr("x", margin.left)
  .attr("y", margin.top/3)
  .text("Monsters caught by Mystery Inc. members")
  .attr("fill", "black")
  .attr("font-size", bigText)
  .attr("font-weight", "bold")
Now we’ll create legend labels for each line which will identify which character each line belongs to. Here, we create another group in our plot that is going to contain text from nestedData. We set some attributes in terms of how it will look, as well as give it a custom class name_labels. We also decide where these labels will go, giving them an x position slightly after the last data point on the x axis (2021) and a y position based on the location of the final value on the y axis (where the line ends). The text and color of the label will depend on the character and color columns in the dataset.
// create legend labels i.e. character names
plotGroup.append("g")
  .selectAll("text")
  .data(nestedData)
  .enter()
  .append("text")
  // add class so name_labels can be removed in drawLines()
  .attr("class", "name_labels")
  .style("font-weight", "bold")
  .style("font-size", mediumText)
  // set location for labels (at the end)
  .attr("x", xAxis(2021) + mediumText/2)
  .attr("y", (d, i) => yAxis(d.values[d.values.length-1].cumulative_caught) + mediumText/3)
  .attr("fill", d => {return d.values[0].color})
  .text(d => {return d.values[0].character})
Adding transitions
First, we will add a transition for the labels we just created. By wrapping our plot-creating code in functions we can recreate the plot at specific times. We will start by wrapping everything in the previous chunk inside a function called drawLabels() and add a transition which takes the labels from fully transparent to fully opaque over 500 milliseconds, giving them a “fade in” effect.
function drawLabels() {
  <insert code from previous chunk in here>
    .attr("opacity", 0)
    .transition()
    .duration(500)
    .attr("opacity", 1)
}
We are also gonna create a transition for the lines that makes them appear as if they’re being drawn from start to end. Unfortunately, the easiest way to do this relies on some trickery with the stroke-dasharray attribute of each line. This attribute defines the dashed pattern of a line. So far, the lines on our plot are completely solid. We will introduce a dash pattern so large that the dash and the gap between dashes are each as long as the line itself. We then grow the dash from nothing to the full length of the line, which makes it appear that the line is being drawn over time.
To do this, we need to create two functions. The first, tweenDash(), measures the total length of a line and returns a function that takes the progress of the transition (a number from 0 to 1) and returns the stroke-dasharray value for that “frame” of the animation. D3 keeps calling this function until the dash covers the entire length of the line, making it fully visible. It will take 2500ms to do this, as defined by duration(2500).
The other function, lineTransition(), takes a path (i.e. line) selection as an argument, starts a transition on it, and uses attrTween() to apply the values produced by tweenDash() to each path’s stroke-dasharray attribute. Note that when the transition ends (.on("end", ...)), our drawLabels function is called. This is to ensure that the labels appear only when the lines have fully appeared.
function tweenDash() {
  let l = this.getTotalLength(),
      i = d3.interpolateString("0," + l, l + "," + l);
  return function(t) { return i(t) };
}
function lineTransition(path) {
  path.transition()
    .duration(2500)
    .attrTween("stroke-dasharray", tweenDash)
    .on("end", () => {
      drawLabels();
    });
}
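To get a feel for the “frames” of this animation, the sketch below computes the dash pattern at a few points in the transition. The dashAt function is a hypothetical, hand-written equivalent of interpolating from "0,l" to "l,l", and the path length of 200 is a made-up number; in the real script the length comes from getTotalLength().

```javascript
// Hypothetical equivalent of interpolating "0,l" to "l,l" at progress t,
// where t runs from 0 (start of transition) to 1 (end of transition).
function dashAt(totalLength, t) {
  const dash = totalLength * t; // visible part of the line grows with t
  const gap = totalLength;      // gap is always long enough to hide the rest
  return dash + "," + gap;
}

const totalLength = 200; // made-up path length in pixels

for (const t of [0, 0.25, 0.5, 1]) {
  console.log(t, dashAt(totalLength, t));
}
// prints:
// 0 0,200
// 0.25 50,200
// 0.5 100,200
// 1 200,200
```

At t = 0 the dash has zero length, so nothing is visible; at t = 1 the dash is as long as the line, so all of it is.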
Now, wrap your line-drawing code (the code chunk starting with let path =) in a new function called drawLines(). We add two new lines at the top which remove any previously drawn lines and labels. We chain on a call to the lineTransition() function at the end of our path code.
function drawLines() {
  // remove previously drawn lines when re-drawing
  plotGroup.selectAll(".drawn_lines").remove()
  // remove labels e.g. "Daphne" when re-drawing
  plotGroup.selectAll(".name_labels").remove()
  <code which starts with 'let path =' goes here>
    .call(lineTransition)
}
Finally, add a line to call our new drawLines() function at the bottom of the script.
drawLines()
Now we have a working, animated D3 visualisation! I’ve added a button to the blogpost to redraw the plot, but you should see the graph animate as you re-run your r2d3() line.
Make it resizable
You might’ve already noticed that your local plot is of a static size and if you resize your RStudio window, your plot gets cut off. Luckily, {r2d3} comes with built-in width and height objects that change based on the size of the plot container. This means that we can use these variables to make our plot flexibly resize as we resize the window.
If you want to keep similar proportions between the margins, plot width and height, and line and text sizes, you can replace your constant-defining code at the top with the following. You can play around with the multipliers to determine what relationships you want between sizes.
const margin = {top: 0.1 * width,
                right: 0.125 * width,
                bottom: 0.05 * width,
                left: 0.075 * width}
const plotWidth = width - margin.left - margin.right
const plotHeight = height - margin.top - margin.bottom
const lineWidth = 0.004 * plotWidth
const mediumText = 0.03 * plotWidth
const bigText = 0.04 * plotWidth
Now, if you re-run your plot, it should automatically resize when you change the size of the window. And notice, because the plot is an SVG (scalable vector graphics) element, our plot stays sharp as we make it bigger or smaller.
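To see how the multipliers behave at different container sizes, here is a small sketch that wraps them in a function. The computeLayout helper is hypothetical and only exists for this illustration; in the real script, width and height are supplied by {r2d3}. At a width of 800 and a height of 400, the multipliers give margins of roughly 80, 100, 40 and 60, matching the fixed constants we started with.

```javascript
// Hypothetical helper wrapping the multipliers from the constants above.
// In scoobydoo.js, width and height come from {r2d3} instead of arguments.
function computeLayout(width, height) {
  const margin = {
    top: 0.1 * width,
    right: 0.125 * width,
    bottom: 0.05 * width,
    left: 0.075 * width
  };
  const plotWidth = width - margin.left - margin.right;
  const plotHeight = height - margin.top - margin.bottom;
  return {
    margin,
    plotWidth,
    plotHeight,
    lineWidth: 0.004 * plotWidth,
    mediumText: 0.03 * plotWidth,
    bigText: 0.04 * plotWidth
  };
}

// Two made-up container sizes
const small = computeLayout(400, 200);
const large = computeLayout(800, 400);

console.log(small.plotWidth, small.mediumText);
console.log(large.plotWidth, large.mediumText);
```

Because every size is derived from width and height, doubling the container doubles the plot width, the line width and the text sizes together.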
Get the final .Rmd and .js files
Summary
We’ve now created our first D3 visualisation from scratch using the {r2d3} package in RStudio! As you can see, creating a line chart with many lines requires a lot of code and so, if you’re creating a basic plot for non-aesthetic purposes, sticking to {ggplot2} may make more sense. However, if you want your plot to be an interactive website statement piece or a creative, user-driven exploration of data or ideas, D3 may better suit your needs. As this blogpost was aimed at beginners, the end result is not particularly dramatic, but if this has inspired you to learn more, I have provided some links to some amazing D3 creators and resources below.
Further resources
If you are looking for more comprehensive materials to learn D3, I highly recommend these two video tutorials by Curran Kelleher: Data Visualization with D3.js and Data Visualization with D3, JavaScript, React. Moreover, The D3.js Graph Gallery by Yan Holtz is a good reference website to see what kind of plots you can make and how. Check out Observable for plenty of creative community-made D3 visualisations. Finally, if you need to be convinced that you can make cool stuff in D3, I highly recommend checking out Shirley Wu, Nadieh Bremer, and Amelia Wattenberger.