Dr Colin Gillespie

Colin has been using R since 1999. He’s the author of a number of R packages and has published the book Efficient R Programming with O’Reilly.

litmus: Maintainer Criteria

Author: Colin Gillespie

Published: July 15, 2025

How often do bugs get fixed? Does the package use source control? Is the package a solo or a group effort? These questions aid our understanding of the long-term viability of a package, and how "risky" it is.

tags: r, litmus, validation, maintainers, scoring

R Package Quality: Code Quality

Author: Colin Gillespie

Published: July 10, 2025

Code quality is what typically comes to mind when talking about "good packages". Does the package pass standard checks? What is the unit test coverage? How many dependencies does the package have? This post discusses how we use code quality when determining the package litmus score.

tags: r, litmus, validation, code, scoring

R Package Quality: Documentation

Author: Colin Gillespie

Published: July 3, 2025

Not all R packages are clearly “good” or “risky”; most fall somewhere in between. This post introduces our scoring framework around package documentation. We investigate the different measures, then look at a few well-known packages.

tags: r, litmus, validation, documentation, scoring

R Package Quality: Package Popularity

Author: Colin Gillespie

Published: June 26, 2025

The popularity of an R package can be measured using package downloads and reverse dependencies. While these measures clearly have issues, they do provide a good indication of the package health. This post looks at these measures in more detail.

tags: r, litmus, validation, popularity, scoring

R Package Quality: Validation and beyond!

Authors: Colin Gillespie & Astrid Radermacher

Published: June 19, 2025

Not all R packages are clearly “good” or “risky”; most fall somewhere in between. This post introduces a scoring framework to help users assess package quality, based on documentation, code, maintenance, and popularity. We also share key principles to ensure the scores are useful, fair, and adaptable to different contexts.

tags: r, litmus, validation, scoring

The Jumping Rivers Dashboard Gallery

Authors: Russ Hyde, Keith Newman, Pedro Silva, Tim Brock & Colin Gillespie

Published: April 15, 2025

At Jumping Rivers we love data dashboards and are delighted to announce the release of a gallery to showcase our application-development skills.

tags: dashboard, shiny, r, python

Introducing Litmus Dashboard

Authors: Pedro Silva, Astrid Radermacher & Colin Gillespie

Published: April 7, 2025

The Litmusverse is a suite of R tools that automates the risk assessment, scoring, and reporting of R packages—supporting compliance in regulated environments. Built for use across CRAN, Bioconductor, and internal repos, it helps teams streamline package approval and maintain a production-ready R ecosystem.

tags: r, litmusverse, litmus, validation

Should I Use Your R Package?

Authors: Astrid Radermacher & Colin Gillespie

Published: March 31, 2025

Wondering if your R package collection is any good? It all boils down to your risk appetite. Happy to wing it? Go for it! Need rock-solid reliability? You'll want proper validation.

tags: r, litmus, validation

Diffify & Posit Package Manager

Authors: Colin Gillespie & Myles Mitchell

Published: December 12, 2024

The latest version of Posit Package Manager allows us to add metadata to package pages. This means we can now directly link R and Python packages to diffify.com!

tags: r, python, diffify, posit

A timeline of R's first 30 years

Authors: Tim Brock, Colin Gillespie & the Jumping Rivers Team

Published: June 27, 2024

R has come a long way since its initial public release in August 1993. Explore some highlights of the last thirty years in an interactive timeline.

tags: r, shiny, tidyverse, packages

Vetiver: Model Deployment

Author: Colin Gillespie

Published: June 20, 2024

Part 2 of our series of blogs on vetiver for MLOps. In this post, we demonstrate how to deploy a machine learning model to production using Docker, Posit Connect, and SageMaker. Docker allows developers to bundle application code with necessary dependencies, simplifying deployment. We outline the process of creating a Dockerfile with the {vetiver} package and running the model locally. Additionally, we show how to publish the model to Posit Connect and SageMaker for broader accessibility.

tags: r, vetiver, machine-learning, production, mlops

Vetiver: First steps in MLOps

Author: Colin Gillespie

Published: June 13, 2024

Part 1 of our series of blogs on vetiver for MLOps. This post introduces MLOps and its integration into the traditional data science workflow, focusing on continuous model deployment and maintenance. It demonstrates automating data importation, creating a model with {tidymodels}, and using {vetiver} to store and deploy the model. The process includes creating an API with {plumber} and deploying it locally. Finally, it verifies the API functionality, setting the stage for future production deployments.

tags: r, vetiver, machine-learning, production, mlops

Parquet vs the RDS Format

Author: Colin Gillespie

Published: February 1, 2024

A benefit of using the {arrow} package with parquet files is it enables you to work with ridiculously large data sets from the comfort of an R session. In this post we explore the timescales associated with different methods of data storage.

tags: r, arrow, parquet, rds

Reading and Writing Data with {arrow}

Author: Colin Gillespie

Published: January 18, 2024

Apache Arrow is a cross-language format for super fast in-memory data. It's designed for efficient analytic operations. In this post, we look at reading and writing data using Arrow and the advantages of the parquet file format.

tags: r, arrow, parquet

Security Headers for Shiny Applications

Author: Colin Gillespie

Published: January 11, 2024

Shiny applications are easy to set up. However, as they often contain sensitive information, care should be taken. This post discusses standard server headers that can be set to harden your application.

tags: r, csp, shiny, server, headers

I'm an R user: Quarto or R Markdown?

Authors: Nicola Rennie & Colin Gillespie

Published: December 8, 2022

Are you an R user considering switching from R Markdown to Quarto? Here are our favourite features that we think R users might benefit from.

tags: r

Automating Dockerfile creation for Shiny apps

Authors: Jamie Owen & Colin Gillespie

Published: October 20, 2022

Deploying Shiny applications can be frustrating: you need to make sure your production environment matches the local environment where you can see your application running. In this blog post we explore how we might start writing code to automate the creation of Dockerfiles, producing images that allow a locally running Shiny application to be deployed in a container.

tags: r, shiny, docker

RStudio2022: Talks to watch out for

Authors: Colin Gillespie & Nicola Rennie

Published: July 26, 2022

rstudio::conf(2022) is well underway and, after two days of workshops, the main talks begin tomorrow. Here are our highlights of talks we're most excited about!

tags: r, rstudioconf2022

Creating a Reproducible Example

Authors: Colin Gillespie & Jack Walton

Published: May 31, 2022

Complex software bugs are often difficult to reproduce. Unfortunately, without this reproducibility it can be hard to get help and input from others. In this post we discuss a particularly nasty bug we encountered and how we made it reproducible.

tags: r, python, reprex, reticulate, docker

New features in R 4.2.0

Author: Colin Gillespie

Published: April 1, 2022

R version 4.2.0 is about to be released. This release includes an update to the native pipe, changes to logical operators and improvements to the help page. In this blog post, we take a look at (some of) these new features, highlighting (in our opinion) the most exciting changes.

tags: r

Forgotten features of R 4.0.0

Author: Colin Gillespie

Published: January 25, 2022

R 4.0 was released almost two years ago. However, the majority of R users didn't immediately adopt the new version due to obvious constraints when updating software. The consequence is that many of the new and useful features are forgotten about. This post highlights those features now that we've moved to R 4.0.

tags: r, python

Git: Moving from Master to Main

Author: Colin Gillespie

Published: October 19, 2021

In 2020, GitHub took the correct decision to change the default branch from master to main. For single, independent repositories, this is relatively straightforward. But moving groups or organisations is more complex and requires planning.

tags: r, git, gitlab, github

Understanding the Parquet file format

Author: Colin Gillespie

Published: September 27, 2021

Apache Parquet is a column storage file format used by many Hadoop systems. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.

tags: r, big-data, parquet, feather, storage

Webinars: R in Production

Author: Colin Gillespie

Published: July 19, 2021

Bridging the gap between data science and IT teams is much easier than you might expect! This two-part webinar will discuss why open source languages are suitable for enterprise data science, and how data scientists can work with the IT team to get their organisational buy-in.

tags: r, production, packages, webinar

Moving to Hugo

Author: Colin Gillespie

Published: March 29, 2021

Moving your website to Hugo brings a lot of benefits, but there are also challenges. In this post, we'll discuss our top tips for making that move to Hugo as smooth as possible.

tags: r, hugo

Default knitr options and hooks

Author: Colin Gillespie

Published: March 12, 2021

Having consistent {knitr} options and hooks improves reproducibility and reduces errors. This post provides a set of standard {knitr} chunk arguments to simplify workflows and make consistent graphs.

tags: r, knitr, rmarkdown

External Graphics with knitr

Author: Colin Gillespie

Published: February 23, 2021

Adding images with {knitr} is straightforward; we simply use include_graphics(). However, it is easy to add an image that is too large, or has the wrong dimensions. This post tells you what to watch out for, and how to optimise your images for the web.

tags: r, knitr, graphics, rmarkdown

Selecting the correct image file type

Author: Colin Gillespie

Published: February 19, 2021

When including graphics within a markdown document, it's crucial to use the correct file type when generating graphics. However, there isn't a one-size-fits-all answer; instead, we should choose what's most appropriate for the image.

tags: r, knitr, graphics

Image sizes in an R markdown Document

Author: Colin Gillespie

Published: February 15, 2021

Setting the correct image sizes in an R Markdown document can be tricky. There are a number of different arguments that all interact with each other. In this post, we look at how we should create the correct image sizes using {knitr}.

tags: r, knitr, graphics, resolution

Detecting Security Vulnerabilities in R Packages

Author: Colin Gillespie

Published: August 28, 2020

One of our main roles at Jumping Rivers is to set up and provide ongoing maintenance for R, Python and RStudio infrastructure. This typically involves ensuring software is up-to-date and making sure everything is running smoothly. The {oysteR} package is an R interface to the OSS Index that allows users to scan their installed R packages.

tags: r, python, security

Speeding up your Continuous Integration Builds

Author: Colin Gillespie

Published: June 25, 2020

Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However, this process can become slow, as typically the CI process starts with a blank virtual machine (VM).

tags: r, tidyverse, packages, ci

Setting the Graphics Device in a RMarkdown Document

Author: Colin Gillespie

Published: April 15, 2020

In our recent post about saving R graphics, it became obvious that achieving consistent graphics across platforms or even saving the “correct” graph on a particular OS was challenging. Getting consistent fonts across platforms often failed, and for the default PNG device under Windows, anti-aliasing was also an issue.

tags: r, graphics, markdown, rmarkdown, knitr, cairo

Saving R Graphics across OSs

Author: Colin Gillespie

Published: April 14, 2020

R is known for its amazing graphics. Not only {ggplot2}, but also {plotly}, and the dozens of other packages at the graphics task view. There seems to be a graph for every scenario. However, once you’ve created your figure, how do you export it? This post compares standard methods for exporting R plots as PNGs/PDFs across different OSs.

tags: r, graphics, markdown, rmarkdown, knitr, cairo

Customising your Rprofile

Author: Colin Gillespie

Published: January 17, 2020

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!

tags: r, packages, rstudio, rprofile

Timing hash functions with the bench package

Author: Colin Gillespie

Published: May 21, 2019

This blog post has two goals: investigate the {bench} package for timing R functions, and consequently explore the different algorithms in the {digest} package using {bench}. What is {digest}? The {digest} package provides a hash function to summarise R objects.

tags: r, tidyverse, timing, digest

R Packages: Are we too trusting?

Author: Colin Gillespie

Published: February 4, 2019

One of the great things about R is the myriad of packages. Packages are typically installed via CRAN, Bioconductor and GitHub. But how often do we think about what we are installing? Do we pay attention or just install when something looks neat? Do we think about security or just take it that everything is secure?

tags: r, security

{benchmarkme}: new version

Author: Colin Gillespie

Published: January 29, 2019

When discussing how to speed up slow R code, my first question is what is your computer spec? It’s always surprised me that people are wondering why analysing big data is slow, yet they are using a five-year-old cheap laptop. Spending a few thousand pounds would often make their problems disappear.

tags: r, package, benchmarkme

Hacking Bioconductor

Author: Colin Gillespie

Published: November 19, 2018

Domain squatting or URL hijacking is a straightforward attack that requires little skill. An attacker registers a domain that is similar to the target domain and hopes that a user accidentally visits the site. For example, if the domain is example.com, then a typo-squatter would register similar domains such as

tags: r, security, bioconductor

What R version do you really need for a package?

Author: Colin Gillespie

Published: November 1, 2018

At Jumping Rivers we run a lot of R courses. Some of our most popular courses revolve around the tidyverse, in particular, our Introduction to the tidyverse and our more advanced mastering course. We even trained over 200 data scientists at the NHS - see our case study for more details.

tags: r, tidyverse, packages

R from the turn of the century

Author: Colin Gillespie

Published: September 20, 2018

Last week I spent some time reminiscing about my PhD and looking through some old R code. This trip down memory lane led to some of my old R scripts that amazingly still run. My R scripts were fairly simple and just created a few graphs.

tags: r, tidyverse, ggplot2

Styling {ggplot2} Graphics

Author: Colin Gillespie

Published: August 21, 2018

In our previous post, we demonstrated that, contrary to popular opinion, it is possible to generate attractive looking plots using just base graphics. Although we did confess that it took a lot of time and effort. In this post, we repeat the same exercise.

tags: r, graphics, ggplot2

Our Logo In R

Author: Colin Gillespie

Published: February 1, 2018

Hi all, so given our logo here at Jumping Rivers is a set of lines designed to look like a Gaussian Process, we thought it would be a neat idea to recreate this image in R. To do so we’re going to need a couple of packages. We do the usual install.packages() dance (remember this step can be performed in parallel).

tags: r, tidyverse, ggplot2, ggalt, theme_void

Styling Base R Graphics

Author: Colin Gillespie

Published: January 25, 2018

Base R graphics get a bad press (although to be fair, they could have chosen their default values better). In general, they are viewed as a throwback to the dawn of the R era. I think that most people would agree that, in general, there are better graphics techniques in R (e.g. {ggplot2}).

tags: r, graphics, base

Hosting RStudio Server on Azure

Author: Colin Gillespie

Published: December 2, 2017

Can’t be bothered reading, tell me now. Host RStudio Server on an Azure instance. Configure the instance to access RStudio with a nice URL. Getting started: Azure is a cloud computing framework provided by Microsoft, the same idea as AWS by Amazon.

tags: r, azure, cloud, rstudio

Comparing plotly & ggplotly plot generation times

Authors: Theo Roe & Colin Gillespie

Published: November 27, 2017

The {plotly} package. A godsend for interactive documents, dashboards and presentations. For such documents, there is no doubt that anyone would prefer a plot created in {plotly} rather than {ggplot2}. Why? Using {plotly} gives you neat and crucially interactive options at the top, whereas {ggplot2} objects are static.

tags: r, tidyverse, graphics, ggplot2, plotly

Speeding up package installation

Author: Colin Gillespie

Published: November 15, 2017

Can’t Be Bothered Reading, Tell Me Now. A simple one line tweak can significantly speed up package installation and updates. The Wonder Of CRAN: One of the best features of R is CRAN. When a package is submitted to CRAN, not only is it checked under three versions of R

tags: r, tidyverse, packages, parallel