A timeline of R's first 30 years
Published: June 27, 2024
R has come a long way since its initial public release in August 1993. Explore some highlights of the last thirty years in an interactive timeline.
Published: June 20, 2024
In this post, we demonstrate how to deploy a machine learning model to production using Docker, Posit Connect, and SageMaker. Docker allows developers to bundle application code with necessary dependencies, simplifying deployment. We outline the process of creating a Dockerfile with the {vetiver} package and running the model locally. Additionally, we show how to publish the model to Posit Connect and SageMaker for broader accessibility.
Published: June 13, 2024
This post introduces MLOps and its integration into the traditional data science workflow, focusing on continuous model deployment and maintenance. It demonstrates automating data importation, creating a model with {tidymodels}, and using {vetiver} to store and deploy the model. The process includes creating an API with {plumber} and deploying it locally. Finally, it verifies the API functionality, setting the stage for future production deployments.
Published: February 1, 2024
A benefit of using the {arrow} package with parquet files is it enables you to work with ridiculously large data sets from the comfort of an R session. In this post we explore the timescales associated with different methods of data storage.
Published: January 18, 2024
Apache Arrow is a cross-language format for super fast in-memory data. It's designed for efficient analytic operations. In this post, we look at reading and writing data using Arrow and the advantages of the parquet file format.
Published: January 11, 2024
Shiny applications are easy to set up. However, as they often contain sensitive information, care should be taken. This post discusses standard server headers that can be set to harden your application.
Published: December 8, 2022
Are you an R user considering switching from R Markdown to Quarto? Here are our favourite features that we think R users might benefit from.
Published: October 20, 2022
Deploying Shiny applications can be frustrating: you have to make sure your production environment matches the local environment where you can see your application running. In this blog post we explore how we might start writing code to automate the creation of Dockerfiles, producing images that allow our locally running Shiny application to be deployed in a container.
Published: July 26, 2022
rstudio::conf(2022) is well underway and, after two days of workshops, the main talks begin tomorrow. Here are our highlights of talks we're most excited about!
Published: May 31, 2022
Complex software bugs are often difficult to reproduce. Unfortunately, without this reproducibility it can be hard to get help and input from others. In this post we discuss a particularly nasty bug we encountered and how we made it reproducible.
Published: April 1, 2022
R version 4.2.0 is about to be released. This release includes an update to the native pipe, changes to logical operators and improvements to the help page. In this blog post, we take a look at some of these new features, highlighting what are, in our opinion, the most exciting changes.
Published: January 25, 2022
R 4.0 was released almost two years ago. However, the majority of R users didn't immediately adopt the new version due to obvious constraints when updating software. The consequence is that many of the new and useful features are forgotten about. This post highlights the features as we've moved to R 4.0.
Published: October 19, 2021
In 2020, GitHub took the correct decision to change the default branch from master to main. For single, independent repositories, this is relatively straightforward. But moving groups or organisations is more complex and requires planning.
Published: September 27, 2021
Apache Parquet is a column storage file format used by many Hadoop systems. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.
Published: July 19, 2021
Bridging the gap between data science and IT teams is much easier than you might expect! This two-part webinar will discuss why open source languages are suitable for enterprise data science, and how data scientists can work with the IT team to get their organisational buy-in.
Published: March 29, 2021
Moving your website to Hugo brings a lot of benefits, but there are also challenges. In this post, we'll discuss our top tips for making that move to Hugo as smooth as possible.
Published: March 12, 2021
Having consistent {knitr} options and hooks improves reproducibility and reduces errors. This post provides a set of standard {knitr} chunk arguments to simplify workflows and make consistent graphs.
Published: February 23, 2021
Adding images with {knitr} is straightforward; we simply use include_graphics(). However, it is easy to add an image that is too large, or has the wrong dimensions. This post tells you what to watch out for, and how to optimise your images for the web.
Published: February 19, 2021
When including graphics within a markdown document, it's crucial to use the correct file type when generating graphics. However, there isn't a one-size-fits-all answer; instead, we should choose what's most appropriate for the image.
Published: February 15, 2021
Setting the correct image sizes in an R Markdown document can be tricky. There are a number of different arguments that all interact with each other. In this post, we look at how we should create the correct image sizes using {knitr}.
Published: August 28, 2020
One of our main roles at Jumping Rivers is to set up and provide ongoing maintenance for R, Python and RStudio infrastructure. This typically involves ensuring software is up-to-date and making sure everything is running smoothly. The {oysteR} package is an R interface to the OSS Index that allows users to scan their installed R packages.
Published: June 25, 2020
Continuous integration is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven’t done something silly. It protects us from ourselves! However this process can become slow, as typically the CI process starts with a blank virtual machine (VM).
Published: April 15, 2020
In our recent post about saving R graphics, it became obvious that achieving consistent graphics across platforms or even saving the “correct” graph on a particular OS was challenging. Getting consistent fonts across platforms often failed, and for the default PNG device under Windows, anti-aliasing was also an issue.
Published: April 14, 2020
R is known for its amazing graphics. Not only {ggplot2}, but also {plotly}, and the dozens of other packages in the graphics task view. There seems to be a graph for every scenario. However, once you’ve created your figure, how do you export it? This post compares standard methods for exporting R plots as PNGs/PDFs across different OSs.
Published: January 17, 2020
Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!
Published: May 21, 2019
This blog post has two goals: to investigate the {bench} package for timing R functions and, in doing so, to explore the different algorithms in the {digest} package using {bench}. What is {digest}? The {digest} package provides a hash function to summarise R objects.
Published: February 4, 2019
One of the great things about R is the myriad of packages. Packages are typically installed via CRAN, Bioconductor and GitHub. But how often do we think about what we are installing? Do we pay attention or just install when something looks neat? Do we think about security or just take it that everything is secure?
Published: January 29, 2019
When discussing how to speed up slow R code, my first question is what is your computer spec? It’s always surprised me that people are wondering why analysing big data is slow, yet they are using a five-year-old cheap laptop. Spending a few thousand pounds would often make their problems disappear.
Published: November 19, 2018
Domain squatting or URL hijacking is a straightforward attack that requires little skill. An attacker registers a domain that is similar to the target domain and hopes that a user accidentally visits the site. For example, if the domain is example.com, then a typo-squatter would register similar domains such as
Published: November 1, 2018
At Jumping Rivers we run a lot of R courses. Some of our most popular courses revolve around the tidyverse, in particular, our Introduction to the tidyverse and our more advanced mastering course. We even trained over 200 data scientists at the NHS - see our case study for more details.
Published: September 20, 2018
Last week I spent some time reminiscing about my PhD and looking through some old R code. This trip down memory lane led to some of my old R scripts that amazingly still run. My R scripts were fairly simple and just created a few graphs.
Published: August 21, 2018
In our previous post, we demonstrated that, contrary to popular opinion, it is possible to generate attractive looking plots using just base graphics. We did confess, though, that it took a lot of time and effort. In this post, we repeat the same exercise.
Published: February 1, 2018
Hi all, so given our logo here at Jumping Rivers is a set of lines designed to look like a Gaussian Process, we thought it would be a neat idea to recreate this image in R. To do so we’re going to need a couple packages. We do the usual install.packages() dance (remember this step can be performed in parallel).
Published: January 25, 2018
Base R graphics get a bad press (although to be fair, they could have chosen their default values better). In general, they are viewed as a throwback to the dawn of the R era. I think that most people would agree that, in general, there are better graphics techniques in R (e.g. {ggplot2}).
Published: December 2, 2017
Can’t be bothered reading, tell me now. Host RStudio Server on an Azure instance. Configure the instance to access RStudio with a nice URL. Getting started: Azure is a cloud computing framework provided by Microsoft, the same idea as AWS by Amazon.
Published: November 27, 2017
The {plotly} package: a godsend for interactive documents, dashboards and presentations. For such documents, there is no doubt that anyone would prefer a plot created in {plotly} rather than {ggplot2}. Why? Using {plotly} gives you neat and, crucially, interactive options at the top, whereas {ggplot2} objects are static.
Published: November 15, 2017
Can’t Be Bothered Reading, Tell Me Now. A simple one line tweak can significantly speed up package installation and updates. The Wonder Of CRAN: One of the best features of R is CRAN. When a package is submitted to CRAN, not only is it checked under three versions of R