Forgotten features of R 4.0.0

Published: January 25, 2022

tags: r, python

R version 4.0.0 was released almost two years ago. The change in the major version, 3.x.y to 4.0.0, represented significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn’t shareable. This naturally slows down adoption.

We moved our internal R projects to depend on R 4.0.0 around twelve months ago, a few months after the release date. Over the last year we’ve also helped a number of clients make the move, typically with Shiny applications. This post highlights some of the features we’ve found useful, along with some of the potential pitfalls.

stringsAsFactors

From the beginning, R converted imported strings to factors. For most users, this typically occurred when reading in data using read.csv(). This default made sense for statistical modelling, but was a little tricky for new users, especially as today’s data sets tend to contain messy string data.

In R 4.0.0, this default changed, with stringsAsFactors now being FALSE by default. For our internal applications, this didn’t really cause issues, but we’ve had to help a number of clients “upgrade” their Shiny app to run using R version 4.0.0. If you are planning on making this move, here’s our standard “gotcha” check-list:

Are there any calls to read.csv(), read.table() or read.delim()? If so, this could cause issues. You can either set stringsAsFactors = TRUE in these functions, or fix any issues that crop up.
Are there any data frames saved as rds files? If so, check the columns for factors.
Do you use data.frame() to create data frames? If so, string columns that used to become factors are now character vectors, which can break code that expects factors.
Do packages return data frames that you use? This is the trickiest bug to track down.
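
As a minimal sketch of the first two checks, the snippet below reads a small inline CSV (made-up data standing in for a real file) with and without stringsAsFactors = TRUE. The factor_cols() helper is a hypothetical name introduced here for illustration; it simply lists the factor columns in a data frame, which can be handy after loading an old rds file.

```r
# Inline CSV text stands in for a real file (hypothetical data)
csv_text <- "name,score\nalice,1\nbob,2"

# Default behaviour: on R 4.0.0 or later, strings stay as character vectors
df_chr <- read.csv(text = csv_text)
class(df_chr$name)
#> [1] "character"

# Opt back in to the pre-4.0.0 behaviour explicitly
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(df_fct$name)
#> [1] "factor"

# Hypothetical helper: names of the columns that are factors
factor_cols <- function(df) {
  names(df)[vapply(df, is.factor, logical(1))]
}
factor_cols(df_fct)
#> [1] "name"
```

Running factor_cols() on the result of readRDS() or on a data frame returned by a package is a quick way to see whether factors are still present.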

Raw Character Strings

Using the syntax r"(some characters)" we can now define raw strings, where the characters between the delimiters are taken literally. This avoids the painful adding of backslashes when escaping special characters. We’ve recently started using this regularly when generating PDF documents that have LaTeX in them. For example,

r"(Avoiding \texttt{backslash} and "speech mark" hell.)"
#> [1] "Avoiding \\texttt{backslash} and \"speech mark\" hell."

Other uses are regular expressions and HTML code.
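
As a rough illustration of those two uses, the sketch below compares a raw-string regular expression with its escaped equivalent, and uses the alternative r"[...]" delimiters for an HTML fragment that contains the characters )" itself. The pattern and the HTML are made-up examples.

```r
# A regular expression for a dot followed by digits
pattern_raw <- r"(\.[0-9]+)"
pattern_escaped <- "\\.[0-9]+"
identical(pattern_raw, pattern_escaped)
#> [1] TRUE

sub(pattern_raw, "", "v1.25")
#> [1] "v1"

# Square brackets as delimiters, because the text contains )"
html <- r"[<a onclick="go()">link</a>]"
grepl("go()", html, fixed = TRUE)
#> [1] TRUE
```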

Caching with R_user_dir()

Buried deep within the changelog was a reference to R_user_dir() from the {tools} package. This function provides a nice, cross-platform way of obtaining paths to directories for storing R-related user-specific data, configuration and cache files.

For example,

tools::R_user_dir("my_pkg", which = "cache")
#> [1] "/home/ncsg3/.cache/R/my_pkg"

provides a string that can be used to create a directory. In the {oysteR} package, I use this idea to cache API results. As R generates the path, I don’t have to worry about which OS the user is on.
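
The caching idea can be sketched roughly as follows. This is not the code from {oysteR}; "my_pkg" is a placeholder package name and cached() is a hypothetical helper written for this example. Note that R_user_dir() only returns the path, so the directory has to be created before use.

```r
# "my_pkg" is a placeholder package name
cache_dir <- tools::R_user_dir("my_pkg", which = "cache")

# R_user_dir() returns a path only; create the directory if it is missing
if (!dir.exists(cache_dir)) {
  dir.create(cache_dir, recursive = TRUE)
}

# Hypothetical helper: return the cached value for `key` if it exists,
# otherwise call `compute()`, store the result as an rds file and return it
cached <- function(key, compute) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

# The first call computes and stores the value; later calls read it from disk
result <- cached("example", function() Sys.time())
```

In a real package the compute() function would wrap the slow step, such as an API request, and the cache would probably need some form of expiry.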

Also of note

I’ve not used the new reference counting directly, but by switching to R 4.0.0 I’ve certainly benefited from a slightly faster, less resource-hungry version of R. Likewise, the {grid} package was improved, so {ggplot2} is also a little quicker. This is one of the benefits of upgrading R versions; things just get a bit nicer.

References

R Changelog
A nice overview of R 4.0.0 by David Smith.