A simple one-line tweak can significantly speed up package installation and updates.
One of the best features of R is CRAN. When a package is submitted to CRAN, not only is it checked under three versions of R, but also on three different operating systems.
CRAN also checks that the updated package doesn’t break existing packages. This last part is particularly tricky when you consider all the dependencies a package like ggplot2 or Rcpp has. Furthermore, it performs all these checks within 24 hours, ready for the next set of packages.
What many people don’t realise is that for CRAN to perform this miracle of package checking, it builds and checks these packages in parallel; so rather than installing a single package at a time, it checks multiple packages at once. Obviously some care has to be taken when checking/installing packages due to the connectivity between packages, but R takes care of these details.
If you examine the help page, ?install.packages, there’s a sneaky argument called Ncpus. From the help page:
Ncpus: The number of parallel processes to use for a parallel install of more than one source package.
The default value of this argument is
Ncpus = getOption("Ncpus", 1L)
The getOption() part determines whether a value has been set in options(). If no value is found, the default number of processes to use is 1. If you haven’t heard of Ncpus, it’s almost certainly 1, but you can check using
getOption("Ncpus", 1L) ##  6
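To see the fallback behaviour for yourself, here is a minimal sketch. It temporarily unsets the option, reads it back with the same default of 1L used by install.packages(), sets it, and then restores whatever was there before. The variable names (old, unset_value, set_value) and the value 4L are only for illustration.

```r
# Temporarily remove any existing Ncpus option, remembering the old value
old <- options(Ncpus = NULL)

# With nothing set, getOption() falls back to the supplied default
unset_value <- getOption("Ncpus", 1L)

# Once the option is set, getOption() returns it instead of the default
options(Ncpus = 4L)
set_value <- getOption("Ncpus", 1L)

# Restore the original setting (or leave it unset if it was unset before)
options(old)

unset_value  # 1
set_value    # 4
```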
To test if changing the value of Ncpus makes a difference, we’ll install the tidyverse package with all its associated dependencies. On my machine, all packages live in a directory called /rpackages/. For each test below, I deleted /rpackages/ so that all tidyverse dependencies were reinstalled.
My machine has eight cores:
parallel::detectCores() #  8
So it doesn’t make sense to set Ncpus above 8. Another point is that although R reports that I have 8 cores, I only have 4 physical cores; the other 4 are due to hyper-threading. In practice, this means that I’m only likely to get at most a six-fold speed-up.
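If you want to compare logical and physical cores on your own machine, the following sketch may help. It uses the logical argument of parallel::detectCores(). Support for logical = FALSE varies by platform, and the function can return NA, so treat the result as a hint rather than a guarantee. The variable names are just for illustration.

```r
# Logical cores: includes the extra "cores" provided by hyper-threading
logical_cores <- parallel::detectCores(logical = TRUE)

# Physical cores: platform dependent, and may be NA where R can't tell
physical_cores <- parallel::detectCores(logical = FALSE)

logical_cores
physical_cores
```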
For this experiment, I used the RStudio CRAN repository, set via
options(repos = c("CRAN" = "https://cran.rstudio.com/"))
To time the installation procedure, I just use the standard system.time() function. After removing the /rpackages/ directory, I set Ncpus equal to 1 and installed the tidyverse package with dependencies
options(Ncpus = 1)
system.time(install.packages("tidyverse"))
## Time in seconds
#    user  system elapsed
# 372.252  15.468 409.364
So a standard installation takes almost 7 minutes (409/60)!
Before we go on, it’s worth noting a couple of caveats:
Repeating this experiment with different values of Ncpus gives the table below:
Setting Ncpus to 2 allows us to halve the installation time from 409 seconds to around 224 seconds. Increasing Ncpus to 4 gives a further speed boost. Due to the dependencies between packages, we’ll never achieve a perfect speed-up, e.g. if package X depends on Y, then we have to install Y first. However, for a simple change we get an easy speed boost.
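To put a number on "never a perfect speed-up", here is a small worked calculation using the two timings quoted above (409 seconds with one process, roughly 224 seconds with two). The variable names are only for illustration, and efficiency here simply means the speed-up divided by the number of processes.

```r
t1 <- 409  # elapsed seconds with Ncpus = 1
t2 <- 224  # approximate elapsed seconds with Ncpus = 2

speedup <- t1 / t2        # how many times faster the install ran
efficiency <- speedup / 2 # fraction of the ideal two-fold speed-up

round(speedup, 2)     # 1.83
round(efficiency, 2)  # 0.91
```

So two processes give about 1.8 times the speed rather than exactly 2, which is consistent with some packages having to wait for their dependencies.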
Ncpus also gives a speed boost when updating packages via update.packages().
Setting options(Ncpus = 6) every time before you install a package is a pain. However, you can just add it to your .Rprofile file. In a future blog post, we’ll cover the .Rprofile in more detail, but for the purposes of this post, your .Rprofile is a file that contains R code that runs whenever R starts. You can test whether you have an .Rprofile file using the command
If you don’t have an .Rprofile file, create one in your home area.
Then simply add options(Ncpus = XX) to your file.
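The steps above can be sketched in a few lines of R. This is a minimal sketch, not a definitive recipe. The helper name add_ncpus_to_rprofile() is hypothetical, and it assumes your profile lives at ~/.Rprofile, which is a common location but not the only one R looks at. It checks for the file with file.exists(), creates it with file.create() if needed, and appends the options() line.

```r
# Hypothetical helper: make sure an .Rprofile exists, then append the option.
# Assumes any existing file already ends with a newline.
add_ncpus_to_rprofile <- function(ncpus, path = "~/.Rprofile") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    file.create(path)
  }
  line <- sprintf("options(Ncpus = %d)", as.integer(ncpus))
  cat(line, "\n", file = path, append = TRUE, sep = "")
  invisible(path)
}

# Try it on a temporary file first, rather than your real profile
tmp <- tempfile("Rprofile_")
add_ncpus_to_rprofile(6, path = tmp)
readLines(tmp)
```

Once you are happy with the result, calling the helper without the path argument would write to ~/.Rprofile instead. The setting takes effect the next time R starts.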
The one remaining question is what value you should set XX to. I typically set it to six since I have eight cores. This allows packages to be installed in parallel, while giving me a little bit of wiggle room to check email and listen to music.
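If you share one .Rprofile across machines with different core counts, one option is to compute the value instead of hard-coding it. The sketch below leaves two cores free, which mirrors the six-of-eight choice above. The "minus two" rule is an assumption for illustration, not a recommendation from the R documentation.

```r
# Could be placed in .Rprofile; local() keeps `cores` out of the workspace
local({
  cores <- parallel::detectCores()
  if (is.na(cores)) cores <- 1L   # detectCores() can return NA

  # Leave two cores free, but always use at least one process
  options(Ncpus = max(1L, cores - 2L))
})

getOption("Ncpus")
```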
If you are interested in how CRAN handles the phenomenal number of package submissions, check out this recent talk: