Should I learn Stan?
A little bit about you
Let’s assume you’re familiar with Bayesian statistics; you know what I mean when I say prior, likelihood and posterior. Recall that an MCMC scheme constructs a Markov chain as a method to sample from the posterior density.
You may have used a probabilistic programming language (PPL) in the past, such as BUGS, to perform Bayesian inference. You’ve heard about Stan and want to learn a little more. Or maybe you’re about to step into the Bayesian paradigm and don’t know where to start. You want to know whether you should make the switch from JAGS to Stan, or you’ve used neither JAGS nor Stan and want to know which will suit you best. This post will focus solely on the differences between JAGS and Stan, as I have experience with both of them, but there are many more PPLs out there. For example, I have never used Bean Machine, but of all the PPLs, it certainly takes the crown for best name.
Although Stan is a PPL, JAGS technically isn’t a programming language (more on this later). We will use the term “Bayesian modelling software” to talk about them both.
Why use Bayesian modelling software?
When we do a (fully) Bayesian analysis we essentially have two ways to estimate the model parameters. If you have too much spare time on your hands, option A (write a bespoke sampling scheme) might appeal to you. If you have other things to do, and want your inferences to be reliable, then I’d recommend option B: construct your model with purpose-built software.
The advantages of using Bayesian modelling software over hand-coding a bespoke sampler are similar to the advantages of using a package or library over hand-coding any other model. There’s no need to reinvent the wheel when somebody else has done all the hard work.
Differences at a glance
Stan is a free, open source PPL based on C++. It was developed to allow us to conduct Bayesian inference without the need to write bespoke sampling algorithms. Stan is named after Stanislaw Ulam, who helped develop the first MCMC methods in the 1940s. Andrew Gelman, one of the lead Stan developers, thinks that in hindsight, Arianna would have been a better name than Stan, as it was Arianna Rosenbluth who programmed the first MCMC algorithm. A basic Stan program for linear regression looks like this:
// A linear regression in Stan
data {
  int N;             // sample size
  vector[N] y;       // response variable
  vector[N] x;       // predictor variable
}
parameters {
  real alpha;        // intercept
  real beta;         // slope
  real<lower=0> tau; // precision
}
model {
  // likelihood
  y ~ normal(alpha + beta * x, 1 / sqrt(tau));
  // prior
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  tau ~ gamma(2, 2);
}
Stan might feel intimidating if you’ve never used a statically typed language before (languages like C++ and Java). Statically typed means we must declare the type of every variable in Stan. For example, our sample size, N, is of type int: it is an integer. If we try to set N = 12.5, the Stan program will not run! R programmers like myself often take types for granted, especially numerical types.
Similarly, JAGS is free, written in C++, and allows us to perform Bayesian computation without knowing too much about MCMC schemes. JAGS is an acronym for “Just Another Gibbs Sampler”; we’ll expand on this a bit later. A simple linear regression in JAGS might look like:
## A linear regression in JAGS
model {
  # likelihood
  for (i in 1:N) {
    y[i] ~ dnorm(alpha + beta * x[i], tau)
  }
  # prior
  alpha ~ dnorm(0, 1)  # intercept
  beta ~ dnorm(0, 1)   # slope
  tau ~ dgamma(2, 2)   # residual precision
}
One difference between the two is that a Stan program is broken into “blocks”, which let the user tell Stan what each variable in the code represents (there are further optional blocks too). Conversely, a JAGS model is usually just one block: JAGS works out for itself which variables are known (data) and which are unknown (parameters) based on what is passed to the program. Another difference in the model specification is vectorisation. Stan allows (and encourages) you to vectorise your code. In Stan, we wrote y ~ normal(alpha + beta * x, 1 / sqrt(tau)). This is shorthand for a for loop over y[i] ~ normal(alpha + beta * x[i], 1 / sqrt(tau)). Vectorising gives us cleaner-looking code and can bring computational advantages. Conversely, JAGS code is much more difficult to vectorise, so we must rely on slow for loops more often.
If you’ve used R a lot, the JAGS code might give you a sense of déjà vu: JAGS code is supposed to look a bit like R code. Distributions are specified by d*() functions, variable types are inferred (JAGS figures out whether things are real, integer, etc.) and we use # to write comments. In JAGS, the normal distribution is parameterised by the precision rather than the variance, but otherwise, if you have a basic understanding of R, you will be pretty good at guessing what JAGS code does.
Differences in user experience
Running your models
JAGS and Stan can be run on their own, via the command line, but we will likely be pre-processing our data in a more general language like R or Python. An interface between our go-to language and our Bayesian modelling software allows our “main” language to run Stan or JAGS code. For R users, {rstan} provides this functionality for Stan, and {rjags} provides this for JAGS. There are similar interfaces for other languages (e.g. for Python, use PyStan and PyJAGS). These interfaces have a similar feel.
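To give a flavour of that similar feel, here is a minimal sketch of fitting the linear regression above from R with {rstan} and {rjags}. The file name lin_reg.stan and the simulated data are made up for this example; the functions themselves come from those two packages.

library(rstan)  # R interface to Stan
library(rjags)  # R interface to JAGS

# Simulate some toy data (purely illustrative)
N <- 100
x <- rnorm(N)
y <- 2 + 0.5 * x + rnorm(N)
model_data <- list(N = N, x = x, y = y)

# Stan: compile the model saved in lin_reg.stan, then sample
stan_fit <- stan(file = "lin_reg.stan", data = model_data,
                 chains = 4, iter = 2000)

# JAGS: the model usually lives in a string
jags_code <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(alpha + beta * x[i], tau)
  }
  alpha ~ dnorm(0, 1)
  beta ~ dnorm(0, 1)
  tau ~ dgamma(2, 2)
}
"
jags_model <- jags.model(textConnection(jags_code),
                         data = model_data, n.chains = 4)
jags_fit <- coda.samples(jags_model,
                         variable.names = c("alpha", "beta", "tau"),
                         n.iter = 2000)

Both fits are objects full of posterior draws that we can summarise and plot with the usual R tooling.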
Writing code
A big part of coding up a Bayesian model in Bayesian modelling software is, well, coding up the model.
One thing I really like about Stan is the functions block. Good coding practice tells us we should put commonly used blocks of code into a function. This could be handy if we want to fit a non-linear regression model. In Stan, if we wanted to use the expression \( \alpha + e^{\beta x} \) many times we could define this as a function:
functions {
  real non_linear_mean(real alpha, real beta, real x) {
    return alpha + exp(beta * x);
  }
}
This function can be used just like any inbuilt Stan function. JAGS does not have friendly support for user-defined functions or distributions; essentially you need to write your own JAGS module in C++.
Another thing I like about Stan is that syntax highlighting is supported in many popular IDEs. RStudio has out-of-the-box support for Stan, whilst other popular editors such as Vim and Jupyter have Stan plug-ins. This is because Stan is a language. As far as I can tell, the only editor that supports JAGS syntax is Emacs (and that’s not even out of the box). The lack of support is probably because JAGS is a program and not a language. Personally, I’d find a full data science workflow in Emacs less than ideal. For this post, I used R syntax highlighting on the JAGS code. However, we normally write JAGS code within a string, so the entire model would be highlighted as a string. Stan syntax highlighting is supported by more user-friendly environments and is even supported by Quarto, which is handy if you’re teaching Bayesian modelling!
Getting help
Reaching out online to get help is a huge part of the user experience. The Stan Forums are a hive of activity with questions regularly being answered by the Stan development team themselves. As far as I can tell, the JAGS community does not seem to be as active. Stan even has its own conference!
Differences under the hood
This section is a little technical. The TL;DR is: JAGS has a toolbox of relatively simple sampling schemes, and under special circumstances some of these schemes are very effective. Stan uses a behemoth of a sampling scheme called Hamiltonian Monte Carlo (HMC); it is complex, but can be very effective for complex models.
Unpacking the technical stuff
JAGS looks at the Bayesian model you have provided and tailors the sampling schemes it uses to maximise performance. When it can, JAGS will use a Gibbs sampler. Gibbs sampling is computationally cheap but can only be used for a small set of likelihood-prior combinations (those where the full conditional distributions have a recognisable form). When JAGS can’t use simple, tractable sampling methods it falls back on more general-purpose, but often less efficient, methods such as slice sampling.
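To give a concrete feel for what Gibbs sampling does, here is a tiny hand-rolled sketch in R for a toy semi-conjugate model (normal data with unknown mean and precision, a normal prior on the mean and a gamma prior on the precision). This is purely illustrative, not how JAGS is implemented; the full conditionals below follow from standard conjugacy results.

set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)  # toy data
n <- length(y)

# Priors: mu ~ N(mu0, 1 / tau0), tau ~ Gamma(a, b)
mu0 <- 0; tau0 <- 0.01; a <- 2; b <- 2

n_iter <- 5000
mu <- numeric(n_iter)
tau <- numeric(n_iter)
mu[1] <- mean(y)
tau[1] <- 1 / var(y)

for (t in 2:n_iter) {
  # Full conditional for mu is normal (thanks to conjugacy)
  prec  <- tau0 + n * tau[t - 1]
  m     <- (tau0 * mu0 + tau[t - 1] * sum(y)) / prec
  mu[t] <- rnorm(1, mean = m, sd = 1 / sqrt(prec))

  # Full conditional for tau is gamma (thanks to conjugacy)
  tau[t] <- rgamma(1, shape = a + n / 2,
                   rate = b + 0.5 * sum((y - mu[t])^2))
}

Each update is a cheap draw from a standard distribution, which is why Gibbs sampling is so fast when there is conjugacy to exploit.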
The HMC algorithms within Stan are inspired by statistical physics. By default Stan uses the No-U-Turn Sampler (NUTS), a variant of HMC. NUTS utilises Hamiltonian dynamics, which rely on the gradient of the log posterior, \( \nabla \log \pi (\theta \mid x) \). This clever mathematics will (hopefully!) produce a statistically efficient MCMC scheme: we may not need to run it for as many iterations to obtain satisfactory results, so the overall run time may be lower. The downside is that each iteration can be computationally expensive. For more on HMC see Michael Betancourt’s introduction to HMC.
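For intuition about what one HMC iteration involves, here is a stripped-down sketch in R of a single HMC update (leapfrog integration of the Hamiltonian dynamics plus a Metropolis accept/reject step). Stan’s NUTS implementation is far more sophisticated, with adaptive step sizes and the no-U-turn criterion, so treat this as a cartoon rather than what Stan actually runs.

# One HMC update for a target with log density log_post and gradient grad,
# using L leapfrog steps of size eps
hmc_step <- function(theta, log_post, grad, eps = 0.1, L = 20) {
  p0 <- rnorm(length(theta))            # draw a random momentum
  p  <- p0 + 0.5 * eps * grad(theta)    # half step for momentum
  theta_new <- theta
  for (l in 1:L) {
    theta_new <- theta_new + eps * p    # full step for position
    if (l < L) p <- p + eps * grad(theta_new)
  }
  p <- p + 0.5 * eps * grad(theta_new)  # final half step for momentum

  # Accept or reject based on the change in total energy (the Hamiltonian)
  h_old <- -log_post(theta)     + 0.5 * sum(p0^2)
  h_new <- -log_post(theta_new) + 0.5 * sum(p^2)
  if (log(runif(1)) < h_old - h_new) theta_new else theta
}

# Example: sample a standard normal, whose log density has gradient -theta
draws <- numeric(5000)
for (i in 2:5000) {
  draws[i] <- hmc_step(draws[i - 1],
                       log_post = function(th) -0.5 * sum(th^2),
                       grad     = function(th) -th)
}

Every iteration needs L gradient evaluations, which is where the extra per-iteration cost comes from, but the proposals can travel a long way across the posterior, which is where the statistical efficiency comes from.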
One sampling scheme to rule them all?
Suppose we wanted to fit a model in Stan where one or more of the unknowns is discrete, perhaps because we have some missing count data. In this instance, \( \nabla \log \pi (\theta \mid x) \) will not exist, and therefore HMC, and thus Stan, cannot be used. Algorithms like slice sampling can work in this situation, so JAGS would be an appropriate tool. I also mentioned that JAGS can be fast for simple models: if your Bayesian model exhibits (semi-)conjugacy, JAGS will probably be more efficient, as Gibbs sampling can be used.
If your Bayesian model does not exhibit any conjugacy, Stan will probably be a better option. There are other scenarios where Stan will probably beat JAGS too. One is when the model is complex to write down; if your model is complex enough to warrant user-defined functions, I’d use Stan.
Stan also has other functionality that JAGS does not. For example, Stan has an extensive maths library, which allows us to solve algebraic equations and differential equations, either as standalone problems or within a Bayesian model. If we have observed data from a physical system described by differential equations, we can use Stan’s differential equation solvers to quantify our uncertainty about the fitted parameters and propagate posterior beliefs into predictions. Pretty cool, right?
So, which is best?
I was taught JAGS as an undergraduate student but taught myself Stan as a postgraduate researcher. For that reason, it’s not entirely fair to compare my learning experiences, but I did find self-teaching Stan harder than learning JAGS from a professor. Prior to learning Stan, I had never worked with a statically typed language, which definitely took some getting used to. However, I was keen to learn Stan because my complex models were taking a long time to run in JAGS. I found that, after a lot of teething problems (mostly forgetting to end lines with ;), my Stan implementations were much faster than the JAGS equivalents.
So, which sampling algorithm is best? As with everything in statistics, it depends. Jorgen Bolstad’s blog post compared the efficiency of JAGS and Stan for a handful of different Bayesian models. The broad summary is that, from an efficiency perspective, models with conjugacy are better suited to JAGS, whereas non-conjugate models are better suited to Stan. From a programming and user-experience perspective, I think Stan really wins for complex models.
From a programming perspective, after getting over the learning curve, Stan is a better environment for developing Bayesian models. For me, this is because Stan has the flexibility for user-defined functions (which can allow us to specify bespoke distributions as well!).
I can’t tell you what’s going to be best for your particular circumstances, but as a general rule I’d say for simpler models, JAGS is probably better and for complex models, Stan is probably better.
If you do think Stan is the right tool for you, then why not consider attending one of our Stan courses? Our courses are a great hands-on and interactive way of getting up-and-running and fitting models with Stan!