What is S7? A New OOP System for R
This blog post aims to give a brief introduction to S7, a new R package for OOP in R. It’s not a tutorial on how to write code using S7; the documentation provides great instructions for getting started if you’re ready to start programming in S7.
Note: This blog post has been updated to reflect the change in name from R7 to S7.
What is OOP?
Before we talk about S7, we should probably talk about OOP. OOP (short for Object Oriented Programming) is a programming framework that focuses on objects and their interactions, rather than on the evaluation of functions (as in a functional programming framework). If you’re an R user, you’ve almost certainly used OOP approaches even if you haven’t realised it yet. For example, if you call print() on a vector the output it returns is very different to the output it returns if you call print() on a plot.
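As a rough illustration of that dispatch, here is a minimal sketch in base R. The temperature class and its print method are made up for this example; the point is only that the same print() call behaves differently depending on the class of its argument.

```r
# print() is a generic function: it picks a method based on the class of x.
x <- c(1.5, 2.5, 3.5)

# A hypothetical S3 class, created just for this illustration
temp <- structure(list(value = 21.5), class = "temperature")

# The print method that is used for objects of class "temperature"
print.temperature <- function(x, ...) {
  cat("Temperature:", x$value, "degrees C\n")
  invisible(x)
}

print(x)     # uses the default method for numeric vectors
print(temp)  # uses print.temperature()
```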
OOP in R
In typical object-oriented systems, each object is of a particular class (type) and has data and methods (object-specific functions) associated with it. The behaviour observed when a method is called depends on the class of the object that the method is associated with. There are multiple OOP systems that already exist in R, including:
S3: the simplest and most commonly used object-oriented system, where the class attribute defines the type of an object. S3 is widely used throughout base R so it’s important to know about it if you want to extend functions to work with different inputs. Its name comes from version 3 of the S language!
S4: similar to S3 but includes more formal class definitions and validation. In S4, the data contained in an object is defined by the slots in the class definition. S4 is a bit more complicated than S3 but results in better guarantees. S4 isn’t quite as widely used as S3, though the Bioconductor community is a long-term user of S4, so it’s important to know if you want to contribute to Bioconductor packages. The {lme4} package and some spatial packages (including {sp}, {rgdal}, and {rgeos}) also make use of S4 classes.
Reference Classes (RC): a special type of S4 class that also allows objects to be modified in place. Reference classes have very low adoption within the R community.
R6: similar to RC but simpler to use, and which uses S3 instead of S4. Unlike the previous OOP systems mentioned, R6 is a package rather than part of base R. It’s primarily been developed by Posit (formerly RStudio) and is used within the Shiny package.
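To make “modified in place” a little more concrete, here is a small sketch using Reference Classes from the {methods} package. The Counter class is hypothetical and only exists to show the behaviour: two names that point at the same object both see the change.

```r
library(methods)

# A hypothetical reference class with one field and one method
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1  # modifies the object itself, not a copy
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a          # b refers to the same object as a, not a copy
b$increment()
a$count         # the change made through b is visible through a
```

With S3 or S4 objects, the assignment to b would behave like a copy, and a would be left unchanged.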
This blog post doesn’t aim to go into the details about object-oriented systems in R, and I’d recommend reading the Object Oriented chapters of Advanced R for more details.
So if we already have all these OOP systems in R, why do we need another one? You can watch Hadley Wickham’s talk from rstudio::conf(2022) for some more background information on the motivation for developing S7.
Image: xkcd.com/927
What is S7?
The two main OOP systems in R, S3 and S4, both have their advantages and their limitations. For example, in S3 there’s no systematic object validation to make sure an object’s class is correct. In S4, the syntax for defining classes is rather unusual and relies on side effects. Issues such as these mean that, unlike other programming languages, there isn’t a dominant approach to OOP in R.
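The sketch below illustrates both limitations with made-up classes (temperature and Temperature are not from any package). In S3, nothing stops you attaching a class to unsuitable data; in S4, the contents are checked, but the class is registered through a call to setClass() that works by side effect.

```r
library(methods)

# S3: any object can be given any class attribute; nothing checks the contents.
fake <- structure("not a number", class = "temperature")
inherits(fake, "temperature")  # TRUE, even though the data makes no sense

# S4: setClass() registers the class as a side effect, and slots are type-checked.
setClass("Temperature", slots = c(value = "numeric"))
ok <- new("Temperature", value = 21.5)

# Supplying the wrong type for a slot is rejected with an error
bad <- tryCatch(
  new("Temperature", value = "hot"),
  error = function(e) "rejected"
)
bad
```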
Now imagine you could take the best bits of S3 and the best bits of S4. That’s where S7 comes in. The S7 package is a new OOP system designed to be a successor to S3 and S4. Unlike S3 and S4 (which were developed for S), S7 is specifically developed for R. S7 is currently being developed by The R Consortium Working Group on OOP. The long-term goal is to merge S7 into base R.
You can install the development version of S7 from GitHub:
remotes::install_github("rconsortium/OOP-WG")
library("S7")
Defining a class in S7
S7 classes are defined formally, and the definition includes a list of properties and an (optional) validator. You can use the (intuitively named) new_class() function to define a new S7 class. For example, if we want to define a simple S7 class with two properties about breakfast cereals (their name and year_of_launch) we can use the following code:
cereal = new_class(name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
It’s not a coincidence that we’ve assigned the new class to an object with the same name as the class. It’s how we construct new instances of the cereal class. For example, to construct an instance of the cereal class, you call cereal(), and pass in the values of the properties as arguments:
coco_pops = cereal(name = "Coco Pops", year_of_launch = 1957)
After you’ve created an S7 object, you can use @ to access and set properties. For example, Coco Pops were actually released in 1958, so you could update and correct the value using:
coco_pops@year_of_launch = 1958
Alternatively, using prop(coco_pops, "year_of_launch") = 1958 does the same thing.
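Reading properties works the same way as setting them. The following is a small self-contained sketch that repeats the class definition from above so it can be run on its own; props() is the S7 helper that returns all of an object’s properties as a list.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1957)

# Correct the launch year, then read values back
coco_pops@year_of_launch <- 1958
coco_pops@year_of_launch   # read a property with @
prop(coco_pops, "name")    # read a property with prop()
names(props(coco_pops))    # list the names of all properties
```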
One of the things I really like about S7 is that the type of the property is automatically validated. When I defined cereal() earlier, I specified that name must be a character. If I were to pass in a numeric value when creating a new instance, it would throw an error. You can also include a validator argument to new_class() to provide more complex checks on inputs.
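Here is a sketch of both kinds of check. The rule in the validator (exactly one launch year) is an assumption made up for this example; the general pattern is that a validator returns NULL when the object is fine, or a character string describing the problem. tryCatch() is used only so that the script keeps running after each error.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  ),
  # Hypothetical rule for illustration: a cereal has exactly one launch year.
  validator = function(self) {
    if (length(self@year_of_launch) != 1) {
      "@year_of_launch must be a single number"
    }
  }
)

# Wrong type for name: rejected by the automatic property type check
wrong_type <- tryCatch(
  cereal(name = 123, year_of_launch = 1958),
  error = function(e) "rejected"
)

# Correct types, but the custom validator rejects two launch years
two_years <- tryCatch(
  cereal(name = "Coco Pops", year_of_launch = c(1957, 1958)),
  error = function(e) "rejected"
)

# A valid object is constructed without complaint
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1958)
```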
It will also return an error if you try to assign a value to a property that hasn’t been defined. For example, coco_pops@manufacturer <- "Kellogg's" returns an error because manufacturer isn’t in the list of properties defined in cereal().
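A quick way to see this for yourself is sketched below, with the class definition repeated so the snippet runs on its own. The assignment is wrapped in tryCatch() so the error is captured rather than stopping the script.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1958)

# manufacturer is not a declared property, so the assignment fails
result <- tryCatch(
  {
    coco_pops@manufacturer <- "Kellogg's"
    "assigned"
  },
  error = function(e) "rejected"
)
result
```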
If you want a property to be dynamic, i.e. computed each time it’s accessed, then the new_property() function is worth exploring. For example, if you wanted to return the current system time every time you called coco_pops@time, you could use new_property() in the class definition:
cereal = new_class(name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric,
    time = new_property(getter = function(self) Sys.time())
  )
)
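To see a dynamic property in use, here is a self-contained sketch. The age property is a hypothetical addition, not part of the original class; it shows that a getter can also derive its value from the object’s other properties.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric,
    time = new_property(getter = function(self) Sys.time()),
    # Hypothetical extra property: years since launch, derived from year_of_launch
    age = new_property(getter = function(self) {
      as.numeric(format(Sys.Date(), "%Y")) - self@year_of_launch
    })
  )
)

# time and age are computed on access, so no value is supplied for them here
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1958)

coco_pops@time  # the current time, recomputed on every access
coco_pops@age   # recomputed from year_of_launch on every access
```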
To me, this already feels a lot more intuitive compared to some of the other OOP systems in R. For more information on dynamic properties, validation, generics, and methods, read the vignette on S7 basics by viewing the documentation on the package website.
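As a small taste of generics and methods, here is a minimal sketch. The describe generic is made up for this example; new_generic() creates a generic that dispatches on its x argument, and method()<- registers the implementation used for cereal objects.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1958)

# A hypothetical generic that dispatches on the class of x
describe <- new_generic("describe", "x")

# The method that runs when x is a cereal
method(describe, cereal) <- function(x, ...) {
  paste0(x@name, " (launched ", x@year_of_launch, ")")
}

describe(coco_pops)
```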
What’s Different in S7?
Since S7 is designed to be the successor to S3 and S4, you might be wondering two things: (i) how is S7 different to S3?, and (ii) how is S7 different to S4?
S7 vs S3
The good news is that, since S7 is built on top of S3, S7 objects are S3 objects. However, there are a couple of differences between the two:
S3 objects have a class attribute. S7 objects also have an S7_class attribute that contains the object that defines the class.
S3 objects have attributes. S7 objects have properties (which are built on top of attributes). This means that you can still access properties using the attr() function. However, when working in S7 you generally shouldn’t use attributes directly - it just means that your old code will still work.
This means that most S7 code will just work with S3. You can create S7 methods for S7 classes and S3 generics, and vice versa. You can also use S7 classes to extend S3 classes, and vice versa.
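The sketch below shows what that compatibility looks like in practice, assuming the cereal class is defined at the top level of a script rather than inside a package. The print method is a made-up example of registering an S7 method on an existing S3 generic.

```r
library(S7)

cereal <- new_class("cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops <- cereal(name = "Coco Pops", year_of_launch = 1958)

# S3-style class checks work on S7 objects
inherits(coco_pops, "cereal")

# Properties are stored as attributes (prefer @ in new code)
attr(coco_pops, "year_of_launch")

# Register a method for the existing S3 generic print()
method(print, cereal) <- function(x, ...) {
  cat("<cereal> ", x@name, "\n", sep = "")
  invisible(x)
}
print(coco_pops)
```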
S7 vs S4
The aforementioned properties that S7 objects have are essentially equivalent to the slots that S4 objects have. The main difference between the two is that, in S7 objects, properties can be dynamic. As with S3, you can combine S7 methods with S4 generics, and vice versa. S4 classes can extend S3 classes (which extends to cover S7 classes). However, S7 classes cannot be used to extend S4 classes.
Should I switch to S7?
If you’re already using S3, switching to S7 should be fairly seamless. You can keep doing everything you’re already doing, plus you get some extra functionality for free.
As I mentioned above, S7 classes cannot be used to extend S4 classes, so if you’re an existing user of S4 with a large codebase built primarily in S4 that you wish to continue to extend, switching to S7 might take a little more work. However, if you’re unlikely to want to extend existing S4 classes, the change to S7 should also be relatively smooth. S7 also aims to fix some of the problems with the {methods} package which implements S4, including performance and complexity issues, which is perhaps another reason to give it a go.
If you’re at the point where you think you might need a bit more control than you can achieve with S3, I’d recommend trying S7 before S4. At least from my experience, S7 felt more intuitive and easier to learn than S4.
Note that since R6 is built on encapsulated objects, rather than generic functions like S3 and S4, it’s a very different type of Object Oriented system from S7. So if you’re primarily an R6 (or Reference Classes) user, S7 isn’t going to be a replacement for your existing approaches.
We’re excited to see the developments in S7 over the next few months, and we’ll soon be updating the material in our Object Oriented Programming in R training course to cover S7!