SatRdays London 2023: Speakers

Published: March 7, 2023

tags: r, python

SatRdays London is fast approaching, and we are happy to announce our full lineup of speakers for the event! Read on for more info. If you want to join the fun, head over to the conference website to sign up!

Keynote Speakers

Julia Silge - Posit

Julia Silge is a data scientist and software engineer at Posit PBC (formerly RStudio), where she works on open source modeling and MLOps tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on data analysis and machine learning. Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.

Oliver Hawkins - Financial Times

Oliver Hawkins works as an editorial data scientist for the visual and data journalism team at the Financial Times. He has previously worked as a statistical researcher and a data scientist for the House of Commons Library, and as a data journalist for the BBC. He is interested in statistics, machine learning and data visualisation.

Contributed talks

Botan Ağın and Michael Stevens - SamKnows

AutRmatic reporting: billions of internet measurements, hundreds of reports and one repository to rule them all

SamKnows has been pioneering internet performance measurements for over 14 years. The reason we exist is to provide a source of truth for how the internet is really performing. The data we collect can be used as a common language between government regulators, internet service providers, academics, and content providers to optimise and improve internet performance for everyone.

Day to day, SamKnows uses R to handle a huge range of automated and self-serve workloads. Keeping track of each report’s recipients, delivery schedule, dependencies and deployment procedure can be tricky, especially in the nightmare scenario of suddenly needing to migrate all of your jobs to a new server or cloud environment.

In this presentation, we will talk about how we structure our regularly-scheduled reports as standardised entities within a monorepo. We will explain how this approach reduces the latency in setting up a report, makes it easier for new team members to contribute, and lets us uphold standards while retaining the flexibility to deliver work in diverse formats with a range of complexity levels and opportunities for manual intervention. We will go into detail on specific workflows that take the terabytes of data collected by SamKnows from cloud and on-premises data sources, process them into an R Markdown document, formatted spreadsheet, and raw CSV output, and distribute them through cloud file storage, FTP servers, email, Slack and more.

Vyara Apostolova and Laura Cole - National Audit Office

ScRutinising government spending

“The National Audit Office supports Parliament in holding government to account through both its Financial Audit and its Value for Money work. The Analysis Hub is a central team that utilises a range of analytical techniques to support both strands of work. The proposed presentation will showcase two examples of how we in the Analysis Hub use R to support our mission to hold government to account.

We use R to reproduce complex models that departments employ to produce accounting estimates for their financial accounts. Our R reproductions allow us to assess whether departments have implemented their selected methodology correctly and to highlight any model integrity issues. We also implement additional sensitivity testing, including Monte Carlo simulations to capture the uncertainty around model outputs. The presentation will cover an overview of our approach and a demo of a reproduction of a dummy model.

We have also built an R Shiny app, the Covid-19 Cost tracker, that brings together data from across the UK government on the costs of measures taken in response to the Covid-19 pandemic. It is one of the very few sources of comprehensive information on Covid-19-related spending, and the only one available as an interactive tool. With it, the public can examine spending by department and category of spend, as well as interact with bubble graphs to explore the costs of individual policies. The presentation will include an overview of how the data analytics team and audit team collaborated to produce the output and a demo of the app.”

Andrew Collier - Fathom Data

Dark Corners of the Tidyverse

“In the realm of the Tidyverse, there are functions which are always in the spotlight. These are the titans: well known and loved, frequently invoked and virtually indispensable. There are other, lesser-known functions which stand quietly in the shadows. Unacknowledged, somewhat obscure and almost forgotten. Waiting for their moment to shine.

I’ll talk about five of these Unsung Heroes of the Tidyverse, lauding their virtues and showing how they can help you succeed on your next Data Science quest.”

Jack Davison - Ricardo Energy & Environment

“Put it on a map!” – Developments in Air Quality Data Analysis

“An understanding of air quality is crucial as it can have significant public health, environmental and economic effects. However, air quality data is complex, constantly changing in space and time, and influenced by a myriad of factors such as meteorology and human activity. This makes air quality analysis challenging, and communicating the results of this analysis more challenging still!

Just over a decade ago, the {openair} package was authored to provide an open-source toolkit to help air quality practitioners get the most out of their data, and it is still widely used in academia, consultancy and industry today. While {openair} itself has not changed hugely in recent years, much thought has been put into extending it by leveraging more recent tools and packages.

In this talk I will discuss how we have recently married {leaflet} and {openair} to create effective, interactive air quality maps. In particular, I’ll discuss the development of the {openairmaps} package – a toolset which makes it easy to create interactive “directional analysis” maps to help explore the geospatial context of pollution monitoring data.”

Russ Hyde - Jumping Rivers

Does code quality even matter in data science?

“It depends! If you need to quickly summarise some data for an ad-hoc request, then knock out the code in whatever manner gets the job done.

But what happens when you start getting a lot of similar requests, or you are working on a more substantial project, or you are collaborating within a larger team? Now, productivity should be viewed ‘across the team’ and ‘across all projects’. What can you do to help yourself and your colleagues, and what tools exist to help?

Code quality concerns those aspects of software that make it easier to work with, easier to explain to others and easier to maintain or extend.

In this talk, I’ll take you through the source code for an evolving analysis project. We’ll discuss how to (and how not to) modularise code. Along the way, we’ll talk about actions and calculations, body-tweaking, duplicate stomping and a few tools that help automate the boring low-level stuff that teams sometimes disagree about.”

Ella Kaye and Heather Turner - University of Warwick

Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project

The R Project is over 20 years old, but its future is not secure - many of the R Core Team are nearing retirement and there are not enough new contributors to sustain the work. We present a number of initiatives, organised under Heather Turner’s ‘Sustainability and EDI (Equality, Diversity and Inclusion) in the R Project’ fellowship, to encourage and train a new, more diverse, generation of contributors. These include R contributor office hours, collaboration campfires, bug BBQs, translatathons and an updated R development guide. This presentation is also a call to action to encourage others to get involved in supporting this language, a fundamental piece of software in many disciplines, used by an estimated 2 million people.