James Gammerman

Data Scientist


About Me

I am a data scientist working at the British energy supplier Centrica (British Gas).
I have 4 years of professional experience in data science, and 8 years in data analysis more generally.

My main area of interest is machine learning, in which I have an MSc from Royal Holloway University of London (2018). I enjoy applying machine learning and data analysis techniques to business data. I have also recently started publishing research related to my commercial work in academic journals.

My programming language of choice is R but I am also comfortable with Python. I am currently gaining experience with cloud technologies.

Outside of my career, my main interests are sports, technology and travelling.


  • Classical Machine Learning
  • Deep Learning
  • Natural Language Processing / Text Mining
  • Reliable Machine Learning (Conformal Prediction)
  • Visualisation
  • Data Analysis


  • MSc in Machine Learning, 2018

    Royal Holloway University of London

  • MSci in Chemistry, 2013

    Imperial College London



Data Scientist


Apr 2017 – Present London, UK
  • Key analytical responsibilities:

    • Applying machine learning and statistical modelling techniques to company data
    • Other stages of the data science life cycle (e.g. data cleaning/preprocessing, exploratory data analysis, feature engineering, model deployment)
  • Training colleagues in machine learning and statistical programming

    • Instructor for several 3-day workshops across the UK
  • Managing the collaboration between Centrica and Royal Holloway University, which has led to published research


Business Analyst


Aug 2013 – Apr 2017 London, UK
  • Gas & Power Marketing and Trading: Wide range of data analytics tasks/projects relating to European markets

Blog Posts

How (and why) to make a lollipop plot in R

In this post I’m going to reproduce an unusual chart I saw David Robinson make during one of his recent Tidy Tuesday screencasts looking at data from the US Bureau of Labor Statistics.

Predicting NFL stadium attendances with tidymodels

In this post I’m going to be trying out the new tidymodels framework in R. I’ve been reading through the corresponding book and it looks to me like a real game-changer for building robust statistical/ML models quickly in the R language.

Cocktail recipes analysis

For this first blog post I thought I’d take a recent #TidyTuesday dataset and do some analysis of it. Where better to start than one of my favourite things?

Talks and Teaching

Machine Learning: Progress & Prospects

A talk I gave on machine learning in Ukraine in 2018.

Business Analytics in R - An Introduction to Statistical Programming

This is a training course that I co-authored and have presented several times to company employees across the UK. The course introduces statistical and machine learning techniques and applies them in business contexts using the R programming language.


Research developed from my commercial work

Multi-Level Conformal Clustering: A Distribution-free Technique for Clustering and Anomaly Detection

This paper was developed from my MSc thesis and published in Neurocomputing (volume 397, 2020). It introduces a novel clustering technique which also incorporates anomaly detection.

Anomaly Detection based on Association Rules and Conformal Prediction

This poster abstract was published in the Proceedings of Machine Learning Research (volume 105, 2019), and introduces a novel machine learning technique for database cleaning.

Predictive Maintenance with Conformal and Probabilistic Prediction: A Commercial Case Study

My MSc thesis, which won the award for best dissertation.