
Creating a Reproducible Example

Authors: Colin Gillespie & Jack Walton

Published: May 31, 2022

tags: r, python, reprex, reticulate, docker

Maintaining training materials

Over the last few years, we have increased both the number and the variety of training courses we offer. In addition to our usual R courses in {dplyr} and {shiny}, we also offer training on Docker, Python, Stan, TensorFlow, and others.

As the number of courses we offer increased, so did the maintenance burden of our associated training materials (lecture notes, slides, exercises, and more). To ease this burden, and to assist in ensuring that our training materials build consistently, we developed an R package called {jrNotes2}. Amongst other things, this package ensures that all courses:

  • have identical “template files”: .gitlab-ci.yml, .gitignore, Makefiles, index.Rmd, …;
  • have the same directory structure, and
  • pass a set of quality-assurance checks.

To make a change to course content, a team member must push their suggestions to a branch on GitLab. This action launches a CI job, which runs a Docker container that performs a set of checks. The templated .gitlab-ci.yml file ensures that every course undergoes the same build process and quality-assurance checks. If the content passes these checks, and an eligible approver approves the changes, then the changes are merged into the main branch.

[Figure: cartoon showing arrows from Data scientist to GitLab to Docker container to Continuous Integration]

This means course content in a main branch should never fail our checks. Well, not quite…

Why we can’t freeze all dependencies

When teaching a course, we want to teach with the exact same packages an attendee would get via an install.packages() or pip install command. This means we must always use the latest versions of packages available on CRAN and PyPI. However, always using the latest available packages has its dangers: a change to a package used by a course can suddenly cause our teaching materials to begin failing our build checks.

To try to pre-empt package changes breaking our training materials, we use scheduled CI runs. That is, at regular intervals a CI job automatically runs our tests and checks against a course’s training materials. If a course’s materials fail these checks, we are notified via a message in a Slack channel. In early January, we started getting notifications about our Introduction to Python course:

[Figure: screenshot of the Slack notification showing the failed pipeline, where the failed job is notes-build]

The problem

Unfortunately, the traceback given by the CI wasn’t the most enlightening:

[Figure: screenshot of the segfault traceback]

Strangely, the course materials

  • built successfully on Colin’s laptop;
  • failed to build on Jack’s laptop, and
  • failed to build on the CI runner.

As far as we could see, the three systems were essentially identical: each ran the same operating system, the same R version, and the same package versions.

Whilst we could reproduce the error in a Docker container, the error was difficult to debug because

  • the container used a large number of internal Jumping Rivers R packages;
  • the materials build process involved a set of non-trivial Rmd files, and
  • the error wasn’t encountered until around eight minutes into the build and test process.

In short, whilst we had a reproducible example of the error, it was only reproducible by a Jumping Rivers employee, and it was far from a minimal example.

Simplifying the problem

To make progress, we had to simplify the Docker container. We asked ourselves the following questions:

  • Can we remove all unnecessary files, such as presentation slides? Yes.
  • Can we simplify the course notes? Yes: we were able to find a single Python code chunk that caused the issue.
  • Can we remove all of our custom Rmd styling? Yes: a simpler Rmd file with the same chunk gave the same error.
  • Can we reproduce the issue without R Markdown? Yes: a simple R script can reproduce the same error.
  • Does the Dockerfile need to be complex? No: we can remove most of the unnecessary Python, Debian and R related packages.
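
The hunt for a single offending chunk can be partly automated. The following is only a minimal sketch, not the process described above: it assumes the Python chunks have already been extracted from the notes as plain strings, and `first_failing_chunk` is a hypothetical helper name. Each prefix of the chunks runs in a fresh interpreter, so a hard crash such as a segfault takes down only the child process.

```python
import subprocess
import sys


def first_failing_chunk(chunks):
    """Return the index of the first chunk whose inclusion makes the run fail.

    Chunks are run cumulatively (chunk 0, then chunks 0-1, and so on), each
    time in a fresh interpreter, so that a hard crash such as a segfault
    kills only the child process. Returns None if every prefix succeeds.
    """
    for end in range(1, len(chunks) + 1):
        source = "\n".join(chunks[:end])
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
        )
        # A non-zero exit status covers ordinary errors; on POSIX a negative
        # status means the child was killed by a signal (typically -11 for
        # SIGSEGV on Linux).
        if result.returncode != 0:
            return end - 1
    return None
```

Running the chunks cumulatively keeps any state that later chunks depend on, at the cost of repeating earlier work on every iteration.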

A minimal reproducible example

After all of our simplifications, we arrived at a minimal reproducible example with the Dockerfile:

FROM rocker/r-ver:latest
RUN apt update && apt install -y python3 python3-dev python3-venv
RUN install2.r --error reticulate
COPY test.R /root/

and associated R script:

reticulate::virtualenv_create(
  envname = "./venv",
  packages = "matplotlib"
)
reticulate::use_virtualenv("./venv")
reticulate::py_run_string("import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")

By simplifying the problem, we were now in a position to ask for help from others.

As this appeared to be a bug (it used to work, but now it doesn’t), we raised an issue against the {reticulate} repository.

A (partial) solution

Soon after posting we received a response from one of the {reticulate} developers. Their response revealed that matplotlib was nothing but an innocent bystander in our issue, and that the real culprits were the incompatible BLAS (Basic Linear Algebra Subprograms) libraries being used by R and numpy!
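
Because {reticulate} embeds Python inside the R process, the BLAS libraries of both runtimes end up loaded side by side. On Linux, one way to see what is loaded is to read the process's memory map. The sketch below is an illustration rather than the diagnostic used in the issue thread; `blas_libraries` is a hypothetical helper, and the keyword list is an assumption about how such libraries are usually named.

```python
import os


def blas_libraries(maps_text):
    """List BLAS/LAPACK-like shared libraries found in /proc/<pid>/maps text.

    Returns the sorted, de-duplicated file paths whose file name looks like
    a shared library and contains one of the keywords below.
    """
    keywords = ("blas", "lapack", "mkl")  # assumed naming conventions
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(k in name for k in keywords):
            found.add(path)
    return sorted(found)
```

To try it, import numpy first and then pass in the contents of `/proc/self/maps`, for example `blas_libraries(open("/proc/self/maps").read())`. Run from within an R session via {reticulate}, the result may include both the BLAS that R links against and the copy bundled with the numpy wheel.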

The suggested solution was to compile the numpy package from source within Docker. However, compiling numpy at container runtime added around three minutes to the CI checks every time they ran. As such, we opted to build the numpy package from source at image build-time, effectively caching the package build, and avoiding re-compiling numpy every time our build tests ran against our training materials.
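
The exact command is not reproduced here, but pip's `--no-binary` option is the usual way to make pip compile a package locally rather than download a pre-built wheel. The sketch below shows one way such an install could be scripted; the function names are hypothetical, and the build assumes a C compiler and the Python headers are available in the image.

```python
import subprocess
import sys


def numpy_source_install_command(python=sys.executable):
    """Build a pip command that compiles numpy locally instead of using a wheel."""
    return [
        python, "-m", "pip", "install",
        "--no-binary", "numpy",  # refuse pre-built wheels for numpy only
        "--force-reinstall",     # replace any wheel-based numpy already present
        "numpy",
    ]


def install_numpy_from_source(python=sys.executable):
    """Run the install; raises CalledProcessError if the build fails."""
    subprocess.run(numpy_source_install_command(python), check=True)
```

Pointing `python` at the interpreter inside the virtual environment, and running the install from a `RUN` step in the Dockerfile, would mean the compilation cost is paid once when the image is built rather than on every CI run.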

Although compiling numpy from source did fix our issue, it is currently more of a workaround than a long-term solution. Hopefully, a future change to the BLAS libraries used by the rocker image series or by numpy will allow the two to be friends again. Here’s hoping!

Take-aways

  • Using scheduled CI jobs allowed us to catch this issue early, and gave us plenty of time to fix it before the next time the course ran.

  • Having CI in place ensured we had an (internally) reproducible example, as the CI is based on a Docker container.

  • In order to get help, it was crucial to simplify the problem.

  • Debugging is hard, and it’s okay to ask for help!

References

  • https://github.com/rstudio/reticulate/issues/1133
