Abstract

Hands-on reproducible computing with Guix

Effective computational research combines programs, data and a runtime environment to produce new data, which are then analyzed to yield results that are eventually published. Trust and value in open science rely on the ability to reproduce previously published results, and to replicate or generalize the results of a study with new data. Yet a significant portion of studies cannot be reproduced, even by their own authors (1).

Computational environments are a major source of analytical variability because they are insufficiently captured. With the wide adoption of open source, software relies on an ever-growing list of dependencies, sometimes spanning multiple languages. Beyond its complexity, the resulting dependency graph is in constant flux, with new releases, bug fixes and security updates. Such a complex and volatile graph cannot be captured by a simple README file (2).

Specialized tooling is required to efficiently formalize, build and deploy reproducible computational environments. Guix is a package manager for GNU/Linux designed with such reproducibility in mind (3). It can build and execute binaries independently of the host system, rebuild software from source as it existed at any point in time, and deploy programs and environments to various targets, including containers.

This workshop provides a guided tour of Guix, introducing the key concepts and commands of the tool and giving you hands-on experience in using it. First, you will be introduced to the Guix package management commands, then create a reproducible environment with guix shell and deploy it with guix time-machine. Next, we will walk you through the process of Guixifying a representative scientific pipeline from the neuroimaging community (4). The session will end with a discussion of Guix adoption in HPC, some of its advanced features, and the available support channels.
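As a rough preview of the guix shell and guix time-machine steps mentioned above, the following is a minimal sketch rather than the workshop's actual material. The package names ("python", "python-numpy") and the file names manifest.scm and channels.scm are assumptions chosen for illustration. The Guix commands run only if guix is found on the PATH.

```shell
#!/bin/sh
set -eu

# Declare the packages that make up the environment.
# The package list is illustrative, not the workshop's pipeline.
cat > manifest.scm <<'EOF'
(specifications->manifest
 (list "python" "python-numpy"))
EOF

if command -v guix >/dev/null 2>&1; then
    # Record the exact Guix revision currently in use, so that the
    # same package definitions can be selected again later.
    guix describe --format=channels > channels.scm

    # Re-create the environment from the pinned revision and run a
    # command inside an isolated container.
    guix time-machine --channels=channels.scm -- \
        shell --manifest=manifest.scm --container -- \
        python3 -c 'import numpy; print(numpy.__version__)'
else
    echo "guix not found: wrote manifest.scm only"
fi
```

Keeping manifest.scm and channels.scm under version control next to the analysis code is one way to let others rebuild the same environment later. The first file states which packages are needed, and the second states which Guix revision they come from.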

References

1. Baker M. 1,500 scientists lift the lid on reproducibility. Nature [Internet]. 2016 May 26 [cited 2025 Jul 10];533(7604):452–4. Available from: https://www.nature.com/articles/533452a
2. Vallet N, Michonneau D, Tournier S. Toward practical transparent verifiable and long-term reproducible research using Guix. Sci Data [Internet]. 2022 Oct 4 [cited 2024 Sep 9];9(1):597. Available from: https://www.nature.com/articles/s41597-022-01720-9
3. Courtès L, Wurmus R. Reproducible and User-Controlled Software Environments in HPC with Guix. In: 2nd International Workshop on Reproducibility in Parallel Computing (RepPar) [Internet]. Vienna, Austria; 2015. Available from: https://inria.hal.science/hal-01161771
4. Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature [Internet]. 2020 Jun 4 [cited 2025 Jul 10];582(7810):84–8. Available from: https://www.nature.com/articles/s41586-020-2314-9