Abstract
Hands-on reproducible computing with Guix
Effective computational research requires a combination of programs, data and a runtime environment to produce new data, which are then analyzed to obtain results that are eventually published. Trust and value in open science rely on the ability to reproduce results published in past literature, and to replicate or generalize the results of a study with new data in the future. Yet a significant portion of studies cannot be reproduced, even by their own authors (1).
Computational environments are a large source of analytical variability because they are insufficiently captured. With the wide adoption of open source, software relies on an ever-growing list of dependencies, sometimes spanning multiple languages. Besides being complex, the resulting dependency graph is in constant flux, with new releases, bug fixes and security updates. Such a complex and volatile graph cannot be captured by a simple README file (2).
Specialized tooling is required to efficiently formalize, build and deploy reproducible computational environments. Guix is a package manager, running on GNU/Linux, designed with such reproducibility in mind (3). It can build and execute binaries independently from the host system, perform builds from source at any point in time, and deploy programs and environments to various targets, including containers.
This workshop provides a guided tour of Guix, introducing the key concepts and commands of the tool and giving you hands-on experience with it. First, you will be introduced to the Guix package management commands, create a reproducible environment with guix shell, and deploy it with guix time-machine. Then, we will walk you through the process of Guixifying a representative scientific pipeline from the neuroimaging community (4). The session will end with a discussion on Guix adoption in HPC, some of its advanced features, and available support channels.
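As a rough preview of the first part, the guix shell and guix time-machine steps might look like the sketch below. It is a minimal illustration and not the workshop material itself: the package list (python, python-numpy) and the file names manifest.scm and channels.scm are assumptions chosen for the example, and the Guix commands run only if guix is found on the machine.

```shell
#!/bin/sh
set -eu

# Hypothetical package list for illustration; replace it with the
# packages your own analysis needs.
cat > manifest.scm <<'EOF'
(specifications->manifest
 '("python" "python-numpy"))
EOF

if command -v guix >/dev/null 2>&1; then
    # Record the exact Guix revision (and any extra channels) in use,
    # so the same package definitions can be recovered later.
    guix describe --format=channels > channels.scm

    # Re-create the environment from the pinned revision and run a
    # single command inside it.
    guix time-machine --channels=channels.scm -- \
        shell --manifest=manifest.scm -- \
        python3 -c 'import numpy; print(numpy.__version__)'
else
    # Assumption: without Guix installed, only the manifest is written.
    echo "guix not found: wrote manifest.scm only" >&2
fi
```

Keeping manifest.scm (what to install) and channels.scm (which Guix revision to install it from) under version control next to the analysis code is one way to let others rebuild the same environment later.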