Hands-on reproducible computing with Guix

Julien Castelneau

Inria

Ghislain Vaillant

Inria

Before we start

This presentation is available at:

https://rt-santenum.inria.fr/guix-workshop

The source code repository is available at:

https://gitlab.inria.fr/rt-santenum/guix-workshop

We will leave time for questions at regular intervals.

Motivation

Reproducibility and replicability

“Reproducibility is the ability of independent investigators to draw the same conclusions from an experiment by following the documentation shared by the original investigators.” (1)

“Replicability is a poor substitute for reproducibility […] reproducibility requires changes; replicability avoids them.” (2)

Goals for reproducibility

  • Dealing with sources of variation in a given experiment
  • Sources of variation may be controlled by, or induced for, the observer
  • Goals for reproducibility (3):
    • Ensure control of sources of variation in an experiment
    • Decouple experimental results from the observer

Reproducibility crisis

“More than 70% of researchers have tried and failed to reproduce another scientist’s¹ experiments, and more than half have failed to reproduce their own² experiments.” (4)

Common sources of variation:

  • ¹ Different observer
  • ² Future time, additional data, etc.

Why does reproducibility matter?

  • Scientific findings are often inaccurate (5)
  • Non-reproducible science costs money (6)
  • Academic retractions are on the rise (7), of which experimental errors account for an estimated 25% (8)

Why is science difficult to reproduce?

  • Big picture reproducibility (9):

    Reproducibility   Target        Addressed by
    Empirical         Collection    Experimental protocol, open data
    Statistical       Analysis      Study pre-registration
    Computational     Measurement   Documentation, FAIR practices, tooling

  • With modern research relying more and more on software (10), the importance of computational reproducibility keeps rising.

Challenges for computational reproducibility

  • Availability of code, data and programs (archiving)
  • Sufficiently detailed instructions (documentation)
  • Capturing and deploying the computational environment (tooling)

The case for Guix

Transparent, verifiable and long-term reproducibility (11) with a single command:

$ guix time-machine -C channels.scm -- \
    shell -m manifest.scm -- \
    python run.py [...]

The rest of the workshop will run you through the specifics of this command.
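
To make the structure of this command easier to see, here is a minimal Python sketch that assembles it from its three layers: `guix time-machine` selects the Guix revision described in `channels.scm`, `guix shell` creates the environment listed in `manifest.scm`, and the trailing command runs inside that environment. The helper name `guix_command` is hypothetical and not part of the workshop material. The sketch only builds and prints the command; it does not run Guix.

```python
import shlex


def guix_command(channels, manifest, command):
    """Assemble the pinned-environment invocation as an argument list."""
    return [
        # Layer 1: use the Guix revision(s) recorded in the channels file.
        "guix", "time-machine", "-C", channels, "--",
        # Layer 2: create an environment from the packages in the manifest.
        "shell", "-m", manifest, "--",
        # Layer 3: the program to run inside that environment.
        *command,
    ]


argv = guix_command("channels.scm", "manifest.scm", ["python", "run.py"])
print(shlex.join(argv))
```

On a machine where Guix is installed, the resulting list could be handed to `subprocess.run(argv, check=True)`.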

Practice

Our case study

  • A subset of a larger neuroimaging processing pipeline (12)
  • Experiment can be replicated using the documentation
  • Project dependencies are well specified with Conda

Let’s attempt to guixify this experiment for long-term reproducibility!

Computational workflow

flowchart LR
  dat[(Structural dataset)] --> bfc[Bias field correction] --> reg[Registration]
  tpl[(Anatomical template)] --> reg --> bet[Brain Extraction]
  bet -- final image --> out[(Output data)]
  bfc -- estimated fieldmap --> out
  reg -- registration matrix --> out
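
The flowchart above can also be read as a dependency graph. The following sketch, which uses only the Python standard library, encodes the same edges and derives one valid execution order. The snake_case node names are chosen for this illustration; the actual pipeline may organise its steps differently.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (edges taken from the flowchart).
dependencies = {
    "bias_field_correction": {"structural_dataset"},
    "registration": {"bias_field_correction", "anatomical_template"},
    "brain_extraction": {"registration"},
    "output_data": {"brain_extraction", "bias_field_correction", "registration"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two inputs (dataset and template) have no dependencies, so their relative position in the printed order is not fixed.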

Experimental setup

# in environment.yml

name: guix-workshop
channels:
  - https://fsl.fmrib.ox.ac.uk/fsldownloads/fslconda/public/
  - conda-forge
dependencies:
  - ants
  - click
  - fsl-avwutils
  - fsl-bet2
  - fsl-data_standard
  - fsl-fast4
  - fsl-flirt
  - nipype

$ conda activate guix-workshop
$ export FSLDIR=$CONDA_PREFIX
$ source $FSLDIR/etc/fslconf/fsl.sh
$ python3 run.py [...]

.
└── OASIS10400
    ├── sub-OASIS10400_ses-M00_desc-bfc_fmap.nii.gz
    ├── sub-OASIS10400_ses-M00_desc-bfc_T1w.nii.gz
    ├── sub-OASIS10400_ses-M00_desc-preproc_T1w.nii.gz
    ├── sub-OASIS10400_ses-M00_desc-reg_T1w.nii.gz
    └── sub-OASIS10400_ses-M00_desc-reg_xfm.mat
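
When comparing a reproduction attempt against the listing above, it can help to check that every expected file was produced. The sketch below rebuilds the file names from the pattern visible in the listing. The helper names `expected_outputs` and `missing_outputs` are hypothetical, and the default session label `M00` is simply the one shown above, not a documented default of `run.py`.

```python
from pathlib import Path

# (description, suffix, extension) triples read off the listing above.
OUTPUTS = [
    ("bfc", "fmap", ".nii.gz"),
    ("bfc", "T1w", ".nii.gz"),
    ("preproc", "T1w", ".nii.gz"),
    ("reg", "T1w", ".nii.gz"),
    ("reg", "xfm", ".mat"),
]


def expected_outputs(output_dir, participant, session="M00"):
    """Return the paths one participant is expected to produce."""
    stem = f"sub-{participant}_ses-{session}"
    return [
        Path(output_dir) / participant / f"{stem}_desc-{desc}_{suffix}{ext}"
        for desc, suffix, ext in OUTPUTS
    ]


def missing_outputs(output_dir, participant, session="M00"):
    """Return the expected paths that do not exist on disk."""
    return [
        path
        for path in expected_outputs(output_dir, participant, session)
        if not path.exists()
    ]


for path in expected_outputs("out", "OASIS10400"):
    print(path)
```

An empty list from `missing_outputs` only shows that the files exist; it says nothing about whether their contents match the original run.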

Running the experiment

  1. Process a single participant:
$ python3 run.py /path/to/dataset/dir -o /path/to/output/dir \
    -w /path/to/workspace/dir -t /path/to/template/dir -p OASIS10400
  2. Process participants concurrently:
$ python3 run.py [...] -j 8
  3. Run processing using ANTs for bias field correction:
$ python3 run.py [...] --bfc-use-ants
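
For readers who want to see how these options fit together, here is a sketch of a command-line interface that accepts the same flags. It is an assumption-laden stand-in, not the workshop's `run.py`: it uses the standard-library `argparse` to stay self-contained (the real script presumably relies on `click`, which is listed in `environment.yml`), and the destination names and the default of one job are chosen for illustration only.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("dataset_dir", help="input dataset directory")
    parser.add_argument("-o", dest="output_dir", help="output directory")
    parser.add_argument("-w", dest="workspace_dir", help="workspace directory")
    parser.add_argument("-t", dest="template_dir", help="template directory")
    parser.add_argument("-p", dest="participant", help="participant to process")
    # Assumed default: process sequentially unless -j is given.
    parser.add_argument("-j", dest="jobs", type=int, default=1,
                        help="number of participants processed concurrently")
    parser.add_argument("--bfc-use-ants", action="store_true",
                        help="use ANTs for bias field correction")
    return parser


args = build_parser().parse_args(
    ["/path/to/dataset/dir", "-o", "/path/to/output/dir",
     "-p", "OASIS10400", "-j", "8", "--bfc-use-ants"]
)
print(args)
```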

Going further

Using Guix without Guix

  • Use cases: HPC clusters, machines without sudo access or without Guix installed
  • Introducing the guix pack command
  • Supports relocatable tarballs, Docker / Singularity / Apptainer, DEB / RPM packages, etc.
  • Guix as a container factory (13)
  • Guix as an environment modules factory (14)
  • Modern HPC workflow: tutorial and examples.
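
As a rough guide to the formats listed above, the following sketch maps a few deployment targets to the corresponding `guix pack` options and assembles the command line. The target labels and the helper name `guix_pack_command` are made up for this example, and `manifest.scm` is assumed to be the same manifest used earlier. The sketch only builds the argument list; it does not run Guix.

```python
# Target label -> extra `guix pack` options. The labels are chosen for this
# sketch; the options themselves are `guix pack` command-line flags.
PACK_OPTIONS = {
    "tarball": ["-RR"],                 # relocatable tarball
    "docker": ["-f", "docker"],         # Docker image
    "singularity": ["-f", "squashfs"],  # image for Singularity / Apptainer
    "deb": ["-f", "deb"],               # Debian package
    "rpm": ["-f", "rpm"],               # RPM package
}


def guix_pack_command(target, manifest="manifest.scm"):
    """Assemble a `guix pack` invocation for the given target label."""
    return ["guix", "pack", *PACK_OPTIONS[target], "-m", manifest]


for target in PACK_OPTIONS:
    print(" ".join(guix_pack_command(target)))
```

Depending on the target, further options (for example symlinks created with `-S`) may be needed; the `guix pack` section of the manual lists them.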

Available channels

Channel               Description
guix                  Main Guix repository
guix-science          Scientific software under an open-source license
guix-science-nonfree  Scientific software under a proprietary license
guix-hpc              HPC software under an open-source license
guix-hpc-non-free     HPC software under a proprietary license
guix-cran             Packages automatically built from CRAN
guix-bioc             Packages automatically built from Bioconductor

…and many more listed on awesome-guix

Getting help

Contributing to Guix

  • Development now happens directly on Codeberg
  • Contributions are very much encouraged, subject to the project’s code of conduct.
  • Have a look at the Contributing section of the manual for tips to get yourself started.

Areas of active research

  • Full-source bootstrap (15)

    Reduce the number and size of binary seeds required by the software supply chain.

  • Reproducible research papers (16)

    Provide transparent, verifiable and long-term means to reproduce research.

  • Long-term source code archival (17)

    Preserve and recover source code in a long-term archive by connecting Guix with Software Heritage.

Issues with long-term reproducibility

  • Time traps (18)
  • Software relying on expired resources
  • Software no longer building on modern hardware
  • Retention period of substitutes

Can the r13y crisis ever be solved?

  • Societal changes are required (19)
  • The research community has got bigger fish to fry (7)
  • Things are getting worse with enshittification and generative AI (20)

“There are some things that can’t be reproduced; for everything else, there’s Guix.”

Thank you

  • Pierre-Antoine BOUTTIER (GRICAD)
  • Alice BRENON (ENS Lyon)
  • Yao CHI (Inria Rennes)
  • Boris CLENET (Inria Rennes)
  • Ludovic COURTÈS (Inria Bordeaux)
  • Romain GARBAGE (Inria Bordeaux)
  • Tanguy LE CARROUR (Paris Guix meetup)
  • Simon TOURNIER (Paris Cité University)

References

1. Gundersen OE. The Fundamental Principles of Reproducibility. Phil Trans R Soc A [Internet]. 2021 May 17 [cited 2025 Aug 11];379(2197):20200210. Available from: http://arxiv.org/abs/2011.10098
2. Drummond C. Replicability is not reproducibility: nor is it good science. In Collection / Collection : NRC Publications Archive / Archives des publications du CNRC; 2009. (Evaluation Methods for Machine Learning Workshop, the 26th ICML, June 14-18, 2009, Montreal, Canada).
3. Tournier S. Reproducible computational environment, when? [Internet]. 2024 Nov 26. Available from: https://hpc.guix.info/static/doc/caf%C3%A9-guix/tournier-20241126.pdf
4. Baker M. 1,500 scientists lift the lid on reproducibility. Nature [Internet]. 2016 May 26 [cited 2025 Jul 10];533(7604):452–4. Available from: https://www.nature.com/articles/533452a
5. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med [Internet]. 2005 Aug 30 [cited 2025 Aug 11];2(8):e124. Available from: https://dx.plos.org/10.1371/journal.pmed.0020124
6. Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLoS Biol [Internet]. 2015 Jun 9 [cited 2025 Aug 11];13(6):e1002165. Available from: https://dx.plos.org/10.1371/journal.pbio.1002165
7. Van Noorden R. More than 10,000 research papers were retracted in 2023 — a new record. Nature [Internet]. 2023 Dec 21 [cited 2025 Aug 11];624(7992):479–81. Available from: https://www.nature.com/articles/d41586-023-03974-8
8. Fang FC, Steen RG, Casadevall A. Misconduct accounts for the majority of retracted scientific publications. Proc Natl Acad Sci USA [Internet]. 2012 Oct 16 [cited 2025 Aug 11];109(42):17028–33. Available from: https://pnas.org/doi/full/10.1073/pnas.1212247109
9. Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Research [Internet]. 1st ed. Chapman and Hall/CRC; 2018 [cited 2025 Aug 11]. Available from: https://www.taylorfrancis.com/books/9781315362762
10. Goble C. Better Software, Better Research. IEEE Internet Comput [Internet]. 2014 Sep [cited 2025 Sep 3];18(5):4–8. Available from: https://ieeexplore.ieee.org/document/6886129/
11. Vallet N, Michonneau D, Tournier S. Toward practical transparent verifiable and long-term reproducible research using Guix. Sci Data [Internet]. 2022 Oct 4 [cited 2024 Sep 9];9(1):597. Available from: https://www.nature.com/articles/s41597-022-01720-9
12. Wen J, Thibeau-Sutre E, Diaz-Melo M, Samper-González J, Routier A, Bottani S, et al. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Medical Image Analysis [Internet]. 2020 Jul [cited 2025 Jul 24];63:101694. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1361841520300591
13. Tournier S. Guix: a factory for containers? [Internet]. 2024. Available from: https://simon.tournier.info/posts/2024-12-11-jres-guix-container-factory.html
14. Courtès L. Back to the future: modules for Guix packages [Internet]. Guix-HPC. 2022. Available from: https://hpc.guix.info/blog/2022/05/back-to-the-future-modules-for-guix-packages/
15. Nieuwenhuizen J, Courtès L. The Full-Source Bootstrap: Building from source all the way down [Internet]. Guix. 2023. Available from: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/
16. Courtès L, Felšöci M, Hinsen K, Swartvagher P. A guide to reproducible research papers [Internet]. Guix-HPC. 2023. Available from: https://hpc.guix.info/blog/2023/06/a-guide-to-reproducible-research-papers/
17. Courtès L, Sample T, Zacchiroli S, Tournier S. Source Code Archiving to the Rescue of Reproducible Deployment. In: Proceedings of the 2nd ACM Conference on Reproducibility and Replicability [Internet]. Rennes France: ACM; 2024 [cited 2025 Aug 11]. p. 36–45. Available from: https://dl.acm.org/doi/10.1145/3641525.3663622
18. Courtès L. Adventures on the quest for long-term reproducible deployment [Internet]. Guix. 2024. Available from: https://guix.gnu.org/blog/2024/adventures-on-the-quest-for-long-term-reproducible-deployment/
19. Cobey KD, Ebrahimzadeh S, Page MJ, Thibault RT, Nguyen PY, Abu-Dalfa F, et al. Biomedical researchers’ perspectives on the reproducibility of research. Lino De Oliveira C, editor. PLoS Biol [Internet]. 2024 Nov 5 [cited 2025 Aug 13];22(11):e3002870. Available from: https://dx.plos.org/10.1371/journal.pbio.3002870
20. Timpka T. The “enshittification” of online information services obligates rigorous management of scientific journals. Journal of Science and Medicine in Sport [Internet]. 2024 Oct [cited 2025 Aug 13];27(10):665–6. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1440244024004936