Computational reproducibility: safeguarding the backbone of science

From The Embassy of Good Science

What is this about?

Many phases of research rely on computers: data acquisition, visualization, data analysis, and so on. It is nearly impossible to imagine doing research without a computer. Using a computer means using software, which is built from source code. For some researchers this code is a black box, whereas others are perfectly able to read and modify it. Integrity in this area concerns the reproducibility of software, the ability to share it, and its storage.

Why is this important?

Computational reproducibility means describing and sharing your software tools and data to enable evaluation and use by others.[1] Researchers’ data practices have been extensively explored, but a thorough investigation of how they use, share and value software was lacking until recently.[2] Using a survey,[3] Alnoamany and Borghi (2018) pioneered work in this area and drew out implications for improving research reproducibility. Ultimately, effective software management and reproducibility could save time and money, increase transparency and advance science.

The survey revealed that software-related practices vary widely and often fall short of what reproducibility requires. One finding illustrates this well: researchers do save their software (often for long periods), but they often do not actively maintain or preserve it. Preserving software is the active process of ensuring that it remains reusable in the future, for example by keeping it compatible with different (or newer) programs and hardware that make use of it. Saving software, on the other hand, only refers to making it findable and accessible for others.

The full range of programming languages and software is extensive, so offering direct support for each might not be feasible. Instead, more general frameworks or guidelines could sustain software reproducibility.[2]

  1. Stodden V, Guo P, Ma Z. (2013). Toward reproducible computational research: an empirical analysis of data and code policy adoption by journals. PLOS ONE, 8(6):e67111. doi: 10.1371/journal.pone.0067111.
  2. Alnoamany, Y., & Borghi, J. A. (2018). Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Computer Science, 4. doi: 10.7717/peerj-cs.163
  3. Alnoamany, Y., & Borghi, J. A. (2018). Questionnaire ‘Understanding researcher needs and values related to software questionnaire and consent form’, available at: https://peerj.com/articles/cs-163/#supplemental-information

For whom is this important?

What are the best practices?

A number of frameworks exist that can be used to advance sharing, (re)using and valuing software. The FAIR principles (Findable, Accessible, Interoperable and Reusable), a guideline originally created for data management, can similarly provide an infrastructure for software reproducibility.[1] Interoperability, for instance, means that non-collaborating researchers can integrate and work with each other’s resources with minimal effort. A collaboration between the Netherlands eScience Center and DANS (Data Archiving and Networked Services) launched a website with a step-by-step route to creating FAIR software: https://fair-software.nl/.
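
As a rough illustration of what making software findable and reusable can involve in practice, the following Python sketch checks whether a small machine-readable description of a piece of software is complete. The field names, the example values and the `missing_fields` helper are assumptions made for this illustration; they are not an official FAIR schema and are not taken from fair-software.nl.

```python
import json

# Hypothetical field names chosen for illustration; not an official schema.
REQUIRED_FIELDS = (
    "title",
    "version",
    "license",
    "authors",
    "repository",
    "identifier",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values; replace them with the details of a real project.
example_record = {
    "title": "example-analysis-tool",
    "version": "1.0.0",
    "license": "Apache-2.0",
    "authors": ["A. Researcher"],
    "repository": "https://example.org/example-analysis-tool",
    "identifier": "",  # a persistent identifier such as a DOI, left empty on purpose
}

if __name__ == "__main__":
    # Show the record and report which descriptive fields still need filling in.
    print(json.dumps(example_record, indent=2))
    print("Missing:", missing_fields(example_record))
```

A check like this could be run before each release, so that a missing licence or identifier is noticed before the software is shared.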

The TOP (Transparency and Openness Promotion) guidelines seek to establish a new shared standard of openness and citation, applying to both data and software.[2] In summary, the TOP guidelines consist of eight principles (citation standards, and code and material transparency, amongst others) and provide ‘levels’ that reflect how strictly journals adopt them. Of course, in practice this comes down to the efforts of researchers. The Reproducibility Enhancement Principles (REP), part of TOP, address software specifically. For one, they highlight that it is not enough to ‘merely’ share software: the workflow and the details of the computational environment should also be communicated. The guidelines are available at: https://cos.io/top/. In their discussion, Alnoamany and Borghi (2018) add that education should give researchers a basic understanding of software, to guide them later in this process.[3](p18)
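
The point about communicating the computational environment can be made concrete with a short Python sketch that records the interpreter version, the operating system and the installed packages alongside an analysis. It uses only the Python standard library; the `describe_environment` function name and the layout of the resulting record are assumptions made for this example, not something prescribed by TOP or REP.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect basic facts about the environment an analysis ran in."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        # Sort case-insensitively so snapshots are easy to compare.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored next to the results.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output with the results of each run gives others a starting point for rebuilding a comparable environment, although it does not capture everything (for example system libraries or hardware details).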

Lastly, the Software Preservation Network (SPN) is worth mentioning, although it is not specific to research software. It seeks to ensure long-term access to software.[4] Its five core activities are law & policy, training & education, metadata & standards, technological infrastructure and research-in-practice. Furthermore, it has a number of running projects and a database of resources on the theme, all available on its website: https://www.softwarepreservationnetwork.org/.

  1. Wilkinson MD, et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3:Article 160018. doi: 10.1038/sdata.2016.18.
  2. Nosek BA, et al. (2015). Promoting an open research culture. Science, 348(6242):1422-1425. doi: 10.1126/science.aab2374.
  3. Alnoamany, Y., & Borghi, J. A. (2018). Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Computer Science, 4. doi: 10.7717/peerj-cs.163
  4. Meyerson J, et al. (2017). The software preservation network (SPN): a community effort to ensure long term access to digital cultural heritage. D-Lib Magazine, 23(5/6). doi: 10.1045/may2017-meyerson.
