Publishing software in Zenodo

A key aspect of an open source software project is making the software discoverable, searchable and referenceable by the largest possible number of communities. To achieve this objective, it is essential to publish the software on multiple platforms and channels and to provide a rich set of metadata. An additional challenge for the gCube system (the software powering the BlueBRIDGE VREs) is the high number of components (over 500) and the high release frequency (about one release per month) that have to be handled when publishing.

Before BlueBRIDGE, the source code of gCube was hosted only in a Subversion repository internal to CNR, and the binary packages were published only on the gCube website. As a result, the software was difficult to discover and access.

The BlueBRIDGE best practice

 

To improve the gCube distribution process, the releases of gCube components are now published on the Zenodo portal (https://zenodo.org/communities/gcube-system/). Zenodo offers a rich set of metadata (e.g. authors, description, funding, license, keywords, relationships with other objects) that can be associated with each uploaded software component. This makes the software uploaded to Zenodo easily discoverable and searchable, also programmatically through REST and OAI-PMH interfaces. Zenodo also assigns a unique identifier (a DOI) to each uploaded object, which solves the problem of identifying and referencing a particular software component and also supports versioning. Finally, Zenodo offers storage capabilities to host the binary and source packages of the gCube system.
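As an illustration of the programmatic access mentioned above, the following minimal Python sketch queries the Zenodo REST search endpoint for records in the gcube-system community. The helper names are hypothetical, and the response layout assumed here (a "hits" object containing a "hits" list, each record carrying "doi" and "metadata.title") should be checked against the current Zenodo API documentation.

```python
import json
import urllib.parse
import urllib.request

# Public Zenodo search endpoint for published records.
ZENODO_RECORDS = "https://zenodo.org/api/records"


def build_search_url(community, query="", size=10):
    """Build a search URL restricted to one Zenodo community."""
    params = {"communities": community, "size": size}
    if query:
        params["q"] = query
    return ZENODO_RECORDS + "?" + urllib.parse.urlencode(params)


def summarize_hits(response):
    """Return (doi, title) pairs from a decoded search response.

    Assumption: records are listed under response["hits"]["hits"].
    """
    hits = response.get("hits", {}).get("hits", [])
    return [(h.get("doi"), h.get("metadata", {}).get("title")) for h in hits]


def search_community(community, query="", size=10):
    """Run the search over the network and summarize the result."""
    url = build_search_url(community, query, size)
    with urllib.request.urlopen(url) as resp:
        return summarize_hits(json.load(resp))
```

Calling search_community("gcube-system") would perform a live request; the two helper functions can be used on their own without network access.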

In addition to Zenodo, the source code of gCube components is also hosted on GitHub (https://github.com/gcube-team/gcube-releases). GitHub is the biggest community for open source components, so hosting the code there considerably increases the visibility of gCube software. Furthermore, GitHub integrates nicely with Zenodo, allowing each object uploaded to Zenodo to be linked to the corresponding source code on GitHub.

Given the size of the gCube software and its release frequency, all the publication steps (both on Zenodo and on GitHub) have been automated.
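The gCube automation itself is not described here. As a rough sketch of what the Zenodo side of such automation could look like, the Python function below walks through the usual deposit sequence of the Zenodo REST API: create a deposition, upload a file to its bucket, attach metadata, publish. The function name and the injected send callable are hypothetical; send is assumed to perform the authenticated HTTP request and return the decoded JSON response.

```python
ZENODO_DEPOSITIONS = "https://zenodo.org/api/deposit/depositions"


def publish_release(send, metadata, filename, payload, base=ZENODO_DEPOSITIONS):
    """Publish one package through a Zenodo-style deposit API.

    send(method, url, body) is a hypothetical callable supplied by the
    caller; it is expected to add the access token, perform the request
    and return the decoded JSON response as a dict.
    """
    # 1. Create an empty deposition; the response carries its id and links.
    deposition = send("POST", base, {})
    # 2. Upload the package into the deposition's file bucket.
    send("PUT", f"{deposition['links']['bucket']}/{filename}", payload)
    # 3. Attach the descriptive metadata (title, creators, license, ...).
    send("PUT", f"{base}/{deposition['id']}", {"metadata": metadata})
    # 4. Publish the deposition; on Zenodo this step cannot be undone.
    return send("POST", f"{base}/{deposition['id']}/actions/publish", None)
```

Keeping the HTTP layer behind send makes it possible to run the same sequence against a test double, or against the Zenodo sandbox by passing a different base, before touching the production service.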

 

This practice can be very beneficial for the following stakeholders:

  • Developers who need access to the software to inspect the source code, fix or improve it, or build new functionality on top of it;
  • Researchers who use the software in their experiments and who, in order to publish their work and ensure the repeatability of those experiments, need a way to reference and cite the software used;
  • Infrastructure Managers who need access to the software packages and documentation to install and maintain a gCube infrastructure.

 

Why this is considered a best practice    

Best Practice Analysis

Validation

N.A.  

Innovation

The innovation lies in the application of an open science best practice.

Success Factors

The software development process in the organization must be mature enough to have a meaningful and coherent versioning system, to include and maintain meaningful metadata in the software, and to set up specific procedures for software integration, release and distribution.

Technologically, this practice relies heavily on the automated extraction of metadata (e.g. authors, description, license) from the software source code. For this to work, the software must include this information in a semi-structured format that is consistent across all components.
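To make the idea of automated metadata extraction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each component carries its metadata in a Maven pom.xml; the text above does not state which semi-structured format gCube uses. The output keys loosely follow the field names of Zenodo deposit metadata, and the license value would probably still need to be mapped to an identifier that Zenodo accepts.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files (assumed input format).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def extract_metadata(pom_xml):
    """Extract publication metadata from the text of a pom.xml file."""
    root = ET.fromstring(pom_xml)

    def text(path):
        # Return the stripped text of the first matching element, or None.
        node = root.find(path, POM_NS)
        return node.text.strip() if node is not None and node.text else None

    authors = [
        n.text.strip()
        for n in root.findall("m:developers/m:developer/m:name", POM_NS)
        if n.text
    ]
    return {
        "upload_type": "software",
        "title": text("m:artifactId"),
        "version": text("m:version"),
        "description": text("m:description"),
        "license": text("m:licenses/m:license/m:name"),
        "creators": [{"name": a} for a in authors],
    }
```

Fields that are absent from the file come back as None (or an empty list for creators), which makes missing metadata easy to detect before anything is published.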

Sustainability

N.A.

Replicability and/or up-scaling

This practice has been used for gCube software, but it can easily be applied to any other software, since none of the procedures, tools and data used make any assumption about the type of software.

This practice has already proven to work well for big software projects: gCube is composed of over 500 components and has about one release per month. Moreover, thanks to the automation put in place, the practice can be applied to even bigger projects without considerable extra effort.

Lessons Learnt

The main challenge in applying this practice is making sure that the published data (source code, binaries and metadata) is correct, especially considering that, once published on Zenodo, the information cannot be removed.

To overcome this challenge in gCube, a set of compliance rules has been defined and communicated to the developers. Component compliance with these rules is automatically checked during the integration process to ensure the quality of the released components.
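The actual gCube compliance rules are not listed here. The following Python sketch only illustrates the kind of automated check that could run during integration, using two hypothetical rules: every required metadata field must be present and non-empty, and the version must have the form X.Y.Z.

```python
import re

# Hypothetical rules, for illustration only; not the real gCube rule set.
REQUIRED_FIELDS = ("title", "version", "description", "license", "creators")
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def check_compliance(metadata):
    """Return a list of rule violations; an empty list means compliant."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing or empty field: {field}")
    version = metadata.get("version") or ""
    if version and not VERSION_PATTERN.match(version):
        problems.append(f"version is not of the form X.Y.Z: {version}")
    return problems
```

A release pipeline could refuse to publish any component for which this function returns a non-empty list, which matters because published records cannot be withdrawn afterwards.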