
Making science reproducible
Most users of environmental datasets are trying to do reproducible and accountable science, but different post-processing workarounds and tools can lead to published results which are not repeatable or comparable.
The BlueBRIDGE best practice
The European Commission Joint Research Centre (JRC) delivers and maintains the Digital Observatory for Protected Areas (DOPA[1]). The DOPA is a set of web services and applications primarily used to assess, monitor, report and possibly forecast the state of and the pressure on protected areas at multiple scales. The data, indicators, maps and tools provided by DOPA support, for example, spatial planning, resource allocation, protected area development and management, and national and international reporting. Indeed, DOPA was acknowledged by the Convention on Biological Diversity (CBD) Secretariat as a reference tool for country reporting.
Maintaining DOPA requires managing large datasets with highly complex geometries, topological inconsistencies, multiple representations of the same geographical entities (for example, coastlines) and licensing requirements, so that indicators can be continuously updated in response to monthly changes in authoritative data. To compute and publish these arrays of indicators, JRC uses a range of open source tools (including GRASS, R, Python, GDAL, PostGIS, geometry libraries for Hadoop, GeoServer, GeoNode and MapServer) coupled with some commercial software (such as ArcGIS Pro and the Google Earth Engine platform). To make all of this reproducible, JRC is trying to move the entire processing chain to open source tools and share it as a versioned resource. The sharing is done with the help of BlueBRIDGE, with which JRC is collaborating. JRC and BlueBRIDGE have developed the PAIM Virtual Research Environment (https://bluebridge.d4science.org/web/protectedareaimpactmaps/), aimed at reporting which features are represented in protected area networks and other managed areas.
In particular, the ongoing use case has been developed in the context of the Biodiversity and Protected Areas Management Programme (BIOPAMA Reference Information System), which aims to address threats to biodiversity in African, Caribbean and Pacific (ACP) countries, while improving socio-economic conditions of the local communities in and around protected areas. The DOPA can directly consume the outputs of the PAIM Virtual Research Environment to update statistics on ecologically important seafloor features represented in protected areas.
Why this is considered a best practice
Best Practice Analysis
Validation
JRC is using the PAIM Virtual Research Environment to test the reproducibility of the workflow for the calculation of indicators. As documented in the paper “Processing Conservation Indicators with Open Source Tools: Lessons Learned from the Digital Observatory for Protected Areas”, authored by JRC, adopting BlueBRIDGE facilitates reproducibility.
Innovation
The innovation lies in moving the entire processing chain to open source tools and sharing it as a versioned resource.
Sustainability
The practice has been adopted in the context of the Biodiversity and Protected Areas Management Programme (BIOPAMA Reference Information System), which aims to address threats to biodiversity in African, Caribbean and Pacific (ACP) countries, while improving socio-economic conditions of the local communities in and around protected areas.
Replicability and/or up-scaling
This practice is applicable in all contexts where large datasets with highly complex geometries, topological inconsistencies and multiple representations of the same geographical entities have to be managed.
Lessons Learnt
Different post-processing workarounds and tools can lead to published results which are not repeatable or comparable. To work more effectively, the ideal process would be to share value-added data processed to an agreed standard and format. Since legal restrictions currently forbid this type of redistribution, the next best solution is to share the processing workflow, including code and environmental settings or parameters. The PAIM Virtual Research Environment provides access to a cloud-based processing workflow that can be executed reproducibly, using the same source data, to produce repeatable results.
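To illustrate what sharing "the processing workflow, including code and environmental settings or parameters" can look like in practice, the following minimal Python sketch records a run manifest and derives a fingerprint from it. It is an illustration under stated assumptions, not the JRC or PAIM implementation: the function names, the manifest layout, the dataset labels and the `tolerance_m` parameter are all hypothetical. The idea is that two runs with the same source data, parameters and code version yield the same fingerprint, so published results can be compared against the exact workflow that produced them.

```python
import hashlib
import json
import platform
import sys


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of one source dataset."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(source_data: dict, parameters: dict, code_version: str) -> dict:
    """Describe one workflow run: inputs, parameters, code and environment.

    All field names here are hypothetical, chosen for illustration only.
    """
    return {
        "code_version": code_version,
        "parameters": parameters,
        # Store checksums rather than the data itself, since the source
        # data may not be redistributable.
        "sources": {
            name: sha256_of_bytes(blob)
            for name, blob in sorted(source_data.items())
        },
        # Recorded for information; not part of the fingerprint below.
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }


def run_fingerprint(manifest: dict) -> str:
    """Hash the manifest fields that should determine the result."""
    relevant = {
        key: manifest[key]
        for key in ("code_version", "parameters", "sources")
    }
    # Sorted keys and fixed separators give a canonical JSON string,
    # so equal inputs always produce equal fingerprints.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder bytes stand in for real dataset files.
    sources = {
        "protected_areas": b"placeholder protected area geometries",
        "seafloor_features": b"placeholder seafloor feature geometries",
    }
    manifest = build_manifest(sources, {"tolerance_m": 10}, "v1.0")
    print(json.dumps(manifest, indent=2, sort_keys=True))
    print(run_fingerprint(manifest))
```

Publishing such a manifest alongside an indicator lets another user confirm that they ran the same code version on the same source data with the same parameters before comparing results. The environment block is kept out of the fingerprint in this sketch so that the same workflow can be recognised across machines; whether that is appropriate depends on how sensitive the processing is to library versions.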