
Publishing AquaMaps Native Habitat Data and Metadata as Exportable NetCDF Files
Communities interested in Niche Modeling require structured data on species distributions. Distributions are often published as text, image files, or vector data, none of which suit the kind of processing scientists usually perform, whereas a portable, self-describing format such as NetCDF enables faster access to data and metadata, as well as the creation of faster ecological models. For this reason, BlueBRIDGE set out to reduce the inertia communities encounter when dealing with geospatial data for ecological modeling purposes, significantly lowering the data preparation time.
In particular, the goal of this activity was to create a collection of AquaMaps Native Range layers in NetCDF format, while defining a standard general procedure to create such repositories of raster maps starting from other representation formats. It is worth pointing out that this is not a mere conversion: it also enhances the value of the data.
The BlueBRIDGE best practice
The workflow that has been implemented to achieve the above-mentioned objectives can be summarized in 4 general steps:
Before BlueBRIDGE, species distribution information was mainly published in text and image formats, forcing the communities to pre-process and prepare the data before being able to actually use them for their purposes. Moreover, proprietary and textual formats are not easily portable and do not allow for metadata embedding. After BlueBRIDGE, much of this data will be available in a more portable format, and the workflow to produce it can be reused for other sets of data. A further result of this activity is a process to convert a generic CSV file into a NetCDF file, made available as a service accessible from DataMiner. While a manual conversion of this kind would require deep knowledge of the NetCDF format itself, this algorithm is easy to use and masks all the complexity of the translation.
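As an illustration of what a CSV-to-raster translation involves, the sketch below places CSV records onto a regular latitude/longitude grid, which is the array layout a two-dimensional NetCDF variable would hold. It is not the DataMiner algorithm: the column names (longitude, latitude, probability), the 0.5-degree resolution, and the fill value are assumptions made for this example, and the final step of writing the arrays to a NetCDF file with a NetCDF library is not shown.

```python
import csv
import io


def csv_to_grid(csv_text, resolution=0.5, fill_value=-9999.0):
    """Place (longitude, latitude, probability) rows onto a regular global grid.

    Column names, resolution and fill value are assumptions for this sketch.
    """
    n_rows = int(round(180 / resolution))
    n_cols = int(round(360 / resolution))
    # Cells with no record keep the fill value, as missing data would in NetCDF.
    grid = [[fill_value] * n_cols for _ in range(n_rows)]

    for record in csv.DictReader(io.StringIO(csv_text)):
        lon = float(record["longitude"])
        lat = float(record["latitude"])
        # Index of the cell containing the point; row 0 is the southernmost band.
        row = min(int((lat + 90.0) / resolution), n_rows - 1)
        col = min(int((lon + 180.0) / resolution), n_cols - 1)
        grid[row][col] = float(record["probability"])

    # Coordinate axes hold the cell centres, one value per row and per column.
    lats = [-90.0 + (i + 0.5) * resolution for i in range(n_rows)]
    lons = [-180.0 + (j + 0.5) * resolution for j in range(n_cols)]
    return lats, lons, grid


# Hypothetical two-record input, for illustration only.
sample = "longitude,latitude,probability\n-170.75,10.25,0.8\n0.25,-0.25,0.4\n"
lats, lons, grid = csv_to_grid(sample)
```

The two coordinate lists and the nested value list map directly onto the latitude and longitude dimensions and the data variable of a two-dimensional NetCDF file.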
Many stakeholders can benefit from the outputs of this workflow as well as from the workflow itself, which can be easily generalized: specifically, all the communities involved in BlueBRIDGE, the AquaMaps and the Fishbase communities, and the research groups operating in Ecological Niche Modeling and Fisheries in general. In fact, millions of people in these communities access and download species distribution data every month, for taxonomic, ecological, and fisheries research. Having the information they are searching for in a standard portable format such as NetCDF allows them to skip the data pre-processing phase and thus enhance their productivity.
Why this is considered a best practice
Best Practice Analysis
Validation
The procedure has been validated internally at CNR, through a pairwise automatic comparison of the newly generated raster data and the original vector data.
Innovation
This activity led to the creation of a more reusable and shareable species distribution information catalogue.
Success Factors
For this best practice to be usable, the species distribution information needs to be open to at least one community through geospatial services such as GeoServer or MapServer.
Sustainability
The sustainability of the workflow’s output depends on the availability of a publication service tailored to raster data management, such as Thredds. The availability and reliability of the infrastructure providing the data publication facilities have a direct impact on the sustainability of the process.
Replicability and/or up-scaling
The whole procedure can be easily reproduced and the scalability depends on many factors:
Lessons Learnt
Converting a probability distribution into a raster dataset is challenging and requires profound knowledge of the data formats; it is also demanding in terms of computational power, which makes a scalable architecture an almost mandatory choice. The validation process is also fundamental and should be carried out using an automatic procedure. As well as defining a straightforward, portable service workflow for polygon-to-raster layer conversion, this activity provides the communities with a wide collection of layers. NetCDF is much more reusable and portable than the other formats already supported by WFS and is naturally suited to representing raster information (such as uniformly spaced species distributions). In fact, on top of being self-describing and designed to represent n-dimensional data with n >= 2, it is also widely used as a standard, and there are plenty of tools to visualize and manipulate this format. Moreover, additional information on the data can be included in the file itself as attributes, creating more complex objects that need no external references to be fully understandable, and are thus reusable and portable.
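An automatic validation of this kind could look roughly like the following sketch, which checks each original record against the raster cell that contains it and reports any disagreement. It is not the procedure used at CNR: it assumes the original vector data can be reduced to (longitude, latitude, value) sample points, and the function name, the tolerance, and the coarse 90-degree demo grid are hypothetical choices made to keep the example small.

```python
def compare_layers(original_points, raster, resolution=0.5, tolerance=1e-6):
    """Return the original records whose value differs from the raster cell.

    original_points: iterable of (lon, lat, expected_value) tuples, assumed
    here to be sampled from the original vector layer.
    raster: nested list indexed as raster[row][col], row 0 southernmost.
    """
    n_rows = len(raster)
    n_cols = len(raster[0])
    mismatches = []
    for lon, lat, expected in original_points:
        # Locate the raster cell that contains the sample point.
        row = min(int((lat + 90.0) / resolution), n_rows - 1)
        col = min(int((lon + 180.0) / resolution), n_cols - 1)
        actual = raster[row][col]
        if abs(actual - expected) > tolerance:
            mismatches.append((lon, lat, expected, actual))
    return mismatches


# Tiny hypothetical 2 x 4 raster at 90-degree resolution, for illustration only.
demo_raster = [
    [0.0, 0.2, 0.4, 0.6],  # southern band
    [0.8, 1.0, 0.1, 0.3],  # northern band
]
demo_points = [(-135.0, -45.0, 0.0), (45.0, 45.0, 0.1), (135.0, 45.0, 0.9)]
mismatches = compare_layers(demo_points, demo_raster, resolution=90.0)
```

An empty result means every sampled record agrees with the raster within the tolerance; here the third point is reported because the raster holds 0.3 where 0.9 was expected.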
This activity facilitates the usability of species distribution maps and lays the foundations for collaboration with the AquaMaps and Fishbase communities, as well as any other group interested in global habitat distributions. It also simplifies use of the information and the metadata recovery process, as the metadata is directly attached to the data.
Useful References
Scarponi, P., Coro, G., Pagano, P. (2018). A Collection of AquaMaps Native Layers in NetCDF Format. Data in Brief, Ed. Elsevier. Vol. 17, 292-296, DOI 10.1016/j.dib.2018.01.026. Open Access at http://www.sciencedirect.com/science/article/pii/S2352340918300295
Links to the CSV-to-NetCDF converters. Only the first one has been used for this activity, but the others may be used for similar tasks:
- CSV_TO_NETCDF_CONVERTER_XY: A process to convert a generic CSV file into a basic NetCDF one with 2 dimensions (latitude, longitude) http://bluebridge.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CSV_TO_NETCDF_CONVERTER_XY
- CSV_TO_NETCDF_CONVERTER_XYT: A process to convert a generic CSV file into a basic NetCDF one having 3 dimensions (latitude, longitude, time) https://bluebridge.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CSV_TO_NETCDF_CONVERTER_XYT
- CSV_TO_NETCDF_CONVERTER_XYZ: A process to convert a generic CSV file into a basic NetCDF one having 3 dimensions (latitude, longitude, altitude/depth) https://bluebridge.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CSV_TO_NETCDF_CONVERTER_XYZ
- CSV_TO_NETCDF_CONVERTER_XYZT: A process to convert a generic CSV file into a basic NetCDF one having 4 dimensions (latitude, longitude, altitude/depth, time) https://bluebridge.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CSV_TO_NETCDF_CONVERTER_XYZT
- CSV_TO_NETCDF_CONVERTER_DIMCHAR: A process to convert a generic CSV file into a basic NetCDF one having a single string-type dimension https://bluebridge.d4science.org/group/rprototypinglab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.CSV_TO_NETCDF_CONVERTER_DIMCHAR
Link to the metadata publication process: