
Many research communities in biology and related fields are committed to building and preserving large collections of environmental and species distribution data. For these communities to carry out their studies quickly and efficiently, the data must be well organized, meticulously described, and, where possible, represented in a standard format that enables re-use. Extracting information from the data and applying data processing workflows, while cutting down data preparation and preprocessing time, is key. To meet this requirement, a suite of automated processes to convert standard CSV maps into NetCDF files was made available on the BlueBRIDGE e-Infrastructure; its potential has already been exploited for an intensive conversion task involving more than 20,000 AquaMaps-generated layers.
The conversion algorithms are available at the following links (a minimal conversion sketch follows the list):
- CSV_TO_NETCDF_CONVERTER_XYZT
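To illustrate what such a conversion involves, here is a minimal Python sketch built on pandas and xarray rather than on the DataMiner implementation itself. The file names and the "probability" variable name are hypothetical, and the CSV is assumed to carry x, y, z, t columns, as the XYZT converter's name suggests.

    import pandas as pd
    import xarray as xr

    # Hypothetical input: a CSV map with columns x (lon), y (lat), t (time)
    # and a value z for each (x, y, t) triple.
    df = pd.read_csv("species_map.csv")

    da = (
        df.set_index(["t", "y", "x"])["z"]
          .to_xarray()                    # (t, y, x) grid, NaN where no row exists
          .rename({"t": "time", "y": "lat", "x": "lon"})
    )
    ds = da.to_dataset(name="probability")

    # NetCDF is self-describing: metadata travels inside the file as attributes.
    ds["probability"].attrs["long_name"] = "species occurrence probability"
    ds.attrs["source"] = "converted from a CSV map"
    ds.to_netcdf("species_map.nc")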
The converted maps are available in two THREDDS repositories, at the following addresses (see the remote-access sketch after the list):
- http://thredds.d4science.org/thredds/catalog/public/netcdf/AquamapsNative/catalog.html
- http://thredds.d4science.org/thredds/catalog/public/netcdf/AquamapsNative2050/catalog.html
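Because THREDDS servers also typically expose their files through OPeNDAP, a converted map can be opened remotely without downloading it first. A minimal sketch, assuming the usual THREDDS convention of replacing /thredds/catalog/ with /thredds/dodsC/ in the access URL; the dataset file name below is hypothetical, so pick a real one from the catalog pages above.

    import xarray as xr

    # Hypothetical dataset name; browse the catalogs above for actual entries.
    url = ("http://thredds.d4science.org/thredds/dodsC/public/netcdf/"
           "AquamapsNative/some_species.nc")

    ds = xr.open_dataset(url)   # data is fetched lazily over OPeNDAP
    print(ds)                   # dimensions, coordinates, variables, attributes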
The maps have also been published in the Data Catalogue at:
Why NetCDF?
The Network Common Data Format (NetCDF) is a self-describing, machine-independent data format designed to represent and store array-oriented n-dimensional data, widely used as a standard by many communities and research institutions. Many tools and libraries, written in a large variety of programming languages, can visualize, manipulate, and process this format. Furthermore, additional information about the data can be included in the file itself as attributes, creating richer objects that are fully understandable without any external reference, and thus reusable and portable.
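This self-describing quality is easy to verify in practice: a generic reader can recover the structure and the attached metadata from the file alone. A small sketch using the netCDF4 Python library, applied to the hypothetical file produced in the conversion sketch above:

    from netCDF4 import Dataset

    # No external schema is needed: the file itself lists its dimensions,
    # variables, and attributes.
    with Dataset("species_map.nc") as nc:
        print(nc.ncattrs())          # global attributes, e.g. "source"
        for name, var in nc.variables.items():
            print(name, var.dimensions, var.ncattrs())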
About DataMiner
DataMiner is an open-source computational system that interoperates with the other services of the D4Science Research e-Infrastructure. It uses the Web Processing Service (WPS) standard to publish and describe the hosted processes, and it produces a provenance XML file for each experiment in the PROV-O ontological format. DataMiner also implements a Map-Reduce approach for Big Data processing and saves inputs, outputs, and provenance information onto a collaborative experimentation space that supports the sharing of this information between different users.
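Because the processes are published through the WPS standard, any generic WPS client can discover and invoke them. A minimal sketch using the OWSLib Python library; the endpoint URL, the gcube-token parameter, the process identifier, and the input name are all assumptions for illustration, not the actual DataMiner values.

    from owslib.wps import WebProcessingService

    # Hypothetical endpoint and token.
    wps = WebProcessingService(
        "https://dataminer.example.org/wps/WebProcessingService"
        "?gcube-token=YOUR_TOKEN"
    )
    wps.getcapabilities()            # list the processes the service advertises
    for p in wps.processes:
        print(p.identifier, "-", p.title)

    # Hypothetical process identifier and input name.
    execution = wps.execute(
        "CSV_TO_NETCDF_CONVERTER_XYZT",
        inputs=[("csvFile", "http://example.org/map.csv")],
    )
    print(execution.status)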