Data collation for the implementation of a Regional Database

Regional data, statistics and information are key assets for evidence-based policy making, especially for developing and monitoring regional fisheries management plans (FMPs) such as the regional WECAFC FMPs for flying fish, queen conch and Caribbean spiny lobster. The main challenge for regional databases lies in the national sources of fisheries data and statistics. Data collection, statistical processing and information management are carried out by national institutions with a national focus: on communities such as small-scale fishers when developing food security policies, or on larger-scale, industrial fisheries when developing the economy while exploiting fish resources sustainably. It is therefore difficult to build a system that shows where species live, which fisheries target them, and how much effort is deployed to catch them. These data are essential for management, e.g. to guarantee the sustainability of marine products. Because data are collected by different national institutions with different objectives, formats and references, a regional integration effort that can cope with different data flows and formats is needed.

 

 

The BlueBRIDGE best practice

 

BlueBRIDGE has developed the Regional Database (RDB) features in the WECAFC-FIRMS Virtual Research Environment (VRE) to support a regional (i.e. across selected countries) database for selected fisheries data and models. Most high-value fish stocks are shared between countries across a region, especially in the Western Central Atlantic Fishery Commission (WECAFC) area.

 

The RDB approach implements harmonization, storage, visualization and analysis of regional fisheries data. The process is guided by data standards: CWP for fisheries; SDMX and OGC for storage and harmonization, in particular the standardization of reference data on species, areas, countries, gears and other classifications; and OGC for visualization and mapping.
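As a minimal sketch of the harmonization step, assume that each country reports catches under its own species codes and that a mapping table translates these to a shared regional code list. The country names, codes, field names and the `harmonize` function below are hypothetical placeholders for illustration; they are not real ASFIS or CWP codes and not the RDB's actual implementation.

```python
from collections import defaultdict

# Hypothetical mapping from national species codes to regional codes.
# All codes are illustrative placeholders, not real ASFIS/CWP entries.
SPECIES_MAP = {
    "CountryA": {"LANGOSTA": "REG_SPINY_LOBSTER", "CARACOL": "REG_QUEEN_CONCH"},
    "CountryB": {"LOB-01": "REG_SPINY_LOBSTER", "FLY-02": "REG_FLYINGFISH"},
}


def harmonize(records):
    """Aggregate national catch records under regional species codes.

    Returns (totals, unmapped): totals maps (regional_code, year) to the
    summed catch in tonnes; unmapped lists records whose national code has
    no regional equivalent, so the gap can be reported to the contributor.
    """
    totals = defaultdict(float)
    unmapped = []
    for rec in records:
        regional = SPECIES_MAP.get(rec["country"], {}).get(rec["species"])
        if regional is None:
            unmapped.append(rec)
            continue
        totals[(regional, rec["year"])] += rec["catch_t"]
    return dict(totals), unmapped


# Example input: two countries reporting the same stock under different codes.
records = [
    {"country": "CountryA", "species": "LANGOSTA", "year": 2016, "catch_t": 120.0},
    {"country": "CountryB", "species": "LOB-01", "year": 2016, "catch_t": 80.5},
    {"country": "CountryB", "species": "UNKNOWN", "year": 2016, "catch_t": 3.0},
]
totals, unmapped = harmonize(records)
```

Keeping the unmapped records separate, rather than dropping them, reflects the idea that harmonization should also reveal gaps in national classifications.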

 

Given the often sensitive nature of the data, careful attention to data policies and data confidentiality is needed, and contributing parties need a dedicated stage in the process at which they can endorse any information they share.

 

These are the approaches adopted in the implementation of the VRE:

  • (meta)data are assigned a globally unique and eternally persistent identifier. All data are assigned a UID on entry to the system. However, a persistent identifier is not relevant at every point in the lifetime of the data, and so one is only applied to published data. In addition, the harmonization process relies on public code lists and reference data (e.g. FAO area and country codes, ASFIS and WoRMS species codes), each under their own unique identifiers, thus implementing this FAIR principle.
  • data are described with rich metadata. The RDB VRE is designed to merge data from countries and to publish these as SDMX, a format focused on data descriptions in the statistical domain. At each step of this workflow, metadata are collected: on the registration of new data (Provenance), its harmonization (Process), the terms of use (License and validity), and publication (Citation and access points).
  • (meta)data are registered or indexed in a searchable resource. All data are stored in a specialized instance of the Tabular Manager service and published in an SDMX registry (Fusion).
  • metadata specify the data identifier.
  • (meta)data are retrievable by their identifier using a standardized communications protocol. The reference data are published in an SDMX registry, and potentially in a CKAN registry. The registry entry also contains a dataset identifier.
  • (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. The RDB data are published in an SDMX Fusion registry that enables interoperability between SDMX-capable systems. Derived products such as model output cannot be shared, and no language for knowledge representation can be applied to them.
  • (meta)data include qualified references to other (meta)data. The RDB datasets contain reference links to identify both content (reference data) and context (the business metadata).
  • (meta)data have a plurality of accurate and relevant attributes. All data passing through this VRE are automatically enriched with relevant metadata, and the RDB aims to replace local classifications with global ones to increase global relevance.
  • (meta)data are released with a clear and accessible data usage license. The terms of use of BlueBRIDGE are clear on the license, and this is added to most datasets. Users in the VRE are not always authorized to share confidential country data, in which case a private license can be applied.
  • (meta)data are associated with their provenance. BlueBRIDGE ensures that the RDB data are well described and contain links and references to the names of data contributors.
  • (meta)data meet domain-relevant community standards. The development of the VRE was driven by the community (including the FAO CWP on statistical reference data) to ensure the pervasive and correct use of community standards. The mapping between local and regional classifications is key in this VRE.
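The identifier and metadata points in the list above can be sketched as a small workflow: a dataset receives a UID on entry, each step adds its own metadata, and a persistent identifier is only assigned at publication. This is an illustrative sketch under stated assumptions, not the VRE's actual code: the `Dataset` class, the function names, the metadata keys and the `registry_prefix` identifier format are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical dataset record; field names are illustrative."""
    name: str
    # Internal UID assigned as soon as the data enter the system.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Persistent identifier, set only when the dataset is published.
    persistent_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def register(name, contributor):
    """Registration step: record provenance."""
    ds = Dataset(name=name)
    ds.metadata["provenance"] = {"contributor": contributor}
    return ds


def record_harmonization(ds, codelists):
    """Harmonization step: record which code lists were applied."""
    ds.metadata["process"] = {"codelists": list(codelists)}
    return ds


def set_terms(ds, license_name):
    """Terms-of-use step: record the license (may be a private one)."""
    ds.metadata["license"] = license_name
    return ds


def publish(ds, registry_prefix):
    """Publication step: refuse incomplete metadata, then assign the persistent ID."""
    missing = [k for k in ("provenance", "process", "license") if k not in ds.metadata]
    if missing:
        raise ValueError(f"cannot publish, missing metadata: {missing}")
    # Assumed identifier format: registry prefix plus internal UID.
    ds.persistent_id = f"{registry_prefix}/{ds.uid}"
    ds.metadata["citation"] = {"identifier": ds.persistent_id}
    return ds


# Example run through all four steps with placeholder values.
example = register("catch_2016", "Country A fisheries division")
record_harmonization(example, ["species", "area", "gear"])
set_terms(example, "private")
publish(example, "hypothetical-registry")
```

Refusing to publish until provenance, process and license metadata are present is one way to ensure that only fully described datasets receive a persistent identifier.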

 

The primary stakeholders for this VRE are data managers in the regional fisheries domain and fisheries policy makers. They need a toolset to collate and analyse their data and, if possible, to report on and visualize the data. The secondary stakeholders are national data managers, who will benefit from the standardization and harmonization process to identify gaps in, and improve the quality of, their collected data to match regional needs. Fisheries management and stock assessment officers and experts will benefit from the regional data, statistics and information.


 

 

 

Why this is considered a best practice   

Best Practice Analysis

Validation

The need for an RDB VRE has been identified in different functional domains, such as trade, tourism and agriculture, to support regional policy making. In the WECAFC region the need is also well established for fisheries, as certain species (tuna) already require similar collation processes. The need to extend this to all species has been validated through several workshops in the WECAFC region. Once the services are delivered by the development teams, they are first tested and validated by VRE data managers using a FURPS approach (Functionality, Usability, Reliability, Performance, Security) with data requested from national data managers through different channels. The VRE-level tests are done after the software tests, and thus only capture VRE-relevant comments. After passing these tests, validation of the VRE is mostly done in workshops with real data users (such as the first meeting of the WECAFC Fisheries Data and Statistics Working Group, planned for early 2018) and in training events (e.g. WECAFC training events).

Innovation

At this stage, the collation of regional datasets can be linked to new methods for stock data analysis across organizations; this can drive innovation in the assessment of marine resources. At the technology level, the availability of fairly generic RDB services for data collation will enable the development of a stable data platform for fisheries data collation and analysis.

Success Factors

The RDB has to prove, at the level of data collation, that it offers a flexible and cost-effective way to merge datasets and publish them in a managed repository. The RDB meets the institutional quality requirements for sharing data, but requires that a regional organization accepts the associated responsibilities. The RDB VRE facilitates this by offering a well-connected governance team and model.

Similar initiatives can learn from RDB how to establish the community, identify the shared need, and how to work towards a sustainable business model that meets the needs of statistical data reporting.     

Sustainability

The RDB VRE is in essence a tool to provide sustainable access to national and regional datasets. If data losses occur at national level, data are still available at regional level.

The key is to provide sustainable access to the VRE. This is most easily done when there are clear layers of responsibility between infrastructure providers, infrastructure service developers, and communities. A governance model that can match communities with service providers is essential.

An SLA or MoU should be discussed at regional level with the WECAFC secretariat and the new Fisheries Data and Statistics Working Group, which is most likely to act as a steering committee for the RDB. Once identified, maintenance costs could be shared across the regional and sub-regional organizations that will benefit from the RDB, such as the Caribbean Regional Fisheries Mechanism (CRFM), OSPESCA and WECAFC.

Replicability and/or up-scaling

The RDB implements a complete data-driven workflow for the collation and publication of statistical datasets. Any organization that needs to merge the data output of localized systems can benefit from the approach. The current RDB is agnostic of the data structure and relies on a complete description of the data structure and code lists (metadata); it can thus be applied to other domains with similar needs to aggregate and harmonize national datasets into regional databases. Regional Economic Communities such as SADC (Southern African Development Community) and EAC (East African Community) could benefit from this approach for agriculture, forestry, health, tourism, etc. The RDB focuses on manual entry of larger datasets, and is thus best applied at the data collation level.

 

 

Lessons Learnt

Collaboration on key objectives is fundamental. Raising awareness of the key principles underlying the implementation of an RDB underpins its success: defining minimum data requirements that set the standard information to be collated from countries, and defining regional classifications and mapping them to national ones to ensure harmonization of data.

 

Two profiles of end-users that can benefit from the VRE can be identified: the national data manager who contributes to the regional data collation process, and the fisheries expert who uses the collated data.

For the national data manager, the need to submit harmonized data has a direct impact on the quality of data (some data might be missing or too fragmented; national data collection can be improved to match the regional data requirements; national classifications can be reviewed to propose more stable lists, especially species lists) and on the quantity of data (the national data manager makes an effort to use, exploit and analyse data piled up in the country’s drawers). For the fisheries expert, access to the originally fragmented and dispersed datasets becomes easier: a lot of data may be available at national level, but only on paper or in Excel files stored on local computers. The RDB provides an overview of stocks across country boundaries and makes it possible to apply models to stocks shared across countries, something that is otherwise very time-consuming to prepare.

These users can now apply more methods, and discuss the results in a regional setting. This significantly helps regional discussions. 

 

However, there is a clear difficulty in establishing stable communities in developing countries that can provide useful data. Turnover of trained staff is high in the WECAFC region, either because of competition with the private sector or because of national budget cuts (e.g. the impact of the oil price drop on the incomes of Trinidad and Tobago and Venezuela). There is also resistance to replacing standing practices that are difficult to change, such as data formats, as this may require changes in national legislation, which takes time. Since the RDB maps from existing formats to a global or regional one, it should be easier for organizations to adopt this system.