Lake Champlain Research Consortium
DATA MANAGEMENT PRIORITIES
Based on email discussion with LCRC participants and community members regarding data management interests in the Lake Champlain Basin, the following priorities were identified during the fall of 2004. In many cases the priorities identified in 1999 were retained or updated to reflect current conditions. The final recommendations for 2004 are provided in no particular order:
1. Integrate LCRC into the larger data community as a primary and vital source of Lake Champlain research data and information.
The need to prevent information critical to ongoing research from being made public prior to publishing the results is understood. However, the research and public communities should be able to go to the LCRC website to learn what research has been, is being and needs to be conducted in atmospherics, toxics, land use, cultural, social science, hydrodynamics & sediment, nutrients and lower food webs, middle food webs and exotics, fisheries, wildlife and biodiversity, and ecosystem health. They should be able to get access to tabular and spatial data for any research that has been published. They should know what research is currently being conducted in the basin, who the researcher is, what questions they are asking, what data they are collecting, and where and how the work is being conducted. They should also see a list of additional research questions that are raised as a result of previous work. As part of the integration effort the following points should be considered.
2. Develop data format standards for project data that enable more effective interaction with researchers and data users.
An important data management research priority is the review of current project data tabulation and formatting practices used for research and monitoring in the Basin. This effort should also seek to recognize common data formatting capacities and opportunities for data handling that broaden the usefulness of data. One or more widely acceptable data formats should be recommended as the common standard for shared data.
Data that is to be shared must be readable. Of the several data formats presently in use in the Basin, some are readable by commonly used software and others require special data-processing software that is not widely available in the community. For example, not all water quality or aquatic biology data gathered in the Basin are rendered in EPA’s Water Quality Storage and Retrieval System (STORET) or Ocean Data Evaluation System (ODES), even though those standards are designed to facilitate data sharing. New York, Vermont and Quebec geographical location data are commonly obtained and reported in different coordinate reference systems, although more universal systems are available and coordinate transformations are no longer operationally difficult.
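As one illustration of a format readable by commonly used software, the following minimal Python sketch writes project records to plain CSV text with a fixed header row. The column names, the decimal-degree coordinate columns, and the sample values are hypothetical placeholders chosen for illustration; they are not a standard recommended by LCRC, STORET or ODES.

```python
import csv
import io

# Hypothetical column set for shared tabular data (an assumption, not an
# LCRC standard). Coordinates are assumed to be decimal degrees.
COLUMNS = [
    "station_id", "sample_date", "latitude_dd", "longitude_dd",
    "parameter", "value", "unit",
]


def to_shared_csv(records):
    """Serialize a list of dict records to CSV text with a fixed header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # DictWriter raises ValueError if a record has a key outside COLUMNS.
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder record, for illustration only (not real monitoring data).
example = [{
    "station_id": "STN-01",
    "sample_date": "2004-10-01",  # ISO 8601 date
    "latitude_dd": 44.5,
    "longitude_dd": -73.3,
    "parameter": "total_phosphorus",
    "value": 12.0,
    "unit": "ug/L",
}]

print(to_shared_csv(example))
```

Keeping the unit in its own column and using ISO 8601 dates are design choices in this sketch that let the file be opened in a spreadsheet or statistics package without special software.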
3. Create or adopt metadata standards for project data and require their use.
An important data management research priority is the review of current metadata practices used in the Basin, including the documentation protocols that have been developed by other cooperative research efforts and established by various government agencies. This effort should also seek to recognize the minimum essential metadata requirements and recommend protocols that facilitate data exchange and use among the research and monitoring community.
Information about project data is an essential part of any data set. Metadata (data about the research data) describes the research methods, instrumentation and standards that generated the project data and includes the quality assurance & quality control protocols that were applied. Metadata also includes more fundamental project parameters such as research design, location of sampling or measurement, identity of the investigator and how the data are archived and documented. This type of essential information establishes the usefulness of project data both for the original research and for subsequent projects that attempt to use the data.
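A minimum-metadata requirement of the kind described above could be checked mechanically. The sketch below is a hedged illustration only: the field names are hypothetical labels derived from the elements listed in this section (methods, instrumentation, QA/QC protocols, research design, location, investigator, archiving), not an adopted metadata standard.

```python
# Hypothetical minimum metadata fields, named after the elements described
# in the text. These names are assumptions for illustration.
REQUIRED_FIELDS = (
    "investigator",
    "research_design",
    "sampling_location",
    "methods",
    "instrumentation",
    "qa_qc_protocols",
    "archive_location",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in a metadata dict."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder record with only two fields filled in.
partial = {"investigator": "J. Doe", "methods": "grab sample"}
print(missing_metadata(partial))
```

A data set would be accepted for sharing under this sketch only when `missing_metadata` returns an empty list.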
4. Incorporate data management mechanisms that will best address the need for accession and retention of data by those involved in research and monitoring in the Lake Champlain Basin.
A primary data management research priority is to determine what mechanisms exist or should be developed to implement Basin-wide data management at the level that is desired by participants in research and monitoring. Several models exist in the Basin, such as the GIS protocols developed for the Lake Champlain Basin Program by the Vermont Center for Geographic Information (VCGI) and the Vermont Monitoring Cooperative (VMC) at UVM.
In determining the best mechanism for Lake Champlain data management, the following design parameters should be addressed:
· The need of participating funding agencies to bring data generated through their programs to the public should be accommodated where possible in the design.
5. Increase the focus of data protocol, infrastructure and technological development to more effectively support data sharing with the research community and the larger community when appropriate.
LCRC research activities represent part of a larger natural and biological information network in this geographic area. In Vermont, LCRC shares geographic coverage with agencies and groups such as the Vermont Monitoring Cooperative, the Vermont Center for Geographic Information, the Lake Champlain Basin Program, the Agency of Natural Resources, numerous watershed groups and non-profit environmental groups. Future LCRC data management plans should consider website and data storage paradigms that enable increased data and information access by these groups.
An important data management research priority is a determination of suitable protocols for the sharing of research and monitoring data. Some data, predominately that supported with public funds, is potentially in the public domain from the time it is generated, while other data will be provided only at the courtesy of the researcher. Increasingly in research and monitoring, the primary data generated in research retain their significance as a resource far beyond the immediate results of the study.
At some appropriate point in the course of any research project, the sharing of research results with the broader research community is essential; so too, in most cases, is the sharing of the data generated by the study. Researchers normally have a professional need to delay the sharing of data until appropriate quality control and quality assurance requirements are concluded. Because research results are normally presented to the public in peer-reviewed journal articles, which are virtually required of many researchers, there may be a legitimate professional need to limit the sharing of some data prior to publication. Protocols that present researchers with workable options for the various degrees and stages of data sharing, and that clearly establish the ethical and professional guidelines for the collaborative use of a colleague's data, should be articulated in this effort.