Lake Champlain
Research Consortium
DATA MANAGEMENT PRIORITIES
Based on email discussion with LCRC participants and
community members regarding data management interests in the Lake Champlain
Basin, the following priorities were identified during the fall of 2004. In many cases the priorities identified in
1999 were retained or updated to reflect current conditions. The final
recommendations for 2004 are provided in no particular order:
1. Integrate LCRC into the larger data community as a
primary and vital source of Lake Champlain research information
and data.
The need to prevent
information critical to ongoing research from being made public prior to
publishing the results is understood.
However, the research and public communities should be able to go
to the LCRC website to know what research has been, is being and needs to be
conducted in atmospherics, toxics, land use, cultural, social science,
hydrodynamics & sediment, nutrients and lower food webs, middle food webs
and exotics, fisheries, wildlife and biodiversity, and ecosystem health. They should be able to get access to tabular
and spatial data for any research that has been published. They should know what research is currently
being conducted in the Basin, who the researcher is, what questions they are
asking, what data they are collecting, and where and how the work is being conducted. They should also see a list of additional
research questions that are raised as a result of previous work. As part of the integration effort the following
points should be considered.
2. Develop data format standards for project data that
enable more effective interaction with researchers and data users.
An important data management research priority is the
review of current project data tabulation and formatting practices used for
research and monitoring in the Basin.
This effort should also seek to recognize common data formatting
capacities and opportunities for data handling that broaden the usefulness of
data. One or more widely acceptable data
formats should be recommended as the common standard for shared data.
Data that is to be shared must be readable. Of the several data formats presently in use
in the Basin, some are readable by commonly used software and others require
special data-processing software that is not widely available in the community. For example, not all water quality or aquatic
biology data gathered in the Basin are rendered in EPA’s Water Quality Storage
and Retrieval System (STORET) or Ocean Data
Evaluation System (ODES), even though those standards
are designed to facilitate data sharing.
New York, Vermont and Quebec geographical location data
are commonly obtained and reported in different coordinate reference systems,
although more universal systems are available and coordinate transformations
are no longer operationally difficult.
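As an illustration of the common-format recommendation above, the following is a minimal sketch of how records from two contributors with different column layouts might be rendered in one shared, widely readable format (plain CSV). The column names, the two source layouts and the helper functions are hypothetical and are used only for illustration; no particular standard is prescribed here.

```python
import csv
import io

# Hypothetical common column set; an actual standard would be chosen by participants.
COMMON_FIELDS = ["station_id", "sample_date", "parameter", "value", "units"]

# Hypothetical mappings from each contributor's column names to the common names.
FIELD_MAPS = {
    "source_a": {"Station": "station_id", "Date": "sample_date",
                 "Param": "parameter", "Result": "value", "Units": "units"},
    "source_b": {"site": "station_id", "sampled_on": "sample_date",
                 "analyte": "parameter", "conc": "value", "unit": "units"},
}


def to_common(record, source):
    """Rename one record's columns to the common field names."""
    mapping = FIELD_MAPS[source]
    renamed = {mapping[key]: val for key, val in record.items() if key in mapping}
    missing = [name for name in COMMON_FIELDS if name not in renamed]
    if missing:
        raise ValueError(f"record from {source} lacks fields: {missing}")
    return renamed


def write_common_csv(records):
    """Return CSV text with a fixed header for already-converted records."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COMMON_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The sketch only renames columns. Harmonizing units or transforming coordinates between reference systems would be separate steps and would normally rely on established tools rather than hand-written code.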
3. Create or adopt metadata standards for project data
and require their use.
An important data management research priority is the
review of current metadata practices used in the Basin, including the
documentation protocols that have been developed by other cooperative research
efforts and established by various government agencies. This effort should also seek to recognize the
minimum essential metadata requirements and recommend protocols that facilitate
data exchange and use among the research and monitoring community.
Information about project data is an essential part of
any data set. Metadata (data about the research data) describes
the research methods, instrumentation and standards that generated the project
data and includes the quality assurance & quality control protocols that
were applied. Metadata also includes
more fundamental project parameters such as research design, location of
sampling or measurement, identity of the investigator and how the data are
archived and documented. This type of
essential information establishes the usefulness of project data both for the
original research and for subsequent projects that attempt to use the data.
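The idea of a minimum essential metadata requirement can be sketched as a simple completeness check. The element names below are hypothetical labels for the items listed in the preceding paragraph (methods, instrumentation, standards, QA/QC protocols, research design, location, investigator, archiving); they are assumptions for illustration and do not represent an adopted metadata standard.

```python
# Hypothetical minimum metadata elements, drawn from the items named in the text above.
REQUIRED_METADATA = (
    "research_methods",
    "instrumentation",
    "standards",
    "qa_qc_protocols",
    "research_design",
    "sampling_location",
    "investigator",
    "archive_location",
)


def missing_metadata(record):
    """Return the required elements that are absent or blank in a metadata record."""
    return [
        name for name in REQUIRED_METADATA
        if not str(record.get(name) or "").strip()
    ]


def is_complete(record):
    """True when every required element has a non-blank value."""
    return not missing_metadata(record)
```

A check of this kind could be run when a data set is submitted, so that gaps in documentation are reported to the contributor before the data are shared.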
4. Incorporate data management mechanisms that will best
address the need for accession and retention of data by those involved in
research and monitoring in the Lake Champlain Basin.
A primary data management research priority is to
determine what mechanisms exist or should be developed to implement Basin-wide
data management at the level that is desired by participants in research and
monitoring. Several models exist in the
Basin, such as the GIS protocols developed for the Lake Champlain
Basin Program by the Vermont Center for
Geographic Information (VCGI) and the Vermont Monitoring Cooperative (VMC)
at UVM.
In determining the best mechanism for Lake Champlain
data management, the following design parameters should be addressed:
· The need of participating funding agencies to bring data generated through their
programs to the public should be accommodated where possible in the design.
5. Increase the focus of data protocol,
infrastructure and technological development to more effectively support data
sharing with the research community and the larger community when appropriate.
LCRC research activities represent part of a larger
network of natural and biological information in this geographic area.
In Vermont, LCRC shares geographic coverage with agencies and groups such as the
Vermont Monitoring Cooperative, the Vermont Center for Geographic Information,
the Lake Champlain Basin Program, the Agency of Natural Resources, numerous
watershed groups and non-profit environmental groups. LCRC's future data management plans should
consider website and data storage paradigms that enable increased data and
information access by these groups.
An important data management research priority is a
determination of suitable protocols for the sharing of research and monitoring
data. Some data, predominantly that
supported with public funds, is potentially in the public domain from the time
it is generated, while other data will be provided only at the courtesy of the
researcher. Increasingly in research and
monitoring, the primary data generated in research retain their significance as
a resource far beyond the immediate results of the study.
At some appropriate point in the course of any
research project, the sharing of research results with the broader research
community is essential; so too, in most cases, is the sharing of the data
generated by the study. Researchers
normally have a professional need to delay the sharing of data until
appropriate quality control and quality assurance requirements are
concluded. Because research results are
normally presented to the public in peer-reviewed journal articles, which are
virtually required of many researchers, there may be a legitimate professional
need to limit the sharing of some data prior to publication. Protocols that present researchers with
workable options for the various degrees and stages of data sharing, and that
clearly establish the ethical and professional guidelines for the collaborative
use of a colleague's data, should be articulated in this effort.
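The stages of data sharing described above can be sketched as a small decision rule. The three stage names and the rule itself are hypothetical, offered only to show how such a protocol might be written down; they are not an adopted LCRC policy, and a real protocol would also need to account for funding source and researcher agreements.

```python
from dataclasses import dataclass
from enum import Enum


class SharingStage(Enum):
    # Hypothetical stage names, for illustration only.
    WITHHELD = "withheld until QA/QC is concluded"
    BY_COURTESY = "shared at the courtesy of the researcher"
    OPEN = "shared with the broader community"


@dataclass
class Dataset:
    qa_qc_complete: bool      # quality control and assurance concluded
    results_published: bool   # results have appeared in a publication


def sharing_stage(dataset):
    """Assumed rule: QA/QC first, then publication, then open sharing."""
    if not dataset.qa_qc_complete:
        return SharingStage.WITHHELD
    if not dataset.results_published:
        return SharingStage.BY_COURTESY
    return SharingStage.OPEN
```

Writing the rule explicitly makes it easier for researchers and data users to see which stage a data set is in and what must happen before it moves to the next one.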