Rocky Mountain Research Station

Data science: The Data, Modeling and Applications Support Hub (D*M*A*S*H*)

Status
Ongoing

Existing U.S. Forest Service monitoring systems produce datasets that vary widely in maintenance and accessibility. There is a need to ensure that existing data management systems are sufficiently documented, maintained, and integrated with other data streams. In a time of accelerating demand for real-time decision support, USDA Forest Service Research and Development needs a national information architecture for Forest Service data science, national data management, and modeling capabilities for accessing databases, tools, and models that support fire, social, and environmental sciences and decision support. The Data, Modeling and Applications Support Hub (D*M*A*S*H*) will allow the Forest Service to explore the capabilities of Theory-Guided Data Science and data analytics and extend our science delivery capabilities. 

A researcher working on a laptop indoors. (Photo by Charity Parks)

Data science has been classified into six activities that will be the underpinning of the D*M*A*S*H* initiative:

  1. Data Gathering, Preparation, and Exploration
  2. Data Representation and Transformation
  3. Computing with Data
  4. Data Modeling
  5. Data Visualization and Presentation
  6. Science about Data Science

Creation of the D*M*A*S*H* would require careful development in consultation with agency partners. The developmental phase would identify areas of complementarity and highlight possible duplication of efforts to be avoided. Sub-teams would be established for the two components of the hub.

To facilitate data exploration, the hub would serve Forest Service Research & Development scientists and their collaborators in academia, non-profits, and other State, Federal, and international organizations. Members bring capabilities as data managers, data scientists, research scientists, software developers, resource managers, and other disciplines. Development of the data science hub would explore the opportunities presented by Knowledge Networks and how Knowledge Networks approaches might be integrated with existing Forest Service systems and be included as a test of new technologies. Knowledge discovery and development is accomplished not just by creation anew but also by transfer of knowledge that already exists elsewhere. Knowledge Networks hold the prospect of an accelerated introduction of state-of-the-art technologies superseding the step-by-step process of transferring know-how and technologies among users and possessors of information. The data science hub utilizes technological advances in storage, retrieval, handling, and dissemination of information. The technology platform serves as an integrating mechanism through which collaboration proceeds in an innovative, interactive development process.

The D*M*A*S*H* would elevate the practice of data management such that it is seen as a critical part of the pursuit of agency research. It would ensure that Forest Service data meet the findability, accessibility, interoperability, and reusability (FAIR) data principles. Findable data are essential for automatic discovery of datasets and services. Users need to know how data and metadata can be accessed. The data usually need to be integrated with other data and with applications or workflows for analysis, storage, and processing. To optimize their reuse, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
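As an illustration only, the four principles can be read as a checklist applied to a dataset's metadata record. The Python sketch below assumes a hypothetical set of metadata field names grouped by principle; neither the field names nor the function `missing_fair_fields` come from D*M*A*S*H* or from any agency standard, and the example record is invented.

```python
# Hypothetical metadata fields, grouped by the FAIR principle each one supports.
# These groupings are illustrative assumptions, not an agency standard.
FAIR_FIELDS = {
    "findable": ("identifier", "title", "keywords"),
    "accessible": ("access_url", "access_protocol"),
    "interoperable": ("data_format", "coordinate_reference_system"),
    "reusable": ("license", "provenance", "description"),
}


def missing_fair_fields(record: dict) -> dict:
    """Return, per FAIR principle, the metadata fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        # A field counts as missing if the key is absent or its value is empty.
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Invented example record; every value here is a placeholder.
example_record = {
    "identifier": "example-dataset-001",
    "title": "Example post-fire water quality samples",
    "keywords": ["water quality", "post-fire"],
    "access_url": "https://example.org/data/example-dataset-001",
    "access_protocol": "HTTPS",
    "data_format": "CSV",
    "license": "",
    "description": "Placeholder description.",
}

gaps = missing_fair_fields(example_record)
for principle, fields in gaps.items():
    print(f"{principle}: missing {', '.join(fields)}")
```

A check of this kind only confirms that descriptive fields are present. It does not judge whether their contents are accurate or sufficient for reuse.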

Specific project activities and accomplishments include, but are not limited to:

  • Conduct a review of existing, state-of-the-art data hubs around the country (and globe) and adopt/adapt the approaches that best meet the needs of the agency and FAIR data principles to derive a first-generation design of D*M*A*S*H*
  • Utilize the Missoula Discovery Network and its associated Authority to Operate to support rapid distribution of large datasets and to facilitate internal and external collaborative partnerships that would otherwise be impossible. Missoula Discovery Network has also joined a collaboration between the Forest Service and the Agricultural Research Service to bring SCINet capability and to pilot virtual server technology installed at the Missoula site, yet connected to SCINet, as a service offering to other SCINet users.
  • Identify and adopt (or exceed) all existing agency data standards and common practices that are relevant to D*M*A*S*H*
  • Explore options and adopt an initial data organization framework that will not only organize data layers and their retrieval but also facilitate monitoring and accomplishment reporting for the Bipartisan Infrastructure Law, Collaborative Forest Landscape Restoration Program, and the agency’s core mission areas. Examples to be considered include the Framework for Socio-ecological Resilience being used across California, Forest Inventory and Analysis carbon inventory, post-fire water quality monitoring, and others.
  • Identify core data layers that currently support the agency at National and Regional scales, convene the associated data managers and the Geospatial Technology and Applications Center, set up the data hub network to readily access those data layers and their metadata, identify efficiencies that would make those core data layers and management systems more readily interoperable, derive a scope of work required to gain those efficiencies, and identify what can be accomplished as part of this project.
  • Identify new core data layers that are being developed by Bipartisan Infrastructure Law projects, convene the associated data developers, identify efficiencies that would make them more interoperable with the existing core data layers and the findability, accessibility, interoperability, and reusability principles of D*M*A*S*H*, ensure that data developers understand the standards for data management and documentation, provide as much support as possible toward meeting those standards, and set up the data hub network to include those data layers and their metadata.
  • Identify priority gaps in National and Regional data sets (e.g., wetland integrity, biodiversity), identify potential sources external to the Forest Service for filling those gaps at National or Regional scales, derive a scope of work for filling those gaps, and identify what can be initiated and potentially accomplished as part of this project.
  • Determine how the development of core data stacks for each Region to support expedited assessment and planning (initially but not exclusively in the Wildfire Crisis Strategy landscapes), as accomplished by the ACCEL project in Region 5 as part of the National Shared Stewardship investments, can be supported by D*M*A*S*H*, and demonstrate that function by supporting the data stacks developed across Region 5 and others as they are developed across the West as part of the Bipartisan Infrastructure Law project deliverables.
  • Convene lead designers for existing data management and decision support functions in the agency to identify current functions that D*M*A*S*H* could host or network to directly, high priority gaps and inefficiencies, and determine how D*M*A*S*H* could be designed as a central resource for discovery and access to the primary existing tools being supported and used by the agency.
  • Chart a course for how to fill high priority data management and tool user-support gaps as part of the D*M*A*S*H* Phase 2 investments.
  • Establish a first-generation design of the D*M*A*S*H* dashboard for accessing the first-generation D*M*A*S*H* sandbox.

Objectives

Existing Forest Service monitoring systems produce datasets that vary widely in maintenance and accessibility. There is a need to ensure that existing data management systems are sufficiently documented, maintained, and integrated with other data streams. In a time of accelerating demand for real-time decision support, Forest Service Research and Development needs a national information architecture for Forest Service data science, national data management and modeling capabilities for accessing databases, tools and models that support fire, social and environmental sciences and decision support. The Data, Modeling and Applications Support Hub (D*M*A*S*H*) will allow FS R&D to explore the capabilities of Theory-Guided Data Science and data analytics and extend our science delivery capabilities.

Application of Research Results 

Phase 1

New science, tools, and technology emerge to support access and interoperability of existing and newly developed data layers and data stacks at National, Regional, and Wildfire Crisis Strategy landscape scales.

Phase 2

A plan emerges for how the agency can invest in directly supporting science-based, large landscape assessment and planning by providing dedicated technical support. Professional staff would assist with interpretation as needed. The D*M*A*S*H* becomes a national resource that transcends the current station-based and discipline-focused structure of Forest Service R&D, allows access to data, provides a platform for multidisciplinary collaborations, and helps spread knowledge about "best practices" and experiences.

Collaborators

  • Rich McKenzie - USDA Forest Service, Washington Office

  • David Vanderzanden - Geospatial Technology and Applications Center

  • Sean Gordon - Oregon State University 

  • Ilkay Altintas - University of California, San Diego 

  • Matthew Ross - Colorado State University 

  • Alexa April - SalesForce 

Last updated May 30, 2024