A Data Science Platform and Mechanisms for Its Sustainability


This is a research project using cutting-edge natural language processing and other analytic tools to provide a platform for scholars, citizens, environmental professionals and agency staff to answer a host of critical questions about how we impact the environment and how the environment impacts us.


The National Environmental Policy Act of 1969 (NEPA) is the cornerstone of US environmental policy and law and an essential governance strategy for addressing emerging, 21st-century social, environmental, and economic challenges. The heart of NEPA is the environmental impact statement (EIS), a detailed, scientific analysis of the expected impacts of federal actions (plans, projects, and activities) and an assessment of possible alternative actions. Some 37,000 EISs, entailing billions of dollars spent, have generated reams of scientific data (environmental, economic, and social). There is no central, complete database for this information, and it is too voluminous to assess manually. There is a critical need to harness the power of data science to enable the full vision of NEPA to deal with “wicked” problems (complex challenges that span disciplines and are characterized by high uncertainty and rapid change), to learn systematically from the NEPA process, and to make NEPA more efficient, effective, and equitable.

The goal of our research is to usher in a new paradigm of efficiency, transparency, and public participation in environmental governance and to enable new generalizable, empirically grounded understanding of the role of science in decision-making through a standardized online knowledge platform, the eNEPA platform, where all documents from past, present, and future NEPA analyses are deposited. Using cutting-edge natural language processing and other analytic tools, scholars, citizens, environmental professionals and agency staff can use eNEPA to answer a host of critical questions about how we impact the environment and how the environment impacts us. Our proposal operationalizes our vision through the conceptualization, design, testing, and implementation of the eNEPA platform and the assimilation of a robust eNEPA community of scholars and professionals to support its ongoing development and sustainability.

Broader Impacts

This research has significant Broader Impacts in three areas. (1) Undergraduate Student Training: We expand the participation of underrepresented students and create a new generation of scholars equipped with cutting-edge data science skills to solve the challenges our nation faces over the next century; (2) Transformative Public Engagement and Decision-Making: We enable the full vision for NEPA as a transformative strategy to enhance public participation and expert input in environmental decision-making, and to enable science-based environmental and social approaches to solving complex, “wicked” problems by creating and encouraging the widespread adoption of eNEPA and facilitating access to its information; and (3) Improved Efficiency and Effectiveness: We enable efficiencies in the NEPA process by enhancing governmental transparency, accountability and oversight, and increasing access to data that can inform and shorten the expensive and complex NEPA process.

Intellectual Merit

eNEPA enables informed decision-making and public engagement and catalyzes scholarly inquiry, all consistent with the text and spirit of Congress’ original vision. Our contribution to basic research lies along three important dimensions. (1) Expanded SBE Scholarship: eNEPA expands broad SBE knowledge about the role of science and public engagement in decision-making and society, enabling basic research in fields ranging from law and public administration to social psychology and economics. Overall, eNEPA provides unprecedented capacity for systematic, large-n study of a decision-making process designed to address a range of grand challenges. (2) Expanded Interdisciplinary Scholarship: eNEPA opens an unprecedented trove of scientific data about environmental change in the US useful to disciplines as diverse as ecology, archaeology, and public health. (3) Expanded Research Paths in Data Science: By developing the eNEPA platform and required tools, the team charts new paths in data science, including computational methods for extracting metadata and linking documents via a fine-grained similarity analysis. We also develop new machine learning and network science methods for identifying implicit links between documents and understanding the impact of collaboration in decision-making. In sum, we make overall contributions to data and computational social science by combining natural language processing with network analysis.

Project Summary

Funding Source: National Science Foundation (NSF)

Award Number: 

Funding Amount: 

Dr. Laura Lopez Hoffman
Dr. Marc Miller
Dr. Sudha Ram
Dr. Steven Bethard
Dr. Elizabeth Baldwin