For environmental and civic journalists, an open data tool could help unravel federal water quality data

On the eve of the 50th anniversary of the enactment of the Clean Water Act (CWA) — Oct. 18, 1972 — much work remained to be done, especially in making pollution data accessible to the general public. A group of environmental justice scholars, data scientists, and advocates think they have the tools to help journalists and other investigators level the playing field.

At the Oct. 17 ScienceWriters2022 virtual workshop “What’s in the Water? Stories in Federal Environmental Data,” members of the nonprofit Environmental Data & Governance Initiative (EDGI) gave an overview of existing CWA data gaps and demonstrated an open-source data science tool that anyone can use to discover stories and draw insights from U.S. Environmental Protection Agency (EPA) records.

EDGI Civic Science Fellow Kelsey Breseman, a Seattle-based member of the Tlingit Indian Tribe and a neural engineer by training, kicked off the workshop by scrutinizing the premise of the CWA: “Is our water safe for swimming, fishing, and drinking?” Although EDGI was founded in November 2016 over fears that the Trump administration would delete valuable environmental data, it turned out that major data gaps already existed, and those gaps make Breseman’s rhetorical question almost impossible to answer.

The complete findings by Breseman and her colleagues were published in the September 2022 report “How Gaps and Disparities in EPA Data Undermine Climate and Environmental Justice Screening Tools.” Continuing her presentation, Breseman explained the key challenges facing journalists and advocates seeking to unravel federal water quality data.

First and foremost, publicly available data goes back three years at most, which prevents citizens from tracking changes in water quality over time unless they diligently downloaded the databases before older records were automatically scrubbed. The problem is exacerbated by a cascading series of issues that leave the typical facility missing 86% of the information required by law, including: the failure of facilities to self-report pollution data accurately, if at all; the lack of inspections by local agencies; and the inability of states to submit complete, proper records to the federal EPA.

It’s like removing blocks from a Jenga tower: as pieces are taken out, the structure is hollowed out and becomes more prone to collapse. In the same way, the more snags that occur along the way, the less reliable and useful the resulting data becomes.

To make matters worse, the process of collecting, verifying, and submitting data takes time — a process that was particularly hampered by the COVID-19 pandemic — so the data available only captures a moment in the past. Local residents can’t know, for example, how a recent chemical spill or explosion has affected their water until they have been exposed to it for months.

So do journalists just chuck all the flawed data into the trash can? No.

The session speakers said that the lack of violation, inspection, and enforcement data is newsworthy in and of itself. It is a starting point for looking into where local agencies lack the resources to effectively enforce the CWA, who’s not doing their jobs, or worse still, which polluters are openly flouting the law. And that’s why making currently available data easily accessible for science writers, local residents, and community advocates is a priority for the EDGI team.

Fellow panelist and EDGI member Eric Nost, an Assistant Professor of Geography at the University of Guelph in Ontario, Canada, provided a demo of the Environmental Enforcement Watch Watershed Notebook. The data science tool leverages a Jupyter Notebook accessible through the Google Colab interface so that users can execute a program and access and visualize data hosted by Stony Brook University without the need for coding.

Some of the highlights of the Environmental Enforcement Watch Watershed Notebook include:

  • A United States Geological Survey (USGS) Hydrologic Unit Code (HUC) finder, which removes a lot of guesswork by utilizing the user’s ZIP code;
  • Visualizations showing pollutants ranked by amount emitted, the most consistent violators in an area, the fines or penalties against them, and the number of inspections of facilities by state or federal regulators;
  • Links to the corresponding reports on the EPA’s Enforcement and Compliance History Online (ECHO) database.

To wrap up his portion of the workshop, Nost showed the real-life impact the Notebook can have when the data does prove useful, citing an anecdote involving his fellow EDGI member Emily Pawley, an Associate Professor of History at Dickinson College and an expert in environmental history. In an opinion piece for the Harrisburg Patriot News, Pawley had pointed out that CWA violations in Pennsylvania in 2019, the last full calendar year before COVID, ballooned to nearly ten times the number in 2016. That earlier year, 2016, happened to be the last calendar year before Trump appointee Scott Pruitt, who in his previous position as Oklahoma Attorney General filed lawsuits against the EPA to block the Clean Water Rule, became EPA Administrator and suspended the rule himself.

If that finding was what Pawley could achieve with incomplete data, imagine the impact concerned citizens and science writers might have if everyone could get their hands on good data. EDGI’s data solutions could be a critical first step in unlocking the black box and helping those who care to save the waters they depend on.

Born and raised in Hong Kong, Alex (Ching Lam) Ip (@alexip718) is the Editor in Chief of The Xylom, a student-led nonprofit exploring the communities influencing and shaped by science. He was a 2021 NASW Summer Diversity Fellow and led a translation of the KSJ Science Editing Handbook into Traditional and Simplified Chinese. Alex is wrapping up his undergraduate degree in Environmental Engineering at Georgia Tech in Atlanta, United States.

This ScienceWriters2022 conference coverage article was produced as part of the NASW Conference Support Grant awarded to Arman to attend the ScienceWriters2022 national conference. Find more 2022 conference coverage at

A co-production of the National Association of Science Writers (NASW), the Council for the Advancement of Science Writing (CASW), and St. Jude Children's Research Hospital, the ScienceWriters2022 national conference featured an online portion Oct. 12-19, followed by an in-person portion held in Memphis, Tenn. Oct. 21-25. Learn more at and follow the conversation on Twitter at #SciWri22

Credits: Reporting by Alex Ching Lam Ip; edited by Ben Young Landis. Photo by Ben Young Landis/NASW