Using data to drive stories

Srivindhya Kolluru

At the onset of the COVID-19 pandemic, a small team sought to track the virus’ spread across the United States by documenting testing rates. Since April, The COVID Tracking Project has grown to hundreds of volunteers who compile state-level COVID-19 data each day.

Betsy Ladyzhets, a senior research associate at Stacker, is one such volunteer. On October 21, she organized a ScienceWriters2020 session on the ways journalists can use data to bring a story to life. Panelists included independent journalist Christie Aschwanden; information designer Duncan Geere; The COVID Tracking Project science communication lead Jessica Malaty Rivera; and The COVID Tracking Project volunteer Sara Simon.

Participants shared key takeaways using #DataforSciComm on Twitter and practiced using a Workbench tutorial during the session.

“In its most simple definition, data journalism is the practice of using numbers and trends to tell a story,” wrote Ladyzhets in The Open Notebook. She noted that there are four key stages in a data journalism project: collecting the data, cleaning the data, analyzing the data, and finding a suitable way to present the data.

Of these steps, cleaning, which can entail organizing the data and improving its quality, can have a significant impact on the way readers interpret the data.
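As a rough illustration of what cleaning can involve, here is a minimal Python sketch. It was not shown in the session; the column names `state` and `tests`, the sample rows, and the helper `clean_rows` are all hypothetical. It standardizes labels, parses numbers written with commas, keeps unreadable values visible as gaps rather than guessing, and drops duplicate rows.

```python
def clean_rows(rows):
    """Return tidied copies of rows with hypothetical 'state' and 'tests' columns."""
    cleaned = []
    seen = set()
    for row in rows:
        # Standardize the label: trim stray spaces, use one capitalization.
        state = row["state"].strip().upper()
        # Parse counts such as "1,200"; anything unreadable becomes None
        # so the gap stays visible instead of being silently filled in.
        raw = row["tests"].strip().replace(",", "")
        tests = int(raw) if raw.isdigit() else None
        # Skip rows that duplicate one already kept.
        key = (state, tests)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"state": state, "tests": tests})
    return cleaned


# Made-up sample data for illustration only.
sample = [
    {"state": " ny ", "tests": "1,200"},
    {"state": "NY", "tests": "1200"},
    {"state": "ca", "tests": "n/a"},
]
cleaned = clean_rows(sample)
```

Each of these choices (what counts as a duplicate, how to treat a missing value) changes what a reader eventually sees, which is why the cleaning step deserves to be documented.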

“Since graduate school, where I studied emerging infectious diseases, infodemiology has been something I’ve tracked as much as regular epidemiology,” said Rivera. “With the outbreak of any infectious disease is the outbreak of misinformation.” Rivera noted that where science communication and data journalism intersect, data risks being weaponized, particularly during public health crises, if it isn’t properly translated.

“The numbers can exist on their own but if you don’t have the words to make the story out of it, people can get really misinformed,” said Rivera.

According to Simon and Aschwanden, adding enough context and being transparent about limitations and uncertainties in data-driven stories can reduce misrepresentation and even mistrust among readers.

Aschwanden noted that journalists should not rush to write a story on a dataset they have been handed without first speaking to people, evaluating how the data was collected, and looking into its limitations.

“I love to use data but I also think it's important to think about whether it's truly answering the question you want to know,” said Aschwanden, using the food frequency questionnaire as an example to illustrate how stories that draw correlations between certain eating habits and health outcomes are often misleading. “We were showing that this tool was an inappropriate tool to answer these questions.”

Providing context also means recognizing that data, and the models used to collect them, are biased and come with their own assumptions. “I think there’s this human nature aspect that people tend to believe data over people sometimes,” said Aschwanden.

Geere added that, to pinpoint limitations, it’s also worth finding out why certain data was collected and what data wasn’t collected to begin with. Digging up this information about a dataset can be laborious, so Geere recommends that journalists communicate with their editors throughout the process. Working closely with an editor who can poke holes in your story can also help prevent cherry-picking of data.

“Readers don’t care so much about data,” said Aschwanden. “They want the answers and the data is just informing that.”

For freelance journalists interested in data-driven stories, Simon suggested pitching stories that can be supported or enhanced with data, as data can be found in every story. In fact, according to Simon, you will be a better data journalist if you regularly step away from your data.

As for tools and software, the panelists agreed that Google Sheets is the easiest and most accessible option. Some of the panelists have used Python and R for larger, more complex projects. Simon noted that, regardless of the tool, it’s important to keep track of what you’re doing with the data, especially so that a collaborator can reproduce your work.
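One way to keep track of what you are doing with the data, sketched here in Python under assumptions of our own rather than as a method the panelists described, is to run every transformation through a small helper that records a plain-language note of each step. The helper `apply_steps`, the step descriptions, and the sample rows are hypothetical.

```python
def apply_steps(data, steps):
    """Apply each (description, function) pair in order and log its effect."""
    log = []
    for description, func in steps:
        before = len(data)
        data = func(data)
        # Record what was done and how many rows survived the step.
        log.append(f"{description}: {before} -> {len(data)} rows")
    return data, log


# Hypothetical steps; each one is named so a collaborator can follow along.
steps = [
    ("drop rows with missing counts",
     lambda rows: [r for r in rows if r["tests"] is not None]),
    ("keep rows with at least 1000 tests",
     lambda rows: [r for r in rows if r["tests"] >= 1000]),
]

# Made-up sample data for illustration only.
rows = [
    {"state": "NY", "tests": 1200},
    {"state": "CA", "tests": None},
    {"state": "VT", "tests": 300},
]
result, log = apply_steps(rows, steps)
for line in log:
    print(line)
```

The same habit works in a spreadsheet: a notes tab listing each filter, sort, and formula in order serves the purpose of the log here.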

“Just work with what you know,” said Geere. “Focus on one tool and you’ll find ways that you can use it for what you need to do with it.”

Srivindhya Kolluru graduated from the University of Toronto in June 2020 with a degree in biological chemistry. She previously held editor positions at The Varsity, Canada’s largest student newspaper. Follow her on Twitter @vindhya_kolluru.

December 21, 2020
