Data journalism for every scale and skill level

By Diana Kwon

Data today is being gathered at record rates, and much of it is easily accessible. New York City alone has more than 1,350 open data sets. “Our challenge now [is] harnessing all this,” said Robert Lee Hotz, a science journalist at the Wall Street Journal and one of the four panellists in the data journalism session.

Data can bring stories to life. Geotagging (marking text messages, photos or other media with geographical information) can generate data that is meaningful, useful and personal. It has been used to map cities by language, appetite and even sexual fantasy; an OkCupid analysis, for example, revealed that the deeper into Brooklyn you go, the kinkier people get. “This is a bridge to a hidden city,” said Hotz.

Earlier this year, the Wall Street Journal published an online interactive mapping the bacteria in the New York City transit system. The project grew out of a collaboration with researchers at Weill Cornell Medical College, who created PathoMap from DNA samples they gathered at each of the city’s 466 open subway stations. Their analysis revealed more than 15,000 different species, and Hotz and his team used this data to map the city’s microbial signature by station and bacteria type. “This interactive was the most read article on our website for five days,” Hotz told the audience. “It allowed a lot of people to see the city differently — their sense of place transformed by data.”

Small data sets can also tell compelling tales. Tim De Chant, senior digital editor at NOVA Next, walked the audience through a data-centric piece he wrote for Wired comparing two popular tourist destinations: the island city of Venice and the Venetian Macao, a 40-story luxury casino. By pulling publicly available figures from annual reports into a spreadsheet and digging through the scientific literature to learn how such assessments are done, De Chant built an infographic comparing everything from annual visitors to more telling statistics such as fresh water use and carbon footprint.

Data is a powerful reporting tool, but it needs to be examined through a critical lens. In psychology and the biosciences, results that yield a p-value of less than 0.05 are usually considered statistically significant. Small changes in an analysis can often have a large effect on the p-value, and tweaking an analysis until it crosses that threshold, a practice called p-hacking, can make a study more publishable. “It’s easy to get a result, but it’s hard to get an answer — and we too often conflate the two,” said Christie Aschwanden, lead science writer at FiveThirtyEight. To illustrate this, she teamed up with Richie King, senior visual journalist at FiveThirtyEight, to create an interactive story in which readers could try p-hacking themselves.
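To make one side of p-hacking concrete, here is a minimal Python sketch; it is not taken from the session or from FiveThirtyEight's interactive. It assumes a single common form of p-hacking: measuring many outcomes on data that contain no real effect and reporting whichever one crosses p < 0.05. The function names and parameters (k outcomes per study, 20 observations each, a z-test with a known standard deviation of 1) are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: true mean == 0, assuming a known sigma (z-test)."""
    n = len(sample)
    z = mean(sample) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def analytic_false_positive_rate(k, alpha=0.05):
    """Chance that at least one of k independent tests on null data has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate_p_hacking(k, trials=2000, n=20, alpha=0.05, seed=0):
    """Fraction of simulated studies in which at least one of k outcomes looks 'significant'.

    Every outcome is pure noise (true mean 0), so any hit is a false positive.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(trials):
        # Hypothetical study: k independent outcome measures, none with a real effect
        p_values = [
            z_test_p_value([rng.gauss(0, 1) for _ in range(n)]) for _ in range(k)
        ]
        # The p-hacker reports the best-looking outcome
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, analytic_false_positive_rate(k), simulate_p_hacking(k))
```

With a single test, about 5 percent of these no-effect studies look significant, as the threshold intends. With 20 tries, the analytic rate is 1 - 0.95^20, roughly 64 percent, and the simulation should land close to that.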

Understanding how data are analysed is therefore an essential part of the reporting process. Aschwanden left the audience with a piece of practical advice: “Every journalist should learn some basic statistics — it’s a very powerful antidote to bulls--t.”