Tools

A new website called EveryCRSReport.com is making the work product of the Congressional Research Service available to everybody. The site now holds more than 8,000 current CRS reports: "It’s every CRS report that’s available on Congress’s internal website. We redact the phone number, email address, and names of virtually all the analysts from the reports. We add disclaimer language regarding copyright and the role CRS reports are intended to play. That’s it."

Joseph Lichterman writes about Science Surveyor, a new service from Columbia and Stanford researchers who want to give reporters the context of new academic papers: "Science Surveyor is still being developed, but broadly the idea is that the tool takes the text of an academic paper and searches academic databases for other studies using similar terms. The algorithm will surface relevant articles and show how scientific thinking has changed through its use of language."
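
The idea of finding studies that use similar terms can be illustrated with a small Python sketch. This is not Science Surveyor's actual algorithm, which the article describes only broadly; it assumes a simple bag-of-words cosine similarity, and the paper names and texts below are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic terms in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query text and a tiny hypothetical "database" of other papers.
query = "gene expression in tumor cells"
corpus = {
    "paper_a": "soil erosion and rainfall patterns",
    "paper_b": "tumor cells and gene expression changes",
    "paper_c": "expression of interest in monetary policy",
}

query_terms = term_counts(query)
scores = [
    (name, cosine_similarity(query_terms, term_counts(text)))
    for name, text in corpus.items()
]

# Most similar papers first.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A real system would need far more than this, for example weighting rare terms more heavily and tracking how term usage shifts over time.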

John Wihbey writes that "journalists should scrutinize nonprofit activities just as they would for a government agency or corporate-run institution. There has long been criticism that news media are not aggressive enough in covering nonprofits." How to accomplish that? Wihbey points to archives of IRS Form 990, the public return that most nonprofits file, at ProPublica, the Foundation Center, and Guidestar, as well as guides for analyzing its contents from IRE and SPJ.

Poynter's Melody Kramer writes about some unusual uses for the computer code-sharing system GitHub: "While most people and organizations use GitHub for code, others use the platform [for] collaborative work on lists of all sorts of information, including recipes, articles to read and freely available programming books." Kramer also includes a list of tutorials and GitHub projects such as Annotator, a tool that allows anyone to add annotations to text or images.

Matthew Phillips and John Wihbey provide step-by-step instructions for using the Python programming language to "scrape" data from a web site, in this case a list of prison inmates: "Most of the effort in web scraping is digging through the HTML source in your browser and figuring out how the data values are arranged. Start simple — just grab one value and print it out. Once you figure out how to extract one value, you’ll often be very close to the rest of the data."
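
The "grab one value first" advice can be sketched in Python as follows. This is not the code from the Phillips and Wihbey tutorial; it is a minimal illustration that assumes the data sits in a plain HTML table, uses only the standard library's `html.parser`, and works on a made-up page fragment (the names, ages, and facilities are hypothetical) rather than a live site.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table id="inmates">
  <tr><th>Name</th><th>Age</th><th>Facility</th></tr>
  <tr><td>Doe, John</td><td>34</td><td>North Unit</td></tr>
  <tr><td>Roe, Jane</td><td>41</td><td>South Unit</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they end up empty and are skipped.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


parser = CellCollector()
parser.feed(SAMPLE_HTML)
rows = parser.rows

# Step 1: start simple and print a single value.
first_value = rows[0][0]
print(first_value)

# Step 2: once one value works, the rest of the table follows the same pattern.
records = [dict(zip(("name", "age", "facility"), row)) for row in rows]
for record in records:
    print(record)
```

On a real page, the table structure has to be worked out by reading the HTML source in the browser, as the authors describe, and the parsing logic adjusted to match.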

Gregory Kohs wanted to show that Wikipedia's self-correcting mechanisms can be beaten, so he added phony facts to some articles. After six weeks, most were uncorrected: "Even though Wikipedia’s parent company, the Wikimedia Foundation, collected $5.7 million in surplus cash beyond expenses last fiscal year, the organization … has never spent a dime to evaluate vandalism on Wikipedia." Also, what Wikipedia said about Thoreau's neckbeard.

Leighton Walter Kille and John Wihbey review the data landscape and list some of the best-known — and least-known — sources of federal data, statistical reports and analyses, ranging from the Census Bureau and the IRS to the Food and Drug Administration: "It can seem overwhelming, but there are actually only 13 officially designated 'principal statistical agencies,' each covering a specific area such as education, transportation, criminal justice and economics."

Gary Schwitzer's HealthNewsReview.org has announced a two-year, $1.3 million grant that will allow it to resume work that shut down in mid-2013 when a previous grant lapsed. The announcement says the site will expand its staff and extend its coverage to press releases as well as news stories. It will also offer newsroom training and experiment with podcasts. Comment from CJR and the Association of Health Care Journalists.

At Journalist's Resource, Leighton Walter Kille and John Wihbey explain the basics of a common statistical tool and use Microsoft Excel to show the correlation between interest rates and median home prices: "The technique is well known to data journalists, but even savvy reporters may feel a measure of discomfort when they come across it — they seldom have the expertise or time needed to understand advanced mathematics or dig into a study’s original methods and data."
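
For readers who prefer code to a spreadsheet, the correlation step can be sketched in Python. This is not the authors' Excel walkthrough; it computes the Pearson correlation coefficient by hand (the same quantity Excel's `CORREL` function returns), and the interest rates and home prices below are invented for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: interest rates (percent) and median home prices (thousands of dollars).
rates = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
prices = [250, 245, 238, 230, 226, 215]

r = pearson_r(rates, prices)
print(f"r = {r:.3f}")
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 a weak one; as with any correlation, it does not by itself show that one variable causes the other.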

IRE President and New York Times editor Sarah Cohen has some advice on where to get started in working with data: "No matter what others say, a strong facility with Excel is pretty much a baseline. It’s the tool of choice for so many other people that if you are not proficient with it, you really can’t even get started … I know there are people who think Excel is terrible and that we shouldn’t be teaching closed-source tools. But it is still on everybody’s desktop."