From ScienceWriters: Can artificial intelligence beat fake news?

By Brooke Borel

You may have noticed: It's a weird time for facts.

Fact-checkers and journalists try their best to surface facts, but (these days) there are just too many lies and too few of us.

How often the average citizen falls for fake news is unclear. But there are plenty of opportunities for exposure. The Pew Research Center reported last year that more than two-thirds of American adults get news on social media, where misinformation abounds. We also seek it out. In December, political scientists from Princeton University, Dartmouth College, and the University of Exeter reported that one in four Americans visited a fake news site around the 2016 election, mostly by clicking through from Facebook.

As partisans, pundits, and even governments weaponize information to exploit our regional, gender, and ethnic differences, big tech companies like Facebook, Google, and Twitter are under pressure to push back. Startups and large firms have launched attempts to deploy algorithms and artificial intelligence (AI) to fact-check digital news.

As a journalist and fact-checker, I wish the algorithms the best. We sure could use the help. But I'm skeptical. Not because I'm afraid the robots are after my job, but because I know what they're up against. As the author of The Chicago Guide to Fact-Checking (and host of the podcast Methods, which explores how journalists, scientists, and other professional truth-finders know what they know), I can tell you that truth is complex and squishy. Human brains can recognize context and nuance, which are both key in verifying information. We can spot sarcasm. We know irony. We understand that syntax can shift even while the basic message remains. And sometimes we still get it wrong. Can machines even come close?

The media has churned out hopeful coverage about how AI efforts may save us from bogus headlines. But because facts are slippery, Cathy O'Neil, a data scientist and author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, is not an AI optimist. "The concept of a fact-checking algorithm, at least at first blush, is to compare a statement to what is known truth," she says. "Since there's no artificial algorithmic model for truth, it's just not going to work."

That means computer scientists have to build one. So just how are they constructing their army of virtual fact-checkers? What are their models of truth? And how close are we to entrusting their algorithms to cull fake news? To find out, the editors at Popular Science asked me to try out an automated fact-checker, using a piece of fake news, and compare its process to my own. The results were mixed, but maybe not for the reasons you (or at least I) would have thought.

Chengkai Li is a computer scientist at the University of Texas at Arlington. He is the lead researcher for ClaimBuster, which, as of this writing, was the only publicly available AI fact-checking tool. Starting in 2014, Li and his team built ClaimBuster more or less along the lines of other automated fact-checkers in development. First, they created an algorithm. They then taught their code to identify a claim by feeding it lots of sentences, and telling it which make claims and which don't. Because Li's team originally designed their tool to capture political statements, the words they fed it came from 30 or so of the past U.S. presidential debates, totaling roughly 20,000 claims.
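The training step described above can be sketched in a few lines. This is only an illustration, not ClaimBuster's code: the article does not say which algorithm Li's team used, so a small naive Bayes classifier stands in for it, and the six labeled sentences, along with the names `LABELED`, `train`, and `claim_score`, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training set: (sentence, makes_a_claim).
# The real tool was trained on sentences from presidential debates.
LABELED = [
    ("Unemployment fell by two percent last year.", True),
    ("The deficit doubled since 2009.", True),
    ("We cut taxes for ten million families.", True),
    ("Thank you all for being here tonight.", False),
    ("Let me turn to my opponent.", False),
    ("I think that is a great question.", False),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labeled):
    """Count how often each word appears in claims and in non-claims."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_claim in labeled:
        counts[is_claim].update(tokenize(text))
        docs[is_claim] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def claim_score(text, model):
    """Return a score between 0 and 1; higher means more claim-like."""
    counts, docs, vocab = model
    total_true = sum(counts[True].values())
    total_false = sum(counts[False].values())
    log_odds = math.log(docs[True] / docs[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing keeps unseen word/label pairs from zeroing out.
        p_true = (counts[True][word] + 1) / (total_true + len(vocab))
        p_false = (counts[False][word] + 1) / (total_false + len(vocab))
        log_odds += math.log(p_true / p_false)
    return 1 / (1 + math.exp(-log_odds))


model = train(LABELED)
print(claim_score("The deficit fell by ten percent.", model))
print(claim_score("Thank you for that question.", model))
```

With so little training data the scores mean little, but the shape is the same as described: people label the examples, and the code learns which words tend to signal a claim.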

Next, the team taught the code to compare claims to a set of known facts. Algorithms don't have an intrinsic feature to identify facts; humans must provide them. We do this by building what I'll call truth databases. To work, these databases must contain information that is both high-quality and wide-ranging. Li's team used several thousand fact-checks (articles and blog posts written by professional fact-checkers and journalists, meant to correct the record on dubious claims) pulled from reputable news sites like PolitiFact, [Snopes](https://www.snopes.com), factcheck.org, and the Washington Post.

I wanted to see if ClaimBuster could detect fake science news from a known peddler of fact-challenged posts: Infowars.

To create a fair fight, my editor and I agreed on two rules: I couldn't pick the fake news on my own, and I couldn't test the AI until after I had completed my fact-check. A longtime fact-checker at Popular Science pulled seven spurious science stories from Infowars, from which my editor and I agreed on one with a politicized topic: climate change.

Because Li hadn't had the budget to update ClaimBuster's truth database since late 2016, we chose a piece published before then: "Climate Blockbuster: New NASA Data Shows Polar Ice Has Not Receded Since 1979," from May 2015. In checking the report, I relied on facts available only in that period.

We used the first 300 words of the Infowars account. I checked the selection as I would any article: line by line. I identified fact-based statements (essentially every sentence) and searched for supporting or contradictory evidence from primary sources, such as climate scientists and academic journals. I also followed links in the Infowars story to assess their quality and to see whether they supported the arguments.

Take, for example, the story's first sentence: "NASA has updated its data from satellite readings, revealing that the planet's polar ice caps have not retreated significantly since 1979, when measurements began." Online, the words "data from satellite readings" had a hyperlink. I clicked the link, which led to a defunct University of Illinois website, Cryosphere Today. I emailed the school. The head of the university's Department of Atmospheric Sciences gave me the email address for a researcher who had worked on the site: John Walsh, now chief scientist for the International Arctic Research Center in Alaska, whom I later interviewed by phone.

Walsh told me that the "data from satellite readings" wasn't directly from NASA. Rather, the National Snow and Ice Data Center in Boulder, Colo., had cleaned up raw NASA satellite data for Arctic sea ice. From there, the University of Illinois analyzed and published it. When I asked Walsh whether that data had revealed that the polar ice caps hadn't retreated much since 1979, as Infowars claimed, he said: "I can't reconcile that statement with what the website used to show."

In addition to talking to Walsh, I used Google Scholar to find relevant scientific literature and landed on a comprehensive paper on global sea-ice trends in the peer-reviewed Journal of Climate, published by the American Meteorological Society and authored by Claire Parkinson, a senior climate scientist at the NASA Goddard Space Flight Center. I interviewed her too. She walked me through how her research compared with the claims in the Infowars story, showing where the latter distorted the data. While it's true that global sea-ice data collection started in 1979, around when the relevant satellites launched, over time the measurements show a general global trend toward retreat, Parkinson said. The Infowars story also conflated data for Arctic and Antarctic sea ice; although the size of polar sea ice varies from year to year, Arctic sea ice has shown a consistent trend toward shrinking that outpaces the Antarctic's trend toward growth, bringing the global totals down significantly. The Infowars author, Steve Watson, conflates Arctic, Antarctic, global, yearly, and average data throughout the article, and may have cherry-picked data from an Antarctic boom year to swell his claim.

In other cases, the Infowars piece linked to poor sources and misquoted them. Take, for example, a sentence that claims Al Gore warned that the Arctic ice cap might disappear by 2014. The sentence linked to a Daily Mail article — not a primary source — that included a quote allegedly from Gore's 2007 Nobel Prize lecture. But when I read the speech transcript and watched the video on the Nobel Prize website, I found that the newspaper had heavily edited the quote, cutting out caveats and context. As for the rest of the Infowars story, I followed the same process. All but two sentences were wrong or misleading. (An Infowars spokesman said the author declined to comment.)

With my own work done, I was curious to see how ClaimBuster would perform. The site requires two steps to do a fact-check. First, I copied and pasted the 300-word excerpt into a box labeled "Enter Your Own Text," to identify factual claims made in the copy. Within one second, the AI scored each line on a scale of zero to one; the higher the number, the more likely the line contains a claim. The scores ranged from 0.16 to 0.78. Li suggested 0.4 as the threshold for a claim worth further inspection. The AI scored 12 out of 16 sentences at or above that mark.

In total, there were 11 check-worthy claims among 12 sentences, all of which I had also identified. But ClaimBuster missed four. For instance, it gave a low score of 0.16 to a sentence that said climate change "is thought to be due to a combination of natural and, to a much lesser extent, human influence." This sentence is indeed a claim — a false one. Scientific consensus holds that humans are primarily to blame for recent climate change. False negatives like this, which rate a sentence as not worth checking even when it is, could lead a reader to be duped by a lie.
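To make the cutoff concrete, here is a minimal sketch of how a 0.4 threshold sorts scored sentences, and how a false negative slips through. The helper `split_by_threshold` and the two placeholder sentences with their scores are hypothetical; only the 0.4 cutoff and the 0.16 score for the human-influence sentence come from the test described above.

```python
CLAIM_THRESHOLD = 0.4  # the cutoff Li suggested for a claim worth inspecting


def split_by_threshold(scored, threshold=CLAIM_THRESHOLD):
    """Split (sentence, score) pairs into flagged and skipped lists."""
    flagged = [(text, score) for text, score in scored if score >= threshold]
    skipped = [(text, score) for text, score in scored if score < threshold]
    return flagged, skipped


scored = [
    # Placeholder sentences with illustrative scores (hypothetical pairing).
    ("hypothetical sentence A", 0.78),
    ("hypothetical sentence B", 0.40),
    # The sentence ClaimBuster scored at 0.16 in the test.
    ("is thought to be due to a combination of natural and, "
     "to a much lesser extent, human influence", 0.16),
]

flagged, skipped = split_by_threshold(scored)
print(len(flagged), "flagged;", len(skipped), "skipped")
```

Anything in `skipped` never reaches the second, fact-matching step, which is why a false negative at this stage is costly: the false claim is never checked at all.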

How could ClaimBuster miss this statement when so much has been written about it in the media and academic journals? Li said his AI likely didn't catch it because the language is vague. "It doesn't mention any specific people or groups," he says. Because the sentence had no hard numbers and cited no identifiable people or institutions, there was "nothing to quantify." Only a human brain can spot the claim without obvious footholds.

Next up, I fed each of the 11 identified claims into a second window, which checks against the system's truth database. In an ideal case, the machine would match the claim to an existing fact-check and flag it as true or false. In reality, it spit out information that was, for the most part, irrelevant.

Take the article's first sentence, about the retreat of the polar ice caps. ClaimBuster compared the string of words to all sentences in its database. It searched for matches and synonyms or semantic similarities. Then it ranked hits. The best match came from a PolitiFact story — but the topic concerned nuclear negotiations between the U.S. and Iran, not sea ice or climate change. Li said the system was probably latching onto similar words that don't have much to do with the topic. Both sentences, for example, contain the words "since," "has," "not," as well as similar words such as "updated" and "advanced." This gets at a basic problem: The program doesn't yet weigh more important words over nonspecific words. For example, it couldn't tell that the Iran story was irrelevant.
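The weighting problem can be shown with a toy example. In this sketch the four-entry database `DB` is invented for illustration (the article does not quote the PolitiFact sentence that matched), and inverse document frequency is just one common way to down-weight nonspecific words, not something Li's team is known to use. Only the claim text is taken from the Infowars story.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a truth database; these are not real fact-checks.
DB = [
    "Iran has not advanced its nuclear program since the talks began.",
    "Arctic sea ice has declined since satellite records began.",
    "The budget has not been balanced since 2001.",
    "The senator has not voted since March.",
]

CLAIM = ("NASA has updated its data from satellite readings, revealing that "
         "the planet's polar ice caps have not retreated significantly "
         "since 1979, when measurements began.")


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim, candidate):
    """Count shared words, treating every word as equally important."""
    return len(tokenize(claim) & tokenize(candidate))


def idf_weights(corpus):
    """Weight each word by log(N / df): words found everywhere score zero."""
    n = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in tokenize(doc))
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def weighted_score(claim, candidate, weights):
    """Sum the weights of shared words, so rare words count for more."""
    shared = tokenize(claim) & tokenize(candidate)
    return sum(weights.get(word, 0.0) for word in shared)


weights = idf_weights(DB)
plain_best = max(DB, key=lambda doc: overlap_score(CLAIM, doc))
weighted_best = max(DB, key=lambda doc: weighted_score(CLAIM, doc, weights))
print("Unweighted best match:", plain_best)
print("Weighted best match:  ", weighted_best)
```

In this toy setup, plain word counting picks the Iran sentence because it shares the most filler words ("has," "not," "since"), while the weighted version picks the sea-ice sentence because "ice" and "satellite" are rare in the database. It does not fix the larger issue raised next: if nothing relevant is in the database, no weighting scheme will find it.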

Li wasn't surprised. The problem was that ClaimBuster's truth database didn't contain a report on this specific piece of fake news, or anything similar. Remember, it's made up of work from human fact-checkers at places including PolitiFact and the Washington Post. Because the system relies so heavily on information supplied by people, he said, the results were "just another point of evidence that human fact-checkers aren't enough."

That doesn't mean AI fact-checking is all bad. On the plus side, ClaimBuster is way faster than I can ever be. I spent six hours on my fact-check. By comparison, the AI took about 11 minutes.

Still, AI fact-checkers might be our best ally in thwarting fake news. There's a lot of digital foolery to track. One startup, Veracity.ai (backed by the Knight Prototype Fund and aimed at helping the ad industry identify fake news that might live next to online ads), recently identified 1,200 phony-news websites and some 400,000 individual fake posts, a number the company expects to grow. It's so fast and cheap to tell a lie, and it's so expensive and time-sucking for humans to correct it. And we could never rely on readers for click-through fact-checking. We'll still need journalists to employ the AI fact-checkers to scour the internet for deception, and to provide fodder for the truth databases.

I asked Li whether my one fact-checked story might have an impact, if it would even make its way into the ClaimBuster truth database. "A perfect automatic tool would capture your data and make it part of the repository," he said.

He added, "Of course, right now, there is no such tool."

"Can AI Solve the Internet's Fake News Problem? A Fact-Checker Investigates," Popular Science, posted March 20, 2018.

Freelance journalist Brooke Borel is the author of The Chicago Guide to Fact-Checking (University of Chicago Press, 2016) and host of Methods, a podcast that explores the secrets behind fact-checking.

(NASW members can read the rest of the Spring 2018 ScienceWriters by logging into the members area.)