Tapping into rich veins of information

Unmask plagiarism in PubMed by flagging similar texts. Assess disease risk by finding repeated DNA segments. These are just two applications for new analytical tools from the lab of University of Texas Southwestern Medical Center computational biologist Harold "Skip" Garner.

 

Kicking off the 2009 New Horizons in Science meeting with a 90-minute talk entitled "Mining hidden knowledge from Medline and DNA," Garner — who called himself a "physicist/engineer turned biologist/medicine person" — argued that his software has benefited and could continue to benefit authors of scientific papers, journal editors, investigative journalists, clinicians, biodefense organizations, and others.

The first project Garner described was a free Web site called eTBLAST. Users enter text into a search box — anything from a sentence to a full paper — and choose from an expanding array of databases, such as PubMed/Medline and NASA. The search engine then determines what the most important keywords are and returns a ranked list of similar abstracts, papers or citations.
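The article does not describe how eTBLAST scores similarity, so the following is only a minimal sketch of one common approach to the same idea: weight each word by how rare it is in the collection (TF-IDF), then rank documents by cosine similarity to the query text. The function names `tokenize` and `rank_by_similarity` are hypothetical and are not part of eTBLAST.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_by_similarity(query, documents):
    """Return (index, score) pairs for documents, most similar to query first.

    Illustrative TF-IDF plus cosine similarity; an assumption made for this
    sketch, not eTBLAST's actual scoring method.
    """
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for tokens in doc_tokens for word in set(tokens))

    def weigh(tokens):
        # Term count times a smoothed inverse document frequency, so words
        # that are rare in the collection act as the "important keywords".
        counts = Counter(tokens)
        return {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b[word] for word, weight in a.items() if word in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    query_vec = weigh(tokenize(query))
    scores = [
        (index, cosine(query_vec, weigh(tokens)))
        for index, tokens in enumerate(doc_tokens)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Given a draft paragraph as `query` and a list of abstracts as `documents`, the first pair in the returned list points to the abstract that shares the most heavily weighted vocabulary with the draft.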

Using eTBLAST, Garner said, journalists can input their article drafts and find papers to use as references. They can enter keywords or abstracts and look at the first and last authors of papers that come up to find out who the experts are. And they can see how many papers have been published on a topic from year to year.

Researchers, meanwhile, can search eTBLAST for certain experimental methods or equipment. They can find possible reviewers, collaborators and competitors. They can also get a sense of which journals might be most interested in publishing their work.

Although it was conceived as a tool to better search the scientific literature, eTBLAST has ended up confronting and even shaping the ethics of scientific publishing, Garner said.

He believes eTBLAST can help measure the extent of ethical breaches such as plagiarism and multiple publication of the same data. Publishers can then establish rules of conduct, Garner said. A few journals have already incorporated eTBLAST into their submission review and uncovered plagiarism cases, some of them high-profile, he said.

Garner has funneled all the "highly similar" citations from eTBLAST into a database called Deja Vu and found a rate of eight to ten duplicates per thousand articles, allowing for legitimately similar entries such as poster presentations that later became papers. Visitors can browse publication thumbnails and see overlapping text highlighted in yellow.
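How Deja Vu finds the overlapping text it highlights is not stated here. As a rough illustration of the idea, the sketch below uses Python's standard `difflib.SequenceMatcher` to pull out runs of consecutive words that two texts share. The function name `shared_passages` and the `min_words` threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=4):
    """Return runs of at least min_words consecutive words found in both texts.

    A simple stand-in for overlap highlighting; not the Deja Vu method.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False stops difflib from ignoring very frequent words
    # when the inputs are long.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        # The final block returned is always a zero-length sentinel.
        if block.size >= min_words:
            passages.append(" ".join(words_a[block.a:block.a + block.size]))
    return passages
```

A display layer could then mark each returned passage in both documents, much as the site highlights overlaps in yellow.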

He and his wife also developed analytic software that the National Institutes of Health uses to make sure grant applicants adhere to its rule allowing only two submissions of the same application.

The last part of the presentation focused on sorting through data provided by microarrays Garner's lab has developed. These arrays can scan a genome for repetitive DNA segments, such as the "CAG" repeats that indicate Huntington's disease. Until now, Garner said, microarrays have only been able to search for SNPs, positions in DNA where a single nucleotide "letter" varies between people.
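The arrays detect repeats physically, through probes that bind to DNA, but the same kind of pattern can be shown in software. The sketch below is a purely illustrative analog that scans a sequence string for back-to-back copies of a short unit such as "CAG". The function name `find_tandem_repeats` and the `min_copies` parameter are hypothetical and do not come from Garner's lab.

```python
import re


def find_tandem_repeats(sequence, unit="CAG", min_copies=3):
    """Return (start, copies) for each run of at least min_copies of unit.

    start is the 0-based position of the run in the sequence; copies is
    the number of consecutive repeats of unit found there.
    """
    # Match min_copies or more consecutive occurrences of the unit.
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    runs = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(unit)
        runs.append((match.start(), copies))
    return runs
```

Calling the function with a different `unit`, for example "GAGCAG", looks for a differently "spelled" repeat in the same way.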

His lab has made 5,000 probes, each tuned to pick up a differently "spelled" repeat. The arrays are sensitive enough to pick up on an Epstein-Barr infection in cell cultures because of a "GAGCAG" repeat, he explained.

Garner has experimented with these microarrays, using them to distinguish the genomes of different species and to tease out sequences that seem to build or confirm taxonomic relationships. But his main interest is in cancer.

While they can only detect repetitions, not random jumbles, the arrays have shown that certain carcinogens disrupt DNA in measurable ways, Garner reported. He has already filed patents for different repeats that seem to be associated with lung, breast and prostate cancers.

Stephanie Dutchen recently completed MIT's graduate program in science writing and now contracts with the National Institutes of Health.

Oct. 26, 2009