Huge DNA project offers ‘guidebook to human genome’

NEW YORK — In the largest single batch of discoveries about human DNA since the completion of the human genome project in 2003, 442 scientists in labs across three continents released 30 studies jam-packed with finds this week.

The discoveries, representing what the journal Nature calls the “guidebook to the human genome,” range from the esoteric — what is a gene? — to the practical — that just 20 gene switches may underlie 17 seemingly unrelated cancers, giving companies a workable number of drug targets.

The studies come from a $196-million project called the Encyclopedia of DNA Elements, or ENCODE, whose goal is to take the babel produced by the human genome project — the sequence of 3.2 billion chemical “bases” or “letters” that constitute the human genome — and make sense of it.

“We understood the meaning of only a small percentage of the genome’s letters,” said Dr. Eric Green, director of the National Human Genome Research Institute, which paid for the bulk of the study.

ENCODE was launched in 2003 to build a complete “parts list” for Homo sapiens by identifying and pinpointing the location of every stretch of the genome that does something — “a reference map of all the functional elements in the human genome,” said geneticist Joseph Ecker of the Salk Institute for Biological Studies in La Jolla, Calif.

The best-known elements in the genome are the 21,000 or so genes that specify what proteins a cell makes. The dopamine receptor gene makes dopamine receptors in brain cells, for instance, and the insulin gene makes insulin in the pancreas.

Only about one per cent of the genome codes for proteins, however, and the challenge has been to figure out the function of the other 99 per cent, which for years was termed “junk DNA” because it did not code for proteins.

In examining the overlooked part of the genome, the ENCODE scientists discovered that about 80 per cent of the DNA once dismissed as junk performs a biological function. Primarily, the not-so-junky DNA constitutes the most sophisticated control panel this side of NASA’s, with some four million bits of DNA controlling all the rest.

“The ‘junk’ DNA, the 99 per cent, is actually in charge of running the genes,” said Mark Gerstein of Yale University in New Haven, Conn., who led one of the ENCODE teams.

This regulation can influence both normal genes and aberrant ones, affecting the likelihood of disease.

The power of the gene-control elements may explain why simple personal DNA sequencing sometimes concludes that people are at risk for diseases they never get or misses the warning signs of those they do develop.

If the switches quiet an unhealthy gene, “it might reduce levels of proteins that have some nasty effect,” said Ecker. But if the switches disrupt a normal gene, a person can still develop a DNA-based illness.

There are nearly four million gene switches in the major human organs, with about 200,000 acting in any given kind of cell, such as heart muscle.

The gene-control system means drug companies may have to look in new locations for influential genes. In one paper, scientists at the University of Washington in Seattle found that most of the DNA variants previously linked to 400-plus diseases lie in regulatory regions often far from the “disease gene.”

As a result, genome analyses that look for glitches only in “diabetes genes” or “cancer genes” or any other “disease genes” are likely to miss those that cause disease by changing when, where and how genes are turned on.

ENCODE has also shown that a gene is not the simple stretch of DNA that makes a protein, as students are taught. Instead, the functional unit is an amalgam of sequences from both strands of the double helix, interleaved like two halves of a deck of cards in the hands of a Vegas dealer.

Nature is making all of the ENCODE research freely available online and through an iPad app.