18 September 2018

Twice as much

Biologists have recounted human genes

Ksenia Malysheva, Naked Science

A complete list of the genes that make up human DNA would be one of the most useful tools available to scientists, above all biologists and physicians. But although the Human Genome Project was completed 17 years ago, scientists still have no consensus even on the number of genes, let alone a single exhaustive list. A group of American biologists has made another attempt to count and catalog the genes, and the new result exceeds previous estimates by a factor of one and a half to two.

In 1990, when the Human Genome Project (HGP) was launched, human DNA was assumed to contain about 100 thousand genes (in the early nineties, a gene meant a section of DNA carrying information about the structure of a protein). In 2001, the results of the HGP and of a similar project by Craig Venter and his company Celera Corporation were published; the first article reported about 31 thousand protein-coding genes, the second 26,588. The HGP article that followed three years later already gave a figure of 24 thousand genes. The most up-to-date version (34d) of the database created by the Ensembl project contains information about 22,298 protein-coding genes and their 34,214 transcripts.

Only two groups update and edit the list of human genes: the Ensembl/Gencode project mentioned above, run by the European Bioinformatics Institute and the Sanger Institute, and the US National Center for Biotechnology Information, which maintains the RefSeq database. There are hundreds of discrepancies between these catalogs, both in the protein-coding genes and in the description of long non-coding RNAs, and the two also classify genes differently. In addition, both catalogs are constantly updated: over the past year alone, several hundred genes have been added to Gencode or removed from it.

The arrival of RNA sequencing technology in 2008 forced biologists to reconsider the definition of a gene: many experts now also count as a gene a nucleotide sequence encoding RNA that is not translated into protein but itself participates in metabolism. If such sequences are taken into account, the number of genes in the human genome may be significantly more than twenty or even thirty thousand.

In 2017, a group of researchers led by Steven Salzberg, a specialist in statistical methods in biology at Johns Hopkins University, began work on a new catalog of human genes. To do this, the scientists processed the results of almost 10 thousand RNA sequencing experiments on samples of 31 types of human tissue. The new database turned out to contain 43,162 genes, of which 21,306 encode proteins and 21,856 do not. The catalog includes almost five thousand new genes and 30 million new transcript variants, most of which, according to the authors, do not take part in any vital processes; transcription in the cell turned out to be very "noisy". A preprint of the article with these results is posted in the bioRxiv repository; at the end of August, Salzberg published an article in BMC Biology in which he discussed the work (Open questions: How many genes do we have?).

The Salzberg group does not consider its results final; the catalog has recently received its first update, and many more are expected. "I wouldn't be surprised if in ten years we still haven't come to a consensus on the number of genes in human DNA," Salzberg notes. Even so, the scientist believes that the new database will be useful, in particular for finding the genes responsible for hereditary diseases whose cause has not yet been established.

Portal "Eternal youth" http://vechnayamolodost.ru

