13 September 2017

You can't draw a portrait based on DNA

I'll know my darling by his genome

Sergey Dobrynin, Radio Liberty

A staple of detective fiction: DNA analysis from the crime scene instantly identifies the perpetrator. How far is that from reality? At the end of August, the journal Proceedings of the National Academy of Sciences published an article with a promising title: "Identification of individuals by trait prediction using whole-genome sequencing data." In effect, the paper claims that what forensic experts (and not only they) dream of has been achieved: researchers have learned to build a sketch of a person from a DNA sample and even to describe his voice. But are they passing off wishful thinking as reality?

For their experiment, the scientists collected detailed biometric data from 1061 residents of San Diego and sequenced their genomes. "Using the algorithm we developed, we were able to correctly identify [by DNA] an average of 8 people in a group of 10 randomly selected subjects if the ethnic composition of the group was mixed, and 5 out of 10 if the group consisted exclusively of African Americans or people of European descent," the authors state. Moreover, the method is said not only to determine with reasonable accuracy which set of biometric data a DNA sample belongs to, but also to build a portrait of the participant. To do this, the average face for a given ethnicity, age and gender is modified according to certain parameters: reference points at the tip of the nose, in the corners of the eyes and elsewhere are shifted according to the calculations of the DNA-based algorithm. The article includes illustrations, and the results, at least at first glance, look quite similar to the originals. So is the work a success? Can a drop of blood left at a crime scene be used to create an accurate sketch of the criminal (which is good), and can a DNA sample stored in a genomic bank be used to work out your identity, for example from photos on social networks (which violates privacy)? It seems it is too early to jump to conclusions.

Illustration from the paper by Craig Venter and his co-authors

The main author of the study is the infamous American geneticist Craig Venter. In the late 1990s, Venter founded a commercial company that, in competition with the international Human Genome Project, set out to sequence human DNA; there were even plans to patent the description of the genome. In the end, however, Venter had to join forces with other scientists. Since then, the name of the geneticist-adventurer has mainly been associated with attempts to construct synthetic life, so far limited to the simplest bacteria. But that is not Venter's only interest.

If you look at the list of authors of the article published in PNAS, you can see that they are all employees of one company, Human Longevity, which Venter founded in 2013. The startup's aim is to assemble the world's largest library of human genomes, at least a million of them. The goal is a noble one: science still understands the relationship between specific genes, or groups of genes, and diseases only in rare cases. Sometimes a mutation in a single gene is very likely to lead to a hereditary disease, and then, by correcting errors in the DNA of the parents' germ cells, for example with the CRISPR-Cas9 method, it is possible to ensure the birth of a healthy child (this technology has already been tested and probably works; its clinical use is still held back by ethical considerations). But in the vast majority of cases, we do not know which parts of the genetic code are associated with a disease. To establish such connections, researchers simply do not have enough data: human genomes together with information about their carriers.

Venter managed to attract $300 million in investment for this purpose. Human Longevity is believed to have the best DNA sequencing tools. And although it is still far from achieving its stated goals, the company has been offering a "super-analysis" service since 2015: for 25 thousand dollars, anyone can undergo what Venter calls a "medical examination on steroids." This includes not only a detailed medical examination, but also complete sequencing of the client's own genome and of the genome of their microbiota, the intestinal microflora.

But what is the point of paying a lot of money for a decoded genome that can say little about your health? Human Longevity needs to demonstrate success in interpreting DNA to clients and investors, and the published paper is an obvious step in that direction. However, not everyone agrees that Venter has really managed to take this step.

On September 6, geneticist Yaniv Erlich, scientific director of the genealogy project MyHeritage.com, published a note sharply criticizing the work of Craig Venter and his colleagues:

"In this article I describe the significant mistakes made in the work of [Venter and others]. In short: the method proposed by the authors, which in fact differs little from a primitive standard procedure, does not make sufficient use of the power of genetic markers, relies on technically erroneous metrics and, finally, does not actually allow anyone to be identified."

Erlich and some other scientists noticed that, to reconstruct the appearance of the DNA carrier, Venter and his colleagues in effect use only those parts of the genome that indicate ancestry and sex. The algorithm then blends the averaged faces of the ethnic groups found among the subject's ancestors. In other words, almost no individual characteristics beyond basic ancestry are taken into account. That is why the system created by Human Longevity can correctly identify a person in an ethnically mixed group of 20 people with a probability of 70 percent, but in a set of 20 white men the probability of error is already 89 percent, not so far from the 95 percent error rate of a completely random choice.
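The comparison with random choice can be checked with a little arithmetic. The sketch below is a toy model, not the method of the paper or of its critics: it assumes a hypothetical matcher that knows only a coarse label (say, ancestry group plus sex) and guesses at random among the pool members who share that label. The function names and the assumption of equally common groups are illustrative only.

```python
from math import comb


def chance_accuracy(pool_size):
    """Probability of picking the right person by pure guessing."""
    return 1.0 / pool_size


def label_only_accuracy(pool_size, n_groups):
    """Expected accuracy of a hypothetical matcher that knows only the
    target's group label and guesses uniformly among pool members with
    that label. Assumes each other pool member falls into the target's
    group independently with probability 1 / n_groups."""
    p = 1.0 / n_groups
    others = pool_size - 1
    total = 0.0
    for k in range(others + 1):
        # Probability that exactly k of the other members share the label.
        prob_k = comb(others, k) * p**k * (1 - p) ** (others - k)
        # The matcher then guesses among k + 1 candidates.
        total += prob_k / (k + 1)
    return total


if __name__ == "__main__":
    print("random guess, pool of 20:", chance_accuracy(20))
    for groups in (1, 2, 5, 10):
        print(groups, "groups:", round(label_only_accuracy(20, groups), 3))
```

With a pool of 20, pure guessing is right 5 percent of the time, which is the 95 percent error rate quoted above. In this toy model a label-only matcher does no better than that in a fully homogeneous pool and improves only as the pool becomes more mixed.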

Experts point out that the pool of participants in the experiment was too small for a study of their genomes to yield new information about which genes actually determine specific features of appearance. "Predicting the type of face is really nothing more than building an average face for your ethnic group. Of course, when you see it, you will always say: wow, that looks a bit like me!" bioinformatician Jason Piper said in a comment to MIT Technology Review.

Piper is listed among the co-authors of the article published in PNAS, but in the time it took to get published he left Human Longevity (he now works at Apple). Moreover, he became one of the fiercest critics of Venter's work on Twitter, and was even blocked by his former boss. Besides Piper, another co-author of the article has left the company: the well-known machine learning specialist Franz Och, whom Venter had earlier lured away from Google. Publication of the article was also postponed several times: according to MIT Technology Review, Craig Venter tried to place the work in Science, one of the two most respected scientific journals in the world, but was turned down, and only then published it in the less prestigious PNAS.

Predicting appearance from DNA is only a matter of time. Almost all human traits and physiological features are determined to a greater or lesser extent by genes (eye color, for example, by 98 percent, and body mass index by anywhere from 50 to 93 percent), and sooner or later computer analysis of large collections of DNA and data on their carriers will establish the relationship between genes and facial structure, height, voice and other features. Perhaps this will be done with the help of libraries compiled by commercial companies such as Craig Venter's Human Longevity. But for now, this undoubtedly talented adventurer seems to have been too hasty once again.

Look at the funny collage made by Yaniv Erlich. On the left is Venter's own face, in the center is a prediction made by an algorithm based on his DNA, and on the right is actor Bradley Cooper. Who does the sketch look more like?

Portal "Eternal youth" http://vechnayamolodost.ru  13.09.2017

