12 October 2018

Under geneticists' surveillance

Almost any US resident can be identified through genomic databases

RIA Novosti

Israeli geneticists conducted a curious "investigative experiment" showing that the identity of an arbitrary US citizen can be established from a single DNA sample in 60% of cases using private genomic databases. Their findings were published in the journal Science (Erlich et al., Identity inference of genomic data using long-range familial searches).

"In the near future, genomic databases will work as a "GPS system" for finding the anonymous owner of a particular DNA sample. Family trees will play the role of coordinates, making it possible to find people through their relatives even if they have never taken such a test themselves," says Yaniv Erlich of Columbia University in New York (in the press release Crime and privacy: Using consumer genomics to identify anonymous individuals – VM).

Genomic portrait

Advances in genomic technologies and the falling cost of DNA sequencing have made genetic analysis one of the main tools of forensic scientists, historians and many other specialists not directly connected with biology. Today, genomes are used to find criminals and missing people and to uncover the secrets of the origin of peoples.

Moreover, last year Craig Venter, a well-known bio-entrepreneur and geneticist, said that his team had found DNA regions that control the shape of the face and other anatomical features. Analyzing them, according to the geneticist, makes it possible to produce an accurate sketch of a person in 75% of cases. Venter's claims provoked a storm of criticism from other biologists, including Erlich.

As Erlich noted at the time, the whole point of this "discovery" was that a person's age, sex and ethnicity can be estimated from their DNA, and that this data can be used to narrow down the circle of potential "suspects". This works in small groups of people, but will not work at the level of countries and large cities.

These arguments and disputes with Venter prompted Erlich to think about creating a technique that could actually establish the identity of a random person on the street, or help the police search for criminals nationwide, using only a single sample of their DNA.

Today, as Erlich notes, companies such as 23andMe, Family Tree, Ancestry and their competitors are growing especially rapidly. They work out kinship between their customers and determine their predisposition to various diseases from DNA samples.

The services of such startups are now used by millions of people in the United States and other developed countries, which has allowed them to accumulate some of the largest genetic databases in the world. Scientists now use their data to search for genes associated with rare hereditary diseases, as well as for many other purposes.

New opportunities and threats

Erlich and his colleagues used one of these databases, collected by MyHeritage, to check whether it could also be used for "forensic" purposes – to search for unknown individuals about whose appearance nothing is known.

In total, over 1.2 million people have used the services of this startup, many of whom are related to one another. Using random DNA samples from people who had not been tested by the company, Erlich and his colleagues checked whether it was possible to find their relatives and identify them by matching genome segments.

As it turned out, this can be done for about 60% of Americans of European descent, and in many cases scientists were able to identify not only second cousins and other distant relatives, but also direct relatives.

Moreover, the calculations of Erlich and his team show that a database covering the genomes of only 2% of the inhabitants of a particular country or city is enough to determine the identity of virtually all of its inhabitants, using the same information about sex, age, eye color and other features that Venter and his associates relied on.

For example, for all 30 anonymous individuals whose DNA the scientists analyzed, the initial list of "candidates" included about 800-900 people. When the geneticists took into account their age, sex and the place where the sample was obtained, they managed to reduce that number to 1-2 individuals.
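
The narrowing step described above can be illustrated with a toy sketch in Python. Everything in it is hypothetical: the record layout, the function name narrow_candidates and the five invented entries are assumptions made for illustration and are not taken from the study, whose actual procedure is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One person on the list of possible matches (hypothetical record layout)."""
    label: str
    sex: str
    birth_year: int
    region: str


def narrow_candidates(candidates, sex, birth_years, region):
    """Keep only candidates consistent with the known sex, birth-year range and region."""
    earliest, latest = birth_years
    return [
        c for c in candidates
        if c.sex == sex
        and earliest <= c.birth_year <= latest
        and c.region == region
    ]


# Invented toy data; a real candidate list would hold hundreds of entries.
candidates = [
    Candidate("A", "F", 1948, "region_1"),
    Candidate("B", "M", 1951, "region_1"),
    Candidate("C", "F", 1983, "region_1"),
    Candidate("D", "F", 1950, "region_2"),
    Candidate("E", "F", 1952, "region_1"),
]

# Demographic clues about the unknown sample (also invented).
shortlist = narrow_candidates(
    candidates, sex="F", birth_years=(1945, 1955), region="region_1"
)
print([c.label for c in shortlist])
```

Each clue removes the candidates it contradicts, so a few independent clues can shrink a long list quickly. A realistic filter would also have to tolerate uncertain ages and locations rather than exact matches.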

The success of this "investigative experiment", as Erlich notes, points to two things. First, law enforcement officers can now confidently use genomic databases to search for criminals and for relatives of their victims. Second, genomic startups should pay much more attention to protecting their customers' personal data than they do today, and use cryptography to protect them from deanonymization.

Portal "Eternal youth" http://vechnayamolodost.ru
