16 October 2018

The genetic trap

How to catch a law-abiding white serial killer

Alexey Aleksenko, Forbes, 16.10.2018

Customer data from companies selling genetic tests can be used to identify almost any white American. This helps catch criminals, but it raises serious ethical questions.

Joseph James DeAngelo was a good cop, but he had two problems that together cost him a quiet old age. First, he failed to foresee the consequences of technological progress. Second, he was a serial rapist and murderer.

In the 1970s and 1980s, the "Golden State Killer", also known as the "East Area Rapist", killed 12 people and left his DNA at the crime scenes. The policeman had nothing to fear, since his DNA was not, and could not have been, in any forensic database. Thirty years later, however, detectives reopened the cold case with a new approach: they searched for matches in the GEDmatch database. This genealogy service, which collects genotype files from commercial DNA tests, helps users find relatives and build family trees. The database contained DNA from distant relatives of the killer (second cousins), which allowed detectives to work out his identity and arrest him in April 2018. Since then, about a dozen more crimes have been solved in the United States in a similar way.

Individual freedom is under threat

Neutralizing dangerous serial killers is an excellent result, but something in this story alarmed the public. If an elderly retired policeman with an impeccable record and no criminal past could be identified so easily from the DNA of his distant relatives, does this not mean that all of America is under the watch of commercial genomics companies? And, along with them, under the watch of the police, the intelligence services and anyone else who decides to use publicly available data for their own purposes?

Commercial genomics is a rapidly growing consumer services industry. It offers clients the chance to use a DNA sample (a drop of saliva) to trace their ancestry, find distant relatives, assess their risk of various diseases, and choose a diet and a sport. Can this innocent activity really be such a formidable force? What is it capable of?

This question was addressed by the authors of two papers published last week in Cell (Kim et al., Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci) and Science (Erlich et al., Identity inference of genomic data using long-range familial searches). They reached the following conclusion: the data accumulated so far by commercial genomics can already be used to identify almost any American of European descent, regardless of whether their DNA is in criminal databases and whether they have ever used genetic services themselves. The authors warn that this situation poses a serious threat to privacy.

Catching an anonymous donor

Yaniv Erlich of Columbia University in New York and his colleagues set out to find how far searches for distant relatives can reach. First, they found that for 60% of the customers of the popular relative-search databases MyHeritage and GEDmatch, the same database held data from a relative no more distant than a second cousin (that is, someone sharing a great-grandfather or great-grandmother). The real possibilities, however, are much broader. To demonstrate this, the researchers set out to identify an anonymous woman from Utah who had voluntarily provided her DNA to the 1000 Genomes Project.
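To put "second cousin" in perspective, the textbook coefficient of relationship (Wright's r) gives the expected share of DNA two relatives inherit from their common ancestors: one half per generation on each side, summed over the shared ancestors. The short sketch below is an illustration of that standard formula, not code from either paper; it assumes full cousins, i.e. descent from an ancestral couple.

```python
from fractions import Fraction

def coefficient_of_relationship(gens_a: int, gens_b: int, shared_ancestors: int = 2) -> Fraction:
    """Expected fraction of DNA two relatives share by descent (Wright's r).

    gens_a, gens_b: generations from each person up to the common ancestors.
    shared_ancestors: 2 for full cousins (an ancestral couple), 1 for half cousins.
    """
    return shared_ancestors * Fraction(1, 2) ** (gens_a + gens_b)

# n-th cousins descend from a couple n + 1 generations back on both sides
for n, label in [(1, "first cousins"), (2, "second cousins"), (3, "third cousins")]:
    r = coefficient_of_relationship(n + 1, n + 1)
    print(f"{label}: r = {r} ({float(r):.3%})")
```

The expected share falls fourfold with each step outward: 1/8 for first cousins, 1/32 for second cousins, 1/128 for third cousins.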

The commercial databases held plenty of samples that could belong to distant relatives of this unknown woman. Two of them, belonging to people from Wyoming and North Dakota, were linked to public genealogical records. It took the scientists less than a day to find the anonymous DNA donor from Utah. They kept her name secret, but informed the 1000 Genomes Project that the identity of one of its volunteers had been revealed through straightforward analysis.

The vast majority of samples in commercial genomic databases belong to white Americans of European descent. The authors conclude that their approach can uniquely identify 60% of such Americans, even though commercial databases contain data from only 0.5% of the population. If the customer base of consumer genomics doubles, the figure will rise to 90%, meaning that almost every white American will be in the geneticists' sights.
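Why a database covering only 0.5% of the population can reach so many people becomes clearer with a deliberately crude model: a person has many distant relatives, and only one of them needs to be in the database. The sketch below assumes each relative is included independently with the same probability; the number of relatives is a hypothetical round figure, and the model is not the one used by Erlich et al., so its outputs should not be read as the paper's numbers.

```python
def match_probability(db_fraction: float, n_relatives: int) -> float:
    """Chance that at least one of n_relatives is in a database that covers
    db_fraction of the population, if each is included independently (toy model)."""
    return 1.0 - (1.0 - db_fraction) ** n_relatives

# Hypothetical count of relatives up to roughly third cousins; not from the papers
N_RELATIVES = 200

for fraction in (0.005, 0.01, 0.02):
    p = match_probability(fraction, N_RELATIVES)
    print(f"database covers {fraction:.1%} of population: P(at least one relative) = {p:.0%}")
```

Because the chance of missing every relative shrinks geometrically, even a modest growth in database size pushes coverage up quickly, which is the qualitative effect the authors describe.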

White and black, substitutions and repetitions

People differ genetically in many ways, but two main kinds of variation are used to build databases. The first is SNPs, or "single nucleotide polymorphisms", which are simply single-letter substitutions. There are millions of positions in the human genome where different individuals may carry different "letters" (nucleotides). Taken together, the data on these positions form a unique genetic portrait of a person. The portrait is obtained by genotyping with a DNA chip, which typically reads several hundred thousand of these positions, and this is the kind of data collected in the databases of commercial companies.
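Why such a genotype works as a unique portrait can be shown with a standard back-of-the-envelope forensic calculation: the chance that two unrelated people share the same genotype at one SNP under Hardy-Weinberg proportions, multiplied across positions. The sketch assumes independent positions and uses a hypothetical panel of 60 SNPs with allele frequency 0.5; real chip data contain many linked SNPs, so the assumption of independence is a simplification.

```python
def genotype_match_probability(allele_freq: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP,
    assuming Hardy-Weinberg proportions: AA = p^2, Aa = 2pq, aa = q^2."""
    p, q = allele_freq, 1.0 - allele_freq
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)

def panel_match_probability(allele_freqs) -> float:
    """Chance that two unrelated people match at every SNP, assuming independent loci."""
    prob = 1.0
    for freq in allele_freqs:
        prob *= genotype_match_probability(freq)
    return prob

# Hypothetical panel: 60 unlinked SNPs, each with allele frequency 0.5
print(f"{panel_match_probability([0.5] * 60):.1e}")
```

Even this small panel makes an accidental match between strangers vanishingly unlikely, and a real chip reads thousands of times more positions.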

Criminal DNA databases, on the other hand, record a different type of variation. Some regions of the human genome consist of short fragments of "text" repeated back to back, and the length of these regions varies: one person may have a motif repeated 10 times, another 25 times. These "short tandem repeats", or STRs, are the basis of forensic DNA analysis, familiar to many from the O. J. Simpson case or from TV crime dramas.
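The idea of an STR allele as "how many times the motif repeats" can be illustrated with a small sequence scan. This is only a sketch of the concept: forensic labs measure repeat lengths from PCR fragment sizes rather than by searching text, and the motif and sequences below are hypothetical examples matching the 10 and 25 repeats mentioned above.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        # Extend the run while the next chunk is another copy of the motif
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best

# Hypothetical alleles at one locus: flanking DNA around 10 vs 25 copies of "GATA"
person_a = "TTCA" + "GATA" * 10 + "CCTG"
person_b = "TTCA" + "GATA" * 25 + "CCTG"
print(count_tandem_repeats(person_a, "GATA"), count_tandem_repeats(person_b, "GATA"))
```

A forensic profile is essentially a list of such repeat counts, two per locus (one from each parent), across a standard set of loci.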

The STR method has the advantage that, unlike commercial genotyping, it works even with badly degraded DNA samples, which are usually all that forensic scientists have. Fortunately for the investigators, the DNA in the Golden State Killer case was very well preserved, which made it possible to compare police data with commercial data. The second difference between the two types of databases is rather delicate: while the vast majority of commercial genomics customers are white, in police databases the situation is the reverse. The white policeman's relatives were, naturally, found in a "white" database.

In general, however, these differences stand in the way of using genomic data to catch serial killers regardless of skin color. The authors of the other paper published last week show how this obstacle can be overcome. Noah Rosenberg of Stanford University in California and his colleagues developed computational methods for matching STR data from law enforcement databases with genotyping data produced by commercial companies. On their own, STR data can reveal only the closest relatives, but Rosenberg's approach has already extended this reach to second cousins. The method exploits the fact that DNA is inherited in long stretches, so it is possible to identify the SNPs that are passed down together with a given combination of repeats. The result is a bridge between FBI databases and those of commercial companies, which would make it possible to trace almost every DNA sample ever collected at a crime scene to a specific person.
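The "bridge" rests on the observation that an STR allele and nearby SNPs travel together on the same stretch of chromosome. The drastically simplified sketch below is not the published method, which works with diploid genotypes, many loci and statistical imputation; it only shows the core idea. It assumes a hypothetical reference panel in which short SNP haplotypes (labeled "ACG", "TTA") have been observed alongside known repeat counts, and scores how well a person's SNP data predict the STR allele recorded in a forensic profile.

```python
from collections import Counter, defaultdict

# Hypothetical reference panel: (SNP haplotype next to the STR, STR repeat count)
# pairs observed on the same chromosome.
REFERENCE_PANEL = [
    ("ACG", 10), ("ACG", 10), ("ACG", 11),
    ("TTA", 25), ("TTA", 24),
]

def build_table(panel):
    """Count how often each STR allele co-occurs with each SNP haplotype."""
    table = defaultdict(Counter)
    for snp_haplotype, repeats in panel:
        table[snp_haplotype][repeats] += 1
    return table

def predicted_str(table, snp_haplotype):
    """Estimated probability of each STR allele given the nearby SNP haplotype."""
    counts = table.get(snp_haplotype)
    if not counts:
        return {}
    total = sum(counts.values())
    return {repeats: n / total for repeats, n in counts.items()}

def match_score(table, snp_haplotype, observed_repeats):
    """How strongly SNP data predict the STR allele seen in a forensic profile;
    0.0 if the haplotype never appears in the panel."""
    return predicted_str(table, snp_haplotype).get(observed_repeats, 0.0)

table = build_table(REFERENCE_PANEL)
print(predicted_str(table, "ACG"))
print(match_score(table, "TTA", 25))
```

In a real comparison, such scores would be combined across many loci into a likelihood that a genealogy-database record and a forensic profile belong to the same person or to close relatives.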

Ethics issues

The authors of both papers are concerned that, without proper public oversight, the misuse of open genomic data may jeopardize individual rights. According to Fyodor Konovalov, head of the Laboratory of Clinical Bioinformatics (a Russian medical genomics company), this turn of events may bring genetic information under the law on personal data. Under that law, personal data is any information that makes it possible to identify a person. Until now, a genotype could not be used to identify an individual, but now that it can, a legal problem may arise.

Colleen Fitzpatrick, director of the California-based DNA Doe Project, believes these concerns are exaggerated: genomic data are not fundamentally different from any other information that law enforcement officers use legally. Almost everything we do in life, she argues, carries some information about other people, and there is no reason to treat genetic tests more scrupulously than, say, social media posts.

The story of the anonymous Utah volunteer clearly shows that data from volunteers in academic genomic projects need stronger protection: by agreeing to take part in a study, they did not expect to put their relatives' anonymity at risk. It is not yet clear how this story will affect the customer base of genetic companies. Admittedly, a law-abiding American would hardly imagine that a sample of his DNA could be used many years later to arrest a great-grandson who decided to rob a bank; had it not been for the great-grandfather's idle curiosity, the great-grandson would have stayed free longer. Although it is far from obvious which outcome the great-grandson would prefer, being caught by the police as soon as possible or having time to spend some of the loot, it is clear that genotyping can have far-reaching consequences. "Genetic information is a one-way street; it cannot be taken back," says Fyodor Konovalov. "Your blood test results may change within a week, but your genes will stay with you for life and will pass on to your descendants."

Two papers published almost simultaneously in two of the world's leading scientific journals have stirred public opinion enough to draw attention to the problem. Access to genomic data will clearly be regulated at the legislative level in one way or another. With the number of genetic tests performed in the United States doubling in less than a year, this field will increasingly be at the center of public debate.

In Russia, this problem does not exist yet: genetic tests there number in the tens of thousands, not in the millions as in the USA. The tests performed so far are not enough even for an ordinary search for relatives, let alone for forensic use. If the current exponential growth of the consumer genomics sector (that is, annual doubling) continues, Russia will reach the situation now facing customers of American genetic companies only in about ten years. By then, there will probably be a generally accepted international practice for accessing genomic data, on which national legislation can be based.
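The ten-year estimate follows from simple doubling arithmetic: ten annual doublings multiply a number by 2^10 = 1024, roughly a thousandfold, which is the gap between tens of thousands and tens of millions. The sketch below computes this under the article's assumption of steady annual doubling; the starting and target figures are illustrative round numbers, not official statistics.

```python
import math

def years_to_reach(current: float, target: float, doubling_time_years: float = 1.0) -> float:
    """Years of steady exponential growth needed to go from current to target."""
    return math.log2(target / current) * doubling_time_years

# Illustrative round numbers: ~30 thousand tests today, ~30 million as a target
print(f"{years_to_reach(30_000, 30_000_000):.1f} years")
```

Because the result depends only on the logarithm of the ratio, even a large error in the starting figure shifts the estimate by just a year or two.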

Portal "Eternal youth" http://vechnayamolodost.ru

