26 April 2019

DNA Leak

Is there any need to fear for the confidentiality of genetic data?

Daria Varlamova, N+1

The sale of personal data has long been a subject of lively debate. With the development of biotechnology, the confidentiality of genetic data has also come into question. Strictly speaking, we scatter genetic information about ourselves every day, leaving skin particles, hair, saliva and other traces everywhere. For a long time this was of little use even to detectives, let alone commercial companies. With the advent of truly gigantic genetic databases, the situation began to change. We looked into who could potentially obtain information about your genome and whether it is worth worrying about.

For entertainment and science

Organizations engaged in genome research can be divided into four types for simplicity – each with its own goals, business models and data storage organization.

One type comprises companies specializing in what might be called "entertainment genomics", such as 23andMe, Ancestry or the Russian company Atlas. Their clients are a broad range of people who want to learn something interesting about themselves, and prices are usually low.

But a careful reading of the user agreements of such services makes it clear that they earn money not only from sequencing for the mass consumer.

For example, 23andMe invites users to take part in a research program. This means the company can pass your anonymized data to third parties for research. The personal information you provide at registration is separated from the genetic data itself and stored separately (the two groups of data are linked only by a randomly assigned ID).
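The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not 23andMe's actual implementation: the field names, the two in-memory "stores" and the `split_record` helper are all hypothetical.

```python
import secrets

def split_record(record, personal_fields=("name", "email")):
    """Split one customer record into personal and genetic parts.

    The two parts share only a random ID that is not derived from
    anything in the record itself (hypothetical helper).
    """
    link_id = secrets.token_hex(16)  # 32 hex characters, random
    personal = {k: v for k, v in record.items() if k in personal_fields}
    genetic = {k: v for k, v in record.items() if k not in personal_fields}
    return link_id, personal, genetic

# In a real system these would be two separate databases.
personal_store = {}
genetic_store = {}

# Invented example record; "marker_1" is a placeholder marker name.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "genotypes": {"marker_1": "AG"},
}

link_id, personal, genetic = split_record(record)
personal_store[link_id] = personal
genetic_store[link_id] = genetic
```

Anyone who sees only `genetic_store` gets genotypes and a random ID; the name and e-mail can be recovered only by someone who also has access to `personal_store`.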

The Russian company Atlas honestly admits that it collects data "for the purpose of conducting scientific research of depersonalized genetic information and organizing consumer participation in research on a commercial basis." At the same time, the service reserves the right to "dispose of the information received at its discretion, provided that such information does not identify a person and all of the above is not done for the purpose of identifying a person."

The last remark is an important point: if the information is not used to identify a citizen, it is not considered biometric, according to the law on personal data.

However, contracts concluded with users of companies like 23andMe often do not clearly state who owns the rights to the data obtained by sequencing. If the data does not belong to the user, the company can obtain his permission once to share information under certain conditions and then, for example, add someone new to the list of those who get access to the data without informing the client. But external researchers are usually not interested in your identity: they are willing to pay for big data.

In some cases (for example, with rare genetic mutations), specific clients may be invited through the service to take part in a study in which they will be asked for some personal information, but this is done only with the user's consent.

The second type consists of laboratories whose staff do fundamental science and conduct large population studies, trying to find relationships between individual genes or groups of genes and particular traits in their carriers. Such laboratories may be clients of the companies mentioned above.

The third type is genetic forensics laboratories. We'll talk about them later.

For medical purposes

And finally, there are medical laboratories, to which people turn to find the cause of severe hereditary conditions such as muscular dystrophy, epilepsy or malformations in children. Here we are talking about monogenic diseases, since the likelihood of polygenic disorders is influenced by too many factors to allow effective predictions.

Medical laboratories are not particularly interested in big data: they need only the frequency of a particular gene variant in the population (to filter out non-pathogenic variants), not its correlation with other traits. Therefore, they use very limited external databases, and they can publish information about a particular patient's genome only with his consent, anonymously and in very specific situations (a complex clinical case).

In addition, medical laboratories earn money by solving a particular patient's problem, not by collecting data; the data is then usually stored with the patient's consent but is not passed on to anyone. And all the rules of medical ethics apply to the actions of medical geneticists.

"I regularly communicate with genetic laboratories, and in my entire professional career I have not heard of a single case of a client suffering from a leak of genetic information," says Fyodor Konovalov, founder of a private clinical bioinformatics laboratory. "And in my opinion, it is better not to feed unfounded paranoia. In the current atmosphere of prohibitions, this may encourage officials to impose unreasonable restrictions that make life difficult for scientists."

"Genetic data is usually stored separately from information about the individual, and linking the two again requires resources and knowledge that an ordinary citizen does not have," explains Ekaterina Pomerantseva, head of the Genetico laboratory complex. "Plus, by law (and we comply with it), both are stored not in the cloud but on physically isolated servers, guarded with all the paranoia the matter deserves. So customers don't really risk anything."

At the same time, there are reasonable arguments in favor of sequencing your genome for medical purposes. Thanks to this, a person learns useful and sometimes vital information about himself.

"The main problem is carrier status for hereditary diseases," emphasizes Fyodor Konovalov. "About half of the births of children with monogenic diseases could have been prevented if the parents' carrier status had been known."

The most likely trouble for a person who decides to order a DNA test, according to Pomerantseva, is coming across unpleasant information about the risk of a rare disease.

"In general, the laboratory looks at what it is asked to look at and does not look beyond that. But sometimes something does turn up, and there is a special list of incidental findings that it is recommended to share with the client. As a rule, these are genetic markers that require a person to take some preventive measures to preserve health. And even to get information from this list, the user must give consent in advance. By the way, taking this opportunity, I want to send my warm greetings to clinics that forget to send informed consent, and to point out that the lack of consent can delay the release of the result."

Altruism and profit

Altruistic citizens have an additional motivation. Studies of genetic data can advance preventive medicine (if scientists manage to identify genetic markers that signal a correlation between a certain gene sequence and a certain disease), including the early detection of cancerous tumors.

In addition, such studies help to select more personalized (and, accordingly, better working) drugs based on the genetic characteristics of patients.

"Databases are very good," says Ekaterina Pomerantseva. "Moreover, their benefits are not always known in advance. Sometimes new uses are discovered in hindsight. For example, it becomes possible to investigate diseases that might not have been in diagnostic reference books at all when the data was collected. And linking databases to a description of a person (with his consent) can be useful for researchers when it comes to a client's medical history, so that parallels can be drawn between genetic markers and real health problems."

At the same time, just one patient can change the future of medicine. So, for example, it happened to the American Henrietta Lacks, who in 1951 went to Johns Hopkins Hospital, was diagnosed with cervical cancer and died eight months later at the age of 31, despite treatment.

While she was in the hospital, the attending physician discovered unique properties of her tumor cells: they multiplied twice as fast as normal ones, and they had no limit on the number of divisions. The cells were called HeLa, and they became a rare find for biologists and physicians, because it was very convenient to perform experiments on them due to their survivability and unlimited ability to reproduce.

HeLa cells have greatly advanced molecular biology, have been used in a huge amount of research and have even flown into space. But for reasons of confidentiality, neither Henrietta nor her relatives were informed by the doctors that her cells were going to be used for the benefit of science. None of her relatives received any financial remuneration, and Henrietta's name was practically unknown even to specialists until science journalist Rebecca Skloot published a book about her in 2010.

Perhaps it would be fair for companies to share with users part of the profit from studying their genetic information. However, it is not that simple: first, this practice would teach people to expect rewards rather than act out of altruism, and second, it is not clear how to decide who deserves a reward and who does not.

But if researchers and medical companies are so eager to get our genome that they are willing to pay for it, then perhaps it's worth getting rid of intermediaries and trading your genetic information directly? If now a person can sell his hair or sperm, then in the near future information about his DNA will also become a commodity.

There are already two startups ready to provide everyone with such an opportunity. The first is Nebula Genomics, founded by Harvard geneticists. The creators of the project promise to maintain the anonymity of sellers and at the same time disclose information about buyers so that sellers can understand who they are dealing with. All transactions will be registered via the blockchain.

The second is Shivom, where you can upload and encrypt ready-made genetic information in VCF format, and then decide who to share it with.
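For readers unfamiliar with it, VCF (Variant Call Format) is a tab-separated text format in which each data line describes one genetic variant. The Python sketch below reads a single such line. It is a minimal illustration, not Shivom's code: the `parse_vcf_line` helper and the example line are invented, and real files also contain header lines beginning with `#`, which this sketch does not handle.

```python
def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dictionary.

    The first eight columns are the fixed VCF fields; if genotype data
    is present, a FORMAT column and per-sample columns follow.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position on the chromosome
        "id": var_id,           # "." means no identifier
        "ref": ref,             # reference allele
        "alt": alt.split(","),  # one or more alternative alleles
        "qual": qual,
        "filter": flt,
        "info": info,
        "samples": fields[9:],  # genotype columns after FORMAT
    }

# Invented example line: an A-to-G variant, heterozygous in one sample.
example = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1"
variant = parse_vcf_line(example)
```

Here `variant["samples"]` holds the genotype string `0/1`, meaning one copy of the reference allele and one copy of the alternative allele.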

Detectives and thieves

With the advent of big data, genetic forensics has also begun to make great strides. How can a person be identified by DNA at all? 99.9 percent of our DNA is identical to the DNA of other people, but the remaining 0.1 percent is what matters to forensic geneticists. As a rule, these are nucleotide sequences known as short tandem repeats, or STRs. They can be used as genetic markers characteristic of close relatives.

There are other markers called SNPs (single nucleotide polymorphisms), which are commonly used in biomedical research. SNPs change much more slowly than STRs, so they are increasingly being used in criminology.
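What makes a short tandem repeat a marker is the number of times a short motif is repeated back to back, which differs from person to person. A minimal Python sketch of counting such repeats follows; the sequences and the `longest_repeat_run` helper are invented for illustration, and real forensic typing works on laboratory measurements rather than plain strings.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two invented samples that differ only in how often "GATA" is repeated.
sample_a = "TTAC" + "GATA" * 7 + "CCGT"
sample_b = "TTAC" + "GATA" * 11 + "CCGT"

count_a = longest_repeat_run(sample_a, "GATA")
count_b = longest_repeat_run(sample_b, "GATA")
```

A profile built from the repeat counts at many such locations is what gets compared between samples.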

The next step is DNA phenotyping: based on genetic information, one can make assumptions about hereditary features of appearance (for example, height or skin and eye color) - inaccurate, but still practically applicable.

In 2017, American geneticist Craig Venter, together with employees of his own company Human Longevity, published an article with the results of testing an algorithm that, based on genomic data, predicted a person's height, skin and eye color and other appearance parameters with high probability. The success of the algorithm, according to Venter, meant that people's genetic data should not be publicly available.

However, a specialist from Columbia University, Yaniv Ehrlich, questioned the algorithm's ability to predict features beyond what can already be assumed from an individual's gender and ethnicity. In addition, Ehrlich argued, even if the algorithm offers some specific advantages, it would need a whole database of various biometric data to work.

At the same time, the same Ehrlich conducted several high-profile studies that revealed vulnerabilities of public (that is, accessible at least to third-party researchers) genetic databases.

The databases of just the 20 largest organizations in the world engaged in genome research contain about 100 petabytes of information. For comparison, Twitter's servers grow by 0.5 petabytes of information per year.

Using information about the genomes of more than a million users of sequencing services, the scientists estimated that about 60 percent of searches for individuals of European origin will turn up at least a second cousin, which in theory makes it possible to identify a person with the help of demographic databases.

For example, you can use the knowledge that, due to cultural traditions, the surname is often passed down the paternal line, so there is a correlation between surnames and Y-haplotypes (a set of genetic markers that allows you to find close male relatives).

If an attacker has an unknown DNA sample to sequence, or an already sequenced genome, he can look for matches in public genealogical databases, which will let him infer the person's surname with high probability. The range of candidates can then be narrowed using social networks and other open information. To establish the identities of more than 50 people from the 1000 Genomes project, Ehrlich and his colleagues needed to infer only 5 surnames.

Moreover, for you to be found by DNA, your own genome does not have to be sequenced: genetic data from even your distant relatives is enough. This is like herd immunity in reverse: if you give your genomic data to someone, you potentially put not only yourself but all your relatives at risk.
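The idea of surname inference can be shown with a toy Python example. It is a sketch under loud assumptions, not the method Ehrlich's group used: the database entries and repeat counts are invented, the surnames are placeholders, and "closeness" is simply the number of markers at which two profiles differ.

```python
from collections import Counter

def marker_distance(profile_a, profile_b):
    """Count the markers at which two Y-chromosome STR profiles differ."""
    return sum(profile_a[m] != profile_b[m] for m in profile_a)

def guess_surname(unknown, database, max_distance=1):
    """Return the most common surname among close matches, or None."""
    close = [
        surname
        for surname, profile in database
        if marker_distance(unknown, profile) <= max_distance
    ]
    if not close:
        return None
    return Counter(close).most_common(1)[0][0]

# Invented genealogical database: (surname, repeat counts per marker).
database = [
    ("Ivanov", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Ivanov", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Petrov", {"DYS19": 16, "DYS390": 25, "DYS391": 10, "DYS393": 12}),
]

# Profile obtained from the unknown sample (also invented).
unknown = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}
guess = guess_surname(unknown, database)
```

The guess is only a starting point: as the text says, the list of candidates still has to be narrowed using age, place of residence and other open information.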

This is a double-edged sword: on the one hand, it is not difficult to imagine how such knowledge can be abused in a totalitarian state; on the other hand, the same technology helps to find criminals.

In particular, the famous "Golden State Killer", known for a series of gruesome murders, robberies and rapes committed in California in the 1970s-1980s, was recently caught in this way. The police compared the criminal's DNA against the open genealogical database GEDmatch, and as a result the search was narrowed to several families, among which they then looked for the best match by age and place of residence.

However, this method is not very relevant for Russia yet, because we do not have a large network of genealogical databases yet.

Discrimination by genes

The Western public also has concerns about genetic discrimination – what if a potential employer, an insurer or a bank employee making a decision on granting loans offers you a cup of coffee, isolates and sequences your DNA, and then runs through databases and finds out that you have had arrests in the past or have a certain genetic disease?

In some developed countries, laws against genetic discrimination have already appeared: the Genetic Information Nondiscrimination Act of 2008 (GINA) in the USA, Bill S-201 in Canada and the British Equality Act of 2010, which prohibits employers from using genetic information to make hiring decisions, and also imposes a moratorium until 2019 on the use of such information by insurance companies.

Since May 25, 2018, the General Data Protection Regulation (GDPR) has been in effect in the European Union, prohibiting companies from collecting and processing information about, among other things, people's race or ethnicity and health status. Genetic and biometric data cannot be collected in order to uniquely identify an individual.

At the same time, according to the non-profit organization Human Rights Watch, in China's Xinjiang region the police are already collecting genetic and other biometric information, which alarms human rights activists, because Xinjiang is the autonomous region that is home to the Uighurs, a people oppressed in China.

Journalists at The New York Times found out that Chinese scientists are improving their DNA analysis methods using databases from the 1000 Genomes Project and American research laboratories (it is worth recalling here that technology for inferring ethnic roots from DNA exists, but weapons targeted at a specific ethnic group, fortunately, cannot be created).

However, it does not follow directly from the collected material that the Chinese authorities were specifically aiming to determine ethnic origin; it may simply be, for example, a pilot region for centralized collection of information throughout the country.

Judging by this information, Xinjiang is creating a database of individual DNA for every person, something that Russia and other countries do only for criminals. This information can then be used to identify a person individually (from blood or from traces such as cigarette butts) and to identify close relatives. I don't see any clear signs of a link to population analysis. And why try to determine from DNA whether someone is a Uighur or a Chinese if the DNA points to a specific person about whom everything is already known in the database?
On the other hand, data from the laboratory of Kenneth Kidd, an American geneticist engaged in population analysis, were used. But a compulsory survey like the one in Xinjiang has nothing to do with the scientific study of populations. One can even say that this blatant violation of ethical standards shows especially clearly how right and necessary those standards are, above all voluntary participation and anonymity of DNA data.
Oleg Balanovsky, Head of the Laboratory of Genomic Geography of the Institute of General Genetics of the Russian Academy of Sciences.

Fears and reality

So, unauthorized access to a particular person's genetic information is quite possible today. But how dangerous is it if you are neither a criminal nor a member of a discriminated ethnic minority? Should we expect, for example, that DNA data will soon be collected in Russia the way data is collected today through questionnaires, social networks, credit histories and publicly available personal databases?

"We already have a good predictor of many things related to health, behavior, performance and life span: smoking," adds Ekaterina Pomerantseva. "All other things being equal, a non-smoking employee is more profitable for employers: he will get sick less, live longer, will not run off for smoke breaks, and non-smoking clients will not wince at the smell of cigarettes on him. Do people hide the fact that they smoke at an interview? No, because in practice it does not lead to any discrimination. It turns out that if even a sign written almost on the forehead does not particularly affect employers' decisions, they are unlikely to attach much importance to genetic features. Moreover, everything related to human talents and abilities is poorly predicted by the genome, because it is the product of a complex combination of biological and social factors."

"As for employers, strictly speaking they already discriminate against applicants on the basis of genetic information, and there are millions of such cases in Russia," notes Fyodor Konovalov. "I'm talking about two X chromosomes. Or take a genetic marker that significantly increases the likelihood of breast cancer: that same female sex. But when employers prefer to hire men, they are guided by entirely different considerations."

Even ordinary medical information can be more sensitive than genetic information in terms of possible vulnerabilities of the client. For example, if a woman has a seriously ill child, employers may refuse to hire her, fearing that she will spend too much time caring for him. But will someone specifically dig into the database of the children's polyclinic to extract this "priceless" information? It is easier to find out about a sick child from social networks.

The human brain doesn't work very well with probabilities, which is probably why the prospect of genetic data leakage scares us more than the fact that we leave much more personal information in social networks and everyday communication than a potential attacker can extract from our DNA.

Portal "Eternal youth" http://vechnayamolodost.ru
