12 July 2012

Genome in mp3 format

Mathematicians have come up with an archiver for DNA

Mathematicians from the Massachusetts Institute of Technology have proposed a new way of storing and processing DNA sequence data.

It should help researchers cope with the influx of data from the growing number of sequenced genomes. The paper describing the new algorithm (Loh et al., Compressive genomics) is published in the journal Nature Biotechnology, and a summary of it (Searching genomic data faster) can be read on the institute's website.

The algorithm relies on the fact that the DNA sequences of all organisms are similar to a greater or lesser degree, and that it is the differences that interest scientists most. Therefore, according to the authors, what should be stored and processed is not the sequences themselves but their differences from one another.
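The idea of keeping only the differences can be illustrated with a small Python sketch. This is not the authors' implementation: it is a toy that uses the standard `difflib` module to record where a genome departs from a reference, and the function names `encode_delta` and `decode_delta` are invented for this example.

```python
from difflib import SequenceMatcher


def encode_delta(reference, genome):
    """Record only the spans where `genome` differs from `reference`.

    Each entry is (start, end, replacement): the slice reference[start:end]
    is replaced by `replacement` to obtain the genome.
    """
    matcher = SequenceMatcher(None, reference, genome, autojunk=False)
    return [
        (i1, i2, genome[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def decode_delta(reference, delta):
    """Rebuild the genome from the reference and the stored differences."""
    parts, cursor = [], 0
    for start, end, replacement in delta:
        parts.append(reference[cursor:start])  # unchanged stretch
        parts.append(replacement)              # differing stretch
        cursor = end
    parts.append(reference[cursor:])
    return "".join(parts)


reference = "ACGTACGTACGTACGT"
genome = "ACGTACCTACGTAGCGT"  # one substitution and one insertion
delta = encode_delta(reference, genome)
restored = decode_delta(reference, delta)
```

For two closely related genomes the list of differences is far shorter than the genome itself, which is where the saving comes from. Real genomic data would need a much faster alignment method than `difflib`; the sketch only shows the principle.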

If, for example, a certain sequence has already been searched for in the genome of some organism, then the search for the same sequence in a new genome need not cover the whole genome, but only those places where the new genome differs from the old one. This makes it possible to significantly reduce both the search time and the load on computing centers. The difference in running time between the old and the new algorithm depends on the number of genomes already sequenced: the more there are, the harder it is to search in the old way and the more obvious the advantages of the new algorithm become.
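A minimal Python sketch of this kind of incremental search follows. It rests on a strong simplifying assumption that the article does not make: the new genome has the same length as the old one and differs from it only by single-letter substitutions at known positions. The function names are hypothetical, and the method in the paper is considerably more elaborate.

```python
from bisect import bisect_left


def find_all(text, pattern, start=0, end=None):
    """Start positions of every match of `pattern` lying inside text[start:end]."""
    end = len(text) if end is None else end
    hits, pos = [], text.find(pattern, start, end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, end)
    return hits


def search_incrementally(new_genome, pattern, old_hits, changed_positions):
    """Update the hits found in the old genome instead of rescanning everything.

    Assumes substitutions only: same length, differences at `changed_positions`.
    """
    m = len(pattern)
    changed = sorted(changed_positions)

    def touches_change(hit):
        # A hit covers positions [hit, hit + m); is any changed position inside?
        i = bisect_left(changed, hit)
        return i < len(changed) and changed[i] < hit + m

    # Old hits that do not overlap a difference are still valid.
    hits = {h for h in old_hits if not touches_change(h)}

    # Rescan only a window of width 2m - 1 around each changed position.
    for p in changed:
        lo = max(0, p - m + 1)
        hi = min(len(new_genome), p + m)
        hits.update(find_all(new_genome, pattern, lo, hi))
    return sorted(hits)


old_genome = "ACGTACGTACCTACGT"
new_genome = "ACGTACCTACGTACGT"  # substitutions at positions 6 and 10
old_hits = find_all(old_genome, "ACG")
new_hits = search_incrementally(new_genome, "ACG", old_hits, [6, 10])
```

The work done for the new genome is proportional to the number of differences, not to the length of the genome, which is why the advantage grows as more similar genomes accumulate.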

The emphasis on finding differences between closely related genomes is in line with current trends in biology. On the one hand, the cost of sequencing has fallen sharply over the last decade.


Growth rates of computing power (green)
and of new DNA sequence data. Image from the article by Loh et al.

Because of this, the volume of DNA sequence data is already growing faster than exponentially. On the other hand, as the number of sequenced genomes increases, the proportion of completely unique sequences decreases. The sequenced genomes are more and more similar to each other. For example, in the near future bioinformaticians expect a massive influx of data from projects to sequence the DNA of thousands of humans, vertebrates, and insects.

Portal "Eternal youth" http://vechnayamolodost.ru 12.07.2012
