An operating system and a video were stored in DNA and then read back without errors

Soon humanity will generate so much data that conventional storage will no longer cope. To solve this problem, scientists have turned to a virtually unlimited natural repository of information: DNA. According to the researchers, DNA is an ideal storage medium because it is ultra-compact and can retain its properties for hundreds of thousands of years under appropriate storage conditions. This is evidenced by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

In a new study, scientists from Columbia University and the New York Genome Center (NYGC) demonstrated that an algorithm designed for streaming video on a smartphone can unlock almost the full storage potential of DNA by packing more information into its four nucleotide bases.

The idea of recording, storing and retrieving information in DNA molecules, along with the general reasoning behind it, belongs to Mikhail Neiman, a Soviet physicist. In 1964, the journal "Radio Engineering" published an article describing the technology of this process and a data storage device called MNeimON (Mikhail Neiman oligonucleotides).

In 2012, geneticists from Harvard University managed to encode a draft of a book of 53.4 thousand words, 11 images and one program. They found that 5.5 petabytes of data can be stored in each cubic millimeter of DNA. A year later, researchers at the European Bioinformatics Institute (EBI) managed to store and then completely extract and reproduce about 0.6 megabytes of files: 154 of Shakespeare's sonnets, a 26-second fragment of Martin Luther King's famous "I Have a Dream" speech, James Watson and Francis Crick's paper on the structure of DNA, a photograph of the EBI headquarters in Hinxton and a file describing the data conversion methods. All of the files were reproduced from DNA with an accuracy between 99.99% and 100%.

Yaniv Erlich and his colleague Dina Zielinski, an NYGC researcher, selected six files to encode and write to DNA: the KolibriOS computer operating system, the 1896 French film "Arrival of a Train at La Ciotat", a $50 Amazon gift card code, a computer virus, an image of the Pioneer plaque and Claude Shannon's 1948 paper on information theory.

The scientists combined these files into one and then split the data into short strings of binary code. Using fountain codes, they randomly packed the strings into blocks called "droplets" and converted the combinations 00, 01, 10 and 11 into the four nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T). So that the blocks could later be put back together, the team added a label to each droplet.
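The encoding step described above can be illustrated with a minimal Python sketch. It is not the researchers' software: the function names, the 4-byte seed label and the uniform choice of how many strings go into a droplet are assumptions made for illustration, and the base order (00 to A, 01 to C, 10 to G, 11 to T) simply follows the order in which the article lists them. A real fountain code uses a carefully chosen degree distribution and also needs a decoder, both of which are omitted here.

```python
import random

# Assumed mapping, in the order the article lists the bases:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Convert each byte into four bases, two bits per base."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(dna: str) -> bytes:
    """Inverse of bytes_to_dna: four bases back into one byte."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def choose_segments(seed: int, n_segments: int) -> list:
    """Pseudo-randomly pick which segments a droplet combines.

    Assumption: the degree is drawn uniformly, as a placeholder for
    the degree distribution a real fountain code would use.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, n_segments)
    return rng.sample(range(n_segments), degree)


def make_droplet(segments: list, seed: int) -> str:
    """Build one droplet: a seed label plus the XOR of chosen segments."""
    payload = bytearray(len(segments[0]))
    for idx in choose_segments(seed, len(segments)):
        for j, value in enumerate(segments[idx]):
            payload[j] ^= value
    label = seed.to_bytes(4, "big")  # hypothetical 4-byte label
    return bytes_to_dna(label + bytes(payload))


if __name__ == "__main__":
    data = b"KolibriOS + film"            # 16 bytes of toy input
    segments = [data[i:i + 4] for i in range(0, len(data), 4)]
    for seed in (1, 2, 3):
        print(seed, make_droplet(segments, seed))
```

Because the label stores the seed, a reader who sequences a droplet can re-run `choose_segments` with that seed to learn which strings were combined, which is what makes reassembly possible.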

In total, the researchers generated about 72 thousand such DNA strands, each about 200 bases long. They collected the sequences into a text file and sent it to San Francisco, where the DNA synthesis startup Twist Bioscience turned the digital data into biological data. Two weeks later, Erlich's team received a tube containing the DNA molecules.

Using sequencing technology to read the DNA strands and special software to translate the genetic code back into binary, they successfully restored the files. The scientists have not yet specified how long writing and reading take.

The research team, led by Erlich, also demonstrated that when the DNA sample is multiplied using the polymerase chain reaction, a practically unlimited number of copies of the sample, and even copies of those copies, can be created and the data still recovered accurately with the algorithm.

Erlich runs the operating system in a virtual machine and plays Minesweeper

However, the most impressive feature of the algorithm turned out to be its ability to fit 215 petabytes of data into one gram of DNA, 100 times more than had been achieved with other methods and algorithms.

The capacity of DNA data storage is theoretically limited to two bits per nucleotide. The biological constraints of DNA itself, together with the additional information needed to reassemble and read the recorded fragments, reduce this capacity to 1.8 bits per nucleotide. The DNA Fountain algorithm packs an average of 1.6 bits into each nucleotide, which is 60% more than previous methods managed and close to the 1.8-bit limit.
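To put these figures side by side, here is a short Python calculation. It uses only the numbers quoted above; the constant and function names are made up for illustration, and the "implied earlier density" is merely what the 60% figure works out to, not a value reported in the article.

```python
THEORETICAL_LIMIT = 2.0   # bits per nucleotide (four bases = 2 bits)
PRACTICAL_LIMIT = 1.8     # bits per nucleotide after constraints
DNA_FOUNTAIN = 1.6        # bits per nucleotide achieved
IMPROVEMENT = 0.60        # "60% more" than earlier methods


def fraction_of(limit: float, achieved: float = DNA_FOUNTAIN) -> float:
    """Share of a given limit that the achieved density reaches."""
    return achieved / limit


def implied_previous_density(achieved: float = DNA_FOUNTAIN,
                             improvement: float = IMPROVEMENT) -> float:
    """Density of earlier methods implied by the quoted improvement."""
    return achieved / (1.0 + improvement)


if __name__ == "__main__":
    print(f"Share of the 1.8-bit limit: {fraction_of(PRACTICAL_LIMIT):.0%}")
    print(f"Share of the 2-bit limit:   {fraction_of(THEORETICAL_LIMIT):.0%}")
    print(f"Implied earlier density:    {implied_previous_density():.2f} bits/nt")
```

In other words, 1.6 bits per nucleotide is about 89% of the practical limit and 80% of the theoretical one, and the 60% improvement implies that earlier methods stored roughly 1 bit per nucleotide.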

The main obstacle to wide adoption of the technology is its cost. The researchers spent 7 thousand dollars to synthesize the DNA and archive 2 megabytes of data, and another 2 thousand dollars to read it. Although the cost of DNA sequencing is gradually decreasing, synthesis still costs a considerable sum, and investors are not ready to pour large amounts of money into making synthesis cheaper.
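The scale of the problem is easier to see per megabyte. The following Python snippet only divides the figures quoted above; the names are illustrative, and the result is a rough estimate for this one experiment, not a general price.

```python
SYNTHESIS_USD = 7000   # cost of writing (synthesis), from the article
READING_USD = 2000     # cost of reading (sequencing), from the article
DATA_MEGABYTES = 2     # amount of data archived, from the article


def cost_per_megabyte(total_usd: float, megabytes: float) -> float:
    """Average cost in dollars for each megabyte stored."""
    return total_usd / megabytes


if __name__ == "__main__":
    write_cost = cost_per_megabyte(SYNTHESIS_USD, DATA_MEGABYTES)
    read_cost = cost_per_megabyte(READING_USD, DATA_MEGABYTES)
    print(f"Writing: ${write_cost:.0f} per MB")
    print(f"Reading: ${read_cost:.0f} per MB")
    print(f"Total:   ${write_cost + read_cost:.0f} per MB")
```

That comes to about 3,500 dollars per megabyte to write and 1,000 dollars per megabyte to read.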

Erlich and his team propose another way to solve the problem: the price of DNA synthesis can be reduced by producing lower-quality molecules and then relying on the DNA Fountain coding strategy to correct the molecular errors.

The paper was published in the journal Science on March 3, 2017.
DOI: 10.1126/science.aaj2038

Source: https://habr.com/ru/post/402079/
