
Photo by Matthew Potter, CC-BY
How can audio and visual information be linked? Scientists and amateurs around the world have long asked this question. So when news appeared in February 2006 that scientists had managed to reproduce sounds from a clay pot more than 6,500 years old, it quickly spread across the Internet.
The potter had allegedly imprinted a musical rhythm on the pot while making it. Unfortunately, the story turned out to be a failed April Fools' joke by Belgian television.
Patrick Feaster, however, did manage to recover sound from a record that is more than 1,000 years old. In May 2011 he presented this work, which he calls "paleospectrophony", at the conference of the Association for Recorded Sound Collections (ARSC).
Immersion in history: deciphering records of the past
Patrick uses modern technology (though not especially modern in this case, since the spectrogram was invented long ago) to convert visual objects into sound. Humanity did not always take this route, however: for a long time people tried the opposite, to "capture" sound in images.
For a long time, before Thomas Edison created the phonograph, people searched for a way of writing down music that would let anyone looking at the record play the melody in their head as easily as professional musicians do when reading a score. Unfortunately, according to Dr. Feaster, this goal is unattainable in principle, since in most cases our brain is not good enough at converting visual information into sound.
Although past attempts to solve this problem did not succeed, history has left us plenty of evidence of how people in different eras tried to create such sound-recording systems. The best known of them formed the basis of the phonautograph, the predecessor of the phonograph, invented by the Frenchman Édouard-Léon Scott de Martinville. In the phonautograph, sound passed through a cone and set in motion a membrane connected to a needle. The needle, in turn, drew wavy lines on a glass cylinder covered with smoked paper.
The phonautograph could capture sound, but there was no way to play it back. It was Feaster who solved this problem. In 2008 he, his colleagues, and audio expert David Giovannoni gathered at Lawrence Berkeley National Laboratory to decipher one of Scott de Martinville's best-preserved phonautograms.
The laboratory had developed technology for extracting sound from high-quality photographs of fragile wax media or broken discs. Using it, the scientists obtained from a phonautogram a recording of the song "Au Clair de la Lune" ("By the Light of the Moon") made in 1860. It is believed to be the earliest recording in which a human voice can be made out.
Solving this problem was not enough for Feaster, however: he went on not only to recover sound from more than 50 phonautograms, but also to investigate earlier attempts to "record sound". Oddly enough, Google Books helped him here. Using it, Feaster collected notations from books that had long been ignored and dismissed as historical curiosities.
He found the oldest undulating line in a book from 1806. Using other techniques, he was able to decipher a melody from 1677 that had been recorded as a series of dots. Another was found in 10th-century records, where the lines showed the pitch at which it should be sung. Examples of such records can be found on his Phonozoic site.
Another approach
Researchers from MIT, Microsoft and Adobe are taking a different route: they reconstruct sound from a moving (or rather, vibrating) picture. They have developed an algorithm that recovers an audio signal from vibrations recorded on video.
In one of their experiments, they managed to extract intelligible speech from a video of an empty potato-chip bag. In a number of other experiments, they did the same with the surface of aluminum foil, a glass of water, and even the leaves of a houseplant. In 2014, the team presented its results at the annual SIGGRAPH conference. (Video of a talk given at the TED conference by one of the researchers who worked on the project.)
The point is that when sound hits an object, it causes the object to vibrate. The movements created by these vibrations are so small that a person cannot see them. A camera, however, can "see" them: to extract the audio signal from video, the scientists recorded at a frame rate higher than the frequency of the audio signal.
The first experiments used cameras shooting at 2,000 and 6,000 frames per second, but the researchers also tried other, cheaper cameras. Unsurprisingly, intelligible speech could not be extracted from video shot at 60 frames per second, but it was still possible to tell how many people were in the room, their gender, and even some features of their pronunciation.
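To get a feel for why the frame rate matters so much, here is a minimal sketch based on the sampling theorem: a signal sampled uniformly at a given rate can only represent frequencies up to half that rate. The function name `nyquist_limit_hz` is our own illustration and is not taken from the researchers' code; the estimate treats every frame as a single sample and ignores any camera-specific effects.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Highest frequency a uniform sampler at this rate can represent."""
    return frames_per_second / 2.0


# Frame rates mentioned in the article.
for fps in (60, 2000, 6000):
    print(f"{fps} fps -> up to {nyquist_limit_hz(fps):.0f} Hz")
```

Under this simplified assumption, a 60 fps camera covers only frequencies up to 30 Hz, which is consistent with intelligible speech being out of reach at that frame rate.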
Naturally, "spy stories" are the first application that comes to mind, but the researchers themselves describe their project as a way to discover new facets of how objects are imaged and to study their previously unexplored properties. Hundreds of years ago people were trying to invent a way to "record sound"; now such a "record" appears as a side effect, which in turn helps to reveal new properties of familiar objects.
Do it yourself
As already mentioned, the first phonautogram was deciphered thanks to technology for reproducing sound from photographs of old records (we have already written about this technology in one of our articles, which also contains links to the decoded audio recordings). However, Patrick Feaster emphasizes that anyone can handle this task if they know what to do.
The process is described in detail in this material. For our part, we note that you will need a high-quality photo, basic Photoshop skills (the wave on the vinyl has to be digitized, "straightened", since the groove on the record runs in a spiral, and cleaned of all kinds of noise and displacement), and a relatively powerful computer with plenty of RAM.
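The "straightening" step can be illustrated with a polar-to-rectangular unwrap. The sketch below is only an assumption about how one might do this in code rather than in Photoshop; it is not Feaster's procedure. The names `unwrap_ring` and `unwrap_disc` are hypothetical, the image is a plain list of rows of grayscale values, and nearest-neighbour sampling is used for brevity.

```python
import math


def unwrap_ring(image, cx, cy, radius, n_samples):
    """Sample a circle of the given radius around (cx, cy) into a flat row."""
    height, width = len(image), len(image[0])
    row = []
    for i in range(n_samples):
        angle = 2.0 * math.pi * i / n_samples
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        # Clamp so that points just outside the image reuse the edge pixel.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        row.append(image[y][x])
    return row


def unwrap_disc(image, cx, cy, r_min, r_max, n_samples):
    """Build a rectangular strip: one row per radius, one column per angle."""
    return [unwrap_ring(image, cx, cy, r, n_samples)
            for r in range(r_min, r_max + 1)]
```

In the resulting strip, a circular groove becomes a horizontal line and a spiral groove becomes a slightly slanted one, so the remaining displacement still has to be corrected by hand or in a further step.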
To convert the resulting image into a WAV file, Patrick uses a rather exotic program called ImageToSound. It is free, but nevertheless quite hard to find online (Patrick shared the source).
The program converts each image block (one pixel wide) into one audio sample, in sequence. Unfortunately, it does not even support Windows 7 (the author keeps a separate Windows 98 computer for this work). As an alternative, Feaster suggests the AEO-Light program, but warns that he is not fully familiar with the finer points of working with it.
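The idea of "one pixel column, one sample" can be sketched in a few lines. This is not ImageToSound's actual algorithm, which we have not inspected; it is an illustration under our own assumptions: the image is a list of rows of grayscale values (0 is black, 255 is white), the wave is a dark trace on a light background at least two pixels high, and the vertical position of the trace in each column sets the amplitude. The names `columns_to_samples` and `write_wav` are hypothetical.

```python
import struct
import wave


def columns_to_samples(image):
    """Turn each 1-pixel-wide column into one sample in the range [-1, 1]."""
    height, width = len(image), len(image[0])
    samples = []
    for x in range(width):
        # Weight every row by its darkness so the dark trace dominates.
        weights = [255 - image[y][x] for y in range(height)]
        total = sum(weights)
        if total == 0:
            centroid = (height - 1) / 2.0  # blank column -> silence
        else:
            centroid = sum(y * w for y, w in enumerate(weights)) / total
        # Map the top row to +1 and the bottom row to -1.
        samples.append(1.0 - 2.0 * centroid / (height - 1))
    return samples


def write_wav(path, samples, sample_rate=44100):
    """Write the samples as a 16-bit mono WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

For a straightened strip loaded into `image`, calling `write_wav("out.wav", columns_to_samples(image))` would produce a file in which every pixel column is one sample at 44.1 kHz.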
The last stage is adjusting the playback speed. Simple math comes to the rescue. First you need to know three things: the playback speed of the original record, the length in pixels of one revolution of the digitized wave (after "despiralization"), and the sampling rate of the final file.
If the image was converted to an audio file with a sampling rate of 44.1 kHz, one second of audio corresponds to 44,100 pixels of the image. Suppose, for example, that the record was played at 50 revolutions per minute, and that after digitizing and despiralizing one revolution of the record takes up 30,000 pixels. That gives 1,500,000 pixels per minute (50 x 30,000).
Dividing this number by 60 gives the number of pixels per second (1,500,000 / 60 = 25,000). Next, we divide the sampling rate by the number of pixels per second (44,100 / 25,000 = 1.764). Multiplying the playing time of the audio file by this number gives the duration of the original recording. If the playback speed of the original recording is unknown, Patrick advises choosing the final speed by ear.
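The same arithmetic as a short script, using the numbers from the example above. The function names are our own, and the sketch assumes, as the text does, that every pixel of the straightened wave became exactly one audio sample.

```python
def pixels_per_second(rpm, pixels_per_revolution):
    """How many pixels of the straightened wave pass per second of real time."""
    return rpm * pixels_per_revolution / 60.0


def duration_factor(sample_rate_hz, rpm, pixels_per_revolution):
    """Factor by which the file's playing time must be stretched."""
    return sample_rate_hz / pixels_per_second(rpm, pixels_per_revolution)


pps = pixels_per_second(50, 30_000)
factor = duration_factor(44_100, 50, 30_000)
print(f"{pps:.0f} pixels per second")   # 25000 pixels per second
print(f"stretch factor {factor:.3f}")   # stretch factor 1.764
```

Under the same assumption, an equivalent fix is to leave the samples untouched and set the file's sampling rate to the pixels-per-second value (25,000 Hz in this example), so that the file plays back at the original speed.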
Patrick Feaster warns that this is painstaking work that takes time and patience, but it sometimes gives surprising results, especially when it comes to voices of the past that seemed to have been lost forever.
P.S. More articles about audio can be found in our blog "Hi-Fi World".