
Digital Audio Basics



Translator's note: today we are publishing a translation of an article from the blog of Ethan Hein, an associate professor of music technology at New York University. We have already published a translation of one of his articles (on music visualization) and decided to continue the series with this material on the basics of digital audio. It covers the fundamentals of analog-to-digital conversion and will be of interest primarily to readers unfamiliar with the process. The topic was also discussed in one of our podcasts.

To understand how digital audio works, you need to know a little about the physics of sound. The animation shows how sound waves propagate from a circular sound source; imagine it is the surface of a drum or cymbal.


As you can see, sound is a wave, like ripples on the surface of a pond. Imagine that your ear is at the bottom center of this picture. The air pressure at your eardrum rises and falls rhythmically. What you perceive as sound is your brain registering how large those pressure swings are (the amplitude) and how often they repeat (the frequency).

If you plot the air pressure at your ear over time, the graph looks something like this:



We will see many more of these sinusoidal waves; they are central to understanding the nature of sound. The basic task of audio recording is to capture such a waveform on some medium from which it can be stored, reproduced, and manipulated.
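To make the waveform concrete, here is a minimal Python sketch (added for this translation, not part of the original article) that generates such a sine wave as a list of pressure readings over time:

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=44100, amplitude=1.0):
    """Generate a pure sine tone as a list of pressure values over time."""
    n_samples = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

# A 440 Hz tone (concert A) sampled for 10 ms:
wave = sine_wave(440, 0.010)
print(len(wave))   # 441 readings: 0.010 s at 44,100 readings per second
```

The list of numbers this produces is exactly the kind of data the rest of the article is about storing.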

From sound to electricity


Microphones work much like your ears, but instead of an eardrum, a microphone contains a small, thin metal diaphragm attached to a magnet. As the air pressure on the diaphragm changes, the magnet swings back and forth and generates an oscillating electrical current. If you plot the voltage of that current over time, the waveform looks exactly like the graph of air pressure on the diaphragm.

There are several different microphone technologies. Some microphones generate current not with a magnet but with a capacitor whose plate vibrates with the air. Such microphones require "phantom power": instead of generating a small current themselves, they modulate a current that already flows through them. There are also microphones that use a small piece of piezoelectric material whose vibrations change the voltage level.

From current to “digital”


So now you have the sound represented as an electrical current. In the past, people stored it in many ways: as wavy grooves in vinyl records, as patterns on photographic film, or as structured magnetic particles on tape. Computers store it by reading the voltage level at regular intervals and saving each reading as a number. The details of this process are fairly involved, but it is useful to know at least roughly how it works.

The graph below illustrates Pulse Code Modulation (PCM), the analog-to-digital conversion used in the AIFF and WAV audio formats. The red line is the original analog signal, continuously varying in amplitude, arriving through the cable from the microphone.



The computer reads the voltage level at regular time intervals, shown on the graph as vertical lines. The blue dots mark the voltages the computer actually reads. The horizontal lines represent the other possible values the computer can store; of all these, it always picks the one closest to the real reading. An AIFF or WAV file is simply a long (very long) list of these voltage numbers.

As you might guess, the more often the computer takes readings, and the more precisely each reading is stored, the better the digital recording sounds. How often the computer samples the signal is called the sampling rate, and how precisely each sample is stored is called the bit depth, or quantization. Both concepts are explained below.
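As a rough illustration of what "choosing the closest storable value" means, here is a small Python sketch (the `quantize` helper is hypothetical, written for this translation, not taken from the article):

```python
import math

def quantize(x, n_levels):
    """Snap a reading in [-1.0, 1.0] to the nearest of n_levels evenly
    spaced storable values, the way an analog-to-digital converter does."""
    step = 2.0 / (n_levels - 1)               # spacing between adjacent levels
    idx = round((x + 1.0) / step)             # index of the nearest level
    idx = max(0, min(n_levels - 1, idx))      # stay inside the storable range
    return -1.0 + idx * step

# "Analog" source: one cycle of a sine wave, read 20 times
readings = [math.sin(2 * math.pi * t / 20) for t in range(20)]
stored = [quantize(x, n_levels=4) for x in readings]   # 2 bits -> 4 levels
print(len(set(stored)))   # 4: only four distinct values survive
```

With only four allowable levels, the smooth sine wave collapses into a crude staircase, which is exactly what the two-bit graph later in the article shows.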

Sampling rate

Analog-to-digital converters read the voltage incredibly fast. The CD standard calls for a sampling rate of 44,100 readings per second, or in technical terms 44,100 hertz. Audio for film and television uses 48,000 hertz. That is very fast! Leading recording studios sometimes use much higher rates still. The higher the sampling rate, the more accurately the analog signal is captured and the wider the range of frequencies that can be covered: by the Nyquist theorem, a given rate can represent frequencies up to half its value, so the CD standard of 44,100 hertz reaches 22,050 hertz, which covers the entire range of human hearing (roughly 20 to 20,000 hertz).
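The Nyquist limit can be demonstrated in a few lines of Python (a sketch added for this translation): a tone above half the sampling rate produces exactly the same readings, up to a sign flip, as a mirrored tone below it. The two are indistinguishable once sampled, which is the aliasing that keeping frequencies below 22,050 Hz avoids.

```python
import math

SAMPLE_RATE = 44_100  # CD-standard sampling rate, in hertz

def sampled_tone(freq_hz, n_samples, rate=SAMPLE_RATE):
    """The readings an ADC at the given rate would take of a pure tone."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]

too_high = sampled_tone(34_100, 8)   # above the 22,050 Hz Nyquist limit
mirror   = sampled_tone(10_000, 8)   # 44,100 - 34,100 = 10,000 Hz

# Each reading of the too-high tone is the negative of the mirrored tone's:
print(all(abs(a + b) < 1e-9 for a, b in zip(too_high, mirror)))  # True
```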

Bit depth (quantization)

To understand the idea of quantization, you need to know how computers store numbers and other information in memory. Computer memory is made of billions of tiny electrical switches that can be in only two positions: on or off. The amount of information represented by the position of one such switch is called a bit. What can you do with a bit? Well, you can store the answer to a yes/no question, or a logical true/false statement, or one of two numbers, say zero and one.

But what if you have two bits, two switches? There are four possible combinations: 00, 01, 10, and 11, and you can use them to encode four numbers, for example zero, one, two, and three.

With three bits you get eight combinations: 000, 001, 010, 011, 100, 101, 110, and 111, enough to store the numbers zero through seven. Four bits give sixteen combinations; five give thirty-two. Each additional bit doubles the number of values that can be encoded.

If your analog-to-digital converter had only one bit to represent the signal, it could not represent the signal accurately in digital form, and the same goes for two-bit samples. The graph below shows two-bit audio. The digital version of the sound wave is crude and would sound terrible, since only four allowable voltage values are available.



Three-bit audio sounds a little better: now the computer can choose among eight allowable values. The blue digital wave is still very different from the red analog original, but it is a little closer:
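The improvement from each extra bit can be quantified: the worst-case rounding error is half the spacing between adjacent levels, and it halves (roughly) with every bit added. A small Python sketch, added here as an illustration rather than taken from the article:

```python
def max_quantization_error(bits):
    """Worst-case rounding error when a full-scale [-1, 1] signal is stored
    with 2**bits evenly spaced levels: half the spacing between levels."""
    n_levels = 2 ** bits
    step = 2.0 / (n_levels - 1)
    return step / 2.0

for bits in (2, 3, 4, 8, 16):
    print(f"{bits:2d} bits: worst-case error {max_quantization_error(bits):.6f}")
```

At 2 bits the error can reach a third of the full scale; at 16 bits it shrinks to about 0.0000153, which is why CD audio sounds smooth.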



The first graph in this section shows four-bit audio, with sixteen possible values per reading, and it looks much better. Computer games of the 1980s used eight-bit sound, meaning that at each reading the computer chooses one of 256 values. The result still sounds artificial and "computery," but at least it is recognizable.

The CD standard uses 16 bits to represent audio, which gives 65,536 different values for each reading. At that bit depth, the approximated digital signal becomes very close to the original analog one and sounds quite good. Even higher quality can be achieved by recording 24-bit audio, which allows a choice among 16,777,216 different values. At 44,100 readings per second, that yields a sound wave so smooth and regular that even the most sensitive listeners have trouble telling it from the original analog wave.

Of course, the greater the bit depth, the more disk space is needed to store all those numbers. 24-bit audio takes up half again as much space as 16-bit audio (three bytes per reading instead of two); the 256-fold figure applies to the number of representable values, not to file size. Either way, there is always a trade-off between quality and disk space, and now you can see why audio files are so big: uncompressed 16-bit CD audio works out to roughly ten megabytes per minute, and 24-bit audio at the same sampling rate to roughly fifteen.
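The arithmetic behind those figures is easy to check. A small Python sketch (the helper name is ours, not from the article), assuming uncompressed stereo PCM at 44,100 readings per second:

```python
def pcm_bytes_per_minute(sample_rate, bit_depth, channels=2):
    """Uncompressed PCM data: rate x bytes-per-reading x channels x 60 seconds."""
    return sample_rate * (bit_depth // 8) * channels * 60

cd = pcm_bytes_per_minute(44100, 16)   # CD-quality stereo
hi = pcm_bytes_per_minute(44100, 24)   # 24-bit stereo at the same rate
print(cd // 1_000_000, "MB/min")       # about 10 MB per minute
print(hi // 1_000_000, "MB/min")       # about 15 MB per minute, 1.5x the CD figure
```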

How recording equipment works


The hardest part of recording is finding the right signal level. If you set the microphone gain too low, you get only tiny voltage fluctuations. When you play the recording back, you have to turn the volume way up to hear it, which also turns up the recorded background noise from the room and the equipment, and the resulting track will not sound its best. On the other hand, if you set the gain too high, the voltage peaks may exceed the values your analog-to-digital converter can read. This phenomenon, called clipping, cuts off the signal and sounds simply monstrous.

The graph below shows a signal that is too loud for the recorder, along with two different kinds of clipping.



Analog systems respond to overload with soft clipping: the peaks of the wave are compressed, which adds harmonics to the sound. In fact, soft clipping can sound pretty cool; guitarists deliberately overdrive their amplifiers to get exactly this kind of distortion. Digital systems, by contrast, respond to overload with hard clipping. As the name suggests, it cuts the signal peaks off flat, introducing harsh high harmonics that are impossible to remove afterwards. Digital clipping is therefore best avoided.
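The difference between the two kinds of clipping can be sketched in Python (using `tanh` as a stand-in for soft, analog-style saturation; this is an illustrative model added in translation, not the article's own):

```python
import math

def hard_clip(x, limit=1.0):
    """Digital overload: values beyond the converter's range are flatly cut off."""
    return max(-limit, min(limit, x))

def soft_clip(x):
    """Analog-style overload: tanh squeezes peaks smoothly instead of flattening them."""
    return math.tanh(x)

# An input sine wave driven to twice the maximum level:
hot = [2.0 * math.sin(2 * math.pi * t / 50) for t in range(50)]
hard = [hard_clip(x) for x in hot]
soft = [soft_clip(x) for x in hot]
print(max(hard))            # 1.0: the peaks become flat plateaus
print(round(max(soft), 2))  # about 0.96: the peaks are rounded, never flat
```

The flat plateaus in the hard-clipped wave are what produce the harsh high harmonics described above.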

It can be tricky to set the gain knob on your recording device so that you get a strong signal while still avoiding clipping. The picture below shows the meters of the audio interface I use while recording. The upper meter shows a very good level with plenty of headroom. The lower one sits right at the clipping threshold, so I would most likely back it off a little.



Where should you record? That depends a great deal on what rooms you have available. A proper recording studio is best, but if you cannot get into one, there are other ways to capture good sound. The video below covers recording in less-than-ideal conditions.



File formats


The resulting recording can be saved in several formats, starting with the aforementioned AIFF and WAV. The two are essentially identical: they store the same list of numbers, differing mainly in how the bytes are laid out. Their main drawback is that they take up a lot of space. There are several ways to compress audio to reduce the storage required, and they fall into two types: lossless compression and lossy compression.

Lossless compression

Files can be shrunk on a computer without losing any information. A good analogy is shorthand, the system reporters use to replace words with brief symbols. The symbols take less space than full English words, yet everything that was said can be reproduced verbatim. Just as shorthand is lossless compression for the English language, the FLAC and Apple Lossless formats are lossless compression for audio. FLAC and Apple Lossless files take up roughly half as much space as uncompressed AIFF and WAV.
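A quick way to see lossless compression at work is Python's built-in zlib module. It is a general-purpose lossless compressor rather than an audio-specific one like FLAC, but the round trip is the same idea (a sketch added for this translation):

```python
import zlib

# Stand-in for raw 16-bit samples: a repetitive byte pattern, since
# redundancy and structure are exactly what lossless coders exploit.
raw = bytes([0, 1, 0, 2, 0, 1, 0, 2] * 1000)

packed = zlib.compress(raw, level=9)
restored = zlib.decompress(packed)

print(restored == raw)         # True: every byte comes back exactly
print(len(packed) < len(raw))  # True: and the packed form takes less space
```

FLAC achieves its roughly 2x savings on real music by adding audio-specific prediction on top of this kind of general-purpose coding.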

Lossy compression

You can compress files to even smaller sizes if you are willing to sacrifice sound quality. Lossy compression is like a summary of a book: you get the main idea, but you cannot recreate the full text from it. MP3 is the best-known lossy audio format. An MP3 file does not sound as good as the uncompressed original, but it can take up a tenth of the space or even less. The more quality you sacrifice, the smaller the file; the catch is that once quality is lost, it cannot be restored.

Sound reproduction


Just as analog-to-digital converters turn electrical signals into numbers, digital-to-analog converters turn numbers back into electrical signals. The converter reads the voltage values in the audio file one after another and sends a current of corresponding strength down the wire to the speakers. The oscillating current drives a magnet in the speaker, which is attached to a thin paper or plastic cone that vibrates along with it. The cone's vibrations push the air against your ear, and you hear the recorded sound.

Source: https://habr.com/ru/post/363151/
