Lossy audio encoding: what is what?

Note: this is an old version of the article; the new one is available on my site.

Evolution of audio coding

It is 2011, and 17 years have passed since the first MP3 encoder appeared. But the fact that most of us still happily listen to music in MP3 does not at all mean that progress has stood still all this time. This applies not only to the development of the MP3 encoding algorithm, but also to the evolution of lossy audio coding in general, in the form of new, more sophisticated codecs that really do deliver better quality at a smaller size. Formats such as Ogg Vorbis, AAC, WMA and Musepack have long since left the outdated MP3, with its many limitations and shortcomings, behind.
At the same time, lossless coding is gaining momentum. But because of the large amount of data, it is still unsuitable for general use today, especially on portable devices with limited memory, for network streaming, or simply for quickly sharing music over the Internet (admittedly, not everyone always has 100-megabit Internet access at hand).

So MP3 is outdated, and a replacement is long overdue. But how is an uninitiated user who wants the highest sound quality at the smallest size supposed to choose? After all, there are quite a few alternative codecs (at least three of them truly deserve attention): Apple promotes AAC (Advanced Audio Coding, positioned as the successor to MP3) through its iTunes Store; Microsoft promotes its own licensed WMA (Windows Media Audio); Ogg Vorbis is becoming more and more widely known; and the especially enlightened even use a format such as Musepack. Which of these codecs should you choose?

There is no unequivocal answer to this question - and that is why I am writing this article.

How to decide?

The choice of a particular codec depends on the specific task, namely on:

1. The hardware and software on which the sound will be played, i.e. whether a particular audio format is supported, as well as the playback quality (which should preferably guide the choice of bitrate).

2. The amount of memory allocated to the final material. A higher or lower target bitrate/quality is selected accordingly.

And of course, besides the format and bitrate, you need to choose the optimal encoder and encoding parameters. Keep in mind that different formats/encoders perform differently in different bitrate ranges.

So the procedure is roughly as follows:

1) Find out which formats the target device supports.
2) Decide how much space you can allocate for the audio material, and determine the total duration of the audio to be encoded.
3) Calculate the target bitrate using the formula: bitrate (kbps) = disk_space (in kilobits) / total_duration (in seconds).
4) Based on that bitrate, choose the best format among the supported ones (more on this later).
5) Choose the best encoder and its parameters.
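Step 3 is simple arithmetic. Below is a minimal Python sketch of it; the function name `target_bitrate_kbps` is hypothetical, and it assumes the available space is given in decimal megabytes (1 MB = 1,000,000 bytes) and that 1 kbit = 1000 bits, so adjust the conversion if your device reports sizes in binary units.

```python
def target_bitrate_kbps(disk_space_mb: float, total_duration_s: float) -> float:
    """bitrate (kbps) = disk_space (kilobits) / total_duration (seconds)."""
    if total_duration_s <= 0:
        raise ValueError("total duration must be positive")
    # Assumption: decimal units, 1 MB = 1,000,000 bytes and 1 kbit = 1000 bits
    disk_space_kbit = disk_space_mb * 1_000_000 * 8 / 1000
    return disk_space_kbit / total_duration_s


if __name__ == "__main__":
    # Hypothetical example: 1000 MB of space for 20 hours of audio
    print(f"{target_bitrate_kbps(1000, 20 * 3600):.1f} kbps")
```

For example, 1000 MB for 20 hours of music gives about 111 kbps, a figure you would then compare with the bitrate ranges discussed below.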

More about our heroes

AAC

The development of data compression and psychoacoustics gradually made the MP3 standard too “cramped” for implementing new ideas in audio coding. As a result, by 1997 the Fraunhofer Institute (Fraunhofer IIS), which had created MP3 in the early 1990s, together with Dolby, AT&T, Sony and Nokia, developed a new audio compression method, Advanced Audio Coding (AAC), which became part of the MPEG-2 and MPEG-4 standards. The main differences from MP3 are:

support for more channels (up to 48) and a wider range of sampling rates (from 8 kHz to 96 kHz);
a more efficient and simpler filter bank: MP3's hybrid filter bank has been replaced by a plain MDCT (modified discrete cosine transform);
a wider range of time-frequency resolution in the filter bank (a factor of eight, versus three in MP3), which improves the coding of both transients and stationary parts of the signal;
better coding of frequencies above 16 kHz;
a more flexible stereo coding mode that can switch to M/S (“joint stereo”) independently in different frequency bands;
additional tools in the standard that increase compression efficiency: temporal noise shaping (TNS), prediction of MDCT coefficients over time (long-term prediction), parametric stereo, perceptual noise substitution, and spectral band replication (SBR), which reconstructs the high frequencies.
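M/S stereo stores the sum and difference of the channels instead of left and right. When the channels are similar, the side (difference) signal is close to zero and cheap to encode. The Python sketch below shows only the basic, lossless transform with the common convention M = (L + R) / 2, S = (L - R) / 2; it is not AAC's implementation, where the encoder applies M/S to MDCT coefficients and makes the decision separately for each band.

```python
def to_mid_side(left, right):
    """Convert left/right samples into mid (average) and side (half-difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Nearly identical channels: the side signal is almost zero,
# so an encoder can spend very few bits on it.
left = [0.50, 0.25, -0.10, 0.80]
right = [0.48, 0.26, -0.10, 0.79]
mid, side = to_mid_side(left, right)
print("side:", side)
restored_left, restored_right = from_mid_side(mid, side)
```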

Thanks to these features, AAC achieves more flexible and efficient, and therefore higher-quality, audio coding. However, because MP3 is so widespread, AAC has not yet gained comparable popularity. Nevertheless, AAC is the main format of the popular iTunes Store, iTunes, iPod, iPhone, PlayStation 3, Nintendo Wii and DAB+/DRM digital broadcasting.

Ogg Vorbis

Ogg Vorbis is a relatively new general-purpose audio compression format, officially released in the summer of 2002. It belongs to the same type as MP3, AAC, VQF and WMA, that is, to lossy compression formats. The psychoacoustic model used in Ogg Vorbis is close in its principles of operation to those of MP3 and its kin, but the mathematical processing and practical implementation of this model are radically different, which allows the authors to declare their format completely independent of all its predecessors.
The main indisputable advantage of Ogg Vorbis is its complete openness and freedom. In addition, it uses a state-of-the-art, high-quality psychoacoustic model, thanks to which it needs a considerably lower bitrate than other formats for the same quality. As a result, the sound quality is better while the file size is smaller.
The format has many other advantages. For example, Ogg Vorbis does not limit the user to two audio channels (stereo: left and right). It supports up to 255 separate channels with sampling rates up to 192 kHz and a resolution of up to 32 bits (which no other lossy format allows), so Ogg Vorbis is well suited to encoding 6-channel DVD-Audio. In addition, Ogg Vorbis is sample-accurate: the audio data before encoding and after decoding have no offsets and no extra or missing samples relative to each other. This is easy to appreciate when encoding non-stop music (where one track flows into the next): the sound remains seamless.
Streaming capability surprises no one nowadays, but in this format it is built in from the very foundations. This has a rather useful side effect: you can store several tracks, each with its own tags, in one file. When such a file is loaded into a player, all the songs should be displayed as if they had been loaded from separate files.
A fairly flexible tag system also deserves mention. The tag header is easily extended and can hold text of any length and complexity (for example, lyrics) interspersed with images (for example, an album cover photo). Text tags are stored in UTF-8, which lets you write in any language, even several at once, and eliminates encoding problems. This is much more convenient than the various tricks needed with ID3 tags.
Ogg Vorbis uses variable bitrate by default, and the bitrate is not restricted to a fixed set of values: it can vary in steps as small as 1 kbps. Note also that the format has no strict maximum bitrate; at the highest encoding settings it can range from 400 to 700 kbps. The sampling rate is just as flexible: anything from 2000 Hz to 192000 Hz can be chosen.
Ogg Vorbis was developed by the Xiphophorus community with the aim of replacing all paid proprietary audio formats. Although it is the youngest of MP3's competitors, Ogg Vorbis is fully supported on all well-known platforms (Windows, PocketPC, Symbian, DOS, Linux, MacOS, FreeBSD, BeOS, etc.) and has a large number of hardware implementations. Its popularity today far exceeds that of all the alternatives.
It is worth noting that Ogg Vorbis is only a small part of the Ogg Squish multimedia project, which also includes free codecs: Speex for voice compression, FLAC for lossless audio compression and Theora for video compression.

Musepack

MusePack (mpp, mp+, mpc, MPEG+) is a free, license-free audio file format distributed under the GNU General Public License.
At high bitrates (160 kbps and above), the quality of MPC encoding is noticeably (if not significantly) higher than that of MP3.
Main advantages:

The format does not apply a second, DCT-based transform stage, so it practically does not suffer from pre-echo artifacts, unlike formats such as MP3, Vorbis, AAC and WMA.
More efficient variable bitrate algorithms. If you watch the bitrate while playing MPC tracks, you can see that the encoder allocates a lower bitrate to simple passages and a much higher one to difficult passages, sometimes above 400 (!) kbps. One interesting fact is worth mentioning here: in VBR mode an MP3 encoder allocates 32 kbps to silence (at a 44100 Hz sampling rate), AAC and Ogg Vorbis about 2 kbps, while Musepack encodes silence at minimal cost, below 1 kbps (for example, a minute of silence takes only about 514 bytes). All this speaks to the extraordinary economy of this encoder.
A powerful and flexible psychoacoustic model. For example, it uses a dynamic, per-frame low-pass filter (in other encoders, each quality preset sets a fixed bandwidth).
More advanced compression based on optimized Huffman tables (LAME MP3, for instance, wastes about 20% of the bitrate purely because of less efficient entropy coding).
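To put these silence figures in perspective, here is a small Python calculation (the function names are illustrative) that converts a bitrate into bytes per minute and a file size back into a bitrate, with 1 kbit = 1000 bits:

```python
def bytes_per_minute(bitrate_kbps: float) -> float:
    """Size of one minute of audio at a constant bitrate (1 kbit = 1000 bits)."""
    return bitrate_kbps * 1000 * 60 / 8


def kbps_from_bytes(size_bytes: float, duration_s: float) -> float:
    """Average bitrate implied by a file size and a duration."""
    return size_bytes * 8 / duration_s / 1000


for name, kbps in [("MP3 VBR", 32), ("AAC / Vorbis", 2)]:
    print(f"{name}: {bytes_per_minute(kbps):,.0f} bytes per minute of silence")
print(f"Musepack: 514 bytes per minute = {kbps_from_bytes(514, 60):.3f} kbps")
```

At 32 kbps a minute of silence takes 240,000 bytes and at 2 kbps it takes 15,000 bytes, while 514 bytes per minute corresponds to roughly 0.07 kbps.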

WMA

Windows Media Audio is a licensed file format developed by Microsoft for storing and transmitting audio.

WMA was originally advertised as an alternative to MP3, but today Microsoft positions it against AAC. Nominally, WMA offers good compression, which allows it to beat MP3 and compete with Ogg Vorbis and AAC. However, as independent tests and subjective evaluation have shown, its quality is still not clearly on a par with them, and even its advantage over MP3 is not as clear-cut as Microsoft claims.

Selection of format, encoder and parameters

Now let's get to the point.

To make your choice easier, I would like to share the experience I have gained from numerous comparisons and listening sessions, as well as from analyzing the results of public listening tests.

So below I will describe the encoders best suited to each case, as well as the right choice of parameters. I recommend using foobar2000 for conversion (setting up its converter is described in detail here); the parameters given are specifically for it. In addition, foobar2000 has a large number of DSPs that can be useful for preprocessing audio.

For those who are going to convert from the console or with another program: replace the %s variable with the source file name (or the equivalent variable) and %d with the output file name.
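A minimal Python sketch of that substitution, assuming you pass the parameters to the encoder as an argument list rather than through a shell; `build_command` and the executable name `wmaencoder` are hypothetical placeholders. Other foobar2000 variables such as `%r` are left untouched.

```python
import shlex
from typing import List


def build_command(encoder: str, params: str, source: str, output: str) -> List[str]:
    """Split a foobar2000-style parameter string and fill in %s and %d.

    Substitution happens after splitting, so file names containing spaces
    stay single arguments. Other variables (e.g. %r) are passed through.
    """
    args = [encoder]
    for token in shlex.split(params):
        if token == "%s":
            args.append(source)  # source file
        elif token == "%d":
            args.append(output)  # output file
        else:
            args.append(token)
    return args


# "wmaencoder" is a placeholder executable name, not a real program
cmd = build_command(
    "wmaencoder",
    "-silent -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 -input %s -output %d",
    "My Album/01 Track.wav",
    "My Album/01 Track.wma",
)
print(cmd)
```

Splitting first and substituting afterwards keeps file names with spaces or backslashes intact as single arguments; the resulting list can be passed to `subprocess.run`.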

Note that for each bitrate range the suitable formats are listed in order of priority, the first being the preferred one. If your player does not support the first option, look at the next one, and so on. As I already wrote, only three codecs really deserve attention today: AAC, Ogg Vorbis and Musepack. WMA, being a closed format, does not stand out in quality, but in most cases it is still better than MP3. Since some devices support WMA as their only alternative to MP3, I will give recommendations for each of the four formats.
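The "first supported format wins" rule can be expressed as a simple lookup. The priority lists below are transcribed from the recommendations for music in this article; the range labels are just keys chosen for this sketch, and `pick_format` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Priority lists for music, transcribed from this article's recommendations;
# the range labels are arbitrary keys chosen for this sketch.
PRIORITIES = {
    "25-40": ["AAC", "Vorbis", "WMA"],
    "64": ["AAC", "Vorbis", "WMA"],
    "80-100": ["Vorbis", "AAC"],
    "128": ["AAC", "Vorbis", "WMA"],
    "160-192": ["Musepack", "AAC", "Vorbis", "WMA"],
    "200-225": ["Musepack", "AAC", "Vorbis", "WMA"],
    "320-350": ["Vorbis", "Musepack", "AAC", "WMA"],
}


def pick_format(bitrate_range: str, supported: Iterable[str]) -> Optional[str]:
    """Return the highest-priority format the device supports, or None."""
    supported = set(supported)
    for fmt in PRIORITIES[bitrate_range]:
        if fmt in supported:
            return fmt
    return None  # nothing recommended for this range is supported


# Hypothetical player that plays AAC and MP3 but not Musepack or Vorbis
print(pick_format("160-192", {"AAC", "MP3"}))  # AAC
```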

As for bitrates: you need to understand that the optimal encoding mode is so-called true VBR, i.e. a mode that targets quality rather than bitrate. Ideally, the result is a track with variable bitrate but constant quality (do not confuse these two concepts: more complex passages need more bits to maintain the same quality). As a result, the output bitrate is hard to predict, so the bitrates below are only approximate, where possible averaged over a large number of tracks of varying complexity.
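Because each VBR track ends up with its own bitrate, an average over many tracks should be weighted by duration: total bits divided by total seconds, not a plain mean of per-track bitrates. A short Python sketch with made-up file sizes (the `Track` type and function names are illustrative):

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Track:
    size_bytes: int     # encoded file size
    duration_s: float   # playing time in seconds


def per_track_kbps(track: Track) -> float:
    """Average bitrate of a single file (1 kbit = 1000 bits)."""
    return track.size_bytes * 8 / track.duration_s / 1000


def average_bitrate_kbps(tracks: Iterable[Track]) -> float:
    """Duration-weighted average: total bits / total seconds."""
    tracks = list(tracks)
    total_bits = sum(t.size_bytes * 8 for t in tracks)
    total_seconds = sum(t.duration_s for t in tracks)
    return total_bits / total_seconds / 1000


# Made-up example: a short simple track and a longer complex one
tracks = [Track(1_600_000, 100), Track(4_800_000, 200)]
naive = sum(per_track_kbps(t) for t in tracks) / len(tracks)
print(f"plain mean: {naive:.1f} kbps, weighted: {average_bitrate_kbps(tracks):.1f} kbps")
```

Here the two tracks average 128 and 192 kbps individually; the plain mean would be 160 kbps, whereas the duration-weighted average is about 170.7 kbps.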

The encoders mentioned in this article, as well as some others, along with Russian-language descriptions of their main parameters and recommendations, can be found here.

Ultra-low bitrates (~ 25-40 kbps)

This range is great for encoding audiobooks, and here there is really only one option: AAC, or rather Nero AAC. The parameters are as follows:

-lc -q 0.35 -ignorelength -if - -of %d

In this case, the material must first be converted to mono and resampled to 22050 Hz (preferably with the SoX resampler). The output is ordinary Low Complexity AAC at about 25 kbps.
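The mono conversion itself is normally done with a foobar2000 DSP or SoX. Purely to show what downmixing means, here is a Python standard-library sketch that averages the two channels of a 16-bit stereo WAV file; the function name is hypothetical, and it deliberately does not resample, which is better left to a proper resampler such as SoX.

```python
import array
import sys
import wave


def downmix_wav_to_mono(src_path: str, dst_path: str) -> None:
    """Average the two channels of a 16-bit stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Interleaved L, R, L, R... -> one averaged sample per frame
    mono = array.array(
        "h", ((l + r) // 2 for l, r in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate unchanged; resample separately
        dst.writeframes(mono.tobytes())
```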

For music in this range, there are also options:

1) Nero AAC. No preliminary conversion is needed:

-q 0.15 -ignorelength -if - -of %d

The output is High Efficiency AAC v2 (with parametric stereo and HF synthesis) at ~35 kbps, a great option for internet radio. Just don't forget that the player's decoder must support HE-AACv2; otherwise you will lose the high frequencies entirely and hear mono sound.

2) OGG Vorbis AoTuV: this modification of libvorbis improves the coding algorithm at low bitrates and, even without SBR, is not far behind HE-AACv2. Command line:

-s %r -Q -q-2 - -o %d

Files encoded this way should be fully compatible with standard Ogg Vorbis decoders. The bitrate is similar, about 35 kbps.

3) WMA 10 Pro. For such cases Microsoft also has something like SBR (HF synthesis), which doesn't sound as bad as it might. The bitrate does go higher, though: 48 kbps.

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 48_44_2_16 -input %s -output %d

Keep in mind that old decoders (especially hardware ones) do not support WMA 10. In that case you can use WMA 9.2 (the same encoder), although its quality at low bitrates is much worse.

-silent -a_codec WMA9STD -a_mode 3 -a_setting 48_44_2 -input %s -output %d

Low bitrate, ~ 64 kbps

Initially I intended to go straight to higher bitrates. But since a comparison of encoders at this bitrate was held on hydrogenaudio.org quite recently, it would be a sin not to mention it.

1) QuickTime AAC, the winner of that test (not counting the new Opus/CELT). The following settings are for the QAAC encoder:

-s -v 64 --he -q 2 --ignorelength - -o %d

The output is HE-AAC (with SBR but without Parametric Stereo), which should be supported by various iPods and similar devices.

2) OGG Vorbis AoTuV: although it finished well behind QAAC, it is still an option:

-s %r -Q -q0 - -o %d

3) And, just in case, WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 64_44_2_16 -input %s -output %d

For older decoders - WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d

Slightly higher, ~ 80-100 kbps

This is the range I consider Vorbis territory.

1) As the tests showed, the OGG Vorbis AoTuV encoder handles it best:

-s %r -Q -q1 - -o %d

2) Nero AAC also gives a very good result. On material where the high frequencies are not very prominent, it may sound even better than Vorbis (it loses on the highs because they are synthesized).

-q 0.30 -ignorelength -if - -of %d

The profile used is HE-AAC.

De facto standard, 128 kbps

An interesting fact: many people say that 128 kbps for MP3 is the “threshold bitrate” from which quality becomes indistinguishable from the original. Perhaps that is true... for cheap plastic Chinese speakers playing blatnyak. In reality this threshold lies somewhere around 200 kbps, and the new formats provide more consistent quality at this bitrate.

Modern encoders have managed to lower this bar almost twofold, to 128 kbps (again, according to their developers). Nevertheless, with reasonably decent speakers (or headphones), the difference can still be heard at 128 kbps on complex passages.

1) Nero AAC:

-q 0.40 -ignorelength -if - -of %d

Profile - regular AAC LC.

2) OGG Vorbis AoTuV:

-s %r -Q -q2.8 - -o %d

3) WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 128_44_2_24 -input %s -output %d

For older decoders - WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 -input %s -output %d

~ 160-192 kbps

In this range, the difference between the Nero AAC, QuickTime AAC and Vorbis encoders almost disappears. But this is where Musepack enters the scene: its advantage starts to show at these bitrates (thanks to its unusually flexible VBR mode and its fundamentally different compression algorithm):

1) Musepack: --silent --quality 5 - %d

2) Nero AAC: -q 0.50 -ignorelength -if - -of %d

3) OGG Vorbis AoTuV: -s %r -Q -q5 - -o %d

4) WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 160_44_2 -input %s -output %d

Transparency threshold: ~ 200-225 kbps

This is what I was talking about. At this bitrate almost all encoders sound transparent to most listeners, and this range is the optimal one in terms of size versus quality.

By the way, LAME MP3 also has a similar threshold in this area (VBR V2), but this codec has serious problems with pre-echo (distortion preceding sharp attacks in the signal), and its noise shaping is often audible (with noise shaping, quantization noise is shifted into the high-frequency range).

Codecs such as Vorbis, AAC and MPC, by contrast, begin to reproduce even the background noise in recordings clearly at this threshold.
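Noise shaping feeds the quantization error back so that its spectrum is tilted toward high frequencies. The Python sketch below is a generic first-order error-feedback quantizer that illustrates the idea; it is not how LAME implements noise shaping (LAME controls quantization noise in the frequency domain as part of its quantization loop), and the function names are illustrative.

```python
def quantize_plain(signal, step):
    """Round each sample to the nearest multiple of step; the error is left unshaped."""
    return [round(x / step) * step for x in signal]


def quantize_noise_shaped(signal, step):
    """First-order error-feedback quantizer (generic illustration).

    The previous sample's quantization error is subtracted before rounding,
    so the total output error is e[n] - e[n-1]. That difference acts as a
    first-order high-pass filter on the rounding error, pushing the noise
    toward high frequencies.
    """
    out = []
    prev_error = 0.0
    for x in signal:
        target = x - prev_error
        y = round(target / step) * step
        prev_error = y - target  # error made on this sample
        out.append(y)
    return out


# A constant (very low-frequency) input that lies between quantization steps
signal = [0.3] * 20
print("plain: ", quantize_plain(signal, 1.0)[:10])
print("shaped:", quantize_noise_shaped(signal, 1.0)[:10])
```

Plain rounding turns the constant 0.3 into silence, whereas the shaped version toggles between 0 and 1 so that its average stays close to 0.3; the remaining error is concentrated at high frequencies.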

1) Musepack: --silent --quality 6 - %d

2) Nero AAC: -q 0.55 -ignorelength -if - -of %d

3) OGG Vorbis AoTuV: -s %r -Q -q6 - -o %d

4) WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 192_44_2_24 -input %s -output %d

WMA 9 Standard at the highest bitrate supported by old decoders:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 192_44_2 -input %s -output %d

Reasonable maximum: ~ 320-350 kbps

I must point out that above ~225 kbps, raising the bitrate usually gives no audible gain in quality, while files naturally get bigger. Still, for especially complex tracks (and good equipment and ears), higher quality settings exist. At these bitrates I could not find any killer samples (problem samples that clearly expose weaknesses of the coding algorithm) for encoders such as Musepack and Vorbis. So:

1) OGG Vorbis AoTuV: -s %r -Q -q9 - -o %d

2) Musepack: --silent --quality 10 - %d

3) QAAC: -s -V 127 -q 2 --ignorelength - -o %d

4) WMA 10 Pro: -silent -a_codec WMA9PRO -a_mode 3 -a_setting 384_44_2_24 -input %s -output %d

Anticipating your questions: yes, some of these encoders have even higher quality settings, but raising them further no longer makes sense, unless the space your music takes up really does not matter to you and your device does not support lossless formats.

That's all I wanted to share. Try it, comment, ask questions.

Source: https://habr.com/ru/post/118454/
