I once needed to solve what seemed at the time like a simple task: find out the duration of an mp3 file from a PHP script. I had heard about ID3 tags and immediately assumed that the duration is stored either in the tags or in the mp3 file headers. A quick search on the Internet showed that this problem cannot be solved in a couple of minutes. Since I am curious by nature and was not pressed for time, I decided not to use third-party tools but to figure out one of the most popular formats on my own.
If you are interested in what's inside, read on.
In this article we will not dwell on extracting ID3v2 tags: there are enough nuances there to fill a separate article. We also skip the header fields that are practically unused today (for example, the Emphasis field of the mp3 frame header), and we do not consider the structure of the audio data itself, the part that is actually heard from the speakers.
ID3 tags
ID3 (from the English "Identify an MP3") is a metadata format most commonly used in MP3 audio files. An ID3 tag contains information about the track title, album, artist name, etc., which is used by media players and other programs, as well as by hardware players, to display information about the file and to organize the audio collection automatically.
Wikipedia
There are two completely different versions of ID3 data: ID3v1 and ID3v2.
ID3v1 has a fixed size of 128 bytes, which are appended to the end of the mp3 file. It can store the track name, artist, album, year, comment, track number (in version 1.1) and genre.
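As a rough illustration of the fixed 128-byte layout, here is a minimal Python sketch that checks for an ID3v1 tag and reads a few of its fields. The function names are hypothetical, and the field offsets (a 3-byte 'TAG' marker followed by 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre) come from the ID3v1 layout as I understand it, not from this article.

```python
def has_id3v1(data: bytes) -> bool:
    """Return True if the buffer ends with a 128-byte block starting with 'TAG'."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"


def read_id3v1(data: bytes) -> dict:
    """Read a few ID3v1 fields; assumes has_id3v1(data) is True."""
    tag = data[-128:]

    def text(raw: bytes) -> str:
        # Fields are padded with zero bytes (or spaces) up to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],  # numeric genre index
    }
```

If the tag is present, its 128 bytes have to be subtracted from the file size before any duration calculation.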
It soon became clear to everyone that 128 bytes is very little space for such data. So, over time, the second version of the format, ID3v2, appeared and is now widely used.
Unlike the first version, v2 tags have a variable length and are placed at the beginning of the file, which makes it possible to support streaming playback. (The ID3v2.4 format also allows storing data at the end of the file.)
ID3v2 data consists of a header followed by ID3v2 frames. For example, version ID3v2.3 defines more than 70 frame types.
The header contains the following fields:
- Marker - always 'ID3'.
- Version - currently there are three versions: ID3v2.2, ID3v2.3 and ID3v2.4.
v2.2 is considered outdated.
v2.3 is the most popular version.
v2.4 is gaining popularity. One of its differences from v2.3 is that it allows UTF-8 encoding (and not just UTF-16).
- Flags - currently only the three highest bits (5, 6, 7) are used:
bin: %abc00000
a 'Unsynchronisation' - indicates that unsynchronisation is applied to the tag; it is only useful for tags in MPEG audio and AAC files.
b 'Extended header' - indicates the presence of an extended header.
c 'Experimental indicator' - marks the tag as experimental.
- Length - the peculiarity of the ID3v2 length field is that the 7th (most significant) bit of each byte is not used and is always set to 0, so each byte carries only 7 bits of the value.
Consider an example:
In this case the ID3v2 data, together with the ID3v2 header (10 bytes), takes 1024 bytes.
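The length decoding can be sketched in Python as follows. This is only an illustration under assumptions: the function name is hypothetical, the 10-byte layout (3-byte marker, 2 version bytes, 1 flags byte, 4 length bytes) is taken from the ID3v2 specification, and the sample bytes 00 00 07 76 are simply one length field that is consistent with the 1024-byte total mentioned above.

```python
def parse_id3v2_header(header: bytes):
    """Return (version, flags, total_size) for an ID3v2 header, or None if absent."""
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    version = (header[3], header[4])  # e.g. (3, 0) for ID3v2.3
    flags = {
        "unsynchronisation": bool(header[5] & 0x80),  # bit 7 ('a')
        "extended_header": bool(header[5] & 0x40),    # bit 6 ('b')
        "experimental": bool(header[5] & 0x20),       # bit 5 ('c')
    }
    size = 0
    for byte in header[6:10]:
        # Only the lower 7 bits of each length byte carry data.
        size = (size << 7) | (byte & 0x7F)
    # The length field does not include the 10-byte header itself.
    return version, flags, size + 10


# Assumed sample: ID3v2.3, no flags, length bytes 00 00 07 76 (7 * 128 + 118 = 1014).
sample = b"ID3" + bytes([3, 0, 0x00]) + bytes([0x00, 0x00, 0x07, 0x76])
print(parse_id3v2_header(sample)[2])  # 1014 + 10 = 1024
```

The returned total size is the offset at which the first mp3 frame is expected to start. The sketch ignores the optional ID3v2.4 footer.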
The actual tags follow the ID3v2 header. As mentioned above, I decided not to include a detailed analysis of reading ID3v2 tags in this article.
Now that we know whether ID3 tags are present and how long they are, we can move on to the mp3 frame and finally find out where the duration is stored, and understand everything else along the way.
MP3 frame
The entire mp3 file consists of frames, which can only be extracted sequentially. Each frame contains a header and audio data. Since we are not trying to write firmware for a tape recorder, it is the frame header that interests us.
More about it (a bunch of tables and dry information)
Header size is 4 bytes.
Description:
- [0-10] Frame sync - 11 bits, all set to 1
- [11-12] MPEG version index (Audio version ID)
- [13-14] Layer index
By the way, MP3 is MPEG-1 Layer III.
- [15] Protection bit
1 - no protection
0 - the header is protected by a 16-bit CRC (which follows the header)
- [16-19] Bitrate index
The bitrate table stores values in kilobits per second. In this format 1 kilobit = 1000 bits, not 1024. Thus, 96 kbps = 96000 bits/sec.
- [20-21] Sampling rate index
- [22] Padding bit
If it is set, the frame is padded with 1 extra byte. This is important for calculating the frame size.
- [23] Private bit - for information only
- [24-25] Channel mode
- [26-27] Mode extension - used only with joint stereo
- [28] Copyright bit - for information only
- [29] Original bit - for information only
- [30-31] Emphasis - practically not used at the moment
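The bit layout above can be decoded with a few shifts and masks. The following Python sketch is an illustration, not the script that accompanies this article: the function name is hypothetical, and the lookup tables are filled in from the MPEG audio specification as I know it rather than from the text above. To keep it short, the bitrate table covers only MPEG-1 Layer III.

```python
MPEG_VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}  # 0b01 is reserved
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}                  # 0b00 is reserved
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]
SAMPLE_RATES = {
    "1": [44100, 48000, 32000],
    "2": [22050, 24000, 16000],
    "2.5": [11025, 12000, 8000],
}
# kbps for MPEG-1 Layer III only; index 0 means "free" bitrate.
BITRATES_V1_L3 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def parse_frame_header(raw: bytes) -> dict:
    """Decode the 4-byte frame header; bit [0] is the most significant bit."""
    if len(raw) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(raw[:4], "big")
    if word >> 21 != 0x7FF:
        raise ValueError("no frame sync")
    version = MPEG_VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    bitrate_index = (word >> 12) & 0b1111
    rate_index = (word >> 10) & 0b11
    if version is None or layer is None or bitrate_index == 15 or rate_index == 3:
        raise ValueError("reserved value in frame header")
    bitrate = None
    if version == "1" and layer == 3:
        bitrate = BITRATES_V1_L3[bitrate_index] * 1000  # 1 kilobit = 1000 bits
    return {
        "version": version,
        "layer": layer,
        "crc": (word >> 16) & 1 == 0,  # protection bit 0 means a CRC follows
        "bitrate": bitrate,            # bits per second, None if not in the table
        "sample_rate": SAMPLE_RATES[version][rate_index],
        "padding": bool((word >> 9) & 1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
    }
```

For example, the bytes FF FB 70 00 decode as MPEG-1 Layer III, 96000 bits/sec, 44100 Hz, stereo, with no CRC and no padding.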
Compression modes or what is the bit rate
There are 3 data compression modes:
CBR (constant bitrate) - the bitrate does not change throughout the track.
VBR (variable bitrate) - the bitrate changes continuously throughout the track.
ABR (average bitrate) - this concept is used only when encoding a file; the resulting file is a VBR file.
CBR
If the file is encoded with a constant bitrate, then we can (finally!) get the duration of our track with the following formula:
Duration = Audio data size (in bytes) * 8 / Bitrate (in bits per second!)
For example, the file is 350,670 bytes long. It has an ID3v1 tag (128 bytes) and an ID3v2 tag (1024 bytes), and the bitrate is 96 kbps. Therefore, the size of the audio data is 350670 - 128 - 1024 = 349518 bytes.
Duration = 349518 * 8 / 96000 = 29.1265, i.e. about 29 seconds
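The same arithmetic as a small Python sketch. The function and parameter names are hypothetical; the numbers are the ones from the example above.

```python
def cbr_duration(file_size: int, bitrate_kbps: int,
                 id3v1_size: int = 0, id3v2_size: int = 0) -> float:
    """Duration in seconds of a constant-bitrate file."""
    audio_bytes = file_size - id3v1_size - id3v2_size
    # 8 bits per byte; 1 kilobit = 1000 bits in this format.
    return audio_bytes * 8 / (bitrate_kbps * 1000)


seconds = cbr_duration(350670, 96, id3v1_size=128, id3v2_size=1024)
print(int(seconds))  # 29
```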
VBR
First we need to explain how to determine the compression mode. It's simple: if the file is compressed with VBR, a VBR header is added. Its presence tells us that a variable bitrate is used.
There are two kinds of headers: Xing and VBRI.
The Xing header is placed inside the first mp3 frame. Its offset, counted from the end of the 4-byte frame header, depends on the MPEG version and the channel mode: 32 bytes for MPEG-1 stereo (any non-mono mode), 17 bytes for MPEG-1 mono and for MPEG-2/2.5 stereo, 9 bytes for MPEG-2/2.5 mono.
For example: our ID3v2 tag is 1024 bytes. If our mp3 file is MPEG-1 with the channel mode "Stereo", the first frame starts at byte 1024, and the Xing header starts at an offset of 1024 + 4 + 32 = 1060 bytes.
The VBRI header is always placed 32 bytes after the end of the first frame header.
In both cases the first four bytes of the header are a marker: 'Xing' or 'Info' for Xing, and 'VBRI' for VBRI.
These VBR headers are of variable length and contain various information about how the file was encoded. More about the structure of VBR headers (and not only them) can be found, for example, in the Links section below.
I will only cover what interests us at the moment, namely the number of frames (Number of Frames). This field is 4 bytes long.
In the Xing header it is located at an offset of +8 bytes from the beginning of the header; in VBRI, at +14 bytes from the beginning of the header.
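A minimal Python sketch of locating the VBR header and reading the frame count. Several things here are my assumptions rather than statements from this article: the function name is hypothetical, the offsets are counted from the end of the 4-byte frame header (32, 17 or 9 bytes for Xing depending on MPEG version and channel mode, 32 bytes for VBRI), the 4-byte values are big-endian, and the Xing frame count is only present when bit 0 of the Xing flags field (the 4 bytes after the marker) is set.

```python
def vbr_frame_count(data: bytes, frame_start: int,
                    mpeg1: bool = True, mono: bool = False):
    """Return the frame count from a Xing/Info or VBRI header, or None if absent."""
    if mpeg1:
        side_info = 17 if mono else 32
    else:  # MPEG-2 / MPEG-2.5
        side_info = 9 if mono else 17

    # Xing / Info: right after the 4-byte frame header and the side information.
    pos = frame_start + 4 + side_info
    if data[pos:pos + 4] in (b"Xing", b"Info") and len(data) >= pos + 12:
        flags = int.from_bytes(data[pos + 4:pos + 8], "big")
        if flags & 0x1:  # "frames" field present
            return int.from_bytes(data[pos + 8:pos + 12], "big")
        return None

    # VBRI: always 32 bytes after the end of the frame header.
    pos = frame_start + 4 + 32
    if data[pos:pos + 4] == b"VBRI" and len(data) >= pos + 18:
        return int.from_bytes(data[pos + 14:pos + 18], "big")
    return None
```

Here `frame_start` is the offset of the first mp3 frame, which is the total ID3v2 size when such a tag is present and 0 otherwise.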
Using the Samples Per Frame table, we can get the duration of an mp3 file encoded with a variable bitrate:
Duration = Number of frames * Samples per frame / Sample rate
For example: the VBRI header gives 1118 frames, samples per frame = 1152, and the sampling frequency is 44100.
Duration = 1118 * 1152 / 44100 = 29.2, i.e. about 29 seconds.
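The VBR formula as a Python sketch. The names are hypothetical, and the samples-per-frame values are filled in from the MPEG audio specification as I know it (384 for Layer I, 1152 for Layer II, and 1152 or 576 for Layer III depending on the MPEG version), since they are not listed in the text above.

```python
# (MPEG version, layer) -> samples per frame
SAMPLES_PER_FRAME = {
    ("1", 1): 384, ("1", 2): 1152, ("1", 3): 1152,
    ("2", 1): 384, ("2", 2): 1152, ("2", 3): 576,
    ("2.5", 1): 384, ("2.5", 2): 1152, ("2.5", 3): 576,
}


def vbr_duration(frames: int, sample_rate: int,
                 version: str = "1", layer: int = 3) -> float:
    """Duration in seconds from the frame count stored in a VBR header."""
    return frames * SAMPLES_PER_FRAME[(version, layer)] / sample_rate


print(int(vbr_duration(1118, 44100)))  # 29
```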
That's all for today. If this was useful to someone, thanks.
For those who want to dig into the insides of mp3 right away: here is the PHP script that I wrote for myself alongside this article, plus four small mp3 files for testing.
Links
id3.org - Read about ID3
id3.org - and something about the mp3 frame
In some detail about the mp3 frame
getID3: A good library for getting mp3 information (PHP).