So, we need to compute a hash for an MP3 file. Simply running the file through md5.exe is no good, because the file contains meta-information: tags, which tend to change over time. Merely updating the tags would give a different hash, which defeats the purpose.
Incidentally, FLAC and APE are practically free of this problem, since such files usually already contain a hash of the audio data, written by the encoder. For FLAC, the value can be obtained with the command
metaflac --show-md5sum
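To illustrate where that stored value lives, here is a minimal sketch that reads it directly from the file. It assumes a standard FLAC layout in which the "fLaC" marker is immediately followed by the STREAMINFO block; the function name `flac_stored_md5` is made up for this example, and files with anything prepended before the marker are not handled.

```python
def flac_stored_md5(data: bytes) -> str:
    """Return the audio MD5 stored in the STREAMINFO block, as a hex string."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (no 'fLaC' marker at offset 0)")
    # Byte 4 is the metadata block header: low 7 bits are the block type,
    # and type 0 is STREAMINFO.
    if len(data) < 5 or data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # 4 bytes marker + 4 bytes block header + 18 bytes of block sizes,
    # frame sizes, sample rate, channels, bit depth and sample count.
    md5_offset = 26
    digest = data[md5_offset:md5_offset + 16]
    if len(digest) != 16:
        raise ValueError("file is truncated inside STREAMINFO")
    return digest.hex()
```

Reading the first 42 bytes of the file is enough, for example `flac_stored_md5(open(path, "rb").read(42))`. A result consisting only of zeros means the encoder did not store a hash.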
What follows is a fairly reliable way to compute a (non-perceptual) hash from the binary data stored in an MP3 file.
1) Approach #1
2) Approach #2
3) Xing and Lame tags
4) Resync
5) Reliability of calculation
Approach #1. Remove the unnecessary
The idea is that if you remove everything unnecessary (the tags) from the file, only the necessary part remains: the audio data, from which the hash can be computed.
The structure of an MP3 file:
- ID3v2 tag
- MPEG frames (the actual audio data)
- Lyrics3 tag
- APE tag
- ID3v1 tag (at the very end)
(All tags are optional.)
ID3v2, unlike its predecessor, sits at the beginning of the file, which lets a client read the meta-information right away when, for example, the file is streamed over a network. It starts with the three ASCII characters "ID3", followed by the version, the flags, and the encoded tag length:
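 # buf holds the first bytes of the file as a Python 3 bytes object
 if buf[0:3] == b'ID3':
     # 10-byte header, plus a 10-byte footer when flag bit 0x10 is set
     id3_v2_len = 20 if buf[5] & 0x10 else 10
     # the tag size is stored with 7 significant bits per byte
     id3_v2_len += ((buf[6] * 128 + buf[7]) * 128 + buf[8]) * 128 + buf[9]
     audio_start = id3_v2_len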
Next come the audio frames. You can spot where they begin by opening the file in the Windows-1251 encoding and looking for the characters "яы" (the bytes 0xFF 0xFB, with which a typical frame header starts).
Now let's work from the end. ID3v1 is recognized as a 128-byte block at the end of the file that starts with the ASCII string "TAG". Searching backwards from the end for "LYRICSBEGIN" finds a Lyrics3 tag, and "APETAGEX" marks an APEv2 tag.
If you cut all of that out, only the audio data should remain. This approach is used by the Mp3tag program (mp3tag.de), by a bunch of private scripts, and by tagging libraries, a significant share of which handle only ID3, which of course gets in the way.
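As a rough sketch of this "cut out the tags" approach (not the code of any of the programs mentioned), the following hashes whatever lies between a leading ID3v2 tag and a trailing ID3v1 and/or APEv2 tag. The function names are invented for the example, Lyrics3 is deliberately left out, and well-formed tags are assumed.

```python
import hashlib

def id3v2_length(data: bytes) -> int:
    """Size in bytes of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for b in data[6:10]:              # 7 significant bits per byte
        size = size * 128 + (b & 0x7F)
    footer = 10 if data[5] & 0x10 else 0
    return 10 + size + footer

def trailing_tags_length(data: bytes) -> int:
    """Combined size of a trailing ID3v1 tag and an APEv2 tag before it."""
    end = len(data)
    if end >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    # An APEv2 tag ends with a 32-byte footer that starts with "APETAGEX".
    if end >= 32 and data[end - 32:end - 24] == b"APETAGEX":
        size = int.from_bytes(data[end - 20:end - 16], "little")  # items + footer
        flags = int.from_bytes(data[end - 12:end - 8], "little")
        if flags & 0x80000000:        # the tag also has a 32-byte header
            size += 32
        if size <= end:
            end -= size
    return len(data) - end

def md5_without_tags(data: bytes) -> str:
    """MD5 of the bytes left after cutting the recognized tags."""
    start = id3v2_length(data)
    stop = len(data) - trailing_tags_length(data)
    return hashlib.md5(data[start:stop]).hexdigest()
```

Any tag that is damaged or placed unexpectedly stays inside the hashed range, so the result depends on the tags being exactly where the code expects them.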
The trouble is that tags are often broken, written over one another, and so on. With this approach, heaps of garbage get treated as audio data, so the file yields one hash now and a different one after its tags change, which is unacceptable.
In the end I had to throw away the program I had written this way after it collided with reality.
Approach #2. Keep what is needed
MP3 players do the opposite: they pick out what interests them, the MPEG frames, and skip everything that does not look like a frame. They do it very successfully, too: "bad" files usually play without audible glitches. It makes sense to do the same.
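To make the idea concrete, here is a simplified sketch of frame picking, limited to Layer III. It is not how foobar2000, MPlayer, or libmpg123 are implemented; the names `frame_length` and `md5_of_frames` are invented for the example. A real decoder also checks that a candidate header is followed by another consistent header, which this sketch does not, so tag bytes that happen to look like a frame header would be hashed as audio.

```python
import hashlib

# Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field.
BITRATES_V1 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
BITRATES_V2 = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]
# Sample rates in Hz, keyed by the 2-bit version field (value 1 is reserved).
SAMPLE_RATES = {3: [44100, 48000, 32000],   # MPEG-1
                2: [22050, 24000, 16000],   # MPEG-2
                0: [11025, 12000, 8000]}    # MPEG-2.5

def frame_length(header: bytes) -> int:
    """Length of the Layer III frame with this 4-byte header, or 0 if invalid."""
    if len(header) < 4 or header[0] != 0xFF or header[1] & 0xE0 != 0xE0:
        return 0
    version = (header[1] >> 3) & 3
    layer = (header[1] >> 1) & 3
    bitrate_idx = header[2] >> 4
    rate_idx = (header[2] >> 2) & 3
    padding = (header[2] >> 1) & 1
    # Reject reserved values, other layers, and free-format streams.
    if version == 1 or layer != 1 or bitrate_idx in (0, 15) or rate_idx == 3:
        return 0
    if version == 3:
        return 144000 * BITRATES_V1[bitrate_idx] // SAMPLE_RATES[3][rate_idx] + padding
    return 72000 * BITRATES_V2[bitrate_idx] // SAMPLE_RATES[version][rate_idx] + padding

def md5_of_frames(data: bytes) -> str:
    """MD5 over everything that parses as a complete Layer III frame."""
    digest = hashlib.md5()
    pos = 0
    while pos + 4 <= len(data):
        n = frame_length(data[pos:pos + 4])
        if n and pos + n <= len(data):
            digest.update(data[pos:pos + n])
            pos += n
        else:
            pos += 1                  # not a frame: skip one byte and retry
    return digest.hexdigest()
```

Because only recognized frames are fed to the hash, tags before, after, or between the frames do not affect the result as long as they contain nothing that looks like a frame header.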
This appears to be what the foobar2000 player does, and in my estimation it works perfectly, but of course there is no way to reuse it here.
MPlayer should do the same, but I have doubts, because in practice it sometimes stumbles over malformed tags and leaves them in the output. The command to clean a file with it is:
mplayer in.mp3 -dumpaudio -dumpfile out.mp3
There are also media libraries with MP3 decoders: mad, gstreamer, and libmpg123, which various players use in different ways. I did not try the first two, but libmpg123 worked great: its code has been tested over years and across many projects, and my own research and comparisons confirm its quality. Its doc/examples directory contains the source of a tiny program with the self-explanatory name extract_frames.c. The program takes an MP3 file as input and writes clean audio frames to the output.
libmpg123 builds under Cygwin and MinGW without any problems (although the MinGW build is somewhat buggy with stdin/stdout, so I had to patch the source to open the files in binary mode myself). I changed the program slightly so that it outputs an MD5 right away instead of frames, and made a couple of other changes described below. The source code, for those interested:
dl.dropbox.com/u/1883230/my/habr/mp3hash.zip
Xing and Lame tags
But audio frames can also store the very meta-information we are so keen to get rid of: the Xing and LAME tags, which encode extra information used to speed up seeking in a VBR stream, as well as the parameters used for encoding. In principle Xing and LAME could be left in, since few people can or will change them, but if you happen to run "Utilities / Fix VBR MP3 header" in foobar2000, the file's hash changes. So it is better to discard this metadata. libmpg123 can skip these tags during hashing if it is passed the appropriate parameter.
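Separately from the libmpg123 setting, such a frame can also be recognized by hand. The sketch below is only an illustration with an invented function name: it assumes a Layer III frame without CRC and looks for the "Xing" or "Info" marker right after the side information, whose size depends on the MPEG version and the channel mode.

```python
def is_info_frame(frame: bytes) -> bool:
    """True if a Layer III frame carries a 'Xing' or 'Info' marker."""
    if len(frame) < 4:
        return False
    mpeg1 = (frame[1] >> 3) & 3 == 3      # version bits 11 mean MPEG-1
    mono = (frame[3] >> 6) & 3 == 3       # channel mode 11 means mono
    # Size of the side information that follows the 4-byte header.
    if mpeg1:
        side_info = 17 if mono else 32
    else:
        side_info = 9 if mono else 17
    offset = 4 + side_info
    return frame[offset:offset + 4] in (b"Xing", b"Info")
```

A hashing loop could call this on the first frame it finds and leave that frame out of the hash when the check succeeds.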
 
Resync
It also helped to remove the resync limit. Otherwise the program "stumbles" when it goes a long stretch (4 KB) without encountering audio frames, which happens with files that have, for example, a large image inside ID3v2. In my version of the program the hash comes out the same either way, but the error flag that gets raised spoils everything: you can no longer be sure the result was obtained without errors. With this parameter, everything is fine:
 mpg123_param(m, MPG123_RESYNC_LIMIT, -1, 0.0) 
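The effect of such a limit can be shown with a toy model. This is not libmpg123's implementation; the function name and the default of 4096 bytes are assumptions chosen to mirror the 4 KB figure above. As with the C call, a negative limit means "keep searching to the end".

```python
def next_sync(data: bytes, pos: int, resync_limit: int = 4096) -> int:
    """Offset of the next byte pair that looks like a frame sync, or -1.

    Only resync_limit bytes starting at pos are examined; a negative
    resync_limit removes the restriction.
    """
    stop = len(data) - 1
    if resync_limit >= 0:
        stop = min(stop, pos + resync_limit)
    for i in range(pos, stop):
        # Frame sync: 11 set bits (0xFF followed by a byte with 111xxxxx).
        if data[i] == 0xFF and data[i + 1] & 0xE0 == 0xE0:
            return i
    return -1
```

With 10000 bytes of non-audio data in front of the first frame, the default call gives up and returns -1, while `next_sync(data, 0, -1)` finds the frame.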
Reliability of calculation
In my humble opinion, foobar2000 handles this (getting rid of meta-information) perfectly. The patched extract_frames.c fails on rare files, but after the "Rebuild stream" operation in foobar2000, 95 cases out of 100 are then computed correctly (in agreement with foobar2000). MPlayer does somewhat worse: it almost always agrees with extract_frames (in the mode that keeps LAME/Xing, of course), but, as I already wrote, it sometimes trips over garbage tags. Further down still are the various taggers, which require reasonably well-formed tags and are hardly suitable for hashing when more robust alternatives exist.
Overall, after one major failure and a struggle with a couple of details, I am personally satisfied with this algorithm, having tested it on a bunch of files.