
WAVE and JPEG media compression / storage methods, part 1

Hello! My first series of articles focuses on methods of compressing and storing images and sound, such as JPEG (images) and WAVE (sound), along with example programs that use these formats (.jpg, .wav) in practice. In this part we will look at WAVE.


History


WAVE (Waveform Audio File Format) is a container file format for storing an audio stream recording. This container is typically used to store uncompressed sound in pulse-code modulation (PCM). (Taken from Wikipedia)

It was introduced in 1991, together with the RIFF container format, by Microsoft and IBM (the leading IT companies of the time).


File structure


The file consists of a header followed by the data itself; there is no footer. The header takes up 44 bytes in total.
The header holds the settings a sound card needs: the number of bits per sample, the sampling rate, the bit depth of the sound, and so on. (All numeric values in the table below are stored in little-endian byte order.)
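To make the byte-order remark concrete, here is a small illustration of my own using Python's struct module: the numeric fields are stored little-endian, while the four-byte ASCII identifiers spell readable strings when written out big-endian.

from struct import pack

pack('<L', 16)          # b'\x10\x00\x00\x00' - how subchunk1Size = 16 looks in little-endian
pack('>L', 0x52494646)  # b'RIFF'             - chunkId written big-endian spells "RIFF"
pack('>L', 0x64617461)  # b'data'             - subchunk2Id written big-endian spells "data"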


| Block name | Size (bytes) | Description / Purpose | Value (fixed for some blocks) |
|---|---|---|---|
| chunkId | 4 | Identifies the file as a media container | 0x52494646 in big-endian ("RIFF") |
| chunkSize | 4 | Size of the entire file excluding chunkId and chunkSize | FILE_SIZE - 8 |
| format | 4 | RIFF type definition | 0x57415645 in big-endian ("WAVE") |
| subchunk1Id | 4 | Continuation of format (and a way to make the file take up more space) | 0x666d7420 in big-endian ("fmt ") |
| subchunk1Size | 4 | Size of the rest of this header chunk (in bytes) | 16 by default (for the case without audio stream compression) |
| audioFormat | 2 | Audio format (depends on the compression method and audio data structure) | 1 (for PCM, which we are considering) |
| numChannels | 2 | Number of channels | 1 or 2; we use 2 channels (stereo) in the example below (3/4/5/6/7... denote specific track layouts, e.g. 4 for quadraphonic sound) |
| sampleRate | 4 | Sampling rate of the sound (in hertz) | The higher, the better the sound, but the more memory a track of the same length needs; the recommended value is 48000 (the most acceptable sound quality) |
| byteRate | 4 | Number of bytes per second | sampleRate * numChannels * bitsPerSample / 8 |
| blockAlign | 2 | Number of bytes per sample (all channels) | numChannels * bitsPerSample / 8 |
| bitsPerSample | 2 | Number of bits per sample (bit depth) | Any multiple of 8. The higher, the better and heavier the audio; above 32 bits a person hears no difference |
| subchunk2Id | 4 | Marks the start of the data area (other header chunks may precede it depending on audioFormat) | 0x64617461 in big-endian ("data") |
| subchunk2Size | 4 | Size of the data area | data size in bytes |
| data | byteRate * audio duration | The audio data itself | |
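As a sanity check of the layout above, here is a minimal sketch of my own that reads the 44-byte header of an uncompressed PCM file back with struct.unpack and verifies the byteRate relation; the file name noise.wav is just a placeholder:

from struct import unpack

with open('noise.wav', 'rb') as fh:          # any PCM .wav with a plain 44-byte header
    header = fh.read(44)

chunkId, chunkSize, riffFormat = unpack('<4sL4s', header[:12])
(subchunk1Id, subchunk1Size, audioFormat, numChannels,
 sampleRate, byteRate, blockAlign, bitsPerSample) = unpack('<4sLHHLLHH', header[12:36])
subchunk2Id, subchunk2Size = unpack('<4sL', header[36:44])

print(chunkId, riffFormat, audioFormat, numChannels, sampleRate, bitsPerSample)
assert byteRate == sampleRate * numChannels * bitsPerSample // 8   # relation from the table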

WAVE example


The table above translates easily into a C struct, but our language for today is Python. The simplest thing you can do with WAVE is a noise generator; for this task we need neither a high byteRate nor compression.
First, we import the necessary modules:


# WAV.py
from struct import pack    # packs Python values into C-style binary data
from os import urandom     # random bytes from /dev/urandom; a possible fallback:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])  # sz pseudo-random bytes without os.urandom
from sys import argv, exit  # command-line arguments and exit code

if len(argv) != 3:  # argv[0] is the script name, so we expect 3 entries in total
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

Next, we create all the fields from the table with their proper sizes. The non-constant values here depend only on numSamples (the number of samples): the more samples, the longer our noise will last.


numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'              # 16 (0x10): size of the rest of the fmt chunk
audioFormat = b'\x01\x00'                        # 1 = PCM, no compression
numChannels = b'\x02\x00'                        # 2 channels (stereo)
sampleRate = pack('<L', 1000)                    # 1000 Hz; noise does not need a high sample rate
bitsPerSample = b'\x20\x00'                      # 32 (0x20) bits per sample
byteRate = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign = b'\x08\x00'                         # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)   # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size
data = urandom(numSamples * 2 * 4)               # random bytes = white noise

All that remains is to write them out in the required order (as in the table):


with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header fields first, then the data

And that's it. To use the script, pass it the required command-line arguments:
python3 WAV.py [num of samples] [output]
num of samples - the number of samples to generate
output - the path to the output file
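As a quick worked example of my own (assuming the 2-channel, 32-bit, 1000 Hz settings hard-coded in the script), running

python3 WAV.py 5000 noise.wav

produces 5000 * 2 * 4 = 40000 bytes of audio data plus the 44-byte header, i.e. a 40044-byte file lasting 5000 / 1000 = 5 seconds.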


Here is a link to a test audio file with noise; to save space I lowered the bitsPerSample to 1 byte per sample and dropped the number of channels to 1 (with a 32-bit uncompressed stereo audio stream at 64 kb/s I got an 80 MB .wav file, versus only 10 MB this way): https://instaud.io/3Dcy


The entire code (WAV.py) (the variable values are duplicated in several places; this is just a sketch):


# WAV.py
from struct import pack    # packs Python values into C-style binary data
from os import urandom     # random bytes from /dev/urandom; a possible fallback:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])  # sz pseudo-random bytes without os.urandom
from sys import argv, exit  # command-line arguments and exit code

if len(argv) != 3:  # argv[0] is the script name, so we expect 3 entries in total
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'              # 16 (0x10): size of the rest of the fmt chunk
audioFormat = b'\x01\x00'                        # 1 = PCM, no compression
numChannels = b'\x02\x00'                        # 2 channels (stereo)
sampleRate = pack('<L', 1000)                    # 1000 Hz; noise does not need a high sample rate
bitsPerSample = b'\x20\x00'                      # 32 (0x20) bits per sample
byteRate = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign = b'\x08\x00'                         # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)   # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size
data = urandom(numSamples * 2 * 4)               # random bytes = white noise

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header fields first, then the data
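To check that the generated file really is a valid WAV, you can open it with Python's built-in wave module (a small sketch of my own; noise.wav stands for whatever output path you passed to the script):

import wave

with wave.open('noise.wav', 'rb') as w:
    print('channels:', w.getnchannels())              # 2
    print('sample width (bytes):', w.getsampwidth())  # 4, i.e. 32-bit samples
    print('sample rate (Hz):', w.getframerate())      # 1000
    print('frames:', w.getnframes())                  # equals the number of samples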

Summary


So now you know a little more about digital sound and how it is stored. In this post we did not use compression (audioFormat), since covering each of the popular compression formats would take about ten articles of its own. I hope you have learned something new and that it will help you in your future projects.
Thanks for reading!


Sources

WAV file structure
WAV - Wikipedia



Source: https://habr.com/ru/post/450774/

