
WAVE and JPEG media compression / storage methods, part 1

Hello! My first series of articles focuses on methods of compressing and storing images and sound, such as JPEG (images) and WAVE (sound), along with example programs that use these formats (.jpg, .wav) in practice. In this part we will look at WAVE.


History


WAVE (Waveform Audio File Format) is a container file format for storing an audio stream recording. This container is typically used to store uncompressed sound in pulse-code modulation (PCM). (Taken from Wikipedia)

It was introduced in 1991, together with the RIFF container format, by Microsoft and IBM (the leading IT companies of the time).


File structure


The file consists of a header followed by the data itself; there is no footer. The header takes up 44 bytes in total.
The header holds the settings a sound card needs: the number of bits per sample, the sampling rate, the bit depth of the sound, and so on. (All numeric values in the table below are stored in little-endian byte order.)
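To make the byte-order remark concrete, here is a small illustration of my own using Python's struct module: the numeric fields are stored little-endian, while the four-byte ASCII identifiers spell readable strings when written out big-endian.

from struct import pack

pack('<L', 16)          # b'\x10\x00\x00\x00' - how subchunk1Size = 16 looks in little-endian
pack('>L', 0x52494646)  # b'RIFF'             - chunkId written big-endian spells "RIFF"
pack('>L', 0x64617461)  # b'data'             - subchunk2Id written big-endian spells "data"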


| Block name | Size (bytes) | Description / Purpose | Value (fixed for some blocks) |
|---|---|---|---|
| chunkId | 4 | Identifies the file as a media container | 0x52494646 in big-endian ("RIFF") |
| chunkSize | 4 | Size of the entire file excluding chunkId and chunkSize | FILE_SIZE - 8 |
| format | 4 | RIFF type definition | 0x57415645 in big-endian ("WAVE") |
| subchunk1Id | 4 | Continuation of format (and a way to make the file take up more space) | 0x666d7420 in big-endian ("fmt ") |
| subchunk1Size | 4 | Size of the rest of this header chunk (in bytes) | 16 by default (for the case without audio stream compression) |
| audioFormat | 2 | Audio format (depends on the compression method and audio data structure) | 1 (for PCM, which we are considering) |
| numChannels | 2 | Number of channels | 1 or 2; we use 2 channels (stereo) in the example below (3/4/5/6/7... denote specific track layouts, e.g. 4 for quadraphonic sound) |
| sampleRate | 4 | Sampling rate of the sound (in hertz) | The higher, the better the sound, but the more memory a track of the same length needs; the recommended value is 48000 (the most acceptable sound quality) |
| byteRate | 4 | Number of bytes per second | sampleRate * numChannels * bitsPerSample / 8 |
| blockAlign | 2 | Number of bytes per sample (all channels) | numChannels * bitsPerSample / 8 |
| bitsPerSample | 2 | Number of bits per sample (bit depth) | Any multiple of 8. The higher, the better and heavier the audio; above 32 bits a person hears no difference |
| subchunk2Id | 4 | Marks the start of the data area (other header chunks may precede it depending on audioFormat) | 0x64617461 in big-endian ("data") |
| subchunk2Size | 4 | Size of the data area | data size in bytes |
| data | byteRate * audio duration | The audio data itself | |
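As a sanity check of the layout above, here is a minimal sketch of my own that reads the 44-byte header of an uncompressed PCM file back with struct.unpack and verifies the byteRate relation; the file name noise.wav is just a placeholder:

from struct import unpack

with open('noise.wav', 'rb') as fh:          # any PCM .wav with a plain 44-byte header
    header = fh.read(44)

chunkId, chunkSize, riffFormat = unpack('<4sL4s', header[:12])
(subchunk1Id, subchunk1Size, audioFormat, numChannels,
 sampleRate, byteRate, blockAlign, bitsPerSample) = unpack('<4sLHHLLHH', header[12:36])
subchunk2Id, subchunk2Size = unpack('<4sL', header[36:44])

print(chunkId, riffFormat, audioFormat, numChannels, sampleRate, bitsPerSample)
assert byteRate == sampleRate * numChannels * bitsPerSample // 8   # relation from the table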

WAVE example


The table above translates easily into a C struct, but our language for today is Python. The simplest thing you can do with WAVE is a noise generator; for this task we need neither a high byteRate nor compression.
First, we import the necessary modules:


# WAV.py
from struct import pack    # packs Python values into C-style binary data
from os import urandom     # random bytes from /dev/urandom; a possible fallback:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])  # sz pseudo-random bytes without os.urandom
from sys import argv, exit  # command-line arguments and exit code

if len(argv) != 3:  # argv[0] is the script name, so we expect 3 entries in total
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

Next, we create all the fields from the table with their proper sizes. The non-constant values here depend only on numSamples (the number of samples): the more samples, the longer our noise will last.


numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'              # 16 (0x10): size of the rest of the fmt chunk
audioFormat = b'\x01\x00'                        # 1 = PCM, no compression
numChannels = b'\x02\x00'                        # 2 channels (stereo)
sampleRate = pack('<L', 1000)                    # 1000 Hz; noise does not need a high sample rate
bitsPerSample = b'\x20\x00'                      # 32 (0x20) bits per sample
byteRate = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign = b'\x08\x00'                         # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)   # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size
data = urandom(numSamples * 2 * 4)               # random bytes = white noise

All that remains is to write them out in the required order (as in the table):


with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header fields first, then the data

And that's it. To use the script, pass it the required command-line arguments:
python3 WAV.py [num of samples] [output]
num of samples - the number of samples to generate
output - the path to the output file
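As a quick worked example of my own (assuming the 2-channel, 32-bit, 1000 Hz settings hard-coded in the script), running

python3 WAV.py 5000 noise.wav

produces 5000 * 2 * 4 = 40000 bytes of audio data plus the 44-byte header, i.e. a 40044-byte file lasting 5000 / 1000 = 5 seconds.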


Here is a link to a test audio file with noise; to save space I lowered the bitsPerSample to 1 byte per sample and dropped the number of channels to 1 (with a 32-bit uncompressed stereo audio stream at 64 kb/s I got an 80 MB .wav file, versus only 10 MB this way): https://instaud.io/3Dcy


The entire code (WAV.py) (the variable values are duplicated in several places; this is just a sketch):


# WAV.py
from struct import pack    # packs Python values into C-style binary data
from os import urandom     # random bytes from /dev/urandom; a possible fallback:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])  # sz pseudo-random bytes without os.urandom
from sys import argv, exit  # command-line arguments and exit code

if len(argv) != 3:  # argv[0] is the script name, so we expect 3 entries in total
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'              # 16 (0x10): size of the rest of the fmt chunk
audioFormat = b'\x01\x00'                        # 1 = PCM, no compression
numChannels = b'\x02\x00'                        # 2 channels (stereo)
sampleRate = pack('<L', 1000)                    # 1000 Hz; noise does not need a high sample rate
bitsPerSample = b'\x20\x00'                      # 32 (0x20) bits per sample
byteRate = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign = b'\x08\x00'                         # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)   # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size
data = urandom(numSamples * 2 * 4)               # random bytes = white noise

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header fields first, then the data
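To check that the generated file really is a valid WAV, you can open it with Python's built-in wave module (a small sketch of my own; noise.wav stands for whatever output path you passed to the script):

import wave

with wave.open('noise.wav', 'rb') as w:
    print('channels:', w.getnchannels())              # 2
    print('sample width (bytes):', w.getsampwidth())  # 4, i.e. 32-bit samples
    print('sample rate (Hz):', w.getframerate())      # 1000
    print('frames:', w.getnframes())                  # equals the number of samples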

Summary


So now you know a little more about digital sound and how it is stored. In this post we did not use compression (audioFormat), since covering each of the popular compression formats would take about ten articles of its own. I hope you have learned something new and that it will help you in your future projects.
Thanks for reading!


Sources

WAV file structure
WAV - Wikipedia



Source: https://habr.com/ru/post/450774/

