Hello! My first series of articles will focus on methods of compressing and storing images and sound, such as JPEG (images) and WAVE (sound), along with example programs that use these formats (.jpg, .wav) in practice. In this part we will look at WAVE.
WAVE (Waveform Audio File Format) is a container file format for storing a recorded audio stream. The container is typically used to store uncompressed sound in pulse-code modulation (PCM). (Taken from Wikipedia)
It was introduced and published in 1991, together with RIFF, by Microsoft and IBM (the leading IT companies of the time).
The file consists of a header followed by the data itself; there is no footer. The header takes up 44 bytes in total.
The header contains the settings a sound card needs: the number of bits per sample, the sampling rate, the bit depth of the sound, and so on. (All numeric values in the table below are stored in Little-Endian order.)
Block name | Block size (bytes) | Description / purpose | Value (fixed for some blocks) |
---|---|---|---|
chunkId | 4 | Identifies the file as a media container | 0x52494646 in Big-Endian ("RIFF") |
chunkSize | 4 | Size of the entire file excluding chunkId and chunkSize | FILE_SIZE - 8 |
format | 4 | RIFF type identifier | 0x57415645 in Big-Endian ("WAVE") |
subchunk1Id | 4 | Marks the start of the format subchunk | 0x666d7420 in Big-Endian ("fmt ") |
subchunk1Size | 4 | Size of the rest of the format subchunk (in bytes) | 16 by default (for the case without audio compression) |
audioFormat | 2 | Audio format (depends on the compression method and the structure of the audio data) | 1 (for PCM, which we are considering) |
numChannels | 2 | Number of channels | 1/2; we take 1 channel (3/4/5/6/7... denote specific track layouts, e.g. 4 for quadraphonic sound) |
sampleRate | 4 | Sampling rate of the sound (in Hertz) | The higher, the better the sound, but the more memory an audio track of the same length requires; the recommended value is 48000 (the most acceptable sound quality) |
byteRate | 4 | Number of bytes per second of audio | sampleRate * numChannels * bitsPerSample / 8 (see below) |
blockAlign | 2 | Number of bytes per sample, all channels included | numChannels * bitsPerSample / 8 |
bitsPerSample | 2 | Number of bits per sample (bit depth) | Any multiple of 8. The higher, the better and heavier the audio; above 32 bits a person hears no difference. |
subchunk2Id | 4 | Marks the beginning of the data area (other header elements may precede it depending on audioFormat) | 0x64617461 in Big-Endian ("data") |
subchunk2Size | 4 | Size of the data area | Data size in bytes |
data | byteRate * audio duration | Audio data | - |
The previous table can easily be translated into a C structure, but our language for today is Python. The simplest thing you can do with WAVE is a noise generator; for that task we need neither a high byteRate nor compression.
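To make that mapping concrete, here is a minimal sketch (not from the original article) that expresses the whole 44-byte header as a single `struct` format string; the helper name `build_header` and its default parameters are purely illustrative:

```python
# A sketch of the 44-byte PCM WAVE header as one struct layout.
# Field order and sizes follow the table above; '<' forces little-endian with no padding.
from struct import pack, unpack, calcsize

HEADER_FMT = '<4sI4s4sIHHIIHH4sI'
assert calcsize(HEADER_FMT) == 44

def build_header(num_samples, num_channels=1, sample_rate=48000, bits_per_sample=16):
    """Pack a header for uncompressed PCM audio (illustrative helper)."""
    block_align = num_channels * bits_per_sample // 8
    data_size = num_samples * block_align
    return pack(HEADER_FMT,
                b'RIFF', 36 + data_size, b'WAVE',        # chunkId, chunkSize, format
                b'fmt ', 16, 1, num_channels,            # subchunk1Id, subchunk1Size, audioFormat, numChannels
                sample_rate, sample_rate * block_align,  # sampleRate, byteRate
                block_align, bits_per_sample,            # blockAlign, bitsPerSample
                b'data', data_size)                      # subchunk2Id, subchunk2Size

print(unpack(HEADER_FMT, build_header(48000)))           # one second of 48 kHz mono, 16-bit
```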
First, we import the necessary modules:
```python
# WAV.py
from struct import pack      # packs Python values into C-style binary data
from os import urandom       # random bytes from /dev/urandom; fallback if it is unavailable:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])
from sys import argv, exit

if len(argv) != 3:           # script name plus two arguments
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)
```
Next, we need to create all the variables from the table, in the sizes it specifies. The non-constant values here depend only on numSamples (the number of samples): the more samples there are, the longer our noise will last.
```python
numSamples = int(argv[1])
output_path = argv[2]

chunkId       = b'RIFF'
Format        = b'WAVE'
subchunk1ID   = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'                   # 16
audioFormat   = b'\x01\x00'                           # 1 = PCM
numChannels   = b'\x02\x00'                           # 2 channels (stereo)
sampleRate    = pack('<L', 1000)                      # 1000 Hz is enough for noise
bitsPerSample = b'\x20\x00'                           # 32
byteRate      = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign    = b'\x08\x00'                           # numChannels * bitsPerSample / 8
subchunk2ID   = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)        # numSamples * numChannels * bitsPerSample / 8
chunkSize     = pack('<L', 36 + numSamples * 2 * 4)   # 36 + subchunk2Size
data          = urandom(numSamples * 2 * 4)           # random bytes as the noise; must match subchunk2Size
```
It remains only to write them in the required sequence (as in the table):
```python
with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header, then the audio data
```
And that's it. To use the script, pass the required command line arguments:

```
python3 WAV.py [num of samples] [output]
```
- num of samples - the number of samples to generate
- output - the path to the output file
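To double-check the result, the standard library's wave module can read the header back. A quick sanity check (not part of the original script; 'noise.wav' is just a placeholder file name):

```python
# Sanity check: read back the header fields of a generated file.
import wave

with wave.open('noise.wav', 'rb') as wav:      # placeholder path for a file made by WAV.py
    print('channels:        ', wav.getnchannels())
    print('sample rate (Hz):', wav.getframerate())
    print('bytes per sample:', wav.getsampwidth())
    print('frames (samples):', wav.getnframes())
```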
Here is a link to a test audio file with the noise. To save memory, I lowered the BPS to 1 byte per sample and dropped the number of channels to 1 (with a 32-bit uncompressed stereo audio stream at 64 kB/s I got an 80 MB plain .wav file, versus only 10 MB this way): https://instaud.io/3Dcy
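To see where numbers like that come from: an uncompressed file is essentially byteRate multiplied by the duration, plus the 44-byte header. A rough estimate (the parameters and the 10-minute duration below are only an example, not the settings of the linked file):

```python
# Back-of-the-envelope size of an uncompressed PCM .wav file.
def wav_size_bytes(sample_rate, num_channels, bits_per_sample, duration_s):
    byte_rate = sample_rate * num_channels * bits_per_sample // 8
    return 44 + byte_rate * duration_s        # header + raw samples

print(wav_size_bytes(48000, 2, 32, 600) / 1e6, 'MB')   # 10 min of 32-bit stereo ~ 230 MB
print(wav_size_bytes(48000, 1, 8, 600) / 1e6, 'MB')    # 10 min of 8-bit mono   ~ 29 MB
```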
The entire code (WAV.py); it duplicates a lot of variable values, this is just a sketch:
```python
# WAV.py
from struct import pack      # packs Python values into C-style binary data
from os import urandom       # random bytes from /dev/urandom; fallback if it is unavailable:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])
from sys import argv, exit

if len(argv) != 3:           # script name plus two arguments
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

numSamples = int(argv[1])
output_path = argv[2]

chunkId       = b'RIFF'
Format        = b'WAVE'
subchunk1ID   = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'                   # 16
audioFormat   = b'\x01\x00'                           # 1 = PCM
numChannels   = b'\x02\x00'                           # 2 channels (stereo)
sampleRate    = pack('<L', 1000)                      # 1000 Hz is enough for noise
bitsPerSample = b'\x20\x00'                           # 32
byteRate      = pack('<L', 1000 * 2 * 4)              # sampleRate * numChannels * bitsPerSample / 8 (32-bit sound)
blockAlign    = b'\x08\x00'                           # numChannels * bitsPerSample / 8
subchunk2ID   = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)        # numSamples * numChannels * bitsPerSample / 8
chunkSize     = pack('<L', 36 + numSamples * 2 * 4)   # 36 + subchunk2Size
data          = urandom(numSamples * 2 * 4)           # random bytes as the noise; must match subchunk2Size

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID + subchunk1Size +
             audioFormat + numChannels + sampleRate + byteRate + blockAlign +
             bitsPerSample + subchunk2ID + subchunk2Size + data)  # header, then the audio data
```
So you have learned a little more about digital sound and how it is stored. In this post we did not use compression (audioFormat stayed plain PCM); reviewing each of the popular compressed formats would take about ten articles of its own. I hope you have learned something new, and that it will help you in future projects.
Thank you!
Source: https://habr.com/ru/post/450774/