One day, while working on a project, I came across an interesting puzzle.
Initial conditions:
- a device that recorded video with an audio track from a webcam via ffmpeg
- recordings about a minute long
- the noise reduction had to be configured once and then work autonomously
And of course from that moment the brainstorming began.
First, I decided to understand what noise is and where it comes from.
"Noise - a set of aperiodic sounds of varying intensity and frequency" Wiki
Noises, at least, are constant (crackle, buzz), and intermittent (car, drop, music, shock).
The first type turned out to be much simpler to deal with: many programs and algorithms specialize in periodic and constant noise, because all you have to do is capture a noise profile (for example, from the beginning of the recording) and use it to clean everything else. My noise, however, was chaotic: it had no common periodic profile and could not have one. After a heap of time killed on googling and reading audiophile forums, I found three candidate solutions:
- Spectral noise subtraction
- Active noise cancellation
- Restricting the signal to the frequency range of the human voice
Spectral subtraction
Noise in a real signal (recording) has several properties:
- it is aperiodic
- it has a finite spectrum
Whereas a single profile is enough to wipe periodic noise from a whole track, here the track is split into many short segments, each of which is treated as an independent signal with its own individually estimated noise mask. This yields a set of masks to work with; the idea is similar in spirit to the scheme of the discrete Fourier transform. The idea could be developed further, but I did not find a single software product working under such a scheme, so this option was dropped.
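To make the idea concrete, here is a minimal numpy sketch of segment-wise spectral subtraction (not the scheme of any particular product — frame length, hop, and the assumption that the leading frames are noise-only are all illustrative choices):

```python
import numpy as np

def spectral_subtraction(signal, frame_len=1024, hop=512, noise_frames=10):
    """Split the track into overlapping segments, estimate a noise
    magnitude profile from the leading (assumed noise-only) frames,
    and subtract that profile from every segment's spectrum."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Noise profile: average magnitude over the leading frames
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the profile; clamp negative magnitudes to zero
    clean_mag = np.maximum(mag - noise_mag, 0.0)

    # Overlap-add resynthesis with the original phases
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                n=frame_len, axis=1)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean_frames[i]
        norm[i * hop:i * hop + frame_len] += window
    norm[norm < 1e-8] = 1.0
    return out / norm
```

A real implementation would estimate the noise profile adaptively per segment rather than from a fixed leading stretch, but the structure — many masks instead of one — is the same.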
Active noise cancellation
The essence of active noise cancellation is to capture a synchronized signal containing only the noise (without the vocals, voice, or any other sound we are interested in). This "pure noise" signal is then inverted into so-called antiphase and mixed onto the noisy recording. As a result, noise and anti-noise cancel each other out, and the output is a clean recording. What could be simpler? But in my setup there was no way to add a second microphone to record the background noise without the content I cared about, so this option went into oblivion.
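The principle fits in a few lines of numpy. This is a toy model with an ideal, perfectly synchronized reference microphone; real ANC must also estimate and compensate the acoustic path between the microphones, which is exactly what my setup could not provide:

```python
import numpy as np

# Toy model: a "voice" we want to keep, plus additive hum
# that a second, noise-only microphone also picks up.
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)        # signal of interest
noise = 0.5 * np.sin(2 * np.pi * 50 * t)   # background hum
noisy = voice + noise                      # main microphone
reference = noise                          # noise-only microphone

anti_noise = -reference                    # invert into antiphase
clean = noisy + anti_noise                 # noise and anti-noise cancel
```

With an ideal reference, `clean` equals `voice` exactly; in practice the cancellation is only as good as the reference signal.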
Restricting the signal to human-voice frequencies
Again, after shoveling through tons of information, I settled on 300 Hz to 3000 Hz as the range of the human voice.
This option turned out to be the most feasible, although it did not guarantee an excellent result.
I checked the approach in Nero Wave Editor by limiting the frequency band. The background noise, of course, did not disappear, but it lost weight and became monotonous and muffled, while the human voice remained unchanged and was already much easier to hear.
So I chose the last option. The result may not be perfect, but this is the simplest path to autonomous noise reduction, requiring virtually no manual effort. The only caveat is that the recording as a whole became a little quieter, which is not a problem at all.
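The band limit itself is easy to sketch. Below is a crude FFT-based band-pass in numpy that zeroes every frequency bin outside the 300–3000 Hz range (the brick-wall cutoff is an illustrative simplification; a real filter would roll off gradually):

```python
import numpy as np

def bandpass(signal, sr, low=300.0, high=3000.0):
    """Crude FFT band-pass: zero out every frequency bin outside
    the human-voice range, leaving the 300-3000 Hz band untouched."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

In a recording pipeline like the one described here, the same limit can be applied directly by ffmpeg's own audio filters, e.g. `-af "highpass=f=300,lowpass=f=3000"`, with no post-processing step at all.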
P.S. All of this reasoning is driven by the requirement that noise reduction work autonomously. Naturally, if you process files manually, there are many more possibilities.