
Using the Audio API to build a vocoder

In the previous article we got acquainted with the basic capabilities of the Audio API and wrote a simple signal visualizer. Now it is time to dig deeper and try out more of the API's features. But we need a goal to aim for, and in this case the goal will be to mangle the incoming signal and its characteristics in every way we can. In other words, we will write a small vocoder.

Since the final code turned out to be quite large, the article covers only the fragments that are the most important and interesting from the Audio API point of view. The final result can, of course, be seen in the demo.


Selection of signal source


The Audio API supports three types of signal source:
  1. A source created from an audio tag
  2. An audio buffer
  3. An external audio stream (a microphone or any other incoming stream)

In the demo all three source types are implemented, along with the ability to switch between them. Here we will look at what is probably the most interesting of them: the external audio stream from a microphone.
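For reference, each source type maps to its own factory method on the audio context. A minimal sketch (here context is an already-created AudioContext, while audioEl, decodedBuffer and micStream are hypothetical placeholders):

 // 1. from an <audio> tag
 var tagSource = context.createMediaElementSource(audioEl);
 // 2. from an audio buffer (e.g. a decoded file)
 var bufferSource = context.createBufferSource();
 bufferSource.buffer = decodedBuffer;
 // 3. from an external stream (e.g. the microphone)
 var streamSource = context.createMediaStreamSource(micStream);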
To get at our source, we first need to obtain the user's permission and capture the audio stream. And no, we do not have to write a mountain of code for this; a single function called getUserMedia is enough (a minimal call is sketched right after the list). This magic function takes three arguments:
  1. The type of data we are requesting access to, an object of the form
    {video: true, audio: true}
  2. A capture callback, which receives the captured stream as its argument.
  3. An error callback for failures that occur during capture.
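Stripped of everything else, a call looks roughly like this (a sketch; the error handling is up to you):

 navigator.getUserMedia(
     { audio: true },              // we only need audio
     function (stream) {
         // do something with the captured stream
     },
     function (err) {
         console.error(err);       // capture failed or was denied
     }
 );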

So, taking the various browser prefixes into account, our initialization function looks like this:
 var d = document, w = window,
     context = null, dest = null, source = null;

 var init = function () {
     try {
         var audioContext = w.AudioContext || w.webkitAudioContext;
         navigator.getMedia = navigator.getUserMedia ||
                              navigator.webkitGetUserMedia ||
                              navigator.mozGetUserMedia ||
                              navigator.msGetUserMedia;
         // create the audio context
         context = new audioContext();
         // the destination node (speakers)
         dest = context.destination;
         var bufferLoader = new BufferLoader(context, ["effects/reverb.wav"], function (buffers) {
             navigator.getMedia({ audio: true }, function (stream) {
                 // bind the captured stream to our audio context
                 source = context.createMediaStreamSource(stream);
             }, function (e) {
                 alert(e);
             });
         });
         bufferLoader.load();
     } catch (e) {
         alert(e.message);
     }
 };


Let's look at what happens here. First we create the audioContext for our page (what that is was covered in the previous article), and then we meet a new entity, BufferLoader. Its job is to fetch external audio files via XHR2 and carefully store them in buffers. In our case we need it to load a single audio effect, which will be described below. This function is not part of the standard, so we will have to write it ourselves.
 // loads external audio files into buffers via XHR2
 var BufferLoader = function (context, urlList, callback) {
     this.context = context;
     this.urlList = urlList;
     this.onload = callback;
     this.bufferList = new Array();
     this.loadCount = 0;
 };

 BufferLoader.prototype.load = function () {
     for (var i = 0; i < this.urlList.length; ++i) {
         this.loadBuffer(this.urlList[i], i);
     }
 };

 BufferLoader.prototype.loadBuffer = function (url, index) {
     var request = new XMLHttpRequest();
     request.open("GET", url, true);
     request.responseType = "arraybuffer";
     var loader = this;
     request.onload = function () {
         loader.context.decodeAudioData(
             request.response,
             function (buffer) {
                 if (!buffer) {
                     alert('error decoding file data: ' + url);
                     return;
                 }
                 loader.bufferList[index] = buffer;
                 if (++loader.loadCount == loader.urlList.length) {
                     loader.onload(loader.bufferList);
                 }
             },
             function (error) {
                 console.error('decodeAudioData error', error);
             }
         );
     };
     request.onerror = function () {
         alert('BufferLoader: XHR error');
     };
     request.send();
 };

After the effect has loaded, we capture the audio stream and, if the user allows it, associate the captured signal with our audio context. For this we use the createMediaStreamSource function. Now the input signal is at our complete disposal and, believe me, we are going to have plenty of fun with it.
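Before adding any processing, a quick way to check that the capture works is to route the source straight to the output (a sketch; with a live microphone this will feed back, so headphones are advisable):

 // raw passthrough: microphone -> speakers
 source.connect(dest);
 // later, once the processing chain is ready:
 // source.disconnect();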

Signal processing and change


The time has come to write the function that will mercilessly mangle the input stream. For the first version we only need three node types, all created from the audio context: createGain (volume control), createConvolver (convolution of the signal with an impulse response, which gives us the echo effect loaded earlier) and createDynamicsCompressor (dynamic range compression).

Using these methods, let's write out our transformation function:
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // and send the result to the output
     sourceCompressor.connect(dest);
 };


All of the values, such as the gain or the threshold at which compression kicks in, can be wired to the user interface and changed in real time, as is done in the demo. After passing the signal through this function we get a slightly boosted version with an echo effect (as if you were talking with a bucket on your head, or inside a helmet). But we do not yet get a fundamentally new sound at the output, which means we move on. As the next step we will try to implement ring modulation.
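(As an aside on the real-time controls mentioned above: a rough sketch of such a binding, assuming a hypothetical range slider with id "gain" and that sourceGain is reachable from the UI code, might look like this.)

 // update the gain in real time as the slider moves
 document.getElementById('gain').addEventListener('input', function () {
     sourceGain.gain.value = parseFloat(this.value);
 });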

Ring modulation


Ring modulation is an audio effect that was wildly popular back in the day and was used to create the voices of all kinds of monsters and robots. The idea is that we take two signals: a carrier, which is a synthesized tone of arbitrary frequency, and a modulating signal (our input), and multiply them together. The result is a new signal full of distortion and metallic overtones.
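In sample terms the effect is simply a pointwise multiplication of the input by a sine carrier. A sketch of the idea outside the node graph (assuming x is a Float32Array of samples, fc is the carrier frequency and sampleRate is the context's sample rate):

 // y[n] = x[n] * sin(2 * PI * fc * n / sampleRate)
 function ringModulate(x, fc, sampleRate) {
     var y = new Float32Array(x.length);
     for (var n = 0; n < x.length; n++) {
         y[n] = x[n] * Math.sin(2 * Math.PI * fc * n / sampleRate);
     }
     return y;
 }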

Inside the node graph we can get the same multiplication with createOscillator (the carrier), createGain (whose gain parameter the carrier will drive) and the createBiquadFilter we will meet again below. That is enough to realize our plan, and the AudioModulation function turns into:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // and send the result to the output
     ringGain.connect(dest);
 };

 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };


Now that is more like it: after all this we get a fairly convincing "robot voice". But, as they say, you can never have too much of a good thing, so on top of all this magnificence we will add an equalizer for manually tuning individual frequency bands. We will build it with the already familiar createBiquadFilter, this time with the highshelf type (a shelving filter that boosts or attenuates everything above its cutoff frequency by the given gain).

Frequency filtering


First we create an array of settings from which the filters will be built:
 var filters = [
     { gain: 1,  frequency: 40 },
     { gain: 3,  frequency: 120 },
     // ... the remaining bands ...
     { gain: -2, frequency: 16000 }
 ];

The parameters in it are gain level and frequency. Now the function that creates filters:
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     for (var i = 0; i < ln; i++) {
         var loc = fil[i],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };

As a result, the transformation function takes the following form:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // equalizer
     var outFilters = this.setFilters(sourceCompressor);
     // and send the result to the output
     outFilters.connect(dest);
 };

 // ring modulation
 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };

 // equalizer
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     while (ln--) {
         var loc = fil[ln],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };


So now we have a full-fledged equalizer and can boost or attenuate any frequency band in the signal. If we were slackers we would stop here and, with a clear conscience, torment the microphone while playing with the settings, but we want more. So let's put the cherry on the cake and try to implement an effect known as a pitch shifter.

Changing the pitch


The essence of the effect is that a copy of the signal is added to it, shifted from the original pitch by some interval within two octaves up or down. It is a very fashionable effect and a proper implementation is devilishly complicated, so we will build, so to speak, a simplified version.
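For reference, the shift ratio used below (currentShiftRatio) relates to the musical interval in the usual equal-tempered way; a tiny helper to illustrate (not part of the demo code):

 // convert a shift in semitones (-24 .. +24, i.e. two octaves down or up) into a playback ratio
 function semitonesToRatio(semitones) {
     return Math.pow(2, semitones / 12);
 }
 // e.g. semitonesToRatio(-4.5) is roughly 0.77, close to the currentShiftRatio used below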
To start working on this effect we need an interface that hands us the raw signal data so that we can change it.
To build it we will borrow the windowing idea behind the short-time Fourier transform (the signal is chopped into overlapping, windowed frames) and use the createScriptProcessor method already familiar from the previous article. It takes three parameters: the buffer size (the size of the frame, or data window, taken from the signal per unit of time), numberOfInputChannels (the number of input channels) and numberOfOutputChannels (the number of output channels). Calling this method creates exactly the interface object we need. The resulting object has its own onaudioprocess event, which fires every time a new frame of data arrives from the signal. Putting it all together, the transformation of our signal looks like this:
 var currentGrainSize = 512;
 var currentOverLap = 0.50;
 var currentShiftRatio = 0.77;
 var node = context.createScriptProcessor(currentGrainSize, 1, 1);
 // precompute the Hann window used to smooth the edges of each grain
 node.grainWindow = hannWindow(currentGrainSize);
 // buffer that keeps the previous grain so that neighbouring grains can be overlapped
 node.buffer = new Float32Array(currentGrainSize * 2);

 node.onaudioprocess = function (event) {
     // input samples
     var input = event.inputBuffer.getChannelData(0);
     // output samples
     var output = event.outputBuffer.getChannelData(0),
         ln = input.length;
     for (var i = 0; i < ln; i++) {
         // apply the window to the incoming grain
         input[i] *= this.grainWindow[i];
         // shift the stored buffer by one grain
         this.buffer[i] = this.buffer[i + currentGrainSize];
         // and clear the second half
         this.buffer[i + currentGrainSize] = 0.0;
     }
     // resample the grain using the shift ratio
     var grainData = new Float32Array(currentGrainSize * 2);
     for (var i = 0, j = 0.0; i < currentGrainSize; i++, j += currentShiftRatio) {
         var index = Math.floor(j) % currentGrainSize;
         var a = input[index];
         var b = input[(index + 1) % currentGrainSize];
         grainData[i] += linearInterpolation(a, b, j % 1.0) * this.grainWindow[i];
     }
     // overlap-add the grains
     for (i = 0; i < currentGrainSize; i += Math.round(currentGrainSize * (1 - currentOverLap))) {
         for (j = 0; j <= currentGrainSize; j++) {
             this.buffer[i + j] += grainData[j];
         }
     }
     // write the accumulated result to the output
     for (i = 0; i < currentGrainSize; i++) {
         output[i] = this.buffer[i];
     }
 };
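For contrast, with all of the granular processing stripped out, the same createScriptProcessor interface reduces to a passthrough that simply copies input to output (a sketch, assuming a mono source):

 var passthrough = context.createScriptProcessor(1024, 1, 1);
 passthrough.onaudioprocess = function (event) {
     var input = event.inputBuffer.getChannelData(0);
     var output = event.outputBuffer.getChannelData(0);
     for (var i = 0; i < input.length; i++) {
         output[i] = input[i];
     }
 };
 // the node only processes audio while it is connected into the graph:
 // source.connect(passthrough); passthrough.connect(dest);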

Now, by playing with the shift and overlap parameters, we can get the effect of sped-up or slowed-down speech. For the calculations we also need two helpers: hannWindow (which computes the Hann window) and linearInterpolation (linear interpolation between two samples). The final version of our transformation looks like this:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this,
         currentGrainSize = 512,
         currentOverLap = 0.50,
         currentShiftRatio = 0.77,
         node = context.createScriptProcessor(currentGrainSize, 1, 1);
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // equalizer
     var outFilters = this.setFilters(sourceCompressor);
     // and send the result to the output
     outFilters.connect(dest);
     // note: the ScriptProcessor only fires onaudioprocess once it is wired into the
     // graph (e.g. between outFilters and dest)

     // precompute the Hann window used to smooth the edges of each grain
     node.grainWindow = this.hannWindow(currentGrainSize);
     // buffer that keeps the previous grain so that neighbouring grains can be overlapped
     node.buffer = new Float32Array(currentGrainSize * 2);
     node.onaudioprocess = function (event) {
         // input samples
         var input = event.inputBuffer.getChannelData(0);
         // output samples
         var output = event.outputBuffer.getChannelData(0),
             ln = input.length;
         for (var i = 0; i < ln; i++) {
             // apply the window to the incoming grain
             input[i] *= this.grainWindow[i];
             // shift the stored buffer by one grain
             this.buffer[i] = this.buffer[i + currentGrainSize];
             // and clear the second half
             this.buffer[i + currentGrainSize] = 0.0;
         }
         // resample the grain using the shift ratio
         var grainData = new Float32Array(currentGrainSize * 2);
         for (var i = 0, j = 0.0; i < currentGrainSize; i++, j += currentShiftRatio) {
             var index = Math.floor(j) % currentGrainSize;
             var a = input[index];
             var b = input[(index + 1) % currentGrainSize];
             grainData[i] += am.linearInterpolation(a, b, j % 1.0) * this.grainWindow[i];
         }
         // overlap-add the grains
         for (i = 0; i < currentGrainSize; i += Math.round(currentGrainSize * (1 - currentOverLap))) {
             for (j = 0; j <= currentGrainSize; j++) {
                 this.buffer[i + j] += grainData[j];
             }
         }
         // write the accumulated result to the output
         for (i = 0; i < currentGrainSize; i++) {
             output[i] = this.buffer[i];
         }
     };
 };

 // Hann window
 AudioModulation.prototype.hannWindow = function (length) {
     var window = new Float32Array(length);
     for (var i = 0; i < length; i++) {
         window[i] = 0.5 * (1 - Math.cos(2 * Math.PI * i / (length - 1)));
     }
     return window;
 };

 // linear interpolation
 AudioModulation.prototype.linearInterpolation = function (a, b, t) {
     return a + (b - a) * t;
 };

 // ring modulation
 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };

 // equalizer
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     while (ln--) {
         var loc = fil[ln],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };


And now we can enjoy the result of our work with a clear conscience. Of course, you don't have to stop here: you could, for example, add a spectrum visualizer or some fashionable effect like a phaser, but that is up to you. Having dug this deep into the Audio API, it becomes clear that the mechanisms now available to developers make it possible to implement almost any effect or kind of audio signal processing. You are limited only by your imagination.
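If you do want the visualizer, tapping the end of the chain with an AnalyserNode is enough to get frequency data to draw (a sketch; outFilters stands for the last node of the processing chain, and the drawing itself was covered in the previous article):

 var analyser = context.createAnalyser();
 analyser.fftSize = 2048;
 // insert the analyser just before the destination
 outFilters.connect(analyser);
 analyser.connect(dest);
 // then, on every animation frame:
 var freqData = new Uint8Array(analyser.frequencyBinCount);
 analyser.getByteFrequencyData(freqData);  // freqData now holds the spectrum to draw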
You can see the final version, with a choice of signal source and a control interface, in the demo.




P.S. Testing was done in Chrome and Opera, so everything should work best in those browsers. In the others various errors may show up (which I will try to fix promptly). In IE it is not even worth trying.

Source: https://habr.com/ru/post/211905/

