
Using the Audio API to build a vocoder

In the previous article we got acquainted with the basic capabilities of the Audio API and wrote a simple signal visualizer. Now it is time to dig deeper and try out more of the API's features. But we need a goal to aim for, and in this case the goal will be to mangle the incoming signal and its characteristics in every way we can. In other words, we will write a small vocoder.

Since the final code turned out to be quite large, the article covers only the fragments that are the most important and interesting from the Audio API point of view. The final result can, of course, be seen in the demo.


Selection of signal source


The Audio API supports three types of signal source:
  1. A source created from an audio tag
  2. An audio buffer
  3. An external audio stream (a microphone or any other incoming stream)

In the demo all three source types are implemented, along with the ability to switch between them. Here we will look at what is probably the most interesting of them: the external audio stream from a microphone.
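For reference, each source type maps to its own factory method on the audio context. A minimal sketch (here context is an already-created AudioContext, while audioEl, decodedBuffer and micStream are hypothetical placeholders):

 // 1. from an <audio> tag
 var tagSource = context.createMediaElementSource(audioEl);
 // 2. from an audio buffer (e.g. a decoded file)
 var bufferSource = context.createBufferSource();
 bufferSource.buffer = decodedBuffer;
 // 3. from an external stream (e.g. the microphone)
 var streamSource = context.createMediaStreamSource(micStream);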
To get at our source, we first need to obtain the user's permission and capture the audio stream. And no, we do not have to write a mountain of code for this; a single function called getUserMedia is enough (a minimal call is sketched right after the list). This magic function takes three arguments:
  1. The type of data we are requesting access to, an object of the form
    {video: true, audio: true}
  2. A capture callback, which receives the captured stream as its argument.
  3. An error callback for failures that occur during capture.
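Stripped of everything else, a call looks roughly like this (a sketch; the error handling is up to you):

 navigator.getUserMedia(
     { audio: true },              // we only need audio
     function (stream) {
         // do something with the captured stream
     },
     function (err) {
         console.error(err);       // capture failed or was denied
     }
 );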

So, taking the various browser prefixes into account, our initialization function looks like this:
 var d = document, w = window,
     context = null, dest = null, source = null;

 var init = function () {
     try {
         var audioContext = w.AudioContext || w.webkitAudioContext;
         navigator.getMedia = navigator.getUserMedia ||
                              navigator.webkitGetUserMedia ||
                              navigator.mozGetUserMedia ||
                              navigator.msGetUserMedia;
         // create the audio context
         context = new audioContext();
         // the destination node (speakers)
         dest = context.destination;
         var bufferLoader = new BufferLoader(context, ["effects/reverb.wav"], function (buffers) {
             navigator.getMedia({ audio: true }, function (stream) {
                 // bind the captured stream to our audio context
                 source = context.createMediaStreamSource(stream);
             }, function (e) {
                 alert(e);
             });
         });
         bufferLoader.load();
     } catch (e) {
         alert(e.message);
     }
 };


Let's look at what happens here. First we create the audioContext for our page (what that is was covered in the previous article), and then we meet a new entity, BufferLoader. Its job is to fetch external audio files via XHR2 and carefully store them in buffers. In our case we need it to load a single audio effect, which will be described below. This function is not part of the standard, so we will have to write it ourselves.
 // loads external audio files into buffers via XHR2
 var BufferLoader = function (context, urlList, callback) {
     this.context = context;
     this.urlList = urlList;
     this.onload = callback;
     this.bufferList = new Array();
     this.loadCount = 0;
 };

 BufferLoader.prototype.load = function () {
     for (var i = 0; i < this.urlList.length; ++i) {
         this.loadBuffer(this.urlList[i], i);
     }
 };

 BufferLoader.prototype.loadBuffer = function (url, index) {
     var request = new XMLHttpRequest();
     request.open("GET", url, true);
     request.responseType = "arraybuffer";
     var loader = this;
     request.onload = function () {
         loader.context.decodeAudioData(
             request.response,
             function (buffer) {
                 if (!buffer) {
                     alert('error decoding file data: ' + url);
                     return;
                 }
                 loader.bufferList[index] = buffer;
                 if (++loader.loadCount == loader.urlList.length) {
                     loader.onload(loader.bufferList);
                 }
             },
             function (error) {
                 console.error('decodeAudioData error', error);
             }
         );
     };
     request.onerror = function () {
         alert('BufferLoader: XHR error');
     };
     request.send();
 };

After the effect has loaded, we capture the audio stream and, if the user allows it, associate the captured signal with our audio context. For this we use the createMediaStreamSource function. Now the input signal is at our complete disposal and, believe me, we are going to have plenty of fun with it.
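Before adding any processing, a quick way to check that the capture works is to route the source straight to the output (a sketch; with a live microphone this will feed back, so headphones are advisable):

 // raw passthrough: microphone -> speakers
 source.connect(dest);
 // later, once the processing chain is ready:
 // source.disconnect();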

Signal processing and change


The time has come to write the function that will mercilessly mangle the input stream. For the first version we only need three node types, all created from the audio context: createGain (volume control), createConvolver (convolution of the signal with an impulse response, which gives us the echo effect loaded earlier) and createDynamicsCompressor (dynamic range compression).

Using these methods, let's write out our transformation function:
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // and send the result to the output
     sourceCompressor.connect(dest);
 };


All of the values, such as the gain or the threshold at which compression kicks in, can be wired to the user interface and changed in real time, as is done in the demo. After passing the signal through this function we get a slightly boosted version with an echo effect (as if you were talking with a bucket on your head, or inside a helmet). But we do not yet get a fundamentally new sound at the output, which means we move on. As the next step we will try to implement ring modulation.
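(As an aside on the real-time controls mentioned above: a rough sketch of such a binding, assuming a hypothetical range slider with id "gain" and that sourceGain is reachable from the UI code, might look like this.)

 // update the gain in real time as the slider moves
 document.getElementById('gain').addEventListener('input', function () {
     sourceGain.gain.value = parseFloat(this.value);
 });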

Ring modulation


Ring modulation is an audio effect that was wildly popular back in the day and was used to create the voices of all kinds of monsters and robots. The idea is that we take two signals: a carrier, which is a synthesized tone of arbitrary frequency, and a modulating signal (our input), and multiply them together. The result is a new signal full of distortion and metallic overtones.
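In sample terms the effect is simply a pointwise multiplication of the input by a sine carrier. A sketch of the idea outside the node graph (assuming x is a Float32Array of samples, fc is the carrier frequency and sampleRate is the context's sample rate):

 // y[n] = x[n] * sin(2 * PI * fc * n / sampleRate)
 function ringModulate(x, fc, sampleRate) {
     var y = new Float32Array(x.length);
     for (var n = 0; n < x.length; n++) {
         y[n] = x[n] * Math.sin(2 * Math.PI * fc * n / sampleRate);
     }
     return y;
 }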

Inside the node graph we can get the same multiplication with createOscillator (the carrier), createGain (whose gain parameter the carrier will drive) and the createBiquadFilter we will meet again below. That is enough to realize our plan, and the AudioModulation function turns into:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // and send the result to the output
     ringGain.connect(dest);
 };

 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };


Now that is more like it: after all this we get a fairly convincing "robot voice". But, as they say, you can never have too much of a good thing, so on top of all this magnificence we will add an equalizer for manually tuning individual frequency bands. We will build it with the already familiar createBiquadFilter, this time with the highshelf type (a shelving filter that boosts or attenuates everything above its cutoff frequency by the given gain).

Frequency filtering


First we create an array of settings from which the filters will be built:
 var filters = [
     { gain: 1,  frequency: 40 },
     { gain: 3,  frequency: 120 },
     // ... the remaining bands ...
     { gain: -2, frequency: 16000 }
 ];

The parameters in it are gain level and frequency. Now the function that creates filters:
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     for (var i = 0; i < ln; i++) {
         var loc = fil[i],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };

As a result, the transformation function takes the following form:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this;
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // equalizer
     var outFilters = this.setFilters(sourceCompressor);
     // and send the result to the output
     outFilters.connect(dest);
 };

 // ring modulation
 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };

 // equalizer
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     while (ln--) {
         var loc = fil[ln],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };


So now we have a full-fledged equalizer and can boost or attenuate any frequency band in the signal. If we were slackers we would stop here and, with a clear conscience, torment the microphone while playing with the settings, but we want more. So let's put the cherry on the cake and try to implement an effect known as a pitch shifter.

Changing the pitch


The essence of the effect is that a copy of the signal is added to it, shifted from the original pitch by some interval within two octaves up or down. It is a very fashionable effect and a proper implementation is devilishly complicated, so we will build, so to speak, a simplified version.
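For reference, the shift ratio used below (currentShiftRatio) relates to the musical interval in the usual equal-tempered way; a tiny helper to illustrate (not part of the demo code):

 // convert a shift in semitones (-24 .. +24, i.e. two octaves down or up) into a playback ratio
 function semitonesToRatio(semitones) {
     return Math.pow(2, semitones / 12);
 }
 // e.g. semitonesToRatio(-4.5) is roughly 0.77, close to the currentShiftRatio used below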
To start working on this effect we need an interface that hands us the raw signal data so that we can change it.
To build it we will borrow the windowing idea behind the short-time Fourier transform (the signal is chopped into overlapping, windowed frames) and use the createScriptProcessor method already familiar from the previous article. It takes three parameters: the buffer size (the size of the frame, or data window, taken from the signal per unit of time), numberOfInputChannels (the number of input channels) and numberOfOutputChannels (the number of output channels). Calling this method creates exactly the interface object we need. The resulting object has its own onaudioprocess event, which fires every time a new frame of data arrives from the signal. Putting it all together, the transformation of our signal looks like this:
 var currentGrainSize = 512;
 var currentOverLap = 0.50;
 var currentShiftRatio = 0.77;
 var node = context.createScriptProcessor(currentGrainSize, 1, 1);
 // precompute the Hann window used to smooth the edges of each grain
 node.grainWindow = hannWindow(currentGrainSize);
 // buffer that keeps the previous grain so that neighbouring grains can be overlapped
 node.buffer = new Float32Array(currentGrainSize * 2);

 node.onaudioprocess = function (event) {
     // input samples
     var input = event.inputBuffer.getChannelData(0);
     // output samples
     var output = event.outputBuffer.getChannelData(0),
         ln = input.length;
     for (var i = 0; i < ln; i++) {
         // apply the window to the incoming grain
         input[i] *= this.grainWindow[i];
         // shift the stored buffer by one grain
         this.buffer[i] = this.buffer[i + currentGrainSize];
         // and clear the second half
         this.buffer[i + currentGrainSize] = 0.0;
     }
     // resample the grain using the shift ratio
     var grainData = new Float32Array(currentGrainSize * 2);
     for (var i = 0, j = 0.0; i < currentGrainSize; i++, j += currentShiftRatio) {
         var index = Math.floor(j) % currentGrainSize;
         var a = input[index];
         var b = input[(index + 1) % currentGrainSize];
         grainData[i] += linearInterpolation(a, b, j % 1.0) * this.grainWindow[i];
     }
     // overlap-add the grains
     for (i = 0; i < currentGrainSize; i += Math.round(currentGrainSize * (1 - currentOverLap))) {
         for (j = 0; j <= currentGrainSize; j++) {
             this.buffer[i + j] += grainData[j];
         }
     }
     // write the accumulated result to the output
     for (i = 0; i < currentGrainSize; i++) {
         output[i] = this.buffer[i];
     }
 };
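For contrast, with all of the granular processing stripped out, the same createScriptProcessor interface reduces to a passthrough that simply copies input to output (a sketch, assuming a mono source):

 var passthrough = context.createScriptProcessor(1024, 1, 1);
 passthrough.onaudioprocess = function (event) {
     var input = event.inputBuffer.getChannelData(0);
     var output = event.outputBuffer.getChannelData(0);
     for (var i = 0; i < input.length; i++) {
         output[i] = input[i];
     }
 };
 // the node only processes audio while it is connected into the graph:
 // source.connect(passthrough); passthrough.connect(dest);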

Now, by playing with the shift and overlap parameters, we can get the effect of sped-up or slowed-down speech. For the calculations we also need two helpers: hannWindow (which computes the Hann window) and linearInterpolation (linear interpolation between two samples). The final version of our transformation looks like this:
Function code
 var AudioModulation = function (buffers, source) {
     var am = this,
         currentGrainSize = 512,
         currentOverLap = 0.50,
         currentShiftRatio = 0.77,
         node = context.createScriptProcessor(currentGrainSize, 1, 1);
     // gain node to boost the input a little
     var sourceGain = context.createGain();
     sourceGain.gain.value = 2;
     // convolver for the echo effect, fed with the loaded impulse response
     var sourceConvolver = context.createConvolver();
     sourceConvolver.buffer = buffers[0];
     // dynamics compressor
     var sourceCompressor = context.createDynamicsCompressor();
     sourceCompressor.threshold.value = -18.2;
     sourceCompressor.ratio.value = 4;
     // chain the nodes together
     source.connect(sourceGain);
     sourceGain.connect(sourceConvolver);
     sourceConvolver.connect(sourceCompressor);
     // ring modulation
     var ringGain = this.ringModulation();
     sourceCompressor.connect(ringGain);
     // equalizer
     var outFilters = this.setFilters(sourceCompressor);
     // and send the result to the output
     outFilters.connect(dest);
     // note: the ScriptProcessor only fires onaudioprocess once it is wired into the
     // graph (e.g. between outFilters and dest)

     // precompute the Hann window used to smooth the edges of each grain
     node.grainWindow = this.hannWindow(currentGrainSize);
     // buffer that keeps the previous grain so that neighbouring grains can be overlapped
     node.buffer = new Float32Array(currentGrainSize * 2);
     node.onaudioprocess = function (event) {
         // input samples
         var input = event.inputBuffer.getChannelData(0);
         // output samples
         var output = event.outputBuffer.getChannelData(0),
             ln = input.length;
         for (var i = 0; i < ln; i++) {
             // apply the window to the incoming grain
             input[i] *= this.grainWindow[i];
             // shift the stored buffer by one grain
             this.buffer[i] = this.buffer[i + currentGrainSize];
             // and clear the second half
             this.buffer[i + currentGrainSize] = 0.0;
         }
         // resample the grain using the shift ratio
         var grainData = new Float32Array(currentGrainSize * 2);
         for (var i = 0, j = 0.0; i < currentGrainSize; i++, j += currentShiftRatio) {
             var index = Math.floor(j) % currentGrainSize;
             var a = input[index];
             var b = input[(index + 1) % currentGrainSize];
             grainData[i] += am.linearInterpolation(a, b, j % 1.0) * this.grainWindow[i];
         }
         // overlap-add the grains
         for (i = 0; i < currentGrainSize; i += Math.round(currentGrainSize * (1 - currentOverLap))) {
             for (j = 0; j <= currentGrainSize; j++) {
                 this.buffer[i + j] += grainData[j];
             }
         }
         // write the accumulated result to the output
         for (i = 0; i < currentGrainSize; i++) {
             output[i] = this.buffer[i];
         }
     };
 };

 // Hann window
 AudioModulation.prototype.hannWindow = function (length) {
     var window = new Float32Array(length);
     for (var i = 0; i < length; i++) {
         window[i] = 0.5 * (1 - Math.cos(2 * Math.PI * i / (length - 1)));
     }
     return window;
 };

 // linear interpolation
 AudioModulation.prototype.linearInterpolation = function (a, b, t) {
     return a + (b - a) * t;
 };

 // ring modulation
 AudioModulation.prototype.ringModulation = function () {
     // gain node whose gain parameter will be driven by the carrier
     var ringGain = context.createGain();
     ringGain.gain.value = 1;
     // the carrier: a sine oscillator at 40 Hz
     var ringCarrier = context.createOscillator();
     ringCarrier.type = ringCarrier.SINE;
     ringCarrier.frequency.value = 40;
     // detune the carrier
     ringCarrier.detune.value = 600;
     // highpass filter that cuts everything below 10 Hz
     var ngHigpass = context.createBiquadFilter();
     ngHigpass.type = ngHigpass.HIGHPASS;
     ngHigpass.frequency.value = 10;
     // the carrier drives the gain parameter, multiplying the signal that passes through ringGain
     ringCarrier.connect(ngHigpass);
     ngHigpass.connect(ringGain.gain);
     // the oscillator must be started, otherwise it stays silent
     ringCarrier.start(0);
     return ringGain;
 };

 // equalizer
 AudioModulation.prototype.setFilters = function (source) {
     var fil = [{ gain: 1, frequency: 40 }, { gain: 3, frequency: 120 }, { gain: -2, frequency: 16000 }],
         out = null,
         ln = fil.length;
     while (ln--) {
         var loc = fil[ln],
             currFilter = context.createBiquadFilter();
         currFilter.type = currFilter.HIGHSHELF;
         currFilter.gain.value = loc.gain;
         currFilter.Q.value = 1;
         currFilter.frequency.value = loc.frequency;
         if (!out) {
             // the first filter is fed straight from the source
             source.connect(currFilter);
         } else {
             // every next filter is chained to the previous one
             out.connect(currFilter);
         }
         out = currFilter;
     }
     return out;
 };


And now we can enjoy the result of our work with a clear conscience. Of course, you don't have to stop here: you could, for example, add a spectrum visualizer or some fashionable effect like a phaser, but that is up to you. Having dug this deep into the Audio API, it becomes clear that the mechanisms now available to developers make it possible to implement almost any effect or kind of audio signal processing. You are limited only by your imagination.
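If you do want the visualizer, tapping the end of the chain with an AnalyserNode is enough to get frequency data to draw (a sketch; outFilters stands for the last node of the processing chain, and the drawing itself was covered in the previous article):

 var analyser = context.createAnalyser();
 analyser.fftSize = 2048;
 // insert the analyser just before the destination
 outFilters.connect(analyser);
 analyser.connect(dest);
 // then, on every animation frame:
 var freqData = new Uint8Array(analyser.frequencyBinCount);
 analyser.getByteFrequencyData(freqData);  // freqData now holds the spectrum to draw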
You can see the final version, with a choice of signal source and a control interface, in the demo.




P.S. Testing was done in Chrome and Opera, so everything should work best in those browsers. In the others various errors may show up (which I will try to fix promptly). In IE it is not even worth trying.

Source: https://habr.com/ru/post/211905/

