
HTML5 Audio Visualization

This practical deep dive covers an unusual scenario: we will talk not about what HTML5 can already do, but about the capabilities it does not yet offer today, and how to work around that in practice.



Today’s HTML5 is a TV series whose ending even the writers do not know: a story with some chapters nearly finished, some in rough draft, and some existing only as notes for future episodes.

Audio visualization, or rather fundamental low-level access to audio data, lies somewhere between the rough drafts and the notes for the future.


What <audio> can and cannot do

The HTML5 <audio> element, as you have probably guessed, does not provide any low-level API by itself. It only lets you control playback of the audio stream: start, pause, and stop playback, and find out the current position and the total duration of the track.
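In code, that control amounts to roughly the following (the element id and file name here are just for illustration):

    // Assumes <audio id="player" src="song.mp3"> is somewhere on the page.
    var audio = document.getElementById('player');
    audio.play();                // start playback
    audio.pause();               // pause playback
    audio.currentTime = 42;      // seek to the 42-second mark
    var pos = audio.currentTime; // current position, in seconds
    var len = audio.duration;    // total duration (NaN until the metadata loads)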

To be honest, these are not the only problems: if you have ever tried to build a reasonably complex application that uses several audio streams and needs to keep them synchronized, you have almost certainly run into implementation difficulties.

And those difficulties depend not only on what the specification provides, but also (to a greater extent) on the implementation in a specific browser. It is no accident that Rovio and Google, while building the Chrome-optimized Angry Birds on HTML5, abandoned the idea of using HTML5 audio elements for the sounds. Instead, Angry Birds HTML5 uses Flash. (See also the discussion on the developers' blog.)

For a deeper dive into the <audio> element, I also recommend the article Unlocking the power of HTML5 <audio>, which describes the basic techniques for working with audio in HTML5.

Standards for extracting audio data



Work on a low-level API for accessing audio streams is already actively under way within the W3C Audio Group.

The API under development will provide not only low-level read access to the audio stream, but also on-the-fly audio synthesis with minimal latency, as well as programmatic access to the PCM audio stream.

Today, Mozilla and Google have each already shipped their own version of such an API.

The Mozilla Audio Data API gives simple read and write access to the audio stream; implementing real-time audio processing algorithms is left to script code (JavaScript). Google's proposal for WebKit, the Web Audio API, instead provides a higher-level API in which the main processing tasks can be performed natively by the browser.
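To feel the difference between the two, here is a rough sketch of both approaches; treat it as an illustration rather than production code, since vendor prefixes and browser support varied considerably at the time of writing:

    // Mozilla Audio Data API: the browser hands you raw PCM samples,
    // and all the analysis happens in your JavaScript.
    var audio = document.getElementById('player'); // assumed <audio> element
    audio.addEventListener('MozAudioAvailable', function (e) {
        var samples = e.frameBuffer; // buffer of PCM samples
        var sum = 0;
        for (var i = 0; i < samples.length; i++) {
            sum += samples[i] * samples[i];
        }
        var volume = Math.sqrt(sum / samples.length); // RMS level, computed by hand
    }, false);

    // Web Audio API (WebKit): you build a graph of native nodes,
    // and the browser does the heavy lifting (here, the FFT).
    var ctx = new webkitAudioContext();
    var source = ctx.createMediaElementSource(audio);
    var analyser = ctx.createAnalyser();
    source.connect(analyser);
    analyser.connect(ctx.destination);
    var freq = new Uint8Array(analyser.frequencyBinCount);
    analyser.getByteFrequencyData(freq); // frequency spectrum, computed natively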

The W3C working group is working on a common approach that would provide a two-layer API covering both levels.

Incidentally, besides the client API for working with audio, the group's scope also includes access to audio devices, including microphones and other sound sources, and output to speakers, including multi-channel output.

You can follow the group's news on Twitter: @w3caudio.

But enough of the lyrical digression, let's get to practice!

Practical approach: what works today?

The practical approach that works today is preprocessing.

Yes, yes! It is that mundane: pre-process the audio data ahead of time, then generate the visualization in sync with the audio stream during playback.

In fact, when it comes to extracting semantic information (for example, song lyrics), preprocessing is the only practical option, and as a rule this processing is done by hand.

In general, if the audio file and the rendering engine are known in advance, preprocessing is not just a good approach but the only sensible one: it saves computational resources and thereby reduces the load on client machines.
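A minimal sketch of the general pattern with a plain <audio> element: precompute an array of cue points offline, then compare them against currentTime as the track plays (the cue times and the drawFrame function here are made up):

    var cues = [11.5, 17, 24, 29, 35.5]; // cue points, precomputed offline
    var next = 0;
    var audio = document.getElementById('player');
    audio.addEventListener('timeupdate', function () {
        // timeupdate fires a few times per second during playback
        while (next < cues.length && audio.currentTime >= cues[next]) {
            drawFrame(next); // hypothetical visualization step for cue #next
            next++;
        }
    }, false);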

Let's see how this works in real projects.

A real-world example: Chell in the Rain



Chell in the Rain is a beautiful text visualization of the song Exile Vilify: words from the lyrics appear on the screen in sync with the audio stream.

How it works inside

We will skip the audio initialization and the event handlers that control playback.

The entire song is split in advance into fragments, each corresponding to the start of a particular phrase or animation stage. The start time of each fragment is stored in an array:
    var timings = new Array();
    timings[0] = 11.5;
    timings[1] = 17;
    timings[2] = 24;
    timings[3] = 29;
    timings[4] = 35.5;
    ...

An array of phrases from the lyrics is stored separately:
    var lyrics = new Array();
    lyrics[0] = 'Exile';
    lyrics[1] = 'It takes your mind... again';
    lyrics[2] = "You've got sucker's luck";
    lyrics[3] = 'Have you given up?';
    ...

Based on the current position in the track and these timings, a check fires the trigger for the transition to the next phrase:
    if (event.jPlayer.status.currentTime >= timings[currentTrigger] && nolyrics != true) {
        fireTrigger(currentTrigger);
        currentTrigger++;
    }
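Judging by event.jPlayer in the snippet, this check presumably lives inside a jPlayer timeupdate handler; a sketch of the wiring (the element selector is an assumption):

    // Hypothetical wiring: jPlayer fires a timeupdate event during playback,
    // and the handler advances through the timings array.
    $('#jquery_jplayer').bind($.jPlayer.event.timeupdate, function (event) {
        if (event.jPlayer.status.currentTime >= timings[currentTrigger] && nolyrics != true) {
            fireTrigger(currentTrigger);
            currentTrigger++;
        }
    });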

Each trigger, fired at the right moment, launches the appropriate animation using jQuery:
    function fireTrigger(trigger) {
        switch (trigger) {
            case 0:
                $('#lyrics1 p').addClass('vilify').html(lyrics[0]).fadeIn(1500);
                break;
            case 1:
                $('#lyrics2 p').html(lyrics[1]).fadeIn(1000).delay(5000).fadeOut(1000);
                $('#lyrics1 p').delay(6000).fadeOut(1000);
                break;
            case 2:
                $('#lyrics1 p').fadeIn(1000);
                break;
            case 3:
                $('#lyrics2 p').fadeIn(1000).delay(4000).fadeOut(1000);
                $('#lyrics1 p').delay(5000).fadeOut(1000);
                break;
            case 4:
                $('#lyrics1 p').removeClass('vilify').html(lyrics[2]).fadeIn(1000);
                break;
            case 5:
                $('#lyrics2 p').html(lyrics[3]).fadeIn(1000).delay(3000).fadeOut(1000);
                $('#lyrics1 p').delay(4000).fadeOut(1000);
                break;
            ...

Simple and effective, isn't it? The most important thing in this whole story is how easily the audio stream combines with the capabilities of HTML, CSS, and JavaScript.

A real-world example: Music Can Be Fun



Music Can Be Fun is a mini-game at the intersection of art and music. I suggest playing it a bit first, so that it is clear what we are about to discuss ;)

This example is more complex, and it makes heavy use of Canvas. But since we are only interested in the musical component, it is not as scary as it looks!

As in the previous case, the lyrics of the song are displayed as it plays; the corresponding timings are hard-coded deep in the JS:
    var _lyrics = [
        ["00:17.94", "00:22.39", "When I have once or twice"],
        ["00:23.93", "00:30.52", "Thought I lived my life .. for"],
        ["00:40.74", "00:47.38", "Oh oh I'll wake up in a thousand years"],
        ["00:48.40", "00:52.06", "With every ghost I'm looking through"],
        ["00:53.33", "00:57.80", "I was a cold, cold boy"],
        ["00:59.52", "01:03.00", "Hey! Oh when I lie with you"],
        ...
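Note that the timestamps are stored as "mm:ss.ff" strings, while currentTime is a number of seconds, so somewhere they have to be parsed. A hypothetical helper (the game's actual parsing code may look different):

    // Converts a "01:03.00"-style string into seconds (here, 63.0).
    function parseTime(str) {
        var parts = str.split(':');
        return parseInt(parts[0], 10) * 60 + parseFloat(parts[1]);
    }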

Besides the text, if you played the game you could not help noticing the special effects, which are also tied to the musical composition and its transitions. The binding is done in exactly the same way:
    var _effects = [
        ["00:06.00", 1],
        ["00:30.50", 1],
        ["00:42.50", 1],
        ["00:54.50", 2],
        ["00:57.00", 1],
        ...

(In fact, even the spawn rates of the blue and red balls are tied to the timeline ;)

Whenever the playback time updates (the timeupdate event), the appropriate visualizations are applied:
    var _onTimeUpdate = function() {
        var t = MusicManager.currentTime = _song.currentTime;
        ...
        for (var i = _lyricsId; i < _lyrics.length; i++) {
            if (MusicManager.currentTime < _lyrics[i][0]) break;
            if (MusicManager.currentTime < _lyrics[i][1]) {
                SubtitleManager.changeSubtitle(_lyrics[i][2]);
            } else {
                SubtitleManager.changeSubtitle("");
                _lyricsId++;
            }
        }
        for (var i = _effectsId; i < _effects.length; i++) {
            if (MusicManager.currentTime < _effects[i][0]) break;
            MusicManager.isEffect1Used = false;
            MusicManager.isEffect2Used = !_effects[i][1] == 2;
            _effectsId++;
        }
        ...
    }

Still simple and effective. The same technique applies easily not only to textual information but also to all kinds of visual effects.

It remains to be seen whether the preparation of such data can be automated, so that you do not have to do everything by hand. Obviously it can, and Grant Skinner suggests how in his blog ;)

A real-world example: data extraction



In his blog post Music Visualizer in HTML5/JS with Source Code, Grant shares his experience with audio visualization in HTML5.

Faced with the fact that HTML5 audio provides no API for extracting low-level data about the song being played, Grant wrote a small AIR application (the archive also contains examples) that extracts sound-level information from an mp3 file as text or as an image.

Zoomed in, the extracted information about a composition looks like this (the illustration itself is in Grant's post):


With the data in this form, it is easy to read back, for example by means of Canvas. The text form is even simpler (I do not show an example here because the data in the text file is packed).
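The Canvas-based reading is conceptually simple: draw the image onto a canvas, pull the pixels out with getImageData, and interpret the pixel values as volume. A minimal sketch of the principle; the actual pixel layout that VolumeData.js expects may differ:

    // Illustrative only: assumes one channel's volume is encoded
    // in the red component of the image's first row of pixels.
    function readVolumes(image) {
        var canvas = document.createElement('canvas');
        canvas.width = image.width;
        canvas.height = image.height;
        var ctx = canvas.getContext('2d');
        ctx.drawImage(image, 0, 0);
        var pixels = ctx.getImageData(0, 0, image.width, 1).data;
        var volumes = [];
        for (var i = 0; i < pixels.length; i += 4) {
            volumes.push(pixels[i] / 255); // red channel mapped to a 0..1 volume
        }
        return volumes;
    }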

To work with such preprocessed data, Grant wrote a special JavaScript library (VolumeData.js in the archive).

Working with the library is quite simple. Everything starts with loading the data about the composition:

    loadMusic("music.jpg");

where inside the loadMusic function, as you have guessed, an ordinary image is loaded:
    function loadMusic(dataImageURL) {
        image = new Image();
        image.src = dataImageURL;
        playing = false;
        Ticker.addListener(window);
    }

Once everything needed has loaded, the sound data is extracted from the image:
    volumeData = new VolumeData(image);
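One subtlety worth noting: the pixel data can only be read once the image has actually finished loading, so in practice this step has to wait for onload. A sketch of the guard (the library or the Ticker loop may well handle this differently):

    image.onload = function () {
        volumeData = new VolumeData(image); // pixels are guaranteed to be available here
    };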

Then, at the right moment, you can get from this data both averaged information and per-channel (left and right) sound levels:
    var t = audio.currentTime;
    var vol = volumeData.getVolume(t);
    var avgVol = volumeData.getAverageVolume(t - 0.1, t);
    var volDelta = volumeData.getVolume(t - 0.05);
    volDelta.left = vol.left - volDelta.left;
    volDelta.right = vol.right - volDelta.right;
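From here it is one short step to an actual effect: on every animation tick, map the current volume onto some visual parameter. A made-up illustration that pulses a circle on a canvas (volumeData and audio are as above; canvas and ctx are assumed to be a canvas element and its 2d context):

    function tick() {
        var vol = volumeData.getVolume(audio.currentTime);
        var radius = 20 + 100 * (vol.left + vol.right) / 2; // radius follows loudness
        ctx.clearRect(0, 0, canvas.width, canvas.height);
        ctx.beginPath();
        ctx.arc(canvas.width / 2, canvas.height / 2, radius, 0, Math.PI * 2);
        ctx.fill();
    }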

Visual effects are then attached to this data; the visualization itself uses the EaselJS library. You can see how it works in practice in the Star Field and Atomic examples.

Conclusion

Summing up, it only remains to say that, looking at all this, I cannot shake the feeling that with HTML5 the industry is moving in the right direction. Yes, not everything is possible yet, and far from everything can be done as easily as in Flash or Silverlight today (if it can be done at all). But a lot is on the horizon!

Source: https://habr.com/ru/post/125832/

