Time spent at a computer mostly comes down to looking at the screen, whether a person is writing code, watching a movie, playing a game, or reading the news. That is simply because vision is our dominant sense: it largely shapes our perception of reality, the way we interact with it, and the direction in which many technologies develop.
The usual illustration of how hardware and software have evolved over the past few decades is to compare Wolfenstein 3D (1992) with Crysis 3 (2013). It is hard to overstate how much computer graphics has advanced and grown in complexity over that time.
Alas, the same cannot be said about sound. How to fix that is what this article is about.

The article is based on lectures that Andy Farnell gives periodically. In them he lays out his vision of the development and use of computational (or procedural) audio and gives a number of interesting examples that I would like to share with the community.
A few words about Andy: he is a sound designer, programmer, lecturer at several universities, and a former audio engineer. His professional path was strongly influenced by the demoscene, in particular the work of The Black Lotus. Andy is also the author of the book “Designing Sound”, which contains a great deal of useful theory on the physics of sound, modeling and psychoacoustics, along with a wide variety of practical exercises written in the Pure Data visual programming language.
The article turned out a little longer than expected, so if you just want to hear the results, skip straight to the section with examples.
If, on the contrary, you want to dig deeper into the details, watch the whole lecture; Andy covers a lot of interesting material in it.
A bit of history
So, computer graphics has reached the point of detailed modeling of objects and their interactions in real time. This is thanks to many technologies and concepts, including dedicated graphics processors and more efficient rendering algorithms, which let visualization adapt to the available system resources without any apparent loss of quality. Dynamic level of detail in the visual scene saves a great deal of resources (mipmapping, for example). Naturally, none of this appeared overnight. The already mentioned Wolfenstein 3D looked terrible, if anyone remembers it. Some experts at the time believed that the photographic approach à la Myst (1993) was here to stay. But the low power of computers and the crude early results did not stop the enthusiasts.
While graphics was developing, nobody gave much thought to serious sound synthesis in applications: computer audio, for lack of attention, remained in an embryonic state. Admit it, even today, when a person hears the word "synthesizer", the first association is likely to be a man with extravagant hair coaxing ridiculous sounds out of a plastic box. Yet there are now synthesizers capable of producing melodic, complex and beautiful sounds of an entirely different nature (say, Massive from Native Instruments, or Zebra from u-he, which was recently covered on Habr). Interestingly, when I hear the words "first-person shooter", I no longer picture huge pixels and eight-bit color, even though I am not a gamer at all.
Current situation
Audio technology in applications that are not sound-oriented is neither adaptive nor flexible. Sound is usually built from pre-prepared data, and that creates a number of serious limitations. As the number of objects in a simulated space grows linearly, the number of their possible pairwise interactions grows nonlinearly, as n(n-1)/2. This makes a realistic sound field impossible, because in the general case you cannot predict every possible kind of interaction and prepare a sound for each of them. Moreover, the resources spent on playing back the audio attached to an object do not change at all with the object's importance or its distance within the scene: if the original audio file was, say, 44.1 kHz / 24-bit, that is how it plays. As the scene grows, this approach becomes hopelessly inefficient. The default argument against computational audio here is that ordinary modern computers supposedly lack the performance for it. As will become clear below, this has not been true for a long time: all the examples in the lecture were computed live on an ordinary gigahertz laptop, using only a small fraction of its resources.
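To put numbers on that growth, a quick back-of-the-envelope calculation (just arithmetic, nothing from the lecture):

```python
# Possible pairwise interactions among n objects: n * (n - 1) / 2
def pair_count(n: int) -> int:
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, "objects ->", pair_count(n), "possible pairs")
# 10 -> 45, 100 -> 4950, 1000 -> 499500
```

Pre-recording a sample for every pair stops being realistic long before the thousand-object mark.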
Unlike data-driven design, procedural audio lets us use a dynamic level of detail for sound: switch individual blocks of a model on and off and exploit the psychoacoustics of human hearing, stripping complex details from the signal or replacing them with something simpler when necessary. For example, when simulating the sound of rain it is natural to begin with individual falling drops. Once the density of drops becomes high enough, the noise of rain is perfectly well approximated by ordinary filtered pink noise; this is a kind of audio mipmapping.
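As a rough illustration of such an audio level-of-detail switch, here is a minimal sketch in Python rather than Pure Data; the drop model, the one-pole filter and every constant in it are my own simplifications, not Andy's rain patch:

```python
import numpy as np

SR = 44100  # sample rate, Hz

def drop(duration=0.02, freq=3000.0):
    """A single drop as a short decaying sine burst (a deliberate simplification)."""
    t = np.arange(int(SR * duration)) / SR
    return np.sin(2 * np.pi * freq * t) * np.exp(-t * 200.0)

def sparse_rain(seconds=1.0, drops_per_second=20):
    """Light rain: scatter individual drop events in time."""
    rng = np.random.default_rng(0)
    out = np.zeros(int(SR * seconds))
    d = drop()
    for start in rng.integers(0, len(out) - len(d), int(drops_per_second * seconds)):
        out[start:start + len(d)] += d
    return out

def dense_rain(seconds=1.0, smoothing=0.1):
    """Heavy rain: white noise through a one-pole low-pass filter, a cheap
    stand-in for the 'filtered noise' approximation described above."""
    noise = np.random.default_rng(0).standard_normal(int(SR * seconds))
    out = np.empty_like(noise)
    y = 0.0
    for i, x in enumerate(noise):
        y += smoothing * (x - y)  # one-pole smoothing
        out[i] = y
    return out
```

The engine can cross-fade from the per-drop model to the cheap noise model as the rain density rises, spending detail only where the ear can actually hear it.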
Procedural audio development process
Procedural audio implies a well-structured design process that applies the same principles as writing software: modularity and reuse. Such an organized approach has many positive consequences. I'll say right away that it does not kill the creative side of the designer's work at all, because the sound design stack has several levels of abstraction, and the designer can work at whichever level is comfortable. In support of this, Andy draws an analogy with the practicality of the seven-layer OSI model and proposes the following general structure for the stack (a toy sketch of this layering in code follows the list):
- Behavior
- Model
- Method
- Implementation
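To make the layering tangible, here is a toy sketch in Python (not Pure Data); the function names, the parameter mapping and all numeric values are my own assumptions, not Andy's:

```python
import numpy as np

SR = 44100  # sample rate, Hz

def method_decaying_sine(freq_hz, decay, seconds=0.5):
    """Method layer: one concrete synthesis technique (could be swapped for another)."""
    t = np.arange(int(SR * seconds)) / SR
    return np.sin(2 * np.pi * freq_hz * t) * np.exp(-decay * t)

def model_impact(size_m, damping):
    """Model layer: map rough physical parameters onto synthesis parameters."""
    freq_hz = 2000.0 / max(size_m, 0.1)   # bigger object -> lower pitch (crude assumption)
    decay = 2.0 + 40.0 * damping          # more damping -> faster decay
    return method_decaying_sine(freq_hz, decay)

def behaviour_collision(size_m, damping, impact_velocity):
    """Behaviour layer: the game event decides when the sound happens and how strong it is."""
    return impact_velocity * model_impact(size_m, damping)

# e.g. a small, well-damped object hit at moderate speed
samples = behaviour_collision(size_m=0.3, damping=0.6, impact_velocity=0.5)
```

The point is not the specific numbers but the separation: the designer can work at the behaviour level, the model level or the method level without touching the layers below.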
Behavior, as you can guess, is whatever action gives rise to the sound. There is nothing particularly difficult about it.
The model, on the other hand, is the most difficult stage. Here it is important to understand not only the physics of the process but also the peculiarities of how we perceive sound, and not to dig too deep into the structure of the object. After all, however fast desktop machines are, one can always write a needlessly complex model that cannot be computed in real time (some may remember the exact simulation of the sound of water that took several hours to produce a few seconds of audio). Yet even very complex models can be handled: according to Andy, a research team at Queen Mary University of London showed that CUDA can be applied to computational audio very successfully, with excellent results (unfortunately, I did not find a description of those results).
As David Beck (a sound designer and author of the excellent “The Csound Book”) says, the realism of a sound is not a matter of extreme detail and accuracy but of what might be called “acoustic viability”: how well the physical parameters correlate with the sound we hear. If what we perceive seems plausible, we intuitively feel that “it sounds right.”
One of the most important concepts in physical modeling is the “signature process”: the process on which the produced sound mostly depends. For walking, for example, it is the ground reaction force, which in turn depends on the force the foot applies over a certain area. Depending on the gait, most of the weight may fall on the heel or on the toe. Structurally the leg can be represented as three joints; add a second leg, phase-correlated with the first, and so on. The model gets complicated. But ultimately it is driven entirely by a single parameter, the speed of the subject, and that parameter is trivial to control at a high level of abstraction.
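A hedged sketch of the "one high-level parameter" idea, again in Python; the gait numbers and envelopes are invented for illustration and have nothing to do with Andy's actual footstep model:

```python
import numpy as np

SR = 44100  # sample rate, Hz

def step_sound(weight_on_heel=0.7, seconds=0.25):
    """One footstep as a noise burst with a heel-strike envelope followed by toe-off."""
    n = int(SR * seconds)
    t = np.arange(n) / SR
    noise = np.random.default_rng(1).standard_normal(n)
    heel_env = np.exp(-t * 60.0) * weight_on_heel
    toe_env = np.where(t > 0.08, np.exp(-(t - 0.08) * 90.0), 0.0) * (1.0 - weight_on_heel)
    return noise * (heel_env + toe_env)

def walk(speed_mps, seconds=2.0):
    """Everything below is derived from the one high-level parameter: walking speed."""
    steps_per_second = 1.4 * speed_mps              # crude gait assumption
    out = np.zeros(int(SR * seconds))
    s = step_sound()
    period = int(SR / max(steps_per_second, 0.1))
    for start in range(0, len(out) - len(s), period):
        out[start:start + len(s)] += s
    return out

slow = walk(speed_mps=1.0)   # stroll
fast = walk(speed_mps=2.5)   # brisk walk
```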
In general, modeling calls for a balanced approach: understand how a thing sounds from the outside and, depending on your needs, bring in physical specifics (a phenomenological approach; physically informed / contextualised models). This sometimes means departing from strict realism towards hyperrealism. In real life, for instance, a gun does not sound nearly as expressive as we are used to hearing it in games, and gamers often perceive hyperrealism better than realism.
The method is the palette of sound design techniques used to synthesize a particular sound: you might use additive synthesis, for example, or frequency modulation instead. This is a big topic in its own right, which this article will not go into deeply. The important point is that methods are often interchangeable.
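For instance, here are two interchangeable "method layer" choices for roughly the same bell-like sound, sketched in Python with made-up partials and ratios:

```python
import numpy as np

SR = 44100
t = np.arange(SR) / SR  # one second of time

def additive_bell(f0=220.0):
    """Additive synthesis: a sum of inharmonic partials, each with its own decay."""
    partials = [(1.0, 1.0), (2.76, 0.6), (5.40, 0.4), (8.93, 0.25)]
    return sum(amp * np.sin(2 * np.pi * f0 * ratio * t) * np.exp(-3.0 * ratio * t)
               for ratio, amp in partials)

def fm_bell(carrier=220.0, ratio=1.4, index=5.0):
    """Frequency modulation: one modulator whose depth decays over time."""
    mod = index * np.exp(-3.0 * t) * np.sin(2 * np.pi * carrier * ratio * t)
    return np.sin(2 * np.pi * carrier * t + mod) * np.exp(-2.0 * t)
```

The layers above the method do not care which of the two is plugged in, as long as the result is acoustically viable.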
At the implementation level, both in general and in the details, there are also many interesting points.
One possible implementation is as follows. Each simulated object has its own methods, some of which concern the sounds the object produces. At the physical level, the object reacts to the impulse transmitted to it when it collides with another object, and the sound is generated from the parameters of that impulse and the properties of the objects involved. For this, each object must carry its own “impulse signature”, which determines its acoustic response to an impulse.
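A minimal way to picture an impulse signature is modal synthesis: the object stores a table of resonant modes and rings them when struck. The sketch below is my own illustration in Python, with invented mode tables:

```python
import numpy as np

SR = 44100

class SoundingObject:
    """Each object carries its own set of resonant modes: its 'impulse signature'."""
    def __init__(self, modes):
        # modes: list of (frequency_hz, decay_per_second, relative_amplitude)
        self.modes = modes

    def struck(self, impulse_strength, seconds=1.0):
        """Acoustic response to a collision impulse of the given strength."""
        t = np.arange(int(SR * seconds)) / SR
        out = np.zeros_like(t)
        for freq, decay, amp in self.modes:
            out += amp * np.sin(2 * np.pi * freq * t) * np.exp(-decay * t)
        return impulse_strength * out

# Invented mode tables for two different objects:
tin_can   = SoundingObject([(820, 8, 1.0), (1760, 12, 0.5), (2950, 20, 0.3)])
oak_table = SoundingObject([(140, 25, 1.0), (310, 30, 0.4)])
clang = tin_can.struck(impulse_strength=0.8)
thud  = oak_table.struck(impulse_strength=1.5)
```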
There is interchangeability at the implementation layer as well. Dan Stowell, another expert in audio programming, took the examples from Andy's book (everything up to and including the method level) and rewrote them in SuperCollider; the results were of the same quality.
More advantages
As I said, modularity and reuse have a lot of positive effects.
Consider an example. We have created an object of a Turbine class that generates the noise of a jet engine. The aircraft has two engines, so we create a second instance of the class. For a listener sitting virtually in the cockpit, we take the sound generated by these two objects and pass it through a spatialization module (psychoacoustics in action: the person will hear one engine behind them to the left and the other to the right), then apply the cockpit's impulse response (the impulse response is a unique characteristic of any space; roughly speaking, it describes how the walls of a room reflect the different spectral components of a sound, which is what creates the sense of being in that room). And for a listener watching the airplane pass along the runway, the same generated sound can simply be run through a module that simulates the Doppler effect and an approach/recede module (if you like, you can add the sound reflected off the runway).
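In code, this kind of module reuse might look roughly like the following Python sketch; the Turbine stand-in, the panning weights and the fake impulse response are all placeholders I made up for illustration:

```python
import numpy as np

SR = 44100

def turbine(seconds=1.0, hum_hz=90.0):
    """Stand-in Turbine model: a low hum plus broadband noise."""
    t = np.arange(int(SR * seconds)) / SR
    rng = np.random.default_rng(2)
    return 0.6 * np.sin(2 * np.pi * hum_hz * t) + 0.2 * rng.standard_normal(len(t))

def spatialize(left_src, right_src):
    """Pilot's seat: weight one engine toward each ear."""
    return np.stack([0.8 * left_src + 0.3 * right_src,
                     0.3 * left_src + 0.8 * right_src])

def apply_ir(stereo, ir):
    """Convolve both channels with a (placeholder) cockpit impulse response."""
    return np.stack([np.convolve(ch, ir)[:stereo.shape[1]] for ch in stereo])

def doppler(src, speed_factor):
    """Bystander on the runway: naive pitch shift by resampling."""
    idx = np.arange(0, len(src), speed_factor)
    return np.interp(idx, np.arange(len(src)), src)

left, right = turbine(hum_hz=90.0), turbine(hum_hz=92.0)   # same class, two instances
cockpit_mix = apply_ir(spatialize(left, right), np.exp(-np.arange(2000) / 300.0))
runway_mix  = doppler(left + right, speed_factor=1.2)
```

The same two generator objects feed both listening positions; only the downstream processing modules differ.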
Another benefit of this approach is scalability. The same “Cylinder” class, depending on its parameter settings, can simulate a tin can, a big drum, or a section of a huge pipe. Likewise, the same stick-slip friction module can serve for the squeak of a door hinge, the creak of leather upholstery, the squeal of tires, or a sponge rubbed across glass.
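For the Cylinder case, the scalability amounts to deriving the resonant modes from the dimensions. A rough sketch (the open-pipe fundamental is real physics; the decay and amplitude rules are invented):

```python
def cylinder_modes(length_m, radius_m):
    """A few resonant modes scaled by the cylinder's dimensions."""
    base_hz = 343.0 / (2.0 * length_m)        # open-pipe fundamental, speed of sound ~343 m/s
    brightness = 1.0 / (radius_m + 0.01)      # made-up brightness rule
    return [(base_hz * k, 4.0 * k / brightness, 1.0 / k) for k in (1, 2, 3, 5)]

tin_can   = cylinder_modes(length_m=0.12, radius_m=0.04)
big_drum  = cylinder_modes(length_m=0.60, radius_m=0.30)
huge_pipe = cylinder_modes(length_m=4.00, radius_m=0.50)
# Each mode list could feed the SoundingObject sketch from the previous section.
```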
One of the most important aspects is the financial one. Andy told an interesting story from his own experience. A friend of his, a sound engineer, recorded the sound of a real jet engine for a flight simulator. To do this he had to bring microphones worth several thousand euros, rent the engine itself for a day, buy fuel for it, pay a qualified service engineer for a day's work, rent a hangar, and buy medical insurance for the whole team. The recordist was doing so-called analytical recording, capturing the individual components and processes inside the engine: the ignition sequence, the specific resonances of various cavities, the noise of the rotor after the fuel supply is cut off, and other such details. The result was tens of gigabytes of high-quality recordings. Andy, having carefully analyzed those recordings, built a model that weighs less than 1 KB and is easily computed in real time on a netbook. The sound, naturally, was thoroughly realistic.
Perspectives
Back in 2005, Andy tried to find like-minded people in the game development industry, but according to him, from the inside that industry turned out to be far more conservative than it is generally thought to be. People do not really consider alternatives, and even when you lay out all the arguments in favor, very few are willing to change an established development process and risk losing money on a failed experiment. On top of that, there is no proven, efficient workflow for creating procedural audio yet, because the pioneers are still very few. But they do exist, and lately there are more of them. One example is the game Pugs Luv Beats, whose main feature is procedural music that depends on the gameplay; its audio engine is also built on Pure Data.
Of course, getting started with procedural audio is hard, but libraries of modules and components will gradually fill out, and in the long run sound design will be created faster and with better results.
There is no need to fear that this approach will push out traditional sound design: computational audio will complement the existing toolset rather than replace it entirely. It is easy to imagine how many new specializations this will create and how it will expand the market. There are already intermediate points between recorded sound and procedural audio; for example, grain dictionaries can be used: micro-samples of an original recording from which sound is assembled algorithmically during the game.
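A minimal sketch of the grain-dictionary idea in Python (grain size, overlap and windowing are arbitrary choices of mine):

```python
import numpy as np

SR = 44100
GRAIN = 2048            # grain length in samples, ~46 ms
HOP = GRAIN // 2        # overlap between grains

def make_dictionary(recording):
    """Slice a recording into overlapping, windowed micro-samples (grains)."""
    window = np.hanning(GRAIN)
    return [recording[s:s + GRAIN] * window
            for s in range(0, len(recording) - GRAIN, HOP)]

def resynthesize(grains, order):
    """Overlap-add grains in whatever order the game logic asks for."""
    out = np.zeros(HOP * len(order) + GRAIN)
    for i, g in enumerate(order):
        out[i * HOP:i * HOP + GRAIN] += grains[g]
    return out

source = np.sin(2 * np.pi * 440.0 * np.arange(SR) / SR)   # stand-in for a recording
grains = make_dictionary(source)
rearranged = resynthesize(grains, np.random.default_rng(3).permutation(len(grains)))
```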
How far can this approach to sound go? New niches keep appearing in interactive art and synchronized media. It may well become possible to buy not a recording of a piece of music but an algorithm that generates it (to some extent this is already possible now). It will not entirely replace live sound and human performance on record, but it will significantly expand and enrich the experience of perceiving art in general. Again, this is not a binary choice at all; a whole spectrum of viable approaches involving procedural audio comes to mind. In computer graphics, for example, fully hand-modeled movement eventually gave way to models whose motion is captured from live actors; it is simply another way of recording reality.
Examples
In order of increasing sound complexity:
Materials:
- Andy Farnell's lecture in London, 2013: www.youtube.com/watch?v=sp83-Pq7TyQ (the same material, but split into five parts and in lower quality: 1, 2, 3, 4, 5)
- A large excerpt from the book "Designing Sound", freely available: aspress.co.uk/ds/pdf/pd_intro.pdf
- Interview with Andy on Designingsound.org: designingsound.org/2012/01/procedural-audio-interview-with-andy-farnell
- An interview with the creators of Pugs Luv Beats on the same site: designingsound.org/2012/01/the-sound-of-pugs-luv-beats
- Study of the effectiveness and quality of audio modeling at Queen Mary University of London: http://www.eecs.qmul.ac.uk/~josh/documents/HendryReiss-AES129.pdf
- Pure Data: puredata.info
- I also want to mention the remarkable Survey of Music Technology course, which approaches the subject from both a practical and a cultural standpoint: class.coursera.org/musictech-001, and the ChucK course starting soon: www.coursera.org/course/chuck101