
Development of interactive systems with openFrameworks: Interactive Sound

We recently talked about setting up openFrameworks and visualizing music with it. Unfortunately, there is little Russian-language information on the openFrameworks framework. To fill this gap, we are starting to publish a series of lectures given by Denis Perevalov in Yekaterinburg at the N.N. Krasovskii Institute of Mathematics and Mechanics (UB RAS).

This lecture explains the theoretical foundations of digital sound and shows an example of building an interactive application that generates sound from a captured camera image.

What is sound in general, and what is digital sound?


Sound, in the broad sense, consists of elastic waves propagating longitudinally through a medium and creating mechanical vibrations in it;
in the narrow sense, it is the subjective perception of these vibrations by the sense organs of animals or humans. Like any wave, sound is characterized by amplitude and frequency.


Representing sound in digital form

Real sound is captured by a microphone and then undergoes analog-to-digital conversion.

The digital representation is characterized by:
resolution in time - the sampling rate (the corresponding procedure is called sampling, or discretization);
resolution in amplitude - the bit depth (the corresponding procedure is called quantization).



Sampling rate
8,000 Hz - telephony; enough for speech.
11,025 Hz - games, samples for electronic music.
22,050 Hz - the same uses as 11,025 Hz.
44,100 Hz - many synthesizers and sample libraries; Audio CD.
48,000 Hz - recording studios, live instruments, vocals; DVD.
96,000 Hz - DVD-Audio (MLP 5.1).
192,000 Hz - DVD-Audio (MLP 2.0).

Bit depth
The bit depth is the number of bits used to represent each signal sample during quantization (in our case, quantization in amplitude).

8 bit - samples for electronic music.
12 bit - studio sound effects.
16 bit - computer games, players, samples, Audio CD.
18 bit - studio sound effects.
24 bit - live sound, vocals, DVD-Audio.
32 bit - floating-point representation; precision is not lost on quiet sounds, so it is used for internal sound processing.
64 bit - also floating point; sound processing.
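To get a feel for these numbers, here is a small sketch (plain C++, not part of the original lecture) that computes how much memory one second of uncompressed PCM audio occupies at a given sampling rate, bit depth, and channel count:

#include <cstdio>

// Bytes needed to store one second of uncompressed PCM audio.
long pcmBytesPerSecond(long sampleRate, int bitsPerSample, int channels) {
    return sampleRate * (bitsPerSample / 8) * channels;
}

int main() {
    // Audio CD: 44,100 Hz, 16 bit, stereo -> 176,400 bytes per second
    std::printf("CD audio: %ld bytes/s\n", pcmBytesPerSecond(44100, 16, 2));
    // Telephony: 8,000 Hz, 8 bit, mono -> 8,000 bytes per second
    std::printf("Telephony: %ld bytes/s\n", pcmBytesPerSecond(8000, 8, 1));
    return 0;
}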

Representation of sound in memory
Example
1 second of 16-bit audio at a sampling rate of 44,100 Hz can be represented as a vector
X = (x_1, x_2, ..., x_44100),
where 0 <= x_i <= 2^16 - 1 = 65535.
Representing sound this way - as a vector of samples - is called PCM (Pulse Code Modulation).
It is the most common representation and is analogous to the pixel representation of images.
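To make the vector representation concrete, here is a minimal sketch (plain C++, not from the original lecture) that fills exactly such a vector with one second of a 440 Hz tone, using the unsigned 16-bit range from the example above:

#include <vector>
#include <cmath>
#include <cstdint>

int main() {
    const double PI = 3.141592653589793;
    const int sampleRate = 44100;              // samples per second
    const double freq = 440.0;                 // tone frequency, Hz
    std::vector<std::uint16_t> x(sampleRate);  // one second of 16-bit PCM

    for (int i = 0; i < sampleRate; i++) {
        // sine in [-1, 1], shifted into the unsigned range [0, 65535]
        double s = std::sin(2.0 * PI * freq * i / sampleRate);
        x[i] = (std::uint16_t)((s * 0.5 + 0.5) * 65535.0);
    }
    return 0;
}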

The fundamental difference between sound and images
It is very convenient to work with images at the pixel level. In particular:
1. We consider two images identical if their pixel values are close.
2. An image can be modified by operations on neighboring pixels (for example, smoothing).

For sound in PCM format, neither of these works, as the following example shows:



The last two sounds sound identical, yet their amplitude functions differ significantly. The human ear therefore perceives the spectrum of a sound, that is, its frequency content, rather than its amplitude representation.

What is easy / hard to do "directly" with PCM sound

Easy:
Changing and rearranging individual samples without involving their neighbors:
- rearrange fragments,
- change the volume of fragments,
- reverse: flip the sound end-to-front,
- mix several sounds,
- mix and swap stereo channels,
- do the simplest compression,
- add the simplest echo.

Samplers, portastudios, and studio programs do this masterfully; a minimal sketch of two such operations follows.
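The sketch below (plain C++, not from the lecture) shows reverse and mixing, assuming sounds are stored as float vectors with samples in [-1, 1]; note that each output sample depends only on the corresponding input samples, never on their neighbors, which is what makes these operations "easy":

#include <vector>
#include <algorithm>

// Flip a sound end-to-front (reverse).
void reverseSound(std::vector<float>& x) {
    std::reverse(x.begin(), x.end());
}

// Mix two sounds of equal length: per-sample sum, clipped to [-1, 1].
std::vector<float> mix(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); i++) {
        float s = a[i] + b[i];
        out[i] = std::max(-1.0f, std::min(1.0f, s));  // simple clipping
    }
    return out;
}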

Hard:
Computations that involve neighboring samples:
- compare two sounds for similarity,
- suppress low or high frequencies,
- add reverberation.

These operations are usually performed not directly on the PCM data but via the spectral representation of the sound (the windowed Fourier transform).
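For illustration, a naive sketch of moving to the spectral representation (a direct discrete Fourier transform, O(n^2); real applications use an FFT library, which computes the same thing in O(n log n)):

#include <vector>
#include <cmath>

// Magnitude spectrum of a PCM fragment; bin k corresponds to the
// frequency k * sampleRate / n.
std::vector<double> magnitudeSpectrum(const std::vector<double>& x) {
    const double PI = 3.141592653589793;
    const std::size_t n = x.size();
    std::vector<double> mag(n / 2);
    for (std::size_t k = 0; k < n / 2; k++) {
        double re = 0, im = 0;
        for (std::size_t i = 0; i < n; i++) {
            re += x[i] * std::cos(2 * PI * k * i / n);
            im -= x[i] * std::sin(2 * PI * k * i / n);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}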

Audio storage formats
WAV
wav = header + PCM bytes. Stores sound without loss of quality.
(image analog: bmp)

MP3
Lossy; well suited to storing music.
(image analog: jpg)

AMR
Lossy; designed for storing speech. Used in mobile telephony (2011).
(image analog: png)

Ways to generate digital sound

There are several ways to build the PCM representation of a sound or piece of music:

1. Sampling
Used in producing virtually all music. Devices: samplers.

2. (Subtractive) synthesis
Used primarily for modern electronic music. Devices: synthesizers.

3. FM synthesis
4. Additive synthesis
5. Granular synthesis
6. S&S (Sample & Synthesis): sampling, analysis, and subsequent synthesis; today one of the best technologies for reproducing "live" instruments.

Let us consider three of these in more detail: sampling, subtractive synthesis, and additive synthesis.

Sampling
Recording: "live" sound - microphone - ADC - PCM format.

Playback: PCM format - DAC - speaker.

Additional possibilities: the playback speed can be changed; raising the speed raises both the pitch and the tempo of the sample together.
Modern algorithms also make it possible to change the pitch of a sample without changing its speed, and vice versa.
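A minimal sketch of the naive speed change by resampling with linear interpolation (the simple variant that changes pitch and tempo together, not the modern pitch-preserving algorithms mentioned above); assumes speed > 0:

#include <vector>

// Resample a sound by a speed factor using linear interpolation.
// speed = 2.0 -> twice as fast and one octave higher.
std::vector<float> changeSpeed(const std::vector<float>& x, float speed) {
    std::size_t outLen = (std::size_t)(x.size() / speed);
    std::vector<float> out(outLen);
    for (std::size_t i = 0; i < outLen; i++) {
        float pos = i * speed;            // fractional position in the input
        std::size_t j = (std::size_t)pos;
        float frac = pos - j;
        std::size_t j1 = (j + 1 < x.size()) ? j + 1 : j;
        out[i] = x[j] * (1 - frac) + x[j1] * frac;  // interpolate neighbors
    }
    return out;
}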

Akai MPC1000 Sampler:


Subtractive Synthesis
In the pre-computer era:
several simple waveforms (square, sine, triangle) were processed by a set of filters (low-pass, high-pass, filters cutting out a desired frequency band). The resulting sound went to the speakers.

Now:
it is done digitally.
There are subtleties: one must carefully account for the well-known problems of the digital representation of sound ("aliasing").
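A minimal sketch of this idea (plain C++, not from the lecture): generate a harmonically rich square wave, then "darken" it with a one-pole low-pass filter; the naive square wave below is exactly the kind of signal that suffers from the aliasing just mentioned:

#include <vector>
#include <cmath>

// Naive square wave at the given frequency, samples in [-1, 1].
std::vector<float> squareWave(float freq, int sampleRate, int nSamples) {
    std::vector<float> x(nSamples);
    for (int i = 0; i < nSamples; i++) {
        float phase = std::fmod(freq * i / sampleRate, 1.0f);
        x[i] = (phase < 0.5f) ? 1.0f : -1.0f;  // aliases at high frequencies
    }
    return x;
}

// One-pole low-pass filter applied in place: each output sample mixes
// the input with the previous output; alpha in (0, 1], smaller = darker.
void lowPass(std::vector<float>& x, float alpha) {
    float y = 0.0f;
    for (std::size_t i = 0; i < x.size(); i++) {
        y += alpha * (x[i] - y);
        x[i] = y;
    }
}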

Minimoog Synthesizer:


Additive synthesis
Additive synthesis builds sound by summing a set of harmonics (that is, sinusoids of different frequencies) with time-varying volumes.

Any sound can be represented with arbitrary accuracy as a sum of a sufficiently large number of harmonics with varying volumes. In practice, however, working with a large number of harmonics requires substantial computational resources. Even so, several hardware and software additive synthesizers exist today.
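A minimal sketch of the principle (plain C++, not from the lecture): sum n harmonics of a base frequency, each with its own volume. This is the same scheme the camera project below uses, just without the camera:

#include <vector>
#include <cmath>

// Sum harmonics of baseFreq; volume[h] is the loudness of harmonic h+1,
// assumed to lie in [0, 1]. Dividing by the count keeps the sum in range.
std::vector<float> additive(float baseFreq, const std::vector<float>& volume,
                            int sampleRate, int nSamples) {
    const float PI = 3.14159265f;
    std::vector<float> out(nSamples, 0.0f);
    for (std::size_t h = 0; h < volume.size(); h++) {
        float freq = baseFreq * (h + 1);  // frequency of the (h+1)-th harmonic
        for (int i = 0; i < nSamples; i++) {
            out[i] += volume[h] / volume.size()
                      * std::sin(2 * PI * freq * i / sampleRate);
        }
    }
    return out;
}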

openFrameworks project examples


You can read about installing and configuring the framework and the IDE for building projects here.

Playing samples in openFrameworks: the "sound landscape" project

The idea of the project: the user clicks the mouse in different parts of the screen and hears a sound whose playback speed depends on the height of the click; each click also leaves a translucent circle on the screen.



// The sample and drawing state
ofSoundPlayer sample;   // sound player for the sample
ofPoint p;              // point where the user last clicked
float rad;              // radius of the circle to draw

void testApp::setup(){
    sample.loadSound("sound.wav");   // the file must lie in bin/data
    sample.setVolume(0.5f);          // volume, in [0, 1]
    sample.setMultiPlay(true);       // allow overlapping playback
    ofSetFrameRate( 60 );            // frame rate
    ofSetBackgroundAuto( false );    // do not clear the screen each frame
    ofBackground(255,255,255);
}

void testApp::update(){
    ofSoundUpdate();                 // update the sound engine
}

void testApp::draw(){
    // while the sample plays, draw translucent circles at the click point
    ofEnableAlphaBlending();
    if (sample.getIsPlaying()) {
        ofSetColor(ofRandom(0, 255), ofRandom(0, 255), ofRandom(0, 255), 20);
        ofCircle( p.x, p.y, rad );
    }
    ofDisableAlphaBlending();
}

// a mouse click starts playback
void testApp::mousePressed(int x, int y, int button){
    float h = ofGetHeight();   // screen height
    // the playback speed depends on the height of the click;
    // speed 1.0 corresponds to the original sample
    float speed = (h - y) / h * 3.0;
    if ( speed > 0 ) {
        sample.play();                 // start playing the sample
        sample.setSpeed( speed );      // set its playback speed
        // remember where and how large to draw the circle
        p = ofPoint( x, y );
        rad = (3 - speed);
        rad = 20 * rad * rad;
    }
}


Scenario of the project "Additive synthesizer"

The idea of the project: the user waves their hands in front of the camera against a white background. There are n harmonics. The screen is divided into n vertical strips; in each strip we count the fraction of pixels whose brightness is below a certain threshold. This fraction determines the volume of the corresponding harmonic.

We use n = 20 sinusoidal harmonics, with frequencies
100 Hz
200 Hz
...
2000 Hz

The harmonics are played as looped samples whose volume is simply changed on the fly; a small generator sketch for these samples is given below.
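The project assumes the files 100.wav, 200.wav, ..., 2000.wav already exist in bin/data. A minimal generator sketch (plain C++, little-endian platforms assumed; this helper is not part of the original project) that writes each harmonic as one second of a 16-bit mono sine. An integer frequency gives a whole number of periods per second, so each file loops seamlessly:

#include <cstdio>
#include <cstdint>
#include <cmath>
#include <string>
#include <vector>

// Write one second of a 16-bit mono sine as a minimal WAV file.
void writeSineWav(const std::string& path, int freq,
                  std::uint32_t sampleRate = 44100) {
    const double PI = 3.141592653589793;
    std::vector<std::int16_t> data(sampleRate);
    for (std::uint32_t i = 0; i < sampleRate; i++)
        data[i] = (std::int16_t)(32767.0 * std::sin(2 * PI * freq * i / sampleRate));

    std::uint32_t dataBytes = data.size() * 2, chunkSize = 36 + dataBytes;
    std::uint32_t fmtSize = 16, byteRate = sampleRate * 2;
    std::uint16_t fmtAndChannels[] = { 1, 1 };      // PCM format, mono
    std::uint16_t blockAlign = 2, bitsPerSample = 16;

    FILE* f = std::fopen(path.c_str(), "wb");
    std::fwrite("RIFF", 1, 4, f); std::fwrite(&chunkSize, 4, 1, f);
    std::fwrite("WAVEfmt ", 1, 8, f); std::fwrite(&fmtSize, 4, 1, f);
    std::fwrite(fmtAndChannels, 2, 2, f);
    std::fwrite(&sampleRate, 4, 1, f); std::fwrite(&byteRate, 4, 1, f);
    std::fwrite(&blockAlign, 2, 1, f); std::fwrite(&bitsPerSample, 2, 1, f);
    std::fwrite("data", 1, 4, f); std::fwrite(&dataBytes, 4, 1, f);
    std::fwrite(data.data(), 2, data.size(), f);
    std::fclose(f);
}

int main() {
    for (int freq = 100; freq <= 2000; freq += 100)
        writeSineWav(std::to_string(freq) + ".wav", freq);
    return 0;
}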

Project Source Code:

// The "Additive synthesizer" project:
// camera grabber and analysis data
ofVideoGrabber grabber;
int w;                       // camera frame width
int h;                       // camera frame height

const int n = 20;            // number of harmonics
ofSoundPlayer sample[ n ];   // looped samples, one per harmonic
float volume[ n ];           // computed volume of each harmonic
int N[ n ];                  // number of pixels counted in each strip
ofSoundPlayer sampleLoop;    // declared in the original listing, unused below

// initialization
void testApp::setup(){
    w = 320;
    h = 240;
    grabber.initGrabber(w, h);   // start the camera

    // load the harmonic samples
    for (int i=0; i<n; i++) {
        int freq = (i+1) * 100;
        sample[ i ].loadSound( ofToString(freq) + ".wav"); // 100.wav, 200.wav, ...
        sample[ i ].setVolume(0.0);   // start silent
        sample[ i ].setLoop(true);    // loop the playback
        sample[ i ].play();           // start playing
    }
}

// analysis of the camera image
void testApp::update(){
    grabber.grabFrame();             // grab a camera frame
    if (grabber.isFrameNew()){       // a new frame has arrived
        for (int i=0; i<n; i++) {
            volume[i] = 0;
            N[i] = 0;
        }
        unsigned char * input = grabber.getPixels();  // raw RGB pixels
        for (int y=0; y<h; y++) { for (int x=0; x<w; x++) {
            // pixel (x, y):
            int r = input[ 3 * (x + w * y) + 0 ];
            int g = input[ 3 * (x + w * y) + 1 ];
            int b = input[ 3 * (x + w * y) + 2 ];
            int result = (r + g + b > 400 ) ? 0 : 1;  // 1 for dark pixels
            int i = (x * n / w);     // index of the strip containing the pixel
            volume[ i ] += result;
            N[ i ]++;
        } }
        // set the volumes of the harmonics
        for (int i=0; i<n; i++) {
            if ( N[ i ] > 0 ) { volume[ i ] /= N[ i ]; }  // now in [0, 1]
            sample[ i ].setVolume( volume[ i ] / n );
            // dividing by n keeps the total volume of all harmonics in range
        }
    }
    ofSoundUpdate();   // update the sound engine
}

// drawing
void testApp::draw() {
    ofBackground(255,255,255);   // white background
    float w = ofGetWidth();      // output size (shadows the camera size on purpose)
    float h = ofGetHeight();
    ofSetColor( 255, 255, 255 ); // draw the camera image without tinting
    grabber.draw(0, 0, w, h);

    // draw the strips
    ofEnableAlphaBlending();     // translucency on
    ofSetColor( 0, 0, 255, 80 ); // blue with alpha 80
    for (int i=0; i<n; i++) {
        float harmH = volume[i] * h;   // height of strip i
        ofRect( i * w / n, h - harmH, w / n, harmH );
    }
    ofDisableAlphaBlending();    // translucency off
}


Here's what happened:


If you are interested in interactive systems, and openFrameworks in particular, we invite you to the Russian-language openFrameworks group.

Source: https://habr.com/ru/post/245481/

