Disclaimer: I do not consider any algorithms and APIs for working with sound and speech recognition. This article is about audio issues and how to solve them with Go.
phono
- application framework for working with sound. Its main function is to create a pipeline of different technologies that will process the sound. for you the way you want.
What does the conveyor, besides, from different technologies, and why another framework? Now we will understand.
By 2018, sound has become the standard way for human interaction with technology. Most IT giants have created their voice assistant or are doing it right now. Voice control is already in most operating systems, and voice messaging is a typical feature of any messenger. In the world, about a thousand startups are working on the processing of natural language and about two hundred on speech recognition.
With music, a similar story. It plays from any device, and sound recording is available to anyone who has a computer. Hundreds of companies and thousands of enthusiasts around the world are developing music software.
If you had to work with sound, then the following conditions should sound familiar:
It turns out the usual pipeline - there is a data stream that goes through several stages of processing.
For clarity, we take the problem from real life. For example, you need to convert a voice to text:
Like any other problem, this one has several solutions.
Only for hardcore cyclists programmers. We record sound directly through the sound card driver, we write a smart noise and multiband equalizer. This is very interesting, but you can forget about your original task for a few months.
Long and very difficult.
The alternative is to use existing APIs. You can record audio using ASIO, CoreAudio, PortAudio, ALSA and others. For processing, there are also several types of plug-ins: AAX, VST2, VST3, AU.
A rich choice does not mean that you can use everything at once. The following restrictions usually apply:
In this situation, the complexity depends on your choice. In the worst case, you will have to deal with several libraries. And if you are not lucky at all, with complex abstractions and completely different interfaces.
You need to choose between very complex and complex :
No matter which method is chosen, the task always comes down to the pipeline. The technologies used may vary, but the essence is the same. The problem is that again, instead of solving a real problem, you have to write bicycle conveyor.
But there is a way out.
phono
designed to solve common problems - to " receive, process and transmit " sound. For this, he uses the pipeline as the most natural abstraction. There is an article in the official blog Go that describes the pattern pipeline. The main idea of the pipeline is that there are several stages of data processing that operate independently of each other and exchange data through channels. That is necessary.
Why go?
First, most audio programs and libraries are written in C, and Go is often mentioned as its successor. In addition, there are cgo and quite a lot of bindings for existing audio libraries. You can take and use.
Secondly, in my personal opinion, Go is a good language. I will not go deep, but I will note its multithreading . Channels and gorutiny greatly simplify the implementation of the pipeline.
The heart of phono
is the type of pipe.Pipe
(English pipe). That it implements the pipeline. As in the sample from the blog , there are three types of stages:
pipe.Pump
(English pump) - receiving sound, only output channelspipe.Processor
(English handler) - sound processing , input and output channelspipe.Sink
(English sink) - sound transmission , input channels onlyInside pipe.Pipe
data is transmitted by buffers. The rules by which you can build the pipeline:
pipe.Pump
pipe.Processor
, placed one after anotherpipe.Sink
, placed in parallelpipe.Pipe
components must have the same:The minimum configuration is Pump and one Sink, the rest is optional.
Let us examine a few examples.
Task: play wav file.
Let's bring it to the form " to receive, process, transfer ":
Audio is read and immediately played.
package example import ( "github.com/dudk/phono" "github.com/dudk/phono/pipe" "github.com/dudk/phono/portaudio" "github.com/dudk/phono/wav" ) // Example: // Read .wav file // Play it with portaudio func easy() { wavPath := "_testdata/sample1.wav" bufferSize := phono.BufferSize(512) // wav pump wavPump, err := wav.NewPump( wavPath, bufferSize, ) check(err) // portaudio sink paSink := portaudio.NewSink( bufferSize, wavPump.WavSampleRate(), wavPump.WavNumChannels(), ) // build pipe p := pipe.New( pipe.WithPump(wavPump), pipe.WithSinks(paSink), ) defer p.Close() // run pipe err = p.Do(pipe.Run) check(err) }
First, we create elements of the future pipeline: wav.Pump
and portaudio.Sink
and transfer them to the constructor pipe.New
. The p.Do(pipe.actionFn) error
function starts the pipeline and waits for completion.
Task: split the wav file into samples, compose a track from them, save the result and simultaneously play back.
A track is a sequence of samples, and a sample is a small piece of audio. To cut audio, you must first load it into memory. To do this, use the type asset.Asset
from the phono/asset
package. We divide the task into standard steps:
Now we make samples with our hands, add them to the track and finish off the task:
Again, without processing stage, but as many as two pipeline!
package example import ( "github.com/dudk/phono" "github.com/dudk/phono/asset" "github.com/dudk/phono/pipe" "github.com/dudk/phono/portaudio" "github.com/dudk/phono/track" "github.com/dudk/phono/wav" ) // Example: // Read .wav file // Split it to samples // Put samples to track // Save track into .wav and play it with portaudio func normal() { bufferSize := phono.BufferSize(512) inPath := "_testdata/sample1.wav" outPath := "_testdata/example4_out.wav" // wav pump wavPump, err := wav.NewPump(inPath, bufferSize) check(err) // asset sink asset := &asset.Asset{ SampleRate: wavPump.WavSampleRate(), } // import pipe importAsset := pipe.New( pipe.WithPump(wavPump), pipe.WithSinks(asset), ) defer importAsset.Close() err = importAsset.Do(pipe.Run) check(err) // track pump track := track.New(bufferSize, asset.NumChannels()) // add samples to track track.AddFrame(198450, asset.Frame(0, 44100)) track.AddFrame(66150, asset.Frame(44100, 44100)) track.AddFrame(132300, asset.Frame(0, 44100)) // wav sink wavSink, err := wav.NewSink( outPath, wavPump.WavSampleRate(), wavPump.WavNumChannels(), wavPump.WavBitDepth(), wavPump.WavAudioFormat(), ) // portaudio sink paSink := portaudio.NewSink( bufferSize, wavPump.WavSampleRate(), wavPump.WavNumChannels(), ) // final pipe p := pipe.New( pipe.WithPump(track), pipe.WithSinks(wavSink, paSink), ) err = p.Do(pipe.Run) }
Compared to the previous example, there are two pipe.Pipe
. The first transmits data into memory so that samples can be cut. The second has just two recipients at the end: wav.Sink
and portaudio.Sink
. With this scheme, the sound is simultaneously recorded in a wav file and played.
Task: read two wav files, mix, process the vst2 plugin and save to a new wav file.
The phono/mixer
package has a simple mixer mixer.Mixer
. You can send signals from multiple sources to it and get one mixed. For this, it simultaneously implements pipe.Pump
and pipe.Sink
.
Again the task consists of two subtasks. The first one looks like this:
The second:
package example import ( "github.com/dudk/phono" "github.com/dudk/phono/mixer" "github.com/dudk/phono/pipe" "github.com/dudk/phono/vst2" "github.com/dudk/phono/wav" vst2sdk "github.com/dudk/vst2" ) // Example: // Read two .wav files // Mix them // Process with vst2 // Save result into new .wav file // // NOTE: For example both wav files have same characteristics ie: sample rate, bit depth and number of channels. // In real life implicit conversion will be needed. func hard() { bs := phono.BufferSize(512) inPath1 := "../_testdata/sample1.wav" inPath2 := "../_testdata/sample2.wav" outPath := "../_testdata/out/example5.wav" // wav pump 1 wavPump1, err := wav.NewPump(inPath1, bs) check(err) // wav pump 2 wavPump2, err := wav.NewPump(inPath2, bs) check(err) // mixer mixer := mixer.New(bs, wavPump1.WavNumChannels()) // track 1 track1 := pipe.New( pipe.WithPump(wavPump1), pipe.WithSinks(mixer), ) defer track1.Close() // track 2 track2 := pipe.New( pipe.WithPump(wavPump2), pipe.WithSinks(mixer), ) defer track2.Close() // vst2 processor vst2path := "../_testdata/Krush.vst" vst2lib, err := vst2sdk.Open(vst2path) check(err) defer vst2lib.Close() vst2plugin, err := vst2lib.Open() check(err) defer vst2plugin.Close() vst2processor := vst2.NewProcessor( vst2plugin, bs, wavPump1.WavSampleRate(), wavPump1.WavNumChannels(), ) // wav sink wavSink, err := wav.NewSink( outPath, wavPump1.WavSampleRate(), wavPump1.WavNumChannels(), wavPump1.WavBitDepth(), wavPump1.WavAudioFormat(), ) check(err) // out pipe out := pipe.New( pipe.WithPump(mixer), pipe.WithProcessors(vst2processor), pipe.WithSinks(wavSink), ) defer out.Close() // run all track1Done, err := track1.Begin(pipe.Run) check(err) track2Done, err := track2.Begin(pipe.Run) check(err) outDone, err := out.Begin(pipe.Run) check(err) // wait results err = track1.Wait(track1Done) check(err) err = track2.Wait(track2Done) check(err) err = out.Wait(outDone) check(err) }
There are already three pipe.Pipe
, all connected to each other through a mixer. To run, use the p.Begin(pipe.actionFn) (pipe.State, error)
function p.Begin(pipe.actionFn) (pipe.State, error)
. Unlike p.Do(pipe.actionFn) error
, it does not block the call, but simply returns a state, which can then be waited with the help of p.Wait(pipe.State) error
.
I want phono
become the most convenient application framework. If there is a problem with sound, you do not need to understand complex APIs and waste time studying standards. All that is needed is to build a pipeline of suitable elements and start it.
For half a year, the following packages were washed down:
phono/wav
- read / write wav filesphono/vst2
- incomplete binding of the VST2 SDK, while you can only open the plugin and call its methods, but not all structuresphono/mixer
- mixer, adds N signals, without balance and volumephono/asset
- buffers samplingphono/track
- sequential reading of samples (layered layered)phono/portaudio
- reproduction of a signal, while experimentsIn addition to this list, there is a constantly gaining backlog of new ideas and ideas, including:
In the following articles I will analyze:
pipe.Pipe
life cycle - due to the complex structure, its state is controlled by the final atomThis is my first open-source project, so I will be grateful for any help and recommendations. Welcome.
Source: https://habr.com/ru/post/424623/
All Articles