
Battle for sound speed on Android x86

At the heart of the "pyramid of needs" for users of Android audio applications is the speed with which the system responds to their actions. Suppose a program starts quickly and shows a beautiful picture of a piano keyboard. That is a good start, but if the moment a key is touched and the moment the sound begins (let's assume the sound itself is superb) are separated by a noticeable interval, the user will close the program and never return to it.

Let's talk about reproducing sound with a fast response time on Android devices based on Intel Atom (Bay Trail) processors. The approach described here can also be applied to other Intel platforms. We are looking at Android 4.4.4; a similar study for the Android M platform is still in progress.

Preliminary Information


High-latency audio playback is one of Android's long-standing problems, and it hits audio applications particularly hard. Long intervals between a user's action and the start of the sound hurt music-creation programs, games, DJ software, and karaoke applications. If an application responds to an action with a sound that arrives too late, the user's impression is seriously spoiled.

Throughout this study we will use the concept of round-trip latency (RTL). In our case, this is the time between the moment the user or the system performs an action that requires creating and playing an audio signal and the moment the sound actually starts.
Users run into audio playback latency in Android when, for example, they touch an object that is supposed to produce a sound and the sound is not played immediately. On most ARM and x86 devices, round-trip latency ranges from 300 to 600 ms, mainly in applications that use the standard Android audio output path described in Design For Reduced Latency .

Users do not like this. Acceptable round-trip latency should be well below 100 ms, in most cases below 20 ms; ideally, for professional use, it should be below 10 ms. Keep in mind that in Android audio applications the total latency consists of three components. The first is touch latency. The second is audio processing latency. The third is the latency of queuing buffers with audio data (buffer queuing).

Here we focus on reducing audio processing latency rather than on all three components at once; still, improving one factor reduces the overall latency.

Device sound subsystem in Android


Like other Android mechanisms, the sound subsystem can be represented as consisting of several layers.


Android audio subsystem

You can learn more about this scheme here .

Note that the hardware abstraction layer (HAL) of the Android audio subsystem serves as the link between the high-level audio APIs in android.media and the underlying audio drivers and hardware.

OpenSL ES


Using the OpenSL ES API is the most reliable way to efficiently process an audio signal that must be played in response to a user or application event. Latency cannot be avoided entirely even with OpenSL ES, but it is the API the Android documentation recommends.

The reason for this recommendation is that OpenSL uses a mechanism for placing buffers of audio data into a queue (buffer queueing), which improves efficiency inside the Android media framework. All of it is implemented in native code, which can yield better performance, since native code is not subject to the overheads typical of Java or the Dalvik virtual machine.

We believe that using the OpenSL ES mechanisms is a step forward in the development of audio applications for Android. In addition, the Android Native Development Kit documentation notes that the OpenSL implementation is planned to improve with new Android releases.
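To make the buffer-queue mechanism more concrete, here is a minimal sketch, ours rather than code from the original study, of how a buffer-queue player is typically created through the NDK's OpenSL ES headers. It hardcodes 16-bit mono PCM at 44.1 kHz with two queue slots and omits all error checking; bqPlayerCallback is the playback callback shown later in this article.

#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>

static SLObjectItf engineObject, outputMixObject, playerObject;
static SLEngineItf engineItf;
static SLPlayItf playItf;
static SLAndroidSimpleBufferQueueItf playerBufferQueue;

void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context);

int create_player(void *context) {
  // engine
  slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
  (*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
  (*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engineItf);

  // output mix
  (*engineItf)->CreateOutputMix(engineItf, &outputMixObject, 0, NULL, NULL);
  (*outputMixObject)->Realize(outputMixObject, SL_BOOLEAN_FALSE);

  // player fed from a two-slot buffer queue: 16-bit mono PCM at 44.1 kHz
  SLDataLocator_AndroidSimpleBufferQueue loc_bq =
      {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
  SLDataFormat_PCM format_pcm =
      {SL_DATAFORMAT_PCM, 1, SL_SAMPLINGRATE_44_1,
       SL_PCMSAMPLEFORMAT_FIXED_16, SL_PCMSAMPLEFORMAT_FIXED_16,
       SL_SPEAKER_FRONT_CENTER, SL_BYTEORDER_LITTLEENDIAN};
  SLDataSource audioSrc = {&loc_bq, &format_pcm};
  SLDataLocator_OutputMix loc_outmix = {SL_DATALOCATOR_OUTPUTMIX, outputMixObject};
  SLDataSink audioSnk = {&loc_outmix, NULL};
  const SLInterfaceID ids[1] = {SL_IID_ANDROIDSIMPLEBUFFERQUEUE};
  const SLboolean req[1] = {SL_BOOLEAN_TRUE};
  (*engineItf)->CreateAudioPlayer(engineItf, &playerObject,
                                  &audioSrc, &audioSnk, 1, ids, req);
  (*playerObject)->Realize(playerObject, SL_BOOLEAN_FALSE);
  (*playerObject)->GetInterface(playerObject, SL_IID_PLAY, &playItf);
  (*playerObject)->GetInterface(playerObject, SL_IID_ANDROIDSIMPLEBUFFERQUEUE,
                                &playerBufferQueue);

  // register the callback that tells us a queue slot is free again
  (*playerBufferQueue)->RegisterCallback(playerBufferQueue, bqPlayerCallback, context);
  (*playItf)->SetPlayState(playItf, SL_PLAYSTATE_PLAYING);
  return 0;
}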

Below we look at using the OpenSL ES API through the NDK. To begin with, the diagram shows the three levels of code that form the basis for developing audio applications for Android with OpenSL.


Like several other APIs, OpenSL works through a callback mechanism. In OpenSL, the callback can only be used to notify the application that a new buffer can be queued (for playing or recording sound). In other APIs, the callbacks also carry pointers to audio buffers that the application can fill or read. In OpenSL, however, the API can be set up so that the callbacks act purely as a signaling mechanism, with all computation performed in the thread responsible for audio processing; that thread enqueues the data buffers after receiving the signal.
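As an illustration of this signaling approach, here is a hypothetical sketch (the names signalingPlayerCallback and render_audio, and the buffer sizes, are ours) in which the callback merely posts a semaphore and a dedicated audio thread does all the work:

#include <semaphore.h>
#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>

#define FRAMES 256

static sem_t buffer_free;            // counts free slots in the buffer queue
static short audio_buf[2][FRAMES];   // double buffer; sizes are illustrative

void render_audio(short *buf, int frames);   // hypothetical DSP routine

// The OpenSL callback does no audio work at all: it only signals.
void signalingPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context) {
  sem_post(&buffer_free);
}

// Dedicated audio thread: waits for a free slot, renders, enqueues.
// At setup, sem_init(&buffer_free, 0, 2) accounts for the two empty slots.
void *audio_thread(void *arg) {
  SLAndroidSimpleBufferQueueItf bq = (SLAndroidSimpleBufferQueueItf) arg;
  int cur = 0;
  for (;;) {
    sem_wait(&buffer_free);
    render_audio(audio_buf[cur], FRAMES);
    (*bq)->Enqueue(bq, audio_buf[cur], FRAMES * sizeof(short));
    cur ^= 1;                        // flip to the other half of the double buffer
  }
  return NULL;
}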

Google recommends using the SCHED_FIFO scheduling policy when working with OpenSL. This policy is applied together with the ring buffer technique described below.

SCHED_FIFO scheduling policy


Since Android is based on Linux, the Linux CFS scheduler is in charge here. CFS can allocate CPU resources in unpredictable ways. For example, it may hand control to a thread it considers higher priority, taking the CPU away from a thread it finds less deserving. If that happens to the thread that is busy processing audio, it can cause problems with buffer timings. The result is long delays that are hard to predict.

The main remedy is to keep threads doing intensive audio work away from CFS: instead of the SCHED_NORMAL scheduling policy (also known as SCHED_OTHER) that CFS implements, apply the SCHED_FIFO policy.
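For reference, here is a minimal sketch of switching a thread to SCHED_FIFO with the standard pthreads API. Note that on stock Android builds this call is normally rejected for ordinary applications unless the platform grants the necessary privilege, so the return value must be checked:

#include <pthread.h>
#include <sched.h>

// Returns 0 on success or an errno value (typically EPERM) on failure.
int make_audio_thread_fifo(pthread_t thread) {
  struct sched_param param;
  // a modest real-time priority; the exact value is an arbitrary choice here
  param.sched_priority = sched_get_priority_min(SCHED_FIFO) + 1;
  return pthread_setschedparam(thread, SCHED_FIFO, &param);
}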

Scheduling delay


Scheduling delay is the time between the moment a thread becomes ready to run and the moment the context switch completes, that is, when the thread actually starts executing on the processor. The smaller this delay the better; if it exceeds two milliseconds, audio problems are guaranteed. Long scheduling delays usually occur during processor mode transitions: starting or stopping cores, switching between the secure and the normal kernel, switching power modes, or adjusting processor frequency and power consumption.

With these considerations in mind, let's look at how audio processing can be implemented on Android.

Ring buffer interface


The first step in organizing this properly is to prepare a ring buffer interface that the rest of the code can use. Four functions are needed:

  1. A function that creates a ring buffer.
  2. A function that writes to the buffer.
  3. A function that reads from the buffer.
  4. A function that destroys the buffer.

Here is a code sample:

circular_buffer* create_circular_buffer(int bytes);
int read_circular_buffer_bytes(circular_buffer *p, char *out, int bytes);
int write_circular_buffer_bytes(circular_buffer *p, const char *in, int bytes);
void free_circular_buffer(circular_buffer *p);

The desired behavior is this: a read operation returns the requested number of bytes, up to the amount of data already written into the buffer, and a write operation writes data subject to the free space remaining in it. Both functions return the number of bytes actually read or written, anywhere from zero up to the number requested in the call.

The consumer thread (the I/O callback in the case of playback, or the audio-processing thread in the case of recording) reads data from the ring buffer and then operates on the audio it has read. Meanwhile, asynchronously, the producer thread keeps filling the ring buffer with data, stopping only when the buffer is full. If the ring buffer size is chosen appropriately, the two threads work smoothly without interfering with each other.
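One possible implementation of this interface is a single-producer, single-consumer ring buffer that keeps one byte free to distinguish a full buffer from an empty one. The sketch below is our illustration, not code from the original study; a production version would protect rpos and wpos with C11 atomics or memory barriers:

#include <stdlib.h>

typedef struct circular_buffer {
  char *buffer;
  int  wpos;    // next write position (owned by the producer)
  int  rpos;    // next read position (owned by the consumer)
  int  size;    // total capacity in bytes
} circular_buffer;

circular_buffer *create_circular_buffer(int bytes) {
  circular_buffer *p = calloc(1, sizeof(circular_buffer));
  if (p == NULL) return NULL;
  p->size = bytes;
  p->buffer = calloc(bytes, 1);
  if (p->buffer == NULL) { free(p); return NULL; }
  return p;
}

static int bytes_available(circular_buffer *p) {   // readable bytes
  return (p->wpos - p->rpos + p->size) % p->size;
}

int read_circular_buffer_bytes(circular_buffer *p, char *out, int bytes) {
  int avail = bytes_available(p);
  if (bytes > avail) bytes = avail;                // read only what is there
  for (int i = 0; i < bytes; i++)
    out[i] = p->buffer[(p->rpos + i) % p->size];
  p->rpos = (p->rpos + bytes) % p->size;
  return bytes;
}

int write_circular_buffer_bytes(circular_buffer *p, const char *in, int bytes) {
  int space = p->size - 1 - bytes_available(p);    // keep one byte free
  if (bytes > space) bytes = space;                // write only what fits
  for (int i = 0; i < bytes; i++)
    p->buffer[(p->wpos + i) % p->size] = in[i];
  p->wpos = (p->wpos + bytes) % p->size;
  return bytes;
}

void free_circular_buffer(circular_buffer *p) {
  if (p == NULL) return;
  free(p->buffer);
  free(p);
}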

Audio Input / Output


Using the interface discussed above, the audio input and output functions can be written with OpenSL callbacks. Here is an example of a function that processes the input stream:

// This callback is called each time a recording buffer (recBuffer) is filled
void bqRecorderCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
  OPENSL_STREAM *p = (OPENSL_STREAM *) context;
  int bytes = p->inBufSamples*sizeof(short);
  write_circular_buffer_bytes(p->inrb, (char *) p->recBuffer, bytes);
  (*p->recorderBufferQueue)->Enqueue(p->recorderBufferQueue, p->recBuffer, bytes);
}

// Reads up to "size" samples of audio input from the ring buffer
int android_AudioIn(OPENSL_STREAM *p, float *buffer, int size)
{
  short *inBuffer;
  int i, bytes = size*sizeof(short);
  if(p == NULL || p->inBufSamples == 0) return 0;
  bytes = read_circular_buffer_bytes(p->inrb, (char *) p->inputBuffer, bytes);
  size = bytes/sizeof(short);
  for(i = 0; i < size; i++){
    buffer[i] = (float) p->inputBuffer[i]*CONVMYFLT;
  }
  if(p->outchannels == 0) p->time += (double) size/(p->sr*p->inchannels);
  return size;
}

The callback function bqRecorderCallback is invoked each time a new full buffer (recBuffer) is ready; it writes all of the buffer's data into the ring buffer and then enqueues recBuffer again for recording. The audio processing function android_AudioIn tries to read the requested number of samples into inputBuffer and then copies them to the caller's buffer, converting them to floating-point format. The function returns the number of samples copied.

Here is an example of a function that performs audio output:

// Writes a buffer of "size" samples to the ring buffer for playback
int android_AudioOut(OPENSL_STREAM *p, float *buffer, int size)
{
  short *outBuffer, *inBuffer;
  int i, bytes = size*sizeof(short);
  if(p == NULL || p->outBufSamples == 0) return 0;
  for(i = 0; i < size; i++){
    p->outputBuffer[i] = (short) (buffer[i]*CONV16BIT);
  }
  bytes = write_circular_buffer_bytes(p->outrb, (char *) p->outputBuffer, bytes);
  p->time += (double) size/(p->sr*p->outchannels);
  return bytes/sizeof(short);
}

// This callback is called each time a playback buffer finishes playing
void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
  OPENSL_STREAM *p = (OPENSL_STREAM *) context;
  int bytes = p->outBufSamples*sizeof(short);
  read_circular_buffer_bytes(p->outrb, (char *) p->playBuffer, bytes);
  (*p->bqPlayerBufferQueue)->Enqueue(p->bqPlayerBufferQueue, p->playBuffer, bytes);
}

The audio processing function android_AudioOut takes a buffer of samples in floating-point format, converts them to integers, writes the filled outputBuffer into the ring buffer, and returns the number of samples written. The OpenSL callback bqPlayerCallback reads all the samples out of the ring buffer and enqueues them for playback.

For all of this to work properly, the number of samples read from the input has to be passed to the output along with the buffer. Here is a loop that turns input data into output data:

while(on) {
  samps = android_AudioIn(p, inbuffer, VECSAMPS_MONO);
  for(i = 0, j = 0; i < samps; i++, j += 2)
    outbuffer[j] = outbuffer[j+1] = inbuffer[i];
  android_AudioOut(p, outbuffer, samps*2);
}

In this snippet, the inner for loop walks over the samples that were read and copies them to the output channels. The input here is mono and the output is stereo, which is why each input sample is copied into two adjacent positions of the output buffer. To start the callback mechanism, a buffer must already be enqueued for recording and another for playback before the audio streams are started; this guarantees that the callbacks fire whenever the buffers need to be replaced, as the sketch below shows.
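As a sketch (reusing the OPENSL_STREAM fields from the examples above), priming the queues before entering the loop might look like this: one silent buffer is enqueued for playback and one empty buffer for recording.

// fill the playback buffer with silence so playback can start immediately
memset(p->playBuffer, 0, p->outBufSamples*sizeof(short));
(*p->bqPlayerBufferQueue)->Enqueue(p->bqPlayerBufferQueue, p->playBuffer,
                                   p->outBufSamples*sizeof(short));
// hand the recorder an empty buffer to capture into
(*p->recorderBufferQueue)->Enqueue(p->recorderBufferQueue, p->recBuffer,
                                   p->inBufSamples*sizeof(short));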

We have just looked at a simple example of implementing audio input and output with OpenSL. Every implementation will be unique, and squeezing everything out of OpenSL will require modifications to the HAL and the ALSA driver.

Refining the Android audio subsystem on the x86 platform


OpenSL implementations by themselves do not guarantee that every device will reach the desired latency (up to 40 ms) when passing an audio signal through the Android "fast mixer". However, with modifications to the media server, the HAL, and the ALSA driver, different devices can, with varying success, show good low-latency audio results. While studying what it takes to increase audio responsiveness on Android, we implemented such a solution on a Dell Venue 8 7460 tablet.

The experiments resulted in a hybrid media processing system. In it, the thread that handles input data is driven by a dedicated fast server that processes the raw audio signal; the signal is then handed to the Android media server, which uses the "fast mixer" thread. The servers that handle input and output data use OpenSL with the SCHED_FIFO scheduling policy.


Implementation of fast audio processing (image courtesy of Eric Serre)

The modifications made it possible to achieve an acceptable RTL of 45 milliseconds. This implementation relies on the Intel Atom SoC and on specifics of the device used in the experiment. The test was conducted on the Intel Software Development Platform, which is available through the Intel Partner Software Development Program.

This implementation of OpenSL with the SCHED_FIFO scheduling policy demonstrates efficient processing of audio generated in real time. Note that it is not available on all devices, since it was built for the tablet mentioned above with its particular software and hardware in mind.

To find out how the audio processing technique presented here behaves on other devices, appropriate tests need to be run. Once such tests are done, we can share the results with partner developers.

Conclusions


We have discussed how OpenSL is used to create callbacks and a buffer queue in an application that processes audio on Android, and described the efforts Intel has made to achieve low-latency audio with a modified media framework.

To implement such a system yourself, follow Google's recommendations and keep in mind the specifics of building fast audio applications described in this article. The results suggest that reducing audio processing latency on Android is an entirely achievable task, but the battle for sound speed on the Android x86 platform continues.

Source: https://habr.com/ru/post/277569/

