Recently, the VK team announced a competition to develop a mobile application that would expand the capabilities of the social network VKontakte, and I decided to take part, since the terms of the competition let you come up with your own idea for an application. I had three ideas and had to choose which one to pursue.
Dear readers of Habrahabr, please send any errors and corrections to this article via private messages.
Idea 1
I really like group chats in VK; it's a pity you can't talk by voice in them. Public group audio chats could help gamers find partners for online iOS games. For example, I create an audio chat called "Asphalt 8", everyone who wants to play with me joins my audio room in the app, and we play together while talking by voice. There are similar "audio parties" on the PlayStation 4; I even know people who turn on their PS4 not for games but only to chat with friends sitting in those parties.

Why make a separate application if you can call on Viber or WhatsApp and talk while playing? Well, try calling on Viber and launching, say, the game Deepworld: the Viber call drops immediately, because the game's audio stream interrupts Viber's. In Skype the situation is better for gamers: the audio session stays active even if you turn on music; the music is only slightly muffled. But nowadays it is considered bad form to call someone without warning, and what if I call a friend who doesn't feel like playing the game I'm about to suggest? The solution is to create an audio room so that all friends receive a notification: "Your friend Ivan Ivanov has created a Hearthstone party." Those who want to join tap the notification and go straight into voice chat! One tap on the notification and you are in the party; no more ringing up friends one by one.
Idea 2
VKontakte has a documents section, so why not make a Dropbox analog for VK? Yes, in addition to the mobile app, Windows/Mac clients would be needed: a folder is created on the user's PC, and all files in it are synchronized with VKontakte documents and with a folder on the mobile device. The result is a kind of Dropbox analog with VKontakte as the backend.
Idea 3
There is the "six handshakes" theory: any two people on Earth are separated by no more than five levels of mutual acquaintances (and, accordingly, six levels of connections). So why not make an application that shows how many handshakes separate me on VK from, say, Pavel Durov? We enter two users into the application window and get the chain of friends through which we can reach the person we need. To implement the idea, I would have to download the profiles of all VKontakte users, sorted by ID.
Core Audio
Attention! Core Audio is famous for its complexity! Googling a problem often leads to stackoverflow.com questions that nobody has answered! Even Apple's paid support throws up its hands! Pitfalls emerge at every step of development!
The choice fell on the first idea, since it seemed the hardest to implement, and to complicate matters further I decided to build it on Core Audio, for which there is practically no documentation, so I would have to experiment. It's about time VKontakte added audio calls: even Facebook's mobile client lets you make voice calls! Why should VK lag behind? The VK team already tried to launch video calls in the web client and shipped an alpha version, but that was it. I believe the ability to make calls absolutely must be added to the VK mobile client, and in this article I will try to show how to do it.
What do I even know about sound? How is sound transmitted over the network? With video everything is simpler: each pixel can be encoded in RGB, and you transmit the changes to the pixel matrix as an array. But what is a "snapshot of sound" for a unit of time? It is an array of Float numbers:

Moreover, if we add up the absolute values

|Float 1| + |Float 2| + |Float 3| + ... + |Float n|

and divide the sum by the number of elements n, we get the volume of this snapshot. To double the volume, we just need to multiply every element of the array by 2:

(Float 1) * 2, (Float 2) * 2, (Float 3) * 2, ..., (Float n) * 2

But what if the sound comes from several users at once; how do we "glue" two audio streams together? The answer is simple: just add the elements of the two arrays pairwise.
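These three operations can be sketched in a few lines (Python here, purely to illustrate the arithmetic; a real iOS app would do this in C directly over the sample buffers):

```python
def volume(samples):
    """Rough loudness of a chunk: mean of absolute sample values."""
    return sum(abs(s) for s in samples) / len(samples)

def amplify(samples, gain):
    """Change the volume by multiplying every sample by a gain factor."""
    return [s * gain for s in samples]

def mix(a, b):
    """'Glue' two audio streams by adding their samples pairwise."""
    return [x + y for x, y in zip(a, b)]

chunk = [0.1, -0.2, 0.3, -0.4]
louder = amplify(chunk, 2)   # [0.2, -0.4, 0.6, -0.8]
mixed = mix(chunk, louder)   # pairwise sum of the two streams
```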
On Mac OS X, in both kAudioFormatFlagsCanonical and kAudioFormatFlagsAudioUnitCanonical, one array element is a floating-point Float, but floating-point calculations were too expensive for ARM chips, so on iOS kAudioFormatFlagsCanonical is represented by signed integers, and kAudioFormatFlagsAudioUnitCanonical by fixed-point "8.24" integers. This means that 8 bits (the integer part) are to the left of the binary point and 24 bits (the fractional part) to the right.
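To get a feel for the 8.24 layout, here is a small illustrative conversion between a float sample and its fixed-point representation (Python, just to show the bit arithmetic; the function names are mine):

```python
FRACT_BITS = 24
ONE = 1 << FRACT_BITS  # the value 1.0 in the 8.24 representation

def float_to_8_24(x):
    """Pack a float into a signed 8.24 fixed-point integer."""
    return int(round(x * ONE))

def fixed_8_24_to_float(n):
    """Unpack a signed 8.24 fixed-point integer back to a float."""
    return n / ONE

half = float_to_8_24(0.5)    # 0.5 becomes 2^23 = 8388608
back = fixed_8_24_to_float(half)
```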
Choosing the application's name and icon:
I had two names in mind: "Tusa" and "Wassap". The application is a group audio chat, so it would be great if participants greeted each newcomer with the phrase "Wassaaaap!", but because of the similarity to "WhatsApp" I chose "Tusa". For the icon I first picked a microphone, but then replaced it with pebbles:

How the "Tusa" application works

- First, the user lands on the start screen, where they are asked to log in via the VK button. At this stage, the application receives the user's information and friends list (public information only).
- The application then sends the user information and friends list to a PHP server, which returns the list of the friends' audio chats; each "party" is assigned the IP and port of a Python server through which the audio is exchanged.
- The user either selects an "audio party", and the application connects to the corresponding Python server, or chooses "create a new party", and other users then join that chat.
Why bother with a PHP server at all? Why not get the chat list from the same Python server? I made the PHP server so that I could spread the "audio parties" across different Python servers: if the Internet channel of one Python server fills up, the PHP server will create new audio rooms on another Python server with a separate IP address. The PHP part will also be responsible for sending in-app notifications.
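The balancing logic described above might look roughly like this (sketched in Python for brevity; the server list, the capacity numbers, and all names are hypothetical):

```python
# Hypothetical sketch of the PHP balancer's logic: each Python server
# hosts audio rooms until its channel is full, then the next one is used.
SERVERS = [
    {"ip": "192.0.2.10", "port": 7878, "rooms": [], "capacity": 50},
    {"ip": "192.0.2.11", "port": 7878, "rooms": [], "capacity": 50},
]

def create_room(name):
    """Place a new 'audio party' on the first server with free capacity."""
    for server in SERVERS:
        if len(server["rooms"]) < server["capacity"]:
            server["rooms"].append(name)
            return server["ip"], server["port"]
    raise RuntimeError("all audio servers are full")
```

The client then simply connects to whatever IP and port the balancer handed back.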
A little experiment: the backstory
Before getting acquainted with Core Audio, I decided to run a small experiment to test my abilities. I imagined the following situation: my plane has crashed, and I find myself on a desert island with the other passengers, a MacBook, a router, Xcode out of the box, and a dozen iOS devices charged by solar panels. I would have no Core Audio documentation, and since at the time I didn't know how sound is digitized, could I write an audio chat under those conditions? All I knew back then was how to record .wav (.caf) files and play them. I had recently developed an iOS real-time multiplayer game, "Tanchiki with Dandy", in which up to 100 tanks battle on one map. I decided to turn the game into an audio chat in a few lines of code: record the sound into a file in a loop, send the file to the other users, and build playlists out of these files on their devices! Sending sound as files is complete idiocy, of course; I ran the experiment only because I already had a network engine. I wanted to measure the latency in this scenario and check how my network code behaved when sending large volumes of data, but as a result, besides uncovering bugs in the network code, I learned interesting details about the performance of the iOS audio player that may be useful to readers.
How do you play sound in iOS? Using AVAudioPlayer:
AVAudioPlayer *avPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:[NSURL fileURLWithPath:@"_.caf"] error:nil];
[avPlayer play];
The sound from other users arrives as NSData and is appended to the playlist array; conveniently, AVAudioPlayer can play sound from NSData directly rather than from a file:
AVAudioPlayer *avPlayer = [[AVAudioPlayer alloc] initWithData:data fileTypeHint:AVFileTypeCoreAudioFormat error:nil];
[avPlayer play];
How do you know that AVAudioPlayer has finished playback? Via the audioPlayerDidFinishPlaying: callback:
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag {
    // start the next item from the playlist here
}
I launched this version on an iPhone and an iPad, and then came the disappointment: the sound played with interruptions. The point is that initializing AVAudioPlayer takes up to 100 milliseconds, hence the audio lags.
The solution was AVQueuePlayer, which was made specifically to play playlists without gaps between tracks. Initializing AVQueuePlayer:
AVQueuePlayer *avPlayer = [[AVQueuePlayer alloc] initWithItems:playerItems];
avPlayer.volume = 1;
[avPlayer play];
To add a file to the playlist, use AVPlayerItem:
NSURL *url = [NSURL fileURLWithPath:pathForResource];
AVPlayerItem *item = [AVPlayerItem playerItemWithURL:url];
NSArray *playerItems = [NSArray arrayWithObjects:item, nil]; // initial playlist for initWithItems:
[avPlayer insertItem:item afterItem:nil];
Launching this version, I heard clear sound between my devices; the delay was about 250 milliseconds, since shorter files could not be recorded without crashing. And of course, this approach was gluttonous with traffic, because besides the useful sound, a .wav (.caf) header was transmitted over the network several times per second. This method also does not work in the background: a backgrounded iOS app cannot start playing new sounds. With that, the experiment is over, and we can start programming the application.
What do we know about Core Audio?
Apple's website has an example of recording sound into an audio file using Core Audio; you can download it on this page:
https://developer.apple.com/library/ios/samplecode/AVCaptureToAudioUnit/Introduction/Intro.html
After studying this source code, it became clear to me that during recording a callback is invoked many times per second:
#pragma mark ======== AudioUnit recording callback =========

static OSStatus PushCurrentInputBufferIntoAudioUnit(void *inRefCon,
                                                    AudioUnitRenderActionFlags *ioActionFlags,
                                                    const AudioTimeStamp *inTimeStamp,
                                                    UInt32 inBusNumber,
                                                    UInt32 inNumberFrames,
                                                    AudioBufferList *ioData)
{
    // ioData contains the freshly recorded samples; pack them into NSData
    NSMutableData *soundData = [NSMutableData dataWithCapacity:0];
    for (int y = 0; y < ioData->mNumberBuffers; y++) {
        AudioBuffer audioBuff = ioData->mBuffers[y];
        // the buffer holds Float32 samples
        Float32 *frame = (Float32 *)audioBuff.mData;
        [soundData appendBytes:frame length:audioBuff.mDataByteSize];
    }
    return noErr;
}
Having analyzed the AudioBufferList format, which contains the sound as a list of numbers, I converted the AudioBufferList to NSData, lining up the numbers in chains of 4 bytes, and passed this buffer through the Python server to the remote device in a loop. But how do you play an AudioBufferList on the remote device? I found no answer in the official sample on Apple's site, and the reply from Apple support didn't give me the necessary information either. But having spent enough time on trial and error, I realized that for this purpose there is a similar callback into which the AudioBufferList should be substituted, and it will be played on the fly:
#pragma mark ======== AudioUnit playback callback =========

static OSStatus playbackCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData)
{
    // fill *ioData with Float samples and they will be played on the fly
    return noErr;
}
How do you activate these callbacks? First, rename your project's .m file to .mm and import all the necessary C++ libraries from the AVCaptureToAudioUnit project. After that, we create, configure and launch our audio stream using the setup code from that sample.
By the way, as an experiment I studied the format of the .caf file, spending a lot of time in a HEX editor: I tried taking an AudioBufferList on the remote device, prepending a .caf byte header to it, saving this AudioBufferList to a .caf file, and playing it back with AVQueuePlayer. And the strangest thing is that it worked!
Novocaine
So, we have dealt with Core Audio, but how can the process be made even simpler and clearer? There is an answer: use Novocaine!
https://github.com/alexbw/novocaine
What is Novocaine? Over three years, three coders packaged Core Audio into a separate class, and it came out great! Novocaine is implemented in C++, so to connect the C++ class to our Objective-C file you need to rename it from .m to .mm and do all the imports at the beginning of the .mm file.
How do you read audio into a buffer?
Novocaine *audioManager = [Novocaine audioManager];
[audioManager setInputBlock:^(float *newAudio, UInt32 numSamples, UInt32 numChannels) {
    // newAudio contains numSamples * numChannels fresh samples
}];
How do you play a buffer?
Novocaine *audioManager = [Novocaine audioManager];
[audioManager setOutputBlock:^(float *audioToPlay, UInt32 numSamples, UInt32 numChannels) {
    // fill audioToPlay with the samples to be played
}];
Just like that!
We assemble all of this on an iPhone and an iPad, start an audio call, and... Echo! Squealing! A murderous echo loops again and again through the communication channel and drills into the brain! I expected users to communicate even without a headset, on speakerphone, but the sound traveled from me to the remote device, went from the remote device's speaker into its microphone, and came back to me. Unpleasant. How do you implement echo cancellation in iOS with Core Audio?
You must use the kAudioUnitSubType_VoiceProcessingIO subtype for the audio stream instead of the standard kAudioUnitSubType_RemoteIO. Open the file Novocaine.m, find the line:
inputDescription.componentSubType = kAudioUnitSubType_RemoteIO;
and replace it with:
inputDescription.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
We try to build and get an error. The point is that by default our audio stream ran at 44100.0 Hz, and kAudioUnitSubType_VoiceProcessingIO needs a lower sample rate to work. I changed 44100.0 to 8000.0 in all files, but the audio stream kept being generated at 44100.0. After digging around the Internet, I discovered that the Novocaine project on GitHub has three pull requests from third-party users, and one of them carried the description:
Fixed Crash when launching from background while audio playing; Ability to manage Sample Rate
Having copied all the changed lines from this pull request, I managed to start the audio stream at 8000.0 Hz, and the echo cancellation worked! The audio delay was 15-25 ms! The application kept working minimized, even with the screen off on a locked iPhone!
The point is that iOS does not allow a minimized application to start new sounds. To check, play a song from VK in Safari and minimize the browser: as soon as the track ends, the next track in the playlist will not start until you bring the browser back to the foreground! But if you use audio streams in iOS, the application copes perfectly with starting new sounds from the background!
How sound travels from device to device in the "Tusa" application
On a remote server, I open TCP port 7878 with a Python script and establish a TCP connection to this server from the iOS application.
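The article does not show the Python side, but a minimal version of such a server might look like this (a hedged sketch: the framing helper matches the packet format described below, while everything else, including the broadcast loop, is my assumption):

```python
import socket
import struct
import threading

clients = []            # open connections of the current audio room
clients_lock = threading.Lock()

def recv_exact(conn, n):
    """Read exactly n bytes from a TCP stream (or None on disconnect)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def handle_client(conn):
    """Receive LENGTH + tag + payload packets and relay them to the others."""
    with clients_lock:
        clients.append(conn)
    try:
        while True:
            header = recv_exact(conn, 5)      # 4-byte length + 1-byte tag
            if header is None:
                break
            length, tag = struct.unpack("<IB", header)
            payload = recv_exact(conn, length)
            if payload is None:
                break
            with clients_lock:
                for other in clients:
                    if other is not conn:
                        other.sendall(header + payload)
    finally:
        with clients_lock:
            clients.remove(conn)
        conn.close()

def serve(port=7878):
    """Accept connections in a loop, one thread per client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(16)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()
```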
Then, having collected the sound into a float array, I convert it to NSMutableData, lining each float up as a chain of 4 bytes:
NSMutableData *soundData = [NSMutableData dataWithCapacity:0];
for (int i = 0; i < numFrames; ++i) {
    for (int iChannel = 0; iChannel < numChannels; ++iChannel) {
        float theta = data[i * numChannels + iChannel];
        [soundData appendBytes:&theta length:sizeof(float)];
    }
}
Now the sound is in soundData, and we transmit it to the server in the format:
LENGTH(soundData) + A + soundData
where A is the byte telling the server that this packet carries sound, LENGTH(soundData) is the packet length (4 bytes), and soundData is the data itself as NSData.
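In Python terms, assembling and disassembling such a packet looks like this (an illustration of the byte layout only; the 'A' tag value is taken from the description above):

```python
import struct

def pack_audio_packet(samples):
    """Serialize float samples into LENGTH + 'A' + payload."""
    sound_data = struct.pack("<%df" % len(samples), *samples)  # floats as chains of 4 bytes
    return struct.pack("<I", len(sound_data)) + b"A" + sound_data

def unpack_audio_packet(packet):
    """Inverse: recover the float samples from a framed packet."""
    (length,) = struct.unpack("<I", packet[:4])
    assert packet[4:5] == b"A", "not an audio packet"
    payload = packet[5:5 + length]
    return list(struct.unpack("<%df" % (length // 4), payload))

pkt = pack_audio_packet([0.5, -0.5, 0.25])
samples = unpack_audio_packet(pkt)
```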
I also tried encrypting the entire audio stream with a secret key; traffic grew by 50-100%, but in terms of iOS performance the devices handled it with flying colors. Although for those on 3G with poor reception, such growth of the data channel may prove too heavy.
The most annoying part: I initially implemented the whole project on the Cocos2D library intended for games, and it turned out that the VK SDK does not work with Cocos2D projects; it only supports ARC mode (Automatic Reference Counting), in which memory is released automatically. In one of my past games I also tried to embed the VK button, but because of errors I had to replace it with a Facebook button. I hope future versions of the VK SDK will work with Cocos2D, but in the meantime I had to rewrite all the code on standard Storyboard interfaces, removing every "release" call from the code. And if a few days earlier I had been hunting for places to insert "release" to avoid memory leaks, in ARC mode the problem simply does not exist. The application now occupies only 10 MB of RAM instead of 30 MB on Cocos2D.
Note: I did eventually manage to "make friends" between Storyboard interfaces and Cocos2D, running a Cocos2D game directly in a UIViewController with Cocos2D in ARC mode, but that is a topic for a separate article.
A doubtful innovation, or UDP vs TCP
Instead of the UDP protocol usual for VoIP, I used TCP as an experiment. When transmitting audio over TCP, every packet loss introduces a small delay (due to retransmission). As a result, on an unstable connection the client's incoming playlist of audio chunks sometimes piles up: its length starts to exceed several seconds, and something has to be done about it. I decided to correct the situation as follows:
- If the incoming playlist grows beyond 2 seconds, I simply skip the "quiet" audio chunks, cutting out the silence between phrases.
- If the playlist length exceeds a critical figure, I double the playback speed of the audio stream until the playlist shrinks to a satisfactory length. The incoming voice then sounds "accelerated".
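The two measures above can be sketched as follows (a Python illustration; the chunk duration, the thresholds, and the crude "drop every other sample" speed-up are my assumptions, not the app's exact values):

```python
CHUNK_SECONDS = 0.05   # assumed duration of one incoming audio chunk

def is_quiet(chunk, threshold=0.02):
    """A chunk counts as silence if its mean absolute amplitude is tiny."""
    return sum(abs(s) for s in chunk) / len(chunk) < threshold

def compensate(playlist):
    """Shrink a backlogged playlist: first drop silence, then speed up 2x."""
    backlog = len(playlist) * CHUNK_SECONDS
    if backlog > 2.0:                        # step 1: cut out quiet chunks
        playlist = [c for c in playlist if not is_quiet(c)]
    backlog = len(playlist) * CHUNK_SECONDS
    if backlog > 4.0:                        # step 2: crude 2x speed-up
        playlist = [c[::2] for c in playlist]
    return playlist
```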
The advantages of TCP: all packets and phrases are guaranteed to be delivered, and if you encrypt the audio packets there are no problems decrypting them. There is also no need for extra STUN and TURN servers through which all UDP traffic is "proxied" to traverse NAT (don't forget that almost no iOS users have an external IP); with TCP the exchange goes directly between the server and the client.
The advantage of UDP is the absence of delays on packet loss: if packets are lost, the application simply ignores it.
Bottom line: with packet loss on a poor connection we have to give up part of the audio data either way. With a traditional VoIP UDP connection it is arbitrary audio data, including chunks carrying voice; with a TCP connection we can choose which audio data to discard, and we choose to drop the "quiet" chunks: this is how we compensate for the delay.
Well, that's all. If you are interested in following the project's development within the competition, here is a link to the project's VK page:
http://vk.com/id232953074
At the moment, one of my games (Tanchiki Online) has made it onto the App Store's main page (hurray!), all my servers have filled up with thousands of players, so the launch of Tusa had to be postponed for several days. I will post all launch information on VKontakte.
I will also try to post the source code of the Tusa application on VKontakte, once I give it a more polished look.
In the comments to this article, I would like to hear about alternative free (!) options for transmitting an audio stream on iOS through one's own servers (!) that could be used to carry sound on VKontakte.
Also write in the comments whether there are analogues of the Tusa application in the App Store (apart from the paid TeamSpeak).
Since the VK contest was launched to expand the VK client's capabilities, and my application demonstrates how to add calls to VK, I ask you to take part in the poll: do you think the VK mobile client needs audio/video calls? The poll will not play the key role in the decision, but the VK developers will definitely notice it, because Facebook already has calls :)