On Friday, we launched a new version of our iPhone application. In this post, we would like to share our experience of developing such a service.
Service "Calls" - video chat on the site Odnoklassniki implemented by means of Flash. But not all of our users come to Odnoklassniki from a computer / laptop. To expand the video chat audience, we decided to support it also in smartphones.
This article discusses our experience implementing video calls in the iOS application.
The overall architecture of the application:

Audio/video encoders and decoders
Since C/C++ code can be compiled for the iPhone, there were no problems choosing codecs: any open source codec will do, provided its license allows use in commercial applications. For audio we took Speex, and for video the H.263 codec from the Android OS source (we had to modify it a bit for compatibility with Flash).
Video capture module
Starting with version 4.0, iOS has an API for capturing raw video (AVCaptureSession). But this API has its limitations: in particular, an arbitrary resolution cannot be set (only a few fixed presets are supported), and the video orientation cannot be changed when the device is rotated. To work around these restrictions, we capture video frames larger than we need and then crop and rotate them according to the device orientation. This consumes quite a lot of processor resources and has the undesirable side effect of magnification ("zoom"), but all that remains is to thank Apple for such an implementation of the API.
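A minimal Swift sketch of such a capture setup (class, queue, and preset names are our illustrative choices; the production code of that era was Objective-C):

```swift
import AVFoundation

final class CallVideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let queue = DispatchQueue(label: "capture")

    func start(with device: AVCaptureDevice) throws {
        // Only fixed presets are available; pick one larger than the target size.
        session.sessionPreset = .vga640x480
        session.addInput(try AVCaptureDeviceInput(device: device))

        let output = AVCaptureVideoDataOutput()
        // Request the camera's native YUV (4:2:0 bi-planar) pixel format.
        output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String:
                                    kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange]
        output.setSampleBufferDelegate(self, queue: queue)
        session.addOutput(output)
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Crop and rotate the frame here, before encoding; this is the
        // CPU-heavy step that also produces the "zoom" effect.
    }
}
```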
The video is captured in the camera's native YUV format. This raises the problem of rendering these frames so that users can see their own picture on the screen. We solved that as well, but more on it later, in the "Video output" section.
Also worth mentioning is the camera selection algorithm: if the device has a front camera, the application selects it; if there is no front camera, video is captured from the camera on the back cover; and if there is no camera at all (for example, on the iPad), the "Enable video" button is not displayed.
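In today's Swift API, that selection rule might look like this (a sketch; the original app used the Objective-C API of the time):

```swift
import AVFoundation

// Prefer the front camera, fall back to the back camera, otherwise
// return nil so the UI can hide the "Enable video" button.
func pickCamera() -> AVCaptureDevice? {
    let front = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front)
    let back  = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back)
    return front ?? back
}

// Usage: videoButton.isHidden = (pickCamera() == nil)
```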
Video output
The video decoder, like the video capture module, outputs frames in YUV format. To display a frame on the device screen, it must be converted to RGB and scaled to the required size. Since the main processor is busy with other tasks at that moment, we moved these operations to the graphics processor using OpenGL ES 2.0: a YUV frame is loaded into an OpenGL texture, after which a fragment shader recalculates the color components for each pixel. To support older devices without OpenGL ES 2.0 (models before the iPhone 3GS), the application has a code branch that performs all the calculations without shaders; on such devices, the lack of computing power has to be compensated by lowering the video frame rate.
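The per-pixel conversion can be expressed as an OpenGL ES 2.0 fragment shader along these lines (a sketch using BT.601 coefficients; texture and varying names are our illustrative choices, not necessarily the shader shipped in the app):

```swift
// GLSL ES fragment shader stored as a Swift string. The Y plane is
// uploaded as a GL_LUMINANCE texture and the interleaved CbCr plane
// as GL_LUMINANCE_ALPHA, so chroma arrives in the .r and .a channels.
let yuvToRGBFragmentShader = """
precision mediump float;
varying vec2 vTexCoord;
uniform sampler2D yTexture;   // luma plane
uniform sampler2D uvTexture;  // interleaved chroma plane

void main() {
    float y = texture2D(yTexture, vTexCoord).r;
    vec2 uv = texture2D(uvTexture, vTexCoord).ra - vec2(0.5, 0.5);
    // BT.601 YUV -> RGB conversion
    gl_FragColor = vec4(y + 1.402 * uv.y,
                        y - 0.344 * uv.x - 0.714 * uv.y,
                        y + 1.772 * uv.x,
                        1.0);
}
"""
```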
Sound capture and playback
The AudioSession API is used to capture sound from the microphone and play it through the speaker; this is the lowest-level audio API accessible from an iOS application. Sound arriving from the server is decoded into a buffer that smooths out network jitter. The buffer depth varies with connection quality: the worse the network, the deeper the buffer. For the user, this manifests as increased audio delay on a bad network.
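A minimal sketch of such an adaptive jitter buffer (the depths and loss threshold are invented for illustration):

```swift
// Decoded audio frames queue up here; the target depth grows when the
// network degrades, trading extra latency for smooth playback.
final class JitterBuffer {
    private var frames: [[Int16]] = []
    private var primed = false
    private(set) var targetDepth = 3   // frames to accumulate before playback

    func push(_ frame: [Int16]) {
        frames.append(frame)
        if frames.count >= targetDepth { primed = true }
    }

    /// Called by the audio output callback once per playback tick.
    func pop() -> [Int16]? {
        guard primed, !frames.isEmpty else { return nil }  // underrun: play silence
        let frame = frames.removeFirst()
        if frames.isEmpty { primed = false }               // re-buffer after a dry spell
        return frame
    }

    /// Worse network -> deeper buffer -> larger (but stable) audio delay.
    func networkQualityChanged(lossPercent: Double) {
        targetDepth = lossPercent > 5 ? 8 : 3
    }
}
```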
There are several ways to play sound on the iPhone: through the earpiece (the phone's receiver), through the loudspeaker at the bottom of the device, or through a connected headset. If no headset is connected, the loudspeaker route is selected by default; the earpiece is used when the proximity sensor is triggered.
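With the modern AVAudioSession API, the route switching might look like this (the app of that era used the lower-level C AudioSession calls instead):

```swift
import AVFoundation

// Route the call audio either to the earpiece or to the loudspeaker.
func routeAudio(earpiece: Bool) {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
    // .none routes to the earpiece (receiver); .speaker forces the loudspeaker.
    try? session.overrideOutputAudioPort(earpiece ? .none : .speaker)
}
```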
Multitasking
Starting with iOS 4, multitasking support has been added, so a video chat call can continue even while our application runs in the background. To free up as many resources as possible in background mode, the application stops receiving and sending video and stops updating the graphical interface until it becomes active again. Audio capture and playback continue, and a red bar labeled "odnoklassniki" is displayed on the screen, indicating that the application is "listening" to the microphone. Tapping this bar returns the application to the active state.
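A sketch of the corresponding application-delegate hooks (CallController and its methods are hypothetical stand-ins; the app's Info.plist must also declare "audio" in UIBackgroundModes for sound to continue in the background):

```swift
import UIKit

// Hypothetical controller for the active call; real logic lives elsewhere.
final class CallController {
    func stopVideo() {}
    func startVideo() {}
    func pauseUIUpdates() {}
    func resumeUIUpdates() {}
}

final class AppDelegate: UIResponder, UIApplicationDelegate {
    let call = CallController()

    func applicationDidEnterBackground(_ application: UIApplication) {
        call.stopVideo()        // stop camera capture and video send/receive
        call.pauseUIUpdates()   // audio keeps flowing; iOS shows the red bar
    }

    func applicationDidBecomeActive(_ application: UIApplication) {
        call.resumeUIUpdates()
        call.startVideo()
    }
}
```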
Using the proximity sensor
The proximity sensor is mounted at the top of the phone's front face and detects when an object is close to the device; the idea is to detect the moment the user brings the phone to their ear. During a video call, the application monitors the state of this sensor. When the sensor is triggered, the application turns off the screen, stops receiving and sending video, and switches the sound to the earpiece. This is convenient for users who just want to talk the old-fashioned way, as on a regular phone call.
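A sketch using UIDevice's proximity API (reusing the hypothetical call controller and the routeAudio helper from the earlier sketches):

```swift
import UIKit

let call = CallController()   // hypothetical, see the multitasking sketch

// Enable monitoring for the duration of the call; iOS also blanks the
// screen automatically while the sensor reports a nearby object.
UIDevice.current.isProximityMonitoringEnabled = true

NotificationCenter.default.addObserver(
    forName: UIDevice.proximityStateDidChangeNotification,
    object: nil, queue: .main) { _ in
    if UIDevice.current.proximityState {   // phone is at the user's ear
        call.stopVideo()
        routeAudio(earpiece: true)
    } else {
        routeAudio(earpiece: false)
        call.startVideo()
    }
}
```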
Adaptation to the quality of the connection
Video calls in the application can be used on both Wi-Fi and 3G/EDGE networks. During a call, the state of the network connection is constantly monitored, and if the connection quality deteriorates, the frame rate and video quality are reduced.
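A toy sketch of such an adaptation policy (the tiers and loss thresholds are invented for illustration):

```swift
// Map a measured packet-loss percentage to encoder settings.
struct VideoTier {
    let fps: Int       // target frame rate
    let quality: Int   // encoder quality, 0-100
}

func tier(forLossPercent loss: Double) -> VideoTier {
    switch loss {
    case ..<2.0: return VideoTier(fps: 15, quality: 90)  // good network (Wi-Fi)
    case ..<8.0: return VideoTier(fps: 10, quality: 70)  // degraded (3G)
    default:     return VideoTier(fps: 5,  quality: 50)  // poor (EDGE)
    }
}
```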
What we would like to improve
- Make it possible to use other functions of the Odnoklassniki application during a call
- Automatic reconnection after a dropped connection (for example, when switching between Wi-Fi and 3G networks)
- Make wider use of ARM-specific code optimizations to reduce processor utilization and battery drain during a call.
We will be glad to hear your comments and recommendations for improving the service.
The "Calls" service development team.