In this article I would like to show how, with minimal effort, you can write a simple VoIP application for Windows Phone 8 that has its own backend and works in the background.
Before the release of Windows Phone 8, users of VoIP applications were very disappointed with background operation, which was practically absent: the most a developer could do to show an incoming call while the application was in the background was a toast notification, which is barely noticeable, barely audible and quickly disappears. On the one hand, this kept applications from draining the battery the way a fully backgrounded application would; on the other hand, it made VoIP applications almost useless. Ahead of the WP8 release, Microsoft fueled public interest in the new version of the platform with promises to integrate Skype into the operating system and let it work in the background. They kept their promises, and now it is possible to:
- start a Skype call from the phone's contact book
- continue a Skype conversation even if you minimize the application, intentionally or accidentally (previously, accidentally pressing the Search button during a call ended the conversation)
- and, most interesting of all, receive incoming calls through a UI similar to a regular GSM call while Skype is not running in the foreground and, moreover, is not doing anything in the background (so it does not drain the battery)
Microsoft did not keep these features exclusive to its own product (except for contact book integration) and opened an API that lets third-party developers implement the same scenarios without being a privileged partner (as was required for the native SDK in WP7). Although you cannot integrate into the contact book as neatly as Skype does, you can use ContactStore and protocol handlers to put a link to your application into a contact's URL field so that tapping it opens the application.
The sources of two projects are attached at the end of the article: the first is Microsoft's Chatterbox sample, which shows how the background processes work, with a simulated backend, incoming calls and even video; the second is my project with a simple backend that lets two devices talk over VoIP and uses VoIP push notifications. But first things first.

VoIP application architecture with background work
If you set out to write a full-fledged VoIP application, then, unfortunately (or fortunately), you cannot do without a native C++ component, because a proper API for working with audio devices is not available from managed code. In short, a VoIP application that can work in the background consists of two processes:
- Foreground: an ordinary process in which the application UI runs.
- Background: the second process, which hosts four agents:
- VoipHttpIncomingCallTask: starts when an incoming call arrives via the push channel (this special kind of push notification is described below).
- VoipForegroundLifetimeAgent: starts when the application becomes active and runs until the application is minimized or closed.
- VoipCallInProgressAgent: starts when a call becomes active, which signals that the process has been given more CPU resources to handle the call. That is why audio and video encoding and decoding should start only after this event.
- VoipKeepAliveTask: runs periodically, every 6 hours. Its purpose is to let you periodically remind your server that the application is still installed on the phone.
- Out-of-process: an inter-process component that handles communication between the first two processes. Physically, it lives in the same second (background) process.
Graphically, it looks like this:

How do you write your own VoIP application?
Let's go through it step by step:
1. Transport
First, let's deal with the transport layer. Of course, this is a very simple example that I built in a day, so there is nothing sophisticated here. As you know, it is not enough to write the transport, audio recording and playback: it all has to work quickly and without lag even on weak connections, but that topic deserves a whole book. For transport we will use a very convenient class from the new API, DatagramSocket. It is simple and works over UDP, which makes more sense for audio/video streams (we don't need to wait for a delivery acknowledgment for every audio packet, right?). Thanks to async/await, working with it is very simple:
const string host = "192.168.1.12"; // address of the other side
const string port = "12398";

var socket = new DatagramSocket();
socket.MessageReceived += (s, e) =>
{
    // An incoming datagram has arrived
    var reader = e.GetDataReader();
    string message = reader.ReadString(reader.UnconsumedBufferLength);
};
// Listen on the local port (BindServiceNameAsync takes a port/service name, not a host)
await socket.BindServiceNameAsync(port);

var stream = await socket.GetOutputStreamAsync(new HostName(host), port);
var dataWriter = new DataWriter(stream);
// Send a message
dataWriter.WriteString("Hello!");
await dataWriter.StoreAsync();
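DatagramSocket is a WinRT class, so the snippet above cannot run outside a Windows Runtime app. As a rough desktop counterpart, here is a minimal sketch that swaps in the classic .NET UdpClient (not the class the author used) to show the same fire-and-forget UDP exchange: it binds a receiving socket to a free loopback port, sends one datagram to it and reads it back. The class and method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class UdpLoopbackDemo
{
    // Sends one UTF-8 datagram to a local UDP port and returns the text that arrived.
    public static async Task<string> RoundTripAsync(string message)
    {
        // Port 0: let the OS choose a free local port for the receiving socket.
        using (var receiver = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (var sender = new UdpClient())
        {
            var target = (IPEndPoint)receiver.Client.LocalEndPoint;
            byte[] payload = Encoding.UTF8.GetBytes(message);

            // UDP: no connection setup and no delivery acknowledgment, the datagram is just sent.
            await sender.SendAsync(payload, payload.Length, target);

            UdpReceiveResult result = await receiver.ReceiveAsync();
            return Encoding.UTF8.GetString(result.Buffer);
        }
    }
}
```

On loopback the datagram practically always arrives; over a real network, UDP may drop or reorder packets, which is exactly the trade-off accepted for live audio.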
I'm so used to async/await that I used the same class on the server side (see here how to use the WinRT API on the desktop). The protocol is also very simple, COMMAND!BODY, which is enough for our example.
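A COMMAND!BODY message is easy to format and parse by hand. The sketch below is an illustration, not the author's code: the type name ProtocolMessage and any command names (such as REGISTER or CALL) are hypothetical. It splits on the first '!' only, so the body itself may contain '!'.

```csharp
using System;

// Hypothetical helper for the "COMMAND!BODY" wire format.
public readonly struct ProtocolMessage
{
    public const char Separator = '!';

    public string Command { get; }
    public string Body { get; }

    public ProtocolMessage(string command, string body)
    {
        if (string.IsNullOrEmpty(command) || command.IndexOf(Separator) >= 0)
            throw new ArgumentException("Command must be non-empty and must not contain '!'.", nameof(command));
        Command = command;
        Body = body ?? string.Empty;
    }

    // Serializes as COMMAND!BODY.
    public override string ToString() => Command + Separator + Body;

    // Splits on the first '!' only: "CALL!a!b" gives Command "CALL", Body "a!b".
    public static bool TryParse(string text, out ProtocolMessage message)
    {
        message = default;
        if (text == null) return false;
        int i = text.IndexOf(Separator);
        if (i <= 0) return false; // no separator, or an empty command
        message = new ProtocolMessage(text.Substring(0, i), text.Substring(i + 1));
        return true;
    }
}
```

The string produced by ToString() is what would go into DataWriter.WriteString, and TryParse is what the MessageReceived handler would call on the incoming text.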
2. Voice recording
In managed code there are two classes for recording from the microphone:
- XNA Microphone
- AudioVideoCaptureDevice
In our example we will use the first one (it has been available since WP7), because I personally could not figure out how to play back audio captured with the second one without the native API. Of course, a serious VoIP application will have to use the second approach (StartRecordingToSinkAsync, which provides a clean, uncompressed data stream from the microphone). So, recording from the microphone takes just a few lines:
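_microphone = Microphone.Default;
_microphone.BufferDuration = TimeSpan.FromMilliseconds(500);
_microphoneBuffer = new byte[_microphone.GetSampleSizeInBytes(_microphone.BufferDuration)];
_microphone.BufferReady += (s, e) =>
{
    _microphone.GetData(_microphoneBuffer);
    // _microphoneBuffer now holds the latest chunk of PCM data, e.g. send it to the other side here
};
// In a Silverlight app, XNA events such as BufferReady only fire if
// FrameworkDispatcher.Update() is called regularly (e.g. from a DispatcherTimer)
_microphone.Start();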
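GetSampleSizeInBytes does the buffer arithmetic for you, but it helps to see the numbers. The sketch below assumes the phone microphone delivers 16 kHz, 16-bit mono PCM (an assumption stated in the code, not something the article specifies). Under that assumption a 500 ms buffer is 16,000 bytes, which also means every chunk adds at least half a second of latency, and a single 16,000-byte datagram would be IP-fragmented on a typical 1500-byte-MTU link. The hypothetical Chunk helper splits a buffer into smaller datagrams; the 1200-byte size used below is illustrative only.

```csharp
using System;
using System.Collections.Generic;

public static class PcmMath
{
    // Bytes of PCM audio for a given duration. Defaults ASSUME 16 kHz, mono, 16-bit samples.
    public static int BytesFor(TimeSpan duration, int sampleRate = 16000, int channels = 1, int bytesPerSample = 2)
    {
        long samples = (long)(duration.TotalMilliseconds * sampleRate / 1000);
        return checked((int)(samples * channels * bytesPerSample));
    }

    // Splits a captured buffer into pieces no larger than maxDatagramBytes,
    // so each piece can be sent as one UDP datagram.
    public static IEnumerable<ArraySegment<byte>> Chunk(byte[] buffer, int maxDatagramBytes)
    {
        if (maxDatagramBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxDatagramBytes));
        return ChunkIterator(buffer, maxDatagramBytes);
    }

    private static IEnumerable<ArraySegment<byte>> ChunkIterator(byte[] buffer, int maxDatagramBytes)
    {
        for (int offset = 0; offset < buffer.Length; offset += maxDatagramBytes)
            yield return new ArraySegment<byte>(buffer, offset, Math.Min(maxDatagramBytes, buffer.Length - offset));
    }
}
```

For example, BytesFor(TimeSpan.FromMilliseconds(500)) gives 500 × 16 samples/ms × 2 bytes = 16,000 bytes, and chunking that into 1200-byte pieces yields 13 full datagrams plus one of 400 bytes. A shorter BufferDuration is the simpler way to reduce both latency and datagram size.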
3. Playing Audio
In our example we will use code that is far from optimal, but small and working:
// e.Data: the PCM bytes received from the other side
_soundEffect = new SoundEffect(e.Data, _microphone.SampleRate, AudioChannels.Mono);
_soundEffect.Play();
Unfortunately, there is no alternative: from managed code you cannot play audio through the earpiece used for calls, only through the loudspeaker, so echo and other noise may occur (this is a simple example).
4. VoIP push notifications
The killer feature of our example: if you install the application on two devices, you can call the other device through the application even when the application is not in the foreground on that device. First, both devices must register their push URIs on the server together with some user ID (in Skype it is an arbitrary name, in Viber the user's phone number). Then, when device A wants to call device B, it sends a command to the server; the server looks up the push URI for device B and sends MPNS an XML payload with some data about the caller, and the request must carry the header X-NotificationClass: 4. Before WP8 there were only three classes of push notifications (tile, toast and raw); WP8 adds a fourth one, VoIP. MPNS delivers this packet to the client over its channel, where it is picked up by a ScheduledTaskAgent launched specifically for this purpose. If this agent does its job correctly, the user sees the incoming call screen (similar to a regular GSM call). So, what should the ScheduledTaskAgent do?
var incomingCallTask = task as VoipHttpIncomingCallTask;
if (incomingCallTask != null)
{
    // Deserialize the XML payload of the VoIP push
    Notification pushNotification;
    using (var ms = new MemoryStream(incomingCallTask.MessageBody))
    {
        var xs = new XmlSerializer(typeof(Notification));
        pushNotification = (Notification)xs.Deserialize(ms);
    }

    VoipPhoneCall callObj;
    var callCoordinator = VoipCallCoordinator.GetDefault();
    // Show the system incoming-call UI (looks like a regular GSM call)
    callCoordinator.RequestNewIncomingCall(
        "/MainPage.xaml?incomingCall=" + pushNotification.Number,
        pushNotification.Name,
        pushNotification.Number,
        new Uri(defaultContactImageUri),
        "Voip.Client.Phone",
        new Uri(appLogoUri),
        "VoIP-push!",
        new Uri(logoUrl),
        VoipCallMedia.Audio,
        TimeSpan.FromMinutes(5),
        out callObj);

    // async lambda: the original awaited Task.Delay inside a non-async lambda
    callObj.AnswerRequested += async (s, e) =>
    {
        s.NotifyCallActive();
        // In this example the conversation itself runs in the foreground app;
        // with no native component to carry the call, the system call UI
        // is closed again after a short delay.
        await Task.Delay(3000);
        s.NotifyCallEnded();
    };
    callObj.RejectRequested += (s, e) => s.NotifyCallEnded();
}
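On the server side, the push that feeds this agent is an HTTP POST to the callee's channel URI with X-NotificationClass: 4. Below is a minimal sketch under several assumptions: PushRegistry and BuildVoipPush are hypothetical names, the in-memory dictionary stands in for whatever storage a real server uses, and the Notification class only mirrors the two fields the agent reads (Name and Number). Serializing with XmlSerializer to a stream keeps the bytes compatible with the agent's XmlSerializer.Deserialize over a MemoryStream. The request is only built here, not sent.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Xml.Serialization;

// Payload shape ASSUMED to match what the background agent deserializes.
public class Notification
{
    public string Name { get; set; }
    public string Number { get; set; }
}

public class PushRegistry
{
    // Hypothetical in-memory map: user ID -> MPNS channel URI of that user's device.
    private readonly ConcurrentDictionary<string, Uri> _channels = new ConcurrentDictionary<string, Uri>();

    public void Register(string userId, Uri pushUri) => _channels[userId] = pushUri;

    public bool TryGet(string userId, out Uri pushUri) => _channels.TryGetValue(userId, out pushUri);

    // Builds (does not send) the VoIP push: POST with an XML body and X-NotificationClass: 4.
    public static HttpRequestMessage BuildVoipPush(Uri pushUri, Notification caller)
    {
        byte[] body;
        using (var ms = new MemoryStream())
        {
            new XmlSerializer(typeof(Notification)).Serialize(ms, caller);
            body = ms.ToArray();
        }

        var request = new HttpRequestMessage(HttpMethod.Post, pushUri) { Content = new ByteArrayContent(body) };
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
        request.Headers.Add("X-NotificationClass", "4"); // 4 = VoIP push (WP8)
        return request;
    }
}
```

When device A sends a call command for user "bob", the server would call TryGet("bob", out var uri), build the request with A's name and ID, and send it with an HttpClient.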
It is worth noting that VoIP pushes, unlike all other types, can arrive both while the application is open and while it is closed. Skype accepts incoming calls only via push even when it is in the foreground, which is a questionable decision, since VoIP pushes are sometimes delayed. Alas, in our example we cannot start the conversation if a VoIP push arrives while the application is running: we have no native inter-process component to notify the main process (and no, OnNavigatedTo/OnNavigatedFrom are not called when the incoming call UI appears; the frame's Obscured event is raised, but it does not give us the caller's number). So in my example, when a call comes in, the receiving party must exit the application in order to pick up the conversation correctly.
Conclusion
All this was enough to write a simple VoIP application in a day. Alas, it can only talk through the loudspeaker, cannot turn off the screen when held to the ear (proximity sensor), and cannot continue the conversation when the application is minimized. All of this requires a native component, which is described in detail in the Microsoft Chatterbox sample; my example is simpler, but it includes a server side. Initially I only wanted to write about VoIP push, but it turned out to be a bit more. Of course, for full-fledged VoIP applications it is better to look at the rapidly developing WebRTC, which, by the way, already works officially in Chrome on Android, but I hope my example will be useful to someone.
Sources: