How to deal with unstable Google C2DM

As it happens, at work I'm part of a small team of like-minded people writing smartphone applications, in particular for the iPhone and Android.

We started with iPhone development, where everything worked smoothly and as expected.
What did the app do? Its main task was to send the request "Where are you?", nothing complicated. But we really wanted to deliver this request to the addressee as quickly as possible, while it was still relevant. A reader with iPhone experience will say that there is the Apple Push Notification service (APNs), and will be absolutely right. That is exactly what we used, and we had no trouble: notifications arrived in under a second.

Then, for internal reasons, we switched to Android development and quickly ported everything. In particular, without a second thought, the module for working with APNs was replaced by a similar one based on C2DM.
None of the developers' phones had delivery problems. But new users immediately ran into a huge one: notification delivery time is not guaranteed at all. Some notifications arrived hours later, while on another device they arrived within seconds.

While investigating this problem, I came across a number of odd behaviours in Google's notifications.

Readers interested only in the low-level smartphone-server interaction I implemented, and not in the background, can skip ahead to section "4. An alternative to Google C2DM, but not a replacement".

1. Scheme for using notifications

First of all, the scheme (the notation is chosen purely for clarity, not according to GOST):
smartphone-server-C2DM network communication scheme

To study the problem, you need to understand how connections 1, 2 and 3 work.

Connection 1: our open HTTP connection over which the smartphone sends the request. The application only waits for 200 OK from the server; nothing else matters. This is the wide part of the bottleneck: there are few users so far, and they send relatively few requests (60-100 messages/s during peak activity).
Connection 2: our server opens this HTTP connection to Google's servers. It actually takes two consecutive requests: first, authorization via ClientLogin, the recommended method, and then the request to android.clients.google.com/c2dm/send. This is where I first started looking for the problem.
Connection 3: this one is maintained by Google itself with the smartphones running Android.
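To make the two steps of connection #2 concrete, here is a minimal Java sketch of the request bodies. It assumes the parameter names from Google's C2DM documentation of the time: ClientLogin with service=ac2dm, then a form POST to the send URL with registration_id, collapse_key and data.* fields, authorized by a "GoogleLogin auth=..." header. The class and method names are hypothetical, HOSTED_OR_GOOGLE is the account type commonly used in examples, and the actual HTTP calls are left out (C2DM itself has since been shut down).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class C2dmRequests {
    // Endpoints used by C2DM at the time (the service no longer exists).
    static final String CLIENT_LOGIN_URL = "https://www.google.com/accounts/ClientLogin";
    static final String SEND_URL = "https://android.clients.google.com/c2dm/send";

    // Encodes parameters as an application/x-www-form-urlencoded body, keeping insertion order.
    static String formEncode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Step 1: body of the ClientLogin request; "ac2dm" is the C2DM service name.
    static String clientLoginBody(String email, String password, String source) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("accountType", "HOSTED_OR_GOOGLE");
        p.put("Email", email);
        p.put("Passwd", password);
        p.put("service", "ac2dm");
        p.put("source", source);
        return formEncode(p);
    }

    // ClientLogin replies with lines such as "SID=...", "LSID=...", "Auth=..."; only Auth is needed.
    static String parseAuthToken(String response) {
        for (String line : response.split("\n")) {
            if (line.startsWith("Auth=")) {
                return line.substring("Auth=".length()).trim();
            }
        }
        throw new IllegalArgumentException("No Auth token in ClientLogin response");
    }

    // Step 2: body of the send request; each payload key is prefixed with "data.".
    static String sendBody(String registrationId, String collapseKey, Map<String, String> data) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("registration_id", registrationId);
        p.put("collapse_key", collapseKey);
        for (Map.Entry<String, String> e : data.entrySet()) {
            p.put("data." + e.getKey(), e.getValue());
        }
        return formEncode(p);
    }

    // Header value that authorizes the send request with the ClientLogin token.
    static String authorizationHeader(String authToken) {
        return "GoogleLogin auth=" + authToken;
    }
}
```

A payload sent as data.data arrives on the device as the "data" extra, which is the key the receiver shown later in this article reads.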

2. Looking for problems in the C2DM request

Since I mainly develop the server side, I was the first to take the blame for possible problems in connection #2. What was done?

In my humble opinion, the best explanation of how to use C2DM in an application is the Habr article "Writing an Android application with Cloud to Device Messaging (C2DM) support".

Connection #2 itself is implemented following that article's recommendations.

Connecting to Google's server sometimes failed with "Connection timed out", which gave me the idea of limiting the number of our simultaneous connections. The idea may have been wrong, but the resulting solution proved useful.
The server side is written in Java and runs as a standalone JAR with embedded Jetty. The Spring Framework handles configuration and startup, which meant I could reconfigure the interaction with the C2DM server fairly painlessly.

Step 1

Make the requests asynchronous.

    public class C2DMServer implements IPushNotificator, IPushChecker {
        ...
        @Override
        @Async // runs in Spring's task executor instead of the request thread
        public void sendData(String deviceId, String c2dmID, String jsonObject) {
            ...
        }
    }

This step brought another improvement: the requesting client now gets its 200 OK much faster, because the thread no longer waits for a response from Google's notification server.

Step 2

Configure the number of parallel requests to Google so as to stay within its limits.
After a lot of testing and tuning of the parameters, I ended up with this Spring configuration:

    <task:annotation-driven executor="asyncExecutor" />
    <task:executor id="asyncExecutor"
                   pool-size="15"
                   queue-capacity="300"
                   rejection-policy="CALLER_RUNS" />

With pool-size above 15, that many simultaneous connections lead to various network errors.
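For readers outside Spring, the configuration above corresponds roughly to a plain java.util.concurrent pool. This sketch assumes that Spring's pool-size="15" means a fixed pool (core and maximum size both 15) backed by a bounded queue of 300; the class and method names are hypothetical. sendDataAsync illustrates what @Async buys in Step 1: the caller hands the slow call to Google to the pool and returns immediately.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PushExecutor {
    // Rough equivalent of <task:executor pool-size="15" queue-capacity="300"
    // rejection-policy="CALLER_RUNS"/>: a fixed number of workers, a bounded
    // queue, and the submitting thread runs the task when both are full.
    static ThreadPoolExecutor create(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(
                poolSize, poolSize,                  // fixed size: core == max
                0L, TimeUnit.MILLISECONDS,           // idle timeout is irrelevant for a fixed pool
                new LinkedBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // What @Async does for sendData(): queue the send to Google and return at once,
    // so the HTTP handler can answer 200 OK without waiting for Google.
    static void sendDataAsync(ThreadPoolExecutor executor, Runnable sendToGoogle) {
        executor.execute(sendToGoogle);
    }
}
```

CALLER_RUNS is a sensible fit here: when 15 sends are in flight and 300 are queued, the thread that accepted the client's request performs the send itself, which slows down intake instead of silently dropping notifications.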

Result

What pleased me: no more errors connecting to Google's server.
What upset me: the delivery-speed problem remained, so we moved on.

3. Exploring how Google's service works on Android

Once installed on Android, any application can request from a special service an identifier to which notifications can then be sent.

This is done by extending the base class C2DMBaseReceiver.
An example of such a subclass is in the same Habr article, "Writing an Android application with Cloud to Device Messaging (C2DM) support". Here is my modified implementation:

    import android.content.Context;
    import android.content.Intent;

    import com.google.android.c2dm.C2DMBaseReceiver;

    public class C2DMReceiver extends C2DMBaseReceiver {
        // Intent extra that carries the JSON payload
        private static final String DATA = "data";

        public C2DMReceiver() {
            super(Settings.C2DM_ACCOUNT);
        }

        @Override
        public void onDestroy() {
            super.onDestroy();
        }

        @Override
        public void onError(Context context, String errorId) {
            // Registration failed: clear the stored C2DM id
            Settings.Init(context, false);
            Settings.updateC2DM(null);
        }

        @Override
        protected void onMessage(Context context, Intent receiveIntent) {
            // A notification arrived: parse the JSON payload into application state
            Settings.Init(context, false);
            String data = receiveIntent.getStringExtra(DATA);
            JSONUtils.processJSON(context, data);
        }

        @Override
        public void onRegistered(Context context, String registrationId) {
            // Update the stored id only if it has changed
            Settings.Init(context, false);
            if (!registrationId.equals(Settings.getC2dm_id()))
                Settings.updateC2DM(registrationId);
        }

        @Override
        public void onUnregistered(Context context) {
        }
    }

Here Settings is a helper class with a bunch of static fields and methods for storing application state. JSONUtils is another helper class that parses JSON and stores all the data in Settings.

What is important to understand is that the moment the identifier is received is not defined. In effect, this class just subscribes to the "C2DM identifier received" event, and, in theory, when it fires we should immediately pass the identifier to our server.

After that, any message sent to the C2DM server with this identifier should be delivered to the right device and the right application.

Let's see how these messages are delivered.

C2DM-GTalk application interaction diagram

At the center of it all is the Cloud To Device Messaging service.
Interestingly, on the problematic devices this service was sometimes unloaded from memory. This means it holds no OS locks and may well be shut down when Android needs resources. The service uses Google Messaging, which GTalk also depends on, as the core of its exchange protocol, because the C2DM protocol is encapsulated in XMPP, the protocol GTalk runs on. Over this channel, every 300 seconds the C2DM service sends a ping to Google's servers and waits for an ack confirming that the connection is alive. You can learn more first-hand from this video.
Things are not all that bleak with the services, of course. The notification service can recover when network conditions change and when the screen is turned on, though not always.
To check the state of your connection, dial *#*#8255#*#*; the GTalk Service Monitor that opens shows which applications have exchanged data via Google Messaging.

So part of the problem was identified, but there was no solution for it.
Why only part? Because notifications still failed to arrive even when the services were running. Sometimes we saw waves of notifications: after a while (20-40 minutes), all devices received their notifications at the same moment, although they had been sent at different times.

In the end, after some thought and a lot of reading of documentation, forums and Q&A sites, we all agreed on one thing: we would build an alternative notification channel.

4. An alternative to Google C2DM, but not a replacement

The main question: how do we build a stable server-to-client channel?
A side question: how do we keep this channel from draining the user's battery?

The inspiration came from the examples at http://code.google.com/p/android-random/,
in particular the KeepAliveService example.

The first idea is the head-on solution: every n seconds, open a connection to the server and check for new notifications.
Instead of the head-on "poll the server often" approach, the authors offer a more sensible option, although it looks like something of a hack.

Key points of the proposed solution:

the connection to the server is kept open permanently, not reopened at intervals;
once every n seconds, where n > 60, check the connection by sending something over it, and reconnect only if it has broken;
use a blocking read on the server connection.

I tested several client variants for the notification server.
Two clients were written:

the first opened a connection every n seconds and read whatever had arrived; n was varied during testing;
the second opened a permanent connection and checked it every 60 seconds; the device was varied to see how battery life differs.

The first client is easy to implement yourself. All the details of the second can be found in the archive. It contains an Android client of the second type and a server that maintains the connection, logs all client keepalive messages and sends the client a notification once a minute. Everything builds with Maven using the android-maven-plugin.
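Client #1 really is simple. Here is a minimal Java sketch of one polling round, assuming a hypothetical protocol in which the client sends its GUID followed by a newline and the server answers with one byte per pending notification (the same one-byte format used later for the notification server). On Android, pollOnce would be called every n seconds, for example from an AlarmManager alarm, and off the main thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class PollingClient {
    // One round of client #1: connect, identify, read whatever is pending, disconnect.
    // Returns the number of notification bytes received in this round.
    static int pollOnce(String host, int port, String guid, int readTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5_000);
            socket.setSoTimeout(readTimeoutMs); // don't wait forever when nothing is pending
            OutputStream out = socket.getOutputStream();
            out.write((guid + "\n").getBytes(StandardCharsets.UTF_8)); // assumed identification format
            out.flush();
            InputStream in = socket.getInputStream();
            int received = 0;
            try {
                while (in.read() != -1) {
                    received++; // one byte per notification (assumed protocol)
                }
            } catch (SocketTimeoutException e) {
                // The server kept the socket open but sent nothing more: this round is over.
            }
            return received;
        }
    }
}
```

The cost of this design is visible in the code: a notification that arrives just after a round waits up to n seconds for the next one.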

Client	1	1	1	2	2	2	2
Device	Desire	Desire	Desire	Desire	Desire	Wildfire S	Desire S
Test duration (min)	540	1273	845	962	1117	1180	1121
Charge consumption	82	87	31	9	39	80	49
KeepAlive interval (s)	10	30	60	60	60	60	60
Internet connection	3G	3G	3G	WiFi	3G	3G	3G
Calculated discharge rate (s./h)	10	4.28	2.22	0.56	2.22	4.28	3

Several results stand out immediately:

Whether you reopen the connection or keep it open makes no difference to the battery.
3G drains the battery many times faster than WiFi.
The optimal polling interval is 60 seconds.

For us, the first result is the most important. It means the choice between the two clients comes down to their functionality. With the reconnecting client (#1), notifications arrive only once per check interval. With the client that keeps the connection open (#2), notifications arrive the moment the server writes to the open connection. And, to jump ahead, even a sleeping device wakes up when a message from the server arrives on the open connection.

To withstand the influx of TCP connections, I built the following architecture.
Own notification server architecture

The notification server consists of two components:

Router: registers a connecting device and gives it the address and port of the server to keep the connection with. It also forwards every request to send a notification to the right notification server.
Notification server: keeps the connection, tells the router that it has successfully received a keepalive from the device, and sends a notification when the router asks it to.

All client interaction runs over plain TCP. The notification itself can be of arbitrary size and content, but to reduce the load, my application sends exactly one byte, "1".
The server components communicate with each other over RMI using Spring Remoting.

Now let's walk through the client-side logic step by step.

When connecting to the notification servers, the client reports its unique identifier; in my case it is simply a GUID.
In response, it receives the address and port at which to open and maintain the connection.
After opening the socket, the client enters a blocking read with no read timeout.
Using AlarmManager, the client wakes up every 60 seconds and sends a message with its GUID over the open connection, so the server knows the client is still alive.
If the connection drops, the client checks whether Internet access is available and reconnects if it is.
If read returns data, a notification has arrived; this is reported to the rest of the application logic, and the client goes back to the blocking read.
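The client steps above can be sketched in plain Java. This is not the code from the archive: the wire format (GUID plus newline to the router and as the keepalive, a "host:port" line as the router's reply, one byte per notification) is an assumption for illustration, and the Android parts (AlarmManager, the Internet-access check and the reconnection in step 5) are left to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class KeepAliveClient {
    interface Listener { void onNotification(int payload); }

    private final String guid;
    private Socket socket;
    private OutputStream out;

    KeepAliveClient(String guid) { this.guid = guid; }

    // Steps 1-2: send the GUID to the router and read back "host:port" (assumed format).
    static InetSocketAddress askRouter(String routerHost, int routerPort, String guid) throws IOException {
        try (Socket s = new Socket(routerHost, routerPort)) {
            OutputStream o = s.getOutputStream();
            o.write((guid + "\n").getBytes(StandardCharsets.UTF_8));
            o.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            String reply = in.readLine();
            int colon = reply == null ? -1 : reply.lastIndexOf(':');
            if (colon < 0) throw new IOException("Bad router reply: " + reply);
            return new InetSocketAddress(reply.substring(0, colon),
                    Integer.parseInt(reply.substring(colon + 1).trim()));
        }
    }

    // Step 3: open the long-lived socket; a timeout of 0 makes read() block indefinitely.
    void connect(InetSocketAddress address) throws IOException {
        socket = new Socket();
        socket.connect(address, 10_000);
        socket.setSoTimeout(0);
        out = socket.getOutputStream();
    }

    // Step 4: called every 60 s (by AlarmManager on Android) so the server knows we are alive.
    synchronized void sendKeepAlive() throws IOException {
        out.write((guid + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Step 6: block in read(); every byte is a notification. Returns when the server
    // closes the connection, which is the caller's cue for step 5 (check network, reconnect).
    void readLoop(Listener listener) throws IOException {
        InputStream in = socket.getInputStream();
        int b;
        while ((b = in.read()) != -1) {
            listener.onNotification(b);
        }
    }

    void close() throws IOException {
        if (socket != null) socket.close();
    }
}
```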

Working with AlarmManager is very simple.

    // Intent that starts NotificationService with the keepalive action
    Intent i = new Intent(this, NotificationService.class).setAction(ACTION_KEEPALIVE);
    PendingIntent pi = PendingIntent.getService(this, 0, i, 0);
    // Have AlarmManager fire it every KEEP_ALIVE_INTERVAL ms, waking the device if needed
    AlarmManager alarmMgr = (AlarmManager) getSystemService(ALARM_SERVICE);
    alarmMgr.setRepeating(AlarmManager.RTC_WAKEUP,
            System.currentTimeMillis() + KEEP_ALIVE_INTERVAL,
            KEEP_ALIVE_INTERVAL, pi);

Results and problems of my implementation, or why C2DM cannot be abandoned completely

To keep them as lightweight as possible, the notification servers use neither files nor databases. Hence the first consequence: if a notification was not delivered, it never will be.
The notification servers know nothing about the data held by the router. The second consequence: a client is not obliged to obey the router and connect to the address and port it was given, which means clients could mount an attack on one notification server while the others stand idle.
The router remembers when the last keepalive came from each client. A useful third consequence: the router can pass this information on to external systems, and it is essentially a record of who is online right now.
The router remembers which notification server each client sent its keepalive through. The fourth consequence: even if a client connects to the wrong server, the router knows which notification server to send the packet through, instead of sending it to all servers.
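The router bookkeeping behind the third and fourth consequences can be sketched as a small in-memory registry. The round-robin server assignment, the timeout used to decide who is online and all names here are hypothetical; the article only says what the router remembers, not how. This sketch keeps everything in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class NotificationRouter {
    // Which notification server last reported a keepalive from a client, and when.
    record Presence(String server, long lastKeepAliveMillis) {}

    private final List<String> servers; // "host:port" of each notification server
    private final AtomicInteger next = new AtomicInteger();
    private final Map<String, Presence> presence = new ConcurrentHashMap<>();

    NotificationRouter(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Registration: hand out servers round-robin (an assumed policy).
    // Clients are free to ignore the answer, which is the second consequence.
    String assignServer() {
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }

    // Called by a notification server whenever it receives a keepalive.
    void onKeepAlive(String guid, String viaServer, long nowMillis) {
        presence.put(guid, new Presence(viaServer, nowMillis));
    }

    // Fourth consequence: route through the server that actually holds the connection.
    Optional<String> routeFor(String guid) {
        return Optional.ofNullable(presence.get(guid)).map(Presence::server);
    }

    // Third consequence: "online" means a keepalive was seen within the timeout window.
    boolean isOnline(String guid, long nowMillis, long timeoutMillis) {
        Presence p = presence.get(guid);
        return p != null && nowMillis - p.lastKeepAliveMillis() <= timeoutMillis;
    }
}
```

With 60-second keepalives, a timeout of two or three intervals would tolerate one lost keepalive before a client is reported offline.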

5. Conclusion

"Every self-respecting smartphone developer should write their own notification service": that is how someone jokingly summed up my work.
Jokes aside, the notification service described above works as fast as Apple's and almost as reliably, which is good news, and battery life, which developers worry about so much, is not reduced by much.

6. Useful links

Reflections on how to implement a good delivery of notifications
Google C2DM Connection Documentation
Habr article "Writing an Android application with Cloud to Device Messaging (C2DM) support"
Complaints about C2DM delivery speed and more; I hope that one day the answer "Hooray! It all worked!" will appear among them

Source: https://habr.com/ru/post/142045/
