Google Cloud Messaging: "Owl, open up! A push has arrived!"

The well-known Google Cloud Messaging (GCM) service helps your application always show up-to-date data to the user. The scheme of the service includes three components.

These are the GCM server itself, your push server, and a device with the application installed. The algorithm is simple: the device registers with GCM, receives a registrationId (a token that is used later), stores it locally, and sends it to the push server. The push server then uses this registrationId to send messages to your application on the device.
This article covers the problems on two segments of this scheme: push server to GCM, and GCM to device.

Handling errors from the GCM server

If the message is sent successfully, your push server receives a response from GCM with status code 200 and a non-empty message_id.

Errors also arrive in the response body with status code 200, so checking only the status code is not enough. Here I will describe just one of the most important errors, NotRegistered; the others are less interesting. In most cases it means that the application your push server is sending to has been uninstalled, or that the application already uses a different registrationId and for some reason your push server does not know about it. Having received such a response from GCM, the push server should immediately remove that registrationId from its storage.
To monitor errors, enable GCM statistics in the developer console.
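As an illustration of the point above, here is a minimal sketch of how a push server might scan the response body for per-recipient errors instead of trusting the 200 status alone. It assumes the JSON response of the GCM HTTP endpoint, where `results` holds one entry per registrationId in request order; the function name `find_unregistered` and the sample IDs are hypothetical.

```python
import json


def find_unregistered(sent_ids, response_body):
    """Return the registrationIds that GCM reported as NotRegistered.

    sent_ids: registrationIds in the order they were sent.
    response_body: JSON text of a status-200 response from GCM.
    """
    response = json.loads(response_body)
    stale = []
    # Assumption: results come back in the same order as the request.
    for reg_id, result in zip(sent_ids, response.get("results", [])):
        if result.get("error") == "NotRegistered":
            stale.append(reg_id)
    return stale


# A 200 response that still contains one failed delivery.
body = json.dumps({
    "success": 1,
    "failure": 1,
    "canonical_ids": 0,
    "results": [
        {"message_id": "0:1"},
        {"error": "NotRegistered"},
    ],
})
stale_ids = find_unregistered(["reg-a", "reg-b"], body)
print(stale_ids)
```

The returned list is what the push server would then delete from its storage.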

Synchronization of registrationId on the client and push server

The registrationId is one of the most important parts of the GCM infrastructure. If the registrationId on the client and on the push server get out of sync, the consequences are dire: there is every chance that users will be left without push notifications forever. GCM tracks the situation where the registrationId on the device has been updated for some reason, and reports it to the push server through the canonical_ids parameter.

A case with canonical_ids can be reproduced as follows:

1. Install the application
2. Send it a message
3. Uninstall the application
4. Install the application again
5. Send it a message

After sending the message in step 5, you will receive a response from GCM with the canonical_ids parameter equal to 1 and the new registrationId itself, which your client is already using.

Having received such a response, the push server simply must update the registrationId to the value from the response. If it does not, messages will keep reaching the client for a while and the old registrationId will remain valid, but sooner or later GCM will answer with a NotRegistered error, the push server will delete the registrationId, and users will forget that your application ever had push notifications. So handle the canonical_ids parameter and do not tempt fate.
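The update described above could be sketched roughly as follows. This assumes the GCM HTTP JSON response, in which a result that carries both `message_id` and `registration_id` means "delivered, but use this ID from now on"; the dictionary `store` and the function `apply_canonical_ids` are hypothetical stand-ins for the push server's real storage.

```python
def apply_canonical_ids(store, sent_ids, response):
    """Replace outdated registrationIds in `store` with the canonical ones.

    store: dict mapping registrationId -> user id (hypothetical storage).
    sent_ids: registrationIds in the order they were sent.
    response: parsed JSON body of a GCM response.
    Returns the number of registrationIds that were replaced.
    """
    if response.get("canonical_ids", 0) == 0:
        return 0  # GCM reported nothing to update
    replaced = 0
    for old_id, result in zip(sent_ids, response.get("results", [])):
        new_id = result.get("registration_id")
        if "message_id" in result and new_id and new_id != old_id:
            user = store.pop(old_id, None)
            if user is not None:
                store[new_id] = user
                replaced += 1
    return replaced


store = {"old-token": "alice", "token-b": "bob"}
response = {
    "success": 2,
    "failure": 0,
    "canonical_ids": 1,
    "results": [
        {"message_id": "0:1", "registration_id": "new-token"},
        {"message_id": "0:2"},
    ],
}
count = apply_canonical_ids(store, ["old-token", "token-b"], response)
print(count, store)
```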

Two main approaches to working with GCM

The first is Messages with Payload. The idea is to carry the useful information in the message itself: in a messenger it may be the text of the message, in a news application the news item itself. The second is Send-to-Sync. It is more economical with traffic, because the message itself does not carry much data; it acts as a signal that the application should fetch fresh data from the server. The second approach is directly tied to the collapse_key parameter.
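To make the difference concrete, here is a small sketch of what the two kinds of request bodies might look like. It assumes the JSON request format of the GCM HTTP endpoint (fields `registration_ids`, `data`, `collapse_key`); the helper names, the `text` field, and the sample values are hypothetical.

```python
import json


def payload_message(reg_ids, text):
    """Messages with Payload: the useful data travels inside the push."""
    return {"registration_ids": reg_ids, "data": {"text": text}}


def sync_message(reg_ids, key):
    """Send-to-Sync: only a signal; collapse_key says what to refresh."""
    return {"registration_ids": reg_ids, "collapse_key": key}


chat = payload_message(["reg-a"], "Hello!")
news = sync_message(["reg-a"], "sport")
print(json.dumps(chat))
print(json.dumps(news))
```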

Messages with Payload

In the ideal situation your device holds a connection to the GCM server, and messages are sent and delivered successfully. If there is no connection (for example, you are stuck in an elevator or have gone into the subway) and messages keep being sent to you, they start to queue up in GCM storage. This queue is not infinite: the limit is 100 messages. As soon as the 101st message arrives, all of them are discarded and nothing accumulates any more. When the device gets back on the network and reconnects to GCM, the application receives an intent saying how many messages were deleted, for example 345.

Having received such an intent, do not be lazy: go to the server for fresh data. Otherwise the user will see it only when another push notification arrives, and nobody knows when that will be. This is a very important point to remember when implementing the "Messages with Payload" approach.
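The behaviour described above can be illustrated with a toy model. This is not GCM's actual implementation, only a sketch of the rules stated in the text; the class `PendingQueue`, the function `on_reconnect`, and the assumption that the reported count covers every discarded message are all hypothetical. The `message_type` and `total_deleted` field names mirror what the deletion intent is assumed to carry.

```python
QUEUE_LIMIT = 100  # limit described in the text


class PendingQueue:
    """Toy model of GCM storage for one offline device."""

    def __init__(self):
        self.messages = []
        self.deleted = 0  # messages discarded since the overflow

    def enqueue(self, message):
        if self.deleted:
            # Already overflowed: nothing accumulates, the counter grows.
            self.deleted += 1
        elif len(self.messages) == QUEUE_LIMIT:
            # The 101st message: everything stored so far is discarded.
            self.deleted = len(self.messages) + 1
            self.messages = []
        else:
            self.messages.append(message)

    def flush(self):
        """Return what the device receives when it reconnects."""
        if self.deleted:
            out = [{"message_type": "deleted_messages",
                    "total_deleted": self.deleted}]
        else:
            out = self.messages
        self.messages, self.deleted = [], 0
        return out


def on_reconnect(incoming, fetch_fresh_data):
    """Client side: a deletion notice means 'go to the server now'."""
    for message in incoming:
        if message.get("message_type") == "deleted_messages":
            return fetch_fresh_data()
    return incoming


queue = PendingQueue()
for i in range(345):
    queue.enqueue({"data": {"n": i}})
notice = queue.flush()
result = on_reconnect(notice, lambda: "fresh data from server")
print(notice, result)
```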

Send-to-Sync

Let's say we use collapse_key. It is a kind of constant, and there can be no more than four of them per registrationId, that is, per application instance. For example, a news application collects data from different services: one server provides sports news, another culture, a third politics, a fourth cars. There will be a problem when a fifth service appears, of course, but that is beside the point for now. When sending a message for a given section, you can use its own collapse_key: sport, culture, politics, auto.

When the next message arrives with the same collapse_key, GCM replaces the old message with the new one. This is logical, since in the "Send-to-Sync" approach a message is only a signal to the application that it should go to the server for fresh data. But one unpleasant surprise was waiting for us here, and because of it we had to abandon "Send-to-Sync": throttling. Throttling means that the GCM server may wait for some time in order to collect as many messages as possible with the same collapse_key. That would be fine, except that it delays delivery to the client (we consistently observed delays of half a minute to a minute), which is unacceptable for some kinds of applications, such as a messenger.
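The replacement rule can be sketched with a few lines of code. This is only a toy illustration of collapsing pending messages by key, not GCM's real logic; the function `collapse` and the `id` field are hypothetical.

```python
def collapse(pending):
    """Keep only the newest pending message for each collapse_key.

    Messages without a collapse_key are all kept.
    """
    position = {}  # collapse_key -> index of its message in `result`
    result = []
    for message in pending:
        key = message.get("collapse_key")
        if key is not None and key in position:
            # A newer message with the same key replaces the older one.
            result[position[key]] = message
        else:
            if key is not None:
                position[key] = len(result)
            result.append(message)
    return result


pending = [
    {"collapse_key": "sport", "id": 1},
    {"collapse_key": "culture", "id": 2},
    {"collapse_key": "sport", "id": 3},
]
collapsed = collapse(pending)
print([m["id"] for m in collapsed])
```

Only one "sport" signal survives, which is enough because the client fetches the actual news from the server anyway.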

Because of this delay, we stopped using collapse_key. If a small delay in message delivery is not critical for your application, the Send-to-Sync approach is a good choice.

Over time, we took all of the above into account in our push server implementation. But we still received a large number of reviews along the lines of: "I see new messages only when I open the application. When it is not running, messages do not reach me!!!" At first, the main hypothesis was that the registrationId stored on the device and the one on the push server had gone out of sync. To verify it, we added a check on the device: the application periodically asked the push server, "Do you have my registrationId?" A "yes" answer tells us with high probability that the registrationId is up to date.

According to the statistics, 99.7% of the answers were "yes", which let us conclude that registrationId synchronization was fine. We began looking for the problem on the segment between the device and GCM. We repeatedly saw the following on a Samsung S4 (forgive me, Samsung): the screen turns off, and messages start arriving with a long delay (about 10 to 15 minutes). With the help of our network administrators, we found that the TCP connection between the device and GCM went idle and packets stopped flowing. The cause is the so-called "heartbeat": a packet (ping) that the system sends at a fixed interval to keep the TCP connection between the device and GCM alive (you can read more about it here).

The interval at which the heartbeat is sent is quite long. It seems that in August 2014 it was reduced to 8 minutes, but this information may not be accurate. The Internet offers a solution used in so-called "push fixers": trigger the sending of the heartbeat packet manually. Unfortunately, this solution only works on rooted devices.

We had less and less optimism about achieving instant delivery on all the devices we support (with the exception of Chinese iPhone clones running Android) by means of GCM alone, yet the delay problem had to be solved. The only thing that can guarantee more or less stable delivery is keeping your own connection to the server. But first we wanted to learn to identify the devices that suffer from delayed pushes. For this we collected statistics comparing the time a push arrived with the time the data on the server was ready for the client (that is, when the push was sent). The statistics showed that roughly 20% of users experience delivery delays. The figure is rather rough, because it does not account for network loss and similar cases. At the moment we are considering the following algorithm:

1. Determine whether there is a delay on this device.
2. If yes, start keeping a persistent connection to the backend server; if no, keep using only GCM (to save battery).

The threshold of ten minutes is picked out of thin air. If the delta between the push arrival time and the send time exceeds the threshold, the application switches to the mode with its own connection.
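A minimal sketch of this decision, under stated assumptions: the push carries the server-side send time, the device and server clocks are comparable, and timestamps are in seconds. The names `choose_mode`, `delivery_delay`, and the mode labels are hypothetical.

```python
THRESHOLD_SECONDS = 10 * 60  # ten minutes, an arbitrary value as in the text


def delivery_delay(sent_at, received_at):
    """Delta between push arrival and the time the push was sent."""
    # Clamp to zero in case the device clock runs slightly behind.
    return max(0.0, received_at - sent_at)


def choose_mode(sent_at, received_at, threshold=THRESHOLD_SECONDS):
    """Pick the operating mode for this device based on one measurement."""
    if delivery_delay(sent_at, received_at) > threshold:
        return "own_connection"  # keep a persistent backend connection
    return "gcm_only"            # rely on GCM alone to save battery


fast = choose_mode(sent_at=1000.0, received_at=1030.0)
slow = choose_mode(sent_at=1000.0, received_at=1000.0 + 15 * 60)
print(fast, slow)
```

A real implementation would probably look at several measurements rather than one, since a single delay can also be caused by the device simply being offline.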

For experimental purposes, we implemented our own connection with HTTP long polling, since it was the fastest way to "give it a try". We sent this build to several of our beta users and asked them to watch message delivery and battery consumption. Oddly enough, there was no sharp increase in battery consumption overall, and users were happy with the speed of message delivery. Implementing your own connection to the backend server deserves a separate article, and we are still thinking about the implementation itself.

I hope this article is useful and saves you time when working with GCM. I will also be glad if you share your experience of solving the message delivery speed problem in your own applications.

P.S. For debugging purposes I wrote a test push server; maybe someone will find it handy. The source code is here.

Source: https://habr.com/ru/post/260841/
