
* In fact, we will only write a prototype of the protocol.
You have probably been in a situation like this: you are sitting in your favorite messenger, chatting with friends, you step into an elevator / tunnel / car, and the Internet still seems to be there, but nothing gets sent. Or sometimes your provider misconfigures the network, 50% of the packets disappear, and nothing works either. Perhaps at that moment you thought: surely something can be done so that, even on a bad connection, you can still send that small piece of text? You are not alone.
In this article I will describe my idea for a UDP-based protocol that can help in this situation.
TCP / IP problems
When the (mobile) connection is bad, a large share of packets is lost (or arrives with a very long delay). TCP / IP may treat this as a sign that the network is congested, and everything starts working very slowly, if it works at all. It does not help that establishing a connection (especially with TLS) requires sending and receiving several packets, so even small losses hurt badly. A DNS lookup is also often needed before the connection can be established, which costs a couple more packets.
In summary, the typical problems of a TCP / IP-based REST API on a poor connection:
- Poor reaction to packet loss (sharp drop in speed, long timeouts)
- Establishing a connection requires exchanging packets (+3 packets)
- Often you need an “extra” DNS request to find out the server IP (+2 packets)
- TLS is often needed (+2 packets minimum)
In total, just connecting to the server takes 3-7 packets, and with a high loss rate the connection can take a significant amount of time before we have even sent anything.
Idea implementation
The idea is this: we send a single UDP packet, containing the required authorization data and the message text, to a hard-coded server IP address and wait for a reply. All data can additionally be encrypted (the prototype does not do this). If no reply arrives within a second, we assume the request was lost and send it again. The server must be able to discard duplicate messages, so resending should not cause problems.
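The duplicate removal on the server side could look roughly like this. It is only a sketch under stated assumptions: request IDs are unique per client, everything is kept in memory, and old entries are never expired. The names `dedup` and `firstTime` are hypothetical and not taken from the prototype.

```go
package main

import (
	"fmt"
	"sync"
)

// dedup remembers which request IDs have already been processed.
// Assumption: IDs are unique per client and the set fits in memory;
// a real server would also expire old entries.
type dedup struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newDedup() *dedup {
	return &dedup{seen: make(map[string]bool)}
}

// firstTime reports whether (client, requestID) is new, and records it.
func (d *dedup) firstTime(client string, requestID int64) bool {
	key := fmt.Sprintf("%s/%d", client, requestID)
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[key] {
		return false
	}
	d.seen[key] = true
	return true
}

func main() {
	d := newDedup()
	fmt.Println(d.firstTime("alice", 1)) // first delivery: store the message
	fmt.Println(d.firstTime("alice", 1)) // retransmission: only repeat the reply
}
```

With a check like this, a retransmitted request is acknowledged again but the message itself is stored only once.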
Possible pitfalls for a production-ready implementation
Listed below are some (by no means all) of the things that need to be thought through before using something like this in production:
- UDP can be blocked by the provider - you need to be able to fall back to TCP / IP
- UDP does not get along well with NAT - the mapping usually lives only a short time (~30 seconds), which leaves little time to respond to the client's request
- The server must be resistant to amplification attacks - you need to ensure that the response packet is never larger than the request packet
- Encryption is difficult, and if you are not a security expert, you have little chance of implementing it correctly
- If you set the retransmission interval incorrectly (for example, retrying without any pause instead of once a second), you can make things much worse than with TCP / IP
- More traffic may reach your server because UDP has no feedback mechanism and clients may retry endlessly
- The server may have several IP addresses, and they may change over time, so you need to be able to update the cache (Telegram does this well :))
Implementation
Let's write a server that replies over UDP and sends back the number of the request it received (a request looks like “request-ts message text”), along with the timestamp at which it was received:
Now the difficult part: the client. We will send messages one at a time and wait for the server's reply before sending the next one. Each message carries the current timestamp and a piece of text; the timestamp serves as the request identifier.
Function code:
func send(conn *net.UDPConn, requestID int64, resCh chan udpResult) { for {
I also implemented the same thing on top of (more or less) standard REST: using HTTP POST, we send the same requestTs and message text, wait for the response, and then move on to the next one. Requests were made to a domain name, and DNS caching was not disabled on the system. HTTPS was not used, to make the comparison fairer (the prototype has no encryption). The timeout was set to 15 seconds: TCP / IP already retransmits lost packets, and the user will probably not wait much longer than 15 seconds.
Testing results
When testing the prototype, the following were measured (all in milliseconds):
- Response time to the first request (first)
- Average response time (avg)
- Maximum response time (max)
- H / U - the ratio of "HTTP time" / "UDP time" - how many times lower the delay is when using UDP
100 series of 10 requests each were run. This simulates a situation where you need to send just a few messages, after which a normal Internet connection becomes available (for example, Wi-Fi in the metro, or 3G / LTE on the street).
Tested communication types:
- The “Very Bad Network” profile (10% loss, 500 ms latency, 1 Mbps) in Network Link Conditioner - “Very Bad”
- EDGE, phone in the fridge (as a stand-in for an "elevator") - fridge
- EDGE
- 3G
- LTE
- Wi-Fi
Results (time in milliseconds):

(same in CSV format)
Findings
Here are the conclusions that can be drawn from the results:
- Apart from an anomaly with LTE, the worse the connection, the greater the difference when sending the first message (on average, UDP is 2-3 times faster)
- Subsequent messages over HTTP are not much slower - on average 1.3 times slower, and on stable Wi-Fi there is no difference at all
- The UDP-based response time is much more stable, which is indirectly visible from the maximum response time - it is also 1.4-1.8 times lower
In other words, under suitable (“bad”) conditions, our protocol works much better, especially when sending the first message (which is often all that needs to be sent).
Prototype implementation
The prototype is posted on GitHub. Do not use it in production!
The command to start the client on a phone or computer:
instant-im -client -num 10
The server is currently running :). Look first at the time of the first response, and also at the maximum delay. All this data is printed at the end.
An example of running in an elevator