
How we searched for a bug in Yandex Search Balancer, and found it in Chromium

Some time ago, colleagues began receiving complaints from users who occasionally saw an "SSL connection error" when using Search and Yandex Browser. The investigation into why this happened turned out to be interesting, in my opinion, so I want to share it with you. Along the way we changed the "suspect" software several times, studied a lot of dumps, recalled how the TLS state machine works, and eventually even dug into the Chromium code. I hope you will enjoy reading this no less than we enjoyed investigating. So.



After some time, we had error log entries and pcap files with similar content:


Everything looks as if the server responded incorrectly and the client aborted the handshake. However, after comparing the "correct" (accepted by the client) and "incorrect" server responses, we realized that they were identical.

Analysis of the dumps showed that the problem occurs only when the client uses a TLS ticket (the session reuse mechanism) and the ticket was not encrypted with the default key (in our case, it had been issued before the keys were rotated, but less than 28 hours earlier).

As I have already mentioned, Search uses its own Balancer, so we first looked for the bug there. Later, however, it was suggested that the problem might be related to client behavior: it occurs when the browser tries to open several SSL connections to the web server at the same time. In the general case (let's forget about prefetch and the like), HTML such as this can make the browser open several connections:

<img src="https://domain.com/x"><img src="https://domain.com/y"><img src="https://domain.com/z"> 

By combining these theories, we were able to reproduce the problem with the Chromium + Nginx pair and realized that the Balancer code was not involved. After that we finally managed to find the cause of this behavior.

Next, a few details about TLS and its client state machine as implemented in BoringSSL

So, as you already know, a TLS handshake can be long (full) or short (abbreviated).
On the first connection to the server, a long handshake looks like this from the client's point of view (I deliberately left out the processing of some TLS extensions to make it easier to follow):



In states with the prefix SSL3_ST_CR the client reads a message (record) from the server; in states with the prefix SSL3_ST_CW the client sends a message to the server. (Not long ago Chromium switched to BoringSSL, a fork of OpenSSL, so all the states above apply to it as well.)

Let's look at the structure of some TLS protocol messages:



Field descriptions (some TLS extensions are omitted):

• Version - the client's protocol version (SSL 3.0 / TLS 1.0 / TLS 1.1 / TLS 1.2),
• Random - client random,
• Session ID length - the length of the Session ID field (0 on the first connection),
• Session ID - the identifier of the previous session (empty on the first connection),
• SessionTicket TLS - TLS extension; Length is the length of the data in the extension, Data is its value (on the first connection the length is 0 and the value is empty),
• Cipher Suites - the ciphers supported by the client,
• Server Name - the SNI TLS extension, which tells the server which domain the client is accessing.

To avoid repeating the full, "expensive" and slow handshake on the next connection, the server can offer the client one of two session reuse mechanisms. To do so, it returns in ServerHello either a Session ID pointing to state saved on the server side (RFC 5246), or a Session ID together with the SessionTicket TLS extension (RFC 5077). I have covered both in detail several times.
Since RFC 5077 appeared later, it complements the session mechanism of RFC 5246, and inside the client it is built around the same implementation. Today we look only at the TLS ticket mechanism.



Field descriptions:
• Version - the server's protocol version (SSL 3.0 / TLS 1.0 / TLS 1.1 / TLS 1.2),
• Random - server random,
• Session ID length - the length of the Session ID field (when the server issues a new ticket, it must be set to 0),
• Session ID - the identifier of the previous session (0 when a ticket is issued for the first time),
• SessionTicket TLS - TLS extension; its presence means that the server is going to issue a new TLS ticket to the client by sending a New Session Ticket message in the ST_CR_FINISHED_A state, and it puts the client into the SSL3_ST_CR_SESSION_TICKET_A state.



Field descriptions:
• Session Ticket Lifetime Hint - the lifetime of the ticket, after which the client must delete it (within that period the client may decide for itself when to delete the ticket; 0 leaves the lifetime to the client's discretion),
• Session Ticket Length - the length of the ticket data,
• Session Ticket - the ticket value.

The value and parameters of the ticket are stored in the client's memory:



Note that for the client the ticket value is an opaque binary blob that it must either send to the server or save/update when it receives one; the field the client actually uses for reference is the Session ID. The server uses the first 16 bytes of the ticket value to identify the set of keys used to verify the ticket's integrity and decrypt it. This lets the server rotate keys while still accepting tickets that were issued to clients under old keys.
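To make the key lookup concrete, here is a minimal sketch of a server-side key ring. The class and method names are hypothetical and are not taken from the Balancer or Nginx; the only thing borrowed from the text is that the first 16 bytes of the ticket name the key set. Encryption and integrity checking are left out.

```python
KEY_NAME_LEN = 16  # per the text: the first 16 bytes of a ticket name the key set


class TicketKeyRing:
    """Hypothetical server-side key ring (names and layout are illustrative)."""

    def __init__(self):
        self._keys = {}        # key name -> key material
        self._current = None   # name of the key used for newly issued tickets

    def rotate(self, key_name: bytes, secret: bytes) -> None:
        """Make a new key current; older keys stay available for lookup."""
        if len(key_name) != KEY_NAME_LEN:
            raise ValueError("key name must be 16 bytes")
        self._keys[key_name] = secret
        self._current = key_name

    def lookup(self, ticket: bytes):
        """Return (secret, needs_renewal), or None if the key is unknown."""
        key_name = ticket[:KEY_NAME_LEN]
        secret = self._keys.get(key_name)
        if secret is None:
            return None  # unknown key: the server falls back to a full handshake
        # A ticket issued under an old key is still accepted but should be renewed.
        return secret, key_name != self._current
```

A ticket issued before a rotation still resolves to its old key, but with needs_renewal set. That is exactly the "accepted, but the key has changed" case that matters later in this story. A real server would also drop old keys after some time, which this sketch does not model.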

This is a short handshake using the first issued ticket:



where the following values are set in ClientHello:
• Session ID length - the length of the Session ID field (usually 32 bytes),
• Session ID - the Session ID value from the SSL_SESSION structure,
• SessionTicket TLS - TLS extension; Length is the length of the ticket data, Data is the ticket value.
If the ticket is accepted, the server must respond with a ServerHello in which
• Session ID length and Session ID are equal to the corresponding fields of the ClientHello.

Moreover, if the server does not renew the received ticket (it was issued under the current key), the SessionTicket TLS extension is absent from the ServerHello.

If the ticket was accepted by the server, but the key has changed, then the handshake looks like this:



The Session ID length and Session ID values are equal to the corresponding fields of the ClientHello, and the SessionTicket TLS extension is added to the ServerHello. This puts the client into the SSL3_ST_CR_SESSION_TICKET_A state, where it waits for the New Session Ticket message. On receiving that message, the client checks that the Session ID from ServerHello equals the one stored in SSL_SESSION, writes the Session Ticket value into the SSL_SESSION structure, updates (!) the Session ID value, setting it to the result of a SHA-256 hash, and sets the state to SSL3_ST_CR_CHANGE.
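The ticket update step can be sketched as follows. This is a simplified model rather than the BoringSSL code: SslSession is a hypothetical stand-in for SSL_SESSION with only the two fields discussed here, and I assume the SHA-256 hash is computed over the new ticket bytes, as OpenSSL-derived code does.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SslSession:
    """Simplified stand-in for SSL_SESSION; only the fields discussed here."""
    session_id: bytes
    ticket: bytes


def on_new_session_ticket(session: SslSession, new_ticket: bytes) -> None:
    """Store the renewed ticket and derive a fresh Session ID from it."""
    session.ticket = new_ticket
    # Assumption: the Session ID becomes SHA-256 over the ticket bytes.
    session.session_id = hashlib.sha256(new_ticket).digest()
```

A SHA-256 digest is 32 bytes long, which matches the "usually 32 bytes" Session ID length seen in ClientHello. The important detail is that the function modifies the session object in place.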

The place in the Chromium code responsible for the session reuse looked like this:



Here GetSessionCacheKey() returns a key that uniquely identifies the domain, port, and protocol version. That is, for one origin no more than one session instance is ever stored within a shard.

The SSL_set_session() function does not copy the session instance into the given connection; it only stores a pointer to that instance.

Thus, when the client initiates, say, three connections in a row, it sends the same Session ID and SessionTicket TLS values in each of them. The first connection succeeds and moves to the SSL3_ST_CR_SESSION_TICKET_A state, after which the Session ID value is changed. The second and subsequent connections then receive a ServerHello with a non-empty Session ID and, seeing that the value returned by the server (the same one the client sent) is not equal to the value in the SSL_SESSION structure (it has already been changed by the first connection), go into the SSL3_ST_CR_CERT_A state (full handshake). The server, rightly assuming that the client expects a new ticket from it (SSL3_ST_CR_SESSION_TICKET_A), sends a New Session Ticket message, which does not match the state the client is actually in and results in an "Unexpected message" alert.
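The whole sequence can be reproduced with a toy model. Everything below is an illustrative simulation, not Chromium or BoringSSL code: the classes are hypothetical, the state names are used only as labels, and the new Session ID is assumed to be SHA-256 over the ticket bytes. The share flag switches between storing a reference to the cached session (what SSL_set_session() does) and taking a private snapshot of it.

```python
import hashlib


class Session:
    """Toy stand-in for SSL_SESSION."""

    def __init__(self, session_id: bytes, ticket: bytes):
        self.session_id = session_id
        self.ticket = ticket


class Connection:
    def __init__(self, cached: Session, share: bool):
        # share=True keeps a reference to the cached object (pointer semantics);
        # share=False takes a private snapshot of it.
        self.session = cached if share else Session(cached.session_id, cached.ticket)

    def send_client_hello(self) -> bytes:
        return self.session.session_id

    def on_server_hello(self, echoed_id: bytes, has_ticket_ext: bool) -> str:
        # Resumption is recognised by comparing the echoed Session ID with
        # the value stored in the session *right now*.
        if echoed_id != self.session.session_id:
            return "SSL3_ST_CR_CERT_A"  # client believes this is a full handshake
        return "SSL3_ST_CR_SESSION_TICKET_A" if has_ticket_ext else "SSL3_ST_CR_CHANGE"

    def on_new_session_ticket(self, state: str, new_ticket: bytes) -> str:
        if state != "SSL3_ST_CR_SESSION_TICKET_A":
            return "unexpected_message"  # the alert seen in the dumps
        self.session.ticket = new_ticket
        self.session.session_id = hashlib.sha256(new_ticket).digest()
        return "SSL3_ST_CR_CHANGE"


def simulate(share: bool) -> list:
    old_ticket = b"old-key-name-16b" + b"opaque"
    cached = Session(hashlib.sha256(old_ticket).digest(), old_ticket)
    conns = [Connection(cached, share) for _ in range(3)]
    # All three ClientHello messages go out before any ServerHello is processed.
    sent_ids = [c.send_client_hello() for c in conns]
    results = []
    for conn, sent_id in zip(conns, sent_ids):
        # The server accepts the old ticket, echoes the Session ID and renews.
        state = conn.on_server_hello(sent_id, has_ticket_ext=True)
        results.append(conn.on_new_session_ticket(state, b"new-key-name-16b" + b"opaque"))
    return results
```

With share=True the first connection completes and the other two end in unexpected_message; with share=False all three reach SSL3_ST_CR_CHANGE. The private snapshot is only one conceivable way to avoid the shared mutation in this model and is not necessarily how the problem was actually fixed.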

The problem has already been fixed in Yandex Browser 15.9 and Chromium 46.

Source: https://habr.com/ru/post/269777/

