Detection of known malicious code in TLS-encrypted traffic (without decryption)

This article discusses the work of a Cisco research team that demonstrates the applicability of traditional statistical and behavioral analysis methods to detecting and attributing malware that uses TLS to encrypt its communication channels, without decrypting or compromising the TLS session. It also describes a solution that implements the principles laid down in this study.

The widespread use of the TLS protocol by malware has created new problems for security tools, since traditional pattern-based detection methods cannot be applied to encrypted traffic. However, a TLS session still exposes a whole range of features to an outside observer, and these can be used to search for malware, both by analyzing the traffic of the client that initiates the encrypted connection and by analyzing requests to the server to which the encrypted tunnel is built. In this case, we analyze only the establishment of the secure connection, without access to the confidential information being transmitted and without decrypting it. In most cases, such analysis makes it possible to attribute the connection fairly accurately to a particular malware family, even when we are dealing with a single fully encrypted connection. To test this hypothesis, a group of Cisco employees (Blake Anderson, Subharthi Paul, and David McGrew) carried out a detailed study, "Deciphering Malware's use of TLS (without Decryption)", of how exactly malicious and enterprise applications use TLS; a preprint is freely available at arxiv.org/abs/1607.01639. Several million TLS-encrypted connections were analyzed, and the possibility of attributing 18 malware families was tested using thousands of unique malware samples and tens of thousands of malicious TLS connections. One of the most important results of this work was verifying that the detection mechanisms of the sandboxes and other analysis tools in use work correctly.

The performance of the malware classifier correlates well with the way a particular malware family uses TLS: families that make heavier use of cryptographic functions are more difficult to classify.

We have shown that malware and legitimate applications use TLS differently, and that these differences can be successfully applied to create behavioral detection rules or machine learning classifiers.
How and from where can we get this information? We can collect it directly on network devices (switches and routers) that export unsampled network telemetry (NetFlow / IPFIX) for analyzing information about connections, and that also send the first data packet of an encrypted TLS connection (Initial Data Packet, IDP) for analysis of TLS metadata. To improve detection accuracy and reduce the number of errors, we can also collect related information about DNS and HTTP requests, along with information about global reputation or suspicious behavior from the cloud reputation center.
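To make concrete what "TLS metadata" is visible without decryption, here is a minimal Python sketch that parses a raw ClientHello record and extracts the offered protocol version, cipher suites, and extension types. It follows the standard TLS record and handshake layout; the function names `parse_client_hello` and `build_sample_hello` and the sample bytes are illustrative assumptions, not part of the Cisco tooling, and the parser assumes the whole ClientHello fits in a single record.

```python
import struct

def parse_client_hello(record: bytes) -> dict:
    """Extract unencrypted metadata from one raw TLS ClientHello record."""
    content_type, _record_version, rec_len = struct.unpack("!BHH", record[:5])
    if content_type != 22:  # 22 = handshake record
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + rec_len]
    if not body or body[0] != 1:  # 1 = ClientHello
        raise ValueError("not a ClientHello")
    pos = 4  # skip handshake type (1 byte) and handshake length (3 bytes)
    (client_version,) = struct.unpack_from("!H", body, pos)
    pos += 2 + 32  # version, then 32 bytes of client random
    pos += 1 + body[pos]  # session ID length byte plus session ID
    (suites_len,) = struct.unpack_from("!H", body, pos)
    pos += 2
    suites = list(struct.unpack_from(f"!{suites_len // 2}H", body, pos))
    pos += suites_len
    pos += 1 + body[pos]  # compression methods length byte plus methods
    extensions = []
    if pos < len(body):  # extensions are optional
        (ext_total,) = struct.unpack_from("!H", body, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", body, pos)
            extensions.append(ext_type)
            pos += 4 + ext_len
    return {"version": client_version,
            "cipher_suites": suites,
            "extensions": extensions}

def build_sample_hello() -> bytes:
    """Build a small synthetic ClientHello (illustrative values only)."""
    suites = struct.pack("!3H", 0xC02F, 0xC030, 0x009C)
    ext = struct.pack("!HH", 0x000A, 4) + b"\x00\x02\x00\x17"  # supported_groups
    ext += struct.pack("!HH", 0x000B, 2) + b"\x01\x00"         # ec_point_formats
    hello = struct.pack("!H", 0x0303) + bytes(32) + b"\x00"    # version, random, empty session ID
    hello += struct.pack("!H", len(suites)) + suites + b"\x01\x00"
    hello += struct.pack("!H", len(ext)) + ext
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return b"\x16" + struct.pack("!HH", 0x0301, len(handshake)) + handshake

if __name__ == "__main__":
    print(parse_client_hello(build_sample_hello()))
```

The ordered list of offered cipher suites and extensions is the kind of feature that tends to differ between TLS client implementations, which is why it is useful to a classifier even though the payload itself stays encrypted.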

The solution architecture is as follows:

An example of using this technology to collect information about the cryptographic parameters in use (for regulatory compliance, for example a PCI-DSS audit):

Malware detection (information from Cisco Cognitive Analytics Global Cloud Center, CTA):

Malware detection (correlation of global and local information):

Example incident investigation:

Confirmed threat:

When building machine learning classifiers that assign connections to a particular malware family, it became obvious that some families are harder to detect and some are easier. Our goal was not only to detect traces of malware in traffic, but also to do so in an optimal way: to identify which parameters allow more accurate conclusions for a given malware family, and which are less informative.

Finally, we have demonstrated that known malware can be attributed based solely on the analysis of network traffic, without decrypting the TLS connection.

An accuracy of 90.3% was achieved in attributing a malware family when limited to a single encrypted connection, and an accuracy of 93.2% when analyzing all available encrypted connections within a five-minute analysis window. The Cisco ThreatGrid dynamic analysis system was used to analyze the first five minutes of activity of known malware samples. Tens of thousands of unique malware samples were collected, and hundreds of thousands of malicious encrypted connections were analyzed. Telemetry was also collected on millions of encrypted TLS connections in corporate networks, for comparison with the telemetry generated by malicious connections.
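The gain from looking at several connections instead of one can be illustrated with a small Python sketch. It groups per-connection classifier outputs by host and five-minute window and combines them by summing log-probabilities. This combination rule, the fixed window alignment, and the names `attribute_by_window`, `familyA`, and `familyB` are assumptions made for illustration; the study may combine connections differently.

```python
import math
from collections import defaultdict

WINDOW_SECONDS = 300  # five-minute analysis window

def attribute_by_window(flows):
    """Combine per-connection family probabilities within each window.

    flows: iterable of (host, start_time_seconds, {family: probability}).
    Assumes every connection reports a probability for the same set of
    families. Returns {(host, window_index): most likely family}.
    """
    log_scores = defaultdict(lambda: defaultdict(float))
    for host, start, probs in flows:
        key = (host, int(start // WINDOW_SECONDS))
        for family, p in probs.items():
            # Clamp to avoid log(0); treats connections as independent.
            log_scores[key][family] += math.log(max(p, 1e-9))
    return {key: max(scores, key=scores.get)
            for key, scores in log_scores.items()}

if __name__ == "__main__":
    flows = [
        ("10.0.0.5", 12.0, {"familyA": 0.55, "familyB": 0.45}),
        ("10.0.0.5", 140.0, {"familyA": 0.20, "familyB": 0.80}),
        ("10.0.0.5", 400.0, {"familyA": 0.90, "familyB": 0.10}),
    ]
    print(attribute_by_window(flows))
```

In the sample data the first connection alone slightly favors familyA, but the second connection in the same window shifts the combined verdict to familyB, which is the kind of effect additional connections can have on a weak single-connection decision.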

An open toolkit (the Joy project) was developed for efficiently collecting, preprocessing, and converting network telemetry into JSON. It collects all the information necessary for analysis: source and destination IP addresses, source and destination ports, protocols, the sizes and timing of transmitted packets, the byte frequency distribution and entropy, and the unencrypted TLS connection establishment information. The entire analysis is performed only at the network level, without the need to install any agents on the end devices.
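Two of the features listed here, the byte frequency distribution and the entropy, are simple to compute from observed payload bytes. The Python sketch below shows one way to do it; the function names are illustrative, and it is not a reproduction of how Joy computes or encodes these values.

```python
import math
from collections import Counter

def byte_distribution(payload: bytes) -> list[int]:
    """Return a 256-element histogram: how often each byte value occurs."""
    counts = Counter(payload)
    return [counts.get(value, 0) for value in range(256)]

def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload in bits per byte (0.0 if empty)."""
    if not payload:
        return 0.0
    total = len(payload)
    return sum((count / total) * math.log2(total / count)
               for count in Counter(payload).values())

if __name__ == "__main__":
    print(shannon_entropy(b"GET /index.html HTTP/1.1\r\n"))  # plaintext-like
    print(shannon_entropy(bytes(range(256))))                # uniform bytes
```

Well-encrypted or compressed data tends toward the maximum of 8 bits per byte, while plaintext protocols score noticeably lower, so these values help characterize a flow without reading its content.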

Additional materials:
Download the official Cisco report on encrypted traffic analytics.

Source: https://habr.com/ru/post/346544/
