Artificial intelligence in the network security service. Part 2

This is Part 2 of the article. Part 1 was published separately.

In our case, the Introspect behavior analytics system, which belongs to the User and Entity Behavior Analytics (UEBA) product class, serves as a single entry point for a large volume of diverse machine data collected from the existing infrastructure, including from SIEM systems. Using machine analytics algorithms and artificial intelligence, it helps security staff automate the routine work of analyzing a large number of incidents.

Moreover, the system can be integrated with existing network access control (NAC) systems to act on sources of abnormal behavior in the network: disconnect them, throttle them, move them to another VLAN, and so on.

What kind of data should Introspect receive as input? The more diverse the better, up to and including network traffic. For this purpose, the system has specialized traffic-processing components called Packet Processors (PP).

An advantage of receiving data from SIEM systems is that it has already been parsed by them. Introspect works with SIEM systems such as SPLUNK, QRadar, and ArcSight. Support for LogRhythm (raw syslog) and Intel Nitro is planned next. In addition, the system collects a wide range of data:

MS Active Directory (AD security logs, AD user, group, group user), MS LDAP logs
DHCP logs: MS DHCP, Infoblox DHCP, dnsmasq DHCP
DNS logs: MS DNS, Infoblox DNS, BIND
Firewall logs: Cisco ASA (syslog), Fortinet (via SPLUNK), Palo Alto (via SPLUNK), Checkpoint (via SPLUNK), Juniper (via SPLUNK)
Proxy logs: Bluecoat, McAfee, ForcePoint
Alerts: FireEye, MS ATA
VPN logs: Cisco AnyConnect / WebVPN, Juniper VPN (via SPLUNK), Juniper Pulse Secure (via SPLUNK), Fortinet VPN (via SPLUNK), Checkpoint VPN, Palo Alto VPN
Flow logs: NetFlow v5, v7, v9
Email logs: IronPort ESA
Bro logs: connection logs

A competitive advantage of the system is its ability to work at the transaction level, that is, with network traffic that complements the information in log messages. This gives the system additional analytical capabilities: analyzing DNS queries and tunneled traffic, and efficiently detecting attempts to transfer sensitive data outside the organization's perimeter. In addition, the system provides packet entropy analytics, analysis of HTTPS headers and files, and analysis of cloud application activity.
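
As an illustration of what packet entropy analytics can mean in practice, the sketch below computes the Shannon entropy of a payload's byte distribution. It is a minimal example under stated assumptions, not Introspect's actual implementation: the function name and the sample payloads are hypothetical. Values close to 8 bits per byte are typical of encrypted or compressed data, which can be a useful signal when it shows up inside a nominally plaintext protocol.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    total = len(payload)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical payloads, for illustration only.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
uniform = bytes(range(256))  # every byte value exactly once

print("plaintext entropy:", round(shannon_entropy(plaintext), 2))
print("uniform entropy:  ", shannon_entropy(uniform))  # 8.0, the maximum for bytes
```

Entropy alone does not prove exfiltration; it is one feature among many that an analytics engine could combine with context such as protocol, destination, and volume.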

The packet processors (PP) mentioned above are available as virtual and hardware appliances and operate at speeds of up to 5-6 Gbps. They perform DPI on the raw data, extract context information (packet metadata) from it, and pass it to another component of the system, the Analyzer.

If analysis is to cover not only logs but also traffic, captured via SPAN / TAP or via packet brokers or repeaters such as Gigamon or Ixia, the PP must be placed at the right point in the network. For maximum efficiency, it should capture all traffic in each user VLAN going to and from the Internet, as well as traffic between users and protected resources or servers containing critical information.

A necessary and key component of the system is the Analyzer. It processes data from logs, flows, packet metadata, alerts from third-party systems, threat intelligence feeds and other sources.

The Analyzer can be a single 2RU appliance, a horizontally scalable (scale-out) solution consisting of a set of 1RU appliances, or a cloud solution.

Logical structure

Logically, the Analyzer is a horizontally scalable Hadoop platform consisting of several types of nodes: Edge Nodes, Index and Search nodes, and Hadoop data nodes.

Edge Nodes receive data and write it to Flume channels with HDFS sinks.
Index and Search nodes retrieve information from three types of data stores: HBase, Parquet, and Elasticsearch.
Hadoop data nodes are intended for data storage.

The system works as follows: packet metadata, flows, logs, alerts, and threat feeds are parsed, cached, distilled, and correlated. The system links each user to their data, caching fast-changing user data in HDFS.
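
The link between a user and their data can be pictured as a lookup from an IP address and a timestamp to the user who held that address at that moment. The sketch below is only an illustration under assumptions: the `Session` record, its fields, and the sample values are hypothetical, standing in for information that could be derived from AD logon and DHCP lease logs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical record built from AD logon and DHCP lease logs."""
    user: str
    ip: str
    start: int  # epoch seconds, inclusive
    end: int    # epoch seconds, exclusive


def resolve_user(sessions: list[Session], ip: str, ts: int) -> Optional[str]:
    """Return the user who held `ip` at time `ts`, or None if unknown."""
    for s in sessions:
        if s.ip == ip and s.start <= ts < s.end:
            return s.user
    return None


sessions = [
    Session("alice", "10.0.0.5", 1000, 2000),
    Session("bob", "10.0.0.5", 2000, 3000),  # same IP, later DHCP lease
]

print(resolve_user(sessions, "10.0.0.5", 1500))  # alice
print(resolve_user(sessions, "10.0.0.5", 2500))  # bob
print(resolve_user(sessions, "10.0.0.9", 1500))  # None
```

The same IP address maps to different users at different times, which is why events need to be attributed by time window rather than by address alone.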

The data then goes to the discrete analytics module, where so-called discrete alarms are filtered out based on a single fixed event or field. For example, a hit from the DNS DGA detection algorithm or an attempt to log in to a blocked account clearly does not require any machine analytics to be recognized as a potentially dangerous event. At this stage, the behavior analytics module is involved only to read potential events on the network.
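
Discrete alarms of this kind amount to fixed rules evaluated against a single event. A minimal sketch follows; the event field names (`type`, `account_status`, `dga_flag`) and the rule names are assumptions made for the example and are not taken from the product.

```python
from typing import Callable

Event = dict[str, str]

# Each rule is (alert name, predicate over one event). Field names are hypothetical.
RULES: list[tuple[str, Callable[[Event], bool]]] = [
    ("Logon attempt on blocked account",
     lambda e: e.get("type") == "logon" and e.get("account_status") == "blocked"),
    ("DNS query flagged as DGA",
     lambda e: e.get("type") == "dns" and e.get("dga_flag") == "true"),
]


def discrete_alerts(event: Event) -> list[str]:
    """Return the names of all fixed rules that a single event matches."""
    return [name for name, matches in RULES if matches(event)]


print(discrete_alerts({"type": "logon", "account_status": "blocked", "user": "alice"}))
print(discrete_alerts({"type": "logon", "account_status": "active", "user": "bob"}))
```

No history or statistics are involved: each event is judged on its own fields, which is what separates this stage from behavior analytics.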

The next step is event correlation, indexing, and storage in the data stores mentioned above. The behavior analytics mechanism works on the stored information and can evaluate behavior over certain periods of time or compare the behavior of a given user with that of other users. This is the so-called baselining mechanism of behavior profiling. Behavior profiling models are built using machine analytics algorithms such as SVD, RBM, BayesNet, K-means, and Decision tree.
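
The two kinds of baseline, a user's own history and a peer group, can be illustrated with a much simpler statistic than the models listed above. The sketch below uses a plain z-score as a stand-in; it is not SVD, RBM, or any other algorithm the product uses, and all figures are hypothetical.

```python
from statistics import mean, pstdev


def deviation(value: float, reference: list[float]) -> float:
    """Absolute z-score of `value` relative to a reference sample."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Hypothetical daily download volumes in MB.
own_history = [100, 120, 110, 90, 105]  # the same user on previous days
peers_today = [95, 130, 110, 100]       # colleagues in the same group today
today = 900

print("vs own history:", round(deviation(today, own_history), 1))
print("vs peer group: ", round(deviation(today, peers_today), 1))
```

A value that is far from both baselines is more interesting than one that is unusual for the user but normal for the peer group, for example during a company-wide software rollout.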

The integrated behavior analytics model of the product is shown in Figure 1.

Pic1

The diagram shows that the behavior analytics mechanism is based on four blocks:

data sources;
conditions of data use (access time, the amount of uploaded or downloaded data, the number of e-mail messages, geolocation of the source or recipient of information, VPN connection, etc.);
mechanisms for profiling user behavior (evaluation of behavior over time or relative to another employee, the time window over which the analysis is performed, and the mathematical model used for profiling: SVD, Restricted Boltzmann Machine (RBM), BayesNet, K-means, Decision tree, and others);
detection of anomalies in traffic using mathematical models of machine analytics, such as Mahalanobis distance and energy distance, and generation of events in the system with a certain priority and stage.
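
To make the Mahalanobis distance mentioned in the last block concrete, here is a small sketch for two features, with the 2x2 covariance inverse written out by hand so that no external library is needed. The choice of features (logon hour and megabytes downloaded) and all sample values are assumptions made for the example.

```python
import math


def mahalanobis_2d(point: tuple[float, float],
                   samples: list[tuple[float, float]]) -> float:
    """Mahalanobis distance of a 2-feature point from a sample distribution."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular; samples need more variety")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T with the 2x2 inverse expanded.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


# Hypothetical baseline: (logon hour, MB downloaded) for one user.
baseline = [(9, 100), (10, 120), (9, 90), (11, 130), (10, 105), (8, 95)]

print("typical day:", round(mahalanobis_2d((10, 110), baseline), 2))
print("3 a.m. bulk:", round(mahalanobis_2d((3, 900), baseline), 2))
```

Unlike a per-feature threshold, this distance accounts for the spread of each feature and the correlation between them, so a combination that is unusual as a whole scores high even if each value alone is only mildly off.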

Aruba Introspect has more than 100 supervised and unsupervised models designed to detect targeted attacks at each stage of the Cyber Kill Chain (CKC) model. For example, the Introspect Advanced edition detects:

Suspicious network activity: Abnormal Asset Access, Abnormal Data Usage, Abnormal Network Access, Adware Communication, Bitcoin Application (Bitcoin Mining), Botnet (TeslaCrypt, CryptoWall), Cloud Exfiltration, HTTP Protocol Anomaly (Header Misspellings, Header Misordering), Hacker Tool Download, IOC attack types (IOC-STIX Abuse-ch, IOC-STIX CybercrimeTracker, IOC-STIX EmergingThreatsRules, and others), Network Scan, P2P Application, Remote Command Execution, SSL Protocol Violation, Spyware Comm, Suspicious Data Usage, Suspicious External Access, Suspicious File, Suspicious Outbound Comm, WebShell, Malware Communication, Command and Control, Lateral Movement, Data Exfiltration, Browser Exploit, Beaconing, SMB Execution, Protocol Violation, Internal Reconnaissance, and more.
Suspicious access to accounts: Abnormal Account Activity, Abnormal Asset Access, Abnormal Logon, Privilege Escalation, Suspicious Account Activity, Suspicious User Logon, and others.
Data access via VPN: Abnormal Data Usage, Abnormal Logon, Abnormal User Logon.
DNS data analysis: botnet activity using various DNS DGA algorithms.
Email analysis: Abnormal Incoming Email, Abnormal Outgoing Email, Suspicious Attachment, Suspicious Email.

Next, based on the identified anomalies, each event is assigned a risk score associated with a particular stage of the attack, as defined by Lockheed Martin's Cyber Kill Chain (CKC). The risk score is determined by a Hidden Markov Model, unlike competing products, which increase or decrease the risk score linearly in their calculations.

As the attack progresses through the CKC stages (infection, internal reconnaissance, command & control, privilege escalation, lateral movement, exfiltration), the risk score increases. See Figure 2.
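
How a Hidden Markov Model can turn a sequence of alerts into a non-linear risk score is sketched below. This is a toy model under stated assumptions, not the product's actual model: the hidden states are the CKC stages named above, and the transition probability `P_STAY`, the emission probability `P_MATCH`, the uniform prior, and the definition of risk as the expected stage index are all invented for illustration.

```python
STAGES = [
    "infection", "internal reconnaissance", "command & control",
    "privilege escalation", "lateral movement", "exfiltration",
]
N = len(STAGES)
P_STAY = 0.7   # assumed probability of remaining in the same stage
P_MATCH = 0.8  # assumed probability that an alert reflects the true stage


def transition(i: int, j: int) -> float:
    """Left-to-right chain: stay in stage i or advance to stage i + 1."""
    if i == N - 1:
        return 1.0 if j == i else 0.0
    if j == i:
        return P_STAY
    if j == i + 1:
        return 1.0 - P_STAY
    return 0.0


def emission(state: int, obs: int) -> float:
    """Probability of seeing an alert of stage `obs` while in stage `state`."""
    return P_MATCH if state == obs else (1.0 - P_MATCH) / (N - 1)


def risk_scores(alert_stages: list[str]) -> list[float]:
    """Run the HMM forward (filtering) pass and return a 0..1 risk per alert."""
    belief = [1.0 / N] * N  # uniform prior over stages (assumption)
    scores = []
    for t, name in enumerate(alert_stages):
        obs = STAGES.index(name)
        if t > 0:
            # Prediction step: push the belief through the transition model.
            belief = [sum(belief[i] * transition(i, j) for i in range(N))
                      for j in range(N)]
        # Update step: weight by the emission likelihood and renormalize.
        weighted = [belief[j] * emission(j, obs) for j in range(N)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        # Risk is the expected stage index, scaled to the range 0..1.
        scores.append(sum(j * b for j, b in enumerate(belief)) / (N - 1))
    return scores


for stage, score in zip(STAGES, risk_scores(STAGES)):
    print(f"{stage:25s} risk={score:.2f}")
```

Because each score depends on the whole belief over stages rather than on a running counter, an alert that fits the expected progression moves the score differently from the same alert seen out of context, which is the contrast with linear scoring drawn above.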

Pic2

The system supports adaptive learning: the results of the analytics module can be revised or adapted, either by adjusting the risk score or by adding an entity to a whitelist.

Threat information (Threat Feeds) can be downloaded from external sources using the STIX and TAXII mechanisms. The Anomali service is also supported. Introspect can also download a whitelist of domain names from the Alexa service to reduce false positives when generating alerts.
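
The effect of a domain whitelist on alert generation can be sketched as a simple filter. The alert structure, the sample domains, and the helper names below are hypothetical, and the base-domain extraction is deliberately naive (the last two labels); a real implementation would need a public-suffix list to handle names such as `example.co.uk`.

```python
def base_domain(hostname: str) -> str:
    """Naive base domain: the last two labels of the hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def suppress_whitelisted(alerts: list[dict], whitelist: set[str]) -> list[dict]:
    """Drop alerts whose domain falls under a whitelisted base domain."""
    allowed = {d.lower() for d in whitelist}
    return [a for a in alerts if base_domain(a["domain"]) not in allowed]


# Hypothetical whitelist and alerts, for illustration only.
whitelist = {"example.com", "example.org"}
alerts = [
    {"id": 1, "domain": "cdn.example.com"},
    {"id": 2, "domain": "xj3k9qpl2v.example.net"},
]

print(suppress_whitelisted(alerts, whitelist))  # only alert 2 remains
```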

The competitive advantages of the system are:

the variety of input data used,
the DPI function,
correlation of security events with the user, not the IP address, without additional software,
use of a Hadoop / Spark big data platform as the basis of the system, with unlimited clustering capabilities,
analytics-based results, with the ability to investigate incidents using full-context forensics and threat hunting,
integration with the existing ClearPass NAC solution,
agentless operation on endpoints,
practical independence from the network infrastructure vendor,
on-premise operation, without having to send data to the cloud.

The system comes in two editions: Standard Edition and Advanced Edition. The Standard Edition is adapted for Aruba Networks equipment and receives log information from AD, AMON, LDAP, firewall, and VPN logs.

Source: https://habr.com/ru/post/423545/
