As often happens, you first look for a solution on the market and, failing to find one, build it yourself, for yourself. Then it turns out so well that you share it with others. That is what happened with OpenSOC, an open source solution for managing large volumes of cybersecurity data, which Cisco developed for its own needs and then published on GitHub for everyone to use.

If you recall one of our previous posts about how access control is organized within Cisco, you may remember how many devices we have to monitor. Now let's look at how much security-relevant data these devices generate or pass through every day:
- 47 TB of network traffic
- 1.2 trillion network events
- 4.8 billion DNS queries (Cisco Umbrella)
- 4.1 million emails (Cisco Email Security Appliance)
- 45 million web requests (URLs) (Cisco Web Security Appliance and Cisco Cloud Web Security)
- 15 billion NetFlow flows (Cisco Stealthwatch)
- 1.5 million alerts from Cisco NGIPS
- 10,000 files (Cisco AMP ThreatGrid).
In total, we collect and store 4 TB of data every day for analysis. This is a huge amount of information, and our cybersecurity team simply needed specialized tools to manage it effectively. The first thing that comes to mind in security event management is SIEM (Security Information and Event Management), and we, like many other companies around the world, tried such solutions too. Unfortunately, no SIEM solution could solve our problems, for several reasons:
- difficulty indexing data that does not come from security tools, and supporting the data formats of security tools we developed ourselves (this is not about the Cisco portfolio, which we both sell to customers and use internally, but about various tools our own team wrote for specific tasks)
- serious problems with scaling and processing speed (searching just 10 GB of data took more than 6 minutes)
- difficulty customizing the solutions for our tasks, and the need to rewrite almost all built-in rules from scratch because they generated too many false positives
- difficulty working with both structured and unstructured data.
As an alternative to traditional SIEM solutions, we started using Splunk, which solved many of the problems listed above, and at a lower cost. Still, Splunk could not solve all of our problems, only about 95% of them. It collected data from our antivirus tools, firewalls, IDS/IPS, content filtering systems, and so on. But the remaining 5%, which is critical for our incident response team, was left out: information about threats, attackers, and their methods and tactics. Enriching the data Splunk received from our security tools with this information was not easy. At first we wrote various simple scripts and utilities, each solving its own narrow problem. Then came the idea of building our own solution and turning it into a full-fledged platform for big data analysis in cybersecurity (a Threat Intelligence Platform).

Since this system, called OpenSOC, was built for internal needs, we could not bring in any of the developers working on Cisco products and had to rely solely on our own security team. It was therefore logical not to write everything from scratch but to build on open source components, the main ones being:
- Flume (https://flume.apache.org) is a distributed tool for collecting and aggregating large amounts of data from different sources in different formats (Syslog, SNMP, NetFlow, JMS, HTTP, files, PCAP, etc.). In OpenSOC, Flume collects telemetry from various security tools, applications, and devices.
- Kafka (http://kafka.apache.org) is a distributed, high-performance message broker (message bus).
- Storm (http://storm.apache.org) is a distributed real-time computation system that receives data, including from Kafka, and performs computations on it. Its processors perform various tasks on security events: filtering, normalization, parsing, and threat intelligence enrichment. In the last case, the normalized data is extended with security context such as geolocation, IP reputation, Whois information, etc. By default, the public version of OpenSOC includes only four processors for enriching source data.
- Hadoop (hadoop.apache.org) is a set of utilities and libraries for developing and running distributed programs, for example search and context engines. It underpins many "big data" technologies. In OpenSOC, the Hadoop cluster stores all the data.
- Elasticsearch is a real-time search and indexing system for large amounts of data.
- HBase is a non-relational distributed database. In OpenSOC, it provides long-term storage of network packets (PCAP).
- Hive is a Hadoop-based data management system that can also work with HBase. In OpenSOC, it provides long-term storage of security events.
- MySQL is a relational database management system. In OpenSOC, it stores Hive metadata as well as other data, such as geolocation data.
- Kibana is a visualization system.
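To make the Storm enrichment step more concrete, here is a minimal sketch in plain Python of what a filter, parse/normalize, and enrich chain does with individual events. It is not OpenSOC or Storm code: the raw log format, the field names, and the in-memory `GEO_DB` and `REPUTATION` tables are hypothetical stand-ins for real parsers and for the geolocation and IP reputation sources mentioned above.

```python
import ipaddress
import re
from typing import Iterable, Iterator, Optional

# Hypothetical raw firewall log format: "<ACTION> src=<ip> dst=<ip> dport=<port>"
LINE_RE = re.compile(
    r"(?P<action>\w+) src=(?P<src>\S+) dst=(?P<dst>\S+) dport=(?P<dport>\d+)"
)

# Hypothetical enrichment sources keyed by network (stand-ins for a GeoIP
# table and an IP reputation feed).
GEO_DB = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "NL", "city": "Amsterdam"},
}
REPUTATION = {
    ipaddress.ip_network("203.0.113.0/24"): "known-scanner",
}


def parse(line: str) -> Optional[dict]:
    """Parse a raw line into a normalized event, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "action": m["action"].lower(),
        "src_ip": m["src"],
        "dst_ip": m["dst"],
        "dst_port": int(m["dport"]),
    }


def lookup(table: dict, ip: str):
    """Return the value of the first network in `table` that contains `ip`."""
    addr = ipaddress.ip_address(ip)
    for net, value in table.items():
        if addr in net:
            return value
    return None


def enrich(event: dict) -> dict:
    """Attach geolocation and reputation context for the source IP."""
    enriched = dict(event)
    enriched["src_geo"] = lookup(GEO_DB, event["src_ip"])
    enriched["src_reputation"] = lookup(REPUTATION, event["src_ip"]) or "unknown"
    return enriched


def pipeline(lines: Iterable[str]) -> Iterator[dict]:
    """Drop unparseable lines (filtering), then normalize and enrich the rest."""
    for line in lines:
        event = parse(line)
        if event is not None:
            yield enrich(event)


if __name__ == "__main__":
    raw = [
        "DENY src=203.0.113.7 dst=10.0.0.5 dport=22",
        "garbage that the parser drops",
        "ALLOW src=198.51.100.9 dst=10.0.0.8 dport=443",
    ]
    for e in pipeline(raw):
        print(e)
```

In OpenSOC these stages run as distributed Storm components fed from Kafka; here a single generator chains them in one process purely for illustration.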

OpenSOC has become a unified security data analysis platform at Cisco, which:
- collects, stores, processes, and analyzes data from various sources to detect anomalies and other explicit and implicit security violations
- applies various predictive and correlation models to find hidden relationships in the collected data
- takes into account the context of the analyzed data in real time
- visualizes security events in the appropriate context and generates reports for stakeholders.
Technically, it runs on a cluster of 45 Cisco UCS servers (1,440 processors, 12 TB of memory) using the Hadoop, Hive, HBase, and Elasticsearch technologies mentioned above, among others. OpenSOC's storage capacity at Cisco is 1.2 PB, and a single table holds over 1.3 trillion rows.

We started building Cisco OpenSOC in 2013, and the first prototype appeared in September 2013. Midway, in December 2013, Hortonworks joined us, which gave the project a boost and brought many interesting ideas on using open source components to build the high-performance distributed platform that OpenSOC was meant to be. In March 2014 we completed development, and in September 2014 we made OpenSOC public and published it on GitHub.

In its basic configuration, OpenSOC supports data collection, storage, and analysis, as well as enrichment with information about threats, victims, and attackers. The public version includes only a few adapters, which essentially correlate different events. As with most SIEMs, out-of-the-box correlation rules do not work very well, so each OpenSOC user has to write their own. The NTT group of companies, for example, uses the open source engine Esper to create triggers and correlation rules.
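Since each OpenSOC user has to write their own correlation rules, here is a sketch of what a simple rule might look like: raise an alert when one source IP produces several failed logins followed by a successful one within a short time window. It is written in plain Python, not in Esper's query language, and the `LoginEvent` fields, the threshold of 5 failures, and the 60-second window are hypothetical choices for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class LoginEvent:
    ts: float      # event time in seconds (hypothetical field)
    src_ip: str
    success: bool


class BruteForceThenSuccessRule:
    """Hypothetical rule: at least `threshold` failed logins from one source
    within `window` seconds, followed by a successful login, raises an alert."""

    def __init__(self, threshold: int = 5, window: float = 60.0):
        self.threshold = threshold
        self.window = window
        # Per-source sliding window of failure timestamps.
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def process(self, ev: LoginEvent) -> Optional[str]:
        q = self.failures[ev.src_ip]
        # Drop failures that have fallen out of the time window.
        while q and ev.ts - q[0] > self.window:
            q.popleft()
        if not ev.success:
            q.append(ev.ts)
            return None
        # A successful login either triggers the alert or resets the state.
        count = len(q)
        q.clear()
        if count >= self.threshold:
            return f"possible brute force from {ev.src_ip}: {count} failures then success"
        return None


if __name__ == "__main__":
    rule = BruteForceThenSuccessRule()
    events = [LoginEvent(float(t), "198.51.100.23", False) for t in range(5)]
    events.append(LoginEvent(10.0, "198.51.100.23", True))
    for ev in events:
        alert = rule.process(ev)
        if alert:
            print(alert)
```

Keeping per-source state in a bounded sliding window is what lets a rule like this run continuously over a stream rather than querying historical storage for every event.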

After OpenSOC proved successful internally, we decided to develop it further as the basis for our outsourced services.
Cisco Active Threat Analytics (ATA) is essentially a distributed Security Operations Center (SOC) that takes over security monitoring and management for our customers. While developing OpenSOC both internally and for Cisco ATA, we found that the Apache 2.0 license did not always allow us to implement everything we wanted. Nor could we rely on open source components alone to solve all our problems.

We decided to split OpenSOC development into two tracks. The first stayed at Cisco, serving our information security team and ATA. The second track was far more interesting: OpenSOC entered the Apache Incubator and became a full-fledged project developed by the open source community, under the new name Apache Metron. The first release, Apache Metron 0.1, came out in April 2016. The underlying philosophy remains unchanged, so OpenSOC users should find it easy to switch to Apache Metron.

Cisco OpenSOC has also evolved in another direction. MapR built its Security Log Analytics solution on top of it, which is used by NTT, Zion Bank, and other customers. As with Apache Metron, the core philosophy of OpenSOC remains unchanged: big data analysis for cybersecurity and the ability to work with unstructured as well as structured data. This significantly expands an organization's threat monitoring capabilities, letting it "see" much more than before by drawing on more information sources. At Cisco, for example, the past few years have seen a shift from conventional static rules and signatures toward behavioral analysis, anomaly detection, and threat intelligence. None of this would have been possible without Cisco OpenSOC.
