I am Sergey Rublev, head of the SOC (Security Operations Center) at Infosecurity.
In this article I will take a detailed look at the ambitious Sigma project,
whose motto is: “Sigma for logs is like Snort for traffic and Yara for files”.

I will cover three aspects:
- The applicability of the Sigma rule syntax for maintaining a knowledge base of threat detection scenarios
- The capabilities of the tools that generate rules for off-the-shelf SIEM systems
- The value of the current content of the public Sigma rule repositories for a SOC
Once upon a time, in a galaxy far far away
It all started a few years ago, when the trees were tall and our monitoring team was still small. We were faced with a lot of questions; almost every team that grows beyond three people goes through this.

The causes of the questions are different:
- Team growth
- Staff turnover
- A large number of heterogeneous systems under monitoring
If you have to take over a SIEM that someone else has already tuned, the number of questions grows like an avalanche.
Use Case Library
Worldwide experience of building monitoring centers has already produced a way to bring order to this chaos: the use case library. The goal of each use case is to comprehensively describe the solution to a specific information security monitoring task.
The set of knowledge recorded in each use case can vary; we start from the following set:
- Objective - a task solved by a case
- Threat - a threat that is detected by the detection rule.
- Stakeholders - people interested in the work of this rule: information security / IT / business
- Data Requirements - the data set required to identify the threat
- Logic - the logic of detecting threats
- Testing - an algorithm for testing the correctness of the detection rule
- Priority - priority of event handling by case (as a rule, it is calculated from the potential damage from a successfully realized threat)
- Output - a list of actions for triaging an alert, a description of the valid outcomes of the triage procedure, and the data to be recorded in the triage results
An example use case for detecting communication with a botnet control server (commonly known as C&C, or simply C2):

The example is considerably simplified; in reality, the case with proper description expands into a multipage document.
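The field set listed above can also be kept in a machine-friendly form. The following is only a minimal sketch of such a record, not the format we actually use: the class name, the `missing_fields` helper and all sample values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    """One entry of a use case library; the fields follow the list above."""
    objective: str                # task solved by the case
    threat: str                   # threat the detection rule identifies
    stakeholders: List[str]       # who is interested in the rule
    data_requirements: List[str]  # data needed to identify the threat
    logic: str                    # detection logic, in prose or as a rule reference
    testing: str                  # how to verify that the rule works
    priority: str                 # derived from the potential damage
    output: List[str] = field(default_factory=list)  # triage steps and recorded results

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty (a simple completeness check)."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical, heavily abbreviated entry for the C2 example.
c2_case = UseCase(
    objective="Detect communication with a botnet control server",
    threat="Infected host contacting C2 infrastructure",
    stakeholders=["information security", "IT"],
    data_requirements=["proxy logs", "DNS logs"],
    logic="Destination matches a C2 indicator feed",
    testing="",
    priority="high",
)
print(c2_case.missing_fields())
```

A check like `missing_fields` makes it easy to see which cases in the library are still incomplete before they go into production.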
Once the number of use cases exceeded several dozen, we began to look for ready-made tools for maintaining such a knowledge base, preferably ones that offer a machine-friendly interface in addition to a human-friendly one.
Sigma project
The Sigma project certainly deserves consideration as a knowledge base of incident detection rules. It started in 2016, and I have followed it almost from the very beginning.
The project consists of two parts:
- The Sigma rules themselves
- Utilities for converting rules to queries for various SIEM systems
The list of supported SIEMs is impressive: it covers almost all popular event analysis solutions. Now let's go through everything in detail and in order.
Rule syntax
Sigma rules are YAML documents that describe a scenario for detecting a particular attack. Syntactically, a rule consists of the following blocks:
Meta-information
A descriptive part that structures the rules and makes it easier to find the ones you need.
    title: Access to ADMIN$ Share
    description: Detects access to $ADMIN share
    author: Florian Roth
    falsepositives:
        - Legitimate administrative activity
    level: low
    tags:
        - attack.lateral_movement
        - attack.t1077
    status: experimental
It is worth noting separately that many of the rules already carry references to attack techniques from the MITRE ATT&CK framework.
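Because the metadata is plain YAML, it is easy to query once parsed. The sketch below assumes the rule has already been loaded into a Python dict and shows one possible way to pull the ATT&CK technique identifiers out of the tags; the function names and the tag pattern are my own assumptions, not part of the Sigma tooling.

```python
import re

# Metadata of the rule above, as it would look after YAML parsing.
rule_meta = {
    "title": "Access to ADMIN$ Share",
    "level": "low",
    "status": "experimental",
    "tags": ["attack.lateral_movement", "attack.t1077"],
}

# Assumed tag shape: "attack.t" followed by a four-digit technique number,
# optionally with a three-digit sub-technique suffix.
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$", re.IGNORECASE)


def attack_techniques(meta):
    """Return technique IDs (such as 'T1077') referenced in the tags."""
    found = []
    for tag in meta.get("tags", []):
        match = TECHNIQUE_TAG.match(tag)
        if match:
            found.append(match.group(1).upper())
    return found


def other_attack_tags(meta):
    """Return the remaining attack.* tags; in this rule that is the tactic name."""
    return [
        tag.split(".", 1)[1]
        for tag in meta.get("tags", [])
        if tag.startswith("attack.") and not TECHNIQUE_TAG.match(tag)
    ]


print(attack_techniques(rule_meta))
print(other_attack_tags(rule_meta))
```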
Data source declaration
A description of the source whose events the detection logic is built on.
    logsource:
        product: windows
        service: security
The syntax allows you to describe either a specific service of a specific product or a whole category of systems.
Declaration of processing logic
At the detection logic level, the following are described:
- Required patterns
- Values of certain fields in the log
- Time frame
- Aggregate functions
The logic can be trivial, for example conditions imposed on a set of fields:
    detection:
        selection:
            EventID: 5140
            ShareName: Admin$
        filter:
            SubjectUserName: '*$'
        condition: selection and not filter
and quite complicated:
    detection:
        selection1:
            EventID:
                - 529
                - 4625
            UserName: '*'
            WorkstationName: '*'
        selection2:
            EventID: 4776
            UserName: '*'
            Workstation: '*'
        timeframe: 24h
        condition:
            - selection1 | count(UserName) by WorkstationName > 3
            - selection2 | count(UserName) by Workstation > 3
Although the expressive power of the language is not universal, it is still quite broad and makes it possible to describe a large number of attack detection cases.
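To make the semantics of the two examples concrete, here is a small sketch of how such conditions could be evaluated over events represented as Python dicts. It is an illustration under several assumptions, not the way any SIEM or the Sigma tooling works: matching is treated as case-insensitive, `fnmatch` is used for the `*` wildcard (it also interprets `?` and `[...]`), `count(UserName)` is read as the number of distinct user names, and the caller is expected to pass in only the events of one time window. All function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def field_matches(pattern, value):
    """Compare one event value with a rule value; '*' acts as a wildcard."""
    return fnmatchcase(str(value).lower(), str(pattern).lower())


def block_matches(block, event):
    """All fields must match (AND); a list of values means any of them (OR)."""
    for name, expected in block.items():
        if name not in event:
            return False
        options = expected if isinstance(expected, list) else [expected]
        if not any(field_matches(option, event[name]) for option in options):
            return False
    return True


def admin_share_access(event):
    """condition: selection and not filter"""
    selection = {"EventID": 5140, "ShareName": "Admin$"}
    filter_ = {"SubjectUserName": "*$"}
    return block_matches(selection, event) and not block_matches(filter_, event)


def noisy_sources(events, selection, counted, group_by, threshold):
    """selection | count(counted) by group_by > threshold, for one time window."""
    seen = defaultdict(set)
    for event in events:
        if block_matches(selection, event):
            seen[event.get(group_by)].add(event.get(counted))
    return sorted(key for key, values in seen.items() if len(values) > threshold)


# A user account touching the share matches; a machine account is filtered out.
print(admin_share_access({"EventID": 5140, "ShareName": "ADMIN$", "SubjectUserName": "alice"}))
print(admin_share_access({"EventID": 5140, "ShareName": "ADMIN$", "SubjectUserName": "WS01$"}))
```

For the second rule, `noisy_sources(events, selection1, "UserName", "WorkstationName", 3)` would return the workstations from which more than three different user names failed to log on.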
Rule development tools
In addition to your favorite text editor, SOC Prime offers a web UI for writing the YAML; it can both validate the syntax of an existing rule and build a rule from graphical blocks.

Sigma as a means of maintaining the knowledge base
Let's briefly summarize.
At the moment, the rule syntax concentrates mainly on describing the threat detection logic and is not intended for a comprehensive description of a use case, so a full-fledged library cannot be maintained with Sigma rules alone.
For the use case structure we chose, Sigma covers only half (Objective, Data requirements, Logic and Priority).

Conversion to various SIEM
Since we are a SOC service provider, the idea of keeping all of our correlation rule content in one universal format, and converting it to the format of the required SIEM at implementation time, looked very tempting to us.
The project includes console utilities that generate event queries in the formats of various SIEMs. Let's look at what the conversion consists of and what is under its hood.

The conversion takes place in 3 stages:
- Parsing the rule: the YAML document is split into its component blocks.
- Mapping to the taxonomy of the SIEM: normalization is implemented slightly differently in each SIEM, so the declarations from the Sigma rule must be translated to the event taxonomy of the selected SIEM.
- Query generation for the SIEM: this stage requires another component, a backend for the target SIEM. The backend is essentially a plugin for the conversion utility that contains the logic for producing the final query format of the SIEM. The detection and logsource blocks are converted using the field mapping applied at the previous stage, and additional SIEM-specific information is added.
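The three stages can be illustrated with a toy converter. This is a deliberately simplified sketch, not the code of the real conversion utility: the output query language is imaginary, the field names in the mapping are hypothetical, the logsource block is ignored, and only flat conditions built from `and`, `or` and `not` are supported.

```python
def render_block(block, field_map):
    """Turn one detection block into field="value" terms joined by AND."""
    terms = []
    for field, value in block.items():
        name = field_map.get(field, field)  # stage 2: apply the SIEM taxonomy
        values = value if isinstance(value, list) else [value]
        alternatives = " OR ".join(f'{name}="{v}"' for v in values)
        terms.append(f"({alternatives})" if len(values) > 1 else alternatives)
    return " AND ".join(terms)


def convert(rule, field_map):
    """Stage 3: a toy backend for an imaginary query language."""
    detection = rule["detection"]
    parts = []
    for token in detection["condition"].split():
        if token in ("and", "or", "not"):
            parts.append(token.upper())
        else:
            parts.append(f"({render_block(detection[token], field_map)})")
    return " ".join(parts)


# Stage 1 is assumed to be done: the rule is already a parsed YAML document.
rule = {
    "logsource": {"product": "windows", "service": "security"},
    "detection": {
        "selection": {"EventID": 5140, "ShareName": "Admin$"},
        "filter": {"SubjectUserName": "*$"},
        "condition": "selection and not filter",
    },
}
# Hypothetical field names of the target SIEM.
field_map = {"EventID": "event_id", "ShareName": "share_name", "SubjectUserName": "user_name"}

print(convert(rule, field_map))
# prints: (event_id="5140" AND share_name="Admin$") AND NOT (user_name="*$")
```

Even in this toy form it is visible that the quality of the result depends almost entirely on how complete the field mapping is.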
As a result, the conversion utility is launched as follows:

The following parameters are passed to it:
- Target SIEM
- Rule
- File with mappings for this SIEM
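As a rough illustration of these three parameters, the sketch below assembles a command line for the `sigmac` converter from Python. The `-t` (target) and `-c` (configuration) options are the ones I know from the project README; the target name, configuration name and rule path are placeholders, and the helper function itself is hypothetical.

```python
import shlex


def sigmac_command(target, config, rule_path, sigmac="sigmac"):
    """Build the argument list: target SIEM, mapping configuration, rule file."""
    return [sigmac, "-t", target, "-c", config, rule_path]


# Placeholder target, configuration and rule path, for illustration only.
argv = sigmac_command("splunk", "splunk-windows", "rules/windows/example_rule.yml")
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)` to obtain the generated query, provided the Sigma tools are installed.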
SOC Prime also offers a ready-made UI for the conversion function (uncoder.io).

Conversion pitfalls
Having studied the mechanics of the conversion, we ran into significant limitations that kept us from moving all of our content into the Sigma format:
- The converter works only with the query. A correlation rule in a SIEM covers more aspects: the time window, aggregation, and actions taken on the detected alerts.
- Key features of individual SIEMs, for example ActiveLists, are not taken into account.
- The field mapping is not detailed enough: the mapping configurations describe the fields of only a few sources, so if your rule base covers several dozen different types of event sources, you have to invest heavily in writing mappings.
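The size of the mapping gap from the last point can be estimated before committing to a migration. The sketch below assumes rules are available as parsed dicts and the mapping as a dict from Sigma field names to SIEM field names; both helper functions are hypothetical and only handle the simple block layouts shown earlier.

```python
def rule_fields(rule):
    """Collect the field names used in the detection blocks of a parsed rule."""
    fields = set()
    for name, block in rule.get("detection", {}).items():
        if name in ("condition", "timeframe"):
            continue
        blocks = block if isinstance(block, list) else [block]
        for item in blocks:
            if isinstance(item, dict):
                # Drop value modifiers such as "field|contains" if present.
                fields.update(key.split("|", 1)[0] for key in item)
    return fields


def unmapped_fields(rules, field_map):
    """Fields that the rules rely on but the SIEM mapping does not cover."""
    needed = set()
    for rule in rules:
        needed |= rule_fields(rule)
    return sorted(needed - set(field_map))


rules = [{
    "detection": {
        "selection": {"EventID": 5140, "ShareName": "Admin$"},
        "filter": {"SubjectUserName": "*$"},
        "condition": "selection and not filter",
    },
}]
# Hypothetical, incomplete mapping for the target SIEM.
print(unmapped_fields(rules, {"EventID": "event_id"}))
```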
Rule base
Let's see what the publicly available Sigma rule base contains. Currently, content is actively being added to two repositories:
- The main repository of the project
- SOC Prime Threat Detection Marketplace
The rules in the two repositories partially overlap.
SOC Prime also has a number of rules distributed under a paid subscription; I do not consider their content in this article.
For the analysis, we need the sigmatools library for Python and some programming skills.
To parse the rules from a directory and load them into a dictionary, you can use the following code:

    from sigma.parser.collection import SigmaCollectionParser
    import pathlib
    import itertools

    def alliter(path):
        # Recursively yield every file under path, skipping hidden entries
        for sub in path.iterdir():
            if sub.name.startswith("."):
                continue
            if sub.is_dir():
                yield from alliter(sub)
            else:
                yield sub

    def get_inputs(paths, recursive):
        # Expand the given paths into a flat list of rule files
        if recursive:
            return list(itertools.chain.from_iterable(alliter(pathlib.Path(p)) for p in paths))
        else:
            return [pathlib.Path(p) for p in paths]

    BASE_PATH = [r'sigma\rules']
    rules_map = {}
    for sigmafile in get_inputs(BASE_PATH, True):
        # Close each file once its rule collection has been parsed
        with sigmafile.open(encoding='utf-8') as f:
            parser = SigmaCollectionParser(f)
            rule = next(iter(parser))
        # Keying by title collapses rules that share the same title
        rules_map[rule['title']] = rule
After removing duplicate rules, the following picture emerges:

For the deduplicated list of rules, we get the following distributions:
By type of event source:
- Windows ~ 80%
- Sysmon ~ 53%
- Proxy ~ 8%
- Linux ~ 4%
The current content is focused mostly on Windows, and on Sysmon in particular; only a few of the rules cover other systems.
By the degree of content readiness:
It turns out that the developers of the Sigma rules have marked fewer than 20% of all existing rules as stable.
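Distributions like these can be computed with a few lines of Python. The sketch below is not the exact code behind the figures above: it assumes each rule is available as a plain dict of its parsed YAML, takes the `product` (or, failing that, the `category`) of the logsource as the source type, and runs on a tiny made-up sample rather than on the real repositories.

```python
from collections import Counter


def share(rules, key_func):
    """Percentage of rules per key, rounded to whole percent."""
    rules = list(rules)
    counts = Counter(key_func(rule) for rule in rules)
    total = len(rules) or 1  # avoid division by zero for an empty rule set
    return {key: round(100 * n / total) for key, n in counts.items()}


def by_source(rule):
    logsource = rule.get("logsource", {})
    return logsource.get("product") or logsource.get("category") or "unknown"


def by_status(rule):
    return rule.get("status", "unspecified")


# Made-up sample for illustration; not taken from the repositories.
sample = [
    {"logsource": {"product": "windows", "service": "sysmon"}, "status": "experimental"},
    {"logsource": {"product": "windows", "service": "security"}, "status": "stable"},
    {"logsource": {"product": "windows", "service": "sysmon"}, "status": "experimental"},
    {"logsource": {"category": "proxy"}, "status": "experimental"},
]
print(share(sample, by_source))
print(share(sample, by_status))
```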
Let's sum up
There are a large number of rules in publicly available sources. They are updated regularly, and rules that detect the indicators, and sometimes even the techniques, of the most high-profile APT campaigns appear quickly.
However, a number of limitations stand in the way of applying the rules in real life:
- Many of the rules target Microsoft Sysmon, which is rarely used in the enterprise.
- Many of the rules actually check IoCs (hashes, IP addresses, URLs, User Agents). Such rules quickly become obsolete, and there are more efficient mechanisms than rules for searching for IoCs.
- Much of the content is experimental, which imposes additional requirements for thorough testing before it is put into production.
At Infosecurity, we use the content of the Sigma rules as an additional source of knowledge for more effective incident detection. If we find something interesting, we implement it within our own correlation rules, which take into account our rule engine (Apache Spark) as well as the specifics of the infrastructures and security tools we work with.