
Mars IS expands its use of the SPLUNK operational analysis platform

Collecting, analyzing and using big data from the technological infrastructure is one of the important emerging areas of work for Mars IS. In today's publication we will talk about a project that uses the SPLUNK platform to monitor and analyze the operation of IT infrastructure and applications.



The future is being laid today


Every IT specialist understands that areas such as big data and machine learning have great potential. Mars IS had many separate initiatives on these topics, but a global vision and movement towards a clear goal appeared only after we had selected several strategic technologies.

One of them was SPLUNK, a platform for operational analysis. It is the flagship software product of the American company of the same name, and it allows you to collect, analyze and use machine data from technological infrastructure, security systems and business applications.
This data has great hidden potential. Using the results of machine analysis of such information helps to improve the performance, profitability, competitiveness and security of the company.



It must be said that Mars IS is not new to SPLUNK. We have used this platform for many years, but, like most other companies, only for monitoring security breaches. During this time it has proven itself very well.

Expanding horizons


Taking SPLUNK beyond the security field carried a certain risk, because today few companies in the world use it for anything more. But we saw the potential of the system and were ready to develop and improve the product itself.
Application files with event records (logs) contain millions of lines that you cannot analyze manually or even look through. The SPLUNK platform does this automatically. The system finds the relevant entries and reports that a problem is present or about to appear.

As a result, it was decided to use SPLUNK for monitoring and analyzing the operation of the IT infrastructure and applications. It became clear to us that the time had come to automate most of the processes of machine analysis and machine learning.

Implementation progress


Specialists from different areas take part in the implementation of the platform, but the backbone of the team consists of four people.

The business analyst knows well what data different systems provide and which other data it can be cross-referenced with to get a deeper analysis.

The SPLUNK specialist sets up the system so that it is user-friendly and as productive as possible. This person also brings new data sources into the system.

The architect monitors integration with other systems and the correctness of the process of interaction between support teams.

The system availability solution architect is a specialist who, on the basis of SPLUNK, creates a single monitoring tool that makes it possible to spot a problem and prevent it before it appears.

Many colleagues helped us on their own initiative with advice, technology, and business cases. They gain no immediate benefit from this at their own workplaces, but it turned out that people see the overall picture of automation and want to help this area develop faster. During the implementation we were once again convinced that Mars IS is a place where people are engaged, work as a team and pursue a common cause with enthusiasm.

The first step was a network of "agents". Like a large spider, SPLUNK connects hundreds of computers into a single network, receiving data from them at the level of hardware, software, and, in some cases, applications.

Now that gigabytes of data of varying levels and detail flow into a single SPLUNK cloud, Mars IS specialists can analyze errors on servers and in programs within a few minutes. The time needed to find a problem has decreased significantly.



The system learns by itself, and we learn with it


Like any IT tool, SPLUNK requires certain skills before you can write your own queries. Therefore, we create dashboards and reports for teams that are not yet ready to start learning the tool but already want to see their data and make decisions based on it.

To find recurring problems, we use standard mathematical models to identify dependencies. For example, we want to check whether a job (scheduled task) in a program is having trouble finishing today. For this, SPLUNK needs to calculate the standard time for today for that specific job. We cannot set limits manually, because keeping the model current would require huge resources.

The system also looks at the historical standard load time and adjusts the limits automatically. It can detect peak values, which most likely were problems themselves, and exclude them from the calculation. That is how, by gradually teaching the SPLUNK model, we learn not only to understand and see our own data, but also to predict a problem before it appears.
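The idea of an automatically adjusted limit can be illustrated outside SPLUNK with a small Python sketch. It is not the model used in the project: the function names, the factor `k`, and the choice of the median absolute deviation to filter peaks are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def adaptive_limit(history, k=3.0):
    """Derive an upper duration limit from past run times (illustrative only).

    history: past durations of one job, in seconds.
    k: hypothetical sensitivity factor; larger values give a looser limit.
    """
    med = median(history)
    # Median absolute deviation: a spread estimate that peaks barely affect.
    mad = median([abs(x - med) for x in history])
    if mad > 0:
        # 1.4826 scales the MAD to be comparable with a standard deviation.
        cutoff = med + k * 1.4826 * mad
        clean = [x for x in history if x <= cutoff]
    else:
        # More than half of the runs are identical; keep everything.
        clean = list(history)
    # The limit is computed only from runs that were not treated as peaks.
    return mean(clean) + k * pstdev(clean)


def is_late(duration, history, k=3.0):
    """True if today's run took longer than the limit derived from history."""
    return duration > adaptive_limit(history, k)


# One past run (300 s) was itself a problem and should not inflate the limit.
past_runs = [60, 62, 58, 61, 59, 300, 60, 63]
print(round(adaptive_limit(past_runs), 1))
print(is_late(90, past_runs), is_late(61, past_runs))
```

Because the limit is recomputed from history on every check, nobody has to maintain thresholds by hand as the job's normal duration drifts.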
So that the problem is not left hanging in the air, we integrated SPLUNK with ServiceNow, which made it possible to turn knowledge of an approaching problem into an incident in the ITIL ITSM system. The incident is then either resolved automatically, without involving people, or redirected by the system to a specialist of the appropriate profile.

An inspiring goal
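The routing decision described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Alert` type, the `AUTO_REMEDIATION` and `OWNERS` tables, the job and team names, and the record fields are hypothetical, and the sketch does not call SPLUNK or ServiceNow.

```python
from dataclasses import dataclass

# Hypothetical: jobs for which an automated fix is known.
AUTO_REMEDIATION = {"nightly_cache_refresh": "restart_job"}

# Hypothetical: which support team owns which application.
OWNERS = {"billing": "Billing Support", "warehouse": "Logistics Support"}


@dataclass
class Alert:
    application: str
    job: str
    observed_seconds: float
    limit_seconds: float


def to_incident(alert):
    """Turn a predicted problem into an incident record (illustrative fields)."""
    action = AUTO_REMEDIATION.get(alert.job)
    return {
        "short_description": f"{alert.job} exceeded its expected duration",
        "description": (
            f"Observed {alert.observed_seconds:.0f} s, "
            f"limit {alert.limit_seconds:.0f} s"
        ),
        # With a known automated fix no team is assigned; otherwise the
        # incident goes to the owning team, or to a default desk.
        "assignment_group": None
        if action
        else OWNERS.get(alert.application, "Service Desk"),
        "auto_action": action,
    }


print(to_incident(Alert("billing", "invoice_export", 95, 65)))
print(to_incident(Alert("warehouse", "nightly_cache_refresh", 400, 120)))
```

In a real integration, the resulting record would be passed to the ITSM system's API instead of being printed.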


The main work now is to bring as many logs as possible from different departments into the system and to calculate the return on automating each scenario.
Despite the difficulties, the project is gaining momentum and its team is growing. After some time we will be able to share good examples of our implementations that have saved us a lot of money.

More and more people want not only to look for their own logs in the system, but also to create complex queries and analyze them. This means that the level of technical expertise will continue to grow.

The day is not far off when we will no longer need to deal with simple scenarios that can be accurately described in machine language. The system will do this for us. Employees will create complex analytical models, which machines will process as they develop.

In the photo: lagutolg limat


When you see Mars IS keeping up with the times, it is very inspiring! We recently attended the SPLUNK conference in Washington, where we shared our goals and solutions with other teams implementing this system.

Our plans are very ambitious. Of course, there will be many difficulties ahead, since we are effectively pioneers in this matter. But for our team this is not just a job, but an amazing goal. And we want to achieve it by creating something new and developing ourselves.

Source: https://habr.com/ru/post/342008/

