
Video: Uptime Day, a conference on monitoring and 24/7 support



In April, Moscow hosted Uptime Day, the first meetup of uptime.community, a community of people who work on monitoring, round-the-clock support, and administration of complex projects. ITSumma is the driving force behind the community and one of its organizers. At the meetup, specialists from Booking, Badoo, Parallels, ITSumma, and Bitrix24 explained how their monitoring and support work.

We are publishing the slides, abstracts, and videos of the talks, along with a few words about the community itself.

Look at the countless conferences held in Russia and you will notice a huge number of events for developers (backend / frontend) and for administrators / DevOps. But if you want to learn how other people monitor their projects, how they organize 24/7 on-call duty, and who responds to incidents and how, the available knowledge turns out to be very fragmented.
And nobody wants to repeat mistakes that others have already made. We decided to try to create a community where people whose job is to make sure their projects never go down (and, if they do go down, come back up quickly) could share how they monitor and support them, and figure out how to do it better by asking each other questions. Maybe the problem one person is wrestling with right now has already been solved by someone else. And, most importantly, it is simply a chance to meet.

To bring everyone together, we organized a meetup of the uptime.community community (that is the name we settled on), which took place on April 7 at Digital October. The talks are below, and at the end we explain how to join the community.

Reinventing the wheel: how we wrote our own monitoring


Evgeny Potapov, General Director, ITSumma

Abstract:

Every web developer has at some point wanted to write their own framework, and every admin has wanted to write their own monitoring. This talk covers the six-year history of developing our own monitoring system: why we built it, how we handle data storage, fault tolerance, and scaling, the bumps we took along the way, and how our system differs from standard ones.

Video:


Slides


Streaming monitoring


Stanislav Osipov

Abstract:

- Advertising platforms; the specifics of R&D and Ops in advertising.
- Three pillars for turning Zabbix into a tool that is easy to read.
- Plate, patch, and reporting: launching Ops managers back into orbit.
- Not the way everyone else does it: streaming the system's health.
- Channels (SMS, Tg, Sl, Ml), streams / groups.
- And now all together: Zabbix, New Relic, Jenkins, and others.

Video:


Slides


How monitoring is usually introduced from scratch


Nikolay Sivko, co-founder of Okmeter

Abstract:

Many Okmeter clients do not know exactly what they need from monitoring. Working with such clients, we have arrived at a more or less general algorithm for covering a project, from hardware errors to business metrics: the right metrics, the right way to handle alerts, and so on.

Video:


Slides


Monitoring when you don't test


Ivan Kruglov, Senior Developer, Booking.com

As many know, at Booking deployments are often made without testing: the cost of a mistake is lower than the cost of slowing down the pace of change. Ivan explained how, under these conditions, they manage to detect errors quickly, monitor what is happening, and manage changes.

Video:


Slides
https://www.slideshare.net/slideshow/embed_code/key/crYlLI4fthc0YK

Effective technical support 24×7: instructions for use


Julia Sinyanskaya, Head of the Parallels technical support team

Abstract:

How Parallels managed to build support for corporate clients, with existing groundwork in place but limited resources. Employee search and recruitment, onboarding and training, shift schedules, performance evaluation.

Video:


Slides


How monitoring works at Badoo


Ilya Ableev, Head of Badoo Monitoring

Abstract:

Imagine a burning chair and a burning table in a burning house. That is roughly what a typical day looks like for the monitoring department or the on-call administrators at any IT company. At Badoo we have learned to cope with the heat, and we share our experience.

1. What is Badoo: architecture and maintenance department features.
2. Why we need an independent monitoring department and what it does.
3. How the department is organized: the number of people / shifts; what people do in their free time, so as not to burn out.
4. Tools: what is used to analyze problems, how not to get lost in the flow of events and not to miss important incidents.

Video:


Slides


How to live in the cloud almost without admins: monitoring and operating hundreds of virtual machines with three people


Alexander Demidov, Director of Cloud Services, Bitrix24

Abstract:

1. Why Bitrix24 and other 1C-Bitrix services live in the cloud, how we administer our entire infrastructure, and how three people cope with hundreds of virtual machines and services. How we communicate with developers and QA, how we deploy, and how we live and grow in general.
2. Monitoring is our everything! A distributed real-time monitoring system (formerly Nagios, now Shinken), analytics, automation, incident handling.
3. Bonus: the most serious pitfalls we have hit in the five years since the launch of Bitrix24, and how we learned to avoid them.

Video:


Slides


The next event is planned for early autumn. In the meantime, join the community: leave your email and we will send you a questionnaire (no spam, we promise).

Source: https://habr.com/ru/post/328024/

