
Uptime day 2: Russian IT companies will talk about how to cope with disasters

In three weeks, on Friday the 13th, the second Uptime community conference will take place at the Deworkacy coworking space in Moscow. Its topic is accidents in IT infrastructure. There are only 300 seats and participation is free; the registration link is under the cut.


A bit of history


The idea for the name of the conference (and the community) came to us at the same time as it came to the folks from Code & Supply in Pittsburgh. Their domain, uptime.events, was registered on March 28, 2017; ours, uptime.community, on March 14. Our first conference was held in April; watch the videos.

In August, a similar conference was held in Pittsburgh. I volunteered there, helping the sound engineer, and even spoke a little.

What will happen at Uptime day 2


So, on October 13 in Moscow we will discuss the IT disasters that employees of Badoo, Carprice, Revizium, ITSumma, and Bitrix24 have lived through.

My talk is “Incident management and accident life cycle research”. The flip side of technological progress in the 20th century was a large number of man-made disasters. Operating high-load projects is a technological process of the same kind as those that run daily in aviation, medicine, and heavy industry. Those industries have spent decades investigating major incidents and analyzing the causes of accidents in detail in order to avoid them in the future. In our field, however, there are still no common practices that prevent known mistakes from being repeated. Each company approaches the issue in its own way, often unaware that it is stepping on the same rake its colleagues have already stepped on hundreds of times.

We support the websites of 350 clients around the clock and face an average of ten serious accidents per day, about half of them caused by human error. It is important for us to teach specialists on both sides how to avoid such accidents.

Using real accidents as examples, I will show the techniques and technologies that ITSumma uses to resolve incidents that have already occurred and, more importantly, to prevent them in the future.

We will look at the following processes:

1. Recording how team members interact with each other while an accident is being resolved.
2. Writing and analyzing accident post-mortems.
3. Developing recommendations and regulations for us and for clients.
4. Developing incident management software.
5. Embedding the results of the analysis in daily development and support procedures.

Friday the 13th is a great day to talk about disasters. Participation is free; register.

Source: https://habr.com/ru/post/338432/

