We at Skyeng are gradually building our library of important and useful books. It all started when the company's founders shared their lists on Facebook (links below), and now the heads of our departments have joined them. In March, Nadezhda Ryabtsova, who is responsible for our IT infrastructure, presented her top picks of professional literature. I asked her to say a little more about each book. I hope this list, supplemented by four weekly newsletters, will be useful to Habr readers.
First, the promised links. Georgy Soloviev shares a list of important books for entrepreneurs, and Khariton Matveyev shares one for product managers.
Nadezhda Ryabtsova, aka ladamalina, has been Skyeng's IT infrastructure manager since 2016. She came to us from Delivery Club, which was a small startup at the time. Back then only 15 developers worked for us, and now her department of six remote engineers serves 12 development teams.
Over to her.
You can't force our SRE engineers to read everything, so I chose the five most essential books. The main thing to realize is that, to support the company's rapid growth, we have to introduce practices and build processes in the operations department that were simply not needed three or six months ago.
Practical Monitoring: Effective Strategies for the Real World
A must-read for growing startups, no matter how large the infrastructure is. It explains the philosophy of monitoring a company's services and how to build each component. Most system administrators set up Zabbix to collect a minimal set of metrics and alert on default thresholds. At Skyeng this approach does not work: for each of our more than 50 projects, we must be able to identify problems at several levels, including application performance, hardware status, and trends and anomalies in business metrics. Analysts, developers, and DevOps engineers take care of the metrics in each of our products.
Site Reliability Engineering: How Google Runs Production Systems
If I am not mistaken, this is the first book in which the principles of SRE were well systematized and the role of the reliability engineer was described. Using practical examples, it explains very accessibly how Google built its incident management, monitoring, and alerting processes in distributed systems, and how to identify routine tasks that degrade team performance. The approaches are explained in a way that makes them easy to project onto your own company, even one incomparably smaller than Google. At Skyeng the infrastructure is served by only six engineers, and that is enough if you properly adapt the experience of leading large companies.
The Art of Capacity Planning: Scaling Web Resources in the Cloud
The book teaches you to plan infrastructure expansion for growing projects in advance. If we end up short on capacity, we planned badly. If we end up with four times more than we need, we wasted a lot of money. You have to be able to make preliminary estimates a year or more ahead, and gazing into a crystal ball will not help. It seems to me that this was harder to do about 8 years ago: cloud providers already existed, but they did not offer as many services as they do now.
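As a rough illustration of the kind of estimate described above, here is a minimal sketch that fits a straight line to past monthly peak load and converts the projection into a server count with some headroom. The function names, the sample numbers, the 30% headroom, and the linear-growth assumption are all hypothetical; this is not the book's method or anything Skyeng actually runs.

```python
from math import ceil


def fit_linear_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_peak_load(monthly_peaks, months_ahead):
    """Extrapolate the fitted trend months_ahead months past the last observation."""
    slope, intercept = fit_linear_trend(monthly_peaks)
    future_x = len(monthly_peaks) - 1 + months_ahead
    return slope * future_x + intercept


def servers_needed(peak_load, per_server_capacity, headroom=0.3):
    """Servers required to carry peak_load plus a safety margin (assumed 30%)."""
    return ceil(peak_load * (1 + headroom) / per_server_capacity)


# Hypothetical history: peak requests per second for the last six months.
monthly_peaks = [100, 120, 140, 160, 180, 200]
projected = forecast_peak_load(monthly_peaks, months_ahead=12)
print(round(projected), servers_needed(projected, per_server_capacity=50))
```

Real traffic rarely grows in a straight line, so in practice the trend model and the headroom would be revisited as new measurements arrive.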
The Phoenix Project: A Novel About How DevOps Is Changing Business for the Better
This is the only book on this shortlist that is available in Russian, and it is a pity that so little gets translated. It is written in a popular style and helps you take a fresh look at delivery processes in development, identify bottlenecks, see the volume of routine tasks, and protect planned work from the pile-up of unplanned "fires". I would say this book gives managers the most to reflect on, but I also recommend it to engineers as an easy read.
The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations
Read it right after The Phoenix Project. It is by the same authors and continues and develops their ideas for improving development processes. Again, I recommend it to managers first. A Russian edition is coming soon, and I am very much looking forward to it.
There is also a list of weekly newsletters that cannot be added to the library, but that I recommend to all engineers:
SRE Weekly
Monitoring Weekly
O'Reilly Systems Engineering and Operations Newsletter
Docker Weekly
And, as usual, a reminder that we have a lot of interesting vacancies. There are none in the IT infrastructure department right now (the positions there were recently filled), but there is enough work for everyone!