
5 basic anti-patterns of system administration

Introduction


This article is aimed more at beginners than at those already wise from experience, but its goal is to raise the professional culture of system administrators.
Because of the nature of my work, I keep inheriting the most varied kinds of cloud hell that have to be raked out, optimized, revived, and made transparent and tidy. These notes are, if you like, an illustration of the things that are simply unacceptable in system administration.
You could spend a long time unpicking the reasons that give rise to these anti-patterns: deadlines, the laws and pace of business, and, finally, plain idiocy. But the purpose of the article is different: I would like to start a constructive discussion, and its results are the real point of this piece.

Meet the anti-patterns:


1. Manual management / configuration of systems by administrators.


What is it?
This is perhaps the most frequent and the most dangerous anti-pattern, especially when it is reinforced by the others. The essence of the problem fits into three words: to err is human. And if, by a well-known law, trouble can happen, it will. Every now and then, cases like that commit do happen. The degree of idiocy in that particular situation is, of course, off the scale, and you personally could surely never screw up so stupidly, but isn't it easier to take preventive measures?
What can you do about it?
The simplest thing: stop logging into servers over ssh yourself. At all! Master a configuration management system: Opscode Chef, Puppet, or CFEngine, for example. The basic information available, including in Russian (and on Habr too), is more than enough to understand one of them and start using it successfully.
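For illustration, here is a minimal sketch of what such a description of state might look like as a Chef recipe (the cookbook layout and file names are hypothetical; Puppet or CFEngine would express the same idea in their own syntax). Instead of ssh-ing in and editing files by hand, you declare the desired state once and chef-client converges every node to it:

    # recipes/default.rb in a hypothetical "nginx" cookbook
    package 'nginx'                       # install the distribution package

    template '/etc/nginx/nginx.conf' do   # render the config from a template shipped with the cookbook
      source 'nginx.conf.erb'
      owner  'root'
      group  'root'
      mode   '0644'
      notifies :reload, 'service[nginx]'  # pick up config changes without a full restart
    end

    service 'nginx' do
      action [:enable, :start]            # start on boot and keep it running
    end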

2. Third-party components that interfere with system updates.


What is it and what to do with it?
I'm almost certain that every system administrator who ran into Ruby before discovering rvm/rbenv has, at least once in his life, built it from source right on the server and used it in production. Now the question: you urgently need to update Ruby on 16 front-end servers, because a patch has just been released covering a critical vulnerability that allows remote root access (the example is pulled out of thin air, of course, but anything can happen in life). Will you go to each server and recompile by hand? Or will you build a new package on a test machine and update all the servers centrally, using the tools from the first anti-pattern? The answer, I think, is obvious.
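As a sketch of that second path (all package names and version numbers below are made up): once the patched Ruby has been packaged on a build machine and pushed to your own repository, a few lines in a recipe are enough for every front-end to pick up the new version on its next chef-client run:

    # hypothetical recipe rolling out a Ruby that was packaged once on a build box
    package 'our-ruby' do
      version '1.9.3-p0-1'   # the freshly built package with the security fix (made-up version)
      action  :upgrade       # upgrade in place on all 16 front-ends, no manual compiles
    end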

3. Lack of standardization.


What is it and what to do with it?
Oddly enough, this anti-pattern is either a cause or a consequence of the first two. Picture a zoo of 16 front-end servers running different versions of Debian, CentOS and Gentoo, with non-standard repositories of dubious origin plugged in. Pictured it? Crossed yourself? Good.
Fortunately, fighting this is not difficult at all. Write guidelines and follow them. What could be simpler?
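One way to make such a guideline enforceable rather than just a document is to codify it, for example as a Chef role (the recipe names below are hypothetical): one file that says what a front-end server is, so new machines cannot drift back into a zoo.

    # roles/frontend.rb -- the codified guideline for every front-end node
    name        'frontend'
    description 'Standard front-end: one distribution, one set of repositories, one run list'
    run_list(
      'recipe[base]',    # common packages, users and repositories
      'recipe[nginx]',
      'recipe[monit]'
    )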

4. Lack of monitoring and notifications.


What is it and what to do with it?
Strangely enough, it sometimes seems that a little less than 50% of companies suffer from this. If you don't have at least Nagios and Monit collecting all kinds of metrics and gladly mailing your Operations Team when something extraordinary happens, you are guaranteed to spend a nervous 24 hours straight in the office. Or maybe all 48.
Fighting this anti-pattern is very simple, and your choice of tools is limited only by your imagination and your religious convictions. If you like, host Nagios or Zabbix + Cacti yourself; if you prefer, use SaaS solutions such as Circonus and/or New Relic. (Yes, we use Circonus. No, this is not an advertisement.)
There is also a wonderful tool called PagerDuty: route all your e-mail alerts through it, and it will happily send SMS messages to your Ops, help you set up an on-call schedule, and is generally very cool and flexible.
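And, tying back to the first anti-pattern, even the monitoring agent should not be set up by hand. A hedged sketch in Chef (the paths and the template are hypothetical; the template would carry the usual Monit "check process ..." stanzas and the alert address):

    # hypothetical recipe that puts Monit on every node
    package 'monit'

    template '/etc/monit/conf.d/checks.conf' do
      source 'monit-checks.conf.erb'       # made-up template with process checks and the alert e-mail
      notifies :restart, 'service[monit]'
    end

    service 'monit' do
      action [:enable, :start]
    end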

5. Lack of file change tracking.


What is it and what to do with it?
Yesterday I edited a configuration file. And another one. Then the next person on shift came along and tweaked something else there. Today the two of us were summoned to the office at 3 a.m., and now we are angry and ready to kill each other, right? This is familiar to many, no doubt. But Linus created Git back in 2005, and before Git there were other VCSs. Running git commit after you have edited something takes a second, yet it is exactly this good habit that will save you from configuration-rollback problems when surprises happen. Combined with configuration management systems, it becomes practically the main and most important skill in everyday work. Develop the habit of using version control and never break it.

Conclusion?


The best result this article could have is that everyone takes a look at their own production, fixes the things they never get around to for months, overcomes the laziness and does it properly once and for all.
And, as I wrote above, it would be nice to continue this list in the comments.

Useful links


1. devopsweekly.com - an English-language but very interesting weekly newsletter
2. agilesysadmin.net - an excellent English-language blog by a very experienced system administrator and one of the evangelists of the DevOps movement
3. Chef Cookbooks - a collection of Chef recipes created by the community
4. Test-Driven Infrastructure with Chef - a book about how to write and test your own Chef recipes

Source: https://habr.com/ru/post/136323/

