
When Chef and Puppet are not the solution. Part 1


Over the past five years, I have seen many articles on "successful" recipes for building deployment and configuration management systems around Chef / Puppet / Vagrant / Ansible. I spent about seven years solving automated-deployment problems at the company where I worked at the time, and I now think I have enough experience to criticize many common tools.

I cannot disclose many details because of an NDA that has not yet expired, although I would very much like to describe my approach in detail. In this article I want to outline the general principles and ideas and get constructive criticism in the comments. The examples below, of course, do not refer to any particular company and are purely illustrative.

The article was meant to be fairly specific, but it suddenly grew rather large, so I decided to split it into several parts for easier reading, and also to collect comments on each part separately.
I would especially like to address readers who have 5-10 servers in production, all fitting the "web server, database, a pair of application servers" model. Since this topic can easily turn into a flame war, please keep my specific context in mind. I agree that for small companies a solution based on Chef / Puppet / Vagrant / Ansible may well work. It can. I am not trying to say these tools are bad in general. I am trying to warn against overusing fashionable solutions, and to encourage you to think about how much they can really help, and whether there are other options, before adopting them.

My main idea, which I will explain below: for a large company there comes a moment when developing your own deployment solution turns out to be more profitable and convenient than using ready-made tools, or even than adapting them to your processes.

If your company is large enough but still cannot afford to dedicate one or two (better yet, five or six) developers to building its own solution to this problem, it is better to take a ready-made one. Or you can try to convince management of the harm (yes, harm!) of third-party solutions. I was once not convincing enough, and in my opinion we lost three years because of it.

I notice that most presentations and video lectures on introducing automated deployment begin with something like "let's see how easy it is to deploy a PHP file to localhost." Suddenly they end right after that one successfully deployed file; at best they also show how easy it is to install Apache and MySQL on two servers with roles assigned to them. At this point the impressed viewer decides that it really is that simple, and that we should all use this cool shmuppet in our production right now.

But it would be worth thinking about ...


Unfortunately, something similar happened to us, and I could not prevent the attempt to adopt Chef, even though we, the sensible people, had plenty of pointed questions for those pushing the rollout. By joining the effort in time, I was able to soften the damage somewhat and later get rid of Chef. True, by then I had stopped paying attention, and Puppet made its way in.

Environment: a "brief" introduction


Imagine a company that provides SaaS services. It has been on the market for 5-10 years and already has a good, selling service with happy users. At some point their happiness overflows and attracts many more users and partners. In other words, the company begins to grow rapidly.

Suppose there used to be 20 servers in production, and they worked; editing 5 configs on 6 servers was considered quite normal, as was 3-4 hours of weekend downtime to update servers. The servers are varied: about 80% run Windows Server with ASP on IIS5 or self-written services. We do not just host a site; we have telephony and messaging, perhaps fax support or integrations with Skype and other services. And do not forget billing. Everything, of course, is tied to the database. It seems to be coping, for now. Developers code on their own machines, QA has two old desktops in the corner, and all services are tested one by one. Everything is calm: 2 major releases per year, bugfixes and the occasional minor feature. To test a couple of ideas, developers do not hesitate to RDP/SSH into one of the production servers and tweak a couple of files to collect logs or attach a remote debugger. Business as usual. Everyone knows the whole structure of the services and how they communicate.

There are few servers, so why not put the billing server's address right in your server's config? After all, it is the only one and has not changed in ages. And do not ask that gloomy guy in charge of the database to add a small table of parameters: again he will cry that we keep all our junk in his precious database and change the schema all the time. We will add it now and remove it later, after refactoring. On the billing server we add the external payment gateway's address to the config too: it is always the same, and hardcoding is bad, and nobody will let us push a release just to change constants again.

Suddenly, to support the growth, production must be brought up to 100 servers, because after a successful advertising campaign the number of users has grown nicely. The CEO arrived, showed a beautiful presentation, and hinted that this was just the beginning. He told us we have two new partners, the largest in our industry; we must not lose face, and besides, he has already promised that we can deliver this feature and that one. The demo is in a month, but we can build these features from scratch by then, right?

If it has to be done, it has to be done: the prospects are good, and the promised bonuses are good too. One-two, and the features are quickly built. Two new services appear, so as not to touch the old ones (we will merge them later if necessary). These services, unfortunately, are very tightly coupled to all the others: they need billing, telephony, and the web. But here common sense prevailed, and we decided to store the addresses of all servers in the database, since there are already too many of them. Now every service fetches the list of servers from the database at startup and works with it from then on. Convenient. The old components, admittedly, still keep some things in configs, but we will clean that up later.
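The "fetch the server list at startup" pattern can be sketched in a few lines. This is a minimal illustration, not the company's actual code: the `servers` table, its column names, and the roles are all assumptions for the example, and SQLite stands in for whatever database the services really used.

```python
import sqlite3

def load_server_registry(conn):
    """Read all known servers once at startup and keep them in memory,
    grouped by role, the way each service in the story does."""
    registry = {}
    for role, address in conn.execute("SELECT role, address FROM servers"):
        registry.setdefault(role, []).append(address)
    return registry

# Demo with an in-memory database standing in for the shared DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (role TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO servers (role, address) VALUES (?, ?)",
    [
        ("billing", "10.0.0.5:8080"),
        ("telephony", "10.0.0.7:5060"),
        ("web", "10.0.0.9:80"),
        ("web", "10.0.0.10:80"),
    ],
)

registry = load_server_registry(conn)
print(registry["billing"])  # ['10.0.0.5:8080']
```

Note the weak spot that the story itself hints at: the registry is read once at startup, and the old components still read their own configs, so there are now two sources of truth for the same addresses.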

The CEO and CTO came by with a bunch of other managers, very satisfied. The partners liked the demo and gave us money. A lot of it. Here is a stack, buy as many servers as you need; and what was that you said about refactoring? So be it, go hammer away. Is Java in fashion now? Do all the successful shops write everything in Java? Well, we are successful now, so let's do Java! And this time we will do everything right: split it into several services that communicate through an API.

[fast-forward the first year and a half of refactoring and re-refactoring]

Now we have 200 production servers and about 50 test servers. There are about 20 different internal components; many look alike, but they do different things. All of them are homegrown.

All 10 system administrators are busy, very nice guys. Releases every 2-3 months, tons of plans. There is money. Users keep coming. Developers and testers are being hired in a steady stream, and the PM practically lives in interviews.

But there are problems.



To be continued…

This is the best place to take a breath and start a discussion in the comments. I think many of you have recognized something from your own companies.

In the following parts, I will describe the first attempts at automation, the emergence of Chef and, in conclusion, my view of the architecture of an ideal service that can solve most of the problems for the company.

Source: https://habr.com/ru/post/263595/

