Hi, Habr! For the second year now, we at our bank have been discussing, in theory and in practice, how best to organize our Dev and Ops teams. Recently the discussion, fueled by a fresh round of practical experience, took another turn, which sent me off on yet another search for ideas and arguments.
As a result, I came across a very curious piece that impressed me so much that my desire to share it with the widest possible audience overcame my laziness and lack of time for translation. It even overcame the utter despondency that seizes me every time I try to find an adequate Russian word for the term “silo”. "Silo tower"? German Oskarovich, as I recall, used the word "well"... After consulting with colleagues, I settled on the term "plot".
What did I like about this article? First, the variety of options. There is none of the “do it only this way, and you will be happy” that you so often hear from evangelists and consultants. Instead there are 9 “right” and 7 “wrong” options that you can try on your organization and combine in different ways, coming away with at least a couple of ideas worth trying, if not something immediately workable. Second, these stories are clearly taken from the real world of the “bloody enterprise”: most of them describe DevOps models not for a five-person startup that does everything right from day one, but for a large organization with its own legacy, a complex organizational structure and tangled human relationships. After all, as we know from a recent Habr hit, even the most advanced organizations come to resemble one another once they reach a certain size.
So, here is the article by Matthew Skelton and Manuel Pais, "What Team Structure is Right for DevOps to Flourish?". The first version appeared in 2013; since then the authors have expanded and reworked it considerably.
Anti-types of DevOps topologies
It is always helpful to have an idea of bad practices, which we may call “anti-types” (rather than the omnipresent term “anti-pattern”).
Anti-type A: Dev and Ops Plots
This is the classic split between Dev and Ops on the principle of "throw the software over the wall." In practice, for Dev this means that tasks are declared finished earlier (“done” means “the feature is implemented”, not “it works in production”), but the operability of the software suffers: Dev does not know what really happens in the production environment, and Ops has neither the ability nor the time to involve Dev in solving problems before the new version reaches users.
We have all heard by now that this topology is not the best, but personally I think there are even worse options.
Anti-type B: DevOps Team
As a rule, this anti-type appears when a senior manager decides that “we need some of this DevOps” and orders the creation of a “DevOps team” (perhaps even staffed with “DevOps people”). Such a team very quickly carves out a plot of its own, pushing Dev and Ops even further apart, so that it can protect its own corner, skills and tools from those "clumsy developers" or "stupid admins."
The only situation in which this model makes practical sense is when a separate DevOps team is created for a limited period, at most (for example) 12-18 months, with the original goal of bringing Dev and Ops together and with a clear commitment to disband the DevOps team once that time is up. In that case we get the DevOps model that I call Type 5.
Anti-type C: Dev Does Not Need Ops
This model arises from a combination of naivety and arrogance on the part of developers and their managers, especially when new projects or systems are launched. Assuming that IT Ops is a thing of the past (“we have the cloud everywhere now, right?”), developers often underestimate the complexity and importance of operational skills and activities, and come to believe that they can do without them, or handle them in their "free time."
Anti-type C is likely to turn into Type 3 (Ops as IaaS) or Type 4 (DevOps as an external service) once the released software starts to generate a real operational load and eat into the time allotted for development. Only if the team recognizes the importance of IT Operations as a discipline will it be able to avoid pain and unnecessary operational errors.
Anti-type D: DevOps as a Tools Team
In order to “become DevOps” without hurting current developer productivity (that is, the delivery of functional features), a DevOps team is created inside Dev to provide Dev with a full set of tools: the development pipeline, the configuration management system, the approach to managing environments, and so on. Meanwhile, Ops continues to work in isolation, and Dev continues to “throw software over the wall”.
Although the DevOps team in such a scheme can bring real benefits by introducing tools, the overall effect will be limited. The fundamental problem remains unsolved: Ops is not involved early, and the functions do not collaborate throughout the software life cycle.
Anti-type E: Re-branded system administrators
This anti-type is typical of organizations with low engineering maturity. They want to improve their practices and reduce costs, but they fail to see IT as a key success factor for the business. Since the value of DevOps is (by now) considered self-evident in the industry, they too want to "do DevOps". Alas, instead of solving the real problems in the structure of their teams or the relationships between them, they take the wrong path and hire "DevOps engineers" into the Ops team.
The term “DevOps” is used merely to rebrand the role of system administrator; no real change in culture or process organization takes place. This anti-type is becoming more widespread as indiscriminate recruiters join the hunt for people with automation skills and experience with DevOps tools. In reality, it is people's communication skills that make DevOps work.
Anti-type F: Ops inside Dev
Here the organization does not want a separate Ops team, so the Dev team is made responsible for infrastructure, environment management, monitoring, and so on. With this approach, in project or product teams, operational tasks fall victim to resource constraints or low priority, which in turn leads to poor-quality, half-baked products.
This anti-type demonstrates the underestimation of the importance of the role and skills of effective IT Operations.
Anti-type G: Dev and DBA plots
This is a variety of Anti-type A that is very common in medium and large companies with numerous legacy systems tied to centralized databases. Since these databases are business-critical, a separate DBA team, usually part of the Ops structure, is responsible for administering and configuring them and for recovering them after failures. Such a scheme is logical. But if the DBA team also becomes a gatekeeper that slows down every possible change to the database, it turns into an additional obstacle to regular software releases (and speeding those up is the main goal of DevOps).
Moreover, if the DBA team is not involved in software development at an early stage, all the problems with data migrations, database performance, and so on are discovered only late in the life cycle. Coupled with the heavy load on the administrators, this leads to constant firefighting in production and growing pressure on the DBAs.
Types of DevOps Topologies
Having covered the unsuccessful options, let us look at the models that help DevOps work.
Type 1: Dev and Ops Collaboration
This is the “promised land” of DevOps: smooth collaboration between the Dev and Ops teams, with specialization where it is needed and joint work where it is needed. In this variant there may be several separate Dev teams, each working on a partially independent product.
In my opinion, implementing Type 1 requires significant organizational change and a high level of competence in the management of the IT organization. Dev and Ops must have a clearly articulated and well-understood common goal (for example, “Deliver reliable and frequent software changes”). People from Ops should feel comfortable working side by side with developers on development-specific matters such as test-driven development (TDD) or version control. Dev, in turn, should take operational problems seriously and actively seek input from Ops when developing new solutions. All of this requires significant cultural change compared with the traditional approaches of the past.
Applicability Type 1: organizations with a strong technological component
Potential effectiveness: high
Type 2: Fully Integrated Teams
When people from Ops are fully integrated into the teams that develop products, we get the Type 2 topology. Here the difference between Dev and Ops is so small that all employees are fully focused on a common goal. In principle, this is a variety of Type 1, but with its own characteristics.
Companies like Netflix or Facebook, which offer customers a single, purely digital product, use the Type 2 topology, but I think this approach has limited applicability when an organization offers its customers more than one product. The budget constraints and context switching typical of multi-product organizations may force the distance between Dev and Ops to grow (that is, a move to the Type 1 topology).
Type 2 can also be called "NoOps", because this model has no separate or visible Ops team (although the NoOps model at Netflix also resembles Type 3 (Ops as IaaS)).
Applicability Type 2: Organizations providing one, fully digital product
Potential effectiveness: high
Type 3: Ops as IaaS (infrastructure as a service)
For organizations with a more traditional IT Ops department that cannot or will not change quickly, and for organizations that run all their applications in public clouds such as Amazon EC2, Azure and the like, it may be useful to treat Ops as a team that provides the resilient infrastructure on which applications are deployed and run. In this model the Ops team resembles Amazon EC2, that is, it follows the IaaS (infrastructure as a service) approach.
In this scheme a separate team (possibly a virtual one) works within Dev, acting as a center of expertise on operational features, metrics, monitoring, server provisioning, and so on. It can also handle all communication with the IaaS team. This team is a Dev team by nature, and it uses the standard set of practices such as TDD, CI, iterative development, and more.
The IaaS topology gives up some effectiveness (because collaboration with the Ops team is lost) in exchange for being easier to implement, but it delivers results faster than trying to implement Type 1 directly (which can be the next step).
Applicability Type 3: organizations with several different products and services, with the traditional division of IT Ops; organizations whose applications run fully in the public cloud
Potential effectiveness: medium
Type 4: DevOps as an external service
Some organizations, especially small ones, may lack the financial resources, the experience or the qualified staff to handle the operational side of their software themselves. In this case the Dev team can turn to an external service provider such as Rackspace to help build test environments, automate the infrastructure and set up monitoring, and to advise on the non-functional requirements that need to be addressed during development.
This option may be useful for small organizations or for teams that want to learn modern approaches to automation, monitoring or configuration management in practice. Later, as the organization grows and gains people with a clear focus on operations, it will be able to move towards the Type 3 model or even Type 1.
Applicability Type 4: small teams and organizations with limited experience in IT Operations
Potential effectiveness: medium
Type 5: DevOps team with a limited life
This model looks the same as Anti-type B (DevOps Team), but the tasks and lifespan of the DevOps team are fundamentally different. A temporary DevOps team is created with the mission of bringing Dev and Ops closer together, ideally as far as Type 1 or Type 2, and then disbanding once it is no longer needed.
The members of the temporary team initially work as “translators” between Dev and Ops. On the one hand, they propose crazy ideas such as stand-ups and kanban boards to the Ops teams; on the other, they work through low-level “kitchen” details with the developers, such as load balancer configuration, SSL offloading or the management of network controllers. If enough people start to see the value of bringing Dev and Ops closer, the DevOps team gets a real chance of reaching its goals. Important: long-term responsibility for deploying or operating an application must not be "hung" on the temporary team, otherwise this scheme will quickly turn into Anti-type B.
Applicability Type 5: a precursor to Type 1, but with the risk of turning into Anti-type B.
Potential effectiveness: low to high
Type 6: DevOps Evangelist Team
In organizations with a wide gap between Dev and Ops (or with a tendency for that gap to widen), it may be useful to create a DevOps team that supports communication between Dev and Ops. This is a variant of Type 5 in which the DevOps team exists permanently, but its task is to facilitate communication and collaboration between the Dev and Ops teams. Members of this DevOps team are often called evangelists because they spread knowledge about DevOps practices.
Applicability Type 6: organizations in which Dev and Ops tend to drift apart. The danger is sliding towards Anti-type B.
Potential effectiveness: medium to high
Type 7: SRE Team (Google Model)
DevOps usually recommends that Dev team members take part in on-call duty, but this is not mandatory. In fact, some organizations (including Google) use a different model, with an explicit hand-off of the software from the development team to the team responsible for operating it: SRE (Site Reliability Engineering). In this model the Dev teams must present evidence (logs, metrics, and so on) to the SRE team, clearly demonstrating that the application being handed over is reliable enough to be supported by SRE.
Importantly, the SRE team may refuse to support an application that does not meet SRE standards, asking the developers to rework certain “features” before the application goes into production. The interaction between the Dev and SRE teams revolves around operational metrics, and as soon as the SRE team (the decision is theirs, not the Dev team's) is satisfied with the quality of the application, SRE starts supporting the software.
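The hand-off check described above can be pictured as a simple acceptance gate. The sketch below is only an illustration: the evidence fields, the thresholds and the names HandoffEvidence, SreStandard and review_handoff are all hypothetical and are not taken from the article or from Google's actual process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffEvidence:
    """Evidence a Dev team presents to SRE (all fields are hypothetical)."""
    availability: float      # fraction of successful requests, 0.0 to 1.0
    p99_latency_ms: float    # 99th percentile latency in milliseconds
    has_runbook: bool        # operational documentation exists


@dataclass(frozen=True)
class SreStandard:
    """Illustrative thresholds; real values are set by each SRE team."""
    min_availability: float = 0.999
    max_p99_latency_ms: float = 500.0


def review_handoff(evidence: HandoffEvidence,
                   standard: SreStandard = SreStandard()) -> list[str]:
    """Return the reasons for refusing support; an empty list means accept."""
    problems = []
    if evidence.availability < standard.min_availability:
        problems.append("availability below standard")
    if evidence.p99_latency_ms > standard.max_p99_latency_ms:
        problems.append("p99 latency above standard")
    if not evidence.has_runbook:
        problems.append("runbook missing")
    return problems


ready = HandoffEvidence(availability=0.9995, p99_latency_ms=320.0, has_runbook=True)
not_ready = HandoffEvidence(availability=0.99, p99_latency_ms=320.0, has_runbook=False)

print(review_handoff(ready))      # empty list: SRE takes the application on
print(review_handoff(not_ready))  # reasons to send it back to the Dev team
```

The point of the sketch is that the decision rests with the SRE side: the Dev team only supplies the evidence, and the application is accepted only when the list of problems is empty.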
Applicability Type 7: only organizations with a strong engineering culture and high organizational maturity. Otherwise, there is a risk of ending up with Anti-type A, in which the SRE/Ops teams simply receive the software and follow the installation instructions.
Potential effectiveness: low to high
Type 8: Container-based cooperation
Packaging the application runtime into a container removes much of the practical need for collaboration between Dev and Ops. In this case the container serves as the boundary between the Dev and Ops areas of responsibility. With a high level of engineering maturity this model works well, but if Dev starts to ignore operational issues, it can degenerate into the classic “us versus them” confrontation.
Applicability Type 8: containers can work well, but this option may mutate into Anti-type A if the Ops team is expected to support everything the Dev team throws at it.
Potential effectiveness: medium to high
Type 9: Dev and DBA Collaboration
This option is a response to the situation described in Anti-type G. To close the gap between Dev and DBA, some organizations pair the DBA team with a dedicated Dev team that specializes in evolving the central database. This helps communication between people who are used to seeing the database as a development artifact in which their applications store data and people who see the same database as a vast store of complex business information.
Applicability Type 9: organizations in which multiple applications work with one or more centralized databases.
Potential effectiveness: medium
P.S. This summer we (Raiffeisenbank) agreed to treat Type 1 as our target (the baseline option for everyone) and Type 2 as the more “advanced” option, wherever we manage to achieve it. Time will tell how right this choice was.
P.P.S. What about you? Which DevOps topologies do you have in your organizations?