Why don't engineers care about monitoring applications?

Happy Friday, everyone! Friends, today we continue the series of publications devoted to the course “DevOps practices and tools”, since classes for the new group of the course start at the end of next week. So, let's begin!

Monitoring is easy. This is a known fact. Bring up Nagios, run NRPE on the remote system, point Nagios at the NRPE TCP port 5666 and you have monitoring.
It is so easy that it is not interesting. You now have the basic CPU, disk and RAM metrics that Nagios and NRPE provide by default. But this is not really “monitoring” as such. It is just the beginning.
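To show how little is behind such a default check, here is a minimal sketch of a disk check in the style of a Nagios plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual plugin convention; the function names and the 80/90 percent thresholds are assumptions made up for illustration, not something shipped with Nagios or NRPE.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, status_line) pair.

    The default thresholds are illustrative assumptions.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure usage of the filesystem holding `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    used_percent = 100.0 * usage.used / usage.total
    return evaluate_disk(used_percent, warn, crit)
```

A wrapper script would print the status line and exit with the returned code. Note that the check knows nothing about what the server is actually for, which is exactly the limitation discussed here.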

(Usually people then install PNP4Nagios, RRDtool and Thruk, set up notifications in Slack and head straight to Nagios Exchange, but let's skip that for now.)

Good monitoring is actually quite complex: you really need to know the internals of the application you are monitoring.

Is monitoring difficult?

Any server, be it Linux or Windows, by definition serves some purpose. Apache, Samba, Tomcat, file storage, LDAP: each of these services is more or less unique in one or several respects. Each has its own function and its own characteristics. Each has its own ways of exposing the metrics and KPIs (key performance indicators) that interest you when the server is under load.

Photo by Luke Chesser on Unsplash

(I wish my dashboards were painted in neon blue... sighs dreamily... hmm...)

Any software that provides a service should have a mechanism for collecting metrics. Apache has the mod_status module, which serves the server-status page. Nginx has stub_status. Tomcat has JMX or dedicated web applications that show key metrics. MySQL has the “SHOW GLOBAL STATUS” command, and so on.
So why don't developers build such mechanisms into the applications they create?
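As an illustration of how approachable these built-in mechanisms are, here is a small sketch that parses the plain-text page produced by Nginx's stub_status. The sample text follows the format shown in the Nginx documentation; the function name and the derived "dropped" field are my own assumptions, added only for the example.

```python
import re

# Sample output in the format documented for the stub_status module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Turn stub_status plain text into a dict of integer metrics."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)",
        text)
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")

    metrics = {"active": int(active.group(1))}
    metrics["accepts"], metrics["handled"], metrics["requests"] = (
        int(value) for value in totals.groups())
    metrics["reading"], metrics["writing"], metrics["waiting"] = (
        int(value) for value in states.groups())
    # Accepted but unhandled connections usually point at a resource limit.
    metrics["dropped"] = metrics["accepts"] - metrics["handled"]
    return metrics
```

In practice the text would be fetched over HTTP from wherever the status page is configured; that location is site-specific, so it is left out here.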

Is it only developers who do this?

A certain indifference to built-in metrics is not limited to developers. I have worked at companies that developed applications on Tomcat and exposed no metrics of their own and no service activity logs, apart from the general Tomcat error logs. Some developers generate an abundance of logs that mean nothing to the system administrator unlucky enough to be reading them at 3:15 in the morning.
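For contrast, here is a rough sketch of what an application exposing metrics of its own could look like at its very simplest. Everything in it (the Metrics class, the handle_order function, the counter names) is hypothetical and invented for this example; a real service would more likely use an established metrics library and publish the values through its status endpoint or JMX equivalent.

```python
import threading
import time


class Metrics:
    """A tiny thread-safe registry of named counters (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}
        self._started = time.monotonic()

    def incr(self, name, amount=1):
        """Add `amount` to the counter called `name`, creating it if needed."""
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + amount

    def snapshot(self):
        """Return a copy of all counters plus process uptime in seconds."""
        with self._lock:
            data = dict(self._counters)
        data["uptime_seconds"] = round(time.monotonic() - self._started, 3)
        return data

    def render(self):
        """Render one 'name value' pair per line, sorted by name."""
        return "\n".join(
            f"{name} {value}" for name, value in sorted(self.snapshot().items()))


metrics = Metrics()


def handle_order(order):
    """Hypothetical business function instrumented with counters."""
    metrics.incr("orders_received")
    if not order.get("items"):
        metrics.incr("orders_rejected")
        return False
    metrics.incr("orders_accepted")
    return True
```

Three counters like these already tell an on-call engineer more about the service's activity than a general error log does.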

Photo by Tim Gouw on Unsplash

System engineers who allow such products to be released should also bear some responsibility for the situation. Few system engineers have the time or the inclination to extract meaningful metrics from logs when they lack the context for those metrics and cannot interpret them in the light of application activity. Some do not see how they could benefit from metrics beyond indicators like “something is wrong now (or soon will be)”.

The change in thinking about the need for metrics has to happen not only among developers, but also among system engineers.

For any system engineer who needs not only to respond to critical events but also to prevent them, the lack of metrics is usually an obstacle.

However, system engineers usually do not dig into the code that earns money for their company. They need lead developers who understand the system engineer's responsibility for identifying problems, raising awareness of performance issues and the like.

This devops thing

The devops mentality describes the synergy of development (dev) and operations (ops). Any company stating that it “does devops” should:

probably admit that it does not (cue the meme from the film The Princess Bride: “I do not think it means what you think it means!”)
encourage an attitude of continuous product improvement.

You cannot improve a product, or know that it has been improved, if you do not know how it is currently performing. You cannot know how the product performs if you do not understand how its components work, which services it depends on, and where its main pain points and bottlenecks are.
If you do not watch for potential bottlenecks, you will not be able to follow the “Five Whys” technique when writing a postmortem. You will not be able to put everything on one screen to see how the product is doing, or to learn what “normal and happy” looks like.

Shift to the left, LEFT, I SAID, LEFT

For me, one of the key principles of devops is “shift left”. Shifting left in this context means moving the ability (not the responsibility, only the ability) to do things that system engineers usually take care of, such as creating performance metrics or using logs more effectively, to the left in the Software Delivery Life Cycle.

Photo by NESA by Makers on Unsplash

Software developers should know and be able to use the tools the company uses for monitoring in all its forms (metrics, logging, monitoring interfaces) and, most importantly, to observe how their product behaves in production. You cannot get developers to invest time and energy in monitoring until they can see the metrics and influence how they look, how the product owner presents them to the CTO at the next briefing, and so on.

In short

Lead the horse to water. Show developers how many problems they can spare themselves, and help them identify the right KPIs and metrics for their applications, so that there is less shouting from the product owner, who in turn is being yelled at by the CTO. Bring them into the light, gently and calmly. If that does not work, then bribe, threaten and cajole either them or the product owner into getting these metrics out of the applications as soon as possible, and then graph them. It will be difficult, because it will not be seen as a priority and the product roadmap will be full of revenue-generating projects waiting their turn. So you will need a business case to justify the time and money spent on building monitoring into the product.
Help system engineers sleep. Show them that applying a release readiness checklist to every product release is a good thing. Checking that every application in production is covered by metrics will help them sleep soundly at night, because developers will be able to see what is going wrong and where. However, a sure way to annoy and frustrate any developer, product owner and CTO is to obstruct and resist. That behavior will affect the release date of any product if you once again wait until the last minute, so, again, shift left and get these questions into the project plan as early as possible. If necessary, work your way into product meetings. Wear a fake mustache and a felt hat or something like that, it never fails. Voice your concerns, show the obvious benefits and evangelize.
Make sure that both development (dev) and operations (ops) understand what it means, and what follows, when the product goes into the “red zone”. Do not leave operations as the sole guardian of the product's health; make sure the developers are involved as well (#productsquads).
Logs are great, and so are metrics. Combine them, and do not let your logs turn into garbage in a huge glowing ball of uselessness. Explain and show developers why nobody but them can make sense of their logs, and show them what it is like to read logs at 3:15 in the morning.
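One modest way to combine logs and metrics, sketched here with Python's standard logging module, is to count log records per level while still writing readable lines with explicit key=value context. The LevelCounter class, the "payments" logger name and the message fields are all hypothetical and chosen only for illustration.

```python
import collections
import logging


class LevelCounter(logging.Handler):
    """Count emitted log records per level name (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = collections.Counter()

    def emit(self, record):
        # The logging framework holds the handler lock while emit() runs.
        self.counts[record.levelname] += 1


logger = logging.getLogger("payments")  # illustrative logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output self-contained

counter = LevelCounter()
logger.addHandler(counter)

stream = logging.StreamHandler()
stream.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
logger.addHandler(stream)

# Context goes into the message as key=value pairs that anyone can read.
logger.info("charge accepted order_id=%s amount=%s", 1001, "19.99")
logger.error("charge failed order_id=%s reason=%s", 1002, "gateway timeout")
```

The per-level counts can then be graphed as an error-rate metric, while the log lines stay legible to whoever ends up reading them in the middle of the night.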

Photo by Marko Horvat on Unsplash

That's all. New material will be released next week. If you want to learn more about the course, we invite you to the open day, which will be held on Monday. And, as usual, we welcome your comments.

Source: https://habr.com/ru/post/453278/
