You can monitor just about everything on a server: memory and CPU load, network and disk activity, individual services and the number of requests they receive.
A couple of months ago at work we also started monitoring the response time of our backend URLs. I'll say right away that response time can jump around wildly during the day (sometimes like a bull at a rodeo), because it depends on many factors: whether the query result was already cached or had to be recomputed, the network load at the moment the probe ran, the server load, and so on. These causes are all normal and natural; as long as the response time at peak hours doesn't jump above a certain threshold, there is no problem.
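For reference, such a probe can be as simple as timing an HTTP request and comparing the result against a threshold. The sketch below is only an illustration of the idea; the URL and the one-second threshold are assumptions, not details of the actual setup described here:

```python
# Minimal sketch of a response-time probe: time a GET request to a backend
# URL and flag it if it takes longer than a threshold.
# The URL and the 1000 ms threshold are illustrative assumptions.
import time
import requests

URL = "https://backend.example.com/api/report"  # hypothetical endpoint
THRESHOLD_MS = 1000                             # assumed alert threshold

def probe(url: str) -> float:
    """Return the response time of a single GET request, in milliseconds."""
    start = time.monotonic()
    requests.get(url, timeout=10)
    return (time.monotonic() - start) * 1000

if __name__ == "__main__":
    elapsed = probe(URL)
    status = "SLOW" if elapsed > THRESHOLD_MS else "ok"
    print(f"{URL}: {elapsed:.0f} ms [{status}]")
```

In practice a probe like this would be run periodically by a monitoring system and the values plotted over time, which is what produces the daily graph discussed next.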
On a normal day the backend's graph looks like a comb (time in milliseconds; the higher the value, the worse):
The problems you can spot on such a graph are of a different kind (and this is exactly why monitoring URL response times turned out to be useful). Here are some cases the monitoring caught:
an update was rolled out, but it started working more slowly (a programmer messed something up somewhere);
regular bursts appear (for example, some other service starts hammering this one at a certain time, pulling pages from it and degrading the experience for other users; the timing or frequency of that third-party scanner's requests needs to be planned more carefully);
an external data source that this service pulls results from has gone down, and now its graph stands out abnormally among the others (the problem isn't ours, but we need to deal with the external source and inform its administrator);
periodic response delays of more than a second indicate that something is wrong somewhere, and you need to dig into the service and find out exactly where the bottleneck has formed;