Recently I ran into an unusual case: a customer's 1C server performed abysmally. To give you an idea of how bad it was, launching the thick client could take ten minutes. Measured with the Gilev test, the result was below the worst reference value. Looking at the nearby measurement results posted by other users, I realized this was not an isolated case.
This article is not about optimization in the sense of squeezing out an extra 10–20% of performance; it is about finding the causes of poor performance and eliminating them. You will agree these are rather different things. The Internet is full of articles about improving performance that limit themselves to tuning the 1C server and/or the database server, but I have not come across articles dealing with cases of genuinely poor performance, especially when there are several causes and those causes sit at different levels.
Administrators usually rush straight to the monitoring results. In the case I encountered, monitoring showed an almost idle processor, plenty of free RAM, and no queue at the network interface; only the disk queue indicated that not everything was in order. I had to arrange a full-scale check, which of course takes a lot of time and requires taking the server out of production, but it does produce results. Perhaps for some such an approach is unacceptable, and some even consider it unprofessional, but I cannot help them with that.
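For those who want to reproduce such a snapshot, the four indicators mentioned above can be read in one call; a minimal PowerShell sketch, assuming a stock English-language Windows Server (counter paths are localized on non-English systems):

```powershell
# One-shot snapshot of the four indicators discussed above.
Get-Counter -Counter @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\Network Interface(*)\Output Queue Length',
    '\PhysicalDisk(_Total)\Current Disk Queue Length'
) | Select-Object -ExpandProperty CounterSamples |
    Select-Object Path, CookedValue
```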
Hardware level
It sounds trite, but it is worth starting with testing the hardware. The point is that from the operating system level you can only guess at hardware problems. In my case, one of the disks in the disk array was not working. Oddly enough, the disk itself turned out to be intact: after being reseated it came back to life, although I had to wait a while until all the data was synchronized (it had been offline for a long time). If the story had ended there, this article would not exist. Just in case, the server also underwent full hardware testing (stress tests, memory tests, physical tests of the disks and controllers), which revealed no further problems.
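Before taking a server out of production, it is at least worth asking the OS what it knows about the disks; a minimal PowerShell sketch, assuming Windows Server 2012 or later with the Storage cmdlets. Keep in mind that behind a hardware RAID controller the OS usually sees only the logical volume, so a dead member disk (as in my case) may not show up here at all and needs the vendor's own utility:

```powershell
# Quick first-pass disk health check from the OS side.
Get-PhysicalDisk |
    Select-Object FriendlyName, MediaType, HealthStatus, OperationalStatus

# SMART-style reliability counters, where the bus and driver expose them.
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal, Wear
```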
Operating system level
The second item on our program was checking and configuring the operating system, which boils down to the following:
- tidy up the file system;
- disable unnecessary services; remove unneeded and, most importantly, malicious programs;
- check that the operating system settings are optimal.
Tidying up the file system means the most obvious operations, which, strangely enough, many administrators consider inapplicable to server operating systems (a sketch of these steps follows the list):
- checking the logical disk structure;
- deleting temporary and unnecessary files;
- defragmenting the file system.
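A minimal PowerShell sketch of these three steps, assuming Windows Server 2012 or later; the drive letter is a placeholder:

```powershell
# 1. Check the logical disk structure (read-only scan; run with
#    -OfflineScanAndFix in a maintenance window if errors are found).
Repair-Volume -DriveLetter C -Scan

# 2. Delete temporary files; -ErrorAction skips files locked by running processes.
Remove-Item -Path "$env:TEMP\*" -Recurse -Force -ErrorAction SilentlyContinue
Remove-Item -Path 'C:\Windows\Temp\*' -Recurse -Force -ErrorAction SilentlyContinue

# 3. Optimize the volume: defragmentation on an HDD, retrim on an SSD
#    (which is exactly why classic defragmentation gives an SSD nothing).
Optimize-Volume -DriveLetter C -Verbose
```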
In fairness, it should be noted that for SSD drives defragmentation gives practically nothing and only increases the number of write cycles. In my case, after putting the file system in order the server "revived" a little, but it was still not enough.
I think there is no need to explain why anti-virus scanning and disabling unused services are necessary, but do not neglect them. Also look around: perhaps programs were installed that are no longer needed on this server (a short inventory sketch follows). And, of course, update the system and the installed software.
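As a starting point for that review, a PowerShell sketch that lists running services and installed programs; the registry paths assume a standard 64-bit Windows installation:

```powershell
# Running services: review this list for anything the server does not need.
Get-Service | Where-Object Status -eq 'Running' |
    Sort-Object DisplayName | Select-Object Name, DisplayName

# Installed programs, read from the uninstall registry keys (faster and safer
# than querying Win32_Product, which can trigger MSI reconfiguration).
$keys = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*',
        'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*'
Get-ItemProperty $keys -ErrorAction SilentlyContinue |
    Where-Object DisplayName |
    Sort-Object DisplayName | Select-Object DisplayName, DisplayVersion
```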
As for the optimality of the operating system settings: in my case, a power-saving plan was active. After switching to the maximum performance plan, the Gilev test showed satisfactory results, but that particular server should have been showing better ones.
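For reference, the power plan can be switched from an elevated prompt as well as from the GUI; the GUID below is the well-known identifier of the built-in High performance plan:

```powershell
powercfg /list                                            # show available power plans
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c  # High performance
```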
To find the causes, resource usage was monitored, although from the very beginning it was clear that we should be looking for the processes loading the disk subsystem. In my case, the most telling indicator was the disk queue length ("Current Disk Queue Length"). Let me remind you that the other indicators were normal; they certainly changed a little compared to the initial measurements, but on the whole the disk queue length remained the key indicator. The monitoring results were unambiguous: the 1C server and database server processes turned out to be the resource "plunderers".
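To confirm which processes actually generate the disk traffic, per-process I/O counters can be sampled; a sketch under the same assumptions as above:

```powershell
# Top 5 processes by I/O throughput, sampled over 5 seconds.
(Get-Counter '\Process(*)\IO Data Bytes/sec' -SampleInterval 5).CounterSamples |
    Where-Object InstanceName -notin '_total', 'idle' |
    Sort-Object CookedValue -Descending |
    Select-Object -First 5 InstanceName,
        @{n='BytesPerSec'; e={[math]::Round($_.CookedValue)}}
```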
Service level
In my case, the 1C server and the MS SQL database server sat on the same machine. The hardware configuration of the server was fully sufficient for them to coexist, but the settings of these two services were far from optimal. Many articles are devoted to these settings, for example, this one; here we will focus only on measures that require no additional investment, such as buying another hard disk.
For MS SQL Server, the auto-growth increment of each database was increased to 500 MB, since 1C databases grow quickly. A daily maintenance plan was also set up: in addition to creating a database dump, it updated statistics, cleared the procedure cache, and defragmented the indexes. In my case, this noticeably reduced the number of write operations. As additional measures, I can recommend weekly defragmentation of the database files and reorganization of the indexes.
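A sketch of these maintenance steps in T-SQL, runnable through Invoke-Sqlcmd from the SqlServer PowerShell module. The database, file, and table names are hypothetical placeholders, and note that DBCC FREEPROCCACHE clears the cache of the whole instance, not of one database:

```powershell
# 'Accounting', its data file, and the table name are placeholders.
$sql = @"
-- Fixed 500 MB auto-growth instead of the default percent-based growth.
ALTER DATABASE [Accounting]
    MODIFY FILE (NAME = N'Accounting_Data', FILEGROWTH = 500MB);

-- Refresh statistics and clear the procedure cache (instance-wide).
USE [Accounting];
EXEC sp_updatestats;
DBCC FREEPROCCACHE;

-- Reorganize fragmented indexes (use REBUILD when fragmentation is heavy).
ALTER INDEX ALL ON [dbo].[_Document123] REORGANIZE;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $sql
```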
For the 1C server, the working-server parameters "Number of infobases per process" and "Number of connections per process" were changed: the first was set to 1, the second to 25. Such advice is more like dancing with a tambourine, but it gives results. In this case, changing these parameters led to a significant reduction in read and write operations on the server, and it began to work as expected. The Gilev test also confirmed the performance gain.
Database level
Having taken measurements both under load and after the users had left, I ran into a strange result: under load the Gilev test showed better results than when idle! I also noticed a huge number of background jobs running against test databases, which the sysadmins used for various test tasks. I asked for them to be removed, and everything fell into place. Whether to keep test databases on a production server is, of course, up to you, but it is better to find another home for them, for example, to use the file variant of 1C.
In one database the transaction log could not be shrunk; in another the indexes could not be rebuilt. For both cases there is one simple and effective solution. Before describing it, a clarification: the same word names different objects, 1C databases and MS SQL databases. The former are not necessarily MS SQL databases; they may be, for example, PostgreSQL databases. The latter, in turn, are not necessarily databases for 1C. It follows that a backup of a 1C database (a dt file) can be restored into a different DBMS, but nothing forbids restoring it from MS SQL back into an identical copy. So: we make backups of the 1C databases with 1C tools, delete the 1C databases from the 1C server, then create them anew and load the contents from the dt file.
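The dump and load steps can be scripted through the designer's batch mode; a sketch assuming the standard /DumpIB and /RestoreIB switches, with hypothetical platform version, server, infobase, credentials, and paths:

```powershell
# Hypothetical names throughout; adjust version, server, infobase, user, paths.
$1cv8 = 'C:\Program Files\1cv8\8.3.18.1334\bin\1cv8.exe'

# 1. Dump the infobase to a dt file using the designer in batch mode.
& $1cv8 DESIGNER /S 'srv1c\accounting' /N 'Admin' /P 'secret' `
    /DumpIB 'D:\backup\accounting.dt' /DisableStartupDialogs

# 2. After deleting and recreating the infobase on the 1C server
#    (done in the cluster console), load the dump back into it.
& $1cv8 DESIGNER /S 'srv1c\accounting' /N 'Admin' /P 'secret' `
    /RestoreIB 'D:\backup\accounting.dt' /DisableStartupDialogs
```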
Having put all the databases in order, I had nothing left to complain about: the server ran smoothly, the disk subsystem worked in normal mode, users were happy with how fast 1C had become, and the administrators were surprised at how quickly updates now ran.
Conclusion
If you look for the causes of poor performance at only one level, you can miss causes lying at the other levels, and the result will not be achieved. The example given here clearly shows that there can be several causes, each at its own level. I hope this material helps someone overcome the problem of a poorly performing 1C server.