As someone involved in the development of a new system, I am often asked the same question: “How many users does the system pull?” (that is, how many can it handle). A very awkward question, isn't it? My first impulse is always to show off my wit and switch into “harmful admin” mode: to ask a few counter-questions that will spare me, for a while, from having to think about this difficult but interesting topic:
• What is the hardware configuration?
• For how long should it “pull”?
• What is the initial volume of data?
And the finishing shot: what does “pulling” even mean?
But like it or not, the question has to be answered. This post is about one difficult search for that answer.
What do we pull?
Our test subject is the EUFRAT E1 document management system, a completely new system written almost from scratch over the last 2-3 years in the depths of Cognitive Technologies LLC.
The system has a three-tier architecture: a database (MS SQL), an application server based on IIS, and thick and mobile clients. The main development platform is .NET 3.5, and the main language is C#. As often happens, some attention was paid to load testing during the release process, but not enough to understand the capabilities of the resulting architecture. Still, by the time life forced us to answer the main question of this post, some infrastructure for load tests already existed. It was built with the testing tools bundled with MS Visual Studio.
We prepared a test bench consisting of a mid-range server (virtual Windows Server 2008 R2, 6 CPUs, 10 GB of RAM) hosting the database and the application server, and a computer running the tests that simulate client applications. The overall test scenario simulates a working day in a typical large organization. Each test script plays a specific user role in records management: the registrar, the document and route controller, the assignee, the approver, and the user who simply reads mail and views the documents available to them.
The share of each user role in the overall scenario is shown in Figure 1. This distribution was derived from a small survey of customers using the previous version of the system.

Fig. 1. Distribution of scenarios.
After long discussions, we settled on what “pulls” means for us: the server runs without interruption under the scenario model described above for 8 working hours, with no server errors. In addition, no more than 0.5% of client tests may fail (errors in the tests' business logic, unexpected timeouts, and so on are possible). The value to be measured is the maximum number of simultaneously connected users (tests) for which all of these conditions hold.
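The acceptance criteria above can be written down as a small check. This is only an illustrative sketch in Python, not code from the actual Visual Studio test rig; the function name `run_passes` and its parameters are made up for the example, and only the thresholds (8 hours, zero server errors, 0.5% failed tests) come from the text.

```python
# Illustrative sketch only: the names below are hypothetical,
# the thresholds are the ones stated in the post.
REQUIRED_HOURS = 8          # length of the simulated working day
MAX_FAILED_SHARE = 0.005    # at most 0.5% of client tests may fail


def run_passes(hours_completed, server_errors, failed_tests, total_tests):
    """Return True if a load-test run meets all of the 'pulls' criteria."""
    if total_tests <= 0:
        # A run that executed no tests cannot be judged a success.
        return False
    ran_long_enough = hours_completed >= REQUIRED_HOURS
    no_server_errors = server_errors == 0
    few_client_failures = failed_tests / total_tests <= MAX_FAILED_SHARE
    return ran_long_enough and no_server_errors and few_client_failures
```

The number of users the system “pulls” is then the largest number of simultaneous tests for which such a check still returns True.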
How do we pull?
The next problem to solve: how exactly do we find the optimal number of users? Doing many 8-hour runs is long and thankless work. So we decided to change the number of simultaneous tests dynamically in order to arrive at the optimal figure within a single 8-hour run. But depending on what? There are many ways to gauge server load: the request processing queue on the server, the speed of request execution on the client, the number of failed tests, and so on.
Here a small digression is in order, to make the rest of the description easier to follow. The system's application server processes client requests as follows: all requests arrive in a single queue and wait to be processed. There is a configurable pool of worker threads; in our case there were 12, that is, 2 per processor core. The next request in the queue goes to the first worker thread that becomes free.
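The processing model described here, one shared queue drained by a fixed pool of threads, can be sketched roughly as follows. This is a toy Python illustration, not the system's real server code (which is .NET on IIS); `pool_size`, `serve`, and the placeholder “processing” step are assumptions made for the example.

```python
import queue
import threading

WORKERS_PER_CORE = 2  # from the post: 12 threads on a 6-CPU server


def pool_size(cpu_count):
    """Number of threads in the pool for a given number of cores."""
    return WORKERS_PER_CORE * cpu_count


def serve(requests, worker_count):
    """Drain one shared FIFO queue with a fixed pool of threads.

    Whichever thread becomes free first takes the next queued request.
    Returns a list of (worker_id, request) pairs.
    """
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    handled = []
    handled_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                request = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            # Placeholder for real request processing.
            with handled_lock:
                handled.append((worker_id, request))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

For the test bench in the post, `pool_size(6)` gives the 12 threads mentioned above.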
After a series of experiments, we concluded that in our case the best way to determine the number of users is to measure the load on these worker threads. It was calculated as the share of time the threads spent busy, relative to their total busy plus idle time, expressed as a percentage. Thus, 50% means a thread is idle half the time, and 90% means it is busy almost all the time. The control function is quite simple: a load in the range of 70-90% is considered normal. If the thread load was below 70%, a new test was added; if it was above 90%, one of the tests was not started again.
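The control function can be sketched in a few lines. This is a minimal Python illustration, not the actual controller from the test rig: the function names are hypothetical, and the load metric is read as busy time over total (busy plus idle) time, which is what the 50% and 90% examples imply.

```python
# Illustrative sketch: thresholds come from the post, names are hypothetical.
LOW_LOAD = 70.0    # below this, the server can take one more test
HIGH_LOAD = 90.0   # above this, one test is not restarted


def load_percent(busy_seconds, idle_seconds):
    """Thread load as the busy share of total time, in percent."""
    total = busy_seconds + idle_seconds
    if total <= 0:
        return 0.0
    return 100.0 * busy_seconds / total


def next_test_count(current_tests, load):
    """Decide how many simultaneous tests to run in the next step."""
    if load < LOW_LOAD:
        return current_tests + 1          # add a new test
    if load > HIGH_LOAD:
        return max(current_tests - 1, 0)  # let one test finish without restart
    return current_tests                  # 70-90% is considered normal
```

Changing the count by one test at a time keeps the adjustment gentle, at the cost of a slow approach to the working range.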
What is the result? Ideally, we wanted the curve to flatten into a straight line by the end of the test, showing the optimal number of users for this configuration. Unfortunately, it never quite “settled down” (see the graph in Fig. 2).

Fig. 2. The result of the automatic determination of the optimal number of users.
Results after the 8-hour test:
• The size of the database - 9719.50 MB;
• Documents in the database - 4885;
• Orders in the database - 14478.
The graph clearly shows that the deviation from the average value, 270 users, is small. So this is the optimal number of users for this hardware configuration under “intensive” server load. Naturally, real users do not work with the system this intensively, so figures of 1000 or more simultaneous users on more powerful configurations look quite attainable.
I should note right away that the process took quite a long time: during testing we spent a long while tuning the server load control parameters, fixing errors on the server and the client, and so on.
What did we pull out?
If you need to answer how much the system “pulls” or a similar tricky question, then you need to follow a few simple rules:
1. State the task clearly: what do we want to learn about our architecture in a given environment?
2. Formally define what exactly we want to measure and what value that indicator should meet.
3. Pin down every other test parameter that can be fixed: hardware configuration, test scripts, their distribution, etc.
4. While working toward the answer, do not get distracted by other interesting and unpredictable research tasks.
You can treat this set of rules as a starting point if you plan to take on measuring and improving performance but are not sure where to begin.
And to finish, a small hit parade of other “uncomfortable” questions related to load testing:
1. “What server is needed for XXX users?”
2. “How many objects can I put into the system? Won't we run out of space?”
3. “How fast does the network need to be for everything to run quickly?”
4. And my favorite: “If we buy a powerful server, will it stop crashing?”
As you can see, there is plenty of room for future work. I hope the Visual Studio toolkit will be enough for it...