Since we are having something of an nginx "week" here (for example, here or here), I will try to make my own contribution, so to speak. It will be about nginx for Windows, namely the more or less official build for this a priori not very well-loved platform.
Why Windows? It's simple: in the corporate sector, Windows on the server, and on the workstations too, is often mandatory. And when the client states this platform requirement in ultimatum form, there is no getting around it.
And since we are stuck with Windows but don't want to suffer with IIS, Apache and the like, then if you want to use your favorite tools, and nginx definitely is one of them, you sometimes have to put up with certain restrictions on this platform. Or rather, used to have to...
Although it should be noted that even with these restrictions, nginx gives a head start to almost any web server under Windows on many counts, including stability, memory consumption and, most importantly, performance.
I hasten to share the good news right away: there are practically no restrictions left that are critical to high performance when using nginx under Windows, and the last of the critical ones will, in all likelihood, soon disappear as well. But first things first...
Here are the known problems of nginx for Windows, namely:
- A worker process can serve no more than 1024 simultaneous connections.
- Cache and other modules that require shared memory support do not work under Windows Vista and later versions due to the fact that address space randomization is enabled on these versions of Windows.
- Although several worker processes can be started, only one of them actually does any work.
I changed the order a bit, because it was in this sequence that I dealt with these restrictions, sorted "historically," so to speak.
1024 simultaneous connections
Actually, this is not true, or rather not quite true: since time immemorial nginx could be built under Windows without this restriction; at the build stage you simply had to define `FD_SETSIZE` equal to the number of connections you need.
For example, by adding the directive `--with-cc-opt=-DFD_SETSIZE=10240` for VS, the nginx worker can manage 10K simultaneous connections, provided you also specify `worker_connections 10240;` in the configuration.
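For example, a build under MSVC could pass the define through configure along these lines (a sketch only; the compiler choice and any other configure options are whatever your build normally uses):

```
# from the nginx source tree, in an MSYS shell with cl.exe on PATH
auto/configure --with-cc=cl --with-cc-opt=-DFD_SETSIZE=10240
```

The define must match (or exceed) the `worker_connections` value you intend to use, since it sizes the `fd_set` structures that the select-based event loop on Windows relies on.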
Cache and other modules that require shared memory support
Until recently, all these functions and modules really did not work under Windows starting with Vista and the x64 editions, where by default the whole system runs with ASLR enabled.
Moreover, disabling ASLR for nginx alone changes nothing, because the functions for working with shared memory are buried deep in the kernel, i.e. ASLR (and probably DEP along with it; for some reason it didn't work with DEP either) would need to be disabled for the entire system.
This actually affects quite a small list of functionality: the cache, any shared zones, and accordingly limit_req, etc. By the way, without shared memory support it would have been much harder to remove the third limitation, i.e. to implement support for multiple workers under Windows.
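For reference, these are the kinds of directives that depend on shared memory zones and were therefore broken; a minimal illustrative fragment (zone names, paths and sizes here are made up):

```nginx
http {
    # page cache: cache keys live in the shared zone "cache_zone"
    proxy_cache_path  temp/cache  keys_zone=cache_zone:10m;

    # request rate limiting: per-client counters live in "req_zone"
    limit_req_zone  $binary_remote_addr  zone=req_zone:10m  rate=10r/s;
}
```

Each `zone=name:size` declaration allocates a region that all worker processes must see at a consistent location, which is exactly what ASLR was breaking.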
I will not bore the reader with how I struggled with this, but together with Maxim (thanks, mdounin) we got it into the release version. Those interested in the details can look under the spoiler or in the source code at hg.nginx.org or github...
A bit of theory... Shared memory as such can coexist with address space randomization; the one does not interfere with the other. It is just that with ASLR on, another process is almost guaranteed to receive a pointer to "the same memory" at a different address. This is not really critical, as long as the contents of that region contain no direct pointers: offsets relative to the shmem base address remain valid, but raw pointers as such do not.
Accordingly, short of rewriting all the functionality in nginx that works with pointers inside shared memory, the only option is to trick Windows into handing out the shmem mapping at an address that is constant across all worker processes. After that, the rest is not very difficult, actually.
The beginning of the discussion about this can be read here. Maxim, by the way, fixed a problem I had missed (remapping), which sometimes arose after restarting the workers (reload on the fly).
Viva open source!
I.e. officially this restriction no longer applies since release 1.9.0 of 28 Apr 2015:

> Feature: shared memory can now be used on Windows versions with address space layout randomization.
Only one worker process really works
In nginx there is a master process and child processes, called workers.
Under Windows, nginx can start several worker processes, i.e. by specifying `worker_processes 4;` in the configuration you make the master start four child workers. The problem is that only one of them, the one that "steals" the listening socket from the master (using SO_REUSEADDR), will actually listen on that socket, i.e. accept incoming connections. For the remaining workers: no incoming connections, no work.
This limitation stems from the technical implementation of Winsock, and the only way to get a listening socket shared by all worker processes on Windows is to clone the socket from the master process, i.e. to use a handle inherited from it.
Those interested in the implementation details can find them under the spoiler or in the source code, so far only on my github.
Read more... Let's start with the fact that even if you start child processes (`CreateProcess`) with `bInheritHandle=TRUE`, and also set `SECURITY_ATTRIBUTES::bInheritHandle` to TRUE when creating the socket, you will most likely fail: using this handle in the worker process yields "failed (10022: An invalid argument was supplied)". And even after "successfully" duplicating the socket with `DuplicateHandle`, the duplicate handle will likewise be rejected by any socket function (typically with error 10038, WSAENOTSOCK).
Why this happens, a small quotation from MSDN - DuplicateHandle:

> Sockets. No error is returned, but the duplicate handle may not be recognized by Winsock at the target process. Also, using DuplicateHandle interferes with internal reference counting on the underlying object. To duplicate a socket handle, use the WSADuplicateSocket function.
The problem is that to duplicate a handle using WSADuplicateSocket, you need to know the pid of the target process in advance, i.e. this cannot be done before the process is started.
As a result, to pass the child process the information that the master obtains from WSADuplicateSocket, which the worker needs in order to create its socket clone, we have two options: either use some kind of IPC, for example as described in MSDN - WSADuplicateSocket, or transmit it through shared memory (which we have already repaired above).
I chose the second option, since I consider it the less laborious of the two and the fastest way to implement connection inheritance.
Below are the changes to the worker startup algorithm under Windows (new steps marked with *):
- The master process creates all listening sockets;
- [cycle] the master process creates a worker process;
- * [win32] the master calls the new `ngx_share_listening_sockets` function: for each listening socket, inheritance information is obtained specifically for this new worker ("cloned" via WSADuplicateSocket for its pid) and stored in shared memory as shinfo (the protocol info structure);
- the master process waits for the worker to signal readiness via the event "worker_nnn";
- * [win32] the worker process calls the new `ngx_get_listening_share_info` function to obtain the shinfo inheritance information, which is used to create a new socket descriptor for the master's shared listening socket;
- * [win32] the worker process creates all listening sockets using the shinfo information from the master;
- the worker sets the event "worker_nnn";
- the master process stops waiting and creates the next worker, repeating the [cycle].
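The handshake above can be sketched in pseudocode. `WSADuplicateSocket` and `WSASocket` (with `FROM_PROTOCOL_INFO`) are the real Winsock calls, and `ngx_share_listening_sockets` / `ngx_get_listening_share_info` are the functions from the patch; the rest of the names (`proto_info`, the per-socket "slot") are illustrative:

```
// master, after CreateProcess(worker) so that worker_pid is known:
for each listening socket ls:
    WSADuplicateSocket(ls.fd, worker_pid, &proto_info)   // clone info is per-pid
    write proto_info into the shared-memory slot for ls  // ngx_share_listening_sockets
wait for event "worker_nnn"

// worker, on startup:
for each listening socket ls:
    proto_info = read the slot for ls from shared memory // ngx_get_listening_share_info
    ls.fd = WSASocket(FROM_PROTOCOL_INFO, FROM_PROTOCOL_INFO,
                      FROM_PROTOCOL_INFO, &proto_info, 0, 0)
set event "worker_nnn"
```

Note why the loop is serialized per worker: the protocol info produced by WSADuplicateSocket is valid only for the one target pid it was requested for, so the master must re-clone the sockets for every new worker before starting the next one.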
If needed, here is the link to the discussion about getting the fix accepted.
As a result, nginx under Windows now starts N "full-fledged" worker processes, in the sense of listening and, most importantly, establishing connections, which handle incoming connections truly in parallel.
This fix is still a "pull request" (I sent the changeset to nginx-devel), but you can already try it, for example by downloading it from my github and building it yourself under Windows. If people want, I will put a binary up somewhere.
I tortured my hardware for a long time, running tests and load "scripts" against it; the result: all workers are loaded more or less evenly and really do work in parallel. I also tried reloading nginx on the fly (reload) and randomly "killing" individual workers to simulate a crash; everything works without the slightest complaint.
So far the only "flaw" observed, IMHO: if you run `netstat /abo | grep LISTEN`, you will see only the master process in the list of "listeners", although in reality it is exactly the one process that never establishes connections; only its child workers do.
By the way, my experience so far suggests that `accept_mutex` should probably be disabled on the Windows platform ("`accept_mutex off;`"), since at least on my test systems the workers ran noticeably slower with `accept_mutex` on than with it off. But I think everyone should verify this experimentally, because it depends on a heap of parameters, such as the number of cores, workers, keep-alive connections, etc., etc.
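Putting the Windows-related tuning from this article into one configuration fragment (the values are illustrative, taken from the examples above, and should be adjusted per workload):

```nginx
worker_processes  4;            # with the patch, all four actually accept connections

events {
    worker_connections  10240;  # requires -DFD_SETSIZE=10240 at build time
    accept_mutex        off;    # was slower when enabled in my tests; measure yourself
}
```

Remember that `worker_connections` above 1024 is only honored if the binary was built with the matching `FD_SETSIZE` define.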
Well, what would this be without the obligatory pretty tables of performance numbers, before (the first column, marked **NF) and after.
The test was run on Windows 7, i5-2400 CPU @ 3.10GHz (4 cores).
Request: static, 452 bytes (+ header), small gif icons.
| Workers x Concur. | 1 x 5 **NF | 2 x 5 | 4 x 5 | 4 x 15 |
| --- | --- | --- | --- | --- |
| Transactions | 5624 hits | 11048 hits | 16319 hits | 16732 hits |
| Availability | 100.00% | 100.00% | 100.00% | 100.00% |
| Elapsed time | 2.97 secs | 2.97 secs | 2.97 secs | 2.96 secs |
| Data transferred | 2.42 MB | 4.76 MB | 7.03 MB | 7.21 MB |
| Response time | 0.00 secs | 0.00 secs | 0.00 secs | 0.00 secs |
| Transaction rate | 1893.60 trans/sec | 3719.87 trans/sec | 5496.46 trans/sec | 5645.07 trans/sec |
| Throughput | 0.82 MB/sec | 1.60 MB/sec | 2.37 MB/sec | 2.43 MB/sec |
| Concurrency | 4.99 | 4.99 | 4.99 | 14.92 |
| Successful transactions | 5624 | 11048 | 16319 | 16732 |
| Failed transactions | 0 | 0 | 0 | 0 |
| Longest transaction | 0.11 | 0.11 | 0.11 | 0.11 |
| Shortest transaction | 0.00 | 0.00 | 0.00 | 0.00 |
And may nginx be with you, even under Windows.