
GridFS performance

There are not many articles about GridFS performance on the Internet. One of them, Serving files out of GridFS, shows that serving files from GridFS is 6 times slower than serving them from disk.
That article has a flaw, though: the test requests a single file, which ends up cached by nginx or by the file system, and that widens the gap with GridFS. It is also worth checking a recent version of GridFS, since 3 years have passed.
So I decided to run my own test, requesting different file names.

The data set is 52 thousand files (movie posters) totaling 2 GB; the average image is 40 KB. The files were copied both to ext4 and to GridFS.
The server is a virtual machine with 512 MB of RAM and 1 core, running Ubuntu Server 12.04 LTS 64-bit and nginx 1.4.1 with default settings.
The test targets a low-cost server; on powerful servers the results will differ.

Ways of serving files:
1) Nginx - static files
2) Gevent behind nginx
3) 2 x Gevent behind nginx (load balancing)
4) Gevent directly
5) Gevent behind nginx (unix socket)
For items 2-5, an HTTP server written in Python + Gevent served the files from GridFS.
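
The Python + Gevent server used for items 2-5 is not shown in the post, so the following is only a minimal sketch of what such a server might look like. The names `make_app`, `serve_from_gridfs` and `fetch`, the database name `posters`, the port, and the `image/jpeg` content type are all assumptions made for illustration.

```python
# Sketch only, not the author's server.
# `fetch` is a hypothetical callable: file_id (str) -> bytes, or None if missing.


def make_app(fetch):
    """Build a WSGI app that returns the file whose ID is the URL path."""
    def app(environ, start_response):
        file_id = environ.get("PATH_INFO", "").strip("/")
        body = fetch(file_id) if file_id else None
        if body is None:
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", "9"),
            ])
            return [b"not found"]
        start_response("200 OK", [
            ("Content-Type", "image/jpeg"),  # assumption: posters are JPEGs
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


def serve_from_gridfs(host="127.0.0.1", port=8000, db_name="posters"):
    """Serve GridFS files by ID with gevent (needs gevent and pymongo)."""
    # Third-party imports stay local so make_app() works without them.
    # In a real script, gevent.monkey.patch_all() would normally be called
    # at the very top, before other imports, so pymongo sockets cooperate.
    from bson import ObjectId
    from bson.errors import InvalidId
    from gevent.pywsgi import WSGIServer
    from gridfs import GridFS
    from gridfs.errors import NoFile
    from pymongo import MongoClient

    fs = GridFS(MongoClient()[db_name])

    def fetch(file_id):
        try:
            return fs.get(ObjectId(file_id)).read()  # lookup by ID
        except (InvalidId, NoFile):
            return None

    WSGIServer((host, port), make_app(fetch)).serve_forever()
```

Keeping the storage lookup behind a plain callable means the same app can be exercised with an in-memory dict before pointing it at GridFS.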
Load patterns:
1) ol, t2 - requests to a single URL, 2 threads
2) ol, t10 - requests to a single URL, 10 threads
3) t2 - requests to different URLs, 2 threads
4) t10 - requests to different URLs, 10 threads
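
As a rough illustration of these load patterns, a threaded client could look like the sketch below. It is not the author's script: `run_load`, `http_fetch` and the `total_requests` parameter are hypothetical names, and it uses only the standard library. Passing a one-element list corresponds to the "ol" patterns, passing the full list of links to the others.

```python
import itertools
import threading
import time
import urllib.request


def http_fetch(url):
    """Download one URL and return the number of bytes received."""
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())


def run_load(urls, threads, total_requests, fetch=http_fetch):
    """Send total_requests requests from `threads` threads.

    URLs are taken from `urls` in a cycle. Returns requests per second.
    """
    lock = threading.Lock()
    url_cycle = itertools.cycle(urls)
    remaining = [total_requests]

    def next_url():
        # Hand out URLs one at a time until the request budget is spent.
        with lock:
            if remaining[0] <= 0:
                return None
            remaining[0] -= 1
            return next(url_cycle)

    def worker():
        while True:
            url = next_url()
            if url is None:
                return
            fetch(url)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    started = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = max(time.perf_counter() - started, 1e-9)
    return total_requests / elapsed
```

For example, `run_load(["http://127.0.0.1:8000/<id>"], threads=2, total_requests=1000)` would approximate "ol, t2"; the address here is a placeholder.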



Details:
* Each test was run 3 times; the table shows the average.
* The different-URL tests all use the same list of links; each link contains a file ID (GridFS lookups are by ID).
* In the single-URL tests, the file size is 13.5 KB.
* When serving through Gevent, the last file is cached, so the single-URL tests never touch GridFS and in effect measure how fast Python can send data.
* The “one link” test mainly serves to determine the limits of the client.
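
The post does not say how the last file is cached, so the following is just one possible sketch of a single-entry cache. `LastFileCache`, `fetch` and `backend_hits` are hypothetical names. It shows why repeated requests for one URL stop reaching the storage backend after the first one.

```python
class LastFileCache:
    """Remember only the most recently served file.

    `fetch` is a hypothetical callable: file_id -> bytes.
    """

    _EMPTY = object()  # sentinel: nothing cached yet

    def __init__(self, fetch):
        self._fetch = fetch
        self._last_id = self._EMPTY
        self._last_body = None
        self.backend_hits = 0  # how many times the backend was queried

    def __call__(self, file_id):
        if file_id == self._last_id:
            return self._last_body  # same file as last time: skip the backend
        body = self._fetch(file_id)
        self.backend_hits += 1
        self._last_id, self._last_body = file_id, body
        return body
```

With a single URL, `backend_hits` stays at 1 no matter how many requests arrive. With a list of different IDs, almost every request is a miss, which is what makes the different-URL tests meaningful for GridFS.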

The client is written in Python and, judging by the results, can sustain at least 1450 requests per second. Adding threads or running it as several processes does not increase throughput. From this we can conclude that the server was the bottleneck, which is what the test requires.

The results show that GridFS performance is not as low as previously thought, at least on small servers.

More ideas:
* Try nginx-gridfs
* uwsgi-gridfs
* Return files through yield
* Try other proxy servers
* Try other file systems (ZFS?)

I also plan to test on a powerful server.

Script sources

Source: https://habr.com/ru/post/192390/
