
Mongoose: a storage performance testing tool

Good day, Habr. This post is about a performance testing tool for storage systems. It was originally developed inside EMC for internal needs, but it has the potential to grow further. By the way, Mongoose became an open source project literally "yesterday", so the time has come to say a little about it. So, what kind of beast is this?


Key Features


  1. Distributed mode
    Load jobs run simultaneously on many network nodes, with centralized control and metrics collection.

    [Figure: simplified illustration of distributed mode, from the documentation]

    This makes it possible to significantly increase the load on distributed storage systems by emulating requests from a large number of users.

  2. Reporting
    • Output files listing the processed objects (files, ...), which can be reused as input
    • High-resolution timestamps (ms) for each individual operation

  3. CRUD (Create / Read / Update / Delete): the available I/O operation types for generating load
  4. Support for various storage types:
    • Amazon S3 REST API
    • EMC Atmos REST API
    • OpenStack Swift REST API
    • File system (local, NFS mount, ...)

  5. Supported object types:
    • Containers (directories in the case of a file system, buckets in the case of S3)
    • Data items (files in the case of a file system)

  6. Data verification on read operations
  7. Generation of arbitrary data (incompressible uniform noise, text, or identical bytes)
  8. Scripting language
  9. "Stub": an HTTP server that implements cloud storage functions. It does not store data but can still serve it back on reads. In effect, it is a storage mock for testing the functionality and performance of Mongoose itself. A distributed stub and a file system driver are coming soon.
  10. Web GUI
  11. And many other wonderful things; listing them all would take too much space.

Known analogues



A few words about high load


Since a performance testing tool has to generate a high I/O load, the tool itself must be very fast and must use the resources of its environment very efficiently.
  1. Solving the C10K problem
    In earlier versions of Mongoose, each thread of execution was tied to its own connection. It quickly became clear that this approach was flawed: performance was especially bad when working with large objects over a large number of threads. After switching to event-driven asynchronous I/O, the results became impressive. The tool sustained 1 million simultaneously open connections even without distributed mode, which makes it possible to multiply that number.



  2. Zero Copy wherever possible
  3. Automatic configuration of I/O buffer sizes based on the known sizes of the transferred data. Small objects get smaller buffers, large objects get larger ones. Writes get a larger output buffer, reads get a larger input buffer. The buffers are allocated in direct memory to enable zero copy.
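
The buffer sizing rule from item 3 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Mongoose's actual code: the class name, the method names, and the 4 KiB / 16 MiB bounds are all hypothetical. Only `ByteBuffer.allocateDirect` is a real JDK call.

```java
import java.nio.ByteBuffer;

class BufferSizingSketch {

    // Hypothetical bounds chosen for illustration, not Mongoose's real values.
    static final int MIN_BUFFER_SIZE = 4 * 1024;          // 4 KiB
    static final int MAX_BUFFER_SIZE = 16 * 1024 * 1024;  // 16 MiB

    enum LoadType { WRITE, READ }

    // Clamps the buffer size to the object size, within the bounds above:
    // small objects get a small buffer, large objects get a large one.
    static int bufferSizeFor(long objectSize) {
        if (objectSize <= MIN_BUFFER_SIZE) return MIN_BUFFER_SIZE;
        if (objectSize >= MAX_BUFFER_SIZE) return MAX_BUFFER_SIZE;
        return (int) objectSize;
    }

    // Writes send the payload out, so the output buffer is the large one.
    static int outputBufferSize(LoadType type, long objectSize) {
        return type == LoadType.WRITE ? bufferSizeFor(objectSize) : MIN_BUFFER_SIZE;
    }

    // Reads receive the payload, so the input buffer is the large one.
    static int inputBufferSize(LoadType type, long objectSize) {
        return type == LoadType.READ ? bufferSizeFor(objectSize) : MIN_BUFFER_SIZE;
    }

    // Direct buffers live outside the Java heap, so native I/O can use them
    // without first copying the data into an intermediate buffer.
    static ByteBuffer allocateFor(long objectSize) {
        return ByteBuffer.allocateDirect(bufferSizeFor(objectSize));
    }

    public static void main(String[] args) {
        long[] sizes = { 100L, 1_000_000L, 10_000_000_000L };
        for (long size : sizes) {
            ByteBuffer buf = allocateFor(size);
            System.out.printf("object=%d bytes -> buffer=%d bytes, direct=%b%n",
                    size, buf.capacity(), buf.isDirect());
        }
    }
}
```

The upper bound keeps a single very large object from claiming an unbounded amount of off-heap memory; such an object would simply be transferred in several passes through the same buffer.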

How it looks in practice


Once you have downloaded the tarball with the latest version and unpacked it, launching Mongoose is embarrassingly simple:
java -jar mongoose-<VERSION>/mongoose.jar 

Mongoose will then try to do everything with its default settings.

To see something other than errors in this case, you can run a stub on the same machine (it will serve as the storage mock):
 java -jar mongoose-<VERSION>/mongoose.jar wsmock 


Those who want to use the GUI need to run the following command:
 java -jar mongoose-<VERSION>/mongoose.jar webui 

Then open 127.0.0.1:8080 in a browser.


Another important feature is user-defined scenarios. A scenario is written in JSON and can be specified at startup as follows:
 java -jar mongoose-<VERSION>/mongoose.jar -f <PATH_TO_SCENARIO_FILE>.json 

One of the simplest scenarios looks like this:
 { "type": "load" } 

Mongoose uses this scenario by default when no other scenario file is specified explicitly. A slightly more complex scenario:
 {
   "type" : "for",
   "value" : "threads",
   "in" : [ 1, 10, 100, 1000, 10000, 100000 ],
   "config" : { "load" : { "threads" : "${threads}" } },
   "jobs" : [ { "type" : "load" } ]
 }
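
What the "for" scenario above presumably does can be sketched in Java. This only illustrates the loop semantics as the scenario reads (one nested load job per value, with the `${threads}` placeholder replaced each time). It is not Mongoose's scenario engine: the class and method names are hypothetical, and the substitution is plain string replacement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForScenarioSketch {

    // Replaces every "${name}" placeholder in the template with the value.
    static String substitute(String template, String name, Object value) {
        return template.replace("${" + name + "}", String.valueOf(value));
    }

    // Expands a "for" job into one concrete config per value of the "in" list.
    static List<String> expand(String name, List<?> values, String configTemplate) {
        List<String> configs = new ArrayList<>();
        for (Object value : values) {
            configs.add(substitute(configTemplate, name, value));
        }
        return configs;
    }

    public static void main(String[] args) {
        // The "config" and "in" parts of the scenario shown in the article.
        String template = "{ \"load\" : { \"threads\" : \"${threads}\" } }";
        List<Integer> values = Arrays.asList(1, 10, 100, 1000, 10000, 100000);
        for (String config : expand("threads", values, template)) {
            // A real run would execute the nested "load" job with this config.
            System.out.println(config);
        }
    }
}
```

Each printed config corresponds to one run of the nested load job with a different thread count.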


More detailed usage information is available in the "Documentation" section of the Mongoose website.

What's next?


At the time of writing, the latest stable version is 2.4.1. Version 3 is under active development. It will use a new architecture (monitor - generator - driver - monitor) that opens up new possibilities for distributed operation and for scenarios such as "weighted load".



Future plans also include the following:

Source: https://habr.com/ru/post/306448/

