Over-provisioning resources in Mesos-based clusters

The Apache Mesos developers state that Mesos has supported over-provisioning (oversubscription) since version 0.23.0, released last September. To make this possible they introduced the concept of revocable resources: an application that runs on revocable resources can at any time be asked to give them up, either partially (throttle) or completely (kill). Which resources a task runs on is decided at launch time, by marking some (or all) of the resources the task requests as revocable.

In practice, to use this feature you need to:

Declare the REVOCABLE_RESOURCES capability when registering the framework with Mesos.
Attach a resource estimator module to the Mesos slave. It estimates the amount of reclaimable resources (for example, by measuring the difference between allocated and consumed resources) and predicts changes in resource consumption (for example, from a statistical model).
Attach a QoS controller module to the Mesos slave. It kills or throttles the tasks running on revocable resources.
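
The simplest estimation strategy mentioned above, the difference between allocated and consumed resources, can be sketched in a few lines. This is a toy model in plain Python and not the Mesos module API: the names `TaskUsage` and `estimate_reclaimable` and the `safety_margin` parameter are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskUsage:
    """Allocated vs. actually consumed CPUs for one task (hypothetical model)."""
    name: str
    allocated_cpus: float
    used_cpus: float


def estimate_reclaimable(tasks: List[TaskUsage], safety_margin: float = 0.1) -> float:
    """Return the number of CPUs that could be offered as revocable.

    Slack is (allocated - used) per task. A fraction of each allocation
    (safety_margin, an assumption of this sketch) is held back as headroom
    so that a small load spike does not immediately trigger revocation.
    """
    slack = 0.0
    for task in tasks:
        headroom = task.allocated_cpus * safety_margin
        slack += max(0.0, task.allocated_cpus - task.used_cpus - headroom)
    return slack


tasks = [
    TaskUsage("web", allocated_cpus=4.0, used_cpus=1.0),  # mostly idle
    TaskUsage("db", allocated_cpus=8.0, used_cpus=7.5),   # nearly saturated
]
reclaimable = estimate_reclaimable(tasks)
print(f"reclaimable cpus: {reclaimable:.1f}")
```

Only the idle "web" task contributes slack here; the busy "db" task contributes nothing because its unused share is smaller than the headroom.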

As the requirements above show, using this model effectively needs some cooperation from the tasks that run in Mesos, at least in how they manage the resources they consume. It would of course be great to write a resource estimator tied to the application logic, but even predictions based on daily statistics of load changes should already pay off.
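
A prediction based on daily load statistics could look roughly like the sketch below. It is an illustration under stated assumptions, not an existing Mesos estimator: the function names and the sample format (hour of day, CPUs used) are hypothetical, and taking the per-hour peak is just one conservative choice of statistic.

```python
from typing import Dict, Iterable, Tuple


def hourly_peak_profile(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Map each hour of day (0-23) to the peak CPU usage observed in that hour."""
    peaks: Dict[int, float] = {}
    for hour, used_cpus in samples:
        peaks[hour] = max(peaks.get(hour, 0.0), used_cpus)
    return peaks


def predict_reclaimable(profile: Dict[int, float], allocated_cpus: float, hour: int) -> float:
    """Predict the CPUs that can be offered as revocable during the given hour.

    Hours with no history yield 0.0: without data, nothing is oversubscribed.
    """
    if hour not in profile:
        return 0.0
    return max(0.0, allocated_cpus - profile[hour])


# Hypothetical history: (hour of day, CPUs used) collected over several days.
history = [(3, 1.0), (3, 1.5), (14, 6.0), (14, 7.0)]
profile = hourly_peak_profile(history)

night = predict_reclaimable(profile, allocated_cpus=8.0, hour=3)
day = predict_reclaimable(profile, allocated_cpus=8.0, hour=14)
print(night, day)
```

With this history the host can lend far more at night than in the afternoon, which is exactly the daily pattern such an estimator is meant to exploit.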
Mesos currently ships with a pair of resource estimators:

noop - a stub that disables oversubscription
fixed - lets you declare a fixed amount of host resources as revocable

and a pair of QoS controllers:

noop - disables host-level revocation of resources
load - can kill all tasks running on revocable resources if the load average on the host exceeds the threshold values
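
The behaviour of the load controller described above can be modelled as follows. This is a simplified sketch and not the Mesos implementation: `RunningTask`, `load_qos_corrections` and the threshold parameter names are hypothetical, and the load averages are passed in as plain numbers rather than read from the host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    """A task on the host; `revocable` marks tasks running on revocable resources."""
    name: str
    revocable: bool


def load_qos_corrections(
    tasks: List[RunningTask],
    load_5min: float,
    load_15min: float,
    threshold_5min: Optional[float] = None,
    threshold_15min: Optional[float] = None,
) -> List[str]:
    """Return the names of the tasks to kill.

    If either configured threshold is exceeded, every revocable task is
    selected; regular tasks are never touched. A threshold of None is ignored.
    """
    overloaded = (
        (threshold_5min is not None and load_5min > threshold_5min)
        or (threshold_15min is not None and load_15min > threshold_15min)
    )
    if not overloaded:
        return []
    return [task.name for task in tasks if task.revocable]


tasks = [
    RunningTask("web", revocable=False),
    RunningTask("batch-1", revocable=True),
    RunningTask("batch-2", revocable=True),
]
calm = load_qos_corrections(tasks, load_5min=2.0, load_15min=1.5, threshold_5min=6.0)
busy = load_qos_corrections(tasks, load_5min=7.2, load_15min=3.0, threshold_5min=6.0)
print(calm, busy)
```

The all-or-nothing reaction is crude: one overloaded interval costs every revocable task on the host, which is why a finer-grained controller would be attractive.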

Unfortunately, the good news ends here: support in the widely used frameworks is virtually absent.

In Marathon, for example, support for revocable resources is still very poor:

The first requirement is met only formally, as of release 0.15.0.
There is an article on the corporate blog, about a quarter old.
There is an open issue on support for revocable (resource) offers, with an extremely sluggish discussion.

By the way, the aforementioned article contains a statement that is handy to show to management when they demand that resource usage be optimized because CPU utilization on the hosts is low, for example:
It is often 20%.

In Kubernetes there is even less: I managed to find only one unpopular issue that mentions the problem.

Thus, with the Mesos-Marathon stack it is not currently possible to use over-provisioned resources, which obviously hurts the cost-effectiveness of the solution.

Source: https://habr.com/ru/post/276773/
