
Overview of the main mathematical optimization methods for problems with constraints

I have been collecting and preparing this material for a long time; I hope it turned out well this time. This article covers the main methods for solving mathematical optimization problems with constraints, so if you have heard that the simplex method is a very important method but still do not know what it does, this article may help you.

P.S. The article contains mathematical formulas added by the editor's macros; reportedly, they sometimes fail to display. There are also many animations in GIF format.

Preamble


A mathematical optimization problem is a problem of the form "find in a set $\mathcal{K}$ an element $x^*$ such that $f(x^*) \leq f(x)$ for all $x \in \mathcal{K}$", which in the scientific literature is most likely written as

$$\begin{array}{rl} \mbox{minimize} & f(x) \\ \mbox{subject to} & x \in \mathcal{K}. \end{array}$$


Historically popular methods, such as gradient descent or Newton's method, work only in linear spaces (and preferably simple ones, such as $\mathbb{R}^n$). In practice there are often problems where the minimum must be found in a non-linear space. For example, you may need to minimize some function over vectors $x = (x_1, \ldots, x_n)$ for which $x_i \geq 0$; this can happen when the $x_i$ denote lengths of some objects. Or $x$ may represent the coordinates of a point that must lie at distance no more than $r$ from $y$, i.e. $\|x - y\| \leq r$. For such problems, gradient descent or Newton's method is not directly applicable. It turns out that a very large class of optimization problems is conveniently covered by "constraints" like those I described above. In other words, it is convenient to represent the set $\mathcal{K}$ as a system of equalities and inequalities

$$\begin{array}{ll} g_i(x) \leq 0, & 1 \leq i \leq m, \\ h_i(x) = 0, & 1 \leq i \leq k. \end{array}$$


Minimization problems over a space of the form $\mathbb{R}^n$ thus came to be called "unconstrained problems", and problems over sets defined by systems of equalities and inequalities, "constrained problems".

Technically, absolutely any set $\mathcal{K}$ can be represented as a single equality or inequality using the indicator function, which is defined as

$$I_{\mathcal{K}}(x) = \begin{cases} 0, & x \in \mathcal{K}, \\ 1, & x \notin \mathcal{K}, \end{cases}$$

so that $\mathcal{K} = \{x \mid I_{\mathcal{K}}(x) \leq 0\}$.


However, such a function lacks various useful properties (convexity, differentiability, etc.). Fortunately, one can often represent $\mathcal{K}$ as several equalities and inequalities, each of which does have such properties. The main theory covers the case

$$\begin{array}{rl} \mbox{minimize} & f(x) \\ \mbox{subject to} & g_i(x) \leq 0, ~ 1 \leq i \leq m, \\ & Ax = b, \end{array}$$


where $f, g_i$ are convex (but not necessarily differentiable) functions and $A$ is a matrix. To demonstrate how the methods work, I will use two examples:

  1. Linear programming problem

    $$\begin{array}{rl} \mbox{minimize} & -2x - y \\ \mbox{subject to} & -1.0x - 0.1y \leq -1.0 \\ & -1.0x + 0.6y \leq -1.0 \\ & -0.2x + 1.5y \leq -0.2 \\ & 0.7x + 0.7y \leq 0.7 \\ & 2.0x - 0.2y \leq 2.0 \\ & 0.5x - 1.0y \leq 0.5 \\ & -1.0x - 1.5y \leq -1.0 \end{array}$$

    In essence, this problem asks for the farthest point of a polygon in the direction $(2, 1)$; the solution is the point $(4.7, 3.5)$, the "rightmost" point of the polygon. Here is the polygon itself:

  2. Minimization of a quadratic function with a single quadratic constraint

    $$\begin{array}{rl} \mbox{minimize} & 0.7(x - y)^2 + 0.1(x + y)^2 \\ \mbox{subject to} & (x - 4)^2 + (y - 6)^2 \leq 9 \end{array}$$




Simplex method


Of all the methods covered in this review, the simplex method is probably the most famous. It was developed specifically for linear programming, and it is the only one presented here that reaches an exact solution in a finite number of steps (provided that exact arithmetic is used; in practice this is usually not the case, but in theory it is possible). The idea of the simplex method consists of two parts:

  1. Systems of linear inequalities and equalities define multidimensional convex polytopes. In the one-dimensional case this is a point, a ray, or a segment; in the two-dimensional case, a convex polygon; and in the three-dimensional case, a convex polyhedron. Minimizing a linear function is essentially finding the farthest point in a certain direction. Intuition suggests that this farthest point should be some vertex, and this is indeed so. In general, for a system of $m$ inequalities in $n$-dimensional space, a vertex is a point satisfying the system at which exactly $n$ of these inequalities become equalities (provided that no inequalities are equivalent). The number of such points is always finite, although it can be very large.
  2. Now we have a finite set of points, and, generally speaking, we could simply enumerate them: for each subset of $n$ inequalities, solve the system of linear equations built from the selected inequalities, verify that the solution satisfies the original system of inequalities, and compare it with the other such points. This is a simple, inefficient, but working method. The simplex method, instead of full enumeration, moves from vertex to vertex along edges so that the value of the objective function improves. It turns out that if a vertex has no "neighbors" with better objective values, then it is optimal.
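The brute-force enumeration just described fits in a few lines of Python. The polygon below is a made-up toy example (the unit square), not the polygon from example 1, and all names are illustrative:

```python
import itertools
import numpy as np

# Hypothetical toy polygon: the unit square 0 <= x, y <= 1,
# written as A @ v <= b; we minimize c @ v = -2x - y over it.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])
c = np.array([-2.0, -1.0])

best_point, best_value = None, np.inf
# Try every subset of n = 2 inequalities as a candidate vertex
for i, j in itertools.combinations(range(len(A)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:      # parallel constraints: no vertex here
        continue
    v = np.linalg.solve(M, b[[i, j]])      # turn the selected pair into equalities
    if np.all(A @ v <= b + 1e-9):          # keep only feasible intersection points
        if c @ v < best_value:
            best_point, best_value = v, c @ v
```

For this square the enumeration visits at most $\binom{4}{2} = 6$ candidate points; for large $m$ and $n$ the count $\binom{m}{n}$ explodes, which is exactly why the simplex method walks along edges instead.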

The simplex method is iterative, i.e. it improves the solution a little at a time. Such methods need to start somewhere; in general this is done by solving the auxiliary problem

$$\begin{array}{rl} \mbox{minimize} & s \\ \mbox{subject to} & g_i(x) \leq s, ~ 1 \leq i \leq m. \end{array}$$


If solving this problem yields $x^*, s^*$ with $s^* \leq 0$, then $g_i(x^*) \leq s^* \leq 0$; otherwise the original problem is posed over an empty set. The auxiliary problem can itself be solved with the simplex method, with starting point $s = \max_i g_i(x)$ for an arbitrary $x$. Finding the starting point can be called the first phase of the method, and finding the solution of the original problem the second phase.

The trajectory of the two-phase simplex method
The trajectory was generated using scipy.optimize.linprog.
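For reference, a minimal call to scipy.optimize.linprog looks like this; the LP here (minimize $-2x - y$ over the unit square) is made up for illustration, and "highs" selects SciPy's current LP solver family:

```python
from scipy.optimize import linprog

# Toy LP: minimize -2x - y subject to 0 <= x <= 1, 0 <= y <= 1.
res = linprog(c=[-2.0, -1.0], bounds=[(0, 1), (0, 1)], method="highs")

print(res.x, res.fun)   # optimal point and optimal objective value
```

The returned object also reports `res.success` and `res.status`, which are worth checking before trusting `res.x`.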


Projected gradient descent


I recently wrote a separate article about gradient descent, in which I briefly described this method as well. Nowadays this method is quite alive, but it is studied as part of the more general proximal gradient descent. The idea of the method is completely straightforward: if we apply gradient descent to a convex function $f$, then with the right choice of parameters we get the global minimum of $f$. If, after each gradient descent step, we correct the resulting point by taking its projection onto a closed convex set $\mathcal{K}$, then as a result we get the minimum of $f$ on $\mathcal{K}$. More formally, projected gradient descent is an algorithm that sequentially computes

$$\begin{cases} x_{k+1} = y_k - \alpha_k \nabla f(y_k), \\ y_{k+1} = P_{\mathcal{K}}(x_{k+1}), \end{cases}$$


where

$$P_{\mathcal{K}}(x) = \mbox{argmin}_{y \in \mathcal{K}} \|x - y\|.$$


The last equation defines the standard projection operator onto a set; in essence, it is a function that, given a point $x$, computes the nearest point of the set $\mathcal{K}$. The role of distance is played here by $\|\cdot\|$. It is worth noting that any norm can be used, but projections with respect to different norms may differ!

In practice, projected gradient descent is used only in special cases. Its main problem is that computing the projection can be even harder than the original problem, and it has to be computed many times. The most common case where projected gradient descent is convenient is "box constraints", which have the form

$$\ell_i \leq x_i \leq r_i, ~~ 1 \leq i \leq n.$$


In this case the projection is computed very simply: for each coordinate,

$$[P_{\mathcal{K}}(x)]_i = \begin{cases} r_i, & x_i > r_i, \\ \ell_i, & x_i < \ell_i, \\ x_i, & x_i \in [\ell_i, r_i]. \end{cases}$$
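In code, this box projection is just a componentwise clip; a minimal NumPy sketch (the values are illustrative):

```python
import numpy as np

def project_box(x, lower, upper):
    """Componentwise projection onto the box {v : lower_i <= v_i <= upper_i}."""
    return np.clip(x, lower, upper)

# Project a point onto the unit cube [0, 1]^3
p = project_box(np.array([-0.5, 0.3, 2.0]),
                np.array([0.0, 0.0, 0.0]),
                np.array([1.0, 1.0, 1.0]))
```

Coordinates already inside the box are left untouched; the two that violate a bound are snapped to it.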


Using projected gradient descent for linear programming problems is completely pointless; nevertheless, if you do it anyway, it looks something like this:

The trajectory of projected gradient descent for the linear programming problem


And here is what the projected gradient descent trajectory looks like for the second problem, if you

choose a large step size


and if you

choose a small step size
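Since the constraint set of the second problem is a disk, its projection is also closed-form, so the method is easy to sketch. The step size and iteration count below are picked by hand, not tuned:

```python
import numpy as np

center, radius = np.array([4.0, 6.0]), 3.0   # the disk from example 2

def project_disk(p):
    """Euclidean projection onto {p : ||p - center|| <= radius}."""
    d = p - center
    n = np.linalg.norm(d)
    return p if n <= radius else center + radius * d / n

def grad_f(p):
    # gradient of f(x, y) = 0.7 (x - y)^2 + 0.1 (x + y)^2
    x, y = p
    return np.array([1.4 * (x - y) + 0.2 * (x + y),
                     -1.4 * (x - y) + 0.2 * (x + y)])

p = center.copy()        # any feasible starting point works
step = 0.1               # step size chosen by hand
for _ in range(2000):
    p = project_disk(p - step * grad_f(p))
```

The unconstrained minimum of this quadratic is the origin, which lies outside the disk, so the iterates end up pinned to the boundary circle, exactly as in the animations above.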


Ellipsoid method


This method is remarkable for being the first polynomial algorithm for linear programming problems; it can be considered a multidimensional generalization of the bisection method. I will start with the more general method of separating hyperplanes:


For optimization problems, the construction of a "separating hyperplane" is based on the following inequality for convex functions:

$$f(y) \geq f(x) + \nabla f(x)^T (y - x).$$


If we fix $x$, then for a convex function $f$ the half-space $\nabla f(x)^T (y - x) \geq 0$ contains only points with values no less than at the point $x$, which means they can be cut off, since these points are no better than the one we have already found. For problems with constraints, one can likewise get rid of points that are guaranteed to violate one of the constraints.

The simplest version of the separating hyperplane method simply cuts off half-spaces without adding anything. As a result, at each step we have some polyhedron. The problem with this approach is that the number of faces of the polyhedron is likely to grow from step to step, and it can grow exponentially.

The ellipsoid method instead stores an ellipsoid at every step. More precisely, after the hyperplane cut, an ellipsoid of minimal volume containing one of the parts of the previous ellipsoid is constructed. This comes at the cost of adding new points. An ellipsoid can always be defined by a positive definite matrix and a vector (the center of the ellipsoid) as follows:

$$\mathcal{E}(P, x) = \{z ~|~ (z - x)^T P (z - x) \leq 1\}.$$


Constructing the minimum-volume ellipsoid containing the intersection of a half-space and another ellipsoid can be done using moderately cumbersome formulas. Unfortunately, in practice this method turned out not to be as good as the simplex method or interior point methods.
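As an illustration, here is one standard variant of the ellipsoid update applied to an unconstrained convex problem (the quadratic from example 2, without its constraint). Note an assumption: the code parametrizes the ellipsoid as $\{z : (z - x)^T P^{-1} (z - x) \leq 1\}$, which is the inverse of the convention in the formula above, and the starting ball and iteration count are arbitrary choices:

```python
import numpy as np

def f(p):
    x, y = p
    return 0.7 * (x - y) ** 2 + 0.1 * (x + y) ** 2

def grad_f(p):
    x, y = p
    return np.array([1.4 * (x - y) + 0.2 * (x + y),
                     -1.4 * (x - y) + 0.2 * (x + y)])

n = 2
x = np.array([5.0, -3.0])     # arbitrary starting center
P = 100.0 * np.eye(n)         # initial ball of radius 10, assumed to contain the minimum
best, best_val = x.copy(), f(x)
for _ in range(400):
    g = grad_f(x)
    gPg = g @ P @ g
    if gPg < 1e-30:           # gradient (numerically) zero: stop
        break
    gt = g / np.sqrt(gPg)     # normalized cut direction
    Pg = P @ gt
    x = x - Pg / (n + 1)      # move the center into the kept half-ellipsoid
    P = (n ** 2 / (n ** 2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg))
    if f(x) < best_val:
        best, best_val = x.copy(), f(x)
```

Each update shrinks the ellipsoid's volume by a fixed factor, which is exactly what gives the method its polynomial complexity bound, and also explains its slow practical convergence compared to the simplex and interior point methods.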

And here is how it actually works for

linear programming


and for

quadratic programming


Interior point method


This method has a long history of development; some of its first prerequisites appeared around the same time the simplex method was developed, but at the time it was not yet efficient enough to be used in practice. Later, in 1984, a variant of the method was developed specifically for linear programming that was good both in theory and in practice. Moreover, the interior point method is not limited to linear programming, unlike the simplex method, and it is now the main algorithm for convex optimization problems with constraints.

The basic idea of the method is to replace the constraints with a penalty in the form of a so-called barrier function. A function $F: \mbox{int} \, \mathcal{K} \rightarrow \mathbb{R}$ is called a barrier function for the set $\mathcal{K}$ if

$$F(x) \rightarrow +\infty ~~ \mbox{as} ~~ x \rightarrow \partial \mathcal{K}.$$


Here $\mbox{int} \, \mathcal{K}$ is the interior of $\mathcal{K}$ and $\partial \mathcal{K}$ is its boundary. Instead of the original problem, it is proposed to solve

$$\mbox{minimize over} ~ x ~~~ \varphi(x, t) = t f(x) + F(x).$$


$F$ and $\varphi$ are defined only on the interior of $\mathcal{K}$ (essentially, hence the name); the barrier property guarantees that a minimum of $\varphi$ in $x$ exists. Moreover, the larger $t$, the greater the influence of $f$. Under fairly reasonable conditions, one can show that as $t$ tends to infinity, the minimum of $\varphi$ converges to the solution of the original problem.

If the set $\mathcal{K}$ is given as a system of inequalities $g_i(x) \leq 0, ~ 1 \leq i \leq m$, the standard choice of barrier function is the logarithmic barrier

$$F(x) = -\sum_{i=1}^m \ln(-g_i(x)).$$


The minimum points $x^*(t)$ of the function $\varphi(x, t)$ for different $t$ form a curve, usually called the central path; the interior point method tries, as it were, to follow this path. Here is what it looks like for the

Examples with linear programming

The analytic center is simply $x^*(0)$

Finally, the interior point method itself has the following form:

  1. Choose an initial approximation $x_0$ and $t_0 > 0$
  2. Choose a new approximation by Newton's method:

    $$x_{k+1} = x_k - [\nabla^2_x \varphi(x_k, t_k)]^{-1} \nabla_x \varphi(x_k, t_k)$$


  3. Increase $t$:

    $$t_{k+1} = \alpha t_k, ~~~ \alpha > 1$$
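To make these steps concrete, here is a minimal log-barrier sketch for example 2. The value $\alpha = 1.5$, the iteration counts, and the backtracking rule are arbitrary choices of mine (the bare scheme above takes a full Newton step); the backtracking keeps the iterate strictly inside the disk:

```python
import numpy as np

# Example 2: minimize 0.7 (x - y)^2 + 0.1 (x + y)^2
#            subject to (x - 4)^2 + (y - 6)^2 <= 9
center, r2 = np.array([4.0, 6.0]), 9.0
Hf = np.array([[1.6, -1.2], [-1.2, 1.6]])    # Hessian of the quadratic objective

def f(p):
    return 0.5 * p @ Hf @ p                  # equals 0.7 (x - y)^2 + 0.1 (x + y)^2

def g(p):
    return np.sum((p - center) ** 2) - r2    # constraint: g(p) <= 0

def phi(p, t):
    return t * f(p) - np.log(-g(p))          # t * f plus the logarithmic barrier

def newton_direction(p, t):
    s, gg = -g(p), 2.0 * (p - center)        # s > 0 strictly inside; gg = grad g
    grad = t * (Hf @ p) + gg / s
    hess = t * Hf + 2.0 * np.eye(2) / s + np.outer(gg, gg) / s ** 2
    return np.linalg.solve(hess, grad)

p, t = center.copy(), 1.0                    # start strictly inside, with small t
for _ in range(35):                          # outer loop: increase t
    for _ in range(20):                      # inner loop: damped Newton on phi(., t)
        d = newton_direction(p, t)
        step = 1.0                           # halve until feasible and non-increasing
        while g(p - step * d) >= 0 or phi(p - step * d, t) > phi(p, t):
            step *= 0.5
        p = p - step * d
    t *= 1.5
```

At the end, the iterate sits essentially on the central path for a large $t$: the constraint is nearly active and $\nabla f$ is antiparallel to $\nabla g$, which is the KKT condition for this problem.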



The use of Newton's method here is very important: with the right choice of barrier function, a Newton step generates a point that remains inside our set (when I experimented, other step choices did not always have this property). Finally, the trajectory of the interior point method:

Linear programming task

The jumping black dot is $x^*(t_k)$, i.e. the point we are trying to approach with the Newton step at the current iteration.

Quadratic programming problem

Source: https://habr.com/ru/post/428794/

