
The Trust-Region Dogleg Optimization Method: Python Implementation Example





The trust-region method (TRM) is one of the most important numerical optimization methods for solving nonlinear programming problems. The method is based on defining a region around the current best solution in which a quadratic model approximates the objective function.



Line search methods and trust-region methods both generate steps by approximating the objective function with a quadratic model, but they use this model in different ways. Line search uses the model to obtain a search direction and then finds an optimal step length along that direction. The trust-region method defines a region around the current iterate in which the model approximates the objective function sufficiently well; to increase efficiency, the direction and the length of the step are chosen simultaneously.


Trust-region methods are reliable and stable, can be applied to ill-conditioned problems, and have very good convergence properties. Good convergence is due to the fact that the size of the trust region (usually measured by its radius) at each iteration depends on the improvement achieved at previous iterations.



If the calculations show that the approximating model matches the objective function well, the trust region can be enlarged. Otherwise, if the model approximates the function poorly, the trust region should be reduced.



As a result, we get approximately the following picture:





Algorithm



Step 1



The trust-region method uses a quadratic model. At each iteration, the step is calculated by solving the following quadratic subproblem:

$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + p^T g_k + \frac{1}{2} p^T B_k p, \quad \text{s.t. } \|p\| \leq \Delta_k,$$



where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $B_k = \nabla^2 f(x_k)$;

$\Delta_k > 0$ is the trust-region radius.



Thus, the trust-region method requires sequentially solving approximations of a quadratic model in which both the objective function and the constraint (which can be written as $p^T p \leq \Delta_k^2$) are quadratic. When the Hessian $B_k$ is positive definite and $\|B_k^{-1} \nabla f_k\| \leq \Delta_k$, the solution is easy to determine: it is the unconstrained minimizer $p_k^B = -B_k^{-1} \nabla f_k$. In this case $p_k^B$ is called the full step. In other cases the solution is not as obvious, but finding it is not too expensive. Only an approximate solution is needed to obtain sufficient convergence and good behavior of the algorithm in practice.
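To make the full-step test concrete, here is a minimal sketch (the function name and the test values of $g$ and $B$ are illustrative assumptions, not part of the article's algorithm):

import numpy as np

# Minimal sketch: return the full (Newton) step if it lies inside the trust region,
# otherwise signal that an approximate strategy such as dogleg is required.
def try_full_step(g, B, trust_radius):
    pB = -np.linalg.solve(B, g)          # unconstrained minimizer of the quadratic model
    if np.linalg.norm(pB) <= trust_radius:
        return pB                        # the full step is feasible, so it is the solution
    return None                          # fall back to an approximate subproblem solver

# Illustrative call with an arbitrary positive definite B
g = np.array([1.0, -2.0])
B = np.array([[4.0, 1.0], [1.0, 3.0]])
print(try_full_step(g, B, trust_radius=2.0))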



There are several strategies for approximating a quadratic model, including the following:



Cauchy point algorithm



The idea of the method is similar to the logic of the steepest descent algorithm. The Cauchy point lies along the gradient direction and minimizes the quadratic model subject to the step remaining inside the trust region. By successively computing Cauchy points, a local minimum can be found. The method converges slowly, like steepest descent.
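The article does not spell out the Cauchy point formula; a sketch of the standard textbook form (not the dogleg code used later) could look like this:

import numpy as np

# Sketch of the textbook Cauchy point: the minimizer of the quadratic model
# along the steepest-descent direction, restricted to the trust region.
def cauchy_point(g, B, trust_radius):
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                         # the model keeps decreasing, go to the boundary
    else:
        tau = min(np.linalg.norm(g)**3 / (trust_radius * gBg), 1.0)
    return -tau * (trust_radius / np.linalg.norm(g)) * g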



Steihaug algorithm



The method is named after its author, Steihaug. It is a modified conjugate-gradient approach.



Dogleg Algorithm



This article discusses in detail the dogleg method of approximating the quadratic model, which is one of the most common and effective strategies. It is used when the Hessian matrix (or its approximation) is positive definite.

The simplest and most interesting version of the method works with a polygonal path consisting of only two segments, which reminds some people of a dog's leg.



Step 2



The first problem that arises in the trust-region algorithm is the choice of a strategy for finding the optimal trust-region radius $\Delta_k$ at each iteration. The choice is based on how well the model $m_k$ agreed with the objective function $f$ at the previous iteration. After finding $p_k$ we define the following ratio:

$$\rho_k = \frac{\text{actual reduction}}{\text{predicted reduction}} = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$



The denominator is always non-negative, because the step $p_k$ is obtained by minimizing the quadratic model $m_k$ over a region that includes the step $p = 0$. The ratio is used to decide whether to accept the step and how to update the trust-region radius.



If $\rho_k < 0$ or $\rho_k \approx 0$, we reduce the size of the trust region.



If $\rho_k \approx 1$, the model agrees well with the objective function. In this case, the trust region should be enlarged at the next iteration.



In other cases, the trust-region remains unchanged.



Step 3



The following algorithm describes the process:



Choose the starting point $x_0$, the maximum trust-region radius $\bar{\Delta}$, the initial trust-region radius $\Delta_0 \in (0, \bar{\Delta})$, and the constant $\eta \in [0, \frac{1}{4})$.



For $k = 0, 1, 2, \dots$, while $x_k$ is not optimal:



We solve:

$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + p^T g_k + \frac{1}{2} p^T B_k p \quad \text{s.t. } \|p\| \leq \Delta_k$$



where $p_k$ is the solution.

Calculate the ratio:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$



Update the current point:

$$x_{k+1} = \begin{cases} x_k + p_k & \text{if } \rho_k > \eta, \\ x_k & \text{otherwise.} \end{cases}$$



Update the trust-region radius:

$$\Delta_{k+1} = \begin{cases} \frac{1}{4}\Delta_k & \text{if } \rho_k < \frac{1}{4}, \\ \min(2\Delta_k, \bar{\Delta}) & \text{if } \rho_k > \frac{3}{4} \text{ and } \|p_k\| = \Delta_k, \\ \Delta_k & \text{otherwise.} \end{cases}$$
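A compact sketch of these two update rules in Python (the helper name and the use of a tolerance in the boundary test are assumptions for illustration):

import numpy as np

# Sketch of the acceptance test and the trust-region radius update described above.
def update_iterate(xk, pk, rhok, trust_radius, max_trust_radius, eta):
    xk_next = xk + pk if rhok > eta else xk          # accept the step only if the model was reliable
    if rhok < 0.25:
        trust_radius = 0.25 * trust_radius           # poor agreement: shrink the region
    elif rhok > 0.75 and np.isclose(np.linalg.norm(pk), trust_radius):
        trust_radius = min(2.0 * trust_radius, max_trust_radius)  # very good agreement at the boundary: expand
    return xk_next, trust_radius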



Algorithm in expanded form





Note that the radius increases only if $\|p_k\|$ reaches the boundary of the trust region. If the step stays strictly inside the region, the current radius does not constrain the algorithm, and there is no need to change its value at the next iteration.



Dogleg Algorithm



The method begins by examining how the trust-region radius $\Delta$ affects the solution $p^*(\Delta)$ of the quadratic model $m(p)$. When $B$ is positive definite, as already noted, the best solution is the full step $p^B = -B^{-1} g$. Whenever this point is feasible, it is the solution:

$$p^*(\Delta) = p^B, \quad \Delta \geq \|p^B\|.$$





When $\Delta_k$ is small, the constraint $\|p\| \leq \Delta_k$ ensures that the quadratic term in the model $m(p)$ has little effect on the solution. The true solution $p^*(\Delta)$ is then well approximated by the minimizer of the linear function $f + g^T p$ subject to $\|p\| \leq \Delta$:

$$p^*(\Delta) \approx -\Delta \frac{g}{\|g\|}$$

for sufficiently small $\Delta$.



For intermediate values of $\Delta$, the solution $p^*(\Delta)$ typically follows a curved path, as shown in the figure:







The dogleg method approximates the curved path $p^*(\Delta)$ with a path consisting of two straight segments. The first segment starts at the origin and runs along the steepest-descent direction; it is defined as follows:

$$p^U = -\frac{g^T g}{g^T B g}\, g$$



The second segment runs from $p^U$ to $p^B$.



Formally, we denote the trajectory by $\tilde p(\tau)$, where $\tau \in [0, 2]$:

$$\tilde p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \leq \tau \leq 2. \end{cases}$$
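A direct translation of this piecewise definition into Python (a sketch; pU and pB are assumed to be numpy arrays computed as described above):

# Sketch: evaluate the dogleg path p~(tau) for tau in [0, 2].
def dogleg_path(tau, pU, pB):
    if tau <= 1.0:
        return tau * pU                      # first segment: along the steepest-descent step
    return pU + (tau - 1.0) * (pB - pU)      # second segment: from pU towards the full step pB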



To find $\tau$, it is necessary to solve a quadratic equation:

$$\|p^U + \tau(p^B - p^U)\|^2 = \Delta^2$$

$$\|p^U\|^2 + 2\tau\,(p^B - p^U)^T p^U + \tau^2\,\|p^B - p^U\|^2 = \Delta^2$$

The discriminant of this quadratic in $\tau$ is:

$$D = 4\bigl((p^B - p^U)^T p^U\bigr)^2 - 4\,\|p^B - p^U\|^2\bigl(\|p^U\|^2 - \Delta^2\bigr)$$

and the positive root is:

$$\tau = \frac{-2\,(p^B - p^U)^T p^U + \sqrt{D}}{2\,\|p^B - p^U\|^2}$$



The dogleg method chooses $p_k$ to minimize the model $m$ along this path. In fact, no search is needed, because the dogleg path crosses the trust-region boundary at most once, and the intersection point can be found analytically.
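In code, the intersection of the second segment with the trust-region boundary reduces to taking the positive root of that quadratic, which is what the implementation below does as well; a standalone sketch:

import numpy as np

# Sketch: positive root of ||pU + tau*(pB - pU)||^2 == Delta^2,
# i.e. the point where the second dogleg segment crosses the trust-region boundary.
def boundary_tau(pU, pB, trust_radius):
    d = pB - pU
    a = d @ d
    b = 2.0 * (pU @ d)
    c = pU @ pU - trust_radius**2
    return (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)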



Example



Using the trust-region algorithm (dogleg), optimize the following function (Rosenbrock function):

$$f(x, y) = (1 - x)^2 + 100(y - x^2)^2$$



Find the gradient and Hessian of the function:



$$\frac{\partial f}{\partial x} = -400(y - x^2)x - 2 + 2x$$

$$\frac{\partial f}{\partial y} = 200y - 200x^2$$

$$\frac{\partial^2 f}{\partial x^2} = 1200x^2 - 400y + 2$$

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = -400x$$

$$\frac{\partial^2 f}{\partial y^2} = 200$$
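Before plugging these derivatives into the algorithm, they can be sanity-checked against central finite differences (a small sketch; the step size h and the test point are arbitrary choices):

import numpy as np

# Sketch: compare the analytic gradient of the Rosenbrock function
# with a central finite-difference approximation at x = [5, 5].
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])

x = np.array([5.0, 5.0])
h = 1e-6
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(num_grad, jac(x), rtol=1e-4))   # expected: True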



Initialize the necessary variables for the operation of the algorithm:



$\Delta_0 = 1.0$,

$\bar{\Delta} = 100.0$,

$x_k = x_0 = [5, 5]$,

$gtol = 10^{-4}$ (required gradient accuracy),

$\eta = 0.15$.



Iteration 1



Find the optimal solution $p_k$ of the quadratic model.

Since $p^U = [-1.4226, 0.1422]$ and $\|p^U\| > \Delta_k$,

consequently:

$$p_k = \frac{\Delta_k\, p^U}{\|p^U\|} = [-0.9950, 0.0994]$$





Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 28038.11$

predicted reduction $= m_k(0) - m_k(p_k) = 26146.06$

$$\rho_k = \frac{28038.11}{26146.06} = 1.0723$$





$\Delta_k$ remains unchanged.



We update $x_k$:

$$x_k = x_k + p_k = [4.004, 5.099]$$
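These numbers can be reproduced with a few lines of numpy (a sketch reusing the f, jac and hess definitions from the implementation section below; small rounding differences are expected):

import numpy as np

# Sketch: reproduce the first iteration of the worked example.
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])

def hess(x):
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200]])

xk, delta = np.array([5.0, 5.0]), 1.0
g, B = jac(xk), hess(xk)
pU = -(g @ g) / (g @ B @ g) * g            # approx [-1.4226, 0.1422]
pk = delta * pU / np.linalg.norm(pU)       # approx [-0.9950, 0.0994]
act_red = f(xk) - f(xk + pk)               # approx 28038
pred_red = -(g @ pk + 0.5 * pk @ B @ pk)   # approx 26146
print(pk, act_red / pred_red)              # rho_k approx 1.07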





Iteration 2



Find the optimal solution $p_k$ of the quadratic model.

Since $p^U = [-1.0109, 0.1261]$ and $\|p^U\| > \Delta_k$,

consequently:

$$p_k = \frac{\Delta_k\, p^U}{\|p^U\|} = [-0.9923, 0.1238]$$





Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 10489.43$

predicted reduction $= m_k(0) - m_k(p_k) = 8996.73$

$$\rho_k = \frac{10489.43}{8996.73} = 1.1659$$





Since $\rho_k > 0.75$ and $\|p_k\| = \Delta_k$:

$$\Delta_k = \min(2\Delta_k, \bar{\Delta}) = 2.0$$





We update $x_k$:

$$x_k = x_k + p_k = [3.01, 5.22]$$





Iteration 3



Find the optimal solution $p_k$ of the quadratic model:

$$p_k = p^U + \tau (p^B - p^U) = [-0.257, 1.983]$$

where $\tau = 0.5058$.



Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 1470.62$

predicted reduction $= m_k(0) - m_k(p_k) = 1424.16$

$$\rho_k = \frac{1470.62}{1424.16} = 1.0326$$





$\Delta_k$ remains the same.



We update $x_k$:

$$x_k = x_k + p_k = [2.7551, 7.2066]$$





The algorithm continues until $\|g_k\| < gtol$ or until the specified maximum number of iterations is reached.



The table of results of the algorithm for the Rosenbrock function:



| k  | $p_k$                        | $\rho_k$ | $\Delta_k$ | $x_k$                    |
|----|------------------------------|----------|------------|--------------------------|
| 0  | -                            | -        | 1.0        | [5, 5]                   |
| 1  | [-0.9950, 0.0994]            | 1.072    | 1.0        | [4.0049, 5.0994]         |
| 2  | [-0.9923, 0.1238]            | 1.1659   | 2.0        | [3.0126, 5.2233]         |
| 3  | [-0.2575, 1.9833]            | 1.0326   | 2.0        | [2.7551, 7.2066]         |
| 4  | [-0.0225, 0.2597]            | 1.0026   | 2.0        | [2.7325, 7.4663]         |
| 5  | [-0.3605, -1.9672]           | -0.4587  | 0.5        | [2.7325, 7.4663]         |
| 6  | [-0.0906, -0.4917]           | 0.9966   | 1.0        | [2.6419, 6.9746]         |
| 7  | [-0.1873, -0.9822]           | 0.8715   | 2.0        | [2.4546, 5.9923]         |
| 8  | [-0.1925, -0.9126]           | 1.2722   | 2.0        | [2.2620, 5.0796]         |
| 9  | [-0.1499, -0.6411]           | 1.3556   | 2.0        | [2.1121, 4.4385]         |
| 10 | [-0.2023, -0.8323]           | 1.0594   | 2.0        | [1.9097, 3.6061]         |
| 11 | [-0.0989, -0.3370]           | 1.2740   | 2.0        | [1.8107, 3.2690]         |
| 12 | [-0.2739, -0.9823]           | -0.7963  | 0.25495    | [1.8107, 3.2690]         |
| 13 | [-0.0707, -0.2449]           | 1.0811   | 0.5099     | [1.7399, 3.0240]         |
| 14 | [-0.1421, -0.4897]           | 0.8795   | 1.0198     | [1.5978, 2.5343]         |
| 15 | [-0.1254, -0.3821]           | 1.3122   | 1.0198     | [1.4724, 2.1522]         |
| 16 | [-0.1138, -0.3196]           | 1.3055   | 1.0198     | [1.3585, 1.8326]         |
| 17 | [-0.0997, -0.2580]           | 1.3025   | 1.0198     | [1.2587, 1.5745]         |
| 18 | [-0.0865, -0.2079]           | 1.2878   | 1.0198     | [1.1722, 1.3666]         |
| 19 | [-0.0689, -0.1541]           | 1.2780   | 1.0198     | [1.1032, 1.2124]         |
| 20 | [-0.0529, -0.1120]           | 1.2432   | 1.0198     | [1.0503, 1.1004]         |
| 21 | [-0.0322, -0.0649]           | 1.1971   | 1.0198     | [1.0180, 1.0354]         |
| 22 | [-0.0149, -0.0294]           | 1.1097   | 1.0198     | [1.0031, 1.0060]         |
| 23 | [-0.0001, -0.0002]           | 1.0012   | 1.0198     | [1.00000024, 1.00000046] |
| 24 | [-2.37065e-07, -4.56344e-07] | 1.0000   | 1.0198     | [1.0, 1.0]               |




The minimum of the Rosenbrock function, found analytically, is reached at the point [1, 1]; this allows us to verify that the algorithm works correctly.



Python implementation example



The algorithm is implemented using the numpy library. The example imposes a limit on the number of iterations.



#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Pure Python/Numpy implementation of the Trust-Region Dogleg algorithm.
Reference: https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
"""
import numpy as np
import numpy.linalg as ln
from math import sqrt


# Objective function
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2


# Gradient
def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])


# Hessian
def hess(x):
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200]])


def dogleg_method(Hk, gk, Bk, trust_radius):
    # Compute the Newton point.
    # This is the optimum for the quadratic model function.
    # If it is inside the trust radius then return this point.
    pB = -np.dot(Hk, gk)
    norm_pB = sqrt(np.dot(pB, pB))

    # Test if the full step is within the trust region.
    if norm_pB <= trust_radius:
        return pB

    # Compute the Cauchy point.
    # This is the predicted optimum along the direction of steepest descent.
    pU = -(np.dot(gk, gk) / np.dot(gk, np.dot(Bk, gk))) * gk
    dot_pU = np.dot(pU, pU)
    norm_pU = sqrt(dot_pU)

    # If the Cauchy point is outside the trust region,
    # then return the point where the path intersects the boundary.
    if norm_pU >= trust_radius:
        return trust_radius * pU / norm_pU

    # Otherwise, find where the dogleg path intersects the trust-region boundary:
    # solve ||pU + tau*(pB - pU)||**2 == trust_radius**2 for the positive root tau.
    pB_pU = pB - pU
    dot_pB_pU = np.dot(pB_pU, pB_pU)
    dot_pU_pB_pU = np.dot(pU, pB_pU)
    fact = dot_pU_pB_pU**2 - dot_pB_pU * (dot_pU - trust_radius**2)
    tau = (-dot_pU_pB_pU + sqrt(fact)) / dot_pB_pU

    # Take the second segment of the trajectory up to the boundary.
    return pU + tau * pB_pU


def trust_region_dogleg(func, jac, hess, x0, initial_trust_radius=1.0,
                        max_trust_radius=100.0, eta=0.15, gtol=1e-4,
                        maxiter=100):
    xk = np.asarray(x0, dtype=float)
    trust_radius = initial_trust_radius
    k = 0
    while True:
        gk = jac(xk)
        Bk = hess(xk)
        Hk = np.linalg.inv(Bk)

        pk = dogleg_method(Hk, gk, Bk, trust_radius)

        # Actual reduction.
        act_red = func(xk) - func(xk + pk)
        # Predicted reduction.
        pred_red = -(np.dot(gk, pk) + 0.5 * np.dot(pk, np.dot(Bk, pk)))
        # Rho.
        if pred_red == 0.0:
            rhok = 1e99
        else:
            rhok = act_red / pred_red

        # Euclidean norm of pk.
        norm_pk = sqrt(np.dot(pk, pk))

        # Rho is close to zero or negative, therefore the trust region is shrunk.
        if rhok < 0.25:
            trust_radius = 0.25 * norm_pk
        # Rho is close to one and pk has reached the boundary of the trust region,
        # therefore the trust region is expanded.
        elif rhok > 0.75 and norm_pk == trust_radius:
            trust_radius = min(2.0 * trust_radius, max_trust_radius)

        # Choose the position for the next iteration.
        if rhok > eta:
            xk = xk + pk

        # Check if the gradient is small enough to stop.
        if ln.norm(gk) < gtol:
            break
        # Check if we have looked at enough iterations.
        if k >= maxiter:
            break
        k = k + 1
    return xk


result = trust_region_dogleg(f, jac, hess, [5, 5])
print("Result of trust region dogleg method: {}".format(result))
print("Value of function at a point: {}".format(f(result)))
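As an independent sanity check (not part of the article's code), the same problem can be handed to SciPy's trust-region Newton-CG solver, which implements the Steihaug approach mentioned earlier; the result should be close to [1, 1]:

from scipy.optimize import minimize

# Independent check with SciPy's Steihaug-type trust-region solver,
# reusing the f, jac and hess defined above.
res = minimize(f, [5, 5], method='trust-ncg', jac=jac, hess=hess)
print(res.x)   # expected to be close to [1., 1.]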




Thank you for your interest in my article. I hope it was useful to you and that you learned something new.

Source: https://habr.com/ru/post/335224/


