
The Trust-Region Dogleg Optimization Method: Python Implementation Example





The trust-region method (TRM) is one of the most important numerical optimization methods for solving nonlinear programming problems. The method is based on defining a region around the current best solution in which a quadratic model approximates the objective function.



Line search methods and trust-region methods both generate steps by approximating the objective function with a quadratic model, but they use this model in different ways. Line search uses the model to obtain a search direction and then finds an optimal step length along that direction. The trust-region method defines a region around the current iterate in which the model approximates the objective function sufficiently well; to increase efficiency, the direction and the length of the step are chosen simultaneously.


Trust-region methods are reliable and stable, can be applied to ill-conditioned problems, and have very good convergence properties. Good convergence is due to the fact that the size of the trust region (usually measured by its radius) at each iteration depends on the improvement achieved at previous iterations.



If the calculations show that the approximating model matches the objective function well, the trust region can be enlarged. Otherwise, if the model approximates the function poorly, the trust region should be reduced.



As a result, we get approximately the following picture:





Algorithm



Step 1



The trust-region method uses a quadratic model. At each iteration, the step is calculated by solving the following quadratic subproblem:

$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + p^T g_k + \frac{1}{2} p^T B_k p, \quad \text{s.t. } \|p\| \leq \Delta_k,$$



where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $B_k = \nabla^2 f(x_k)$;

$\Delta_k > 0$ is the trust-region radius.



Thus, the trust-region method requires sequentially solving approximations of a quadratic model in which both the objective function and the constraint (which can be written as $p^T p \leq \Delta_k^2$) are quadratic. When the Hessian $B_k$ is positive definite and $\|B_k^{-1} \nabla f_k\| \leq \Delta_k$, the solution is easy to determine: it is the unconstrained minimizer $p_k^B = -B_k^{-1} \nabla f_k$. In this case $p_k^B$ is called the full step. In other cases the solution is not as obvious, but finding it is not too expensive. Only an approximate solution is needed to obtain sufficient convergence and good behavior of the algorithm in practice.
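To make the full-step test concrete, here is a minimal sketch (the function name and the test values of $g$ and $B$ are illustrative assumptions, not part of the article's algorithm):

import numpy as np

# Minimal sketch: return the full (Newton) step if it lies inside the trust region,
# otherwise signal that an approximate strategy such as dogleg is required.
def try_full_step(g, B, trust_radius):
    pB = -np.linalg.solve(B, g)          # unconstrained minimizer of the quadratic model
    if np.linalg.norm(pB) <= trust_radius:
        return pB                        # the full step is feasible, so it is the solution
    return None                          # fall back to an approximate subproblem solver

# Illustrative call with an arbitrary positive definite B
g = np.array([1.0, -2.0])
B = np.array([[4.0, 1.0], [1.0, 3.0]])
print(try_full_step(g, B, trust_radius=2.0))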



There are several strategies for approximating a quadratic model, including the following:



Cauchy point algorithm



The idea of the method is similar to the logic of the steepest descent algorithm. The Cauchy point lies along the gradient direction and minimizes the quadratic model subject to the step remaining inside the trust region. By successively computing Cauchy points, a local minimum can be found. The method converges slowly, like steepest descent.
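The article does not spell out the Cauchy point formula; a sketch of the standard textbook form (not the dogleg code used later) could look like this:

import numpy as np

# Sketch of the textbook Cauchy point: the minimizer of the quadratic model
# along the steepest-descent direction, restricted to the trust region.
def cauchy_point(g, B, trust_radius):
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                         # the model keeps decreasing, go to the boundary
    else:
        tau = min(np.linalg.norm(g)**3 / (trust_radius * gBg), 1.0)
    return -tau * (trust_radius / np.linalg.norm(g)) * g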



Steihaug algorithm



The method is named after its author, Steihaug. It is a modified conjugate-gradient approach.



Dogleg Algorithm



This article discusses in detail the dogleg method of approximating the quadratic model, which is one of the most common and effective strategies. It is used when the Hessian matrix (or its approximation) is positive definite.

The simplest and most interesting version of the method works with a polygonal path consisting of only two segments, which reminds some people of a dog's leg.



Step 2



The first problem that arises in the trust-region algorithm is the choice of a strategy for finding the optimal trust-region radius $\Delta_k$ at each iteration. The choice is based on how well the model $m_k$ agreed with the objective function $f$ at the previous iteration. After finding $p_k$ we define the following ratio:

$$\rho_k = \frac{\text{actual reduction}}{\text{predicted reduction}} = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$



The denominator is always non-negative, because the step $p_k$ is obtained by minimizing the quadratic model $m_k$ over a region that includes the step $p = 0$. The ratio is used to decide whether to accept the step and how to update the trust-region radius.



If $\rho_k < 0$ or $\rho_k \approx 0$, we reduce the size of the trust region.



If $\rho_k \approx 1$, the model agrees well with the objective function. In this case, the trust region should be enlarged at the next iteration.



In other cases, the trust-region remains unchanged.



Step 3



The following algorithm describes the process:



Choose the starting point $x_0$, the maximum trust-region radius $\bar{\Delta}$, the initial trust-region radius $\Delta_0 \in (0, \bar{\Delta})$, and the constant $\eta \in [0, \frac{1}{4})$.



For $k = 0, 1, 2, \dots$, while $x_k$ is not optimal:



We solve:

$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + p^T g_k + \frac{1}{2} p^T B_k p \quad \text{s.t. } \|p\| \leq \Delta_k$$



where $p_k$ is the solution.

Calculate the ratio:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$



Update the current point:

$$x_{k+1} = \begin{cases} x_k + p_k & \text{if } \rho_k > \eta, \\ x_k & \text{otherwise.} \end{cases}$$



Update the trust-region radius:

$$\Delta_{k+1} = \begin{cases} \frac{1}{4}\Delta_k & \text{if } \rho_k < \frac{1}{4}, \\ \min(2\Delta_k, \bar{\Delta}) & \text{if } \rho_k > \frac{3}{4} \text{ and } \|p_k\| = \Delta_k, \\ \Delta_k & \text{otherwise.} \end{cases}$$
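A compact sketch of these two update rules in Python (the helper name and the use of a tolerance in the boundary test are assumptions for illustration):

import numpy as np

# Sketch of the acceptance test and the trust-region radius update described above.
def update_iterate(xk, pk, rhok, trust_radius, max_trust_radius, eta):
    xk_next = xk + pk if rhok > eta else xk          # accept the step only if the model was reliable
    if rhok < 0.25:
        trust_radius = 0.25 * trust_radius           # poor agreement: shrink the region
    elif rhok > 0.75 and np.isclose(np.linalg.norm(pk), trust_radius):
        trust_radius = min(2.0 * trust_radius, max_trust_radius)  # very good agreement at the boundary: expand
    return xk_next, trust_radius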



Algorithm in expanded form





Note that the radius increases only if $\|p_k\|$ reaches the boundary of the trust region. If the step stays strictly inside the region, the current radius does not constrain the algorithm, and there is no need to change its value at the next iteration.



Dogleg Algorithm



The method begins by examining how the trust-region radius $\Delta$ affects the solution $p^*(\Delta)$ of the quadratic model $m(p)$. When $B$ is positive definite, as already noted, the best solution is the full step $p^B = -B^{-1} g$. Whenever this point is feasible, it is the solution:

$$p^*(\Delta) = p^B, \quad \Delta \geq \|p^B\|.$$





When $\Delta_k$ is small, the constraint $\|p\| \leq \Delta_k$ ensures that the quadratic term in the model $m(p)$ has little effect on the solution. The true solution $p^*(\Delta)$ is then well approximated by the minimizer of the linear function $f + g^T p$ subject to $\|p\| \leq \Delta$:

$$p^*(\Delta) \approx -\Delta \frac{g}{\|g\|}$$

for sufficiently small $\Delta$.



For intermediate values of $\Delta$, the solution $p^*(\Delta)$ typically follows a curved path, as shown in the figure:







The dogleg method approximates the curved path $p^*(\Delta)$ with a path consisting of two straight segments. The first segment starts at the origin and runs along the steepest-descent direction; it is defined as follows:

$$p^U = -\frac{g^T g}{g^T B g}\, g$$



The second segment runs from $p^U$ to $p^B$.



Formally, we denote the trajectory by $\tilde p(\tau)$, where $\tau \in [0, 2]$:

$$\tilde p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \leq \tau \leq 2. \end{cases}$$
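A direct translation of this piecewise definition into Python (a sketch; pU and pB are assumed to be numpy arrays computed as described above):

# Sketch: evaluate the dogleg path p~(tau) for tau in [0, 2].
def dogleg_path(tau, pU, pB):
    if tau <= 1.0:
        return tau * pU                      # first segment: along the steepest-descent step
    return pU + (tau - 1.0) * (pB - pU)      # second segment: from pU towards the full step pB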



To find $\tau$, it is necessary to solve a quadratic equation:

$$\|p^U + \tau(p^B - p^U)\|^2 = \Delta^2$$

$$\|p^U\|^2 + 2\tau\,(p^B - p^U)^T p^U + \tau^2\,\|p^B - p^U\|^2 = \Delta^2$$

The discriminant of this quadratic in $\tau$ is:

$$D = 4\bigl((p^B - p^U)^T p^U\bigr)^2 - 4\,\|p^B - p^U\|^2\bigl(\|p^U\|^2 - \Delta^2\bigr)$$

and the positive root is:

$$\tau = \frac{-2\,(p^B - p^U)^T p^U + \sqrt{D}}{2\,\|p^B - p^U\|^2}$$



The dogleg method chooses $p_k$ to minimize the model $m$ along this path. In fact, no search is needed, because the dogleg path crosses the trust-region boundary at most once, and the intersection point can be found analytically.
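In code, the intersection of the second segment with the trust-region boundary reduces to taking the positive root of that quadratic, which is what the implementation below does as well; a standalone sketch:

import numpy as np

# Sketch: positive root of ||pU + tau*(pB - pU)||^2 == Delta^2,
# i.e. the point where the second dogleg segment crosses the trust-region boundary.
def boundary_tau(pU, pB, trust_radius):
    d = pB - pU
    a = d @ d
    b = 2.0 * (pU @ d)
    c = pU @ pU - trust_radius**2
    return (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)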



Example



Using the trust-region algorithm (dogleg), optimize the following function (Rosenbrock function):

$$f(x, y) = (1 - x)^2 + 100(y - x^2)^2$$



Find the gradient and Hessian of the function:



$$\frac{\partial f}{\partial x} = -400(y - x^2)x - 2 + 2x$$

$$\frac{\partial f}{\partial y} = 200y - 200x^2$$

$$\frac{\partial^2 f}{\partial x^2} = 1200x^2 - 400y + 2$$

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = -400x$$

$$\frac{\partial^2 f}{\partial y^2} = 200$$
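Before plugging these derivatives into the algorithm, they can be sanity-checked against central finite differences (a small sketch; the step size h and the test point are arbitrary choices):

import numpy as np

# Sketch: compare the analytic gradient of the Rosenbrock function
# with a central finite-difference approximation at x = [5, 5].
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])

x = np.array([5.0, 5.0])
h = 1e-6
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(num_grad, jac(x), rtol=1e-4))   # expected: True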



Initialize the necessary variables for the operation of the algorithm:



$\Delta_0 = 1.0$,

$\bar{\Delta} = 100.0$,

$x_k = x_0 = [5, 5]$,

$gtol = 10^{-4}$ (required gradient accuracy),

$\eta = 0.15$.



Iteration 1



Find the optimal solution $p_k$ of the quadratic model.

Since $p^U = [-1.4226, 0.1422]$ and $\|p^U\| > \Delta_k$,

consequently:

$$p_k = \frac{\Delta_k\, p^U}{\|p^U\|} = [-0.9950, 0.0994]$$





Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 28038.11$

predicted reduction $= m_k(0) - m_k(p_k) = 26146.06$

$$\rho_k = \frac{28038.11}{26146.06} = 1.0723$$





$\Delta_k$ remains unchanged.



We update $x_k$:

$$x_k = x_k + p_k = [4.004, 5.099]$$
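These numbers can be reproduced with a few lines of numpy (a sketch reusing the f, jac and hess definitions from the implementation section below; small rounding differences are expected):

import numpy as np

# Sketch: reproduce the first iteration of the worked example.
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])

def hess(x):
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200]])

xk, delta = np.array([5.0, 5.0]), 1.0
g, B = jac(xk), hess(xk)
pU = -(g @ g) / (g @ B @ g) * g            # approx [-1.4226, 0.1422]
pk = delta * pU / np.linalg.norm(pU)       # approx [-0.9950, 0.0994]
act_red = f(xk) - f(xk + pk)               # approx 28038
pred_red = -(g @ pk + 0.5 * pk @ B @ pk)   # approx 26146
print(pk, act_red / pred_red)              # rho_k approx 1.07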





Iteration 2



Find the optimal solution $p_k$ of the quadratic model.

Since $p^U = [-1.0109, 0.1261]$ and $\|p^U\| > \Delta_k$,

consequently:

$$p_k = \frac{\Delta_k\, p^U}{\|p^U\|} = [-0.9923, 0.1238]$$





Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 10489.43$

predicted reduction $= m_k(0) - m_k(p_k) = 8996.73$

$$\rho_k = \frac{10489.43}{8996.73} = 1.1659$$





Since $\rho_k > 0.75$ and $\|p_k\| = \Delta_k$:

$$\Delta_k = \min(2\Delta_k, \bar{\Delta}) = 2.0$$





We update $x_k$:

$$x_k = x_k + p_k = [3.01, 5.22]$$





Iteration 3



Find the optimal solution $p_k$ of the quadratic model:

$$p_k = p^U + \tau (p^B - p^U) = [-0.257, 1.983]$$

where $\tau = 0.5058$.



Calculate $\rho_k$:

actual reduction $= f(x_k) - f(x_k + p_k) = 1470.62$

predicted reduction $= m_k(0) - m_k(p_k) = 1424.16$

$$\rho_k = \frac{1470.62}{1424.16} = 1.0326$$





$\Delta_k$ remains the same.



We update $x_k$:

$$x_k = x_k + p_k = [2.7551, 7.2066]$$





The algorithm continues until $\|g_k\| < gtol$ or until the specified maximum number of iterations is reached.



The table of results of the algorithm for the Rosenbrock function:



| k  | $p_k$                        | $\rho_k$ | $\Delta_k$ | $x_k$                    |
|----|------------------------------|----------|------------|--------------------------|
| 0  | -                            | -        | 1.0        | [5, 5]                   |
| 1  | [-0.9950, 0.0994]            | 1.072    | 1.0        | [4.0049, 5.0994]         |
| 2  | [-0.9923, 0.1238]            | 1.1659   | 2.0        | [3.0126, 5.2233]         |
| 3  | [-0.2575, 1.9833]            | 1.0326   | 2.0        | [2.7551, 7.2066]         |
| 4  | [-0.0225, 0.2597]            | 1.0026   | 2.0        | [2.7325, 7.4663]         |
| 5  | [-0.3605, -1.9672]           | -0.4587  | 0.5        | [2.7325, 7.4663]         |
| 6  | [-0.0906, -0.4917]           | 0.9966   | 1.0        | [2.6419, 6.9746]         |
| 7  | [-0.1873, -0.9822]           | 0.8715   | 2.0        | [2.4546, 5.9923]         |
| 8  | [-0.1925, -0.9126]           | 1.2722   | 2.0        | [2.2620, 5.0796]         |
| 9  | [-0.1499, -0.6411]           | 1.3556   | 2.0        | [2.1121, 4.4385]         |
| 10 | [-0.2023, -0.8323]           | 1.0594   | 2.0        | [1.9097, 3.6061]         |
| 11 | [-0.0989, -0.3370]           | 1.2740   | 2.0        | [1.8107, 3.2690]         |
| 12 | [-0.2739, -0.9823]           | -0.7963  | 0.25495    | [1.8107, 3.2690]         |
| 13 | [-0.0707, -0.2449]           | 1.0811   | 0.5099     | [1.7399, 3.0240]         |
| 14 | [-0.1421, -0.4897]           | 0.8795   | 1.0198     | [1.5978, 2.5343]         |
| 15 | [-0.1254, -0.3821]           | 1.3122   | 1.0198     | [1.4724, 2.1522]         |
| 16 | [-0.1138, -0.3196]           | 1.3055   | 1.0198     | [1.3585, 1.8326]         |
| 17 | [-0.0997, -0.2580]           | 1.3025   | 1.0198     | [1.2587, 1.5745]         |
| 18 | [-0.0865, -0.2079]           | 1.2878   | 1.0198     | [1.1722, 1.3666]         |
| 19 | [-0.0689, -0.1541]           | 1.2780   | 1.0198     | [1.1032, 1.2124]         |
| 20 | [-0.0529, -0.1120]           | 1.2432   | 1.0198     | [1.0503, 1.1004]         |
| 21 | [-0.0322, -0.0649]           | 1.1971   | 1.0198     | [1.0180, 1.0354]         |
| 22 | [-0.0149, -0.0294]           | 1.1097   | 1.0198     | [1.0031, 1.0060]         |
| 23 | [-0.0001, -0.0002]           | 1.0012   | 1.0198     | [1.00000024, 1.00000046] |
| 24 | [-2.37065e-07, -4.56344e-07] | 1.0000   | 1.0198     | [1.0, 1.0]               |




The minimum of the Rosenbrock function, found analytically, is reached at the point [1, 1]; this allows us to verify that the algorithm works correctly.



Python implementation example



The algorithm is implemented using the numpy library. The example imposes a limit on the number of iterations.



#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Pure Python/Numpy implementation of the Trust-Region Dogleg algorithm.
Reference: https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
"""
import numpy as np
import numpy.linalg as ln
from math import sqrt


# Objective function
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2


# Gradient
def jac(x):
    return np.array([-400 * (x[1] - x[0]**2) * x[0] - 2 + 2 * x[0],
                     200 * x[1] - 200 * x[0]**2])


# Hessian
def hess(x):
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200]])


def dogleg_method(Hk, gk, Bk, trust_radius):
    # Compute the Newton point.
    # This is the optimum for the quadratic model function.
    # If it is inside the trust radius then return this point.
    pB = -np.dot(Hk, gk)
    norm_pB = sqrt(np.dot(pB, pB))

    # Test if the full step is within the trust region.
    if norm_pB <= trust_radius:
        return pB

    # Compute the Cauchy point.
    # This is the predicted optimum along the direction of steepest descent.
    pU = -(np.dot(gk, gk) / np.dot(gk, np.dot(Bk, gk))) * gk
    dot_pU = np.dot(pU, pU)
    norm_pU = sqrt(dot_pU)

    # If the Cauchy point is outside the trust region,
    # then return the point where the path intersects the boundary.
    if norm_pU >= trust_radius:
        return trust_radius * pU / norm_pU

    # Otherwise, find where the dogleg path intersects the trust-region boundary:
    # solve ||pU + tau*(pB - pU)||**2 == trust_radius**2 for the positive root tau.
    pB_pU = pB - pU
    dot_pB_pU = np.dot(pB_pU, pB_pU)
    dot_pU_pB_pU = np.dot(pU, pB_pU)
    fact = dot_pU_pB_pU**2 - dot_pB_pU * (dot_pU - trust_radius**2)
    tau = (-dot_pU_pB_pU + sqrt(fact)) / dot_pB_pU

    # Take the second segment of the trajectory up to the boundary.
    return pU + tau * pB_pU


def trust_region_dogleg(func, jac, hess, x0, initial_trust_radius=1.0,
                        max_trust_radius=100.0, eta=0.15, gtol=1e-4,
                        maxiter=100):
    xk = np.asarray(x0, dtype=float)
    trust_radius = initial_trust_radius
    k = 0
    while True:
        gk = jac(xk)
        Bk = hess(xk)
        Hk = np.linalg.inv(Bk)

        pk = dogleg_method(Hk, gk, Bk, trust_radius)

        # Actual reduction.
        act_red = func(xk) - func(xk + pk)
        # Predicted reduction.
        pred_red = -(np.dot(gk, pk) + 0.5 * np.dot(pk, np.dot(Bk, pk)))
        # Rho.
        if pred_red == 0.0:
            rhok = 1e99
        else:
            rhok = act_red / pred_red

        # Euclidean norm of pk.
        norm_pk = sqrt(np.dot(pk, pk))

        # Rho is close to zero or negative, therefore the trust region is shrunk.
        if rhok < 0.25:
            trust_radius = 0.25 * norm_pk
        # Rho is close to one and pk has reached the boundary of the trust region,
        # therefore the trust region is expanded.
        elif rhok > 0.75 and norm_pk == trust_radius:
            trust_radius = min(2.0 * trust_radius, max_trust_radius)

        # Choose the position for the next iteration.
        if rhok > eta:
            xk = xk + pk

        # Check if the gradient is small enough to stop.
        if ln.norm(gk) < gtol:
            break
        # Check if we have looked at enough iterations.
        if k >= maxiter:
            break
        k = k + 1
    return xk


result = trust_region_dogleg(f, jac, hess, [5, 5])
print("Result of trust region dogleg method: {}".format(result))
print("Value of function at a point: {}".format(f(result)))
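As an independent sanity check (not part of the article's code), the same problem can be handed to SciPy's trust-region Newton-CG solver, which implements the Steihaug approach mentioned earlier; the result should be close to [1, 1]:

from scipy.optimize import minimize

# Independent check with SciPy's Steihaug-type trust-region solver,
# reusing the f, jac and hess defined above.
res = minimize(f, [5, 5], method='trust-ncg', jac=jac, hess=hess)
print(res.x)   # expected to be close to [1., 1.]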




Thank you for your interest in my article. I hope it was useful to you and that you learned something new.

Source: https://habr.com/ru/post/335224/


