

Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/weixin_42444045/article/details/83512560

Understanding the ADMM algorithm

  • ADMM
  • 1. What is an optimization problem?
  • 2. Dual Gradient Ascent
  • 3. Augmented Lagrangian Method
  • 4. ADMM

ADMM

This chapter explains the procedure of the Alternating Direction Method of Multipliers (ADMM), starting from fundamental knowledge about optimization problems.

1. What is an optimization problem?

The following function represents the most common and simplest form of an optimization problem:

$\min_x f(x)$ ----(1)

Here $x$ is the variable to be optimized: by changing the value of $x$ we make the objective function $f(x)$ attain its minimum value.

If we only have this function, with no constraints on the variable $x$, this is the simplest optimization problem. In many circumstances, however, there are constraints on $x$: it may need to satisfy equality or inequality constraints, or lie in a given set, e.g. $\text{s.t. } Ax = b$ or $Ax > c$. An optimization problem with equality constraints therefore looks like this:

$\min_x f(x) \quad \text{s.t. } Ax = b$ ----(2)
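As a concrete illustration of problem (2) (an assumed toy example, not taken from the derivation above), consider $f(x) = \frac{1}{2}\|x\|_2^2$ subject to $Ax = b$. Its solution is the minimum-norm solution of the linear system, which can be computed directly:

```python
# Assumed toy instance of (2): min (1/2)||x||^2  s.t.  Ax = b.
# The minimizer is the minimum-norm solution of Ax = b, i.e. x* = A^T (A A^T)^{-1} b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # m = 3 constraints, n = 5 variables
b = rng.standard_normal(3)

x_star = np.linalg.pinv(A) @ b    # pseudoinverse gives the minimum-norm solution

print("constraint residual ||Ax* - b|| =", np.linalg.norm(A @ x_star - b))
print("objective (1/2)||x*||^2        =", 0.5 * np.linalg.norm(x_star) ** 2)
```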

2. Dual Gradient Ascent

The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle.

Consider problem (2), where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $f: \mathbb{R}^n \to \mathbb{R}$ is a convex function. The Lagrangian function is:

$L(x, y) = f(x) + y^T (Ax - b)$ ----(3)

The dual function of (3) is:

$g(y) = \inf_x L(x, y) = -f^*(-A^T y) - b^T y$ ----(4)

where $y$ is the Lagrangian multiplier (the dual variable) and $f^*$ is the conjugate function of $f$.
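For completeness, the step from (3) to (4) follows directly from the definition of the conjugate function $f^*(u) = \sup_x (u^T x - f(x))$:

```latex
g(y) = \inf_x \big( f(x) + y^T(Ax - b) \big)
     = -\sup_x \big( (-A^T y)^T x - f(x) \big) - b^T y
     = -f^*(-A^T y) - b^T y .
```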

Assume strong duality holds; then the optimal value of the original problem equals the optimal value of the dual problem. If the optimal solution of the original problem is $x^*$ and the optimal solution of the dual problem is $y^*$, then:

$x^* = \arg\min_x L(x, y^*)$ ----(5)

We use gradient ascent to solve the dual problem; the dual gradient ascent update is:

$x^{k+1} = \arg\min_x L(x, y^k)$
$y^{k+1} = y^k + a_k (Ax^{k+1} - b)$ ----(6)

where $a_k > 0$ is the step size of the gradient ascent.
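Below is a minimal sketch of iteration (6), assuming the same toy problem $f(x) = \frac{1}{2}\|x\|_2^2$, s.t. $Ax = b$, so that the $x$-minimization has the closed form $x = -A^T y$ (the data and step-size choice are illustrative, not from the article):

```python
# Dual gradient ascent (6) for min (1/2)||x||^2 s.t. Ax = b  (assumed toy problem).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

y = np.zeros(3)                               # dual variable y^0
alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # a safe constant step size a_k
for k in range(2000):
    x = -A.T @ y                              # x^{k+1} = argmin_x L(x, y^k), closed form here
    y = y + alpha * (A @ x - b)               # y^{k+1} = y^k + a_k (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```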

Assume the objective function is decomposable (separable); then:

$f(x) = \sum_{i=1}^n f_i(x_i)$ ----(7)

where $x = (x_1, x_2, \dots, x_n)$ with $x_i \in \mathbb{R}^{n_i}$. Partition the matrix $A$ conformably into blocks:

$A = [A_1, A_2, \dots, A_n]$ ----(8)

so that

$Ax = \sum_{i=1}^n A_i x_i$ ----(9)

The Lagrangian function can then be decomposed accordingly:

$L(x, y) = \sum_{i=1}^n L_i(x_i, y) = \sum_{i=1}^n \left( f_i(x_i) + y^T A_i x_i - (1/n)\, y^T b \right)$ ----(10)

The update for dual gradient ascent:

$x_i^{k+1} = \arg\min_{x_i} L_i(x_i, y^k)$
$y^{k+1} = y^k + a_k (Ax^{k+1} - b)$ ----(11)
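When $f$ is separable as in (7)-(9), the $x$-minimization in (11) splits into $n$ independent subproblems that can be solved in parallel; only the dual update needs the gathered result $Ax^{k+1}$. A sketch under the same assumed quadratic objective, with scalar blocks $f_i(x_i) = \frac{1}{2}x_i^2$ so that $x_i = -A_i^T y$:

```python
# Decomposed dual gradient ascent (11): each block x_i is updated independently.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
n = A.shape[1]

x = np.zeros(n)
y = np.zeros(3)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2
for k in range(2000):
    for i in range(n):                        # could run in parallel, one worker per block
        x[i] = -A[:, i] @ y                   # x_i^{k+1} = argmin_{x_i} L_i(x_i, y^k)
    y = y + alpha * (A @ x - b)               # gather, then y^{k+1} = y^k + a_k (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```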

3. Augmented Lagrangian Method

In order to increase the robustness of the dual ascent method and relax its convergence requirements (such as strict convexity of $f$), we introduce the augmented Lagrangian as follows:

$L_\rho(x, y) = f(x) + y^T (Ax - b) + (\rho/2) \|Ax - b\|_2^2$ ----(12)

where $\rho > 0$ is the penalty parameter. The augmented Lagrangian method in effect adds a penalty term, recasting the original problem as:

$\min_x f(x) + (\rho/2) \|Ax - b\|_2^2 \quad \text{s.t. } Ax = b$ ----(13)

Dual ascent update:

$x^{k+1} = \arg\min_x L_\rho(x, y^k)$

$y^{k+1} = y^k + \rho (Ax^{k+1} - b)$ ----(14)
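A minimal sketch of the multiplier iteration (14), again assuming $f(x) = \frac{1}{2}\|x\|_2^2$ so that the $x$-update reduces to the linear system $(I + \rho A^T A)\,x = A^T(\rho b - y)$ (the problem data below are illustrative):

```python
# Method of multipliers (14) for min (1/2)||x||^2 s.t. Ax = b  (assumed toy problem).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
n = A.shape[1]

rho = 1.0
y = np.zeros(3)
for k in range(100):
    # x^{k+1} = argmin_x L_rho(x, y^k):  (I + rho A^T A) x = A^T (rho*b - y)
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
    y = y + rho * (A @ x - b)                 # y^{k+1} = y^k + rho (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```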

4. ADMM

The method in Section 3 uses the augmented Lagrangian to perform the updates, with the penalty parameter $\rho$ now playing the role of the step size $a_k$; this is called the method of multipliers. It converges under much weaker conditions than ordinary dual ascent, but the added quadratic penalty term destroys the decomposability of the original problem. To address this, the alternating direction method of multipliers (ADMM) is introduced.

For optimization problem:

$\min_{x,z} f(x) + g(z) \quad \text{s.t. } Ax + Bz = c$ ----(15)

with $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, $c \in \mathbb{R}^p$, the augmented Lagrangian function is:

$L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho/2) \|Ax + Bz - c\|_2^2$ ----(16)

The ADMM variable updates are:

$x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k)$
$z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k)$
$y^{k+1} = y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)$

With $\rho > 0$, this gives the basic iteration of the ADMM method.
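As a worked instance of these updates (a sketch under assumed data, not the only way to apply ADMM), take the lasso problem $\min \frac{1}{2}\|Dx - b\|_2^2 + \lambda\|z\|_1$ s.t. $x - z = 0$, i.e. $A = I$, $B = -I$, $c = 0$ in (15); the data matrix is written $D$ here only to avoid clashing with the $A$ of the template. Both subproblems then have closed forms: a linear solve for $x$ and elementwise soft-thresholding for $z$.

```python
# ADMM for the lasso:  min (1/2)||D x - b||^2 + lam*||z||_1   s.t.  x - z = 0
# (A = I, B = -I, c = 0 in the template (15); data are an assumed toy example).
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(4)
m, n = 30, 10
D = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:3] = [2.0, -1.0, 0.5]                 # sparse ground truth
b = D @ x_true + 0.01 * rng.standard_normal(m)

lam, rho = 0.5, 1.0
z = np.zeros(n)
y = np.zeros(n)
DtD, Dtb = D.T @ D, D.T @ b
for k in range(200):
    x = np.linalg.solve(DtD + rho * np.eye(n), Dtb + rho * z - y)   # x-update (linear solve)
    z = soft_threshold(x + y / rho, lam / rho)                      # z-update (prox of l1)
    y = y + rho * (x - z)                                           # dual update

print("sparse estimate z:", np.round(z, 3))
```

This is exactly the splitting that makes ADMM attractive here: the smooth least-squares part and the non-smooth $\ell_1$ part are handled in separate, simple steps.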
