

Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/weixin_42444045/article/details/83512560

Understanding the ADMM algorithm

  • ADMM
  • 1. What is an optimization problem?
  • 2. Dual Gradient Ascent
  • 3. Augmented Lagrangian Method
  • 4. ADMM

ADMM

This chapter explains the procedure of the Alternating Direction Method of Multipliers (ADMM), starting from fundamental knowledge about optimization problems.

1. What is an optimization problem?

The following function represents the most common and simplest form of an optimization problem:

$\min_x f(x)$ ----(1)

Here $x$ is the variable to be optimized: by changing the value of $x$ we make the objective function $f(x)$ attain its minimum value.

If we only have this function, with no constraints on the variable $x$, this is the simplest optimization problem. In many circumstances, however, there are constraints on $x$: it may need to satisfy equality or inequality constraints, or lie in a given set, e.g. $\text{s.t. } Ax = b$ or $Ax > c$. An optimization problem with equality constraints therefore looks like this:

$\min_x f(x) \quad \text{s.t. } Ax = b$ ----(2)
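As a concrete illustration of problem (2) (an assumed toy example, not taken from the derivation above), consider $f(x) = \frac{1}{2}\|x\|_2^2$ subject to $Ax = b$. Its solution is the minimum-norm solution of the linear system, which can be computed directly:

```python
# Assumed toy instance of (2): min (1/2)||x||^2  s.t.  Ax = b.
# The minimizer is the minimum-norm solution of Ax = b, i.e. x* = A^T (A A^T)^{-1} b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # m = 3 constraints, n = 5 variables
b = rng.standard_normal(3)

x_star = np.linalg.pinv(A) @ b    # pseudoinverse gives the minimum-norm solution

print("constraint residual ||Ax* - b|| =", np.linalg.norm(A @ x_star - b))
print("objective (1/2)||x*||^2        =", 0.5 * np.linalg.norm(x_star) ** 2)
```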

2. Dual Gradient Ascent

The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle.

Consider problem (2), where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $f: \mathbb{R}^n \to \mathbb{R}$ is a convex function. The Lagrangian function is:

$L(x, y) = f(x) + y^T (Ax - b)$ ----(3)

The dual function of (3) is:

$g(y) = \inf_x L(x, y) = -f^*(-A^T y) - b^T y$ ----(4)

where $y$ is the Lagrangian multiplier (the dual variable) and $f^*$ is the conjugate function of $f$.
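For completeness, the step from (3) to (4) follows directly from the definition of the conjugate function $f^*(u) = \sup_x (u^T x - f(x))$:

```latex
g(y) = \inf_x \big( f(x) + y^T(Ax - b) \big)
     = -\sup_x \big( (-A^T y)^T x - f(x) \big) - b^T y
     = -f^*(-A^T y) - b^T y .
```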

Assume strong duality holds; then the optimal value of the original problem equals the optimal value of the dual problem. If the optimal solution of the original problem is $x^*$ and the optimal solution of the dual problem is $y^*$, then:

$x^* = \arg\min_x L(x, y^*)$ ----(5)

We use gradient ascent to solve the dual problem; the dual gradient ascent update is:

$x^{k+1} = \arg\min_x L(x, y^k)$
$y^{k+1} = y^k + a_k (Ax^{k+1} - b)$ ----(6)

where $a_k > 0$ is the step size of the gradient ascent.
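Below is a minimal sketch of iteration (6), assuming the same toy problem $f(x) = \frac{1}{2}\|x\|_2^2$, s.t. $Ax = b$, so that the $x$-minimization has the closed form $x = -A^T y$ (the data and step-size choice are illustrative, not from the article):

```python
# Dual gradient ascent (6) for min (1/2)||x||^2 s.t. Ax = b  (assumed toy problem).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

y = np.zeros(3)                               # dual variable y^0
alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # a safe constant step size a_k
for k in range(2000):
    x = -A.T @ y                              # x^{k+1} = argmin_x L(x, y^k), closed form here
    y = y + alpha * (A @ x - b)               # y^{k+1} = y^k + a_k (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```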

Assume the objective function is decomposable (separable); then:

$f(x) = \sum_{i=1}^n f_i(x_i)$ ----(7)

where $x = (x_1, x_2, \dots, x_n)$ with $x_i \in \mathbb{R}^{n_i}$. Partition the matrix $A$ conformably into blocks:

$A = [A_1, A_2, \dots, A_n]$ ----(8)

so that

$Ax = \sum_{i=1}^n A_i x_i$ ----(9)

The Lagrangian function can then be decomposed accordingly:

$L(x, y) = \sum_{i=1}^n L_i(x_i, y) = \sum_{i=1}^n \left( f_i(x_i) + y^T A_i x_i - (1/n)\, y^T b \right)$ ----(10)

The update for dual gradient ascent:

$x_i^{k+1} = \arg\min_{x_i} L_i(x_i, y^k)$
$y^{k+1} = y^k + a_k (Ax^{k+1} - b)$ ----(11)
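When $f$ is separable as in (7)-(9), the $x$-minimization in (11) splits into $n$ independent subproblems that can be solved in parallel; only the dual update needs the gathered result $Ax^{k+1}$. A sketch under the same assumed quadratic objective, with scalar blocks $f_i(x_i) = \frac{1}{2}x_i^2$ so that $x_i = -A_i^T y$:

```python
# Decomposed dual gradient ascent (11): each block x_i is updated independently.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
n = A.shape[1]

x = np.zeros(n)
y = np.zeros(3)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2
for k in range(2000):
    for i in range(n):                        # could run in parallel, one worker per block
        x[i] = -A[:, i] @ y                   # x_i^{k+1} = argmin_{x_i} L_i(x_i, y^k)
    y = y + alpha * (A @ x - b)               # gather, then y^{k+1} = y^k + a_k (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```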

3. Augmented Lagrangian Method

In order to increase the robustness of the dual ascent method and relax its convergence requirements (such as strict convexity of $f$), we introduce the augmented Lagrangian as follows:

$L_\rho(x, y) = f(x) + y^T (Ax - b) + (\rho/2) \|Ax - b\|_2^2$ ----(12)

where $\rho > 0$ is the penalty parameter. The augmented Lagrangian method in effect adds a penalty term, recasting the original problem as:

$\min_x f(x) + (\rho/2) \|Ax - b\|_2^2 \quad \text{s.t. } Ax = b$ ----(13)

Dual ascent update:

$x^{k+1} = \arg\min_x L_\rho(x, y^k)$

$y^{k+1} = y^k + \rho (Ax^{k+1} - b)$ ----(14)
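A minimal sketch of the multiplier iteration (14), again assuming $f(x) = \frac{1}{2}\|x\|_2^2$ so that the $x$-update reduces to the linear system $(I + \rho A^T A)\,x = A^T(\rho b - y)$ (the problem data below are illustrative):

```python
# Method of multipliers (14) for min (1/2)||x||^2 s.t. Ax = b  (assumed toy problem).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
n = A.shape[1]

rho = 1.0
y = np.zeros(3)
for k in range(100):
    # x^{k+1} = argmin_x L_rho(x, y^k):  (I + rho A^T A) x = A^T (rho*b - y)
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
    y = y + rho * (A @ x - b)                 # y^{k+1} = y^k + rho (A x^{k+1} - b)

print("primal residual ||Ax - b|| =", np.linalg.norm(A @ x - b))
```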

4. ADMM

The method in Section 3 uses the augmented Lagrangian to perform the updates, with the penalty parameter $\rho$ now playing the role of the step size $a_k$; this is called the method of multipliers. It converges under much weaker conditions than ordinary dual ascent, but the added quadratic penalty term destroys the decomposability of the original problem. To address this, the alternating direction method of multipliers (ADMM) is introduced.

For optimization problem:

$\min_{x,z} f(x) + g(z) \quad \text{s.t. } Ax + Bz = c$ ----(15)

with $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, $c \in \mathbb{R}^p$, the augmented Lagrangian function is:

$L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho/2) \|Ax + Bz - c\|_2^2$ ----(16)

The ADMM variable updates are:

$x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k)$
$z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k)$
$y^{k+1} = y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)$

With $\rho > 0$, this gives the basic iteration of the ADMM method.
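As a worked instance of these updates (a sketch under assumed data, not the only way to apply ADMM), take the lasso problem $\min \frac{1}{2}\|Dx - b\|_2^2 + \lambda\|z\|_1$ s.t. $x - z = 0$, i.e. $A = I$, $B = -I$, $c = 0$ in (15); the data matrix is written $D$ here only to avoid clashing with the $A$ of the template. Both subproblems then have closed forms: a linear solve for $x$ and elementwise soft-thresholding for $z$.

```python
# ADMM for the lasso:  min (1/2)||D x - b||^2 + lam*||z||_1   s.t.  x - z = 0
# (A = I, B = -I, c = 0 in the template (15); data are an assumed toy example).
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(4)
m, n = 30, 10
D = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:3] = [2.0, -1.0, 0.5]                 # sparse ground truth
b = D @ x_true + 0.01 * rng.standard_normal(m)

lam, rho = 0.5, 1.0
z = np.zeros(n)
y = np.zeros(n)
DtD, Dtb = D.T @ D, D.T @ b
for k in range(200):
    x = np.linalg.solve(DtD + rho * np.eye(n), Dtb + rho * z - y)   # x-update (linear solve)
    z = soft_threshold(x + y / rho, lam / rho)                      # z-update (prox of l1)
    y = y + rho * (x - z)                                           # dual update

print("sparse estimate z:", np.round(z, 3))
```

This is exactly the splitting that makes ADMM attractive here: the smooth least-squares part and the non-smooth $\ell_1$ part are handled in separate, simple steps.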
