稀疏表示綜述：A Survey of Sparse Representation: Algorithms and Applications_2015（2）

VI. 基于鄰近算法的優化政策（PROXIMITY ALGORITHM BASED OPTIMIZATION STRATEGY）

proximity algorithm 的核心是使用鄰近算子（proximal operator）去疊代地求解子問題（sub-problem）。

proximity algorithm 被廣泛地應用于求解非光滑（nonsmooth），限制凸優化問題（constrained convex optimization problems） [29]。

假設一個簡單的限制優化問題

稀疏表示綜述：A Survey of Sparse Representation: Algorithms and Applications_2015（2）

其中 χ⊂Rn。

使用 proximity algorithm 解原問題為

其中 τ 和 xt 是已知的。

不失一般性，假設一個線性限制的凸優化問題

通過使用 proximity algorithm，問題 VI.3 的解為

其中

。

更具體地來說，對于帶有 l1-norm正則化的稀疏表示問題，能被寫為

A. 軟門檻值或收縮操作符（SOFT THRESHOLDING OR SHRINKAGE OPERATOR）

B. 疊代收縮門檻值算法（ITERATIVE SHRINKAGE THRESHOLDING ALGORITHM (ISTA)）

[80] M. A. T. Figueiredo and R. D. Nowak, ‘‘A bound optimization approach to wavelet-based image deconvolution,’’ in Proc. IEEE Int. Conf. Image Process., vol. 2. Sep. 2005, pp. II-782–II-785.

[81] P. L. Combettes and J. C. Pesquet, ‘‘Proximal splitting methods in signal processing,’’ in Fixed-Point Algorithms for Inverse Problems in Science and Engineering. New York, NY, USA: Springer-Verlag, 2011, pp. 185–212.

C. 加速地疊代收縮門檻值算法（FAST ITERATIVE SHRINKAGE THRESHOLDING ALGORITHM (FISTA)）

[82] A. Beck and M. Teboulle, ‘‘A fast iterative shrinkage-thresholding algorithm for linear inverse problems,’’ SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.

[83] A. Y. Yang, S. S. Sastry, A. Ganesh, and Y. Ma, ‘‘Fast L1-minimization algorithms and an application in robust face recognition: A review,’’in Proc. 17th IEEE Int. Conf. Image Process. (ICIP), Sep. 2010, pp. 1849–1852

D. 由可分近似稀疏重建（SPARSE RECONSTRUCTION BY SEPARABLE APPROXIMATION (SpaRSA)）

Sparse reconstruction by separable approximation (SpaRSA) [84] 使用熱啟動（worm-starting）技術優化問題 VI.8 中的參數 λ 和使用 Barzilai-Borwein (BB) 譜方法[85] 為問題 VI.9中的 Hf(α)

1) 使用熱啟動技術優化 λ（UTILIZING THE WORM-STARTING TECHNIQUE TO OPTIMIZE λ）

SpaRSA 使用一個一種自适應連續技術來更新 λ 的值，使得它更快地收斂。

其中 γ 是小的常數。

2) 使用 BB 譜方法近似 Hf(α)（UTILIZING THE BB SPECTRAL METHOD TO APPROXIMATE Hf(α)）

- ISTA 使用 (1/τ)I 來近似 Hf(α)；

- FISTA使用 ∇f(α) 的 Lipschitz 常數來近似 Hf(α)；

- SpaRSA 使用 BB 譜方法選擇 τ 的值來仿真 Hf(α)。

τ 的值要求滿足下列條件：

其中滿足最小化問題

SpaRSA 具體算法：

[84] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, ‘‘Sparse reconstruction by separable approximation,’’ IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2479–2493, Jul. 2009.

[85] Y.-H. Dai, W. W. Hager, K. Schittkowski, and H. Zhang, ‘‘The cyclic Barzilai–Borwein method for unconstrained optimization,’’ IMA J. Numer. Anal., vol. 26, no. 3, pp. 604–627, 2006.

E. 基于 l1/2-norm 正則化的稀疏重建（ l1/2-NORM REGULARIZATION BASED SPARSE REPRESENTATION）

帶有 lp-norm (0 < p < 1) 正則化的稀疏表示通常是一個 nonconvex, nonsmooth, 和 non-Lipschitz 優化問題。但是，大多數有代表性的有關 lp-norm (0 < p < 1) 正則化的算法是帶有 l1/2-norm 正則化的稀疏表示 [87]。

[60] 提出一個半接近（half proximal algorithm）算法來求解 l1/2-norm 正則化問題。它使用 iterative shrinkage thresholding algorithm 處理 l1 -norm regularization 和使用 iterative hard thresholding algorithm 處理 l0-norm regularization。

帶有 l1/2-norm 正則化的稀疏表示問題：

其中 F(α) 關于 α 的一階優化條件（first-order optimality condition）為

有

其中 ∇(||α||1/21/2) 是 ||α||1/21/2 的梯度。

兩邊同時乘以一個正常數 τ 和添加一個參數 α ，

引入解算子（resolvent operator）：

是以有

定義 θ(α)=α+τXT(y−Xα)， the resolvent operator 可以表示為

是以， l1/2-norm正則化的半接近門檻值函數（half proximal thresholding function）定義為：

等式 VI.29 代入等式 VI.27，問題 VI.25 可以寫成

到目前為止，half proximal thresholding 算法可以用等式 VI.30 來建構。等式 VI.24中 λ 和 τ 的值使用下面的優化

其中 ε 是一個很接近于 0 的常數， k 表示稀疏的極限（i.e. k-sparsity），[∙]k詳細算法見

[60] Z. Xu, X. Chang, F. Xu, and H. Zhang, ‘‘L1/2 regularization: A thresholding representation theory and a fast solver,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1013–1027, Jul. 2012.

[87] J. Zeng, Z. Xu, B. Zhang, W. Hong, and Y. Wu, ‘‘Accelerated L1/2 regularization based SAR imaging via BCR and reduced Newton skills,’’Signal Process., vol. 93, no. 7, pp. 1831–1844, 2013.

[88] J. Zeng, S. Lin, Y. Wang, and Z. Xu, ‘‘L1/2 regularization: Convergence of iterative half thresholding algorithm,’’ IEEE Trans. Signal Process., vol. 62, no. 9, pp. 2317–2329, May 2014.

F. 基于增廣拉格朗日乘子的優化政策（AUGMENTED LAGRANGE MULTIPLIER BASED OPTIMIZATION STRATEGY）

Lagrange multiplier 被廣泛地用于轉換等式限制問題（equality constrained problem）為一個帶有适當的懲罰函數（penalty function）的無限制問題（unconstrained problem）。

首先，問題 III.9 的增廣 Lagrangian 函數為：

寫成 Lagrangain 形式

問題 VI.33 可以通過交替優化 α 和 z 獲得：

其中問題 VI.34 可以通過 FISTA 算法求解。問題 VI.34 是疊代求解的，參數 z 使用等式 VI.35 來更新直到收斂。

此外，如果問題 III.9 可以通過 ALM 求解（叫做原增廣Lagrangian方法 primal augmented Lagrangian method (PALM) ），那麼問題 III.9 的對偶函數（dual function）也能被 ALM 求解（叫做對偶增廣Lagrangian 方法 dual augmented Lagrangian method (DALM)）[89]。

下面考慮問題 III.9 的對偶優化（dual optimization）：

其中 µ∈Rd 是 Lagrangian 乘子和 τ 是懲罰參數。

最後，對偶優化問題 VI.43 可以求解

其中 PB1∞(u) 是一個 B1∞ 上的投影，也叫做 group-wise soft-thresholding。

例如，讓

，解 x 的第 i 個成分滿足

對 Q(λ) 關于 λ 求導

DALM詳細見

[89] A. Y. Yang, Z. Zhou, A. G. Balasubramanian, S. S. Sastry, and Y. Ma,‘‘Fast `1-minimization algorithms for robust face recognition,’’ IEEE Trans. Image Process., vol. 22, no. 8, pp. 3234–3246, Aug. 2013.

G. 其他基于鄰近算法的優化方法（OTHER PROXIMITY ALGORITHM BASED OPTIMIZATION METHODS）

鄰近算法（proximity algorithm）的理論基礎是首先建構一個鄰近操作（proximal operator），接着使用這個鄰近操作來求解凸優化問題。

[29] N. Parikh and S. Boyd, ‘‘Proximal algorithms,’’ Found. Trends Optim., vol. 1, no. 3, pp. 123–231, 2013.

VII. 基于 Homotopy 算法的稀疏表示（HOMOTOPY ALGORITHM BASED SPARSE REPRESENTATION）

The main idea of homotopy is to solve the original optimization problems by tracing a continuous parameterized path of solutions along with varying parameters.

A. Lasso Homotopy（LASSO HOMOTOPY）

B. BPDN Homotopy（BPDN HOMOTOPY）

C. Homotopy（ITERATIVE REWEIGHTING l1-NORM MINIMIZATION VIA HOMOTOPY）

D. Homotopy（OTHER HOMOTOPY ALGORITHMS FOR SPARSE REPRESENTATION）

VIII. 稀疏表示的應用（THE APPLICATIONS OF THE SPARSE REPRESENTATION METHOD）

A. 字典學習中的稀疏表示（SPARSE REPRESENTATION IN DICTIONARY LEARNING）

一個超完備字典通常有兩種生成方式：1）transform domain method（預定義的一組變換函數） [5]；2）dictionary learning methods（通過學習獲得） [104]。兩者都是轉換圖檔樣本到其他域并且學習轉換系數的相似性 [105]。但是 transform domain methods 通常使用一組固定的變換函數（正交基例如： over-complete wavelets transform , super-wavelet transform, bandelets , curvelets transform , contourlets transform and steerable wavelet filters）來表示圖檔樣本，而 dictionary learning methods 卻使用一個帶有冗于資訊超完備字典上的稀疏表示。transform domain methods 速度快，但 dictionary learning methods 效果好。

假設矩陣 Y=[y1,y2,⋅⋅⋅,yN]， X=[x1,x2,⋅⋅⋅,xN]T 和字典 D=[d1,d2,⋅⋅⋅,dM]。從 [23], [112]，字典學習的架構一般可以表示為：

其中 Ω={D=[d1,d2,⋅⋅⋅,dM]:dTidi=1,i=1,2,⋅⋅⋅,M}；N 為樣本資料個數（已知）；yi 第 i-th 個樣本向量；D 是字典； xi 是稀疏向量；P(xi) 和 λ

1) 無監督字典學習（UNSUPERVISED DICTIONARY LEARNING）

無監督字典學習主要用于圖像壓縮，圖像表示的特征編碼。

a: KSVD（KSVD FOR UNSUPERVISED DICTIONARY LEARNING）

KSVD 的目标函數為：

其中 Y∈RdxN 是已知樣本矩陣，D∈RdxN 是學習的字典，X∈RNxN 是因子矩陣， k是稀疏限制，xi 是 X 的第 i 行，問題 VIII.2 關于 D 和 X 聯合優化。疊代求解，固定 D，問題 VIII.2 轉為

即所謂稀疏編碼，它的子問題為

可以使用MP和 OMP 求解。接着固定 X，問題 VIII.2 轉為回歸問題

其中

，這種方法叫做 method of directions（MOD）。

由于 VIII.4 中求 inverse problem 的複雜度為 O(n3)。 KSVD 重寫問題 VIII.4 為

首先計算 overall representation residual

，讓後更新 dl 和 xl。為了保持 xTl 的稀疏性，是以隻儲存 El 和 xTl 中的非零元素（即， EPl, from dlxTl）。接着 SVD 分解 xPl 到

，接着更新字典 dl。具體算法見[121]

[121] M. Aharon, M. Elad, and A. Bruckstein, ‘‘K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,’’ IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.

b: 位置限制的線性編碼（LOCALITY CONSTRAINED LINEAR CODING FOR UNSUPERVISED DICTIONARY LEARNING）

locality constrained linear coding (LLC) 算法 [129] 是一種有效的局部坐标線性編碼方法，它把每個描述子（descriptor ）映射到一個局部限制的系統來獲得一個有效地字典。

It has been demonstrated that the property of locality is more essential than sparsity, because the locality must lead to sparsity but not vice-versa, that is, a necessary condition of sparsity is locality, but not the reverse [129].

假設

為來自樣本的局部描述子（ local descriptors）所組成的矩陣，目标字典

。是以，LLC 的目标函數為：

其中正則化參數 μ 調節權重衰減速度（ weighting decay speed），b 為 locality adaptor，為

特别地，向量 b 的第個值 bi 為

使用 k-means 産生字典 D ，接着 LLC 的解為

其中 a∖b 表示 a−1b，關于 yi 的協方差矩陣（covariance matrix）

。

此外， incremental codebook optimization algorithm 目标函數為

詳細算法：

[129] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, ‘‘Localityconstrained linear coding for image classification,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 3360–3367.

c: 其他（OTHER UNSUPERVISED DICTIONARY LEARNING METHODS）

[112] J. Shi, X. Ren, G. Dai, J. Wang, and Z. Zhang, ‘‘A non-convex relaxation approach to sparse dictionary learning,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 1809–1816.

[117] C. Bao, H. Ji, Y. Quan, and Z. Shen, ‘‘L0 norm based dictionary learning by proximal methods with global convergence,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 3858–3865.

[122] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, ‘‘Proximal methods for sparse hierarchical dictionary learning,’’ in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 487–494.

[130] M. Zhou et al., ‘‘Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images,’’ IEEE Trans. Image Process., vol. 21, no. 1, pp. 130–144, Jan. 2012.

[131] I. Ramirez and G. Sapiro, ‘‘An MDL framework for sparse coding and dictionary learning,’’ IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2913–2927, Jun. 2012.

[132] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, ‘‘Online learning for matrix factorization and sparse coding,’’ J. Mach. Learn. Res., vol. 11, pp. 19–60, Mar. 2010.

[133] M. Yang, L. Van Gool, and L. Zhang, ‘‘Sparse variation dictionary learning for face recognition with a single training sample per person,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 689–696.

2) 有監督字典學習（SUPERVISED DICTIONARY LEARNING）

a: 判别式 KSVD（DISCRIMINATIVE KSVD FOR DICTIONARY LEARNING）

Discriminative KSVD (DKSVD) [126] 主要用于圖像分類問題。DKSVD 整合帶有判别資訊（discriminative information）的字典學習和分類器參數（classier parameters）到目标函數，然後使用 KSVD 為所有參數獲得全局最優解。DKSVD 的目标函數為：

where Y is the given input samples, D is the learned dictionary, X is the coefcient term, H is the matrix composed

of label information corresponding to Y , C is the parameter

term for classier, and and are the weights.

其中 Y 是給定的輸入樣本，D 是學習的字典，X 是因數項，H 是對應于 Y 的标簽資訊組成的矩陣，C 是分類器參數，η 和 μ 是權重。

寫成 KSVD 形式：

字典的每一列都歸一化為 l2 範式，

也歸一化，抛棄 ||C||2F。問題 VIII.12 可以寫為

其中

明顯， VIII.13 類似于 VIII.2 的 KSVD 架構，它可以通過 KSVD 求解。DKSVD 包含兩個階段：1）訓練階段和2）分類階段。

1）訓練階段，Y 是訓練樣本組成的矩陣，目标是學習判别性的字典（discriminative dictionary） D 和分類器參（classier parameter）C。DKSVD 逐列更新 Z，對于每一列 zi，DKSVD 使用 KSVD 獲得 zi 及其對應的權重。然後 DKSVD 歸一化字典 D 和分類器參數 C，即

2）分類階段， Y 是測試樣本矩陣。基于獲得的 D′ 和 C′，對應于每個測試樣本 yi 的稀疏因子矩陣 xi 可以通過 OMP 獲得

基于對應的稀疏因子 xi，對于每個測試樣本 yi 的最後分類标簽為

DKSVD 的亮點在于它使用 KSVD 的架構同時學習一個判别性的字典和分類器參數，然後利用 OMP 算法獲得一個稀疏的解，最後使用稀疏解和學習的分類器來分類。

b: 标簽一緻 KSVD（LABEL CONSISTENT KSVD FOR DISCRIMINATIVE DICTIONARY LEARNING）

label consistent KSVD (LC-KSVD) [134], [135] 整合建構字典的過程和優化線性分類器到混合重建的（reconstructive ）和判别的（discriminative ）目标函數中，然後聯合獲得學習的字典和有效的分類器。

LC-KSVD 的目标函數為：

其中第一項為重建誤差（reconstruction error）項，第二項為判别稀疏編碼誤差（discriminative sparse-code error）項，第三項為分類誤差（ classication error）項。Y 是輸入資料矩陣，D 為學習的字典，X 為稀疏編碼項，η 和 μ 分别為權重，A 為線性變換矩陣（linear transformation matrix），H 為對應于 Y 的标簽資訊（label information）矩陣，C 為分類器參數，L 為關于 Y 和 D 的标簽的聯合标簽矩陣（joint label matrix）。例如，假設 Y=[y1...y4] ，D=[d1...d4]，其中 y1，y2 和 d1，d2來自第一類，y3，y4 和 d3，d4來自第二類，則

。

寫成 KSVD 形式

LC-KSVD 包含兩個階段：1）訓練階段和2）分類階段。

1）訓練階段，應用 KSVD 逐個更新 Z 和計算 X。然後 LC-KSVD 歸一化字典 D ,轉換矩陣 A 和分類器參數 C，即

2）分類階段， Y 是測試樣本矩陣。基于獲得的 D′ 和 C′，對應于每個測試樣本 yi 的稀疏因子矩陣 xi 可以通過 OMP 獲得

基于對應的稀疏因子 xi，對于每個測試樣本 yi 的最後分類标簽為

[134] Z. Jiang, Z. Lin, and L. S. Davis, ‘Learning a discriminative dictionary for sparse coding via label consistent K-SVD,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 16971704.

[135] Z. Jiang, Z. Lin, and L. S. Davis, “Label consistent K-SVD: Learning a discriminative dictionary for recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 26512664, Nov. 2013

c: Fisher 判别式（FISHER DISCRIMINATION DICTIONARY LEARNING FOR SPARSE REPRESENTATION）

Fisher discrimination dictionary learning (FDDL) [136] 整合監督（類标簽資訊）資訊和 Fisher 判别資訊到目标函數中來學習一個結構化的判别性字典。FDDL 的一般模型為：

第一項為判别精确項（discriminative fidelity term），第二項為稀疏正則化項（sparse regularization term），第三項為判别因數項（discriminative coeffcient term），例如 Fisher discrimination criterion in Eq. (VIII.23)。FDDL 分别逐類（class by class）更新字典和計算稀疏表示解。Yi 為輸入資料的第 i 類，Xi為， Yij 為對應于稀疏表示矩陣。

是以 FDDL 的目标函數為：

c 為類的數量。交替求解 VIII.23。

固定 D，問題 VIII.23 可以通過逐類（class by class）計算 Xi獲得，它的 sub-problem 為

固定 α ，問題 VIII.23 可以寫為

其中 Xi表示 Y 在 Di

FDDL 的主要貢獻在于整合 Fisher discrimination criterion 到字典學習過程中。使用問題 VIII.22 中的函數 f 建構判别性的字典和同時通過探索問題 VIII.22 中的函數 g 形成判别性的稀疏表示因數。

[136] M. Yang, L. Zhang, X. Feng, and D. Zhang, `Fisher discrimination dictionary learning for sparse representation,” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 543550.

[138] M. Yang, L. Zhang, J. Yang, and D. Zhang, “Metaface learning for sparse representation based face recognition,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2010, pp. 16011604.

B. 圖像進行中的稀疏表示（SPARSE REPRESENTATION IN IMAGE PROCESSING）

C. 圖像分類和視覺跟蹤中的稀疏表示（SPARSE REPRESENTATION IN IMAGE CLASSIFICATION AND VISUAL TRACKING）

IX. 實驗評價（EXPERIMENTAL EVALUATION）

A. 參數選擇（PARAMETER SELECTION）

Parameter selection, especially selection of the regularization parameter in different minimization problems, plays an important role in sparse representation. In order to make fair comparisons with different sparse representation algorithms, performing the optimal parameter selection for different sparse representation algorithms on different datasets is advisable and indispensable.

B. 實驗結果（EXPERIMENTAL RESULTS）

C. 讨論（DISCUSSION）

選擇一個合适的正則化參數仍然有很長的路，基于自适應參數（adaptive parameter selection）的方法還很少；
精度和魯棒性仍需提高；
基于 l1 範式最小化的稀疏表示的速率還有進一步發展空間（由于疊代）；
還不存在絕對的算法能在所有資料集上效果好。

X. 結論（CONCLUSION）

effcient sparse representation, robust sparse representation, and dictionary learning based on sparse representation 是目前的主流研究方向。
low-rank representation 的數學理論不太elegant，整合sparse representation 和 low-rank 值得考慮。
稀疏表示還沒有完全應用到遷移學習（transfer learning）架構。
有效性（effectiveness），效率（fciency ）和魯棒性（robustness）要進一步加強；
稀疏表示可以進一步應用到 event detection, scene reconstruction, video tracking, object recognition, object pose estimation, medical image processing, genetic expression and natural language processing等領域；
除了 l2-norm 和 l1-norm，基于 l2;1-norm 的稀疏表示（sparse representation）和字典學習（dictionary learning）也需要進一步研究。