A Convergent 3-Block Semi-Proximal ADMM for Convex Minimization Problems with One Strongly Convex Block

Min Li*, Defeng Sun† and Kim-Chuan Toh‡

October 29, 2014; Revised: November 28, 2014. Accepted by Asia-Pacific Journal of Operational Research.

Abstract

In this paper, we present a semi-proximal alternating direction method of multipliers (ADMM) for solving 3-block separable convex minimization problems with the second block in the objective being a strongly convex function and one coupled linear equation constraint. By choosing the semi-proximal terms properly, we establish the global convergence of the proposed semi-proximal ADMM for the step-length τ ∈ (0, (1+√5)/2) and the penalty parameter σ ∈ (0, +∞). In particular, if σ > 0 is smaller than a certain threshold and the first and third linear operators in the linear equation constraint are injective, then all three added semi-proximal terms can be dropped and, consequently, the convergent 3-block semi-proximal ADMM reduces to the directly extended 3-block ADMM with τ ∈ (0, (1+√5)/2).

Keywords. Convex minimization problems, alternating direction method of multipliers, semi-proximal, strongly convex.

AMS subject classifications. 90C25, 90C33, 65K05

1 Introduction

We consider the following separable convex minimization problem whose objective function is the sum of three functions without coupled variables:

    min_{x_1,x_2,x_3}  θ_1(x_1) + θ_2(x_2) + θ_3(x_3)
    s.t.  A_1^* x_1 + A_2^* x_2 + A_3^* x_3 = c,  x_i ∈ X_i, i = 1, 2, 3,    (1)

---
* School of Economics and Management, Southeast University, Nanjing, 210096, China (limin@seu.edu.cn). This author was supported by the National Natural Science Foundation of China (Grant No. 11001053), Program for New Century Excellent Talents in University (Grant No. NCET-12-0111) and Qing Lan Project.
† Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (matsundf@nus.edu.sg).
This research was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).
‡ Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (mattohkc@nus.edu.sg). This research was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).

where X_i (i = 1, 2, 3) and Z are real finite-dimensional Euclidean spaces, each equipped with an inner product ⟨·,·⟩ and its induced norm ‖·‖, θ_i : X_i → (−∞, +∞] (i = 1, 2, 3) are closed proper convex functions, A_i^* : X_i → Z is the adjoint of the linear operator A_i : Z → X_i, i = 1, 2, 3, and c ∈ Z. Since θ_i, i = 1, 2, 3, are closed proper convex functions, there exist self-adjoint and positive semi-definite operators Σ_i, i = 1, 2, 3, such that

    ⟨x̂_i − x_i, ŵ_i − w_i⟩ ≥ ⟨x̂_i − x_i, Σ_i(x̂_i − x_i)⟩  ∀ x̂_i, x_i ∈ dom(θ_i), ŵ_i ∈ ∂θ_i(x̂_i), w_i ∈ ∂θ_i(x_i),    (2)

where ∂θ_i is the sub-differential mapping of θ_i, i = 1, 2, 3. The solution set of problem (1) is assumed to be nonempty throughout our discussions in this paper.

Let σ > 0 be a given penalty parameter and z ∈ Z be the Lagrange multiplier associated with the linear equality constraint in problem (1). For any (x_1, x_2, x_3) ∈ X_1 × X_2 × X_3, write x ≡ (x_1, x_2, x_3), θ(x) ≡ θ_1(x_1) + θ_2(x_2) + θ_3(x_3) and A^* x ≡ A_1^* x_1 + A_2^* x_2 + A_3^* x_3. Then the augmented Lagrangian function for problem (1) is defined by

    L_σ(x_1, x_2, x_3; z) := θ(x) + ⟨z, A^* x − c⟩ + (σ/2)‖A^* x − c‖²    (3)

for any (x_1, x_2, x_3, z) ∈ X_1 × X_2 × X_3 × Z. The direct extension of the classical alternating direction method of multipliers (ADMM) for solving problem (1) consists of the following iterations for k = 0, 1, ...:

    x_1^{k+1} := argmin_{x_1 ∈ X_1} L_σ(x_1, x_2^k, x_3^k; z^k),
    x_2^{k+1} := argmin_{x_2 ∈ X_2} L_σ(x_1^{k+1}, x_2, x_3^k; z^k),
    x_3^{k+1} := argmin_{x_3 ∈ X_3} L_σ(x_1^{k+1}, x_2^{k+1}, x_3; z^k),
    z^{k+1} := z^k + τσ(A^* x^{k+1} − c),    (4)

where τ > 0 is the step-length.
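To make scheme (4) concrete, here is a minimal NumPy sketch on a strongly convex quadratic instance with three scalar blocks; each subproblem is solved in closed form. The data and the choices θ_i(x_i) = x_i²/2, σ = 0.03, τ = 1 are our own illustration, not from the paper; σ is picked below the Han–Yuan bound min_i µ_i/(3λ_max(A_iA_i^*)) = 1/27 recalled below, under which scheme (4) with τ = 1 is known to converge.

```python
import numpy as np

# Directly extended 3-block ADMM (4) on the toy problem
#   min (1/2)(x1^2 + x2^2 + x3^2)   s.t.  x1*a1 + x2*a2 + x3*a3 = c,
# with scalar blocks; every subproblem is a scalar quadratic.
a = [np.array([1.0, 1.0, 1.0]),          # A_1^*
     np.array([1.0, 1.0, 2.0]),          # A_2^*
     np.array([1.0, 2.0, 2.0])]          # A_3^*
c = np.array([1.0, 2.0, 3.0])
mu, sigma, tau = 1.0, 0.03, 1.0          # sigma < 1/27, the Han-Yuan bound here

x, z = np.zeros(3), np.zeros(3)
for _ in range(30000):
    for i in range(3):
        # residual with the other blocks held fixed (Gauss-Seidel sweep)
        r = sum(x[j] * a[j] for j in range(3) if j != i) - c
        # argmin_xi of (mu/2)xi^2 + <z, a_i xi + r> + (sigma/2)||a_i xi + r||^2
        x[i] = -a[i] @ (z + sigma * r) / (mu + sigma * (a[i] @ a[i]))
    z += tau * sigma * (sum(x[j] * a[j] for j in range(3)) - c)

# the 3x3 constraint matrix is invertible here, so the unique feasible
# point is the optimal solution
x_star = np.linalg.solve(np.column_stack(a), c)
print(np.max(np.abs(x - x_star)))        # small
```

Replacing the objective by ∑ x_i²/20, setting c = 0 and taking τ = σ = 1 turns the same data into the divergent counterexample of Chen, He, Ye and Yuan [2] revisited in Remark 2.1.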
Different from the 2-block ADMM, whose convergence has long been established [10, 8, 9, 6, 7, 4], the 3-block ADMM may fail to converge in general, as was demonstrated by Chen, He, Ye and Yuan [2] via counterexamples. Nevertheless, if all the functions θ_i, i = 1, 2, 3, are strongly convex, Han and Yuan [11] proved the global convergence of the 3-block ADMM scheme (4) with τ = 1 (Han and Yuan actually considered the general m-block case for any m ≥ 3; here and below we focus on the 3-block case only) under the condition that

    Σ_i = µ_i I ≻ 0, i = 1, 2, 3,  and  0 < σ < min_{i=1,2,3} µ_i / (3 λ_max(A_i A_i^*)),

where λ_max(S) is the largest eigenvalue of a given self-adjoint linear operator S. Hong and Luo [13] proposed to adopt a small step-length τ when updating the Lagrange multiplier z^{k+1} in (4). Chen, Shen and You [3] proposed the following sufficient condition for the global convergence of the directly extended 3-block ADMM with τ = 1 for solving problem (1):

    A_1^* is injective,  Σ_i = µ_i I ≻ 0, i = 2, 3,  and  0 < σ ≤ min{ µ_2 / λ_max(A_2 A_2^*), µ_3 / λ_max(A_3 A_3^*) }.

Closely related to the work of Chen, Shen and You [3], Lin, Ma and Zhang [15] provided an analysis on the iteration complexity of the same method under the condition

    Σ_i = µ_i I ≻ 0, i = 2, 3,  and  0 < σ ≤ min{ µ_2 / (2 λ_max(A_2 A_2^*)), µ_3 / (2 λ_max(A_3 A_3^*)) }.

In [16], under additional assumptions including some smoothness conditions, the same group of authors further proved the global linear convergence of the mentioned method.

The purpose of this work is to extend the 2-block semi-proximal ADMM studied in [5] to deal with problem (1) by only assuming θ_2 to be strongly convex, i.e., Σ_2 ≻ 0. Note that the semi-proximal ADMM with τ > 1 often works better in practice than its counterpart with τ ≤ 1. So it is desirable to establish the convergence of the proposed semi-proximal ADMM for τ in the larger region (0, (1+√5)/2). One of our motivating examples is the following convex quadratic conic programming problem:
    min  (1/2)⟨X, QX⟩ + ⟨C, X⟩
    s.t.  AX ≥ b,  X ∈ K,    (5)

where K is a nonempty closed convex cone in a finite-dimensional real Euclidean space X endowed with an inner product ⟨·,·⟩ and its induced norm ‖·‖, Q : X → X is a self-adjoint and positive semi-definite linear operator, A : X → ℝ^m is a linear map, and C ∈ X and b ∈ ℝ^m are given data. The dual of problem (5) takes the form of

    max  −(1/2)⟨X, QX⟩ + ⟨b, y⟩
    s.t.  A^* y − QX + S = C,  y ≥ 0,  S ∈ K^*,    (6)

where K^* := {v ∈ X : ⟨v, w⟩ ≥ 0 ∀ w ∈ K} is the dual cone of K. Since Q is self-adjoint and positive semi-definite, Q can be decomposed as Q = L^* L for some linear map L. By introducing a new variable Ξ = −LX, we can rewrite problem (6) equivalently as

    min  δ_{ℝ^m_+}(y) − ⟨b, y⟩ + (1/2)‖Ξ‖² + δ_{K^*}(S)
    s.t.  A^* y + L^* Ξ + S = C,    (7)

where δ_{ℝ^m_+}(·) and δ_{K^*}(·) are the indicator functions of ℝ^m_+ and K^*, respectively. As one can see, problem (7) has only one strongly convex block, namely the block with respect to Ξ. Consequently, the results in the aforementioned papers for the convergence analysis of the directly extended 3-block ADMM applied to solving problem (7) are no longer valid. We shall show in the next section that our proposed 3-block semi-proximal ADMM can exactly solve this kind of problems. When K = S^n_+, the cone of symmetric and positive semi-definite matrices in the space S^n of n × n symmetric matrices, problem (7) is a convex quadratic semidefinite programming problem that has been extensively studied both theoretically and numerically in the literature [14, 17, 18, 19, 21, 22, 23, 24, 25, 26], to name only a few.

The remaining parts of this paper are organized as follows. In the next section, we first present our 3-block semi-proximal ADMM and then provide the main convergence results. We give some concluding remarks in the final section.

Notation.

• The effective domain of a function f : X → (−∞, +∞] is defined as dom(f) := {x ∈ X | f(x) < +∞}.
• The set of all relative interior points of a given nonempty convex set C is denoted by ri(C).

• For convenience, for any given x, we use ‖x‖²_G to denote ⟨x, Gx⟩ if G is a self-adjoint linear operator in a given finite-dimensional Euclidean space X. If Σ : X → X is a self-adjoint and positive semi-definite linear operator, we use Σ^{1/2} to denote the unique self-adjoint and positive semi-definite square root of Σ.

• Denote

    x := (x_1; x_2; x_3),  u := (x_2; x_3),  A := (A_1; A_2; A_3),  B := (A_2; A_3).

• Let α ∈ (0, 1] be given. Denote

    M := [ (1−α)Σ_2 + T_2,  0 ;  0,  Σ_3 + T_3 ] + σ B B^*,    (8)

    H := [ (5(1−α)/2)Σ_2 + T_2,  0 ;  0,  (5/2)Σ_3 + T_3 − (5σ²/2α)(A_2 A_3^*)^* Σ_2^{−1} (A_2 A_3^*) ] + min(τ, 1+τ−τ²) σ B B^*.    (9)

2 A 3-Block Semi-Proximal ADMM

Based on our previous introduction and motivation, we propose our 3-block semi-proximal ADMM for solving problem (1) in the following.

Algorithm sPADMM: A 3-block semi-proximal ADMM for solving problem (1).
Let σ ∈ (0, +∞) and τ ∈ (0, +∞) be given parameters. Let T_i, i = 1, 2, 3, be given self-adjoint and positive semi-definite linear operators defined on X_i, i = 1, 2, 3, respectively. Choose (x_1^0, x_2^0, x_3^0, z^0) ∈ dom(θ_1) × dom(θ_2) × dom(θ_3) × Z and set k = 0.

Step 1. Compute

    x_1^{k+1} := argmin_{x_1 ∈ X_1} L_σ(x_1, x_2^k, x_3^k; z^k) + (1/2)‖x_1 − x_1^k‖²_{T_1},
    x_2^{k+1} := argmin_{x_2 ∈ X_2} L_σ(x_1^{k+1}, x_2, x_3^k; z^k) + (1/2)‖x_2 − x_2^k‖²_{T_2},
    x_3^{k+1} := argmin_{x_3 ∈ X_3} L_σ(x_1^{k+1}, x_2^{k+1}, x_3; z^k) + (1/2)‖x_3 − x_3^k‖²_{T_3},
    z^{k+1} := z^k + τσ(A^* x^{k+1} − c).    (10)

Step 2. If a termination criterion is not met, set k := k + 1 and then go to Step 1.

In order to analyze the convergence properties of Algorithm sPADMM, we make the following assumptions.

Assumption 2.1 The convex function θ_2 satisfies (2) with Σ_2 ≻ 0.

Assumption 2.2 The self-adjoint and positive semi-definite operators T_i, i = 1, 2, 3, are chosen such that the sequence {(x_1^k, x_2^k, x_3^k, z^k)} generated by Algorithm sPADMM is well defined.
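The role of the semi-proximal terms in (10) can be illustrated on a single subproblem. The sketch below uses a standard device from the semi-proximal ADMM literature (not a construction taken from this paper), with our own data, for a nonsmooth block θ = ‖·‖_1 and with the multiplier term absorbed into the residual r: choosing T = ηI − σA^T A with η = σλ_max(A^T A) keeps T positive semi-definite and collapses the quadratic part of the subproblem, so the update is a single soft-thresholding step.

```python
import numpy as np

# One semi-proximal subproblem from (10), for a block theta = ||.||_1:
#     min_x ||x||_1 + (sigma/2)||A x + r||^2 + (1/2)||x - x_k||_T^2,
# where A plays the role of A_i^* in matrix form and the multiplier z is
# absorbed into r.  With T = eta*I - sigma*A^T A and
# eta = sigma*lambda_max(A^T A), T is PSD and the quadratic part collapses
# to (eta/2)||x - v||^2 + const, so the exact minimizer is soft-thresholding.
# All data here are our own illustration.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
r = rng.standard_normal(4)
x_k = rng.standard_normal(6)
sigma = 0.7
eta = sigma * np.linalg.eigvalsh(A.T @ A).max()

v = x_k - (sigma / eta) * A.T @ (A @ x_k + r)                # gradient step
x_new = np.sign(v) * np.maximum(np.abs(v) - 1.0 / eta, 0.0)  # prox of ||.||_1/eta

def subproblem(x):                       # the objective above, with T expanded
    d = x - x_k
    return (np.abs(x).sum()
            + 0.5 * sigma * np.linalg.norm(A @ x + r) ** 2
            + 0.5 * (eta * (d @ d) - sigma * np.linalg.norm(A @ d) ** 2))

print(subproblem(x_new) <= subproblem(x_k))   # True: x_new is the minimizer
```

In Algorithm sPADMM, such a T_i is one admissible choice under Assumption 2.2 when a subproblem has no cheap exact solver; T_i = 0 is another whenever the subproblem is already easy.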
Assumption 2.3 There exists x = (x_1, x_2, x_3) ∈ ri(dom(θ_1) × dom(θ_2) × dom(θ_3)) ∩ P, where

    P := { x := (x_1, x_2, x_3) ∈ X_1 × X_2 × X_3 | A^* x = c }.

Under Assumption 2.3, it follows from [20, Corollary 28.2.2] and [20, Corollary 28.3.1] that x̄ = (x̄_1, x̄_2, x̄_3) ∈ X_1 × X_2 × X_3 is an optimal solution to problem (1) if and only if there exists a Lagrange multiplier z̄ ∈ Z such that

    −A_i z̄ ∈ ∂θ_i(x̄_i), i = 1, 2, 3,  and  A^* x̄ − c = 0.    (11)

Moreover, any z̄ ∈ Z satisfying (11) is an optimal solution to the dual of problem (1).

Let x̄ = (x̄_1, x̄_2, x̄_3) ∈ X_1 × X_2 × X_3 and z̄ ∈ Z satisfy (11). For the sake of convenience, define for (x_1, u, z) := (x_1, (x_2, x_3), z) ∈ X_1 × (X_2 × X_3) × Z, α ∈ (0, 1] and k = 0, 1, ..., the following quantities:

    φ_k(x_1, u, z) := (στ)^{−1}‖z^k − z‖² + ‖x_1^k − x_1‖²_{Σ_1+T_1} + ‖u^k − u‖²_M,

    x_{ie}^k := x_i^k − x̄_i, i = 1, 2, 3,  u_e^k := u^k − ū,  z_e^k := z^k − z̄,
    ∆x_i^k := x_i^{k+1} − x_i^k, i = 1, 2, 3,  ∆u^k := u^{k+1} − u^k,  ∆z^k := z^{k+1} − z^k,

    φ_k := φ_k(x̄_1, ū, z̄) = (στ)^{−1}‖z_e^k‖² + ‖x_{1e}^k‖²_{Σ_1+T_1} + ‖u_e^k‖²_M,

    s^{k+1} := ‖∆x_1^k‖²_{(1/2)Σ_1+T_1} + ‖∆x_2^k‖²_{((1−α)/2)Σ_2+T_2} + ‖∆x_3^k‖²_{(1/2)Σ_3+T_3−(σ²/2α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} + σ‖A_1^* x_1^{k+1} + B^* u^k − c‖²,

    ξ_{k+1} := ‖∆x_2^k‖²_{T_2} + ‖∆x_3^k‖²_{T_3+(σ²/α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)},

    t^{k+1} := ‖∆x_1^k‖²_{(1/2)Σ_1+T_1} + ‖∆u^k‖²_H,

    r^k := A^* x^k − c.    (12)

To prove the convergence of Algorithm sPADMM for solving problem (1), we first present some useful lemmas.

Lemma 2.1 Assume that Assumptions 2.1, 2.2 and 2.3 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 0, we have

    φ_k − φ_{k+1} ≥ (1−τ)σ‖r^{k+1}‖² + s^{k+1},    (13)

where φ_k, s^{k+1} and r^{k+1} are defined as in (12).

Proof. The sequence {(x_1^k, x_2^k, x_3^k, z^k)} is well defined under Assumption 2.2. Notice that the iteration scheme (10) of Algorithm sPADMM can be rewritten as, for k = 0, 1, ...,
    −A_1[z^k + σ(A_1^* x_1^{k+1} + ∑_{j=2}^3 A_j^* x_j^k − c)] − T_1(x_1^{k+1} − x_1^k) ∈ ∂θ_1(x_1^{k+1}),
    −A_2[z^k + σ(∑_{j=1}^2 A_j^* x_j^{k+1} + A_3^* x_3^k − c)] − T_2(x_2^{k+1} − x_2^k) ∈ ∂θ_2(x_2^{k+1}),
    −A_3[z^k + σ(A^* x^{k+1} − c)] − T_3(x_3^{k+1} − x_3^k) ∈ ∂θ_3(x_3^{k+1}),
    z^{k+1} := z^k + τσ(A^* x^{k+1} − c).    (14)

Combining (2) with (11) and (14), and using the definitions of x_{ie}^{k+1} and ∆x_i^k, for i = 1, 2, 3, we have

    ⟨x_{ie}^{k+1}, A_i z̄ − A_i z^k − σA_i(∑_{j=1}^i A_j^* x_j^{k+1} + ∑_{j=i+1}^3 A_j^* x_j^k − c) − T_i ∆x_i^k⟩ ≥ ‖x_{ie}^{k+1}‖²_{Σ_i}.    (15)

For any vectors a, b, d in the same Euclidean vector space and any self-adjoint linear operator G, we have the identity

    ⟨a − b, G(d − a)⟩ = (1/2)( ‖d − b‖²_G − ‖a − b‖²_G − ‖a − d‖²_G ).

Taking a = x_i^{k+1}, b = x̄_i, d = x_i^k and G = T_i in the above identity, and using the definitions of x_{ie}^{k+1} and ∆x_i^k, we get

    ⟨x_{ie}^{k+1}, −T_i ∆x_i^k⟩ = (1/2)( ‖x_{ie}^k‖²_{T_i} − ‖x_{ie}^{k+1}‖²_{T_i} − ‖∆x_i^k‖²_{T_i} ),  i = 1, 2, 3.    (16)

Let

    z̃^{k+1} := z^k + σ(A^* x^{k+1} − c) = z^k + σ(A_1^* x_1^{k+1} + B^* u^{k+1} − c).    (17)

Substituting (16) and (17) into (15) and using the definition of ∆x_j^k, for i = 1, 2, we have

    ⟨x_{ie}^{k+1}, A_i z̄ − A_i z̃^{k+1} + σA_i ∑_{j=i+1}^3 A_j^* ∆x_j^k⟩ + (1/2)( ‖x_{ie}^k‖²_{T_i} − ‖x_{ie}^{k+1}‖²_{T_i} ) ≥ (1/2)‖∆x_i^k‖²_{T_i} + ‖x_{ie}^{k+1}‖²_{Σ_i}    (18)

and

    ⟨x_{3e}^{k+1}, A_3 z̄ − A_3 z̃^{k+1}⟩ + (1/2)( ‖x_{3e}^k‖²_{T_3} − ‖x_{3e}^{k+1}‖²_{T_3} ) ≥ (1/2)‖∆x_3^k‖²_{T_3} + ‖x_{3e}^{k+1}‖²_{Σ_3}.    (19)

Adding (18) for i = 1, 2 to (19), we get

    ∑_{i=1}^3 ⟨x_{ie}^{k+1}, A_i z̄ − A_i z̃^{k+1}⟩ + σ⟨x_{1e}^{k+1}, A_1 ∑_{j=2}^3 A_j^* ∆x_j^k⟩ + σ⟨x_{2e}^{k+1}, A_2 A_3^* ∆x_3^k⟩
    + (1/2) ∑_{i=1}^3 ( ‖x_{ie}^k‖²_{T_i} − ‖x_{ie}^{k+1}‖²_{T_i} ) ≥ (1/2) ∑_{i=1}^3 ‖∆x_i^k‖²_{T_i} + ∑_{i=1}^3 ‖x_{ie}^{k+1}‖²_{Σ_i}.    (20)

By simple manipulations and using A_1^* x_{1e}^{k+1} = A_1^* x_1^{k+1} − A_1^* x̄_1 = B^* ū + (A_1^* x_1^{k+1} − c), we get

    σ⟨x_{1e}^{k+1}, A_1 ∑_{j=2}^3 A_j^* ∆x_j^k⟩ = σ⟨A_1^* x_{1e}^{k+1}, B^* u^{k+1} − B^* u^k⟩
    = σ⟨(−B^* ū) − (A_1^* x_1^{k+1} − c), (−B^* u^{k+1}) − (−B^* u^k)⟩.    (21)

For any vectors a, b, d, e in the same Euclidean vector space, we have the identity

    ⟨a − b, d − e⟩ = (1/2)( ‖a − e‖² − ‖a − d‖² ) + (1/2)( ‖b − d‖² − ‖b − e‖² ).
    (22)

In the above identity, taking a = −B^* ū, b = A_1^* x_1^{k+1} − c, d = −B^* u^{k+1} and e = −B^* u^k, and applying it to the right-hand side of (21), we obtain from the definitions of u_e^k and z̃^{k+1} that

    σ⟨x_{1e}^{k+1}, A_1 ∑_{j=2}^3 A_j^* ∆x_j^k⟩
    = (σ/2)( ‖B^* u_e^k‖² − ‖B^* u_e^{k+1}‖² ) + (σ/2)( ‖A_1^* x_1^{k+1} + B^* u^{k+1} − c‖² − ‖A_1^* x_1^{k+1} + B^* u^k − c‖² )
    = (σ/2)( ‖B^* u_e^k‖² − ‖B^* u_e^{k+1}‖² ) + (1/2σ)‖z^k − z̃^{k+1}‖² − (σ/2)‖A_1^* x_1^{k+1} + B^* u^k − c‖².    (23)

Using the Cauchy–Schwarz inequality, for the parameter α ∈ (0, 1], we get

    σ⟨x_{2e}^{k+1}, A_2 A_3^* ∆x_3^k⟩ = ⟨(αΣ_2)^{1/2} x_{2e}^{k+1}, σ(αΣ_2)^{−1/2} A_2 A_3^* ∆x_3^k⟩
    ≤ α‖x_{2e}^{k+1}‖²_{Σ_2} + (σ²/4α)‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)}.    (24)

It follows from (17) that

    ∑_{i=1}^3 ⟨x_{ie}^{k+1}, A_i z̄ − A_i z̃^{k+1}⟩ = ⟨z̄ − z̃^{k+1}, ∑_{i=1}^3 A_i^* x_{ie}^{k+1}⟩ = (1/σ)⟨z̄ − z̃^{k+1}, z̃^{k+1} − z^k⟩.    (25)

Substituting (23), (24) and (25) into (20), we obtain

    (1/σ)⟨z̄ − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/2σ)‖z^k − z̃^{k+1}‖² + (σ/2)( ‖B^* u_e^k‖² − ‖B^* u_e^{k+1}‖² )
    + (1/2) ∑_{i=1}^3 ( ‖x_{ie}^k‖²_{T_i} − ‖x_{ie}^{k+1}‖²_{T_i} )
    ≥ (1/2) ∑_{i=1}^3 ‖∆x_i^k‖²_{T_i} + (σ/2)‖A_1^* x_1^{k+1} + B^* u^k − c‖²
    + ∑_{i=1, i≠2}^3 ‖x_{ie}^{k+1}‖²_{Σ_i} + (1−α)‖x_{2e}^{k+1}‖²_{Σ_2} − (σ²/4α)‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)}.    (26)

From the elementary inequality ‖a‖² + ‖b‖² ≥ ‖a − b‖²/2 and x_{ie}^{k+1} − x_{ie}^k = ∆x_i^k, it follows that

    ∑_{i=1, i≠2}^3 ‖x_{ie}^{k+1}‖²_{Σ_i} + (1−α)‖x_{2e}^{k+1}‖²_{Σ_2}
    = (1/2) ∑_{i=1, i≠2}^3 ( ‖x_{ie}^{k+1}‖²_{Σ_i} + ‖x_{ie}^k‖²_{Σ_i} ) + (1/2) ∑_{i=1, i≠2}^3 ( ‖x_{ie}^{k+1}‖²_{Σ_i} − ‖x_{ie}^k‖²_{Σ_i} )
    + ((1−α)/2)( ‖x_{2e}^{k+1}‖²_{Σ_2} + ‖x_{2e}^k‖²_{Σ_2} ) + ((1−α)/2)( ‖x_{2e}^{k+1}‖²_{Σ_2} − ‖x_{2e}^k‖²_{Σ_2} )
    ≥ (1/4) ∑_{i=1, i≠2}^3 ‖∆x_i^k‖²_{Σ_i} + ((1−α)/4)‖∆x_2^k‖²_{Σ_2}
    + (1/2) ∑_{i=1, i≠2}^3 ( ‖x_{ie}^{k+1}‖²_{Σ_i} − ‖x_{ie}^k‖²_{Σ_i} ) + ((1−α)/2)( ‖x_{2e}^{k+1}‖²_{Σ_2} − ‖x_{2e}^k‖²_{Σ_2} ).    (27)

By simple manipulations and using the definition of z_e^k, we get

    (1/σ)⟨z̄ − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/2σ)‖z^k − z̃^{k+1}‖²
    = (1/σ)⟨z̄ − z^k, z̃^{k+1} − z^k⟩ + (1/σ)⟨z^k − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/2σ)‖z^k − z̃^{k+1}‖²
    = −(1/σ)⟨z_e^k, z̃^{k+1} − z^k⟩ − (1/2σ)‖z^k − z̃^{k+1}‖²
    = (1/2στ)( ‖z_e^k‖² − ‖z_e^k + τ(z̃^{k+1} − z^k)‖² ) + ((τ−1)/2σ)‖z^k − z̃^{k+1}‖².
    (28)

By using (14), (17) and the definitions of z_e^k and r^{k+1}, we have

    z_e^{k+1} = z_e^k + τ(z̃^{k+1} − z^k)  and  z^k − z̃^{k+1} = −σ r^{k+1},

which, together with (28), imply

    (1/σ)⟨z̄ − z̃^{k+1}, z̃^{k+1} − z^k⟩ + (1/2σ)‖z^k − z̃^{k+1}‖² = (1/2στ)( ‖z_e^k‖² − ‖z_e^{k+1}‖² ) + ((τ−1)σ/2)‖r^{k+1}‖².    (29)

Substituting (27) and (29) into (26), and using the definitions of φ_k, s^{k+1} and r^{k+1}, we get the assertion (13). The proof is complete.

Lemma 2.2 Assume that Assumptions 2.1 and 2.2 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 1, we have

    −σ⟨B^* ∆u^k, r^{k+1}⟩ ≥ −(1−τ)σ⟨B^* ∆u^k, r^k⟩ + (1/2) ∑_{i=2}^3 ( ‖∆x_i^k‖²_{T_i+2Σ_i} − ‖∆x_i^{k−1}‖²_{T_i} )
    + σ⟨A_2^* ∆x_2^k, A_3^* (∆x_3^{k−1} − ∆x_3^k)⟩,    (30)

where ∆u^k, ∆x_i^k (i = 2, 3) and r^{k+1} are defined as in (12).

Proof. Let

    v^{k+1} := z^k + σ( ∑_{j=1}^2 A_j^* x_j^{k+1} + A_3^* x_3^k − c ).

By using (14) and the definition of ∆x_2^k, we have

    −A_2 v^{k+1} − T_2 ∆x_2^k ∈ ∂θ_2(x_2^{k+1})  and  −A_2 v^k − T_2 ∆x_2^{k−1} ∈ ∂θ_2(x_2^k).

Thus, we obtain from (2) that

    ⟨∆x_2^k, (A_2 v^k + T_2 ∆x_2^{k−1}) − (A_2 v^{k+1} + T_2 ∆x_2^k)⟩ ≥ ‖∆x_2^k‖²_{Σ_2}.

By using the Cauchy–Schwarz inequality, we obtain

    ⟨∆x_2^k, T_2(∆x_2^k − ∆x_2^{k−1})⟩ = ‖∆x_2^k‖²_{T_2} − ⟨∆x_2^k, T_2 ∆x_2^{k−1}⟩ ≥ (1/2)‖∆x_2^k‖²_{T_2} − (1/2)‖∆x_2^{k−1}‖²_{T_2}.

Adding up the above two inequalities, we get

    ⟨A_2^* ∆x_2^k, v^k − v^{k+1}⟩ ≥ (1/2)‖∆x_2^k‖²_{T_2+2Σ_2} − (1/2)‖∆x_2^{k−1}‖²_{T_2}.    (31)

Using z^{k−1} − z^k = −τσ r^k and the definitions of v^k and r^k, we have

    v^k − v^{k+1} = (1−τ)σ r^k − σ r^{k+1} − σ A_3^* (∆x_3^{k−1} − ∆x_3^k).

Substituting the above equation into (31), we get

    −σ⟨A_2^* ∆x_2^k, r^{k+1}⟩ ≥ −(1−τ)σ⟨A_2^* ∆x_2^k, r^k⟩ + σ⟨A_2^* ∆x_2^k, A_3^* (∆x_3^{k−1} − ∆x_3^k)⟩
    + (1/2)( ‖∆x_2^k‖²_{T_2+2Σ_2} − ‖∆x_2^{k−1}‖²_{T_2} ).    (32)

Similarly to the derivation of (32), we can obtain

    −σ⟨A_3^* ∆x_3^k, r^{k+1}⟩ ≥ −(1−τ)σ⟨A_3^* ∆x_3^k, r^k⟩ + (1/2)( ‖∆x_3^k‖²_{T_3+2Σ_3} − ‖∆x_3^{k−1}‖²_{T_3} ).

Adding up the above inequality and (32), and using the definitions of B^* and u, we get the assertion (30). The proof is complete.

Lemma 2.3 Assume that Assumptions 2.1 and 2.2 hold.
Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 1, we have

    (1−τ)σ‖r^{k+1}‖² + s^{k+1} ≥ t^{k+1} + max(1−τ, 1−τ^{−1})σ( ‖r^{k+1}‖² − ‖r^k‖² )
    + min(τ, 1+τ−τ²)στ^{−1}‖r^{k+1}‖² + (ξ_{k+1} − ξ_k),    (33)

where s^{k+1}, t^{k+1}, ξ_{k+1} and r^{k+1} are defined as in (12).

Proof. By simple manipulations and using the definition of r^{k+1}, we obtain

    ‖A_1^* x_1^{k+1} + B^* u^k − c‖² = ‖r^{k+1} − B^* ∆u^k‖² = ‖r^{k+1}‖² − 2⟨B^* ∆u^k, r^{k+1}⟩ + ‖B^* ∆u^k‖².    (34)

It follows from (30) and (34) that

    (1−τ)σ‖r^{k+1}‖² + σ‖A_1^* x_1^{k+1} + B^* u^k − c‖²
    ≥ σ‖B^* ∆u^k‖² + (2−τ)σ‖r^{k+1}‖² − 2(1−τ)σ⟨B^* ∆u^k, r^k⟩ + 2σ⟨A_2^* ∆x_2^k, A_3^* (∆x_3^{k−1} − ∆x_3^k)⟩
    + ∑_{i=2}^3 ( ‖∆x_i^k‖²_{T_i+2Σ_i} − ‖∆x_i^{k−1}‖²_{T_i} ).    (35)

By the Cauchy–Schwarz inequality, for the parameter α ∈ (0, 1], we have

    2σ⟨A_2^* ∆x_2^k, A_3^* (∆x_3^{k−1} − ∆x_3^k)⟩
    = 2⟨(αΣ_2)^{1/2} ∆x_2^k, σ(αΣ_2)^{−1/2} (A_2 A_3^*) ∆x_3^{k−1}⟩ − 2⟨(αΣ_2)^{1/2} ∆x_2^k, σ(αΣ_2)^{−1/2} (A_2 A_3^*) ∆x_3^k⟩
    ≥ −α‖∆x_2^k‖²_{Σ_2} − (σ²/α)‖∆x_3^{k−1}‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} − α‖∆x_2^k‖²_{Σ_2} − (σ²/α)‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)}
    = −2α‖∆x_2^k‖²_{Σ_2} − (σ²/α)( ‖∆x_3^{k−1}‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} + ‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} ).

Substituting the above inequality into (35), we get

    (1−τ)σ‖r^{k+1}‖² + σ‖A_1^* x_1^{k+1} + B^* u^k − c‖²
    ≥ σ‖B^* ∆u^k‖² + (2−τ)σ‖r^{k+1}‖² − 2(1−τ)σ⟨B^* ∆u^k, r^k⟩
    + ( ‖∆x_3^k‖²_{T_3+(σ²/α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} − ‖∆x_3^{k−1}‖²_{T_3+(σ²/α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} )
    + ( ‖∆x_2^k‖²_{T_2} − ‖∆x_2^{k−1}‖²_{T_2} ) + 2(1−α)‖∆x_2^k‖²_{Σ_2} + 2‖∆x_3^k‖²_{Σ_3} − (2σ²/α)‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)}.    (36)

By using the definitions of s^{k+1} and t^{k+1}, and the fact that

    ‖∆u^k‖²_H = ‖∆x_2^k‖²_{(5(1−α)/2)Σ_2+T_2} + ‖∆x_3^k‖²_{(5/2)Σ_3+T_3−(5σ²/2α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)} + min(τ, 1+τ−τ²)σ‖B^* ∆u^k‖²,

we have

    2(1−α)‖∆x_2^k‖²_{Σ_2} + 2‖∆x_3^k‖²_{Σ_3} − (2σ²/α)‖∆x_3^k‖²_{(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*)}
    = −s^{k+1} + t^{k+1} − min(τ, 1+τ−τ²)σ‖B^* ∆u^k‖² + σ‖A_1^* x_1^{k+1} + B^* u^k − c‖².

Substituting the above equation into (36) and using the definition of ξ_{k+1}, we get

    (1−τ)σ‖r^{k+1}‖² + s^{k+1} − t^{k+1} + min(τ, 1+τ−τ²)σ‖B^* ∆u^k‖²
    ≥ σ‖B^* ∆u^k‖² + (2−τ)σ‖r^{k+1}‖² − 2(1−τ)σ⟨B^* ∆u^k, r^k⟩ + (ξ_{k+1} − ξ_k).    (37)

By using the Cauchy–Schwarz inequality, we get

    −2(1−τ)σ⟨B^* ∆u^k, r^k⟩ ≥ −(1−τ)σ‖B^* ∆u^k‖² − (1−τ)σ‖r^k‖²  if τ ∈ (0, 1],
    −2(1−τ)σ⟨B^* ∆u^k, r^k⟩ ≥ (1−τ)τσ‖B^* ∆u^k‖² + ((1−τ)σ/τ)‖r^k‖²  if τ ∈ (1, +∞).    (38)

Substituting (38) into (37), we obtain from simple manipulations that

    (1−τ)σ‖r^{k+1}‖² + s^{k+1} − t^{k+1} + min(τ, 1+τ−τ²)σ‖B^* ∆u^k‖²
    ≥ max(1−τ, 1−τ^{−1})σ( ‖r^{k+1}‖² − ‖r^k‖² ) + min(τ, 1+τ−τ²)σ( τ^{−1}‖r^{k+1}‖² + ‖B^* ∆u^k‖² ) + (ξ_{k+1} − ξ_k).

The assertion (33) follows immediately. The proof is complete.

Now, we are ready to prove the convergence of the sequence {(x_1^k, x_2^k, x_3^k, z^k)} generated by Algorithm sPADMM.

Theorem 2.1 Assume that Assumptions 2.1, 2.2 and 2.3 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 1, we have

    ( φ_k + max(1−τ, 1−τ^{−1})σ‖r^k‖² + ξ_k ) − ( φ_{k+1} + max(1−τ, 1−τ^{−1})σ‖r^{k+1}‖² + ξ_{k+1} )
    ≥ t^{k+1} + min(τ, 1+τ−τ²)στ^{−1}‖r^{k+1}‖²,    (39)

where φ_k, ξ_{k+1}, t^{k+1} and r^k are defined as in (12). Assume that τ ∈ (0, (1+√5)/2). If for some α ∈ (0, 1] it holds that

    (1/2)Σ_1 + T_1 + σA_1A_1^* ≻ 0,  H ≻ 0  and  M ≻ 0,    (40)

then the whole sequence {(x_1^k, x_2^k, x_3^k)} converges to an optimal solution to problem (1) and {z^k} converges to an optimal solution to the dual of problem (1).

Proof. By substituting (33) into (13), we easily get (39). Assume that τ ∈ (0, (1+√5)/2). Since (40) holds for some α ∈ (0, 1], we have min(τ, 1+τ−τ²) > 0, H ≻ 0 and M ≻ 0. From (39), we see immediately that the sequence {φ_{k+1}} is bounded, lim_{k→∞} t^{k+1} = 0 and lim_{k→∞} r^{k+1} = 0, i.e.,

    lim_{k→∞} ‖∆x_1^k‖²_{(1/2)Σ_1+T_1} = 0,  lim_{k→∞} ‖∆u^k‖²_H = 0,  lim_{k→∞} r^{k+1} = lim_{k→∞} (τσ)^{−1} ∆z^k = 0.
    (41)

Since H ≻ 0, we also have

    lim_{k→∞} ∆x_2^k = 0,  lim_{k→∞} ∆x_3^k = 0,    (42)

and thus

    ‖A_1^* ∆x_1^k‖ = ‖r^{k+1} − r^k − ∑_{j=2}^3 A_j^* ∆x_j^k‖ ≤ ‖r^{k+1}‖ + ‖r^k‖ + ∑_{j=2}^3 ‖A_j^* ∆x_j^k‖ → 0    (43)

as k → ∞. Now from (41) and (43), we obtain

    lim_{k→∞} ‖∆x_1^k‖²_{(1/2)Σ_1+T_1+σA_1A_1^*} = lim_{k→∞} ( ‖∆x_1^k‖²_{(1/2)Σ_1+T_1} + σ‖A_1^* ∆x_1^k‖² ) = 0.    (44)

Recall that (1/2)Σ_1 + T_1 + σA_1A_1^* ≻ 0. Thus it follows from (44) that

    lim_{k→∞} ∆x_1^k = 0.    (45)

By the definition of φ_{k+1}, we see that the three sequences {‖z_e^{k+1}‖}, {‖x_{1e}^{k+1}‖_{Σ_1+T_1}} and {‖u_e^{k+1}‖_M} are all bounded. Since M ≻ 0, the sequences {x_2^{k+1}} and {x_3^{k+1}} are also bounded. Furthermore, by using

    ‖A_1^* x_{1e}^{k+1}‖ = ‖A^* x^{k+1} − A^* x̄ − B^* u_e^{k+1}‖ ≤ ‖r^{k+1}‖ + ‖B^* u_e^{k+1}‖,    (46)

we also know that the sequence {‖x_{1e}^{k+1}‖_{(Σ_1+T_1+σA_1A_1^*)}} is bounded, and so is the sequence {x_1^{k+1}}, as the operator Σ_1 + T_1 + σA_1A_1^* ⪰ (1/2)Σ_1 + T_1 + σA_1A_1^* ≻ 0. Thus, the sequence {(x_1^k, x_2^k, x_3^k, z^k)} is bounded.

Since the sequence {(x_1^k, x_2^k, x_3^k, z^k)} is bounded, there is a subsequence {(x_1^{k_i}, x_2^{k_i}, x_3^{k_i}, z^{k_i})} which converges to a cluster point, say (x_1^∞, x_2^∞, x_3^∞, z^∞). Taking limits on both sides of (14) along the subsequence {(x_1^{k_i}, x_2^{k_i}, x_3^{k_i}, z^{k_i})}, and using (41), (42) and (45), we obtain

    −A_j z^∞ ∈ ∂θ_j(x_j^∞), j = 1, 2, 3,  and  A^* x^∞ − c = 0,

i.e., (x_1^∞, x_2^∞, x_3^∞, z^∞) satisfies (11). Thus (x_1^∞, x_2^∞, x_3^∞) is an optimal solution to (1) and z^∞ is an optimal solution to the dual of problem (1).

To complete the proof, we next show that (x_1^∞, x_2^∞, x_3^∞, z^∞) is actually the unique limit of {(x_1^k, x_2^k, x_3^k, z^k)}. Replacing (x̄_1, ū, z̄) := (x̄_1, (x̄_2, x̄_3), z̄) by (x_1^∞, u^∞, z^∞) := (x_1^∞, (x_2^∞, x_3^∞), z^∞) in (39), for any integer k ≥ k_i, we have

    φ_{k+1}(x_1^∞, u^∞, z^∞) + max(1−τ, 1−τ^{−1})σ‖r^{k+1}‖² + ξ_{k+1}
    ≤ φ_{k_i}(x_1^∞, u^∞, z^∞) + max(1−τ, 1−τ^{−1})σ‖r^{k_i}‖² + ξ_{k_i}.
    (47)

Note that

    lim_{i→∞} ( φ_{k_i}(x_1^∞, u^∞, z^∞) + max(1−τ, 1−τ^{−1})σ‖r^{k_i}‖² + ξ_{k_i} ) = 0.

Therefore, from (47) we get lim_{k→∞} φ_{k+1}(x_1^∞, u^∞, z^∞) = 0, i.e.,

    lim_{k→∞} ( (στ)^{−1}‖z^{k+1} − z^∞‖² + ‖x_1^{k+1} − x_1^∞‖²_{Σ_1+T_1} + ‖u^{k+1} − u^∞‖²_M ) = 0.

Since M ≻ 0, we also have lim_{k→∞} u^k = u^∞, that is, lim_{k→∞} x_2^k = x_2^∞ and lim_{k→∞} x_3^k = x_3^∞. Using the fact that lim_{k→∞} r^{k+1} = 0 and lim_{k→∞} (u^{k+1} − u^∞) = 0, we get from (46) that lim_{k→∞} A_1^*(x_1^{k+1} − x_1^∞) = 0. Thus

    lim_{k→∞} ‖x_1^{k+1} − x_1^∞‖²_{Σ_1+T_1+σA_1A_1^*} = 0.

Since Σ_1 + T_1 + σA_1A_1^* ≻ 0, we also obtain lim_{k→∞} x_1^k = x_1^∞. Therefore, we have shown that the sequence {(x_1^k, x_2^k, x_3^k)} converges to an optimal solution to (1) and {z^k} converges to an optimal solution to the dual of problem (1) for any τ ∈ (0, (1+√5)/2). The proof is complete.

Remark 2.1 Assume that (1−α)Σ_2 + σA_2A_2^* is invertible for some α ∈ (0, 1]. Set τ = 1 (the case 1 ≠ τ ∈ (0, (1+√5)/2) can be discussed in a similar but slightly more complicated manner) and T_2 = 0 in (8) and (9). Then the assumptions H ≻ 0 and M ≻ 0 in (40) reduce to

    [ (5(1−α)/2)Σ_2 + σA_2A_2^*,  σA_2A_3^* ;  σA_3A_2^*,  (5/2)Σ_3 + T_3 + σA_3A_3^* − (5σ²/2α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*) ] ≻ 0

and

    [ (1−α)Σ_2 + σA_2A_2^*,  σA_2A_3^* ;  σA_3A_2^*,  Σ_3 + T_3 + σA_3A_3^* ] ≻ 0,

which are, respectively, equivalent to

    (5(1−α)/2)Σ_2 + σA_2A_2^* ≻ 0  and  (5/2)Σ_3 + T_3 + σA_3A_3^* − (5σ²/2α)(A_2A_3^*)^*Σ_2^{−1}(A_2A_3^*) − σ²(A_3A_2^*)( (5(1−α)/2)Σ_2 + σA_2A_2^* )^{−1}(A_2A_3^*) ≻ 0    (48)

and

    Σ_3 + T_3 + σA_3A_3^* − σ²(A_3A_2^*)( (1−α)Σ_2 + σA_2A_2^* )^{−1}(A_2A_3^*) ≻ 0    (49)

in terms of the Schur complement. The conditions (48) and (49) can be satisfied easily by choosing a proper T_3 for given α ∈ (0, 1] and σ ∈ (0, +∞). Evidently, with a fixed α, T_3 can take a smaller value with a smaller σ, and T_3 can even be taken to be the zero operator for any σ > 0 smaller than a certain threshold if Σ_3 + (1−α)σA_3A_3^* ≻ 0. To see this, let us consider the following example constructed in [2]:
    min  (1/20)x_1² + (1/20)x_2² + (1/20)x_3²
    s.t.  x_1 (1, 1, 1)^T + x_2 (1, 1, 2)^T + x_3 (1, 2, 2)^T = 0,    (50)

which is a convex minimization problem with three strongly convex functions. In [2], Chen, He, Ye and Yuan showed that the directly extended 3-block ADMM scheme (4) with τ = σ = 1 applied to problem (50) is divergent. For problem (50), Σ_1 = Σ_2 = Σ_3 = 1/10, A_1 = (1, 1, 1), A_2 = (1, 1, 2) and A_3 = (1, 2, 2). From (48) and (49), by taking α = 1, we have that T_3 and σ should satisfy the following conditions:

    1/4 + T_3 − 1225σ² + (5/6)σ > 0  and  1/10 + T_3 + (5/6)σ > 0,

which hold true, in particular, if T_3 = 0 and σ < (1+√1765)/2940 ≈ 0.015, or if σ = 1 and T_3 > 14687/12 ≈ 1223.92.

Remark 2.2 If A_2^* is vacuous, then for any integer k ≥ 0 we have x_2^{k+1} = x_2^0 = x̄_2, the 3-block sPADMM is just a 2-block sPADMM, and condition (40) reduces to

    (1/2)Σ_1 + T_1 + σA_1A_1^* ≻ 0,  (5/2)Σ_3 + T_3 + min(τ, 1+τ−τ²)σA_3A_3^* ≻ 0  and  Σ_3 + T_3 + σA_3A_3^* ≻ 0,

which is equivalent to

    Σ_1 + T_1 + σA_1A_1^* ≻ 0  and  Σ_3 + T_3 + σA_3A_3^* ≻ 0    (51)

since Σ_1 ⪰ 0, T_1 ⪰ 0, Σ_3 ⪰ 0 and T_3 ⪰ 0. Condition (51) is exactly the same as the one used in Theorem B.1 in [5].

3 Conclusions

In this paper, we provided a convergence analysis of a 3-block semi-proximal ADMM for solving separable convex minimization problems under the condition that the second block in the objective is strongly convex.¹ The step-length τ in our proposed semi-proximal ADMM is allowed to stay in the desirable region (0, (1+√5)/2). From Remark 2.1, we know that with a fixed parameter α ∈ (0, 1], the added semi-proximal terms can be chosen to be small if the penalty parameter σ is small. If A_1^* and A_3^* are both injective and σ > 0 is taken to be smaller than a certain threshold, then the convergent 3-block semi-proximal ADMM includes the directly extended 3-block ADMM with τ ∈ (0, (1+√5)/2) by taking T_i, i = 1, 2, 3, to be zero operators.
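The threshold behaviour just described can be checked numerically for example (50). The following sketch is our own sanity check, not part of the paper's analysis: it evaluates the two scalar conditions of Remark 2.1 at α = 1, τ = 1, T_2 = 0, and also assembles M and H from (8)-(9) as 2 × 2 matrices to test condition (40) directly.

```python
import numpy as np

# Data of example (50): theta_i(x_i) = x_i^2/20, so Sigma_i = 1/10.
a1 = np.array([1.0, 1.0, 1.0])   # A_1^*
a2 = np.array([1.0, 1.0, 2.0])   # A_2^*
a3 = np.array([1.0, 2.0, 2.0])   # A_3^*
Sigma = 0.1

def remark21(sigma, T3=0.0):
    # the two scalar conditions of Remark 2.1 at alpha = 1, tau = 1, T2 = 0
    c48 = 0.25 + T3 - 1225.0 * sigma**2 + (5.0 / 6.0) * sigma
    c49 = 0.1 + T3 + (5.0 / 6.0) * sigma
    return c48 > 0 and c49 > 0

def condition40(sigma, T3=0.0, alpha=1.0, tau=1.0):
    BBt = np.array([[a2 @ a2, a2 @ a3],
                    [a3 @ a2, a3 @ a3]])                  # B B^* on R^2
    M = np.diag([(1 - alpha) * Sigma, Sigma + T3]) + sigma * BBt        # (8)
    H = (np.diag([2.5 * (1 - alpha) * Sigma,
                  2.5 * Sigma + T3
                  - 2.5 * sigma**2 * (a2 @ a3)**2 / (alpha * Sigma)])
         + min(tau, 1 + tau - tau**2) * sigma * BBt)                    # (9)
    return (0.5 * Sigma + sigma * (a1 @ a1) > 0
            and np.linalg.eigvalsh(M).min() > 0
            and np.linalg.eigvalsh(H).min() > 0)

threshold = (1.0 + np.sqrt(1765.0)) / 2940.0              # ~ 0.015
print(remark21(0.9 * threshold), condition40(0.9 * threshold))   # True True
print(remark21(1.1 * threshold), condition40(1.1 * threshold))   # False False
print(remark21(1.0, T3=14687.0 / 12.0 + 1.0))                    # True
```

For this data, det H = 6σ · (1/4 + T_3 − 1225σ² + (5/6)σ) at α = τ = 1, so the matrix test and the scalar condition (48) flip sign at the same σ, as expected.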
With not much difficulty, one could extend our 3-block semi-proximal ADMM to deal with m-block (m ≥ 4) separable convex minimization problems possessing m − 2 strongly convex blocks and provide the iteration complexity analysis for the corresponding algorithm in the sense of [12]. In this work, we choose not to do this extension because we are not aware of interesting applications of m-block (m ≥ 4) separable convex minimization problems with m − 2 strongly convex blocks. While our sufficient condition bounding the range of values for σ and T_3 is quite flexible, it may have one potential limitation: T_3 can be very large if σ is not small, as shown in Remark 2.1. Since a larger T_3 can potentially make the algorithm converge more slowly, in our future research we shall first study how this limitation can be circumvented before we study other important issues such as the iteration complexity.²

¹ One can prove similar results if the third instead of the second block is strongly convex.
² In a recent report [1], Cai, Han and Yuan independently proved a result similar to Theorem 2.1 for the directly extended ADMM (i.e., all three semi-proximal terms T_1, T_2 and T_3 disappear) with τ = 1 and provided an analysis on the iteration complexity.

References

[1] X. J. Cai, D. R. Han and X. M. Yuan, The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex, Optimization Online, (2014).

[2] C. H. Chen, B. S. He, Y. Y. Ye and X. M. Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Mathematical Programming, Ser. A, DOI 10.1007/s10107-014-0826-5, (2014).

[3] C. H. Chen, Y. Shen and Y. F. You, On the convergence analysis of the alternating direction method of multipliers with three blocks, Abstract and Applied Analysis, 2013 (2013), Article ID 183961, 7 pages.

[4] J. Eckstein and D. P.
Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, 55 (1992), pp. 293–318.

[5] M. Fazel, T. K. Pong, D. F. Sun and P. Tseng, Hankel matrix rank minimization with applications to system identification and realization, SIAM Journal on Matrix Analysis and Applications, 34(3) (2013), pp. 946–977.

[6] M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary Value Problems, vol. 15 of Studies in Mathematics and its Applications, North-Holland Publishing Co., Amsterdam, 1983. Translated from the French by B. Hunt and D. C. Spicer.

[7] D. Gabay, Applications of the method of multipliers to variational inequalities, in Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, M. Fortin and R. Glowinski, eds., vol. 15 of Studies in Mathematics and Its Applications, Elsevier, (1983), pp. 299–331.

[8] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers & Mathematics with Applications, 2 (1976), pp. 17–40.

[9] R. Glowinski, Lectures on Numerical Methods for Nonlinear Variational Problems, vol. 65 of Tata Institute of Fundamental Research Lectures on Mathematics and Physics, Tata Institute of Fundamental Research, Bombay, 1980. Notes by M. G. Vijayasundaram and M. Adimurthi.

[10] R. Glowinski and A. Marrocco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue Française d'Automatique, Informatique et Recherche Opérationnelle, 9 (1975), pp. 41–76.

[11] D. R. Han and X. M. Yuan, A note on the alternating direction method of multipliers, Journal of Optimization Theory and Applications, 155(1) (2012), pp. 227–238.

[12] B. S. He and X. M.
Yuan, On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method, SIAM Journal on Numerical Analysis, 50 (2012), pp. 700–709.

[13] M. Y. Hong and Z. Q. Luo, On the linear convergence of the alternating direction method of multipliers, arXiv:1208.3922v3, 2013.

[14] X. D. Li, D. F. Sun and K.-C. Toh, A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions, accepted by Mathematical Programming, (2014).

[15] T. Y. Lin, S. Q. Ma and S. Z. Zhang, On the convergence rate of multi-block ADMM, arXiv:1408.4265v1, (2014).

[16] T. Y. Lin, S. Q. Ma and S. Z. Zhang, On the global linear convergence of the ADMM with multi-block variables, arXiv:1408.4266v1, (2014).

[17] J. W. Nie and Y. X. Yuan, A predictor-corrector algorithm for QSDP combining Dikin-type and Newton centering steps, Annals of Operations Research, 103 (2001), pp. 115–133.

[18] H. D. Qi, Local duality of nonlinear semidefinite programming, Mathematics of Operations Research, 34(1) (2009), pp. 124–141.

[19] H. D. Qi and D. F. Sun, An augmented Lagrangian dual approach for the H-weighted nearest correlation matrix problem, IMA Journal of Numerical Analysis, 31 (2011), pp. 491–511.

[20] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.

[21] D. F. Sun, The strong second order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications, Mathematics of Operations Research, 31 (2006), pp. 761–776.

[22] D. F. Sun, J. Sun and L. W. Zhang, The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming, Mathematical Programming, 114 (2008), pp. 349–391.

[23] J. Sun and S. Zhang, A modified alternating direction method for convex quadratically constrained quadratic semidefinite programs, European Journal of Operational Research, 207 (2010), pp. 1210–1220.

[24] K.-C.
Toh, An inexact primal-dual path-following algorithm for convex quadratic SDP, Mathematical Programming, 112 (2008), pp. 221–254.

[25] K.-C. Toh, R. H. Tütüncü and M. J. Todd, Inexact primal-dual path-following algorithms for a special class of convex quadratic SDP and related problems, Pacific Journal of Optimization, 3 (2007), pp. 135–164.

[26] X. Y. Zhao, A semismooth Newton-CG augmented Lagrangian method for large scale linear and convex quadratic SDPs, PhD thesis, Department of Mathematics, National University of Singapore, 2009.
