A Convergent 3-Block Semi-Proximal ADMM for Convex
Minimization Problems with One Strongly Convex Block
Min Li∗, Defeng Sun† and Kim-Chuan Toh‡
October 29, 2014; Revised: November 28, 2014
Accepted by Asia-Pacific Journal of Operational Research
Abstract
In this paper, we present a semi-proximal alternating direction method of multipliers (ADMM) for solving 3-block separable convex minimization problems with the second block in the objective being a strongly convex function and one coupled linear equation constraint. By choosing the semi-proximal terms properly, we establish the global convergence of the proposed semi-proximal ADMM for the step-length τ ∈ (0, (1 + √5)/2) and the penalty parameter σ ∈ (0, +∞). In particular, if σ > 0 is smaller than a certain threshold and the first and third linear operators in the linear equation constraint are injective, then all the three added semi-proximal terms can be dropped and, consequently, the convergent 3-block semi-proximal ADMM reduces to the directly extended 3-block ADMM with τ ∈ (0, (1 + √5)/2).

Keywords. Convex minimization problems, alternating direction method of multipliers, semi-proximal, strongly convex.

AMS subject classifications. 90C25, 90C33, 65K05
1 Introduction
We consider the following separable convex minimization problem whose objective function is the
sum of three functions without coupled variables:
$$
\min_{x_1, x_2, x_3} \Big\{ \theta_1(x_1) + \theta_2(x_2) + \theta_3(x_3) \;\Big|\; A_1^* x_1 + A_2^* x_2 + A_3^* x_3 = c, \ x_i \in \mathcal{X}_i, \ i = 1, 2, 3 \Big\}, \tag{1}
$$
∗ School of Economics and Management, Southeast University, Nanjing, 210096, China (limin@seu.edu.cn). This author was supported by the National Natural Science Foundation of China (Grant No. 11001053), the Program for New Century Excellent Talents in University (Grant No. NCET-12-0111) and the Qing Lan Project.
† Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (matsundf@nus.edu.sg). This research was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).
‡ Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore (mattohkc@nus.edu.sg). This research was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).
where Xi (i = 1, 2, 3) and Z are real finite dimensional Euclidean spaces each equipped with an inner product ⟨·, ·⟩ and its induced norm ‖·‖, θi : Xi → (−∞, +∞] (i = 1, 2, 3) are closed proper convex functions, A∗i : Xi → Z is the adjoint of the linear operator Ai : Z → Xi, i = 1, 2, 3, and c ∈ Z. Since θi, i = 1, 2, 3, are closed proper convex functions, there exist self-adjoint and positive semi-definite operators Σi, i = 1, 2, 3, such that

$$
\langle \hat{x}_i - x_i, \hat{w}_i - w_i \rangle \ \ge\ \langle \hat{x}_i - x_i, \Sigma_i (\hat{x}_i - x_i) \rangle \qquad \forall\, \hat{x}_i, x_i \in \mathrm{dom}(\theta_i), \ \hat{w}_i \in \partial\theta_i(\hat{x}_i), \ w_i \in \partial\theta_i(x_i), \tag{2}
$$
where ∂θi is the sub-differential mapping of θi , i = 1, 2, 3. The solution set of problem (1) is
assumed to be nonempty throughout our discussions in this paper.
Let σ > 0 be a given penalty parameter and z ∈ Z be the Lagrange multiplier associated
with the linear equality constraint in problem (1). For any (x1 , x2 , x3 ) ∈ X1 × X2 × X3 , write
x ≡ (x1 , x2 , x3 ), θ(x) ≡ θ1 (x1 ) + θ2 (x2 ) + θ3 (x3 ) and A∗ x ≡ A∗1 x1 + A∗2 x2 + A∗3 x3 . Then the
augmented Lagrangian function for problem (1) is defined by
$$
L_\sigma(x_1, x_2, x_3; z) := \theta(x) + \langle z, A^* x - c \rangle + \frac{\sigma}{2}\|A^* x - c\|^2 \tag{3}
$$
for any (x1, x2, x3, z) ∈ X1 × X2 × X3 × Z. The direct extension of the classical alternating direction method of multipliers (ADMM) for solving problem (1) consists of the following iterations for k = 0, 1, . . .:

$$
\begin{cases}
x_1^{k+1} := \operatorname*{argmin}_{x_1 \in \mathcal{X}_1} \ L_\sigma(x_1, x_2^k, x_3^k; z^k), \\[1mm]
x_2^{k+1} := \operatorname*{argmin}_{x_2 \in \mathcal{X}_2} \ L_\sigma(x_1^{k+1}, x_2, x_3^k; z^k), \\[1mm]
x_3^{k+1} := \operatorname*{argmin}_{x_3 \in \mathcal{X}_3} \ L_\sigma(x_1^{k+1}, x_2^{k+1}, x_3; z^k), \\[1mm]
z^{k+1} := z^k + \tau\sigma(A^* x^{k+1} - c),
\end{cases} \tag{4}
$$

where τ > 0 is the step-length.
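For a concrete feel of the scheme (4), the following minimal Python sketch specializes it to scalar blocks with quadratic functions θi(xi) = (ρ/2)x_i², for which every subproblem has a closed-form solution. The function name and all parameter values are illustrative choices for this note, not quantities prescribed by the paper.

```python
import numpy as np

def direct_admm_3block(a, c, rho=0.1, sigma=1.0, tau=1.0, iters=100):
    """Directly extended 3-block ADMM (4) for theta_i(x_i) = (rho/2) x_i^2.

    a : (m, 3) array whose i-th column plays the role of A_i^*, so A^* x = a @ x;
    c : (m,) right-hand side of the coupled linear constraint.
    """
    m, n = a.shape
    x, z = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        for i in range(n):  # Gauss-Seidel sweep: x_1, then x_2, then x_3
            rest = a @ x - a[:, i] * x[i] - c  # constraint residual without block i
            # closed form of argmin (rho/2) x_i^2 + <z, a_i x_i> + (sigma/2)||a_i x_i + rest||^2
            x[i] = -(a[:, i] @ z + sigma * (a[:, i] @ rest)) \
                   / (rho + sigma * (a[:, i] @ a[:, i]))
        z = z + tau * sigma * (a @ x - c)  # multiplier update with step-length tau
    return x, z
```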
Different from the 2-block ADMM, whose convergence has long been established [10, 8, 9, 6, 7, 4], the 3-block ADMM may not converge in general, as demonstrated by Chen, He, Ye and Yuan [2] using counterexamples. Nevertheless, if all the functions θi, i = 1, 2, 3, are strongly convex, Han and Yuan [11] proved the global convergence of the 3-block ADMM scheme (4) with τ = 1 (Han and Yuan actually considered the general m-block case for any m ≥ 3; here and below we focus on the 3-block case only) under the condition that

$$
\Sigma_i = \mu_i I \succ 0, \ i = 1, 2, 3, \qquad 0 < \sigma < \min_{i=1,2,3} \frac{\mu_i}{3\lambda_{\max}(A_i A_i^*)},
$$
where λmax(S) is the largest eigenvalue of a given self-adjoint linear operator S. Hong and Luo [13] proposed to adopt a small step-length τ when updating the Lagrange multiplier z^{k+1} in (4). Chen, Shen and You [3] proposed the following sufficient condition
$$
A_1^* \ \text{is injective}, \quad \Sigma_i = \mu_i I \succ 0, \ i = 2, 3, \quad \text{and} \quad 0 < \sigma < \frac{\mu_2}{\lambda_{\max}(A_2 A_2^*)}, \ \ \sigma \le \frac{\mu_3}{\lambda_{\max}(A_3 A_3^*)}
$$
for the global convergence of the directly extended 3-block ADMM with τ = 1 for solving problem
(1). Closely related to the work of Chen, Shen and You [3], in [15], Lin, Ma and Zhang provided
an analysis on the iteration complexity for the same method under the condition
$$
\Sigma_i = \mu_i I \succ 0, \ i = 2, 3 \quad \text{and} \quad 0 < \sigma \le \min\Big\{ \frac{\mu_2}{2\lambda_{\max}(A_2 A_2^*)}, \ \frac{\mu_3}{2\lambda_{\max}(A_3 A_3^*)} \Big\}.
$$
In [16], under additional assumptions including some smoothness conditions, the same group of
authors further proved the global linear convergence of the mentioned method.
The purpose of this work is to extend the 2-block semi-proximal ADMM studied in [5] to deal with problem (1) by only assuming θ2 to be strongly convex, i.e., Σ2 ≻ 0. Note that the semi-proximal ADMM with τ > 1 often works better in practice than its counterpart with τ ≤ 1. So it is desirable to establish the convergence of the proposed semi-proximal ADMM that allows τ to stay in the larger region (0, (1 + √5)/2).
One of our motivating examples is the following convex quadratic conic programming
$$
\begin{array}{ll}
\min & \frac{1}{2}\langle X, QX \rangle + \langle C, X \rangle \\[1mm]
\mathrm{s.t.} & AX \ge b, \ X \in \mathcal{K},
\end{array} \tag{5}
$$

where K is a nonempty closed convex cone in a finite dimensional real Euclidean space X endowed with an inner product ⟨·, ·⟩ and its induced norm ‖·‖, Q : X → X is a self-adjoint and positive semi-definite linear operator, A : X → ℝ^m is a linear map, C ∈ X and b ∈ ℝ^m are given data. The dual of problem (5) takes the form of

$$
\begin{array}{ll}
\max & -\frac{1}{2}\langle X, QX \rangle + \langle b, y \rangle \\[1mm]
\mathrm{s.t.} & A^* y - QX + S = C, \ y \ge 0, \ S \in \mathcal{K}^*,
\end{array} \tag{6}
$$
where K∗ := {v ∈ X : ⟨v, w⟩ ≥ 0 ∀ w ∈ K} is the dual cone of K. Since Q is self-adjoint and positive semi-definite, Q can be decomposed as Q = L∗L for some linear map L. By introducing a new variable Ξ = −LX, we can rewrite problem (6) equivalently as

$$
\begin{array}{ll}
\min & \delta_{\mathbb{R}^m_+}(y) - \langle b, y \rangle + \frac{1}{2}\|\Xi\|^2 + \delta_{\mathcal{K}^*}(S) \\[1mm]
\mathrm{s.t.} & A^* y + L^* \Xi + S = C,
\end{array} \tag{7}
$$

where δ_{ℝ^m_+}(·) and δ_{K∗}(·) are the indicator functions of ℝ^m_+ and K∗, respectively. As one can see,
problem (7) has only one strongly convex block, i.e., the block with respect to Ξ. Consequently, the
results in the aforementioned papers for the convergence analysis of the directly extended 3-block
ADMM applied to solving problem (7) are no longer valid. We shall show in the next section that our proposed 3-block semi-proximal ADMM can solve exactly this kind of problem. When K = S^n_+, the cone of symmetric and positive semi-definite matrices in the space S^n of n × n symmetric matrices, problem (7) is a convex quadratic semidefinite programming problem that has been extensively studied both theoretically and numerically in the literature [14, 17, 18, 19, 21, 22, 23, 24, 25, 26],
to name only a few.
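The reformulation above only needs some factor L with Q = L∗L. One concrete way to obtain such a factor numerically is through a symmetric eigendecomposition, which, unlike a plain Cholesky factorization, also works when Q is singular. The sketch below is a hedged illustration with made-up data, not code from the paper.

```python
import numpy as np

def psd_factor(Q):
    """Return L with Q = L.T @ L for a symmetric positive semi-definite Q."""
    w, V = np.linalg.eigh(Q)
    w = np.clip(w, 0.0, None)        # clear tiny negative round-off in eigenvalues
    return (V * np.sqrt(w)) @ V.T    # the symmetric square root Q^{1/2}

Q = np.array([[2.0, 1.0],
              [1.0, 0.5]])           # rank-one PSD example (det = 0)
L = psd_factor(Q)
assert np.allclose(L.T @ L, Q, atol=1e-10)
```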
The remaining parts of this paper are organized as follows. In the next section, we first present
our 3-block semi-proximal ADMM and then provide the main convergence results. We give some
concluding remarks in the final section.
Notation.
• The effective domain of a function f : X → (−∞, +∞] is defined as dom(f ) := {x ∈
X | f (x) < +∞}. The set of all relative interior points of a given nonempty convex set
C is denoted by ri(C).
• For convenience, for any given x, we use ‖x‖²_G to denote ⟨x, Gx⟩ if G is a self-adjoint linear operator in a given finite dimensional Euclidean space X. If Σ : X → X is a self-adjoint and positive semi-definite linear operator, we use Σ^{1/2} to denote the unique self-adjoint and positive semi-definite square root of Σ.
• Denote

$$
x := \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad
u := \begin{pmatrix} x_2 \\ x_3 \end{pmatrix}, \qquad
A := \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix}, \qquad
B := \begin{pmatrix} A_2 \\ A_3 \end{pmatrix}.
$$
• Let α ∈ (0, 1] be given. Denote

$$
M := \begin{pmatrix} (1-\alpha)\Sigma_2 + T_2 & 0 \\ 0 & \Sigma_3 + T_3 \end{pmatrix} + \sigma BB^* \tag{8}
$$

and

$$
H := \begin{pmatrix} \frac{5(1-\alpha)}{2}\Sigma_2 + T_2 & 0 \\ 0 & \frac{5}{2}\Sigma_3 + T_3 - \frac{5\sigma^2}{2\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*) \end{pmatrix} + \min(\tau, 1+\tau-\tau^2)\,\sigma BB^*. \tag{9}
$$
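To make the role of (8) and (9) concrete, the sketch below assembles M and H as block matrices acting on the u = (x2, x3) space and checks their definiteness, which is what condition (40) will require later. All input matrices are illustrative stand-ins for the abstract operators of the paper; the sample data reuses the counterexample of [2] revisited in Remark 2.1.

```python
import numpy as np

def build_M_H(Sigma2, Sigma3, T2, T3, A2, A3, sigma, tau, alpha):
    """Assemble M of (8) and H of (9) as block matrices on the u = (x2, x3) space.

    A2, A3 represent the operators A_i : Z -> X_i, so their adjoints are A2.T, A3.T.
    """
    n2, n3 = A2.shape[0], A3.shape[0]
    B = np.vstack([A2, A3])                         # B = (A2; A3)
    W = A2 @ A3.T                                   # A2 A3^*
    cross = W.T @ np.linalg.inv(Sigma2) @ W         # (A2 A3^*)^* Sigma2^{-1} (A2 A3^*)
    blkdiag = lambda P2, P3: np.block([[P2, np.zeros((n2, n3))],
                                       [np.zeros((n3, n2)), P3]])
    M = blkdiag((1 - alpha) * Sigma2 + T2, Sigma3 + T3) + sigma * (B @ B.T)
    H = (blkdiag(2.5 * (1 - alpha) * Sigma2 + T2,
                 2.5 * Sigma3 + T3 - (2.5 * sigma ** 2 / alpha) * cross)
         + min(tau, 1 + tau - tau ** 2) * sigma * (B @ B.T))
    return M, H

# Illustrative data: A2 = (1, 1, 2), A3 = (1, 2, 2), Sigma_i = 1/10, T_i = 0
A2, A3 = np.array([[1.0, 1.0, 2.0]]), np.array([[1.0, 2.0, 2.0]])
I1 = np.eye(1)
M, H = build_M_H(0.1 * I1, 0.1 * I1, 0 * I1, 0 * I1, A2, A3,
                 sigma=0.01, tau=1.0, alpha=1.0)
print(np.linalg.eigvalsh(M).min() > 0, np.linalg.eigvalsh(H).min() > 0)  # True True
```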
2 A 3-Block Semi-Proximal ADMM
Based on our previous introduction and motivation, we propose our 3-block semi-proximal ADMM
for solving problem (1) in the following:
Algorithm sPADMM: A 3-block semi-proximal ADMM for solving problem (1).
Let σ ∈ (0, +∞) and τ ∈ (0, +∞) be given parameters. Let Ti, i = 1, 2, 3, be given self-adjoint and positive semi-definite linear operators defined on Xi, i = 1, 2, 3, respectively. Choose (x_1^0, x_2^0, x_3^0, z^0) ∈ dom(θ1) × dom(θ2) × dom(θ3) × Z and set k = 0.
Step 1. Compute

$$
\begin{cases}
x_1^{k+1} := \operatorname*{argmin}_{x_1 \in \mathcal{X}_1} \ L_\sigma(x_1, x_2^k, x_3^k; z^k) + \frac{1}{2}\|x_1 - x_1^k\|_{T_1}^2, \\[1mm]
x_2^{k+1} := \operatorname*{argmin}_{x_2 \in \mathcal{X}_2} \ L_\sigma(x_1^{k+1}, x_2, x_3^k; z^k) + \frac{1}{2}\|x_2 - x_2^k\|_{T_2}^2, \\[1mm]
x_3^{k+1} := \operatorname*{argmin}_{x_3 \in \mathcal{X}_3} \ L_\sigma(x_1^{k+1}, x_2^{k+1}, x_3; z^k) + \frac{1}{2}\|x_3 - x_3^k\|_{T_3}^2, \\[1mm]
z^{k+1} := z^k + \tau\sigma(A^* x^{k+1} - c).
\end{cases} \tag{10}
$$
Step 2. If a termination criterion is not met, set k := k + 1 and then go to Step 1.
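As with scheme (4) earlier, the update (10) can be sketched in a few lines of Python for scalar blocks with θi(xi) = (ρ/2)x_i² and Ti = t_i I, t_i ≥ 0, so that each semi-proximal subproblem retains a closed form. Names and parameter values are again illustrative assumptions, not part of the paper.

```python
import numpy as np

def spadmm_3block(a, c, t=(0.0, 0.0, 0.0), rho=0.1, sigma=1.0, tau=1.0, iters=100):
    """Algorithm sPADMM (10) for theta_i(x_i) = (rho/2) x_i^2 and T_i = t[i] * I."""
    m, n = a.shape
    x, z = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        for i in range(n):
            rest = a @ x - a[:, i] * x[i] - c
            # closed form of the i-th subproblem in (10): the extra proximal
            # term (t_i/2)(x_i - x_i^k)^2 shifts both numerator and denominator
            x[i] = (t[i] * x[i] - a[:, i] @ z - sigma * (a[:, i] @ rest)) \
                   / (rho + t[i] + sigma * (a[:, i] @ a[:, i]))
        z = z + tau * sigma * (a @ x - c)
    return x, z
```

Setting t = (0, 0, 0) recovers the directly extended scheme (4), which is exactly the sense in which the convergent sPADMM "includes" the direct extension when the semi-proximal terms can be dropped.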
In order to analyze the convergence properties of Algorithm sPADMM, we make the following
assumptions.
Assumption 2.1 The convex function θ2 satisfies (2) with Σ2 ≻ 0.
Assumption 2.2 The self-adjoint and positive semi-definite operators Ti, i = 1, 2, 3, are chosen such that the sequence {(x_1^k, x_2^k, x_3^k, z^k)} generated by Algorithm sPADMM is well defined.
Assumption 2.3 There exists x = (x1, x2, x3) ∈ ri(dom(θ1) × dom(θ2) × dom(θ3)) ∩ P, where

$$
P := \big\{ x := (x_1, x_2, x_3) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{X}_3 \mid A^* x = c \big\}.
$$
Under Assumption 2.3, it follows from [20, Corollary 28.2.2] and [20, Corollary 28.3.1] that x̄ = (x̄1, x̄2, x̄3) ∈ X1 × X2 × X3 is an optimal solution to problem (1) if and only if there exists a Lagrange multiplier z̄ ∈ Z such that

$$
-A_i \bar{z} \in \partial\theta_i(\bar{x}_i), \ i = 1, 2, 3 \quad \text{and} \quad A^*\bar{x} - c = 0. \tag{11}
$$

Moreover, any z̄ ∈ Z satisfying (11) is an optimal solution to the dual of problem (1).
Let x̄ = (x̄1, x̄2, x̄3) ∈ X1 × X2 × X3 and z̄ ∈ Z satisfy (11). For the sake of convenience, define for (x1, u, z) := (x1, (x2, x3), z) ∈ X1 × (X2 × X3) × Z, α ∈ (0, 1] and k = 0, 1, . . ., the following quantities

$$
\phi_k(x_1, u, z) := (\sigma\tau)^{-1}\|z^k - z\|^2 + \|x_1^k - x_1\|_{\Sigma_1 + T_1}^2 + \|u^k - u\|_M^2
$$

and

$$
\begin{cases}
x_{ie}^k := x_i^k - \bar{x}_i, \ i = 1, 2, 3, \qquad u_e^k := u^k - \bar{u}, \qquad z_e^k := z^k - \bar{z}, \\[1mm]
\Delta x_i^k := x_i^{k+1} - x_i^k, \ i = 1, 2, 3, \qquad \Delta u^k := u^{k+1} - u^k, \qquad \Delta z^k := z^{k+1} - z^k, \\[1mm]
\phi_k := \phi_k(\bar{x}_1, \bar{u}, \bar{z}) = (\sigma\tau)^{-1}\|z_e^k\|^2 + \|x_{1e}^k\|_{\Sigma_1 + T_1}^2 + \|u_e^k\|_M^2, \\[1mm]
\xi_{k+1} := \|\Delta x_2^k\|_{T_2}^2 + \|\Delta x_3^k\|_{T_3 + \frac{\sigma^2}{\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2, \\[1mm]
s_{k+1} := \|\Delta x_1^k\|_{\frac{1}{2}\Sigma_1 + T_1}^2 + \|\Delta x_2^k\|_{\frac{1-\alpha}{2}\Sigma_2 + T_2}^2 + \|\Delta x_3^k\|_{\frac{1}{2}\Sigma_3 + T_3 - \frac{\sigma^2}{2\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 + \sigma\|A_1^* x_1^{k+1} + B^* u^k - c\|^2, \\[1mm]
t_{k+1} := \|\Delta x_1^k\|_{\frac{1}{2}\Sigma_1 + T_1}^2 + \|\Delta u^k\|_H^2, \\[1mm]
r^k := A^* x^k - c.
\end{cases} \tag{12}
$$
To prove the convergence of Algorithm sPADMM for solving problem (1), we first present some
useful lemmas.
Lemma 2.1 Assume that Assumptions 2.1, 2.2 and 2.3 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 0, we have

$$
\phi_k - \phi_{k+1} \ \ge\ (1-\tau)\sigma\|r^{k+1}\|^2 + s_{k+1}, \tag{13}
$$

where φ_k, s_{k+1} and r^{k+1} are defined as in (12).
Proof. The sequence {(x_1^k, x_2^k, x_3^k, z^k)} is well defined under Assumption 2.2. Notice that the iteration scheme (10) of Algorithm sPADMM can be rewritten, for k = 0, 1, . . ., as

$$
\begin{cases}
-A_1\big[z^k + \sigma\big(A_1^* x_1^{k+1} + \textstyle\sum_{j=2}^{3} A_j^* x_j^k - c\big)\big] - T_1(x_1^{k+1} - x_1^k) \in \partial\theta_1(x_1^{k+1}), \\[1mm]
-A_2\big[z^k + \sigma\big(\textstyle\sum_{j=1}^{2} A_j^* x_j^{k+1} + A_3^* x_3^k - c\big)\big] - T_2(x_2^{k+1} - x_2^k) \in \partial\theta_2(x_2^{k+1}), \\[1mm]
-A_3\big[z^k + \sigma(A^* x^{k+1} - c)\big] - T_3(x_3^{k+1} - x_3^k) \in \partial\theta_3(x_3^{k+1}), \\[1mm]
z^{k+1} := z^k + \tau\sigma(A^* x^{k+1} - c).
\end{cases} \tag{14}
$$
Combining (2) with (11) and (14), and using the definitions of x_{ie}^{k+1} and ∆x_i^k, for i = 1, 2, 3, we have

$$
\Big\langle x_{ie}^{k+1},\ A_i\bar{z} - A_i z^k - \sigma A_i\Big(\sum_{j=1}^{i} A_j^* x_j^{k+1} + \sum_{j=i+1}^{3} A_j^* x_j^k - c\Big) - T_i\Delta x_i^k \Big\rangle \ \ge\ \|x_{ie}^{k+1}\|_{\Sigma_i}^2. \tag{15}
$$
j=i+1
For any vectors a, b, d in the same Euclidean vector space and any self-adjoint linear operator G,
we have the identity
1
a − b, G(d − a) = ( d − b
2
2
G
− a−b
2
G
− a−d
2
G ).
Taking a = x_i^{k+1}, b = x̄_i, d = x_i^k and G = T_i in the above identity, and using the definitions of x_{ie}^{k+1} and ∆x_i^k, we get

$$
\big\langle x_{ie}^{k+1}, -T_i\Delta x_i^k \big\rangle = \tfrac{1}{2}\big( \|x_{ie}^k\|_{T_i}^2 - \|x_{ie}^{k+1}\|_{T_i}^2 - \|\Delta x_i^k\|_{T_i}^2 \big), \quad i = 1, 2, 3. \tag{16}
$$
Let

$$
\tilde{z}^{k+1} = z^k + \sigma(A^* x^{k+1} - c) = z^k + \sigma(A_1^* x_1^{k+1} + B^* u^{k+1} - c). \tag{17}
$$
Substituting (16) and (17) into (15) and using the definition of ∆x_j^k, for i = 1, 2, we have

$$
\Big\langle x_{ie}^{k+1},\ A_i\bar{z} - A_i\tilde{z}^{k+1} + \sigma A_i \sum_{j=i+1}^{3} A_j^*\Delta x_j^k \Big\rangle + \tfrac{1}{2}\big( \|x_{ie}^k\|_{T_i}^2 - \|x_{ie}^{k+1}\|_{T_i}^2 \big) \ \ge\ \tfrac{1}{2}\|\Delta x_i^k\|_{T_i}^2 + \|x_{ie}^{k+1}\|_{\Sigma_i}^2 \tag{18}
$$

and

$$
\big\langle x_{3e}^{k+1},\ A_3\bar{z} - A_3\tilde{z}^{k+1} \big\rangle + \tfrac{1}{2}\big( \|x_{3e}^k\|_{T_3}^2 - \|x_{3e}^{k+1}\|_{T_3}^2 \big) \ \ge\ \tfrac{1}{2}\|\Delta x_3^k\|_{T_3}^2 + \|x_{3e}^{k+1}\|_{\Sigma_3}^2. \tag{19}
$$

Adding (18) for i = 1, 2 to (19), we get

$$
\sum_{i=1}^{3} \big\langle x_{ie}^{k+1}, A_i\bar{z} - A_i\tilde{z}^{k+1} \big\rangle + \sigma\Big\langle x_{1e}^{k+1}, A_1\sum_{j=2}^{3} A_j^*\Delta x_j^k \Big\rangle + \sigma\big\langle x_{2e}^{k+1}, A_2A_3^*\Delta x_3^k \big\rangle + \frac{1}{2}\sum_{i=1}^{3}\big( \|x_{ie}^k\|_{T_i}^2 - \|x_{ie}^{k+1}\|_{T_i}^2 \big) \ \ge\ \frac{1}{2}\sum_{i=1}^{3}\|\Delta x_i^k\|_{T_i}^2 + \sum_{i=1}^{3}\|x_{ie}^{k+1}\|_{\Sigma_i}^2. \tag{20}
$$
By simple manipulations and using A_1^* x_{1e}^{k+1} = A_1^* x_1^{k+1} − A_1^* x̄_1 = B^* ū + (A_1^* x_1^{k+1} − c), we get

$$
\sigma\Big\langle x_{1e}^{k+1}, A_1\sum_{j=2}^{3} A_j^*\Delta x_j^k \Big\rangle
= \sigma\big\langle -A_1^* x_{1e}^{k+1}, -B^*\Delta u^k \big\rangle
= \sigma\big\langle -A_1^* x_{1e}^{k+1}, B^* u^k - B^* u^{k+1} \big\rangle
= \sigma\big\langle (-B^*\bar{u}) - (A_1^* x_1^{k+1} - c),\ (-B^* u^{k+1}) - (-B^* u^k) \big\rangle. \tag{21}
$$
For any vectors a, b, d, e in the same Euclidean vector space, we have the identity

$$
\langle a - b, d - e \rangle = \tfrac{1}{2}\big( \|a - e\|^2 - \|a - d\|^2 \big) + \tfrac{1}{2}\big( \|b - d\|^2 - \|b - e\|^2 \big). \tag{22}
$$
In the above identity, by taking a = −B^*ū, b = A_1^* x_1^{k+1} − c, d = −B^* u^{k+1} and e = −B^* u^k, and applying it to the right-hand side of (21), we obtain from the definitions of u_e^k and z̃^{k+1} that

$$
\begin{aligned}
\sigma\Big\langle x_{1e}^{k+1}, A_1\sum_{j=2}^{3} A_j^*\Delta x_j^k \Big\rangle
&= \frac{\sigma}{2}\big( \|B^* u_e^k\|^2 - \|B^* u_e^{k+1}\|^2 \big) + \frac{\sigma}{2}\big( \|A_1^* x_1^{k+1} + B^* u^{k+1} - c\|^2 - \|A_1^* x_1^{k+1} + B^* u^k - c\|^2 \big) \\
&= \frac{\sigma}{2}\big( \|B^* u_e^k\|^2 - \|B^* u_e^{k+1}\|^2 \big) + \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 - \frac{\sigma}{2}\|A_1^* x_1^{k+1} + B^* u^k - c\|^2. \tag{23}
\end{aligned}
$$
Using the Cauchy–Schwarz inequality, for the parameter α ∈ (0, 1], we get

$$
\sigma\big\langle x_{2e}^{k+1}, A_2A_3^*\Delta x_3^k \big\rangle
= 2\Big\langle (\alpha\Sigma_2)^{\frac{1}{2}} x_{2e}^{k+1},\ \frac{\sigma}{2}(\alpha\Sigma_2)^{-\frac{1}{2}} A_2A_3^*\Delta x_3^k \Big\rangle
\ \le\ \alpha\|x_{2e}^{k+1}\|_{\Sigma_2}^2 + \frac{\sigma^2}{4\alpha}\|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2. \tag{24}
$$
It follows from (17) that

$$
\sum_{i=1}^{3}\big\langle x_{ie}^{k+1}, A_i\bar{z} - A_i\tilde{z}^{k+1} \big\rangle
= \Big\langle \bar{z} - \tilde{z}^{k+1}, \sum_{i=1}^{3} A_i^* x_{ie}^{k+1} \Big\rangle
= \frac{1}{\sigma}\big\langle \bar{z} - \tilde{z}^{k+1}, \tilde{z}^{k+1} - z^k \big\rangle. \tag{25}
$$
Substituting (23), (24) and (25) into (20), we obtain

$$
\begin{aligned}
&\frac{1}{\sigma}\big\langle \bar{z} - \tilde{z}^{k+1}, \tilde{z}^{k+1} - z^k \big\rangle + \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 + \frac{\sigma}{2}\big( \|B^* u_e^k\|^2 - \|B^* u_e^{k+1}\|^2 \big) + \frac{1}{2}\sum_{i=1}^{3}\big( \|x_{ie}^k\|_{T_i}^2 - \|x_{ie}^{k+1}\|_{T_i}^2 \big) \\
&\quad \ge\ \frac{\sigma}{2}\|A_1^* x_1^{k+1} + B^* u^k - c\|^2 + \frac{1}{2}\sum_{i=1}^{3}\|\Delta x_i^k\|_{T_i}^2 - \frac{\sigma^2}{4\alpha}\|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 \\
&\qquad + \sum_{i=1, i\neq 2}^{3}\|x_{ie}^{k+1}\|_{\Sigma_i}^2 + (1-\alpha)\|x_{2e}^{k+1}\|_{\Sigma_2}^2. \tag{26}
\end{aligned}
$$

From the elementary inequality ‖a‖² + ‖b‖² ≥ ‖a − b‖²/2 and x_{ie}^{k+1} − x_{ie}^k = ∆x_i^k, it follows that
$$
\begin{aligned}
&\sum_{i=1, i\neq 2}^{3}\|x_{ie}^{k+1}\|_{\Sigma_i}^2 + (1-\alpha)\|x_{2e}^{k+1}\|_{\Sigma_2}^2 \\
&\quad = \frac{1}{2}\sum_{i=1, i\neq 2}^{3}\big( \|x_{ie}^{k+1}\|_{\Sigma_i}^2 + \|x_{ie}^k\|_{\Sigma_i}^2 \big) + \frac{1}{2}\sum_{i=1, i\neq 2}^{3}\big( \|x_{ie}^{k+1}\|_{\Sigma_i}^2 - \|x_{ie}^k\|_{\Sigma_i}^2 \big) \\
&\qquad + \frac{1-\alpha}{2}\big( \|x_{2e}^{k+1}\|_{\Sigma_2}^2 + \|x_{2e}^k\|_{\Sigma_2}^2 \big) + \frac{1-\alpha}{2}\big( \|x_{2e}^{k+1}\|_{\Sigma_2}^2 - \|x_{2e}^k\|_{\Sigma_2}^2 \big) \\
&\quad \ge\ \frac{1}{4}\sum_{i=1, i\neq 2}^{3}\|\Delta x_i^k\|_{\Sigma_i}^2 + \frac{1-\alpha}{4}\|\Delta x_2^k\|_{\Sigma_2}^2 + \frac{1}{2}\sum_{i=1, i\neq 2}^{3}\big( \|x_{ie}^{k+1}\|_{\Sigma_i}^2 - \|x_{ie}^k\|_{\Sigma_i}^2 \big) + \frac{1-\alpha}{2}\big( \|x_{2e}^{k+1}\|_{\Sigma_2}^2 - \|x_{2e}^k\|_{\Sigma_2}^2 \big). \tag{27}
\end{aligned}
$$
By simple manipulations and using the definition of z_e^k, we get

$$
\begin{aligned}
&\frac{1}{\sigma}\big\langle \bar{z} - \tilde{z}^{k+1}, \tilde{z}^{k+1} - z^k \big\rangle + \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 \\
&\quad = \frac{1}{\sigma}\big\langle \bar{z} - z^k, \tilde{z}^{k+1} - z^k \big\rangle + \frac{1}{\sigma}\big\langle z^k - \tilde{z}^{k+1}, \tilde{z}^{k+1} - z^k \big\rangle + \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 \\
&\quad = -\frac{1}{\sigma}\big\langle z_e^k, \tilde{z}^{k+1} - z^k \big\rangle - \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 \\
&\quad = \frac{1}{2\sigma\tau}\big( \|z_e^k\|^2 - \|z_e^k + \tau(\tilde{z}^{k+1} - z^k)\|^2 \big) + \frac{\tau - 1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2. \tag{28}
\end{aligned}
$$
By using (14), (17) and the definitions of z_e^k and r^{k+1}, we have

$$
z_e^{k+1} = z_e^k + \tau(\tilde{z}^{k+1} - z^k) \quad \text{and} \quad z^k - \tilde{z}^{k+1} = -\sigma r^{k+1},
$$

which, together with (28), imply

$$
\frac{1}{\sigma}\big\langle \bar{z} - \tilde{z}^{k+1}, \tilde{z}^{k+1} - z^k \big\rangle + \frac{1}{2\sigma}\|z^k - \tilde{z}^{k+1}\|^2 = \frac{1}{2\sigma\tau}\big( \|z_e^k\|^2 - \|z_e^{k+1}\|^2 \big) + \frac{(\tau - 1)\sigma}{2}\|r^{k+1}\|^2. \tag{29}
$$
Substituting (27) and (29) into (26), and using the definitions of φk , sk+1 and rk+1 , we get the
assertion (13). The proof is complete.
Lemma 2.2 Assume that Assumptions 2.1 and 2.2 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 1, we have

$$
-\sigma\big\langle B^*\Delta u^k, r^{k+1} \big\rangle \ \ge\ -(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle + \frac{1}{2}\sum_{i=2}^{3}\big( \|\Delta x_i^k\|_{T_i + 2\Sigma_i}^2 - \|\Delta x_i^{k-1}\|_{T_i}^2 \big) + \sigma\big\langle A_2^*\Delta x_2^k, A_3^*(\Delta x_3^{k-1} - \Delta x_3^k) \big\rangle, \tag{30}
$$

where ∆u^k, ∆x_i^k (i = 2, 3) and r^{k+1} are defined as in (12).
Proof. Let

$$
v^{k+1} := z^k + \sigma\Big( \sum_{j=1}^{2} A_j^* x_j^{k+1} + A_3^* x_3^k - c \Big).
$$

By using (14) and the definition of ∆x_2^k, we have

$$
-A_2 v^{k+1} - T_2\Delta x_2^k \in \partial\theta_2(x_2^{k+1}) \quad \text{and} \quad -A_2 v^k - T_2\Delta x_2^{k-1} \in \partial\theta_2(x_2^k).
$$

Thus, we obtain from (2) that

$$
\big\langle \Delta x_2^k,\ (A_2 v^k + T_2\Delta x_2^{k-1}) - (A_2 v^{k+1} + T_2\Delta x_2^k) \big\rangle \ \ge\ \|\Delta x_2^k\|_{\Sigma_2}^2.
$$

By using the Cauchy–Schwarz inequality, we obtain

$$
\big\langle \Delta x_2^k, T_2(\Delta x_2^k - \Delta x_2^{k-1}) \big\rangle = \|\Delta x_2^k\|_{T_2}^2 - \big\langle \Delta x_2^k, T_2\Delta x_2^{k-1} \big\rangle \ \ge\ \frac{1}{2}\|\Delta x_2^k\|_{T_2}^2 - \frac{1}{2}\|\Delta x_2^{k-1}\|_{T_2}^2.
$$

Adding up the above two inequalities, we get

$$
\big\langle A_2^*\Delta x_2^k, v^k - v^{k+1} \big\rangle \ \ge\ \frac{1}{2}\|\Delta x_2^k\|_{T_2 + 2\Sigma_2}^2 - \frac{1}{2}\|\Delta x_2^{k-1}\|_{T_2}^2. \tag{31}
$$

Using z^{k−1} − z^k = −τσr^k and the definitions of v^k and r^k, we have

$$
v^k - v^{k+1} = (1-\tau)\sigma r^k - \sigma r^{k+1} - \sigma A_3^*(\Delta x_3^{k-1} - \Delta x_3^k).
$$

Substituting the above equation into (31), we get

$$
\sigma\big\langle -A_2^*\Delta x_2^k, r^{k+1} \big\rangle \ \ge\ -(1-\tau)\sigma\big\langle A_2^*\Delta x_2^k, r^k \big\rangle + \sigma\big\langle A_2^*\Delta x_2^k, A_3^*(\Delta x_3^{k-1} - \Delta x_3^k) \big\rangle + \frac{1}{2}\big( \|\Delta x_2^k\|_{T_2 + 2\Sigma_2}^2 - \|\Delta x_2^{k-1}\|_{T_2}^2 \big). \tag{32}
$$

Similarly as for deriving (32), we can obtain that

$$
\sigma\big\langle -A_3^*\Delta x_3^k, r^{k+1} \big\rangle \ \ge\ -(1-\tau)\sigma\big\langle A_3^*\Delta x_3^k, r^k \big\rangle + \frac{1}{2}\big( \|\Delta x_3^k\|_{T_3 + 2\Sigma_3}^2 - \|\Delta x_3^{k-1}\|_{T_3}^2 \big).
$$

Adding up the above inequality and (32), and using the definitions of B^* and u, we get the assertion (30). The proof is complete.
Lemma 2.3 Assume that Assumptions 2.1 and 2.2 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. For any τ ∈ (0, +∞) and integer k ≥ 1, we have

$$
(1-\tau)\sigma\|r^{k+1}\|^2 + s_{k+1} \ \ge\ t_{k+1} + \max(1-\tau, 1-\tau^{-1})\sigma\big( \|r^{k+1}\|^2 - \|r^k\|^2 \big) + \min(\tau, 1+\tau-\tau^2)\sigma\tau^{-1}\|r^{k+1}\|^2 + (\xi_{k+1} - \xi_k), \tag{33}
$$

where s_{k+1}, t_{k+1}, ξ_{k+1} and r^{k+1} are defined as in (12).
Proof. By simple manipulations and using the definition of r^{k+1}, we obtain

$$
\|A_1^* x_1^{k+1} + B^* u^k - c\|^2 = \|r^{k+1} - B^*\Delta u^k\|^2 = \|r^{k+1}\|^2 - 2\big\langle B^*\Delta u^k, r^{k+1} \big\rangle + \|B^*\Delta u^k\|^2. \tag{34}
$$
It follows from (30) and (34) that

$$
\begin{aligned}
(1-\tau)\sigma\|r^{k+1}\|^2 + \sigma\|A_1^* x_1^{k+1} + B^* u^k - c\|^2
&\ \ge\ \sigma\|B^*\Delta u^k\|^2 + (2-\tau)\sigma\|r^{k+1}\|^2 - 2(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle \\
&\qquad + 2\sigma\big\langle A_2^*\Delta x_2^k, A_3^*(\Delta x_3^{k-1} - \Delta x_3^k) \big\rangle + \sum_{i=2}^{3}\big( \|\Delta x_i^k\|_{T_i + 2\Sigma_i}^2 - \|\Delta x_i^{k-1}\|_{T_i}^2 \big). \tag{35}
\end{aligned}
$$
By the Cauchy–Schwarz inequality, for the parameter α ∈ (0, 1], we have

$$
\begin{aligned}
2\sigma\big\langle A_2^*\Delta x_2^k, A_3^*(\Delta x_3^{k-1} - \Delta x_3^k) \big\rangle
&= 2\big\langle (\alpha\Sigma_2)^{\frac{1}{2}}\Delta x_2^k,\ \sigma(\alpha\Sigma_2)^{-\frac{1}{2}}(A_2A_3^*)\Delta x_3^{k-1} \big\rangle - 2\big\langle (\alpha\Sigma_2)^{\frac{1}{2}}\Delta x_2^k,\ \sigma(\alpha\Sigma_2)^{-\frac{1}{2}}(A_2A_3^*)\Delta x_3^k \big\rangle \\
&\ \ge\ -\alpha\|\Delta x_2^k\|_{\Sigma_2}^2 - \frac{\sigma^2}{\alpha}\|\Delta x_3^{k-1}\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 - \alpha\|\Delta x_2^k\|_{\Sigma_2}^2 - \frac{\sigma^2}{\alpha}\|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 \\
&= -2\alpha\|\Delta x_2^k\|_{\Sigma_2}^2 - \frac{\sigma^2}{\alpha}\Big( \|\Delta x_3^{k-1}\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 + \|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 \Big).
\end{aligned}
$$
Substituting the above inequality into (35), we get

$$
\begin{aligned}
(1-\tau)\sigma\|r^{k+1}\|^2 + \sigma\|A_1^* x_1^{k+1} + B^* u^k - c\|^2
&\ \ge\ \sigma\|B^*\Delta u^k\|^2 + (2-\tau)\sigma\|r^{k+1}\|^2 - 2(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle \\
&\qquad + \Big( \|\Delta x_3^k\|_{T_3 + \frac{\sigma^2}{\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 - \|\Delta x_3^{k-1}\|_{T_3 + \frac{\sigma^2}{\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 \Big) \\
&\qquad + \big( \|\Delta x_2^k\|_{T_2}^2 - \|\Delta x_2^{k-1}\|_{T_2}^2 \big) + 2(1-\alpha)\|\Delta x_2^k\|_{\Sigma_2}^2 + 2\|\Delta x_3^k\|_{\Sigma_3}^2 - \frac{2\sigma^2}{\alpha}\|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2. \tag{36}
\end{aligned}
$$

By using the definitions of s_{k+1} and t_{k+1}, and the fact that

$$
\|\Delta u^k\|_H^2 = \|\Delta x_2^k\|_{\frac{5(1-\alpha)}{2}\Sigma_2 + T_2}^2 + \|\Delta x_3^k\|_{\frac{5}{2}\Sigma_3 + T_3 - \frac{5\sigma^2}{2\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2 + \min(\tau, 1+\tau-\tau^2)\sigma\|B^*\Delta u^k\|^2,
$$
we have

$$
2(1-\alpha)\|\Delta x_2^k\|_{\Sigma_2}^2 + 2\|\Delta x_3^k\|_{\Sigma_3}^2 - \frac{2\sigma^2}{\alpha}\|\Delta x_3^k\|_{(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*)}^2
= -s_{k+1} + t_{k+1} - \min(\tau, 1+\tau-\tau^2)\sigma\|B^*\Delta u^k\|^2 + \sigma\|A_1^* x_1^{k+1} + B^* u^k - c\|^2.
$$
Substituting the above equation into (36) and using the definition of ξ_{k+1}, we get

$$
(1-\tau)\sigma\|r^{k+1}\|^2 + s_{k+1} - t_{k+1} + \min(\tau, 1+\tau-\tau^2)\sigma\|B^*\Delta u^k\|^2
\ \ge\ \sigma\|B^*\Delta u^k\|^2 + (2-\tau)\sigma\|r^{k+1}\|^2 - 2(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle + (\xi_{k+1} - \xi_k). \tag{37}
$$
By using the Cauchy–Schwarz inequality, we get

$$
\begin{cases}
-2(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle \ \ge\ -(1-\tau)\sigma\|B^*\Delta u^k\|^2 - (1-\tau)\sigma\|r^k\|^2 & \text{if } \tau \in (0, 1], \\[2mm]
-2(1-\tau)\sigma\big\langle B^*\Delta u^k, r^k \big\rangle \ \ge\ (1-\tau)\tau\sigma\|B^*\Delta u^k\|^2 + \dfrac{(1-\tau)\sigma}{\tau}\|r^k\|^2 & \text{if } \tau \in (1, +\infty).
\end{cases} \tag{38}
$$
Substituting (38) into (37), we obtain from simple manipulations that

$$
(1-\tau)\sigma\|r^{k+1}\|^2 + s_{k+1} - t_{k+1} + \min(\tau, 1+\tau-\tau^2)\sigma\|B^*\Delta u^k\|^2
\ \ge\ \max(1-\tau, 1-\tau^{-1})\sigma\big( \|r^{k+1}\|^2 - \|r^k\|^2 \big) + \min(\tau, 1+\tau-\tau^2)\sigma\big( \tau^{-1}\|r^{k+1}\|^2 + \|B^*\Delta u^k\|^2 \big) + (\xi_{k+1} - \xi_k).
$$

The assertion (33) is proved immediately.
Now, we are ready to prove the convergence of the sequence {(x_1^k, x_2^k, x_3^k, z^k)} generated by Algorithm sPADMM.

Theorem 2.1 Assume that Assumptions 2.1, 2.2 and 2.3 hold. Let {(x_1^k, x_2^k, x_3^k, z^k)} be generated by Algorithm sPADMM. Then, for any τ ∈ (0, +∞) and integer k ≥ 1, we have

$$
\big( \phi_k + \max(1-\tau, 1-\tau^{-1})\sigma\|r^k\|^2 + \xi_k \big) - \big( \phi_{k+1} + \max(1-\tau, 1-\tau^{-1})\sigma\|r^{k+1}\|^2 + \xi_{k+1} \big)
\ \ge\ t_{k+1} + \min(\tau, 1+\tau-\tau^2)\sigma\tau^{-1}\|r^{k+1}\|^2, \tag{39}
$$

where φ_k, ξ_{k+1}, t_{k+1} and r^k are defined as in (12). Assume that τ ∈ (0, (1 + √5)/2). If for some α ∈ (0, 1] it holds that

$$
\tfrac{1}{2}\Sigma_1 + T_1 + \sigma A_1A_1^* \succ 0, \quad H \succ 0 \quad \text{and} \quad M \succ 0, \tag{40}
$$

then the whole sequence {(x_1^k, x_2^k, x_3^k)} converges to an optimal solution to problem (1) and {z^k} converges to an optimal solution to the dual of problem (1).
Proof. By substituting (33) into (13), we can easily get (39).

Assume that τ ∈ (0, (1 + √5)/2). Since (40) holds for some α ∈ (0, 1], we have min(τ, 1 + τ − τ²) > 0, H ≻ 0 and M ≻ 0. From (39), we see immediately that the sequence {φ_{k+1}} is bounded, lim_{k→∞} t_{k+1} = 0 and lim_{k→∞} r^{k+1} = 0, i.e.,

$$
\lim_{k\to\infty} \|\Delta x_1^k\|_{\frac{1}{2}\Sigma_1 + T_1}^2 = 0, \qquad \lim_{k\to\infty} \|\Delta u^k\|_H^2 = 0, \qquad \lim_{k\to\infty} r^{k+1} = \lim_{k\to\infty} (\tau\sigma)^{-1}\Delta z^k = 0. \tag{41}
$$

Since H ≻ 0, we also have that

$$
\lim_{k\to\infty} \Delta x_2^k = 0, \qquad \lim_{k\to\infty} \Delta x_3^k = 0 \tag{42}
$$

and thus

$$
\|A_1^*\Delta x_1^k\| = \Big\| r^{k+1} - r^k - \sum_{j=2}^{3} A_j^*\Delta x_j^k \Big\| \ \le\ \|r^{k+1}\| + \|r^k\| + \sum_{j=2}^{3} \|A_j^*\Delta x_j^k\| \to 0 \tag{43}
$$

as k → ∞. Now from (41) and (43), we obtain

$$
\lim_{k\to\infty} \|\Delta x_1^k\|_{\frac{1}{2}\Sigma_1 + T_1 + \sigma A_1A_1^*}^2 = \lim_{k\to\infty} \big( \|\Delta x_1^k\|_{\frac{1}{2}\Sigma_1 + T_1}^2 + \sigma\|A_1^*\Delta x_1^k\|^2 \big) = 0. \tag{44}
$$

Recall that ½Σ1 + T1 + σA1A∗1 ≻ 0. Thus it follows from (44) that

$$
\lim_{k\to\infty} \Delta x_1^k = 0. \tag{45}
$$

By the definition of φ_{k+1}, we see that the three sequences {‖z_e^{k+1}‖}, {‖x_{1e}^{k+1}‖_{Σ1+T1}} and {‖u_e^{k+1}‖_M} are all bounded. Since M ≻ 0, the sequences {‖x_2^{k+1}‖} and {‖x_3^{k+1}‖} are also bounded. Furthermore, by using

$$
\|A_1^* x_{1e}^{k+1}\| = \|A^* x^{k+1} - A^*\bar{x} - B^* u_e^{k+1}\| \ \le\ \|r^{k+1}\| + \|B^* u_e^{k+1}\|, \tag{46}
$$

we also know that the sequence {‖A∗1 x_{1e}^{k+1}‖} is bounded, and so is the sequence {‖x_{1e}^{k+1}‖_{Σ1+T1+σA1A∗1}}. This shows that the sequence {x_1^{k+1}} is also bounded as the operator Σ1 + T1 + σA1A∗1 ⪰ ½Σ1 + T1 + σA1A∗1 ≻ 0. Thus, the sequence {(x_1^k, x_2^k, x_3^k, z^k)} is bounded.

Since the sequence {(x_1^k, x_2^k, x_3^k, z^k)} is bounded, there is a subsequence {(x_1^{k_i}, x_2^{k_i}, x_3^{k_i}, z^{k_i})} which converges to a cluster point, say (x_1^∞, x_2^∞, x_3^∞, z^∞). Taking limits on both sides of (14) along the subsequence {(x_1^{k_i}, x_2^{k_i}, x_3^{k_i}, z^{k_i})}, using (41), (42) and (45), we obtain that

$$
-A_j z^\infty \in \partial\theta_j(x_j^\infty), \ j = 1, 2, 3 \quad \text{and} \quad A^* x^\infty - c = 0,
$$

i.e., (x_1^∞, x_2^∞, x_3^∞, z^∞) satisfies (11). Thus (x_1^∞, x_2^∞, x_3^∞) is an optimal solution to (1) and z^∞ is an optimal solution to the dual of problem (1).

To complete the proof, we show next that (x_1^∞, x_2^∞, x_3^∞, z^∞) is actually the unique limit of {(x_1^k, x_2^k, x_3^k, z^k)}. Replacing (x̄1, ū, z̄) := (x̄1, (x̄2, x̄3), z̄) by (x_1^∞, u^∞, z^∞) := (x_1^∞, (x_2^∞, x_3^∞), z^∞) in (39), for any integer k ≥ k_i, we have

$$
\phi_{k+1}(x_1^\infty, u^\infty, z^\infty) + \max(1-\tau, 1-\tau^{-1})\sigma\|r^{k+1}\|^2 + \xi_{k+1}
\ \le\ \phi_{k_i}(x_1^\infty, u^\infty, z^\infty) + \max(1-\tau, 1-\tau^{-1})\sigma\|r^{k_i}\|^2 + \xi_{k_i}. \tag{47}
$$

Note that

$$
\lim_{i\to\infty} \Big( \phi_{k_i}(x_1^\infty, u^\infty, z^\infty) + \max(1-\tau, 1-\tau^{-1})\sigma\|r^{k_i}\|^2 + \xi_{k_i} \Big) = 0.
$$

Therefore, from (47) we get

$$
\lim_{k\to\infty} \phi_{k+1}(x_1^\infty, u^\infty, z^\infty) = 0,
$$

i.e.,

$$
\lim_{k\to\infty} \Big( (\sigma\tau)^{-1}\|z^{k+1} - z^\infty\|^2 + \|x_1^{k+1} - x_1^\infty\|_{\Sigma_1 + T_1}^2 + \|u^{k+1} - u^\infty\|_M^2 \Big) = 0.
$$

Since M ≻ 0, we also have that lim_{k→∞} u^k = u^∞, that is, lim_{k→∞} x_2^k = x_2^∞ and lim_{k→∞} x_3^k = x_3^∞. Using the fact that lim_{k→∞} r^{k+1} = 0 and lim_{k→∞} (u^{k+1} − u^∞) = 0, we get from (46) that lim_{k→∞} A∗1(x_1^{k+1} − x_1^∞) = 0. Thus

$$
\lim_{k\to\infty} \|x_1^{k+1} - x_1^\infty\|_{\Sigma_1 + T_1 + \sigma A_1A_1^*}^2 = 0.
$$

Since Σ1 + T1 + σA1A∗1 ≻ 0, we also obtain that lim_{k→∞} x_1^k = x_1^∞. Therefore, we have shown that the sequence {(x_1^k, x_2^k, x_3^k)} converges to an optimal solution to (1) and {z^k} converges to an optimal solution to the dual of problem (1) for any τ ∈ (0, (1 + √5)/2). The proof is complete.
Remark 2.1 Assume that (1 − α)Σ2 + σA2A∗2 is invertible for some α ∈ (0, 1]. Set τ = 1 (the case that 1 ≠ τ ∈ (0, (1 + √5)/2) can be discussed in a similar but slightly more complicated manner) and T2 = 0 in (8) and (9). Then the assumptions H ≻ 0 and M ≻ 0 in (40) reduce to

$$
\begin{pmatrix} \frac{5(1-\alpha)}{2}\Sigma_2 + \sigma A_2A_2^* & \sigma A_2A_3^* \\ \sigma A_3A_2^* & \frac{5}{2}\Sigma_3 + T_3 + \sigma A_3A_3^* - \frac{5\sigma^2}{2\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*) \end{pmatrix} \succ 0
$$

and

$$
\begin{pmatrix} (1-\alpha)\Sigma_2 + \sigma A_2A_2^* & \sigma A_2A_3^* \\ \sigma A_3A_2^* & \Sigma_3 + T_3 + \sigma A_3A_3^* \end{pmatrix} \succ 0,
$$

which are, respectively, equivalent to

$$
\frac{5}{2}\Sigma_3 + T_3 + \sigma A_3A_3^* - \frac{5\sigma^2}{2\alpha}(A_2A_3^*)^*\Sigma_2^{-1}(A_2A_3^*) - \sigma^2(A_3A_2^*)\Big( \frac{5(1-\alpha)}{2}\Sigma_2 + \sigma A_2A_2^* \Big)^{-1}(A_2A_3^*) \succ 0 \tag{48}
$$

and

$$
\Sigma_3 + T_3 + \sigma A_3A_3^* - \sigma^2(A_3A_2^*)\big( (1-\alpha)\Sigma_2 + \sigma A_2A_2^* \big)^{-1}(A_2A_3^*) \succ 0 \tag{49}
$$

in terms of the Schur-complement format. The conditions (48) and (49) can be satisfied easily by choosing a proper T3 for given α ∈ (0, 1] and σ ∈ (0, +∞). Evidently, with a fixed α, T3 can take a smaller value with a smaller σ, and T3 can even be taken to be the zero operator for any σ > 0 smaller than a certain threshold if Σ3 + (1 − α)σA3A∗3 ≻ 0. To see this, let us consider the following example constructed in [2]:
$$
\begin{array}{ll}
\min & \dfrac{1}{20}x_1^2 + \dfrac{1}{20}x_2^2 + \dfrac{1}{20}x_3^2 \\[2mm]
\mathrm{s.t.} & \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0,
\end{array} \tag{50}
$$
which is a convex minimization problem with three strongly convex functions. In [2], Chen, He, Ye and Yuan showed that the directly extended 3-block ADMM scheme (4) with τ = σ = 1 applied to problem (50) is divergent. For problem (50), Σ1 = Σ2 = Σ3 = 1/10, A1 = (1, 1, 1), A2 = (1, 1, 2) and A3 = (1, 2, 2). From (48) and (49), by taking α = 1, we have that T3 and σ should satisfy the following conditions

$$
\frac{1}{4} + T_3 - 1225\sigma^2 + \frac{5}{6}\sigma > 0 \qquad \text{and} \qquad \frac{1}{10} + T_3 + \frac{5}{6}\sigma > 0,
$$

which hold true, in particular, if T3 = 0 and σ < (1 + √1765)/2940 ≈ 0.015, or if σ = 1 and T3 > 14687/12 ≈ 1223.92.
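These thresholds can be probed numerically. The snippet below runs the directly extended scheme (4) (i.e., sPADMM with Ti = 0) on example (50) from a nonzero starting point. With τ = σ = 1 the iterate norm is expected to grow, consistent with the divergence shown in [2]; with σ = 0.01, below the threshold (1 + √1765)/2940 ≈ 0.0146, Theorem 2.1 guarantees convergence to the unique solution x = 0. The loop is a self-contained illustrative sketch, not code from the paper.

```python
import numpy as np

def run(sigma, iters, tau=1.0, rho=0.1):   # rho = Sigma_i = 1/10 for (1/20) x_i^2
    a = np.array([[1.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [1.0, 2.0, 2.0]])         # columns are A_1^*, A_2^*, A_3^*
    x, z = np.ones(3), np.zeros(3)          # start away from the solution x = 0
    for _ in range(iters):
        for i in range(3):
            rest = a @ x - a[:, i] * x[i]   # c = 0 in (50)
            x[i] = -(a[:, i] @ z + sigma * (a[:, i] @ rest)) \
                   / (rho + sigma * (a[:, i] @ a[:, i]))
        z = z + tau * sigma * (a @ x)
    return np.linalg.norm(x)

print(run(sigma=1.0, iters=1000))    # expected to grow: the divergent case of [2]
print(run(sigma=0.01, iters=5000))   # expected to shrink toward 0: convergent case
```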
Remark 2.2 If A∗2 is vacuous, then for any integer k ≥ 0, we have that x_2^{k+1} = x_2^0 = x̄2, the 3-block sPADMM is just a 2-block sPADMM, and condition (40) reduces to

$$
\tfrac{1}{2}\Sigma_1 + T_1 + \sigma A_1A_1^* \succ 0, \quad \tfrac{5}{2}\Sigma_3 + T_3 + \min(\tau, 1+\tau-\tau^2)\sigma A_3A_3^* \succ 0 \quad \text{and} \quad \Sigma_3 + T_3 + \sigma A_3A_3^* \succ 0,
$$

which is equivalent to

$$
\Sigma_1 + T_1 + \sigma A_1A_1^* \succ 0 \quad \text{and} \quad \Sigma_3 + T_3 + \sigma A_3A_3^* \succ 0 \tag{51}
$$

since Σ1 ⪰ 0, T1 ⪰ 0, Σ3 ⪰ 0 and T3 ⪰ 0. Condition (51) is exactly the same as the one used in Theorem B.1 in [5].

3 Conclusions
In this paper, we provided a convergence analysis of a 3-block semi-proximal ADMM for solving separable convex minimization problems under the condition that the second block in the objective is strongly convex¹. The step-length τ in our proposed semi-proximal ADMM is allowed to stay in the desirable region (0, (1 + √5)/2). From Remark 2.1, we know that with a fixed parameter α ∈ (0, 1], the added semi-proximal terms can be chosen to be small if the penalty parameter σ is small. If A∗1 and A∗3 are both injective and σ > 0 is taken to be smaller than a certain threshold, then the convergent 3-block semi-proximal ADMM includes the directly extended 3-block ADMM with τ ∈ (0, (1 + √5)/2) by taking Ti, i = 1, 2, 3, to be zero operators. Without much difficulty, one could extend our 3-block semi-proximal ADMM to deal with m-block (m ≥ 4) separable convex minimization problems possessing m − 2 strongly convex blocks and provide the iteration complexity analysis for the corresponding algorithm in the sense of [12]. In this work, we choose not to do the extension because we are not aware of interesting applications of m-block (m ≥ 4) separable convex minimization problems with m − 2 strongly convex blocks. While our sufficient condition bounding the range of values for σ and T3 is quite flexible, it may have one potential limitation: T3 can be very large if σ is not small, as shown in Remark 2.1. Since a larger T3 can potentially make the algorithm converge more slowly, in our future research we shall first study how this limitation can be circumvented before we study other important issues such as the iteration complexity².
¹ One can prove similar results if the third instead of the second block is strongly convex.
² In a recent report [1], Cai, Han and Yuan independently proved a result similar to Theorem 2.1 for the directly extended ADMM (i.e., all the three semi-proximal terms T1, T2 and T3 disappear) with τ = 1 and provided an analysis on the iteration complexity.
References
[1] X. J. Cai, D. R. Han and X. M. Yuan, The direct extension of ADMM for three-block
separable convex minimization models is convergent when one function is strongly convex,
Optimization Online, (2014).
[2] C. H. Chen, B. S. He, Y. Y. Ye and X. M. Yuan, The direct extension of ADMM for multi-block
convex minimization problems is not necessarily convergent, Mathematical Programming, Ser.
A, DOI 10.1007/s10107-014-0826-5, (2014).
[3] C. H. Chen, Y. Shen and Y. F. You, On the convergence analysis of the alternating direction
method of multipliers with three blocks, Abstract and Applied Analysis, 2013 (2013), Article
ID 183961, 7 pages.
[4] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal
point algorithm for maximal monotone operators, Mathematical Programming, 55 (1992),
pp. 293–318.
[5] M. Fazel, T. K. Pong, D. F. Sun and P. Tseng, Hankel matrix rank minimization with
applications to system identification and realization, SIAM Journal on Matrix Analysis and
Applications, 34(3) (2013), pp. 946–977.
[6] M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary Value Problems, vol. 15 of Studies in Mathematics and its Applications, North-Holland Publishing Co., Amsterdam, 1983. Translated from the French by B. Hunt and D. C. Spicer.
[7] D. Gabay, Applications of the method of multipliers to variational inequalities, in Augmented
Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems,
M. Fortin and R. Glowinski, eds., vol. 15 of Studies in Mathematics and Its Applications,
Elsevier, (1983), pp. 299–331.
[8] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems
via finite element approximation, Computers & Mathematics with Applications, 2 (1976),
pp. 17–40.
[9] R. Glowinski, Lectures on numerical methods for nonlinear variational problems, vol. 65 of
Tata Institute of Fundamental Research Lectures on Mathematics and Physics, Tata Institute
of Fundamental Research, Bombay, 1980. Notes by M. G. Vijayasundaram and M. Adimurthi.
[10] R. Glowinski and A. Marrocco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue Française d'Automatique, Informatique et Recherche Opérationnelle, 9 (1975), pp. 41–76.
[11] D. R. Han and X. M. Yuan, A note on the alternating direction method of multipliers, Journal
of Optimization Theory and Applications, 155(1) (2012), pp. 227–238.
[12] B. S. He and X. M. Yuan, On the O(1/n) convergence rate of the Douglas-Rachford alternating
direction method, SIAM Journal on Numerical Analysis, 50 (2012), pp. 700–709.
[13] M. Y. Hong and Z. Q. Luo, On the linear convergence of the alternating direction method of multipliers, arXiv:1208.3922v3, 2013.
[14] X. D. Li, D. F. Sun and K.-C. Toh, A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions, accepted by Mathematical Programming, (2014).
[15] T. Y. Lin, S. Q. Ma and S. Z. Zhang, On the convergence rate of multi-block ADMM,
arXiv:1408.4265v1, (2014).
[16] T. Y. Lin, S. Q. Ma and S. Z. Zhang, On the global linear convergence of the ADMM with
multi-block variables, arXiv:1408.4266v1, (2014).
[17] J. W. Nie and Y. X. Yuan, A predictor-corrector algorithm for QSDP combining Dikin-type
and Newton centering steps, Annals of Operations Research, 103 (2001), pp. 115–133.
[18] H. D. Qi, Local duality of nonlinear semidefinite programming, Mathematics of Operations
Research, 34(1) (2009), pp. 124–141.
[19] H. D. Qi and D. F. Sun, An augmented Lagrangian dual approach for the H-weighted nearest
correlation matrix problem, IMA Journal of Numerical Analysis, 31 (2011), pp. 491–511.
[20] R. T. Rockafellar, Convex Analysis. Princeton University Press, Princeton (1970).
[21] D. F. Sun, The strong second order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications, Mathematics of Operations Research,
31 (2006), pp. 761–776.
[22] D. F. Sun, J. Sun and L. W. Zhang, The rate of convergence of the augmented Lagrangian
method for nonlinear semidefinite programming, Mathematical Programming, 114 (2008),
pp. 349–391.
[23] J. Sun and S. Zhang, A modified alternating direction method for convex quadratically constrained quadratic semidefinite programs, European Journal of Operational Research, 207
(2010), pp. 1210–1220.
[24] K.-C. Toh, An inexact primal-dual path-following algorithm for convex quadratic SDP, Mathematical Programming, 112 (2008), pp. 221–254.
[25] K.-C. Toh, R. H. Tütüncü and M. J. Todd, Inexact primal-dual path-following algorithms for a special class of convex quadratic SDP and related problems, Pacific Journal of Optimization, 3 (2007), pp. 135–164.
[26] X. Y. Zhao, A semismooth Newton-CG augmented Lagrangian method for large scale linear
and convex quadratic SDPs, PhD thesis, Department of Mathematics, National University of
Singapore, 2009.