Open Access

A multi-block alternating direction method with parallel splitting for decentralized consensus optimization

EURASIP Journal on Wireless Communications and Networking 2012, 2012:338

DOI: 10.1186/1687-1499-2012-338

Received: 15 February 2012

Accepted: 5 October 2012

Published: 12 November 2012

Abstract

Decentralized optimization has attracted much research interest for resource-limited networked multi-agent systems in recent years. Decentralized consensus optimization, which is one of the decentralized optimization problems of great practical importance, minimizes an objective function that is the sum of the terms from individual agents over a set of variables on which all the agents should reach a consensus. This problem can be reformulated into an equivalent model with two blocks of variables, which can then be solved by the alternating direction method (ADM) with only communications between neighbor nodes. Motivated by a recently emerged class of so-called multi-block ADMs, this article demonstrates that it is more natural to reformulate a decentralized consensus optimization problem to one with multiple blocks of variables and solve it by a multi-block ADM. In particular, we focus on the multi-block ADM with parallel splitting, which has an easy decentralized implementation. The convergence rate is analyzed in the setting of average consensus, and the relation between the two-block and multi-block ADMs is studied. Numerical experiments demonstrate the effectiveness of the multi-block ADM with parallel splitting in terms of speed and communication cost and show that it has better network scalability.

Introduction

In recent years, the communication, signal processing, control, and optimization communities have witnessed considerable research efforts on decentralized optimization for networked multi-agent systems [1–3]. A networked multi-agent system, such as a wireless sensor network (WSN) or a networked control system (NCS), is composed of multiple geographically distributed but interconnected agents which have sensing, computation, communication, and actuating abilities. This system generally has limited resources for communication, since battery power is limited and recharging is difficult, while communication between two agents is energy-consuming. Furthermore, the communication link is often vulnerable and bandwidth-limited. In this situation, decentralized optimization emerges as an effective approach to improve network scalability. In decentralized optimization, data and computation are decentralized. Each agent exchanges information with its neighbors and accomplishes an otherwise centralized optimization task.

This article focuses on the decentralized consensus optimization problem. We consider a network of $L$ agents which cooperatively optimize a separable objective function [3–8]:
$$\min_{x} \sum_{i=1}^{L} f_i(x), \tag{1}$$

where $f_i(x): \mathbb{R}^N \to \mathbb{R}$ is a convex function known to agent $i$ only. The goal is to minimize the objective subject to consensus on $x$.

Related study

The decentralized consensus optimization formulation (1) arises in many practical applications, such as averaging [9–11], estimation [12–17], learning [18–21], etc. The form of $f_i(x)$ can be least squares [11–13], $\ell_1$-regularized least squares [14–17], or more general ones [18–21]. Note that this model can be extended to account for those with separable constraints, such as the network utility maximization (NUM) problem [22–24].

Existing approaches to solving (1) include: i) belief propagation based on graphical models and Markovian random fields [18–20]; ii) incremental optimization which minimizes the overall objective function along a predefined path on the network [7, 8]; iii) stochastic optimization with information exchange between neighboring agents [4–6]; and iv) optimization with explicit consensus constraints which can be handled with the alternating direction method (ADM) [3, 12–17]. The ADM approach is fully decentralized, does not make any assumptions on the network infrastructure (such as being loop-free or having a predefined path), and generally has satisfactory convergence performance. In this article, we mainly discuss the application of ADMs to the decentralized consensus optimization problem.

Our research is along the line of information-driven signal processing and control of WSNs and NCSs [24–26]. Along with the unprecedented data collection abilities offered by large-scale networked multi-agent systems, a new challenge also arises: how should we process such a large amount of data to make estimates and produce control strategies given limited network resources? Instead of processing the data in a fusion center, our solution is to let each agent autonomously make decisions aided by limited communication with its neighbors. From this perspective, each individual objective function $f_i(x)$ in (1) is constructed from the data collected by agent $i$, and $x$ is the global information common to all agents (e.g., estimates or control strategies) obtained based on the data collected by the whole network. Though this framework can be generalized to various signal processing and control problems, this article focuses on those that can be formulated as (1). For problems such as dynamic control and Kalman filtering of networked multi-agent systems, interested readers are referred to [1, 2, 27, 28], respectively.

Our contribution

Motivated by a series of recent articles on multi-block ADMs and their convergence analysis [29–31], this article describes their applications to the decentralized consensus optimization problem. The multi-block ADM with parallel splitting is reviewed in Section 3. Unlike the classical ADM (see textbooks [32, 33]), this multi-block ADM splits the optimization variables into multiple blocks and sequentially updates just one of them while fixing the others. The classical ADM, on the other hand, only has two blocks of variables. Hence in this article we refer to it as the two-block ADM. Our problem (1) does not naturally have two distinct blocks of variables, and to apply the two-block ADM one needs to introduce extra variables (see e.g., [15, 16, 32]). We review this in Section 2. On the other hand, it is simpler to apply the multi-block ADM to (1), and the resulting algorithm is readily decentralized.

This article also analyzes the convergence rate of the multi-block ADM applied to the average consensus problem, which is a special case of (1) where $f_i(x) = \frac{1}{2}\|x - b_i\|_2^2$ for all $i$. In this setting, if the parameters of the multi-block ADM satisfy a certain formula, it is equivalent to the two-block ADM. Therefore, the two-block ADM can be considered as a special case of the multi-block ADM on average consensus problems. This relation also gives a guideline to select the parameters of the multi-block ADM so that it is not equivalent to and runs faster than the two-block ADM on all the tested decentralized consensus optimization problems, including the tested average consensus problems. The simulation results demonstrate that the multi-block ADM accelerates convergence, reduces communication cost, and thus improves network scalability.

Paper organization

The rest of this article is organized as follows. Section 2 reviews a reformulation of the decentralized consensus optimization problem (1), to which the two-block ADM is applied. Section 3 reviews the multi-block ADM and applies a parallel-splitting version of it to (1). Section 4 elaborates on the convergence rate analysis on the average consensus problem, and shows that the two-block ADM is a special case of the multi-block ADM in this case. Section 5 presents numerical simulations of the two-block and multi-block ADMs. Finally, Section 6 concludes the article. Appendix 1 is placed in the last section.

Problem formulation and the two-block ADM

In this section, we describe an equivalent formulation of the decentralized consensus optimization problem (1) and outline the algorithm design based on the two-block ADM.

Problem formulation

We consider a networked multi-agent system described by an undirected connected communication graph $\mathcal{G} = (\mathcal{L}, \mathcal{E})$, where $\mathcal{L}$ is the set of $L$ vertices (distributed agents) and $\mathcal{E}$ is the set of edges (communication links). There exists an edge $(i, j) \in \mathcal{E}$ between agents $i$ and $j$ if they can directly communicate with each other. The two agents are also called one-hop neighbors, or simply neighbors. The set of one-hop neighbors of agent $i$ is denoted by $\mathcal{N}_i$, whose cardinality is denoted by $|\mathcal{N}_i|$.

Our objective is to solve (1) with only information exchange between neighbors. To this end, define $x^{(i)}$ as agent $i$'s local copy of $x$ and impose consensus constraints $x^{(i)} = x^{(j)}$ for all pairs of neighbors $i$ and $j$. With these and given that the communication graph $\mathcal{G}$ is connected, we obtain the following equivalent formulation of (1) (see e.g., [13]):
$$\min_{\{x^{(i)}\}} \sum_{i=1}^{L} f_i(x^{(i)}), \quad \text{s.t.} \quad x^{(i)} = x^{(j)}, \; \forall j \in \mathcal{N}_i, \; \forall i. \tag{2}$$

The two-block ADM

Let us consider the following convex program with separable equality constraints:
$$\min \; g_1(\theta_1) + g_2(\theta_2), \quad \text{s.t.} \quad D_1\theta_1 + D_2\theta_2 = e. \tag{3}$$
Here, for $i = 1$ and $2$, $g_i: \mathbb{R}^{N_i} \to \mathbb{R}$ is convex, $D_i \in \mathbb{R}^{M \times N_i}$, and $e \in \mathbb{R}^{M}$. The two-block ADM constructs the augmented Lagrangian function as:
$$L_a(\theta_1, \theta_2, \lambda) = g_1(\theta_1) + g_2(\theta_2) + \lambda^T (D_1\theta_1 + D_2\theta_2 - e) + \frac{c}{2}\|D_1\theta_1 + D_2\theta_2 - e\|_2^2.$$
Here $\lambda \in \mathbb{R}^{M}$ is a Lagrange multiplier and $c$ is a positive constant. At the $t$th iteration, the two-block ADM updates the optimization variables $\theta_1(t+1)$ and $\theta_2(t+1)$ as:
$$\theta_1(t+1) = \arg\min_{\theta_1} L_a(\theta_1, \theta_2(t), \lambda(t)),$$
$$\theta_2(t+1) = \arg\min_{\theta_2} L_a(\theta_1(t+1), \theta_2, \lambda(t)),$$
and updates the Lagrange multiplier $\lambda(t+1)$ as:
$$\lambda(t+1) = \lambda(t) + c\,(D_1\theta_1(t+1) + D_2\theta_2(t+1) - e).$$

The two-block ADM guarantees global convergence for any $c > 0$ [32]. More precisely, when each $g_i$ is convex for $i = 1$ and $2$, the dual sequence $\{\lambda(t)\}$ converges to an optimal dual solution of (3); if further the primal sequence $\{[\theta_1(t)^T, \theta_2(t)^T]^T\}$ is bounded, the sequence converges to an optimal primal solution of (3).
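The three updates above can be sketched on a toy instance. This is a minimal illustration, not the article's implementation: it assumes the quadratic choice $g_i(\theta_i) = \frac{1}{2}\|\theta_i - a_i\|_2^2$ (the vectors `a1`, `a2` and the function name `two_block_adm` are hypothetical), so that each $\arg\min$ reduces to a small linear system.

```python
import numpy as np


def two_block_adm(a1, a2, D1, D2, e, c=1.0, iters=100):
    """Two-block ADM sketch for
        min 0.5*||th1 - a1||^2 + 0.5*||th2 - a2||^2  s.t.  D1 th1 + D2 th2 = e.
    The quadratic g_i are an assumption made only for this illustration."""
    th1 = np.zeros_like(a1, dtype=float)
    th2 = np.zeros_like(a2, dtype=float)
    lam = np.zeros_like(e, dtype=float)
    # With quadratic g_i, both subproblems have fixed system matrices.
    H1 = np.eye(len(a1)) + c * D1.T @ D1
    H2 = np.eye(len(a2)) + c * D2.T @ D2
    for _ in range(iters):
        # theta_1 update: minimize L_a over theta_1 with theta_2(t), lambda(t) fixed.
        th1 = np.linalg.solve(H1, a1 - D1.T @ lam - c * D1.T @ (D2 @ th2 - e))
        # theta_2 update: uses the fresh theta_1(t+1).
        th2 = np.linalg.solve(H2, a2 - D2.T @ lam - c * D2.T @ (D1 @ th1 - e))
        # Multiplier update with step size c on the constraint residual.
        lam = lam + c * (D1 @ th1 + D2 @ th2 - e)
    return th1, th2, lam
```

With $D_1 = I$, $D_2 = -I$ and $e = 0$, the constraint forces $\theta_1 = \theta_2$, and both blocks should approach the average of `a1` and `a2`.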

The two-block ADM for decentralized consensus optimization

The two-block ADM cannot be directly applied to problem (2) because its constraints interconnect all the variables pair by pair. There are no obvious two blocks. To overcome this, [32] introduces a new block of auxiliary variables, and reformulates (2) as:
$$\min_{\{x^{(i)}\}, \{z_{ij}\}} \sum_{i=1}^{L} f_i(x^{(i)}), \quad \text{s.t.} \quad x^{(i)} = z_{ij}, \; x^{(j)} = z_{ij}, \; \forall j \in \mathcal{N}_i, \; \forall i. \tag{4}$$

Here $z_{ij}$ is an auxiliary variable attached to $x^{(i)}$ and $x^{(j)}$.

Treating $\{x^{(i)}\}$ and $\{z_{ij}\}$ as two blocks of variables, the two-block ADM is applied to problem (4). This technique has been adopted in [15, 16] to solve the decentralized consensus optimization problem with neighboring consensus constraints. After eliminating $\{z_{ij}\}$ from the iterative updates and further simplifications, the two-block ADM for (4) is given below as algorithm TB-ADM.

Initialization: Each agent $i$ initializes $x^{(i)}(0) = 0$ and $\alpha_i(0) = 0$.

Step 1: At time $t$, each agent $i$ updates its local copy $x^{(i)}$ as: $x^{(i)}(t+1) = \arg\min_{x^{(i)}} f_i(x^{(i)}) + \alpha_i^T(t)\, x^{(i)} + c \sum_{j \in \mathcal{N}_i} \left\| x^{(i)} - \frac{1}{2}\left( x^{(i)}(t) + x^{(j)}(t) \right) \right\|_2^2$, where $\alpha_i$ is the Lagrange multiplier and $c$ is a positive constant.

Step 2: At time $t$, each agent $i$ updates its Lagrange multiplier $\alpha_i$ as: $\alpha_i(t+1) = \alpha_i(t) + c\,|\mathcal{N}_i|\, x^{(i)}(t+1) - c \sum_{j \in \mathcal{N}_i} x^{(j)}(t+1)$.

Step 3: Repeat Step 1 and Step 2 until convergence.

TB-ADM is well suited for decentralized computation since the updates require only communication between agents i and j, who are one-hop neighbors. Detailed derivation of TB-ADM can be found in [15, 16, 32].
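The TB-ADM steps can be sketched for the least-squares choice $f_i(x) = \frac{1}{2}\|A_i x - b_i\|_2^2$ used later in the experiments. This is an illustrative, centrally simulated sketch rather than the authors' code: the function name, the `neighbors` list-of-lists representation, and the synchronous loop over agents are assumptions. With this $f_i$, Step 1 reduces to solving $(A_i^T A_i + 2c|\mathcal{N}_i| I)\, x = A_i^T b_i - \alpha_i(t) + c|\mathcal{N}_i| x^{(i)}(t) + c \sum_{j \in \mathcal{N}_i} x^{(j)}(t)$.

```python
import numpy as np


def tb_adm_least_squares(A, b, neighbors, c=1.0, iters=500):
    """Simulate TB-ADM for f_i(x) = 0.5*||A_i x - b_i||^2.

    A, b      : lists of per-agent matrices (M x N) and vectors (M,)
    neighbors : neighbors[i] is the list of one-hop neighbors of agent i
    Returns an (L, N) array holding every agent's local copy.
    """
    L, N = len(A), A[0].shape[1]
    x = np.zeros((L, N))      # local copies x^(i)(0) = 0
    alpha = np.zeros((L, N))  # multipliers alpha_i(0) = 0
    for _ in range(iters):
        x_new = np.empty_like(x)
        for i in range(L):
            d = len(neighbors[i])
            nbr_sum = x[neighbors[i]].sum(axis=0)
            # Step 1: the quadratic subproblem is a linear system.
            lhs = A[i].T @ A[i] + 2.0 * c * d * np.eye(N)
            rhs = A[i].T @ b[i] - alpha[i] + c * d * x[i] + c * nbr_sum
            x_new[i] = np.linalg.solve(lhs, rhs)
        x = x_new
        for i in range(L):
            d = len(neighbors[i])
            # Step 2: multiplier update from the new local copies.
            alpha[i] += c * (d * x[i] - x[neighbors[i]].sum(axis=0))
    return x
```

Each agent only reads `x[neighbors[i]]`, which mirrors the one-hop communication pattern; in a real deployment those rows would arrive as messages rather than shared memory.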

The multi-block ADM

The fact that many practical optimization problems naturally have multiple blocks of variables motivates the development of a class of multi-block ADMs, such as the one with parallel splitting [29], with prediction-correction [30], and with Gaussian back substitution [31]. Due to the nature of the decentralized consensus optimization problem (2) and the need of parallelization, we choose the multi-block ADM with parallel splitting in [29].

The multi-block ADM with parallel splitting

Consider an equality constrained convex program which can be separated to L parts:
$$\min \sum_{i=1}^{L} g_i(\theta_i), \quad \text{s.t.} \quad \sum_{i=1}^{L} D_i\theta_i = e. \tag{5}$$
Here, for all $i$, $g_i: \mathbb{R}^{N_i} \to \mathbb{R}$ is convex, $D_i \in \mathbb{R}^{M \times N_i}$, and $e \in \mathbb{R}^{M}$. At the $t$th iteration, the multi-block ADM with parallel splitting works as follows.
Step 1: Updating an auxiliary variable $q$:
$$q(t+1) = \lambda(t) + \beta \left( \sum_{i=1}^{L} D_i\theta_i(t) - e \right),$$
where $\lambda$ is a Lagrange multiplier and $\beta$ is a positive constant.
Step 2: Updating the optimization variables $\{\theta_i\}$:
$$\theta_i(t+1) = \arg\min_{\theta_i} g_i(\theta_i) + q(t+1)^T D_i\theta_i + \frac{\mu}{2}\|D_i\theta_i - D_i\theta_i(t)\|_2^2, \quad \forall i,$$
where $\mu$ is a positive constant.
Step 3: Updating the Lagrange multiplier $\lambda$:
$$\lambda(t+1) = \lambda(t) + \beta \left( \sum_{i=1}^{L} D_i\theta_i(t+1) - e \right).$$

The multi-block ADM guarantees global convergence if the two positive constants β and μ are properly chosen. For the convergence proof and the settings of β and μ, the interested reader is referred to [29].

The multi-block ADM for decentralized consensus optimization

Applying the multi-block ADM to (2) directly yields a decentralized algorithm; there is no need to introduce a new block of auxiliary variables and then eliminate them, as we have done for the two-block ADM. We provide the algorithm to solve (2) based on the multi-block ADM with parallel splitting, denoted as MB-ADM. Detailed derivation of MB-ADM is given in Appendix 1.

Initialization: Each agent $i$ initializes $q_i(0) = 0$, $x^{(i)}(0) = 0$, and $\lambda_i(0) = 0$.

Step 1: At time $t$, each agent $i$ updates its auxiliary variable $q_i$ as: $q_i(t+1) = \lambda_i(t) + \beta\,|\mathcal{N}_i|\, x^{(i)}(t) - \beta \sum_{j \in \mathcal{N}_i} x^{(j)}(t)$, where $\beta$ is a positive constant.

Step 2: At time $t$, each agent $i$ updates its local copy $x^{(i)}$ as: $x^{(i)}(t+1) = \arg\min_{x^{(i)}} f_i(x^{(i)}) + 2 q_i^T(t+1)\, x^{(i)} + \mu\,|\mathcal{N}_i|\, \| x^{(i)} - x^{(i)}(t) \|_2^2$, where $\mu$ is a positive constant.

Step 3: At time $t$, each agent $i$ updates its Lagrange multiplier $\lambda_i$ as: $\lambda_i(t+1) = \lambda_i(t) + \beta\,|\mathcal{N}_i|\, x^{(i)}(t+1) - \beta \sum_{j \in \mathcal{N}_i} x^{(j)}(t+1)$.

Step 4: Repeat Step 1 to Step 3 until convergence.

In each iteration, to update $q_i(t+1)$ and $\lambda_i(t)$, agent $i$ needs $x^{(j)}(t)$, of size $N \times 1$, from all neighbors $j \in \mathcal{N}_i$; to optimize $x^{(i)}(t+1)$, agent $i$ only needs the local information $q_i(t+1)$ and $x^{(i)}(t)$. In all, each agent only needs to broadcast an $N \times 1$ vector, namely its local copy $x^{(i)}(t)$, to its neighbors per iteration. MB-ADM and TB-ADM thus have the same per-iteration communication cost. At the $t$th iteration, agent $i$ needs to update $x^{(i)}(t)$, $q_i(t)$, and $\lambda_i(t)$ in its memory for MB-ADM. Hence the memory requirement is slightly higher than that of TB-ADM, for which only $x^{(i)}(t)$ and $\alpha_i(t)$ need to be updated.
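The MB-ADM steps can be sketched in the same way for $f_i(x) = \frac{1}{2}\|A_i x - b_i\|_2^2$. Again this is a centrally simulated illustration under assumptions (hypothetical function name, `neighbors` as a list of lists), not the authors' code. For this $f_i$, Step 2 reduces to solving $(A_i^T A_i + 2\mu|\mathcal{N}_i| I)\, x = A_i^T b_i - 2 q_i(t+1) + 2\mu|\mathcal{N}_i|\, x^{(i)}(t)$. Unlike TB-ADM, convergence depends on how $\beta$ and $\mu$ are chosen, so the caller must supply suitable values.

```python
import numpy as np


def mb_adm_least_squares(A, b, neighbors, mu, beta, iters=300):
    """Simulate MB-ADM for f_i(x) = 0.5*||A_i x - b_i||^2.

    Returns an (L, N) array holding every agent's local copy.
    Convergence requires properly chosen mu and beta.
    """
    L, N = len(A), A[0].shape[1]
    deg = np.array([len(nb) for nb in neighbors])
    x = np.zeros((L, N))    # x^(i)(0) = 0
    lam = np.zeros((L, N))  # lambda_i(0) = 0

    def disagreement(v):
        # Row i is |N_i| * v_i - sum_{j in N_i} v_j (one-hop data only).
        return np.stack([deg[i] * v[i] - v[neighbors[i]].sum(axis=0)
                         for i in range(L)])

    for _ in range(iters):
        # Step 1: auxiliary variable q_i(t+1).
        q = lam + beta * disagreement(x)
        # Step 2: all agents update in parallel from x(t) and q(t+1).
        x_new = np.empty_like(x)
        for i in range(L):
            lhs = A[i].T @ A[i] + 2.0 * mu * deg[i] * np.eye(N)
            rhs = A[i].T @ b[i] - 2.0 * q[i] + 2.0 * mu * deg[i] * x[i]
            x_new[i] = np.linalg.solve(lhs, rhs)
        x = x_new
        # Step 3: multiplier update from the new local copies.
        lam = lam + beta * disagreement(x)
    return x
```

The extra array `q` reflects the slightly higher memory footprint noted above, while the only neighbor data touched per iteration are the local copies.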

Convergence rate analysis

Convergence rate is an important issue for decentralized algorithms, since it directly influences the overall communication cost. For general separable convex programs, [29, 34] prove sublinear convergence rates of $1/t$ for the multi-block and two-block ADMs, respectively. However, when they are applied to average consensus problems, much faster convergence can be observed. For this reason, we derive a tighter convergence rate for this setting in this section.

The average consensus problem gives rise to problem (2) with $f_i(x^{(i)}) = \frac{1}{2}\|x^{(i)} - b_i\|_2^2$, $\forall i$ [9–11]; namely, the agents aim at averaging their original measurements $\{b_i\}$ via one-hop communication. Without loss of generality, we assume that $x^{(i)}$ and $b_i$ are both scalars since their dimensions have no effect on the convergence rate.
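As a worked step that is not spelled out at this point in the text, setting the derivative of each scalar subproblem to zero gives closed-form $x$-updates for the average consensus case. For Step 1 of TB-ADM and Step 2 of MB-ADM, respectively, this would read:
$$x^{(i)}(t+1) = \frac{b_i - \alpha_i(t) + c\,|\mathcal{N}_i|\, x^{(i)}(t) + c \sum_{j \in \mathcal{N}_i} x^{(j)}(t)}{1 + 2c\,|\mathcal{N}_i|},$$
$$x^{(i)}(t+1) = \frac{b_i - 2 q_i(t+1) + 2\mu\,|\mathcal{N}_i|\, x^{(i)}(t)}{1 + 2\mu\,|\mathcal{N}_i|}.$$
With all variables initialized to zero, the first iterates are $b_i / (1 + 2c|\mathcal{N}_i|)$ and $b_i / (1 + 2\mu|\mathcal{N}_i|)$, which agrees with the initial states used in the state transition analysis.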

Convergence rate of MB-ADM

In analyzing the convergence rate of MB-ADM for the average consensus problem, we first rewrite MB-ADM in state transition equation form and then use spectral analysis tools to bound the convergence rate. Our train of thought is similar to that in [35] for the two-block ADM.

According to the derivation in Appendix 1, we can rewrite MB-ADM in state transition equation form. Let us define a state vector $s_M(t+1) = [x^{(1)}(t+1), \ldots, x^{(L)}(t+1), x^{(1)}(t), \ldots, x^{(L)}(t)]^T$; the corresponding state transition equation of MB-ADM is:
$$s_M(t+1) = \Phi_M s_M(t). \tag{6}$$
Here the state transition matrix $\Phi_M$ is defined as:
$$\Phi_M = \begin{bmatrix} \Gamma_M & \Omega_M \\ I_{L \times L} & 0_{L \times L} \end{bmatrix}$$

with $I_{L \times L}$ being the $L \times L$ identity matrix, $0_{L \times L}$ being the $L \times L$ zero matrix, $\Gamma_M$ being an $L \times L$ matrix whose $(i,i)$th entry is $\frac{1 - 4\beta|\mathcal{N}_i| + 4\mu|\mathcal{N}_i|}{1 + 2\mu|\mathcal{N}_i|}$ and whose $(i,j)$th entry is $\frac{4\beta}{1 + 2\mu|\mathcal{N}_i|}$ if $i$ and $j$ are neighbors (and zero otherwise), and $\Omega_M$ being an $L \times L$ matrix whose $(i,i)$th entry is $\frac{2\beta|\mathcal{N}_i| - 2\mu|\mathcal{N}_i|}{1 + 2\mu|\mathcal{N}_i|}$ and whose $(i,j)$th entry is $-\frac{2\beta}{1 + 2\mu|\mathcal{N}_i|}$ if $i$ and $j$ are neighbors (and zero otherwise). We can see that each row of $\Phi_M$ sums to 1. The initial state is $s_M(1) = \left[ \frac{b_1}{1 + 2\mu|\mathcal{N}_1|}, \ldots, \frac{b_L}{1 + 2\mu|\mathcal{N}_L|}, 0, \ldots, 0 \right]^T$ when each agent $i$ initializes $q_i(0) = 0$, $x^{(i)}(0) = 0$, and $\lambda_i(0) = 0$.
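The state transition matrix can be assembled directly from an adjacency matrix. The following is a minimal sketch under assumptions: the function names and the dense 0/1 symmetric adjacency matrix `adj` are illustrative choices, and off-diagonal entries for non-neighbors are taken to be zero.

```python
import numpy as np


def mb_adm_transition(adj, mu, beta):
    """Build Phi_M = [[Gamma_M, Omega_M], [I, 0]] for scalar average consensus.

    adj is a symmetric 0/1 adjacency matrix with zero diagonal.
    """
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)                      # |N_i|
    denom = (1.0 + 2.0 * mu * deg)[:, None]    # row i is divided by 1 + 2*mu*|N_i|
    gamma = (np.diag(1.0 - 4.0 * beta * deg + 4.0 * mu * deg)
             + 4.0 * beta * adj) / denom
    omega = (np.diag(2.0 * beta * deg - 2.0 * mu * deg)
             - 2.0 * beta * adj) / denom
    return np.block([[gamma, omega], [np.eye(L), np.zeros((L, L))]])


def mb_adm_initial_state(b, adj, mu):
    """State s_M(1) = [x(1); x(0)] under all-zero initialization."""
    b = np.asarray(b, dtype=float)
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    return np.concatenate([b / (1.0 + 2.0 * mu * deg), np.zeros_like(b)])
```

Repeatedly applying `Phi @ s` to the initial state should reproduce the iterates of MB-ADM on scalar average consensus, which gives a simple way to cross-check an implementation of the algorithm against the analysis.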

Proposition 1

(convergence and convergence rate of MB-ADM on average consensus) The state transition equation (6) defined above has the following properties:

Property 1

The matrix $\Phi_M$ has an eigenvalue $\rho_{M1} = 1$ with multiplicity 1, and its corresponding left and right eigenvectors are:
$$l_{M1} = \left[ 1 + 2\mu|\mathcal{N}_1|, \ldots, 1 + 2\mu|\mathcal{N}_L|, -2\mu|\mathcal{N}_1|, \ldots, -2\mu|\mathcal{N}_L| \right]$$
and
$$r_{M1} = \left[ \frac{1}{L}, \ldots, \frac{1}{L}, \frac{1}{L}, \ldots, \frac{1}{L} \right]^T,$$

respectively. Note that $l_{M1}$ and $r_{M1}$ are chosen subject to $l_{M1} r_{M1} = 1$.

Property 2

Define:
$$\rho_M = \max_{i \neq 1} |\rho_{Mi}|,$$
where $\rho_{Mi}$ is the $i$th eigenvalue of $\Phi_M$. If $\rho_M < 1$, then the limit property of $s_M(t)$ is:
$$\lim_{t \to \infty} s_M(t+1) = \lim_{t \to \infty} \Phi_M^t s_M(1) = r_{M1} l_{M1} s_M(1) = \left( \frac{1}{L} \sum_{i=1}^{L} b_i \right) [1, \ldots, 1, 1, \ldots, 1]^T.$$
Further, denoting by $\kappa_M$ the size of the largest Jordan block of $\Phi_M$, the convergence rate is:
$$\|s_M(t+1) - s_M(\infty)\|_2 \sim t^{(\kappa_M - 1)} \rho_M^t.$$

Proof of Property 1 is given in Appendix 1. Property 2 comes from the classical convergence rate analysis of state transition equations. If $\rho_{M1} = 1$ and $\rho_M < 1$, then there exists a unique $s_M(\infty)$ and the convergence rate is $\|s_M(t+1) - s_M(\infty)\|_2 \sim t^{(\kappa_M - 1)} \rho_M^t$ (see [36], Fact 3). Next we try to find one possible (and hence unique) $s_M(\infty)$. By definition, $\Phi_M r_{M1} = \rho_{M1} r_{M1} = r_{M1}$. Hence $\lim_{t \to \infty} \Phi_M^t r_{M1} = r_{M1}$. Similarly, $\lim_{t \to \infty} l_{M1} \Phi_M^t = l_{M1}$. These two facts mean that $r_{M1} l_{M1}$ is a possible limit of $\Phi_M^t$ as $t \to \infty$. Therefore, $r_{M1} l_{M1} s_M(1)$ is a possible (and hence the unique) value of $s_M(\infty)$.
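Properties 1 and 2 can be checked numerically on a small graph. The sketch below is an illustration under assumptions, not part of the proof: the helper names are hypothetical, the graph is given as a symmetric 0/1 adjacency matrix, and $\rho_M$ is computed by discarding the eigenvalue closest to 1.

```python
import numpy as np


def mb_adm_transition(adj, mu, beta):
    """Phi_M = [[Gamma_M, Omega_M], [I, 0]] for scalar average consensus."""
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)
    denom = (1.0 + 2.0 * mu * deg)[:, None]
    gamma = (np.diag(1.0 - 4.0 * beta * deg + 4.0 * mu * deg) + 4.0 * beta * adj) / denom
    omega = (np.diag(2.0 * beta * deg - 2.0 * mu * deg) - 2.0 * beta * adj) / denom
    return np.block([[gamma, omega], [np.eye(L), np.zeros((L, L))]])


def leading_eigenvectors(adj, mu):
    """Left and right eigenvectors of Phi_M for the eigenvalue 1 (Property 1)."""
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    L = len(deg)
    left = np.concatenate([1.0 + 2.0 * mu * deg, -2.0 * mu * deg])
    right = np.full(2 * L, 1.0 / L)
    return left, right


def rho_excluding_one(Phi):
    """Largest eigenvalue modulus after removing the eigenvalue closest to 1."""
    eig = np.linalg.eigvals(Phi)
    k = np.argmin(np.abs(eig - 1.0))
    return np.max(np.abs(np.delete(eig, k)))


def predicted_limit(adj, b, mu):
    """r_M1 l_M1 s_M(1), the limit stated in Property 2."""
    b = np.asarray(b, dtype=float)
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    s1 = np.concatenate([b / (1.0 + 2.0 * mu * deg), np.zeros_like(b)])
    left, right = leading_eigenvectors(adj, mu)
    return right * (left @ s1)
```

Comparing `np.linalg.matrix_power(Phi, t) @ s1` for a large `t` against `predicted_limit` is one way to see the iterates settle at the average of the $b_i$ whenever $\rho_M < 1$.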

Remark 1

Note that the $t^{(\kappa_M - 1)} \rho_M^t$ rate, though still loose, is tighter than the $1/t$ rate of the multi-block ADM for general separable convex programs [29]. Indeed, from numerical experiments, we find that $\kappa_M$, the size of the largest Jordan block of $\Phi_M$, is often equal to 1 (which means that $\Phi_M$ is diagonalizable). In this case, the convergence rate can be as fast as $\rho_M^t$.

Property 2 requires the condition $\rho_M < 1$, which does not necessarily hold for every choice of $\mu$ and $\beta$. Next we show two nontrivial special cases where the condition in Property 2 is satisfied. The first special case connects MB-ADM with TB-ADM. Analysis of these two special cases, together with numerical simulations, provides guidelines for parameter selection in MB-ADM.

Proposition 2

(two nontrivial special cases) We have $\rho_M < 1$ in either one of the following two cases:

Case 1: The parameters $\mu$ and $\beta$ are chosen such that $\mu = 2\beta > 0$; further, $\frac{2\beta|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|} < \frac{1}{4}$ and $\frac{2\mu|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|} < \frac{1}{2}$ for all $j = 1, 2, \ldots, L$.

Case 2: The parameters $\mu$ and $\beta$ are chosen such that $\mu = \beta > 0$; further, $\frac{2\beta|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|} < \frac{1}{2}$ and $\frac{2\mu|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|} < \frac{1}{2}$ for all $j = 1, 2, \ldots, L$.

Remark 2

The proof of Proposition 2 is given in Appendix 1. In Case 1, we set $\mu = 2\beta > 0$, which indeed leads to the equivalence between MB-ADM and TB-ADM, as we will show in the next subsection. In Case 2, we set $\mu = \beta > 0$, which brings faster convergence for the average consensus problem according to numerical simulations (see Section 5.2). Hence we recommend setting $\beta = \tau\mu$ with a fixed ratio $\frac{1}{2} \leq \tau \leq 1$, and tuning only the value of $\mu$. This setting also works well for the general decentralized consensus optimization problem (1). Tuning $\mu$ for MB-ADM is similar to tuning $c$ for TB-ADM; both algorithms have one parameter left to the user's choice. Note that the conditions in Proposition 2 are merely sufficient; $\frac{2\beta|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|}$ and $\frac{2\mu|\mathcal{N}_j|}{1 + 2\mu|\mathcal{N}_j|}$ can be larger than their upper bounds given above.
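The parameter guideline can be explored numerically by checking the sufficient conditions of Proposition 2 and computing $\rho_M$ for a given graph and ratio $\tau$. The sketch below is illustrative only; the function names and the adjacency-matrix input are assumptions, and the spectral radius it reports is specific to the graph supplied.

```python
import numpy as np


def mb_adm_transition(adj, mu, beta):
    """Phi_M = [[Gamma_M, Omega_M], [I, 0]] for scalar average consensus."""
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)
    denom = (1.0 + 2.0 * mu * deg)[:, None]
    gamma = (np.diag(1.0 - 4.0 * beta * deg + 4.0 * mu * deg) + 4.0 * beta * adj) / denom
    omega = (np.diag(2.0 * beta * deg - 2.0 * mu * deg) - 2.0 * beta * adj) / denom
    return np.block([[gamma, omega], [np.eye(L), np.zeros((L, L))]])


def proposition2_case(adj, mu, beta):
    """Return 1 or 2 if the sufficient conditions of that case hold, else None."""
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    ratio_beta = 2.0 * beta * deg / (1.0 + 2.0 * mu * deg)
    ratio_mu = 2.0 * mu * deg / (1.0 + 2.0 * mu * deg)
    if np.isclose(mu, 2.0 * beta) and np.all(ratio_beta < 0.25) and np.all(ratio_mu < 0.5):
        return 1
    if np.isclose(mu, beta) and np.all(ratio_beta < 0.5) and np.all(ratio_mu < 0.5):
        return 2
    return None


def rho_m(adj, mu, tau):
    """rho_M for beta = tau * mu, ignoring the eigenvalue closest to 1."""
    eig = np.linalg.eigvals(mb_adm_transition(adj, mu, tau * mu))
    k = np.argmin(np.abs(eig - 1.0))
    return np.max(np.abs(np.delete(eig, k)))
```

Sweeping `tau` over $[\frac{1}{2}, 1]$ for a fixed `mu` and plotting `rho_m` is one simple way to pick a ratio for a particular network before running the algorithm.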

Connection between MB-ADM and TB-ADM

To show the connection between MB-ADM and TB-ADM, we also write TB-ADM in state transition equation form. Note that [35] considers another kind of two-block ADM for the average consensus problem, where consensus constraints are quadratically penalized by different weights in the augmented Lagrangian function. In TB-ADM, the consensus constraints are quadratically penalized by the same weight $c$.

We define a state vector $s_T(t+1) = [x^{(1)}(t+1), \ldots, x^{(L)}(t+1), x^{(1)}(t), \ldots, x^{(L)}(t)]^T$ and the corresponding state transition equation, according to the derivation in Appendix 1:
$$s_T(t+1) = \Phi_T s_T(t). \tag{7}$$
Here the state transition matrix $\Phi_T$ is defined as:
$$\Phi_T = \begin{bmatrix} \Gamma_T & \Omega_T \\ I_{L \times L} & 0_{L \times L} \end{bmatrix}$$

with $I_{L \times L}$ being the $L \times L$ identity matrix, $0_{L \times L}$ being the $L \times L$ zero matrix, $\Gamma_T$ being an $L \times L$ matrix whose $(i,i)$th entry is 1 and whose $(i,j)$th entry is $\frac{2c}{1 + 2c|\mathcal{N}_i|}$ if $i$ and $j$ are neighbors, and $\Omega_T$ being an $L \times L$ matrix whose $(i,i)$th entry is $-\frac{c|\mathcal{N}_i|}{1 + 2c|\mathcal{N}_i|}$ and whose $(i,j)$th entry is $-\frac{c}{1 + 2c|\mathcal{N}_i|}$ if $i$ and $j$ are neighbors. The initial state is $s_T(1) = \left[ \frac{b_1}{1 + 2c|\mathcal{N}_1|}, \ldots, \frac{b_L}{1 + 2c|\mathcal{N}_L|}, 0, \ldots, 0 \right]^T$ when each agent $i$ initializes $x^{(i)}(0) = 0$ and $\alpha_i(0) = 0$.

Comparing the state transition equations of MB-ADM and TB-ADM, we can find that TB-ADM is indeed a special case of MB-ADM when $c = \mu = 2\beta > 0$. In this sense, MB-ADM provides more flexibility in parameter selection than TB-ADM. According to our simulations in Section 5.2, setting $\beta = \tau\mu$ with $\frac{1}{2} \leq \tau \leq 1$ makes MB-ADM faster than TB-ADM.
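The stated equivalence can be checked entry by entry by building both transition matrices and comparing them. This is a small sketch with hypothetical function names; it assumes a symmetric 0/1 adjacency matrix and zero entries between non-neighbors.

```python
import numpy as np


def mb_adm_transition(adj, mu, beta):
    """Phi_M for MB-ADM on scalar average consensus."""
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)
    denom = (1.0 + 2.0 * mu * deg)[:, None]
    gamma = (np.diag(1.0 - 4.0 * beta * deg + 4.0 * mu * deg) + 4.0 * beta * adj) / denom
    omega = (np.diag(2.0 * beta * deg - 2.0 * mu * deg) - 2.0 * beta * adj) / denom
    return np.block([[gamma, omega], [np.eye(L), np.zeros((L, L))]])


def tb_adm_transition(adj, c):
    """Phi_T for TB-ADM on scalar average consensus."""
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)
    denom = (1.0 + 2.0 * c * deg)[:, None]
    gamma = np.eye(L) + 2.0 * c * adj / denom              # diagonal entries equal 1
    omega = -(c * np.diag(deg) + c * adj) / denom           # both entry types are negative
    return np.block([[gamma, omega], [np.eye(L), np.zeros((L, L))]])


def is_equivalent(adj, c, mu, beta):
    """True when the two state transition matrices coincide numerically."""
    return np.allclose(tb_adm_transition(adj, c), mb_adm_transition(adj, mu, beta))
```

Calling `is_equivalent(adj, c, c, c / 2)` corresponds to the special case $c = \mu = 2\beta$; any other ratio between `beta` and `mu` generally changes the diagonal of $\Gamma_M$ and breaks the match.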

Let $\rho_{Ti}$ be the $i$th eigenvalue of $\Phi_T$. Apparently $\rho_{T1} = 1$. Defining:
$$\rho_T = \max_{i \neq 1} |\rho_{Ti}|,$$

and denoting by $\kappa_T$ the size of the largest Jordan block of $\Phi_T$, we can prove that TB-ADM has a similar $t^{(\kappa_T - 1)} \rho_T^t$ convergence rate to the optimal solution given the conditions in Case 1 of Proposition 2. Interestingly, the upper bounds $\frac{2c|\mathcal{N}_j|}{1 + 2c|\mathcal{N}_j|} < \frac{1}{2}$ for all $j = 1, 2, \ldots, L$ are no longer needed since TB-ADM guarantees global convergence for any $c > 0$.

Numerical Experiments

In this section, we present numerical simulations and demonstrate the performance of MB-ADM on decentralized consensus optimization problems. In particular, we are interested in how the communication cost scales with the network size.

Simulation Settings

In the numerical experiments, we consider the case that the agents cooperatively solve a least-squares problem. Each agent $i$ has a measurement matrix $A_i\in\mathbb{R}^{M\times N}$ and a measurement vector $b_i\in\mathbb{R}^{M}$. The objective function in (1) is thus $f(x)=\sum_{i=1}^{L}f_i(x)=\frac{1}{2}\sum_{i=1}^{L}\|A_ix-b_i\|_2^2$. The elements of the true signal vector $x_0$ and the entries of the measurement matrices $\{A_i\}$ follow the normal distribution $\mathcal{N}(0,1)$. The measurement vector is $b_i=A_ix_0+\eta_i$; the elements of the noise vector $\eta_i$ follow the normal distribution $\mathcal{N}(0,0.1)$. In the tests of average consensus, $\{A_i\}$ reduce to identity matrices and are no longer random.
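The data model above can be sketched as follows. This is an illustrative sketch, assuming NumPy; the function names and the seed are hypothetical, and the second argument of $\mathcal{N}(0,0.1)$ is assumed here to be a variance, which the text does not state. The centralized least-squares solution is computed as the reference against which absolute error would be measured.

```python
import numpy as np

def make_least_squares_problem(L, M, N, noise_var=0.1, seed=0):
    """Draw x0, {A_i}, and b_i = A_i x0 + eta_i for L agents.

    Assumption: noise_var is the variance of the noise entries.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(N)                 # true signal, N(0, 1) entries
    A = rng.standard_normal((L, M, N))          # one M x N matrix per agent
    noise = np.sqrt(noise_var) * rng.standard_normal((L, M))
    b = np.einsum("lmn,n->lm", A, x0) + noise   # b_i = A_i x0 + eta_i
    return A, b, x0

def centralized_solution(A, b):
    """Minimize 0.5 * sum_i ||A_i x - b_i||_2^2 by stacking all agents' rows."""
    L, M, N = A.shape
    x_star, *_ = np.linalg.lstsq(A.reshape(L * M, N), b.reshape(L * M), rcond=None)
    return x_star

# Sizes of scenario #2: L=50 agents, M=10 measurements each, N=5 unknowns.
A, b, x0 = make_least_squares_problem(L=50, M=10, N=5)
x_star = centralized_solution(A, b)
```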

In the simulation, we assume that L agents are uniformly randomly deployed in a 100×100 area. All agents have a common communication range $r_C$, which is chosen such that the networked multi-agent system is connected. Given $r_C$, the average node degree d can be calculated. We consider the following three scenarios: #1) L=50, M=1, N=1, $\{A_i=1\}$, $r_C$=30, d≈12; #2) L=50, M=10, N=5, $r_C$=30, d≈12; #3) L=200, M=10, N=5, $r_C$=15, d≈12. Scenario #1 is the average consensus test. Throughout the simulations, we set β=τμ in MB-ADM with τ=0.9.
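The deployment described above can be sketched as a random geometric graph. This is a minimal sketch, assuming NumPy; the function names and seed are hypothetical, and a breadth-first search stands in for whatever connectivity check the authors used.

```python
import numpy as np
from collections import deque

def deploy_agents(L, r_c, side=100.0, seed=0):
    """Place L agents uniformly at random in a side x side square and link
    every pair whose distance is at most the communication range r_c."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, side, size=(L, 2))
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = (dist <= r_c) & ~np.eye(L, dtype=bool)   # no self-loops
    return pos, adj

def is_connected(adj):
    """Breadth-first search from agent 0; True if every agent is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(adj[i]):
            j = int(j)
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == adj.shape[0]

def average_degree(adj):
    """Average number of neighbors per agent."""
    return adj.sum() / adj.shape[0]

# Sizes of scenario #2; a disconnected draw would simply be redrawn.
pos, adj = deploy_agents(L=50, r_c=30.0)
print(is_connected(adj), average_degree(adj))
```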

Convergence rate for average consensus

Under different choices of c, μ, and β, the values of $\rho_T$ for TB-ADM and the values of $\rho_M$ for MB-ADM in scenario #1 are shown in Figure 1. For TB-ADM, $\rho_T$ drops sharply as c increases from 0; after a turning point (at c≈0.17), which corresponds to the fastest convergence rate, $\rho_T$ steadily increases. The curve of $\rho_M$ for MB-ADM is more complicated because of the two parameters μ and β. For each μ, $\rho_M$ steadily decreases as β increases from 0, then rises sharply above 1, which corresponds to divergence. The larger μ is, the wider the convergence range for β, at the cost of a relatively slower convergence rate. The curve of particular interest to us is μ=c≈0.17. On this curve, 2β=c≈0.17 corresponds to Case 1 in Proposition 2, namely, when MB-ADM reduces to TB-ADM. As β increases further, $\rho_M$ still decreases until reaching a turning point 2β=2c≈0.34, which corresponds to Case 2 in Proposition 2. This simulation validates our analysis in Section 5.2, as well as the proposed parameter selection rule (namely, setting a ratio τ, $\frac{1}{2}\le\tau\le 1$, such that β=τμ).
https://static-content.springer.com/image/art%3A10.1186%2F1687-1499-2012-338/MediaObjects/13638_2012_Article_539_Fig1_HTML.jpg
Figure 1

Curves of $\rho_T$ and $\rho_M$ for TB-ADM and MB-ADM in scenario #1. The dashed line is for TB-ADM and its turning point corresponds to c≈0.17; the solid line is for MB-ADM with μ=2c≈0.34; the four dotted lines are for MB-ADM with μ=0.1, 0.3, 0.5, and 0.7.

Simulation results on the actual convergence properties are shown in Figure 2. By absolute error we denote the ℓ2-norm of the distance between the current solution and the centralized optimal solution. Though the convergence rates of MB-ADM and TB-ADM are of the same order of magnitude, MB-ADM is slightly superior to TB-ADM.
https://static-content.springer.com/image/art%3A10.1186%2F1687-1499-2012-338/MediaObjects/13638_2012_Article_539_Fig2_HTML.jpg
Figure 2

Convergence of the decentralized consensus optimization algorithms for scenario # 1. Here β = τμ with τ = 0.9.

According to the theoretical analysis in Sections 4.1 and 4.2, the estimated convergence rates of MB-ADM and TB-ADM are $t^{(\kappa_M-1)}\rho_M^{t}$ and $t^{(\kappa_T-1)}\rho_T^{t}$, respectively. However, numerical simulations show that they are loose bounds; the actual convergence rates, as we can observe from Figure 2, are linear.

Performance Comparison

Figures 3 and 4 depict the convergence properties of the two decentralized consensus optimization algorithms for scenarios #2 and #3, respectively. The parameters μ and c are tuned to be near the best ones. Here we still have β=τμ with τ=0.9. For either the medium network in scenario #2 or the large network in scenario #3, both algorithms converge linearly to the optimal solution. Comparing the two decentralized algorithms, MB-ADM outperforms TB-ADM in each scenario regarding convergence rate.
https://static-content.springer.com/image/art%3A10.1186%2F1687-1499-2012-338/MediaObjects/13638_2012_Article_539_Fig3_HTML.jpg
Figure 3

Convergence of the decentralized consensus optimization algorithms for scenario #2. The parameters c and μ are tuned to near the best, and β=τμ with τ=0.9.

https://static-content.springer.com/image/art%3A10.1186%2F1687-1499-2012-338/MediaObjects/13638_2012_Article_539_Fig4_HTML.jpg
Figure 4

Convergence of the decentralized consensus optimization algorithms for scenario #3. The parameters c and μ are tuned to near the best, and β=τμ with τ=0.9.

Of particular interest to us is whether the decentralized algorithms are scalable to network size. Observing Figure 3 with L=50 agents and d≈12, and Figure 4 with L=200 agents and d≈12, we find that the convergence rates of the two algorithms depend more on the average node degree than on the network size. These numerical experiments verify the well-recognized claim that decentralized optimization may improve the performance of a networked multi-agent system with respect to network scalability.

Communication cost

Communication cost, in terms of energy consumption and bandwidth, is the major design consideration of a resource-limited networked multi-agent system, and can be approximately evaluated by the volume of information exchange during the decentralized consensus optimization process. Ignoring the extra burden of coordinating the network, for each agent, the communication cost is proportional to the number of iterations multiplied by the volume of information exchange per iteration. Therefore, reducing the information exchange per iteration is of critical importance to the design of lightweight algorithms.

Comparing the two decentralized consensus optimization algorithms, the information exchange per iteration is decided by the communication mode of agents, namely, broadcast or unicast. In the broadcast mode, one agent can send one piece of information to all of its neighbors with one transmission; in contrast, in the unicast mode, the agent needs multiple transmissions to do so. Both modes have their pros and cons. The broadcast mode utilizes the characteristic of wireless communication, but may bring difficulties in coordinating the network, such as avoiding collisions. Though the unicast mode requires many more transmissions, the randomized-gossip-like scheme is very useful in communication for the sake of robustness [37]. The average volumes of information exchange per iteration of the two decentralized consensus optimization algorithms are outlined in Table 1, for both the broadcast and unicast modes.
Table 1

Average volume of information exchange per iteration

                  TB-ADM    MB-ADM
Broadcast mode    N         N
Unicast mode      dN        dN
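
The entries of Table 1 combine with the iteration count as described earlier in this subsection: per-agent cost is the number of iterations times the per-iteration volume. The following sketch only restates that arithmetic; the function names and the iteration count of 100 are hypothetical, and the overhead of coordinating the network is ignored, as in the text.

```python
def per_iteration_volume(N, d, mode):
    """Average volume per agent per iteration, as listed in Table 1.

    N is the dimension of the variable and d the average node degree.
    The value is the same for TB-ADM and MB-ADM.
    """
    if mode == "broadcast":
        return N          # one transmission reaches all neighbors
    if mode == "unicast":
        return d * N      # one transmission per neighbor
    raise ValueError("mode must be 'broadcast' or 'unicast'")

def total_volume(iterations, N, d, mode):
    """Total per-agent volume: iterations times per-iteration volume."""
    return iterations * per_iteration_volume(N, d, mode)

# Hypothetical run of 100 iterations with N=5 and d=12.
print(total_volume(100, N=5, d=12, mode="broadcast"))
print(total_volume(100, N=5, d=12, mode="unicast"))
```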

In summary, the decentralized consensus optimization algorithms, in either the broadcast or the unicast mode, are scalable to the network size. Since the number of iterations is proportional to the average node degree d, the overall average volume of information exchange is $Nd$ for the broadcast mode and $Nd^2$ for the unicast mode. As a comparison, consider a centralized networked multi-agent system uniformly randomly deployed in a two-dimensional area with a fusion center which collects measurement vectors from all agents. The average volume of information exchange is $M\sqrt{L}$ while the worst one is $ML$ for agents near the fusion center. When the network size L increases, the communication cost caused by the centralized network infrastructure is unaffordable and the decentralized network infrastructure is hence superior.

Conclusion

This article considers solving the decentralized consensus optimization problem with the parallel version of the multi-block ADM in a networked multi-agent system. The traditional ADM can be used, but it requires the introduction of a second block of auxiliary variables, whereas our method takes advantage of the problem's nature of having multiple blocks of variables. We analyze the rate of convergence of our method applied to the average consensus problem. The analysis shows that the two-block ADM is a special case of the multi-block ADM on average consensus. With extensive numerical experiments, we demonstrate the effectiveness of the proposed algorithm.

In the implementation of a networked multi-agent system, practical issues such as packet loss, asynchronization, and quantization are inevitable. This article assumes that the communication links are reliable, the network time is slotted and well synchronized, and the exchanged information is not quantized. We would like to address these issues in future research.

Appendix 1

This appendix provides derivations and proofs for some of the theoretical results in the article.

Development of MB-ADM

The decentralized consensus optimization problem (2) with neighboring consensus constraints can be rewritten in the form of (5). Apparently, $g_i=f_i$, $\theta_i=x^{(i)}$, and $e$ is an $L^2N\times 1$ zero vector. Each $D_i$ is an $L^2N\times N$ matrix with $L^2$ blocks of $N\times N$ matrices. Each block of $D_i$ can be defined as follows. Consider an $L\times L$ matrix $U^{(i)}$, whose $(i',j)$th entry $U^{(i)}_{i'j}=-1$ if $j=i$ and $i'\in\mathcal{N}_i$, $U^{(i)}_{i'j}=1$ if $i'=i$ and $j\in\mathcal{N}_i$, and $U^{(i)}_{i'j}=0$ otherwise. The $(i'+Lj-L)$th block of $D_i$ is $U^{(i)}_{i'j}I_N$, where $I_N$ is an $N\times N$ identity matrix.
Substituting them into the multi-block ADM, we find that in optimizing $x^{(i)}$, agent $i$ only needs its local information as well as part of $q$; on the other hand, to update its corresponding part of $q$ and $\lambda$, each agent only needs information from itself and its neighbors. The resulting algorithm is hence fully decentralized due to the nice structure of $\{D_i\}$.

At time t, the multi-block ADM works as follows:

Step 1: Updating the auxiliary variables $\{q_{ij}\}$:
$$q_{ij}(t+1)=\lambda_{ij}(t)+\beta\left(x^{(i)}(t)-x^{(j)}(t)\right),\quad \forall i,\ \forall j\in\mathcal{N}_i.$$
(8)
Step 2: Optimizing the local copies $\{x^{(i)}\}$:
$$x^{(i)}(t+1)=\arg\min_{x^{(i)}}\ f_i(x^{(i)})+\sum_{j\in\mathcal{N}_i}\left(q_{ij}(t+1)-q_{ji}(t+1)\right)^Tx^{(i)}+\mu|\mathcal{N}_i|\,\|x^{(i)}-x^{(i)}(t)\|_2^2,\quad \forall i.$$
(9)
Step 3: Updating the Lagrange multipliers $\{\lambda_{ij}\}$:
$$\lambda_{ij}(t+1)=\lambda_{ij}(t)+\beta\left(x^{(i)}(t+1)-x^{(j)}(t+1)\right),\quad \forall i,\ \forall j\in\mathcal{N}_i.$$
(10)

Note that β and μ are positive constant parameters used by the multi-block ADM.

The updating rules (8), (9), and (10) can also be further simplified. Since we often set $\{\lambda_{ij}(0)\}$ as 0, (8) and (10) imply that $q_{ij}(t+1)=-q_{ji}(t+1)$ and $\lambda_{ij}(t+1)=-\lambda_{ji}(t+1)$. Summing up the two sides of (8) and (10) and defining a new auxiliary variable $q_i=\sum_{j\in\mathcal{N}_i}q_{ij}$ as well as a new Lagrange multiplier $\lambda_i=\sum_{j\in\mathcal{N}_i}\lambda_{ij}$, their updating rules are:
$$q_i(t+1)=\lambda_i(t)+\beta|\mathcal{N}_i|x^{(i)}(t)-\beta\sum_{j\in\mathcal{N}_i}x^{(j)}(t),$$
(11)
$$\lambda_i(t+1)=\lambda_i(t)+\beta|\mathcal{N}_i|x^{(i)}(t+1)-\beta\sum_{j\in\mathcal{N}_i}x^{(j)}(t+1).$$
(12)
Hence (9) simplifies to:
$$x^{(i)}(t+1)=\arg\min_{x^{(i)}}\ f_i(x^{(i)})+2q_i^T(t+1)x^{(i)}+\mu|\mathcal{N}_i|\,\|x^{(i)}-x^{(i)}(t)\|_2^2.$$
(13)
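
The simplified updates (11), (13), and (12) can be sketched for the least-squares objective $f_i(x)=\frac{1}{2}\|A_ix-b_i\|_2^2$ used in the experiments. This is an illustrative sketch, assuming NumPy and synchronous updates; the function name and the small test network are hypothetical. For this objective, setting the gradient of (13) to zero gives the linear system $(A_i^TA_i+2\mu|\mathcal{N}_i|I)x=A_i^Tb_i-2q_i(t+1)+2\mu|\mathcal{N}_i|x^{(i)}(t)$, which is what the inner loop solves.

```python
import numpy as np

def mb_adm_least_squares(A, b, adj, mu, beta, iterations):
    """Iterate (11), (13), (12) for f_i(x) = 0.5 * ||A_i x - b_i||_2^2.

    A: (L, M, N) measurement matrices, b: (L, M) measurement vectors,
    adj: (L, L) symmetric 0/1 adjacency matrix without self-loops.
    Returns the (L, N) array of local copies x^(i).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    adj = np.asarray(adj, dtype=float)
    L, M, N = A.shape
    deg = adj.sum(axis=1)          # |N_i|
    x = np.zeros((L, N))           # x^(i)(0) = 0
    lam = np.zeros((L, N))         # lambda_i(0) = 0
    lhs = [A[i].T @ A[i] + 2.0 * mu * deg[i] * np.eye(N) for i in range(L)]
    Atb = [A[i].T @ b[i] for i in range(L)]
    for _ in range(iterations):
        # (11): each agent combines its own copy with its neighbors' copies.
        q = lam + beta * (deg[:, None] * x - adj @ x)
        # (13): closed-form minimizer of the local quadratic subproblem.
        x_new = np.empty_like(x)
        for i in range(L):
            rhs = Atb[i] - 2.0 * q[i] + 2.0 * mu * deg[i] * x[i]
            x_new[i] = np.linalg.solve(lhs[i], rhs)
        # (12): multiplier update with the freshly exchanged copies.
        lam = lam + beta * (deg[:, None] * x_new - adj @ x_new)
        x = x_new
    return x

# Hypothetical average-consensus test: 5 agents on a ring, A_i = 1.
ring = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
A = np.ones((5, 1, 1))
b = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])
x = mb_adm_least_squares(A, b, ring, mu=0.5, beta=0.25, iterations=200)
print(x.ravel(), b.mean())
```

The example uses μ=2β, the setting under which MB-ADM coincides with TB-ADM; other ratios β=τμ can be tried by changing the arguments.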

State transition equation of MB-ADM

Combining the updating rules of $q_i(t+1)$ and $q_i(t)$ in (11) and the updating rule of $\lambda_i(t)$ in (12), we get:
$$q_i(t+1)-q_i(t)=2\beta|\mathcal{N}_i|x^{(i)}(t)-2\beta\sum_{j\in\mathcal{N}_i}x^{(j)}(t)-\beta|\mathcal{N}_i|x^{(i)}(t-1)+\beta\sum_{j\in\mathcal{N}_i}x^{(j)}(t-1).$$
(14)
Substituting $f_i(x^{(i)})=\frac{1}{2}\|x^{(i)}-b_i\|_2^2$ into (13), the optimality condition for $x^{(i)}(t+1)$ is:
$$x^{(i)}(t+1)+2q_i(t+1)+2\mu|\mathcal{N}_i|x^{(i)}(t+1)-2\mu|\mathcal{N}_i|x^{(i)}(t)-b_i=0.$$
(15)
Combining the optimality conditions of $x^{(i)}(t+1)$ and $x^{(i)}(t)$ and (14) leads to:
$$x^{(i)}(t+1)=\frac{1-4\beta|\mathcal{N}_i|+4\mu|\mathcal{N}_i|}{1+2\mu|\mathcal{N}_i|}x^{(i)}(t)+\frac{4\beta}{1+2\mu|\mathcal{N}_i|}\sum_{j\in\mathcal{N}_i}x^{(j)}(t)+\frac{2\beta|\mathcal{N}_i|-2\mu|\mathcal{N}_i|}{1+2\mu|\mathcal{N}_i|}x^{(i)}(t-1)-\frac{2\beta}{1+2\mu|\mathcal{N}_i|}\sum_{j\in\mathcal{N}_i}x^{(j)}(t-1).$$
(16)

The initial state is $s_M(1)=\left[\frac{b_1}{1+2\mu|\mathcal{N}_1|},\ldots,\frac{b_L}{1+2\mu|\mathcal{N}_L|},0,\ldots,0\right]$ when each agent $i$ initializes $q_i(0)=0$, $x^{(i)}(0)=0$, and $\lambda_i(0)=0$.
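
The recursion (16) can be stacked into a $2L\times 2L$ transition matrix and its spectrum inspected numerically. This is a minimal sketch, assuming NumPy and a hypothetical 5-agent ring; the function names are illustrative. The parameters μ=β=0.2 are chosen so that $\frac{2\mu|\mathcal{N}_j|}{1+2\mu|\mathcal{N}_j|}<\frac{1}{2}$ holds on this ring, matching the setting of Case 2 in Proposition 2.

```python
import numpy as np

def mb_adm_transition(adj, mu, beta):
    """Matrix mapping [x(t); x(t-1)] to [x(t+1); x(t)] for MB-ADM on
    average consensus, built from the coefficients of (16)."""
    adj = np.asarray(adj, dtype=float)
    L = adj.shape[0]
    deg = adj.sum(axis=1)                   # |N_i|
    scale = 1.0 / (1.0 + 2.0 * mu * deg)    # 1 / (1 + 2 mu |N_i|)
    # Coefficients of x(t).
    upper_left = (np.diag((1.0 - 4.0 * beta * deg + 4.0 * mu * deg) * scale)
                  + (4.0 * beta * scale)[:, None] * adj)
    # Coefficients of x(t-1).
    upper_right = (np.diag((2.0 * beta * deg - 2.0 * mu * deg) * scale)
                   - (2.0 * beta * scale)[:, None] * adj)
    return np.block([[upper_left, upper_right],
                     [np.eye(L), np.zeros((L, L))]])

def eigenvalue_moduli(phi):
    """Eigenvalue moduli of phi in decreasing order."""
    return np.sort(np.abs(np.linalg.eigvals(phi)))[::-1]

# Hypothetical example: ring of 5 agents, mu = beta (Case 2 setting).
ring = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
moduli = eigenvalue_moduli(mb_adm_transition(ring, mu=0.2, beta=0.2))
print(moduli[0], moduli[1])   # eigenvalue 1, then the largest remaining modulus
```

Calling `mb_adm_transition(ring, mu=c, beta=c / 2)` gives a unit diagonal in the upper-left block, which is the TB-ADM special case c=μ=2β.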

Proof of Property 1 in Proposition 1

It is straightforward to show that $\rho_{M1}=1$ is an eigenvalue of $\Phi_M$, and that $l_{M1}$ and $r_{M1}$ are its corresponding left and right eigenvectors. Next we prove by contradiction that $\rho_{M1}=1$ has multiplicity 1. If $\rho_{M1}=1$ belongs to a larger Jordan block, there exists a vector $[w^T,\bar{w}^T]^T$ such that $\Phi_M[w^T,\bar{w}^T]^T=[w^T,\bar{w}^T]^T+[1,\ldots,1,1,\ldots,1]^T$. Here $w$ and $\bar{w}$ are both $L\times 1$ vectors (see [36], Fact 2). Observing the lower half of $\Phi_M$, apparently $\bar{w}=w-[1,\ldots,1]^T$. Suppose that $w_k$ has the largest real part among all elements of $w$. Picking up the $k$th row of $\Phi_M[w^T,\bar{w}^T]^T=[w^T,\bar{w}^T]^T+[1,\ldots,1,1,\ldots,1]^T$, we have:
$$\frac{1-4\beta|\mathcal{N}_k|+4\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}w_k+\frac{2\beta|\mathcal{N}_k|-2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}(w_k-1)+\sum_{j\in\mathcal{N}_k}\left[\frac{4\beta}{1+2\mu|\mathcal{N}_k|}w_j-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}(w_j-1)\right]=w_k+1,$$
(17)
or equivalently:
$$\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}+\sum_{j\in\mathcal{N}_k}\frac{2\beta}{1+2\mu|\mathcal{N}_k|}w_j=1+\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}w_k.$$
(18)
Denote the real parts of $w_k$ and $w_j$ as $\mathrm{Re}(w_k)$ and $\mathrm{Re}(w_j)$, respectively. Recalling that $\mathrm{Re}(w_k)\ge\mathrm{Re}(w_j)$ and taking the real part of (18), we have:
$$1+\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\mathrm{Re}(w_k)=\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}+\sum_{j\in\mathcal{N}_k}\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\mathrm{Re}(w_j)\le\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}+\sum_{j\in\mathcal{N}_k}\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\mathrm{Re}(w_k)=\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}+\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\mathrm{Re}(w_k).$$
(19)

Canceling the common term on both sides gives $1\le\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}$, which is impossible since the right-hand side is strictly less than 1. This leads to a contradiction. Hence $\rho_{M1}=1$ is an eigenvalue of $\Phi_M$ with multiplicity 1.

Proof of Proposition 2

Denote the $i$th eigenvalue of $\Phi_M$ as $\rho_{Mi}$. Apparently, its eigenvectors should have the form of $[\rho_{Mi}v^T,v^T]^T$ where $v=[v_1,\ldots,v_L]^T$ is a nonzero vector, since the lower half of $\Phi_M$ is $[I_{L\times L},0_{L\times L}]$. Suppose that $v_k$ has the largest norm (here we use $|\cdot|$ to denote the norm of a complex number) among all elements of $v$. Then picking up the $k$th row of $\Phi_M[\rho_{Mi}v^T,v^T]^T=\rho_{Mi}[\rho_{Mi}v^T,v^T]^T$, we have:
$$\left(\frac{1-4\beta|\mathcal{N}_k|+4\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}+\frac{2\beta|\mathcal{N}_k|-2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\right)v_k+\sum_{j\in\mathcal{N}_k}\left(\frac{4\beta}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\right)v_j=\rho_{Mi}^2v_k,$$
(20)
or equivalently:
$$\left(\rho_{Mi}^2-\frac{1-4\beta|\mathcal{N}_k|+4\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta|\mathcal{N}_k|-2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\right)v_k=\sum_{j\in\mathcal{N}_k}\left(\frac{4\beta}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\right)v_j.$$
(21)
Since $v_k$ has the largest norm among all elements of $v$, taking norms of both sides of (21) leads to:
$$\left|\rho_{Mi}^2-\frac{1-4\beta|\mathcal{N}_k|+4\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta|\mathcal{N}_k|-2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\right||v_k|=\left|\sum_{j\in\mathcal{N}_k}\left(\frac{4\beta}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\right)v_j\right|\le\sum_{j\in\mathcal{N}_k}\left|\left(\frac{4\beta}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\right)v_j\right|\le\sum_{j\in\mathcal{N}_k}\left|\frac{4\beta}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta}{1+2\mu|\mathcal{N}_k|}\right||v_j|\le\left|\frac{4\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\rho_{Mi}-\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}\right||v_k|.$$
(22)
Notice that the inequalities turn to equalities if and only if $v_j=v_k$, $\forall j\in\mathcal{N}_k$. As $v_k$ has the largest norm among all elements of $v$, any $v_j$ with $j\in\mathcal{N}_k$ then also satisfies such inequalities, and they turn to equalities if and only if $v_{j'}=v_j$, $\forall j'\in\mathcal{N}_j$. Because the network is connected, we can deduce that these inequalities turn to equalities if and only if $\{v_i\}$ are all equal. This corresponds to the eigenvalue $\rho_{M1}=1$. Canceling $|v_k|$ from both sides and defining $d_1=\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}$ and $d_2=\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}$, (22) is equivalent to:
$$\left|\rho_{Mi}^2-\rho_{Mi}+2d_1\rho_{Mi}-d_2\rho_{Mi}-d_1+d_2\right|\le\left|2d_1\rho_{Mi}-d_1\right|.$$
(23)

Let us consider the two nontrivial special cases.

Case 1

The parameters μ and β are chosen such that μ=2β>0; further, $\frac{2\beta|\mathcal{N}_j|}{1+2\mu|\mathcal{N}_j|}<\frac{1}{4}$ and $\frac{2\mu|\mathcal{N}_j|}{1+2\mu|\mathcal{N}_j|}<\frac{1}{2}$ for all $j=1,2,\ldots,L$.

In this case, $d_1=\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}<\frac{1}{4}$ and $d_2=\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}<\frac{1}{2}$. Let us choose $d=d_1=\frac{d_2}{2}$, $\frac{1}{4}>d>0$. Hence, (23) simplifies to:
$$\left|\rho_{Mi}^2-\rho_{Mi}+d\right|\le\left|2d\rho_{Mi}-d\right|.$$
(24)
Defining $w=\rho_{Mi}-\frac{1}{2}$, we have:
$$\left|w^2+d-\tfrac{1}{4}\right|\le 2d|w|\ \Rightarrow\ |w|^2+d-\tfrac{1}{4}\le 2d|w|\ \Rightarrow\ |w|^2-2d|w|+d^2\le d^2-d+\tfrac{1}{4}\ \Rightarrow\ (|w|-d)^2\le\left(d-\tfrac{1}{2}\right)^2\ \Rightarrow\ |w|\le\tfrac{1}{2}.$$
(25)

Recall that $|w|=\frac{1}{2}$ only for $\rho_{M1}=1$. For any other eigenvalue, $|w|<\frac{1}{2}$, and hence $|\rho_{Mi}|<1$ for $i\neq 1$.

Case 2

The parameters μ and β are chosen such that μ=β>0; further, $\frac{2\beta|\mathcal{N}_j|}{1+2\mu|\mathcal{N}_j|}<\frac{1}{2}$ and $\frac{2\mu|\mathcal{N}_j|}{1+2\mu|\mathcal{N}_j|}<\frac{1}{2}$ for all $j=1,2,\ldots,L$.

In this case, $d_1=\frac{2\beta|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}<\frac{1}{2}$ and $d_2=\frac{2\mu|\mathcal{N}_k|}{1+2\mu|\mathcal{N}_k|}<\frac{1}{2}$. Let us choose $d=d_1=d_2$, $\frac{1}{2}>d>0$. Hence, (23) simplifies to:
$$\left|\rho_{Mi}^2-\rho_{Mi}+d\rho_{Mi}\right|\le\left|2d\rho_{Mi}-d\right|.$$
(26)
Let us prove the conclusion by contradiction. Suppose that there exists a $\rho_{Mi}$ with $|\rho_{Mi}|\ge 1$ that satisfies (26); then:
$$\left|\rho_{Mi}^2-\rho_{Mi}+d\rho_{Mi}\right|\le\left|2d\rho_{Mi}-d\right|\ \Rightarrow\ \left|\rho_{Mi}-1+d\right|\le\left|2d\rho_{Mi}-d\right|\ \Rightarrow\ \left|\rho_{Mi}-\tfrac{1}{2}\right|-\tfrac{1}{2}+d\le 2d\left|\rho_{Mi}-\tfrac{1}{2}\right|\ \Rightarrow\ \left|\rho_{Mi}-\tfrac{1}{2}\right|\le\tfrac{1}{2}\ \Rightarrow\ |\rho_{Mi}|\le 1.$$
(27)

Again, the inequalities turn to equalities only for $\rho_{M1}=1$. For any other eigenvalue $\rho_{Mi}$, we have $|\rho_{Mi}|<1$, which contradicts $|\rho_{Mi}|\ge 1$. Therefore, $|\rho_{Mi}|<1$ for $i\neq 1$.

State transition equation of TB-ADM

Substituting $f_i(x^{(i)})=\frac{1}{2}\|x^{(i)}-b_i\|_2^2$ into:
$$x^{(i)}(t+1)=\arg\min_{x^{(i)}}\ f_i(x^{(i)})+\alpha_i^T(t)x^{(i)}+c\sum_{j\in\mathcal{N}_i}\left\|x^{(i)}-\frac{1}{2}\left(x^{(i)}(t)+x^{(j)}(t)\right)\right\|_2^2,$$
the optimality condition for $x^{(i)}(t+1)$ is:
$$\left(1+2c|\mathcal{N}_i|\right)x^{(i)}(t+1)-c|\mathcal{N}_i|x^{(i)}(t)-c\sum_{j\in\mathcal{N}_i}x^{(j)}(t)+\alpha_i(t)-b_i=0.$$
(28)
Considering $x^{(i)}(t)$, the optimality condition is correspondingly:
$$\left(1+2c|\mathcal{N}_i|\right)x^{(i)}(t)-c|\mathcal{N}_i|x^{(i)}(t-1)-c\sum_{j\in\mathcal{N}_i}x^{(j)}(t-1)+\alpha_i(t-1)-b_i=0.$$
(29)
Combining (28) and (29) with:
$$\alpha_i(t+1)=\alpha_i(t)+c|\mathcal{N}_i|x^{(i)}(t+1)-c\sum_{j\in\mathcal{N}_i}x^{(j)}(t+1),$$
the state transition equation for agent $i$ is:
$$x^{(i)}(t+1)=x^{(i)}(t)+\frac{2c}{1+2c|\mathcal{N}_i|}\sum_{j\in\mathcal{N}_i}x^{(j)}(t)-\frac{c|\mathcal{N}_i|}{1+2c|\mathcal{N}_i|}x^{(i)}(t-1)-\frac{c}{1+2c|\mathcal{N}_i|}\sum_{j\in\mathcal{N}_i}x^{(j)}(t-1).$$
(30)
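
The recursion (30) can be iterated directly to observe the convergence to the average. This is a minimal sketch, assuming NumPy, scalar measurements, and a hypothetical 5-agent ring; the function name is illustrative. The starting values follow from (28) with $x^{(i)}(0)=0$ and $\alpha_i(0)=0$, which gives $x^{(i)}(1)=\frac{b_i}{1+2c|\mathcal{N}_i|}$.

```python
import numpy as np

def tb_adm_average_consensus(b, adj, c, iterations):
    """Iterate the TB-ADM state transition equation (30) for scalar b_i.

    b: (L,) local measurements, adj: (L, L) symmetric 0/1 adjacency matrix.
    Returns the local copies after `iterations` applications of (30).
    """
    b = np.asarray(b, dtype=float)
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)                  # |N_i|
    scale = 1.0 / (1.0 + 2.0 * c * deg)    # 1 / (1 + 2c|N_i|)
    x_prev = np.zeros_like(b)              # x(0) = 0
    x = b * scale                          # x(1), from (28) with alpha_i(0) = 0
    for _ in range(iterations):
        x_next = (x
                  + 2.0 * c * scale * (adj @ x)
                  - c * deg * scale * x_prev
                  - c * scale * (adj @ x_prev))
        x_prev, x = x, x_next
    return x

# Hypothetical example: ring of 5 agents.
ring = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
b = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
x = tb_adm_average_consensus(b, ring, c=0.5, iterations=200)
print(x, b.mean())
```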

Declarations

Acknowledgements

The work of Qing Ling is supported in part by NSFC grant 61004137 and Fundamental Research Funds for the Central Universities. The work of Wotao Yin is supported in part by ARL and ARO grant W911NF-09-1-0383 and NSF grants DMS-0748839 and ECCS-1028790. The work of Xiaoming Yuan is supported in part by the General Research Fund No. 203311 from Hong Kong Research Grants Council.

Authors’ Affiliations

(1) Department of Automation, University of Science and Technology of China
(2) School of Science, Nanjing University of Posts and Telecommunications
(3) Department of Computational and Applied Mathematics
(4) Department of Mathematics, Hong Kong Baptist University

References

  1. Ren W, Beard R, Atkins E: Information consensus in multivehicle cooperative control: collective group behavier through local interaction. IEEE Control Systs. Mag 2007, 27: 71-82.View Article
  2. Olfati-Saber R: Kalman-consensus filter: optimality, stability, and performance. In Proceedings of CDC. Shanghai, China; 2009:7036-7042.
  3. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundation Trends Mach. Learn 2010, 3: 1-122. 10.1561/2200000016View ArticleMATH
  4. Tsitsiklis J: Problems in decentralized decision making and computation. MIT, Ph.D Thesis; 1984.
  5. Nedic A, Ozdaglar A: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54: 48-61.MathSciNetView ArticleMATH
  6. Srivastava K, Nedic A: Distrbited asynchronous constrained stochastic optimization. IEEE J. Sel. Topics Signal Process 2011, 5: 772-790.View Article
  7. Rabbat M, Nowak R: Distributed optimization in sensor networks. In Proceedings of IPSN. Berkeley, USA; 2004:20-27.
  8. Rabbat M, Nowak R: Quantized incremental algorithms for distributed optimization. IEEE J. Sel. Areas Commun 2006, 23: 798-808.View Article
  9. Xiao L, Boyd S, Kim S: Distributed average consensus with least-mean-square deviation. J. Parallel Distrib. Comput 2007, 67: 33-46. 10.1016/j.jpdc.2006.08.010View ArticleMATH
  10. Kar S, Moura J: Distributed consensus algorithms in sensor networks: quantized data and random link failures. IEEE Trans. Signal Process 2010, 58: 1383-1400.MathSciNetView Article
  11. Olshevsky A: Efficient Information Aggregation Strategies for Distributed Control and Signal Processing. Ph.D Thesis, MIT; 2010.
  12. Schizas I, Ribeiro A, Giannakis G: Consensus in ad hoc WSNs with noisy links - Part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process 2008, 56: 350-364.MathSciNetView Article
  13. Mateos G, Schizas I, Giannakis G: Distributed recursive least-squares for consensus-based in-network adaptive estimation. IEEE Trans. Signal Process 2009, 57: 4583-4588.MathSciNetView Article
  14. Ling Q, Tian Z: Decentralized sparse signal recovery for compressive sleeping wireless sensor networks. IEEE Trans. Signal Process 2010, 58: 3816-3827.MathSciNetView Article
  15. Bazerque J, Giannakis G: Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans. Signal Process 2010, 58: 1847-1862.MathSciNetView Article
  16. Mateos G, Bazerque J, Giannakis G: Distributed sparse linear regression. IEEE Trans. Signal Process 2010, 58: 5262-5276.MathSciNetView Article
  17. Jakovetic D, Xavier J, Moura J: Cooperative convex optimization in networked systems: augmented Lagrangian algorithms with direct gossip communication. IEEE Trans. Signal Process 2011, 59: 3889-3902.MathSciNetView Article
  18. Cetin M, Chen L, Fisher I I I J, Ihler A, Moss R, Wainwright M, Willsky A: Distributed fusion in sensor networks. IEEE Signal Process. Mag 2006, 23: 42-55.View Article
  19. Predd J, Kulkarni S, Poor V: Distributed learning in wireless sensor networks. IEEE Signal Process. Mag 2007, 24: 56-69.
  20. Predd J, Kulkarni S, Poor H: A collaborative training algorithm for distributed learning. IEEE Trans. Inf. Theory 2009, 55: 1856-1871.MathSciNetView Article
  21. Khan U, Kar S, Moura J: Higher dimensional consensus: learning in large-scale networks. IEEE Trans. Signal Process 2010, 58: 2836-2849.MathSciNetView Article
  22. Jadbabaie A, Ozdaglar A, Zargham M: A distributed Newton method for network optimization. In Proceedings of CDC. Shanghai, China; 2009:2736-2741.
  23. Koshal J, Nedic A, Shanbhag U: Multiuser optimization: distributed algorithms and error analysis. SIAM J. Optimiz 2011, 21: 1046-1081. 10.1137/090770102
  24. Wan P, Lemmon M: Distributed network utility maximization using event-triggered augmented Lagrangian methods. In Proceedings of ACC. St. Louis, USA; 2009:3298-3303.
  25. Zhao F, Shin J, Reich J: Information-driven dynamic sensor collaboration. IEEE Signal Process. Mag 2002, 19: 61-72. 10.1109/79.985685
  26. Zhao F, Guibas L: Wireless Sensor Networks: an Information Processing Approach. Morgan Kaufmann, Burlington, USA; 2004.
  27. Schizas I, Giannakis G, Roumeliotis S, Ribeiro A: Consensus in ad hoc WSNs with noisy links - Part II: distributed estimation and smoothing of random signals. IEEE Trans. Signal Process 2008, 56: 1650-1666.
  28. Ribeiro A, Schizas I, Roumeliotis S, Giannakis G: Kalman filtering in wireless sensor networks: reducing communication cost in state estimation problems. IEEE Control Systs. Mag 2010, 30: 66-86.
  29. Tao M: Some parallel splitting methods for separable convex programming with O(1/t) convergence rate. in press
  30. He B, Tao M, Xu M, Yuan X: Alternating directions based contraction method for generally separable linearly constrained convex programming problems. in press
  31. He B, Tao M, Yuan X: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim 2012, 22: 313-340. 10.1137/110822347
  32. Bertsekas D, Tsitsiklis J: Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Nashua, USA; 1997.
  33. Bertsekas D: Nonlinear Programming. Athena Scientific, Nashua, USA; 1999.
  34. He B, Yuan X: On the O(1/n) convergence rate of Douglas-Rachford alternating direction method. SIAM J. Num. Anal 2012, 50: 700-709. 10.1137/110836936
  35. Erseghe T, Zennaro D, Dall’Anese E, Vangelista L: Fast consensus by the alternating direction multipliers method. IEEE Trans. Signal Process 2011, 59: 5523-5537.
  36. Rosenthal J: Convergence rates for Markov chains. SIAM Rev 1995, 37: 387-405. 10.1137/1037083
  37. Boyd S, Ghosh A, Prabhakar B, Shah D: Randomized gossip algorithms. IEEE Trans. Inf. Theory 2006, 52: 2508-2530.

Copyright

© Ling et al.; licensee Springer. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.