
Privacy-preserving Incremental ADMM for Decentralized Consensus Optimization

Yu Ye, Student Member, IEEE, Hao Chen, Student Member, IEEE, Ming Xiao, Senior Member, IEEE, Mikael Skoglund, Fellow, IEEE, and H. Vincent Poor, Fellow, IEEE

Abstract—The alternating direction method of multipliers (ADMM) has recently been recognized as a promising optimizer for large-scale machine learning models. However, there are very few results studying ADMM from the aspect of communication costs, especially jointly with privacy preservation, both of which are critical for distributed learning. We investigate the communication efficiency and privacy preservation of ADMM for solving the consensus optimization problem over decentralized networks. Since walk algorithms can reduce the communication load, we first propose incremental ADMM (I-ADMM) based on the walk algorithm, whose updating order follows a fixed Hamiltonian cycle rather than a random walk. However, I-ADMM cannot guarantee privacy for the agents against external eavesdroppers even if randomized initialization is applied. To protect the privacy of the agents, we then propose two privacy-preserving incremental ADMM algorithms, i.e., PI-ADMM1 and PI-ADMM2, which adopt perturbation over the step sizes and the primal variables, respectively. Through theoretical analyses, we prove the convergence and privacy preservation of PI-ADMM1, which are further supported by numerical experiments. Besides, simulations demonstrate that the proposed PI-ADMM1 and PI-ADMM2 algorithms are communication efficient compared with state-of-the-art methods.

Index Terms—Decentralized optimization; alternating direction method of multipliers (ADMM); privacy preservation.

I. INTRODUCTION

Consider a decentralized network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, N\}$ is the set of agents with computing capability, and $\mathcal{E}$ is the set of links. $\mathcal{G}$ is connected and there is no center agent. The agents seek to solve the consensus optimization problem (1) collaboratively by sharing information with each other:

$$
\min_{x} \; \sum_{i=1}^{N} f_i(x; \mathcal{D}_i), \qquad (1)
$$

where $f_i: \mathbb{R}^p \to \mathbb{R}$ is the local function at agent $i$, and $\mathcal{D}_i$ is the private dataset at agent $i$. All agents share a common optimization variable $x \in \mathbb{R}^p$. The decentralized optimization problem (1) is widely applied in many areas such as signal processing [1]–[3], machine learning [4]–[6], wireless sensor networks (WSNs) [7], [8] and smart grids [9]–[11], just to name a few.

This work was supported in part by the U.S. National Science Foundation under Grant CCF-1908308.

Yu Ye, Hao Chen, Ming Xiao and Mikael Skoglund are with the School of Electrical Engineering and Computer Science, Royal Institute of Technology (KTH), Stockholm, Sweden (email: [email protected], [email protected], [email protected], [email protected]).

H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, USA (email: [email protected]).

In the existing literature, e.g., [11]–[18], a large number of decentralized algorithms have been investigated to solve the consensus problem (1). Typically, the algorithms can be classified into primal and primal-dual types, namely, gradient descent based (GD) methods and alternating direction method of multipliers (ADMM) based methods, respectively. In this work, we use ADMM as the optimizer, which can usually achieve more accurate consensus than GD based methods with a constant step size [18].

Though distributed gradient descent (DGD) [12], EXTRA [13] and distributed ADMM (D-ADMM) [14] have good convergence rates with respect to the number of iterations, these algorithms require each agent to collect information from all its neighbors, which makes the amount of communication per iteration high. Hence, for applications with unstable links among agents, such as WSNs, the communication load becomes one of the main bottlenecks. Moreover, the straggler problem is also pronounced in synchronous ADMM based methods [16], where the total running time is significantly increased by slow computing nodes. Besides, in the ADMM based decentralized approaches of [14]–[18], agents are required to exchange primal and dual variables with their neighbors in each iteration. This inevitably leads to a privacy problem when the local function $f_i$ and dataset $\mathcal{D}_i$ are private to agent $i \in \mathcal{V}$, for instance, in practical applications with sensitive information such as personal medical or financial records, confidential business notes and so on. Thus, guaranteeing privacy is critical for decentralized consensus.

A. Related Works

To reduce the communication load of solving the consensus problem (1), various decentralized approaches have been proposed recently. Among these efforts, one important direction is to limit the information shared in each iteration. In [19], given an underlying graph, a weighted ADMM is developed by deleting some links prior to the optimization process. Communication-censored ADMM (COCA) [20] can adaptively determine whether a message is informative during the optimization process. Following COCA, an extreme scenario is to activate only one link in the graph per iteration, as in random-walk ADMM (W-ADMM) [17], which incrementally updates the optimization variables. To achieve an optimized trade-off between communication cost and running time, the parallel random walk ADMM (PW-ADMM) and intelligent PW-ADMM (IPW-ADMM) methods are proposed in [18]. References [21], [22] also take advantage of random walks but deal with


stochastic gradients. Another line of work is to exchange sparse or quantized messages so as to transmit fewer bits. Following this direction, a quantized ADMM is provided in [23]. Two quantized stochastic gradient descent (SGD) methods, sign SGD with error feedback (EF-SignSGD) and periodic quantized averaging SGD (PQASGD), are proposed in [24] and [25], respectively. Aji et al. [26] presented a heuristic approach that truncates the smallest gradient components and communicates only the remaining large ones. However, these algorithms cannot guarantee exact convergence to the optimal solution [27].

Meanwhile, with increasing concerns about data privacy, e.g., the GDPR in Europe [28], more and more effort is being devoted to developing privacy-preserving algorithms for solving the decentralized consensus problem (1). To measure the privacy-preserving performance, differential privacy has been used [29]. In general, artificial uncertainty is introduced into the shared messages or local functions to keep sensitive information secure against eavesdroppers. Differentially private distributed algorithms based on GD have been presented in [30]–[32], where noise following different distributions is added to the variables or gradients. Moreover, in the primal-dual methods [33]–[36], perturbation is applied either to the penalty term or to the primal and dual variables before they are shared with neighboring agents. On the other hand, cryptographic methods, such as (partially) homomorphic encryption [37], [38], are also adopted to protect the privacy of agents by applying an encryption mechanism to the exchanged information. Other privacy-preserving approaches, including perturbing problems or states, are provided in [39], [40]. However, these methods incur high computational overhead and communication load, especially in large-scale decentralized optimization.

B. Motivation and Contributions

W-ADMM can significantly reduce the communication load by activating only one node and one link per iteration in a successive manner. Since the updating order of the agents is randomized by following a Markov chain, it is possible that the primal and dual variables at some agents are updated far less often than at others; thus, the convergence speed may degrade. To prevent agents from being inactive for a long time, Random Walk with Choice (RWC) is introduced in IPW-ADMM [18] to reduce the running time. The main principle of RWC is that restricting the updating order of agents can improve the convergence speed of incremental approaches, which in turn saves communication load. This phenomenon is also supported by the walk proximal gradient (WPG) presented in [11], where the updating order of agents follows a Hamiltonian cycle. Therefore, in what follows, we fix the updating order of agents as in WPG and present the incremental ADMM (I-ADMM). Different from WPG, I-ADMM is a primal-dual based approach. Moreover, compared with PW-ADMM and IPW-ADMM, there is only one walk, with a fixed order, in I-ADMM.

To enable privacy preservation for I-ADMM against an eavesdropper that overhears all links in $\mathcal{E}$, we introduce uncertainty into the local primal-dual variables and the transmitted tokens, e.g., by adding artificial noise to the primal or dual variables [33], [35]. However, if the noise is not scaled down with the iterations, there is an error bound at convergence [35], which leads to a trade-off between privacy and accuracy. To avoid compromising the convergence performance, we propose to perturb the step size in the updates of both the primal and dual variables, i.e., $\{x_i, y_i\}_{i\in\mathcal{V}}$. Equivalently, this procedure can be seen as adding noise, scaled by the primal and dual residues, to $x_i^k$ and $y_i^k$, respectively, at iteration $k$.

Different from existing results, in what follows we study the ADMM based optimizer for solving (1) from the aspects of both communication efficiency and privacy preservation. The main contributions of this paper can be summarized as follows.

• We propose the I-ADMM method for solving the decentralized consensus optimization problem (1). Different from other incremental ADMM methods such as W-ADMM [17], the update order of I-ADMM follows a Hamiltonian cycle, which can further reduce communication load and improve the convergence speed.

• Since I-ADMM is not privacy-preserving, we first introduce randomized initialization. To guarantee privacy preservation, we provide two privacy-preserving incremental ADMM (PI-ADMM) algorithms, i.e., PI-ADMM1 and PI-ADMM2, which apply perturbation over the step size and the primal variable, respectively. To the best of our knowledge, this is the first paper to consider privacy preservation for incremental ADMM.

• We prove the convergence of PI-ADMM1 and show that, at convergence, the primal and dual variables generated by PI-ADMM1 satisfy the Karush-Kuhn-Tucker (KKT) conditions under mild assumptions on the local functions. Besides, we prove that privacy preservation is guaranteed against an eavesdropper or colluding agents.

• Numerical results show that the proposed I-ADMM based algorithms are communication efficient compared with state-of-the-art methods. Moreover, we show that PI-ADMM1 can guarantee both privacy preservation and accurate convergence.

The remainder of this paper is organized as follows. We first introduce the incremental ADMM in Section II. To guarantee the privacy of local agents against external eavesdroppers, we present two PI-ADMM algorithms in Section III. The convergence and privacy analyses of the proposed approaches are presented in Section IV. To validate the efficiency of the proposed methods, we provide numerical experiments in Section V. Finally, Section VI concludes the paper.

II. INCREMENTAL ADMM

By defining $x = [x_1, \ldots, x_N] \in \mathbb{R}^{pN}$, problem (1) can be rewritten as

$$
\min_{x, z} \; \sum_{i=1}^{N} f_i(x_i; \mathcal{D}_i), \quad \text{s.t.} \;\; \mathbf{1} \otimes z - x = 0, \qquad (2)
$$

where $z \in \mathbb{R}^p$, $\mathbf{1} = [1, \ldots, 1] \in \mathbb{R}^N$, and $\otimes$ is the Kronecker product. For simplicity, we denote $f_i(x_i; \mathcal{D}_i)$ as $f_i(x_i)$. The augmented Lagrangian for problem (2) is

[Fig. 1. An example of I-ADMM with an external eavesdropper: agents of a Hamiltonian network pass a token (content $z$) along the cycle while the eavesdropper overhears the links.]

[Fig. 2. Updating process of I-ADMM for the example in Fig. 1 over iterations $[6n, 6n+5]$, where $n \in \mathbb{N}$: at each iteration one agent updates $(x_i, y_i)$ and forwards the token $z$.]

$$
L_\rho(x, y, z) = \sum_{i=1}^{N} f_i(x_i) + \langle y, \mathbf{1} \otimes z - x \rangle + \frac{\rho}{2}\left\| \mathbf{1} \otimes z - x \right\|^2, \qquad (3)
$$

where $y = [y_1, \ldots, y_N] \in \mathbb{R}^{pN}$ is the dual variable, while $\rho > 0$ is a constant parameter. Following W-ADMM [17], with the guarantee $\sum_{i=1}^{N}\big(x_i^0 - \frac{y_i^0}{\rho}\big) = 0$, the updates of $x$, $y$ and $z$ at the $(k+1)$-th iteration are given by

$$
x_i^{k+1} := \begin{cases} \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\left\| z^k - x_i + \frac{y_i^k}{\rho} \right\|^2, & i = i_k; \\ x_i^k, & i \in \mathcal{V}, i \neq i_k; \end{cases} \qquad (4a)
$$

$$
y_i^{k+1} := \begin{cases} y_i^k + \rho\left( z^k - x_i^{k+1} \right), & i = i_k; \\ y_i^k, & i \in \mathcal{V}, i \neq i_k; \end{cases} \qquad (4b)
$$

$$
z^{k+1} := \frac{1}{N}\sum_{i=1}^{N}\left( x_i^{k+1} - \frac{y_i^{k+1}}{\rho} \right). \qquad (4c)
$$

Note that the updates (4a)-(4c) are not decentralized, since (4c) collects information from all agents. With the initialization $x^0 = 0$ and $y^0 = 0$, it is easy to verify that the update of $z$ can be implemented incrementally as

$$
z^{k+1} = z^k + \frac{1}{N}\left[ \left( x_{i_k}^{k+1} - \frac{y_{i_k}^{k+1}}{\rho} \right) - \left( x_{i_k}^{k} - \frac{y_{i_k}^{k}}{\rho} \right) \right]. \qquad (5)
$$

Then the decentralized implementation of I-ADMM is presented in Algorithm 1. Different from W-ADMM, the agents are activated in a predetermined circulant pattern in I-ADMM,

Algorithm 1: I-ADMM
1: initialize: $\{z^0 = 0, x_i^0 = 0, y_i^0 = 0, k = 0 \,|\, i \in \mathcal{V}\}$;
2: for $k = 0, 1, \ldots$ do
3:   agent $i_k = k \bmod N + 1$ does:
4:   receive token $z^k$;
5:   update $x^{k+1}$ according to (4a);
6:   update $y^{k+1}$ according to (4b);
7:   update $z^{k+1}$ according to (5);
8:   send token $z^{k+1}$ to agent $i_{k+1} = (k+1) \bmod N + 1$;
9: end for

where the activated agent at the $k$-th iteration is $i_k = k \bmod N + 1$. Furthermore, we assume the walk $\langle i_k \rangle_{k \ge 0}$ repeats a Hamiltonian cycle order $1 \to 2 \to \cdots \to N \to 1 \to 2 \to \cdots$, as in WPG [11]. In I-ADMM, the variable $z^{k+1}$ is updated at agent $i_k$ and passed as a token to its neighbor $i_{k+1}$ along the Hamiltonian cycle. In Fig. 2, we present the evolution of $\{x_i, y_i\}$ at the agents and the transition of the token $z$ when I-ADMM is applied to the example shown in Fig. 1.

A. Convergence Analysis

To start the convergence analysis of I-ADMM, we first make the following assumptions on the network $\mathcal{G}$ and the local functions $\{f_i\}$. For simplicity, we denote $F(x) = \sum_{i=1}^{N} f_i(x_i)$.

Assumption 1. Graph $\mathcal{G}$ is connected and contains a Hamiltonian cycle.

Assumption 2. The objective function $F(x)$ is bounded from below over $x$, and $F(x)$ is coercive over $x$. Each $f_i$ is $L$-Lipschitz differentiable, i.e., for any $u, v \in \mathbb{R}^p$,

$$
\left\| \nabla f_i(u) - \nabla f_i(v) \right\| \le L\,\| u - v \|, \quad i = 1, \ldots, N. \qquad (6)
$$

With Assumption 2, we obtain the following useful inequality for the loss function $f_i$.

Lemma 1. For any $u, v \in \mathbb{R}^p$, we have

$$
f_i(u) \le f_i(v) + \langle \nabla f_i(v), u - v \rangle + \frac{L}{2}\| u - v \|^2. \qquad (7)
$$

Proof. Inequality (7) is well known; see [41]. □

Then, according to Lemma 1 and Theorem 1 in [17], we have the following convergence property for I-ADMM.

Lemma 2. Under Assumptions 1 and 2, for $\rho \ge 2L + 2$, the iterates $(x^k, y^k, z^k)$ generated by I-ADMM satisfy the following properties:
1) $L_\rho(x^k, y^k, z^k) - L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \ge 0$;
2) $\langle L_\rho(x^k, y^k, z^k) \rangle_{k \ge 0}$ is lower bounded and convergent;
3) $\lim_{k\to\infty} \nabla L_\rho(x^k, y^k, z^k) = 0$.

Proof. The convergence proof of I-ADMM is similar to that of W-ADMM in [17]. The difference is that, in terms of Assumption 1 in [17], each agent is visited with a fixed period, i.e., $J(\delta) = N$. □


B. Computation-Efficient I-ADMM

With a complex local function $f_i$, the computation load of solving the minimization problem (4a) is high. To alleviate the computation burden of the $x$-update in I-ADMM, we can apply a first-order approximation by replacing (4a) with the update

$$
x_i^{k+1} := \begin{cases} z^k + \frac{y_i^k}{\rho} - \frac{1}{\rho}\nabla f_i\big( x_i^k \big), & i = i_k; \\ x_i^k, & i \in \mathcal{V}, i \neq i_k. \end{cases} \qquad (8)
$$

Compared with (4a), the update (8) saves computation. However, it may incur a higher communication load to achieve the same accuracy, as shown in [42]. In what follows, we focus on privacy-preserving I-ADMM based on Algorithm 1; the obtained results can also be extended to the computation-efficient I-ADMM with privacy preservation.
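In the simulation sketch above, switching to the computation-efficient variant only changes the agent's local step; for the hypothetical quadratic loss, whose gradient is $\nabla f_i(x) = x - a_i$, the linearized update (8) reads:

```python
def linearized_x_update(a_i, x_i, y_i, z, rho):
    """First-order x-update (8), assuming f_i(x) = 0.5 * ||x - a_i||^2."""
    grad = x_i - a_i                    # gradient of f_i at the current iterate x_i^k
    return z + y_i / rho - grad / rho   # x_i^{k+1} = z^k + y_i^k / rho - grad / rho
```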

III. INCREMENTAL ADMM WITH PRIVACY PRESERVATION

In this section, we first analyze the privacy of Algorithm 1. Then, to guarantee the security of the agents in solving (2), we propose the PI-ADMM algorithms.

A. Review of I-ADMM

Assume that there exists an external eavesdropper overhearing all the links, as shown in Fig. 1, while the agents want to protect their private functions and datasets from the eavesdropper. The privacy property of I-ADMM is analyzed as follows.

Lemma 3. In Algorithm 1, the intermediate states and gradients $\{x_i^{k+1}, y_i^{k+1}, \nabla f_i(x_i^{k+1}) \,|\, i \in \mathcal{V}, k = 0, 1, \ldots\}$ can be inferred by the eavesdropper.

Proof. Assume that the eavesdropper has knowledge of $x_{i_k}^k$ and $y_{i_k}^k$ for some $k \ge 0$. Through the update process (4a), the eavesdropper can establish the equation

$$
\nabla f_{i_k}\big( x_{i_k}^{k+1} \big) = y_{i_k}^{k+1}. \qquad (9)
$$

Hence, once $y_{i_k}^{k+1}$ is inferred by the eavesdropper, $\nabla f_{i_k}(x_{i_k}^{k+1})$ is revealed as well. Combining the update process (4b) and (5), we have

$$
\begin{aligned}
y_{i_k}^{k+1} &= y_{i_k}^{k} + \rho\left( z^k - x_{i_k}^{k+1} \right); \\
\Delta^{k+1} &= \frac{1}{N}\left[ \left( x_{i_k}^{k+1} - \frac{y_{i_k}^{k+1}}{\rho} \right) - \left( x_{i_k}^{k} - \frac{y_{i_k}^{k}}{\rho} \right) \right],
\end{aligned} \qquad (10)
$$

where $\Delta^{k+1} = z^{k+1} - z^k$. Rearranging (10) gives the recursive equations

$$
\begin{aligned}
x_{i_k}^{k+1} &= \frac{1}{2}\left( N\Delta^{k+1} + z^k + x_{i_k}^{k} \right); \\
y_{i_k}^{k+1} &= y_{i_k}^{k} + \frac{\rho}{2}\left( z^k - N\Delta^{k+1} - x_{i_k}^{k} \right).
\end{aligned} \qquad (11)
$$

Since the eavesdropper also overhears $z^{k+1}$ and $z^k$, (11) consists of 2 equations in 2 unknowns; hence $x_{i_k}^{k+1}$ and $y_{i_k}^{k+1}$ can be obtained by solving (11). In I-ADMM, since $\{x_i^0, y_i^0 \,|\, i \in \mathcal{V}\}$ are known to the eavesdropper, with (9) and the updates (4a) and (4b), we conclude that $\{x_i^k, y_i^k, \nabla f_i(x_i^k) \,|\, i \in \mathcal{V}, k = 0, 1, \ldots\}$ can also be inferred. □

Algorithm 2: I-ADMM with Random Initialization
1: initialize: $z^0 = 0$;
2: for $k = 0, \ldots, N - 1$ do
3:   agent $i = k \bmod N + 1$ does:
4:   generate $\upsilon_i \in \mathbb{R}^p$ randomly;
5:   set $x_i^0 \leftarrow \upsilon_i$ and $y_i^0 \leftarrow \rho\upsilon_i$;
6: end for
7: run steps 2-9 of Algorithm 1.

Algorithm 3: PI-ADMM1 with Step-Size Perturbation
1: Follow the steps of Algorithm 2, but substitute steps 5 and 6 of Algorithm 1 with the following 4 steps;
2: generate $\gamma_{i_k}^k \sim P_{i_k}$;
3: set $\rho_{i_k}^k \leftarrow \rho \cdot \gamma_{i_k}^k$;
4: update $x^{k+1}$ by

$$
x_i^{k+1} := \begin{cases} \arg\min_{x_i}\; f_i(x_i) + \frac{\rho_i^k}{2}\left\| z^k - x_i + \frac{y_i^k}{\rho_i^k} \right\|^2, & i = i_k; \\ x_i^k, & \text{otherwise}; \end{cases} \qquad (12)
$$

5: update $y^{k+1}$ by

$$
y_i^{k+1} := \begin{cases} y_i^k + \rho_i^k\left( z^k - x_i^{k+1} \right), & i = i_k; \\ y_i^k, & \text{otherwise}. \end{cases} \qquad (13)
$$


In addition, with $\{\nabla f_i(x_i^k) \,|\, i \in \mathcal{V}, k = 0, 1, \ldots\}$, the private functions $\{f_i\}$ can be inferred by the eavesdropper [33].
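To make the attack behind Lemma 3 concrete, the sketch below replays the recursion (11) on an overheard token trace, assuming the eavesdropper knows the public initialization $x_i^0 = y_i^0 = 0$ of Algorithm 1; `z_trace` is a hypothetical list of intercepted tokens $z^0, z^1, \ldots$.

```python
import numpy as np

def eavesdrop(z_trace, N, rho):
    """Recover each agent's primal/dual iterates from overheard tokens via (11)."""
    p = z_trace[0].shape[0]
    x_hat = np.zeros((N, p))   # known public initialization x_i^0 = 0
    y_hat = np.zeros((N, p))   # known public initialization y_i^0 = 0
    recovered = []
    for k in range(len(z_trace) - 1):
        i = k % N                                      # agent active at iteration k
        delta = z_trace[k + 1] - z_trace[k]            # Delta^{k+1} = z^{k+1} - z^k
        x_new = 0.5 * (N * delta + z_trace[k] + x_hat[i])                   # (11)
        y_new = y_hat[i] + 0.5 * rho * (z_trace[k] - N * delta - x_hat[i])  # (11)
        x_hat[i], y_hat[i] = x_new, y_new
        recovered.append((k, i, x_new.copy(), y_new.copy()))
    return recovered   # by (9), y_hat[i] also reveals the gradient of f_i at x_hat[i]
```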

B. Privacy-preserving I-ADMM

Inspired by the privacy-preservation definitions in [33], [43], [44], we define privacy as follows.

Definition 1. A mechanism $\mathcal{M}$, with output $Y = \mathcal{M}(X)$, is defined to be privacy-preserving if the input $X$ cannot be uniquely derived from the output $Y$.

Hence, from Lemma 3, I-ADMM is not privacy-preserving. Recalling the proof of Lemma 3, by making $x_i^0$ and $y_i^0$ private, the information $\{x_i^k, y_i^k, \nabla f_i(x_i^k) \,|\, k = 0, 1, \ldots\}$ cannot be inferred exactly by the eavesdropper from the recursive equation (11) if the states and multipliers are unknown at the end of the iteration. In fact, to implement the incremental update (5), we only need to ensure $\sum_{i=1}^{N}\big(x_i^0 - \frac{y_i^0}{\rho}\big) = 0$. To introduce uncertainty, we randomize the initialization and present Algorithm 2, where the variables $x_i^0$ and $y_i^0$ are randomly generated at agent $i$ such that (4c) is satisfied. Then Algorithm 1 is carried out with the initialization $\{x_i^0 = \upsilon_i, y_i^0 = \rho\upsilon_i \,|\, i \in \mathcal{V}\}$.

To enhance the privacy of Algorithm 2, we can introduce more uncertainty by applying perturbation to the primal and dual updates. Substituting the updates of the primal and dual variables with (12) and (13), respectively, where the step size $\rho$ is multiplied by a perturbation $\gamma_{i_k}^k$, yields PI-ADMM1 in Algorithm 3. It is worth noting that (13) can be regarded as introducing additive noise, i.e., $\rho(\gamma_{i_k}^k - 1)(z^k - x_{i_k}^{k+1})$, into the dual-variable update, which is scaled by the primal residue. Regarding the primal update with the first-order approximation, it can also be expressed as appending the scaled noise $\frac{1}{\rho}\big(\frac{1}{\gamma_{i_k}^k} - 1\big)\big(y_{i_k}^k - \nabla f_{i_k}(x_{i_k}^k)\big)$ to $x_{i_k}^{k+1}$. Different from existing ADMM based privacy-preserving methods [29], [33], we randomly generate the initialization in PI-ADMM1 instead of implementing encryption at each iteration. Thus, the computation complexity and communication load of PI-ADMM1 are significantly reduced. In addition, the applied perturbation mechanism avoids the computation over the step size $\rho$ required by Algorithm 1 in [33].

From the results of [29], another approach to enhance the privacy of Algorithm 2 is to perturb the primal updates by adding noise drawn from a fixed distribution, which is illustrated in Algorithm 4. At the $k$-th iteration, artificial noise $\omega_{i_k}^k$ generated locally at agent $i_k$ is added to $x_{i_k}^{k+1}$. From the analyses in [29], this perturbation mechanism can also guarantee privacy for the agents. However, due to the page limitation, we only evaluate the convergence property of PI-ADMM2 in Section V.

Algorithm 4: PI-ADMM2 with Primal Perturbation
1: Follow the steps of Algorithm 2, but substitute step 5 of Algorithm 1 with the following two steps;
2: generate $\omega_{i_k}^k \sim \mathcal{N}(0, \sigma)$;
3: update $x^{k+1}$ by

$$
x_i^{k+1} := \begin{cases} \arg\min_{x_i}\left\{ f_i(x_i) + \frac{\rho}{2}\left\| z^k - x_i + \frac{y_i^k}{\rho} \right\|^2 \right\} + \omega_i^k, & i = i_k; \\ x_i^k, & \text{otherwise}. \end{cases} \qquad (14)
$$

IV. CONVERGENCE AND PRIVACY ANALYSIS

In this section, we first prove the convergence of PI-ADMM1. Then the privacy analysis is provided.

A. Convergence Analysis

We first show the convergence of Algorithm 2.

Remark 1. Algorithm 2 has the same convergence properties as those of I-ADMM.

Proof. Since $\sum_{i=1}^{N}\big(x_i - \frac{y_i}{\rho}\big) = 0$ is guaranteed by steps 2-6 of Algorithm 2, the convergence can be proved as in Lemma 2. □

Denote $x^* = \arg\min_x F(x)$. Then, with Assumption 2, the convergence properties of PI-ADMM1 are provided as follows.

Lemma 4. With the following conditions:

$$
\rho > L; \qquad (15a)
$$

$$
\gamma_{i_k}^k > \max\left\{ \frac{2\rho^2 + 4\rho + 1}{\rho - L},\; 2(\rho + 2)N \right\}, \quad k = 0, 1, \ldots, \qquad (15b)
$$

the iterates $(x^k, y^k, z^k)$ generated by PI-ADMM1 satisfy the following properties:
1) $L_\rho(x^k, y^k, z^k) - L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \ge 0$;
2) $\langle L_\rho(x^k, y^k, z^k) \rangle_{k \ge 0}$ is lower bounded by the optimum $F(x^*)$ and convergent.

Proof. See Appendix A. □

Based on the analysis of Lemma 4 for PI-ADMM1, the convergence property is summarized in the following theorem.

Theorem 1. Under Assumptions 1 and 2 and the conditions given in Lemma 4, the iterates $(x^k, y^k, z^k)$ generated by PI-ADMM1 have limit points, which satisfy the KKT conditions of problem (2).

Proof. See Appendix B. □

It should be noted that the conditions given in Lemma 4 and Theorem 1 are sufficient for the convergence of PI-ADMM1.

B. Privacy Analysis

In what follows, we analyze the privacy properties of Algorithm 2 and PI-ADMM1.

Remark 2. In Algorithm 2, the states, multipliers and gradients, i.e., $\{x_{i_k}^{k+1}, y_{i_k}^{k+1}, \nabla f_{i_k}(x_{i_k}^{k+1}) \,|\, i \in \mathcal{V}, k = 0, \ldots, K\}$, cannot be inferred exactly by the eavesdropper if $z^{k+1} \neq x_{i_k}^{k+1}$ for all $k = 0, \ldots, K$.

Proof. Assume that the eavesdropper collects information from $K$ iterations to infer the information of the agents. According to (11), the measurements considering all the agents can be formulated as

$$
\begin{aligned}
& x_i^0 - \frac{y_i^0}{\rho} = 0, \quad i \in \mathcal{V}; \\
& x_{i_0}^1 = \frac{1}{2}\left( N\Delta^1 + z^0 + x_{i_0}^0 \right); \\
& y_{i_0}^1 = y_{i_0}^0 + \frac{\rho}{2}\left( z^0 - N\Delta^1 - x_{i_0}^0 \right); \\
& \qquad \vdots \\
& x_{i_K}^{K+1} = \frac{1}{2}\left( N\Delta^{K+1} + z^K + x_{i_K}^K \right); \\
& y_{i_K}^{K+1} = y_{i_K}^K + \frac{\rho}{2}\left( z^K - N\Delta^{K+1} - x_{i_K}^K \right).
\end{aligned} \qquad (16)
$$

In (16), $\{z^k \,|\, k = 0, \ldots, K\}$ are known to the eavesdropper, while $\{x_{i_k}^k, y_{i_k}^k, x_{i_k}^{k+1}, y_{i_k}^{k+1} \,|\, k = 0, \ldots, K\}$ are unknown variables. Thus, (16) consists of $2K + 3$ equations and $2K + 2N + 2$ unknown variables. Since $N > 2$, the eavesdropper cannot solve (16) to infer the exact values of $\{x_i^k, y_i^k \,|\, i \in \mathcal{V}, k = 0, \ldots, K\}$. Hence, with (9), the gradients $\{\nabla f_i(x_i^{k+1}) \,|\, i \in \mathcal{V}, k = 0, \ldots, K\}$ cannot be inferred exactly by the eavesdropper either.


Consensus happens when Algorithm 2 converges, where $z$ coincides with the local variables $\{x_i\}$; hence the eavesdropper can use $z$ to infer $\{x_i\}$. However, with finitely many iterations, the gap between the token $z$ and the states $\{x_i\}$ can only be guaranteed to be within a threshold $\varepsilon$ ($\varepsilon$ can be a predetermined stopping criterion for Algorithm 2). Denote by $\bar{x}_i$ the eavesdropper's measurement of $x_i$. According to (11), once $\bar{x}_{i_K}^{K+1}$ is measured, all the states $\{x_{i_K}^k \,|\, k = 0, \ldots, K\}$ of agent $i_K$ can also be estimated sequentially.

Remark 3. Assume that the eavesdropper uses $z^{K+1}$ to infer $x_{i_K}^{K+1}$ of agent $i_K = K \bmod N + 1$. If $\| z^{K+1} - x_{i_K}^{K+1} \| < \varepsilon$, there exists an error bound for the measurement (16) as

$$
\left\| x_{i_K}^k - \bar{x}_{i_K}^k \right\| < 2^n \varepsilon, \qquad \left\| y_{i_K}^k - \bar{y}_{i_K}^k \right\| < \rho\left( 2^{\lfloor K/N \rfloor + 1} - 2^n \right)\varepsilon, \qquad (17)
$$

where $k \in [K - nN + 1, K - (n-1)N]$ and $1 \le n \le \lfloor K/N \rfloor$.

Proof. See Appendix C. □

Remark 4. In Algorithm 2, the primal and dual variables and gradients, i.e., $\{x_i^k, y_i^k, \nabla f_i(x_i^k) \,|\, i \in \mathcal{V}, k = 0, \ldots, K\}$, can be inferred by the eavesdropper asymptotically, i.e., as $K \to \infty$.

Proof. According to [17], it is guaranteed that $\lim_{K\to\infty} z^K = x_i^K$ for all $i \in \mathcal{V}$. Furthermore, with equation (11) and the condition $x_i^0 - \frac{y_i^0}{\rho} = 0$, the values $\{x_i^k, y_i^k \,|\, k = 0, \ldots, K\}$ can be derived recursively. Due to (9), the gradients $\{\nabla f_i(x_i^k) \,|\, k = 0, \ldots, K\}$ are revealed. □

As for PI-ADMM1, due to the randomization introduced in the dual-variable update, it can guarantee enhanced privacy preservation.

Theorem 2. In PI-ADMM1, the exact states $\{x_i^k \,|\, i \in \mathcal{V}, k = 0, 1, \ldots\}$ cannot be inferred by the eavesdropper unless $z^{k+1} = x_{i_k}^{k+1}$ holds. The multipliers and gradients $\{y_i^k, \nabla f_i(x_i^k) \,|\, i \in \mathcal{V}, k = 0, 1, \ldots\}$ cannot be exactly inferred either.

Proof. Due to the randomized initialization, the values of $x_i^0$ and $y_i^0$ cannot be revealed by the eavesdropper. Assume that the eavesdropper collects information from $K$ iterations to infer the information of the agents. The measurements considering all the agents can be formulated as

$$
\begin{aligned}
& x_i^0 - \frac{y_i^0}{\rho} = 0, \quad i \in \mathcal{V}; \\
& x_{i_0}^1 = \frac{1}{1 + \gamma_{i_0}^0}\left( N\Delta^1 + \gamma_{i_0}^0 z^0 + x_{i_0}^0 \right); \\
& y_{i_0}^1 = y_{i_0}^0 + \frac{\rho\gamma_{i_0}^0}{1 + \gamma_{i_0}^0}\left( z^0 - N\Delta^1 - x_{i_0}^0 \right); \\
& \qquad \vdots \\
& x_{i_K}^{K+1} = \frac{1}{1 + \gamma_{i_K}^K}\left( N\Delta^{K+1} + \gamma_{i_K}^K z^K + x_{i_K}^K \right); \\
& y_{i_K}^{K+1} = y_{i_K}^K + \frac{\rho\gamma_{i_K}^K}{1 + \gamma_{i_K}^K}\left( z^K - N\Delta^{K+1} - x_{i_K}^K \right).
\end{aligned} \qquad (18)
$$

Different from (16), in the above equations the tokens $\{z^k \,|\, k = 0, \ldots, K+1\}$ are known to the eavesdropper, while $\{x_{i_k}^k, y_{i_k}^k \,|\, k = 0, \ldots, K+1\}$ and $\{\gamma_{i_k}^k \,|\, k = 0, \ldots, K\}$ are unknown variables. Thus, (18) consists of $2K + N + 2$ equations and $3K + 2N + 3$ unknown variables, so the eavesdropper cannot solve (18) to infer the exact values of $\{x_i^k, y_i^k \,|\, i \in \mathcal{V}, k = 0, \ldots, K\}$, since $K + N + 1 > 0$. When $K \to \infty$, the eavesdropper can obtain another piece of information according to the KKT conditions (33). However, since $K \gg N$, the under-determined system (18) still cannot be solved asymptotically. □

Corollary 1. In PI-ADMM1, the local functions $\{f_i \,|\, i \in \mathcal{V}\}$ cannot be inferred by the eavesdropper.

Proof. According to Theorem 2, the states and corresponding gradients are unknown to the eavesdropper. Hence, from [33], the local objective functions $\{f_i\}$ of the agents cannot be inferred by the eavesdropper. □

From the above analysis, we can conclude that the proposed PI-ADMM1 is privacy-preserving for the local functions and datasets of the agents against the external eavesdropper. In addition, the privacy of agent $i$ can also be guaranteed against the other agents.

Theorem 3. In PI-ADMM1, the states, multipliers and gradients of agent $i \in \mathcal{V}$, i.e., $\{x_i^k, y_i^k, \nabla f_i(x_i^k) \,|\, k = 0, 1, \ldots\}$, cannot be inferred by the other $N - 1$ colluding agents.

Proof. Without loss of generality, we consider the privacy of agent $i_0$ against the other agents in the set $\mathcal{V}\setminus i_0$, which contains all agents except $i_0$. Recalling the measurements (18), the useful information for deducing the primal and dual variables and gradients is given by

$$
\begin{aligned}
& x_{i_0}^0 - \frac{y_{i_0}^0}{\rho} = 0; \\
& x_{i_0}^{1+nN} = \frac{1}{1 + \gamma_{i_0}^{nN}}\left( N\Delta^{1+nN} + \gamma_{i_0}^{nN} z^{nN} + x_{i_0}^{nN} \right); \\
& y_{i_0}^{1+nN} = y_{i_0}^{nN} + \frac{\rho\gamma_{i_0}^{nN}}{1 + \gamma_{i_0}^{nN}}\left( z^{nN} - N\Delta^{1+nN} - x_{i_0}^{nN} \right),
\end{aligned} \qquad (19)
$$

where $n$ ranges from $0$ to $\lfloor K/N \rfloor$. Since (19) contains $2\lfloor K/N \rfloor + 3$ equations and $3\lfloor K/N \rfloor + 5$ unknown variables, the remaining agents cannot obtain the exact values of $\{x_{i_0}^k, y_{i_0}^k, \nabla f_{i_0}(x_{i_0}^k) \,|\, k = 0, \ldots, K\}$ by solving this under-determined system. With $K \to \infty$, the KKT conditions reduce to

$$
\nabla f_{i_0}\big( x_{i_0}^{K+1} \big) = y_{i_0}^{K+1}; \quad y_{i_0}^{K+1} = -\sum_{j\in\mathcal{V}\setminus i_0} y_j^{K+1}; \quad z^{K+1} = x_{i_0}^{K+1}. \qquad (20)
$$

However, even combining (19) and (20), the exact values of $\{x_{i_0}^k, y_{i_0}^k, \nabla f_{i_0}(x_{i_0}^k) \,|\, k = 0, \ldots, K\}$ cannot be obtained asymptotically. □

Since only the token is transmitted among agents in PI-ADMM1, the local variables and gradients $\{x_{i_0}^k, y_{i_0}^k, \nabla f_{i_0}(x_{i_0}^k)\}$ of agent $i_0$ cannot be observed directly by the other agents. In fact, except for $\{x_j^k, y_j^k, \nabla f_j(x_j^k) \,|\, j \in \mathcal{V}\setminus i_0\}$, the information gathered by the eavesdropper is the same as that of the colluding agents in $\mathcal{V}\setminus i_0$. This explains why the system (18) reduces to (19) in Theorem 3.


[Fig. 3. The accuracy of decentralized ridge regression for W-ADMM ($\beta = 10$), IPW-ADMM ($\rho = 1, \tau = 0, M = 25$), COCA ($c = 1, \alpha = 1, \rho = 0.85$), I-ADMM ($\rho = 10$), PI-ADMM1 ($\rho = 10$) and PI-ADMM2 ($\rho = 10, \sigma = 10^{-3}$) with different network settings: (a) $N = 100, \eta = 0.3$; (b) $N = 100, \eta = 0.5$; (c) $N = 200, \eta = 0.3$.]

V. NUMERICAL EXPERIMENTS

In this section, we conduct numerical experiments to evaluate the convergence and privacy properties of the proposed PI-ADMM algorithms. For the simulation, we generate a connected network $\mathcal{G}$ with $N$ agents and $|\mathcal{E}| = \frac{N(N-1)}{2}\eta$ links, ensuring that a Hamiltonian cycle exists in $\mathcal{G}$. We consider unicast among agents, and the communication cost of each transmission of a $p$-dimensional vector is 1 unit.

We compare the convergence of the proposed approaches with state-of-the-art approaches in terms of the accuracy defined by

$$
\text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \frac{\left\| x_i^k - x^* \right\|}{\left\| x_i^0 - x^* \right\|}, \qquad (21)
$$

where $x^* \in \mathbb{R}^p$ is the optimal solution of (2). The dimension of $x_i$, $y_i$ and $z$ is set to $p = 2$.
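For reference, the accuracy metric (21) is straightforward to compute; a minimal helper (array names are illustrative) is:

```python
import numpy as np

def accuracy(x, x0, x_star):
    """Accuracy (21): average relative distance of the agents' iterates to x*."""
    num = np.linalg.norm(x - x_star, axis=1)    # ||x_i^k - x*|| per agent
    den = np.linalg.norm(x0 - x_star, axis=1)   # ||x_i^0 - x*|| per agent
    return float(np.mean(num / den))
```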

A. Decentralized Ridge Regression

We first consider the decentralized least-squares problem of [45], which aims at solving (1) with the local function

$$
f_i(x_i; \mathcal{D}_i) = \frac{1}{b_i}\sum_{j=1}^{b_i} \left( x_i^T o_{i,j} - t_{i,j} \right)^2, \qquad (22)
$$

where $\mathcal{D}_i = \{o_{i,j}, t_{i,j} \,|\, j = 1, \ldots, b_i\}$ is the local dataset of agent $i$. The entries of the input $o_{i,j} \in \mathbb{R}^2$ and the target $t_{i,j} \in \mathbb{R}$ are i.i.d. with distribution $\mathcal{U}(0, 1)$. The number of data samples is kept the same across agents, with $b_i = 30$.
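For this quadratic loss, the exact $x$-update (4a) has a closed form, which keeps the per-iteration computation of the ridge-regression experiment light; a sketch under the stated data model (helper names are hypothetical):

```python
import numpy as np

def local_x_update(O_i, t_i, z, y_i, rho):
    """Closed-form solution of (4a) for the least-squares loss (22).

    Minimizes (1/b_i) * ||O_i x - t_i||^2 + (rho/2) * ||z - x + y_i / rho||^2,
    with O_i of shape (b_i, p) and t_i of shape (b_i,).
    """
    b_i, p = O_i.shape
    A = (2.0 / b_i) * O_i.T @ O_i + rho * np.eye(p)    # subproblem Hessian
    rhs = (2.0 / b_i) * O_i.T @ t_i + rho * z + y_i    # stationarity condition
    return np.linalg.solve(A, rhs)
```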

[Fig. 4. The estimation and true value in decentralized ridge regression with respect to $\{x_1^k(1), y_1^k(1) \,|\, k = 0, 1, \ldots, 2000\}$ for I-ADMM, PI-ADMM1 and PI-ADMM2.]

The accuracy over communication cost is shown in Fig. 3. It is clear that I-ADMM is the most communication efficient compared with W-ADMM [17], IPW-ADMM [18] and COCA [20] under the different network settings; equivalently, I-ADMM has a faster convergence speed than W-ADMM, since both I-ADMM and W-ADMM consume the same communication load for the same number of iterations. The update order of I-ADMM follows a Hamiltonian cycle, while W-ADMM is updated by a random walk; in W-ADMM, each agent is visited randomly, and the resulting unbalanced updating frequency of the agents degrades the convergence speed. Besides, PI-ADMM1 presents a different convergence behavior compared with I-ADMM and W-ADMM. To guarantee privacy in PI-ADMM1, we set $\{P_i = \mathcal{U}(1 - \frac{1}{\rho}, 1 + \frac{1}{\rho}) \,|\, i \in \mathcal{V}\}$ and generate $\{x_i^0 \sim \mathcal{U}(0, 100) \,|\, i \in \mathcal{V}\}$, which makes the initialization $\{x_i^0\}$ far from the optimal solution $x^*$. This explains why PI-ADMM1 converges more slowly than I-ADMM and W-ADMM in the first few iterations. However, as the updating proceeds, the convergence speed of PI-ADMM1 surpasses that of W-ADMM and coincides with that of I-ADMM. This is because the artificial noise added to the step size is scaled by the primal residue $z^{k+1} - x_i^{k+1}$, which gradually converges to 0. In contrast, the additive noise in PI-ADMM2 does not scale with iterations; as shown in Fig. 3, this introduces an error bound on the accuracy, which is determined by $\sigma$. Comparing sub-figures (a) and (b), the convergence behaviors of I-ADMM, PI-ADMM1 and PI-ADMM2 do not depend on the connectivity $\eta$, because the convergence of the proposed algorithms is determined only by the length of the Hamiltonian cycle. This also explains why the convergence speed degrades as the network grows.

[Fig. 5. The accuracy of decentralized logistic regression for W-ADMM ($\beta = 1$), IPW-ADMM ($\rho = 1, \tau = 3, M = 25$), COCA ($c = 1, \alpha = 1, \rho = 0.85$), I-ADMM ($\rho = 1$), PI-ADMM1 ($\rho = 1$) and PI-ADMM2 ($\rho = 1, \sigma = 10^{-3}$) with different network settings: (a) $N = 100, \eta = 0.3$; (b) $N = 100, \eta = 0.5$; (c) $N = 200, \eta = 0.3$.]


Then we evaluate the privacy-preservation capability of the proposed PI-ADMM algorithms with $N = 100$ and $\eta = 0.3$. Without loss of generality, we focus on the estimation of $\{x_1^k(1), y_1^k(1) \,|\, k = 0, 1, \ldots, 2000\}$ from the observations $\{z^k \,|\, k = 0, 1, \ldots, 2001\}$. For I-ADMM, we assume that the eavesdropper uses the recursive equation (11) to deduce the primal and dual variables of agent 1, with the initialization $\{x_i = 0, y_i = 0 \,|\, i \in \mathcal{V}\}$ and by inferring $x_{i_{2000}}^{2001}$ from $z^{2001}$. The results are shown in Fig. 4 (a) and (b), where the estimation coincides with the true value; this clearly demonstrates that I-ADMM is not privacy-preserving. For PI-ADMM1, we reformulate the under-determined equations (18) into the form $A\upsilon = b$, where $\upsilon = [x_1^0(1), \ldots, x_1^{2000}(1), y_1^0(1), \ldots, y_1^{2000}(1)]$ and $A \in \mathbb{R}^{l \times m}$ is the coefficient matrix, which also accounts for the KKT condition (33) through $\sum_{i=1}^{N} y_i^{2000}(1) = 0$. Since $\{\gamma_{i_k}^k\}$ is unknown to the eavesdropper, we set $\gamma_{i_k}^k = 1$ in the estimation without loss of generality. Moreover, we substitute $x_{i_{2000-i}}^{2001-i}$ with $z^{2001-i}$ for $i = 0, 1, \ldots, N-1$ in both systems (16) and (18). Since $A$ is sparse and $l > m$, we use the function lsqr(A, b) in MATLAB to obtain the solution $\upsilon^*$, which solves the least-squares problem

$$
\upsilon^* = \arg\min_{\upsilon} \| A\upsilon - b \|^2. \qquad (23)
$$
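A Python equivalent of this estimation step, assuming the sparse system $A\upsilon = b$ from (18) has already been assembled (the matrices below are random stand-ins, not the paper's construction), would use SciPy's LSQR solver:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# Hypothetical stand-ins for the eavesdropper's over-determined system A v = b.
A = sparse_random(50, 30, density=0.1, format="csr", random_state=0)
b = np.ones(50)

upsilon_star = lsqr(A, b)[0]   # least-squares solution of (23)
```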

[Fig. 6. The estimation and true value in decentralized logistic regression with respect to $\{x_1^k(1), y_1^k(1) \,|\, k = 0, 1, \ldots, 2000\}$ for I-ADMM, PI-ADMM1 and PI-ADMM2.]

A similar estimation procedure can also be established for PI-ADMM2. The estimation results for PI-ADMM1 and PI-ADMM2 are presented in Fig. 4 (c)-(f). From sub-figures (c) and (e), for both PI-ADMM1 and PI-ADMM2, the gap between the true value and the estimate of $x_1^k(1)$ shrinks to 0 as convergence happens; this is inevitable and consistent with the analysis in Remark 4. However, the estimation error for $y_1^k(1)$ diverges with iterations. We can see that PI-ADMM1 and PI-ADMM2 prevent information leakage of the multipliers against the external eavesdropper.

B. Decentralized Logistic Regression

In decentralized logistic regression, the local loss function of agent $i$ is

$$
f_i(x_i) = \frac{1}{b_i}\sum_{j=1}^{b_i} \log\left( 1 + \exp\left( -t_{i,j}\, x_i^T o_{i,j} \right) \right), \qquad (24)
$$

where $t_{i,j} \in \{-1, 1\}$ and $b_i = 30$. Each sample feature $o_{i,j}$ follows $\mathcal{N}(0, I)$. To generate $t_{i,j}$, we first generate a random vector $\tilde{x} \in \mathbb{R}^2$ with $\tilde{x} \sim \mathcal{N}(0, I)$. Then, for each sample, we draw $v_{i,j}$ from $\mathcal{U}(0, 1)$; if $v_{i,j} \le (1 + \exp(-\tilde{x}^T o_{i,j}))^{-1}$, we set $t_{i,j} = 1$, and otherwise $t_{i,j} = -1$. Since it is difficult to solve the subproblem of (2) with (24) exactly in the I-ADMM based methods, we instead use the updating process (8). For fairness, we also adopt the first-order approximation for IPW-ADMM, W-ADMM and COCA.

In Fig. 5, we present the accuracy over communication cost for the different network settings. Compared with W-ADMM, IPW-ADMM and COCA, the proposed I-ADMM based algorithms are clearly the most communication-efficient. The curves for the different network setups show similar trends to those of Fig. 3.

The results regarding privacy preservation are shown in Fig. 6. For PI-ADMM1 and PI-ADMM2, we set $\{P_i = \mathcal{U}(1 - \frac{0.1}{\rho}, 1 + \frac{0.1}{\rho}) \,|\, i \in \mathcal{V}\}$ and initialize $\{x_i^0 \sim \mathcal{U}(0, 10) \,|\, i \in \mathcal{V}\}$. The estimation approaches for I-ADMM, PI-ADMM1 and PI-ADMM2 follow those of Fig. 4. In Fig. 6, sub-figures (a) and (b) demonstrate that I-ADMM cannot guarantee privacy for the agents. As for PI-ADMM1 and PI-ADMM2, the states $x_1^k(1)$ can be revealed as the iteration index $k$ increases; however, the estimation of $y_1^k(1)$ diverges from the ground truth. Thus, the proposed PI-ADMM algorithms are privacy-preserving.

VI. CONCLUSIONS

We have investigated I-ADMM based privacy-preserving methods for solving the consensus problem in decentralized networks. Different from W-ADMM, the updating order of I-ADMM follows a Hamiltonian cycle, which guarantees a faster convergence speed than W-ADMM. Since fewer iteration rounds are needed, I-ADMM is more communication efficient than W-ADMM and other state-of-the-art methods. Moreover, to prevent information leakage of the local functions and private datasets to an external eavesdropper, we proposed two PI-ADMM algorithms, i.e., PI-ADMM1 and PI-ADMM2, which perform step-size perturbation and primal perturbation, respectively. We proved the convergence and privacy preservation of PI-ADMM1. Numerical experiments also show that PI-ADMM2 is privacy-preserving. Besides, numerical results show that PI-ADMM1 can achieve almost the same communication efficiency as I-ADMM.

APPENDIX A
PROOF OF LEMMA 4

From the steps of PI-ADMM1, it is easy to verify that $\frac{1}{N}\sum_{i=1}^{N}\big(x_i^0 - \frac{y_i^0}{\rho}\big) = 0$. Then we prove that $L_\rho(x^k, y^k, z^k)$ is non-increasing with the iteration $k$. From the optimality condition of (12) and the update (13), we can derive

$$
\nabla f_{i_k}\big( x_{i_k}^{k+1} \big) = y_{i_k}^k + \rho\gamma_{i_k}^k\left( z^k - x_{i_k}^{k+1} \right) = y_{i_k}^{k+1}. \qquad (25)
$$

After updating to $x^{k+1}$ and $y^{k+1}$ by (12) and (13), we have

$$
\begin{aligned}
& L_\rho(x^k, y^k, z^k) - L_\rho(x^{k+1}, y^{k+1}, z^k) \\
&\overset{(a)}{=} \underbrace{f_{i_k}\big( x_{i_k}^k \big) - f_{i_k}\big( x_{i_k}^{k+1} \big)}_{A}
+ \underbrace{\big\langle y_{i_k}^k, z^k - x_{i_k}^k \big\rangle - \big\langle y_{i_k}^{k+1}, z^k - x_{i_k}^{k+1} \big\rangle + \big\langle \rho\big( z^k - x_{i_k}^{k+1} \big), x_{i_k}^{k+1} - x_{i_k}^k \big\rangle}_{B} \\
&\quad + \frac{\rho}{2}\big\| x_{i_k}^k - x_{i_k}^{k+1} \big\|^2,
\end{aligned} \qquad (26)
$$

where (a) holds by the cosine identity $\| b + c \|^2 - \| a + c \|^2 = \| b - a \|^2 + 2\langle a + c, b - a \rangle$. According to Lemma 1, term $A$ in (26) satisfies

$$
A \ge \big\langle \nabla f_{i_k}\big( x_{i_k}^k \big), x_{i_k}^k - x_{i_k}^{k+1} \big\rangle - \frac{L}{2}\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\|^2
= \big\langle y_{i_k}^k, x_{i_k}^k - x_{i_k}^{k+1} \big\rangle - \frac{L}{2}\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\|^2. \qquad (27)
$$

Then, combining (27), the sum $A + B$ can be bounded as

$$
\begin{aligned}
A + B &\overset{(a)}{\ge} \big\langle y_{i_k}^k - y_{i_k}^{k+1}, z^k - x_{i_k}^{k+1} \big\rangle + \frac{1}{\gamma_{i_k}^k}\big\langle y_{i_k}^{k+1} - y_{i_k}^k, x_{i_k}^{k+1} - x_{i_k}^k \big\rangle - \frac{L}{2}\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\|^2 \\
&\overset{(b)}{\ge} -\frac{1}{\gamma_{i_k}^k}\left( \frac{1}{\rho} + \frac{1}{2} \right)\big\| y_{i_k}^{k+1} - y_{i_k}^k \big\|^2 - \left( \frac{1}{2\gamma_{i_k}^k} + \frac{L}{2} \right)\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\|^2,
\end{aligned} \qquad (28)
$$

where (a) uses the relation

$$
\rho\left( z^k - x_{i_k}^{k+1} \right) = \frac{1}{\gamma_{i_k}^k}\left( y_{i_k}^{k+1} - y_{i_k}^k \right), \qquad (29)
$$

which follows from (13), and (b) follows from Young's inequality.

Through updating the token $z$ at iteration $k$, the change of the augmented Lagrangian is measured by

$$
\begin{aligned}
& L_\rho(x^{k+1}, y^{k+1}, z^k) - L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \\
&= \sum_{i=1}^{N}\left[ \big\langle y_i^{k+1}, z^k - z^{k+1} \big\rangle + \frac{\rho}{2}\left( \big\| z^k - x_i^{k+1} \big\|^2 - \big\| z^{k+1} - x_i^{k+1} \big\|^2 \right) \right] \\
&= \frac{\rho N}{2}\big\| z^{k+1} - z^k \big\|^2 + \sum_{i=1}^{N} \rho\left\langle z^{k+1} - x_i^{k+1} + \frac{y_i^{k+1}}{\rho},\; z^k - z^{k+1} \right\rangle \\
&\overset{(a)}{=} \frac{\rho N}{2}\big\| z^{k+1} - z^k \big\|^2,
\end{aligned} \qquad (30)
$$

where (a) holds because $z^{k+1} = \frac{1}{N}\sum_{i=1}^{N}\big( x_i^{k+1} - \frac{y_i^{k+1}}{\rho} \big)$. By combining (26), (28) and (30), we obtain

$$
\begin{aligned}
& L_\rho(x^k, y^k, z^k) - L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \\
&\ge \left( \frac{\rho - L}{2} - \frac{1}{2\gamma_{i_k}^k} \right)\big\| x_{i_k}^k - x_{i_k}^{k+1} \big\|^2 + \frac{\rho N}{2}\big\| z^{k+1} - z^k \big\|^2 - \frac{1}{\gamma_{i_k}^k}\left( \frac{1}{\rho} + \frac{1}{2} \right)\big\| y_{i_k}^{k+1} - y_{i_k}^k \big\|^2 \\
&\overset{(a)}{\ge} \rho N\left( \frac{1}{2} - \frac{\rho N + 2N}{\gamma_{i_k}^k} \right)\big\| z^{k+1} - z^k \big\|^2 + \left( \frac{\rho - L}{2} - \frac{1}{2\gamma_{i_k}^k} - \frac{\rho^2 + 2\rho}{\gamma_{i_k}^k} \right)\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\|^2,
\end{aligned} \qquad (31)
$$

where (a) follows from $y_{i_k}^{k+1} - y_{i_k}^k = \rho\big( x_{i_k}^{k+1} - x_{i_k}^k \big) - \rho N\big( z^{k+1} - z^k \big)$ and the triangle inequality. Hence, to ensure that $L_\rho(x^k, y^k, z^k)$ is non-increasing with iterations, we must have $\rho > L$ and $\gamma_{i_k}^k > \max\{\frac{2\rho^2 + 4\rho + 1}{\rho - L}, 2(\rho + 2)N\}$. This proves property 1). To prove property 2), we verify that $L_\rho(x^{k+1}, y^{k+1}, z^{k+1})$ is lower bounded:

$$
\begin{aligned}
& L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \\
&= \sum_{j=0}^{N-1}\left[ f_{i_{k-j}}\big( x_{i_{k-j}}^{k-j+1} \big) + \big\langle y_{i_{k-j}}^{k-j+1}, z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\rangle + \frac{\rho}{2}\big\| z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\|^2 \right] \\
&\overset{(a)}{\ge} \sum_{j=0}^{N-1}\left[ f_{i_{k-j}}\big( z^{k+1} \big) + \frac{\rho - L}{2}\big\| z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\|^2 - \big\langle \nabla f_{i_{k-j}}\big( x_{i_{k-j}}^{k-j+1} \big) - y_{i_{k-j}}^{k-j+1}, z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\rangle \right] \\
&\ge \min_x\left\{ \sum_{i=1}^{N} f_i(x) \right\} + \frac{\rho - L}{2}\sum_{j=0}^{N-1}\big\| z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\|^2 \overset{(b)}{\ge} F(x^*) > -\infty,
\end{aligned} \qquad (32)
$$

where (a) is from the $L$-Lipschitz differentiability of $f_i$ and (b) is from Assumption 2, in which the optimal value $F(x^*)$ is assumed to be lower bounded. Guaranteeing $\rho > L$ makes (32) hold, which completes the proof of property 2). Then, with the monotonicity statement 1), we conclude that $L_\rho$ is convergent.

APPENDIX B
PROOF OF THEOREM 1

Note that a KKT point $(x^*, y^*, z^*)$ of problem (2) satisfies the following conditions:

$$
\nabla f_i\big( x_i^* \big) - y_i^* = 0, \quad \forall i \in \mathcal{V}; \qquad (33a)
$$

$$
\sum_{i=1}^{N} y_i^* = 0; \qquad (33b)
$$

$$
z^* = x_i^*, \quad \forall i \in \mathcal{V}. \qquad (33c)
$$

Since (33) also implies that $\sum_{i=1}^{N} \nabla f_i(x_i^*) = 0$, $x^*$ is also a stationary point of problem (2). By summing (31) over all iterations $0$ to $k$, we obtain

$$
\begin{aligned}
& \sum_{l=0}^{k}\left[ \rho N\left( \frac{1}{2} - \frac{\rho N + 2N}{\gamma_{i_l}^l} \right)\big\| z^{l+1} - z^l \big\|^2 + \left( \frac{\rho - L}{2} - \frac{1}{2\gamma_{i_l}^l} - \frac{\rho^2 + 2\rho}{\gamma_{i_l}^l} \right)\big\| x_{i_l}^{l+1} - x_{i_l}^l \big\|^2 \right] \\
&\le L_\rho(x^0, y^0, z^0) - L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) \\
&= \left[ L_\rho(x^0, y^0, z^0) - F(x^*) \right] - \left[ L_\rho(x^{k+1}, y^{k+1}, z^{k+1}) - F(x^*) \right] \\
&\overset{(a)}{\le} L_\rho(x^0, y^0, z^0) - F(x^*) < +\infty,
\end{aligned} \qquad (34)
$$

where (a) is from (32). Then, under the conditions $\rho > L$ and $\gamma_{i_k}^k > \max\{\frac{2\rho^2 + 4\rho + 1}{\rho - L}, 2(\rho + 2)N\}$, the left-hand side (LHS) of inequality (34) is positive and increasing with $k$. Since the right-hand side (RHS) of (34) is finite, the iterates $(x^k, y^k, z^k)$ must satisfy

$$
\lim_{k\to\infty}\big\| z^{k+1} - z^k \big\| = 0; \qquad \lim_{k\to\infty}\big\| x_i^{k+1} - x_i^k \big\| = 0, \quad \forall i \in \mathcal{V}. \qquad (35)
$$

In addition, from inequality (28), the dual residue satisfies $\lim_{k\to\infty}\big\| y_{i_k}^{k+1} - y_{i_k}^k \big\| \le \lim_{k\to\infty}\big( \rho\big\| x_{i_k}^{k+1} - x_{i_k}^k \big\| + \rho N\big\| z^{k+1} - z^k \big\| \big) = 0$. Since $\big\| y_{i_k}^{k+1} - y_{i_k}^k \big\| \ge 0$, we conclude that

$$
\lim_{k\to\infty}\big\| y_i^{k+1} - y_i^k \big\| = 0, \quad \forall i \in \mathcal{V}. \qquad (36)
$$

Then we utilize (35) and (36) to show that the limit points of $(x^k, y^k, z^k)$ are KKT points of problem (2). For any $i \in \mathcal{V}$, there always exists a $j \in [0, N-1]$ satisfying

$$
\begin{aligned}
\big\| z^{k+1} - x_i^{k+1} \big\| &= \big\| z^{k+1} - x_{i_{k-j}}^{k-j+1} \big\| \le \big\| z^{k+1} - z^{k-j} \big\| + \big\| z^{k-j} - x_{i_{k-j}}^{k-j+1} \big\| \\
&\overset{(a)}{=} \big\| z^{k+1} - z^{k-j} \big\| + \frac{1}{\rho\gamma_{i_{k-j}}^{k-j}}\big\| y_{i_{k-j}}^{k-j+1} - y_{i_{k-j}}^{k-j} \big\|,
\end{aligned} \qquad (37)
$$

where (a) is from (29). Then we can conclude that

$$
\lim_{k\to\infty}\big\| z^{k+1} - x_i^{k+1} \big\| = 0, \quad \forall i \in \mathcal{V}. \qquad (38)
$$

By applying (4c) and (25), we obtain

$$
\lim_{k\to\infty}\sum_{i=1}^{N} y_i^k = 0; \qquad \lim_{k\to\infty}\big\| \nabla f_i(x_i^k) - y_i^k \big\| = 0, \quad \forall i \in \mathcal{V}. \qquad (39)
$$

Equations (38) and (39) imply that $(x^k, y^k, z^k)$ asymptotically satisfies the KKT conditions in (33). We thus conclude that the iterates $(x^k, y^k, z^k)$ of PI-ADMM1 have limit points, which satisfy the KKT conditions of the original problem (2).

APPENDIX C
PROOF OF REMARK 3

According to (4a), the states of agent $i_K$ satisfy

$$
x_{i_K}^{K-nN+1} = x_{i_K}^{K-nN+2} = \cdots = x_{i_K}^{K-(n-1)N}, \quad 1 \le n \le \left\lfloor \frac{K}{N} \right\rfloor. \qquad (40)
$$

Then, from the recursive equation (11), for $k \in [K - nN + 1, K - (n-1)N]$ we have

$$
\begin{aligned}
x_{i_K}^k &= x_{i_K}^{K-(n-1)N} = 2x_{i_K}^{K-(n-1)N+1} - N\Delta^{K-(n-1)N+1} - z^{K-(n-1)N} \\
&\;\;\vdots \\
&= 2^n x_{i_K}^{K+1} - \sum_{j=1}^{n} 2^{n-j}\left( N\Delta^{K-(j-1)N+1} + z^{K-(j-1)N} \right).
\end{aligned} \qquad (41)
$$

By substituting $x_{i_K}^{K+1}$ with $z^{K+1}$ in (41), the measurement $\bar{x}_{i_K}^k$ is obtained. With the condition $\big\| z^{K+1} - x_{i_K}^{K+1} \big\| < \varepsilon$, the bound on the left-hand side of (17) follows. Since the eavesdropper cannot deduce $y_{i_K}^K$ directly, $\bar{y}_{i_K}^k$ is instead derived recursively from $y_{i_K}^0$, where

$$
\begin{aligned}
y_{i_K}^k &= y_{i_K}^{K-nN+1} = y_{i_K}^{K-nN} + \frac{\rho}{2}\left( z^{K-nN} - N\Delta^{K-nN+1} - x_{i_K}^{K-nN} \right) \\
&\;\;\vdots \\
&= y_{i_K}^0 + \sum_{j=n}^{\lfloor K/N \rfloor} \frac{\rho}{2}\left( z^{K-jN} - N\Delta^{K-jN+1} - x_{i_K}^{K-jN} \right).
\end{aligned} \qquad (42)
$$

Due to $y_{i_K}^0 = \rho x_{i_K}^0$ and the bounds on the measurements $\bar{x}_{i_K}^k$, the final result follows. □


REFERENCES

[1] Q. Ling and Z. Tian, "Decentralized sparse signal recovery for compressive sleeping wireless sensor networks," IEEE Transactions on Signal Processing, vol. 58, no. 7, pp. 3816–3827, July 2010.
[2] T. Chang, M. Hong, and X. Wang, "Multi-agent distributed optimization via inexact consensus ADMM," IEEE Transactions on Signal Processing, vol. 63, no. 2, pp. 482–497, Jan 2015.
[3] C. Ravazzi, S. M. Fosson, and E. Magli, "Distributed iterative thresholding for l0/l1-regularized linear inverse problems," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 2081–2100, April 2015.
[4] J. B. Predd, S. R. Kulkarni, and H. V. Poor, "A collaborative training algorithm for distributed learning," IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1856–1871, 2009.
[5] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed support vector machines," Journal of Machine Learning Research, vol. 11, no. May, pp. 1663–1707, 2010.
[6] G. Mateos, J. A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5262–5276, Oct 2010.
[7] J. Chen and A. H. Sayed, "Diffusion adaptation strategies for distributed optimization and learning over networks," IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4289–4305, Aug 2012.
[8] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links - Part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350–364, Jan 2008.
[9] G. B. Giannakis, V. Kekatos, N. Gatsis, S. Kim, H. Zhu, and B. F. Wollenberg, "Monitoring and optimization for power grids: A signal processing perspective," IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 107–128, Sep. 2013.
[10] H. J. Liu, W. Shi, and H. Zhu, "Distributed voltage control in distribution networks: Online and robust implementations," IEEE Transactions on Smart Grid, vol. 9, no. 6, pp. 6106–6117, Nov 2018.
[11] X. Mao, Y. Gu, and W. Yin, "Walk proximal gradient: An energy-efficient algorithm for consensus optimization," IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2048–2060, April 2019.
[12] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," SIAM Journal on Optimization, vol. 26, no. 3, pp. 1835–1854, 2016.
[13] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015.
[14] J. F. C. Mota, J. M. F. Xavier, P. M. Q. Aguiar, and M. Puschel, "D-ADMM: A communication-efficient distributed algorithm for separable optimization," IEEE Transactions on Signal Processing, vol. 61, no. 10, pp. 2718–2723, May 2013.
[15] T. Chang, M. Hong, W. Liao, and X. Wang, "Asynchronous distributed ADMM for large-scale optimization - Part I: Algorithm and convergence analysis," IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3118–3130, June 2016.
[16] R. Zhang and J. Kwok, "Asynchronous distributed ADMM for consensus optimization," in International Conference on Machine Learning, 2014, pp. 1701–1709.
[17] W. Yin, X. Mao, K. Yuan, Y. Gu, and A. H. Sayed, "A communication-efficient random-walk algorithm for decentralized optimization," ArXiv, vol. abs/1804.06568, 2018.
[18] Y. Ye, H. Chen, Z. Ma, and M. Xiao, "Decentralized consensus optimization based on parallel random walk," IEEE Communications Letters, pp. 1–1, 2019.
[19] Q. Ling, Y. Liu, W. Shi, and Z. Tian, "Weighted ADMM for fast decentralized network optimization," IEEE Transactions on Signal Processing, vol. 64, no. 22, pp. 5930–5942, Nov 2016.
[20] Y. Liu, W. Xu, G. Wu, Z. Tian, and Q. Ling, "Communication-censored ADMM for decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2565–2579, 2019.
[21] C. G. Lopes and A. H. Sayed, "Incremental adaptive strategies over distributed networks," IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 4064–4077, Aug 2007.
[22] ——, "Randomized incremental protocols over adaptive networks," in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 3514–3517.
[23] S. Zhu, M. Hong, and B. Chen, "Quantized consensus ADMM for multi-agent distributed optimization," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 4134–4138.
[24] S. P. Karimireddy, Q. Rebjock, S. Stich, and M. Jaggi, "Error feedback fixes SignSGD and other gradient compression schemes," in Proceedings of Machine Learning Research, vol. 97. Long Beach, California, USA: PMLR, 09–15 Jun 2019, pp. 3252–3261.
[25] P. Jiang and G. Agrawal, "A linear speedup analysis of distributed deep learning with sparse and quantized communication," in Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 2018, pp. 2525–2536.
[26] A. F. Aji and K. Heafield, "Sparse communication for distributed gradient descent," arXiv preprint arXiv:1704.05021, 2017.
[27] J. Zhang, K. You, and T. Basar, "Distributed discrete-time optimization in multiagent networks using only sign of relative state," IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2352–2367, June 2019.
[28] 2018 reform of EU data protection rules. European Commission. [Online]. Available: https://ec.europa.eu/commission/sites/beta-political/files/data-protection-factsheet-changes_en.pdf
[29] H. Xiao, Y. Ye, and S. Devadas, "Local differential privacy in decentralized optimization," arXiv preprint arXiv:1902.06101, 2019.
[30] Z. Huang, S. Mitra, and N. Vaidya, "Differentially private distributed optimization," in Proceedings of the 2015 International Conference on Distributed Computing and Networking. ACM, 2015, p. 4.
[31] S. Han, U. Topcu, and G. J. Pappas, "Differentially private distributed constrained optimization," IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 50–64, 2016.
[32] E. Nozari, P. Tallapragada, and J. Cortés, "Differentially private distributed convex optimization via functional perturbation," IEEE Transactions on Control of Network Systems, vol. 5, no. 1, pp. 395–408, 2016.
[33] C. Zhang, M. Ahmad, and Y. Wang, "ADMM based privacy-preserving decentralized optimization," IEEE Transactions on Information Forensics and Security, vol. 14, no. 3, pp. 565–580, March 2019.
[34] X. Zhang, M. M. Khalili, and M. Liu, "Improving the privacy and accuracy of ADMM-based distributed algorithms," in Proceedings of the 35th International Conference on Machine Learning, vol. 80. Stockholm, Sweden: PMLR, 10–15 Jul 2018, pp. 5796–5805.
[35] T. Zhang and Q. Zhu, "Dynamic differential privacy for ADMM-based distributed classification learning," IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 172–187, Jan 2017.
[36] Z. Huang, R. Hu, Y. Guo, E. Chan-Tin, and Y. Gong, "DP-ADMM: ADMM-based distributed learning with differential privacy," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1002–1012, 2020.
[37] A. B. Alexandru, K. Gatsis, Y. Shoukry, S. A. Seshia, P. Tabuada, and G. J. Pappas, "Cloud-based quadratic optimization with partially homomorphic encryption," arXiv preprint arXiv:1809.02267, 2018.
[38] S. Han, W. K. Ng, L. Wan, and V. C. S. Lee, "Privacy-preserving gradient-descent methods," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 884–899, June 2010.
[39] O. L. Mangasarian, "Privacy-preserving horizontally partitioned linear programs," Optimization Letters, vol. 6, no. 3, pp. 431–436, 2012.
[40] S. Gade and N. H. Vaidya, "Private learning on networks: Part II," arXiv preprint arXiv:1703.09185, 2017.
[41] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, 2000. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/1.9780898719468
[42] Y. Ye, M. Xiao, and M. Skoglund, "Decentralized multi-task learning based on extreme learning machines," arXiv preprint arXiv:1904.11366, 2019.
[43] F. Yan, S. Sundaram, S. V. N. Vishwanathan, and Y. Qi, "Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2483–2493, Nov 2013.
[44] Y. Lou, L. Yu, S. Wang, and P. Yi, "Privacy preservation in distributed subgradient optimization algorithms," IEEE Transactions on Cybernetics, vol. 48, no. 7, pp. 2154–2165, July 2018.
[45] C. Huang, L. Liu, C. Yuen, and S. Sun, "Iterative channel estimation using LSE and sparse message passing for mmWave MIMO systems," IEEE Transactions on Signal Processing, vol. 67, no. 1, pp. 245–259, Jan 2019.