Multiscale Evolutionary Perturbation Attack on Community ...

1

Multiscale Evolutionary Perturbation Attack onCommunity Detection

Jinyin Chen, Yixian Chen, Lihong Chen, Minghao Zhao, and Qi Xuan, Member, IEEE

Abstract—Community detection, aiming to group nodesbased on their connections, plays an important role innetwork analysis, since communities, treated as meta-nodes,allow us to create a large-scale map of a network to simplifyits analysis. However, for privacy reasons, we may wantto prevent communities from being discovered in certaincases, leading to the topics on community deception. Inthis paper, we formalize this community detection attackproblem in three scales, including global attack (macroscale),target community attack (mesoscale) and target node attack(microscale). We treat this as an optimization problem andfurther propose a novel Evolutionary Perturbation Attack(EPA) method, where we generate adversarial networks torealize the community detection attack. Numerical experi-ments validate that our EPA can successfully attack networkcommunity algorithms in all three scales, i.e., hide targetnodes or communities and further disturb the communitystructure of the whole network by only changing a smallfraction of links. By comparison, our EPA behaves betterthan a number of baseline attack methods on six syntheticnetworks and three real-world networks. More interestingly,although our EPA is based on the louvain algorithm, it is alsoeffective on attacking other community detection algorithms,validating its good transferability.

Index Terms—Social network, community detection, com-munity deception, privacy protection, genetic algorithm.

I. INTRODUCTION

CONSISTING of many nodes and links, networkcaptures certain relationship in real world and is

often used as a mathematical representation for a va-riety of complex systems, such as social systems [1]–[4], transportation systems [5], [6] and supply chainsystems [7], [8], etc. Many real-world networks canbe divided into communities, with the nodes withinthe same communities connected densely, while thoseacross different communities connected sparsely [9]. The

This work was supported in part by the National Natural ScienceFoundation of China under Grant 61502423, Grant 61572439 and Grant62072406, in part by the Zhejiang Provincial Natural Science Founda-tion of China under Grant LR19F030001 and Grant LY19F020025, inpart by the Zhejiang Science and Technology Plan Project under GrantLGF18F030009, and in part by the Key Technologies, System and Appli-cation of Cyberspace Big Search, Major Project of Zhejiang Lab underGrant 2019DH0ZX01, the Major Special Funding for ”Scien”ce andTechnology Innovation 2025” in Ningbo under Grant No. 2018B10063.(Corresponding author: Qi Xuan.)

J. Chen, Y. Chen, L. Chen, and Q. Xuan are with the Institute of Cy-berspace Security, College of Information Engineering, Zhejiang Uni-versity of Technology, Hangzhou 310023, China. E-mail: {chenjinyin,2111803168, 2111803032, xuanqi}@zjut.edu.cn.M. Zhao is with the Fuxi AI Lab, NetEase Inc, Hangzhou 310052,China. E-mail: [email protected].

revealed community structure can not only present theclose relationship among the nodes inside a certaincommunity, but may also indicate that these nodes tendto share common properties or play similar roles in therespective fields [10]. Analyzing community structurein a network thus can help to better understand theinteractions inter- and intra- close groups of nodes inthe network, which is the exact reason why new com-munity detection algorithms are continuously proposedand widely used in a large number of areas.

Community detection algorithms are designed to di-vide the network into partitions of dense regions whichcorrespond to strong related entities. Evaluating theeffectiveness of these algorithms has been a controversialproblem until Newman proposed the modularity Q [11],[12]. The proposal of modularity makes the problemof non-overlapping community detection unprecedenteddeveloped. A bunch of modularity optimization algo-rithms were subsequently proposed, some of whichare based on splitting or aggregation [13]–[16], whilemany others use optimization algorithms to maximizesthe modular Q, such as annealing [17], Particle SwarmOptimization (PSO) [18], external optimization [19] andspectral optimization [20]. These algorithms translatecommunity detection into an optimization problem andtry to find the optimal community division by maximiz-ing certain fitness. In addition, there are also communitydetection algorithms based on information theory [21]–[24] and label propagation [25]–[27]. The former believesthat data flow can be compressed with regular code,such as random walk model. The main idea is that theprobability of wandering to the nodes in the same groupis much greater than those in different groups; whilethe latter updates each node label to its most frequentneighbor label through iteration, which was widely useddue to its simplicity and high efficiency.

However, on one hand, due to privacy concerns,people may not want their social information, such asbeing a member of certain communities, to be discoveredby algorithms; on the other hand, a lot of graph-basedmodels, e.g., graph-based recommenders, need to inte-grate community detection to improve their efficiency,especially for those relatively large systems. The normaloperation of such systems thus may rely on the robust-ness of community detection algorithm. These bring up aproblem: how to attack community detection algorithmsor defend against such attacks? Since different links playdifferent roles in keeping the community structure of a

arX

iv:1

910.

0974

1v3

[cs

.SI]

8 F

eb 2

021

2

TABLE I: Comparison with the existent approaches.

Approach Scale of Attack Way to Change Network Budget Required in the beginning

Nagaraja [28] Mesoscale Addition YesWaniek et al. [29] Mesoscale Rewiring YesFionda et al. [30] Mesoscale Rewiring YesChen et al. [31] Macroscale Rewiring Yes

Li et al. [32] Macroscale Rewiring YesOur EPA Macroscale & Mesoscale & Microscale Rewiring & Addition No

network, the community detection algorithms could besignificantly disturbed by only changing a small fractionof links. In this paper, we focus on the attack part,and seek the maximum community change by rewiringminimal number of links, which can help to judge whichlinks are most vulnerable and thus also provides insightsfor the defense part in order to keep the communitystructure.

There are some attack strategies to disturb networkalgorithms [33]–[36], but few studies related to com-munity detection attack. Nagaraja [28] first introduceda community hiding problem, where they added linksbased on centrality values. Waniek et al. [29] proposeda heuristic rewiring method to hide a community, bydeleting the links within the same communities whileadding the links across different communities. Becausethe links are random selected, the method is of relativelylow effectiveness. And they also propose the conceptof individual hiding, as a way to avoid an influentialnode being highlighted by three centrality measures(i.e., degree, closeness, betweenness). Fionda et al. [30]proposed a deception score to evaluate the effect ofcommunity deception, which takes the accuracy andrecall of community detection into account, as well asthe reachability of target community. The essence ofthis score is that the target community is expected tobe divided into more new communities and each newcommunity contains less percentage of target nodes.However, this may lead to the following two problems:first, it encourages more communities which might makethe attack less concealed; second, when the number ofcommunities is set to be constant, it can’t guide the nodesfrom the target community to the optimal communityto achieve better attack capacity. Quite recently, Chen etal. [31] proposed Q-Attack based on Genetic Algorithm(GA) and modularity, for the first time, to disturb theoverall modular structure of a network. Li et al. [32]generated adversarial networks by attacking the graphneural networks (GNNs) based surrogate model to hidea set of nodes from community detection.

Generally, we think there are following three types ofcommunity detection attacks of different scales.

• Global attack (macroscale): This attack is to achievemaximum community changes of the whole net-work by rewiring minimal number of links.

• Target community attack (mesoscale): This attackis also known as community deception, which is tohide one specific community by rewiring minimal

number of links that are connected to at least onenode in the community.

• Target node attack (microscale): This attack is tomake target node belong to different communitiesby rewiring the minimal number of links around it.Note that our target node attack is different from theindividual hiding proposed by Waniek et al. [29],since the latter has nothing to do with communitydetection.

In this paper, we formalize the community detectionattack problem and present an Evolutionary Perturba-tion Attack (EPA) algorithm to attack community detec-tion algorithms in three scales, i.e., macroscale (network),mesoscale (community) and microscale (node). The maindifferences between our method and the previous attackalgorithms are summarized in TABLE I. Moreover, weuse a series of metrics to evaluate the attack resultsobtained by different attack methods on several realdatasets. In particular, we make the following maincontributions.• To the best of our knowledge, this is the first time

to formalize the problem of community detectionattack into three scales, i.e., macroscale (network),mesoscale (community) and microscale (node).

• We propose a novel EPA algorithm to attack com-munity detection, which is capable of generatingapproximate optimal adversarial network with theminimal number of rewired links to launch all thethree scales of attacks. We compare our EPA withother baseline attack methods on several syntheticnetworks and real-world networks, and find thatEPA behaves the best in most cases, achieving thestate-of-the-art results.

• We use GA to solve the optimization problem inEPA algorithm, and meanwhile introduce networkstructural properties including betweenness and theshortest path lengths between pairwise nodes intothe mutation process to accelerate the convergencerate, making it faster to obtain the approximateoptimal solution.

• We find that our EPA has outstanding transferabil-ity, i.e., the adversarial network generated by EPAagainst the louvain algorithm (LOU) can also beused to successfully fool other community detectionalgorithms.

The rest of paper is organized as follows. In Sec. II,we formalize the problem of community detection attackand introduce our EPA method in details. Then, we

3

TABLE II: The notation definition of main symbols.

Symbol Definition

G The original networkV/E/A The nodes/links/adjacency matrix of original networkG/G The perturbation/adversarial networkE/E The links of perturbation/adversarial networkA/A The adjacency matrix of perturbation/adversarial networkE+/E− The set of link additions/deletionsφg/φt The evaluation function of global/target community attackδ/µ The evaluation function of community/network modificationε The predefined threshold of target node attackn The number of nodes in networkm/mt The number of links in network/target communityβ The budget which is the number of rewired linksθ The maximum number of rewired linksC/C The communities before/after the attackM The confusion matrix whose element is mi,jMi./M.i The number of nodes in new/original community Ci

mi,j/mi,jThe number of community Ci’s nodes which originally

belong/not belong to the target community CjEMr /EMc The entropy of new/target communitiesNa/Nb The number of communities in the control/test groupOi The i-th chromosome of the population(.)|Oi The value under the attack represented by Oi

compare the attack effects by utilizing EPA and otherattack methods on a number of synthetic networks andreal-world networks in Sec. III, where we further use avariety of community detection algorithms to verify thetransferability of EPA. Finally, we conclude the paperand highlight future research directions in Sec. IV.

II. COMMUNITY DETECTION ATTACK

A. Problem FormulationThe main symbols used in this paper are listed in

TABLE II for convenience. First, we state and formalizethe problem of community detection attack. Given anetwork G=(V, E) as an undirected graph including aset of nodes V and a set of links E, suppose it canbe divided into communities C={C1, C2, · · · , Cp}, withCi ⊆ V and Ci ∩ Cj=∅, for all i,j ∈ {1, 2, · · · , p} andi 6= j. Then, we propose the concept of communitydetection attack, which is launched by an adversarialnetwork to fool community detection methods. We studythis problem in three scales: global attack (macroscale),target community attacks (mesoscale) and target nodeattack (microscale). Under these definitions, some otherdefinitions and problems can be defined as follows.

Definition 1 (Adversarial network): Denoting the origi-nal network G=(V, E) as the target, the adversarial attackselects some key links to construct an adversarial pertur-bation network G=(V, E), and A is the adjacency matrixwhose element Au,v ∈ [−1, 0, 1] means the modifiedstrategy on the adjacency matrix of original networkA. Adversarial network G=(V, E), generated by addingadversarial perturbation on the original network, has theadjacency matrix A defined as

A = A + A. (1)

Problem 1 (Global Attack): Given a network G=(V, E),generate an adversarial network G=(V, E) via attackstrategy to make community detection method fail withbudget β. The function φg is used to evaluate the attackgain of global attack, and G is divided into communitiesC={C1, C2, · · · , Cq} by using the same community detec-tion algorithm. Furthermore, E+ (resp., E−) denotes a setof link additions (resp., deletions) and the global attackis realized by solving the optimization problem:

arg max {φg(C, C, E, E, E+, E−, β)}, (2)

where β = |E+| = |E−|, meaning that we make thetotal number of links remains the same, and E = (E ∪E+)\E−. Since the links are rewired globally, we have

E+ ⊆ {(u, v) : u ∈ V ∧ v ∈ V, (u, v) /∈ E}, (3)

E− ⊆ {(u, v) : u ∈ V ∧ v ∈ V, (u, v) ∈ E}. (4)

Problem 2 (Target Community Attack): Given a targetcommunity Ci ⊆ C, which is obtained by community de-tection method ϕ on the original network G. After targetcommunity attack, Ci cannot be detected by ϕ, i.e., in theadversarial network G, the nodes in Ci belong to a set ofnew communities, denoted by C={C1, C2, · · · , Cr} ⊆ C,with Cj ∩Ci 6= ∅, ∀j ∈ [1, r]. Given the function φt whichis used to evaluate the attack gain of target communityattacks, and it is realized by solving the optimizationproblem:

arg max {φt(Ci, C, E, E, E+, E−, β)}. (5)

Here, we still consider the rewiring process and thushave β = |E+| = |E−|, E = (E ∪ E+)\E−. In orderto hide community Ci, it would be best to disconnectthe links inside the community while add connectionsbetween the nodes in Ci and those in other communities,thus we have

E+ ⊆ {(u, v) : u ∈ Ci ⊕ v ∈ Ci, (u, v) /∈ E}, (6)

E− ⊆ {(u, v) : u ∈ Ci ∧ v ∈ Ci, (u, v) ∈ E}. (7)

Problem 3 (Target Node Attack): Given a target nodet, suppose it belongs to community Ci in the originalnetwork G, while it belongs to Cj in the adversarialnetwork G, then the target node attack is realized bysolving the optimization problem:

arg max {δ× µ(E, E, E+, E−, β)}, (8)

δ =

1 |Cj∩Ci||Cj| <ε

0 else, (9)

where δ and µ are the functions to measure the amountof community and network change, respectively. ε ∈[0, 1] is a predefined threshold, based on which wedetermine whether communities Ci and Cj are closeto each other or not. We think the attack is successfulonly when these two communities are relatively differentfrom each other. Similarly , we still have β = |E+| = |E−|and E = (E∪ E+)\E−. To make the attack more effective,

4

Initialization

Evaluation

Crossover

Mutation

EPA

Max Iteration?

1

54

6

9

3

2

8

7

Optimal solution

Original Network

Yes

No

1211

10

1

54

6

9

3

2

8

7

1211

10

Network Change

1

54

6

9

3

2

8

7

Adversarial Network

1211

10

Community detection

Fitness=0

Fitness≠0

1

54

6

9

3

2

8

7

1211

10

1

54

6

9

3

2

8

7

1211

10

①

②

③

④

④

Fig. 1: The framework of EPA, which consists of four steps. First, an original network is input to EPA. Second, GA isused to generate the approximate optimal network change, denoted by the sets of added and deleted links. Third,an adversarial network is established by changing the links in the original network. Finally, various communitydetection algorithms are performed on the adversarial network to validate the attack effect of EPA.

at this time, we rewire links around the target node t,thus we have

E+ ⊆ {(u, t) : u ∈ V, (u, t) /∈ E}, (10)

E− ⊆ {(u, t) : u ∈ V, (u, t) ∈ E}. (11)

Note that the attack gain consists of two parts: thebudget β, defined as the number of rewired links, andthe attack effect. As we can see, the overall attack effectgenerally increases as the budget β grows, as a result, itis our focus to improve the attack gain with only limitedbudget β.

These indicate that community detection attack can al-ways be represented as an optimization problem, whichcould be basically solved by evolutionary computingmethods, such as Genetic Algorithm (GA). In this paper,we propose a GA based Evolutionary Perturbation At-tack (EPA) for community detection. In order to make theGA more suitable for finding appropriate attack strategy,we use rewired link ID as genes on the chromosomeinstead of binary coding, which can effectively reducethe storage space of the population. Considering thatthe number of rewired links is also a variable here, weadopt a strategy called unequal crossover to make thelength of chromosome changeable during the crossoverprocess. Moreover, a novel search mechanism basedon network structural properties, including betweennessand the shortest path lengths between pairwise nodes,is introduced in the mutation process to accelerate theconvergence of GA, making it faster to obtain the ap-proximate optimal solution.

In particular, our EPA is established in four stages,including initialization, evaluation, crossover and mu-tation, which will be introduced one by one in thefollowing. The flowchart of EPA is shown in Fig. 1.

B. InitializationFirst, we directly use the ID of rewired link as the gene

of chromosome to facilitate the evolving. Specifically,we create the indexes for existent links and nonexistentlinks and treat them as link deletion and addition genes,respectively. To make the attack more concealed, we setthe number of deleted links equal to that of added links,so that the total number of links in the network keepsconstant. Denote the threshold of the number of rewiredlinks by θ and the chromosome by Oi with the budgetβ ∈ [1, θ]. We thus have

Oi = {Ai, Bi}= {a1

i , a2i , · · · , aβ

i , b1i , b2

i , · · · , bβi }, (12)

where Oi, with the index i ∈ [1, P], represents the i-thchromosome in population and P is the population sizeof GA. Ai and Bi represent the sets of link addition anddeletion links, respectively, in chromosome Oi, with theelement ak

i (or bki ) being the index of a node pair which

are disconnected (or connected). The adjacency matrixof adversarial perturbation network A, represented bychromosome Oi, is thus calculated by

Auv =

1 Index(u, v) ∈ Ai−1 Index(u, v) ∈ Bi0 otherwise

, (13)

5

2

8

10

3

47

6

9

52

14

1112

1

1

5

4

6

9

12

3

11

10

87

Encoding

Link deletion

1

……

2

3

4

5

6

7

8

1352

Link addition

2

1

3

4

5

6

7

8

……

14

13

Random selectionAttack

2

8

10

3

47

6

9

52

14

1112

1

1

5

4

6

9

12

3

11

10

87

13

[1,2]

[2,4]

[3,6]

[4,7]

[6,9]

[5,6]

[1,6]

[11,12]

[4,8]

……

[1,3]

[1,5]

[1,7]

[1,8]

[1,11]

[1,10]

[1,4]

[10,11]

[1,9]

2 3 6 8 1 5 8 14

……

G

| iG O

:iO

Fig. 2: An illustration of chromosome encoding andinitialization. G|Oi is the adversarial network after theattack represented by chromosome Oi. The original net-work G contains 14 links and 52 disconnected pairsof nodes that have potential to be connected in theattack. The red dashed lines and the green solid linesrepresent the links that are deleted and added in theattack, respectively. In the left, the rectangles representthe link addition genes while the triangles denote thelink deletion genes. Then, we initialize chromosome Oiby randomly selecting β = 4 link addition and deletiongenes, respectively.

where Index(u, v) is the function to obtain the indexof a node pair (u, v). An illustration to explain howchromosomes are coded and initialized is shown inFig. 2.

Note that, to make the target node attack more effec-tive, instead of initializing randomly, we first select acommunity Ct randomly, then we preferentially deletethe links between the target node and those not belong-ing to the community Ct, while establish links betweenthe target node and those in the community Ct. If allnodes in the community Ct are connected to the targetnode, then we randomly select another community andrepeat the above steps.

C. Evaluation1) Fitness Function: Each chromosome corresponds to

an attack strategy. After encoding the attack strategies,we then need to evaluate the attack effect of each strat-egy using some fitness function. Note that the entropyis maximized when each new community consists ofnodes uniformly from many different real communities.Since the uniform distribution emphasized by entropymakes the overall accuracy and recall as low as possible,

we thus think it’s an appropriate metric to evaluate theattack effect. For the global attack or target communityattack, their fitness functions follow the same generalform, represented by

φi = Ψ(d′|Oi)× X(C, C|Oi), (14)

which consists of two parts: the attenuation functionΨ ∈ [0, 1] and the attack effect evaluation function X.Ψ is a monotonic decreasing function of d′|Oi, with d′

being the normalization of d which represents the degreechange after the attack represented by chromosome Oi. Xis the function to evaluate the attack effect based on en-tropy, and the terms C and C|Oi denote the communitydetection results before and after the attack, respectively.For the target node attack, however, since the attackeffect is binary, i.e., either success or failure, the fitnessfunction only contains the first part, i.e., the change oftarget node degree.

2) Attenuation Function: In order to perform the attackwith limited budget β, we use the exponential decayfunction as our attenuation function, described as fol-lows:

Ψ(d′|Oi) = exp(c× d′)|Oi, (15)

where d′ = d/m (m is the number of links in the wholenetwork) for global attack while d′ = d/mt (mt is thenumber of links in the target community) for targetcommunity attack; and c is a constant that controls thedecay speed. For a network of n nodes, the degrees ofall nodes before and after the attack are denoted byD={d1, d2, · · · , dn} and D={d1, d2, · · · , dn}, respectively.Then, the distance d between them is calculated by

d =14

n

∑i=1

∣∣di − di∣∣ , (16)

where di and di are the degrees of node i before and afterthe attack, respectively.

We think that, for the same budget β, the fewer nodeschange their degrees, the smaller the network structurevaries. We thus use distance d instead of budget β as theinput of attenuation function. In the case of β = 1, thedegrees of at most four nodes will change, therefore thevalue of d defined by Eq. (16) tends to be 1, equal to thenumber of rewired links. When the rewiring involves thesame node, the distance d will be smaller than budgetβ. Our EPA method are thus more likely to rewire linksinvolving fewer nodes, making it more concealed.

3) Attack Effect: Given a confusion matrix M witheach element mij representing the number of the sharednodes between the original community Ci and the newcommunity Cj, we define the function X to evaluate theattack effect:

X(C, C|Oi) = (E′Mr+ E′Mc

)|Oi, (17)

where E′Mrand E′Mc

are the normalized entropy of Mrand Mc, and thus X must be in the range of [0, 2].

6

Suppose EMr and EMc are the entropy of new com-munities and target communities, which are obtainedby considering the row vectors and column vectors ofM, respectively. Suppose the matrix M dimension isNa × Nb, with Na and Nb being the numbers of com-munities in control group (i.e., real community number)and test group (i.e., community number detected bycertain community detection algorithm after the attack),respectively, the maximum values of EMr (resp., EMc )EMc are obtained when the values of each row (resp.,column) are equal. In this case, the calculation of EMrand EMc is simplified to

max EMr = −Na

∑i=1

Mi.n

(1

Nblog2

1Nb× Nb) = log2 Nb, (18)

max EMc = −Nb

∑i=1

M.in

(1

Nalog2

1Na× Na) = log2 Na, (19)

where n is the number of nodes in network. Mi. andM.i represent the numbers of nodes in new and originalcommunity Ci, respectively.

For global attack, the entropy for new communitiesEMr and that for target communities EMc are calculatedby

EMr = −Na

∑i=1

Nb

∑j=1

Mi.n

(mi,j

Mi.log2

mi,j

Mi.), (20)

EMc = −Nb

∑i=1

Na

∑j=1

M.in

(mj,i

M.ilog2

mj,i

M.i), (21)

where mij is the number of nodes in new community Cithat originally belong to community Cj.

Suppose all the communities keep exactly the sameafter the attack, we have EMr = EMc = 0. With Eq. (18)we can conclude that EMr and EMc must be in the rangeof [0, log2 Nb] and [0, log2 Na], respectively. Therefore,based on Eq. (14) and Eq. (17), the fitness of Oi for globalattack is defined as

φi = Ψ(d′|Oi)× (EMr

log2 Nb+

EMc

log2 Na)|Oi. (22)

For target community attack, here we limit that anyadded or deleted link must be connected to at least onenode in the target community, which greatly reducesthe searching space of solutions. In this case, M is anNa × 2 matrix with element mij (resp., mi,j) being thenumber of nodes in community Ci that originally belong(resp., don’t belong) to target community Cj. For targetcommunity Cj, EMr and EMc are defined as

EMr = −Na

∑i=1

Mi.n

(mi,j

Mi.log2

mi,j

Mi.+

mi,j

Mi.log2

mi,j

Mi.), (23)

EMc = −Na

∑i=1

mi,j

M.ilog2

mi,j

M.i. (24)

Based on the above definitions, we always havemi,j=Mi. −mi,j. The remaining variables are defined thesame as those in Eq. (20) and Eq. (21).

Similarly, with Eq. (18) we can conclude that EMr andEMc must be in the range of [0, 1] and [0, log2 Na], respec-tively. Therefore, the fitness of Oi for target communityattack is defined as

φi = Ψ(d′)× (EMr +EMc

log2 Na). (25)

It should be noted that our fitness function based onentropy is somewhat similar to the deception score Hmentioned in [30], i.e., both of them have a tendencyto spread more nodes of the target community intomore communities. The main difference is that by usingnormalization, in our method, the number of new com-munities will not be too large, making the attack moreconcealed.

Finally, for target node attack, it is quite easy to hide anode by rewiring links. In this study, however, we try tohide a node by only adding links to the target node.We believe that adding, rather than deleting, certainlinks are much more convenient for social network users.For target node t, the fitness of chromosome Oi thus iscalculated by

φi =

{dt

dt+β|Oisuccessful attack

0 otherwise, (26)

where dt is the degree of target node t before theattack and β|Oi is the length of chromosome Oi. Thus,dt + β|Oi is the degree of the target node after the attackrepresented by Oi.

For any kind of attack, after evaluating each individualin GA, we further use roulette selection method toselect offspring. If the fitness of each individual in thepopulation is φi (i = 1, 2, · · · , M) and the size of thepopulation is M, the selection probability of individualOi is calculated by

P(Oi) =φi

∑Mj=1 φj

. (27)

D. CrossoverTraditionally, the length of chromosomes is set the

same and is fixed throughout the evolution in GA. Here,we prefer to use non-equal crossover, where the lengthof chromosomes can change in the process of crossover,so that we can find the approximate optimal solutionwith the smallest budget β, which is described by thefollowing steps.

1) Choose two chromosomes Oi and Oj and form twosubsets Ei={AE

i , BEi } ⊆ Oi and Ej={AE

j , BEj } ⊆ Oj by

removing the identical elements that are considerednonexchangeable, where AE

i (or AEj ) and BE

i (orBE

j ) are the exchangeable gene sets of addition anddeletion, respectively, in chromosomes Oi (or Oj).

2) Calculate the lengths of AEi , BE

i , AEj and BE

j , and

denote them by liA, li

B, l jA and l j

B, respectively.3) Generate two random integers ri and rj in the inter-

vals [1, min(liA, li

B)] and [1, min(l jA, l j

B)], respectively.

7

Input5

3

3

2

1

5

7

6

8

Crossover

9

11

4

10

7

8

4

1

3

2

1

6

1

10

9

12

Difference sets

3j

Al

4j

Bl

Randomly selection

1jr

3ir

11

E

jAE

jB

E

iA 5i

Al

6i

Bl

iO jO

iE

jE

2

2

54

3 5 7 84 6

76 8

129 1110

129 1110

54 3 5 7 84 6

11 11

321 64

321 109

31 2 5 8 74 6

1 2 129 10

321 64 11 31 2 5 8 11

321 109 5 87 74 61 2 129 10

9 10

6 8

7 85

9 1110

11

7

iO

jO

E

iB

Fig. 3: An example of non-equal crossover operation. Oi and Oj (resp., Oi and Oj) are the chromosomes before(resp., after) the crossover. li

A and liB (resp. l j

A and l jB) are the numbers of exchangeable addition and deletion genes,

respectively, on chromosome Oi (resp. Oj). Furthermore, ri and rj are the numbers of crossed genes in Oi and Oj,respectively.

4) Suppose the numbers of rewired links by chromo-somes Oi and Oj are βi and β j, respectively, and thethreshold is θ. If βi − ri + rj or β j + ri − rj /∈ [1, θ],go to step 3); otherwise, go to step 5).

5) select ri addition and deletion genes, respectively,from Oi, and select rj addition and deletion genes,respectively, from Oj; exchange the genes selectedfrom Oi with those selected from Oj, to generatetwo new chromosomes O1 and O2.

The whole process of crossover is shown in Fig. 3.

E. MutationHere, we further utilize the structural information

around pairwise nodes as heuristic information to ac-celerate the searching process of GA.

For link addition, two nodes are of less similarity if thenetwork distance (or the shortest path length) betweenthem is relatively long. Therefore, in order to make theattack more effective, it is better to add the links betweenpairwise nodes of longer distance. Suppose the distancebetween a pair of disconnected nodes i and j is λij, thenthe probability that a link addition gene to create a linkbetween these two nodes is generated by mutation isdefined as

P(ak) =λij

∑(i,j)/∈E λij, (28)

where ak is the index of the pair of nodes (i, j).For link deletion, the links inside communities always

have lower betweenness than those across different com-munities. Therefore, in order to make the attack moreeffective, it is better to delete the links of lower between-ness. Given a link with its index bk, its betweenness iscalculated by

CB(bk) = ∑s,t∈V

σ(s, t|bk)

σ(s, t), (29)

where V is the node set in the network, σ(s, t) representsthe total number of shortest paths between nodes sand t, and σ(s, t|bk) represents the number of shortestpaths through the link. Then, the probability that alink deletion gene to remove the link is generated bymutation is defined as

P(bk) =

1CB(bk)

∑mk=1

1CB(bk)

, (30)

where m is the total number of links in the network.

III. EXPERIMENTS

In this part, we will perform the three kinds of attackson several synthetic networks and real-world networks.For global attack, we first compare our EPA with threeheuristic algorithms under different budges β, and wealso design the experiment to show the ability of EPAto find the approximate optimal budget. For target com-munity attack, we first use EPA to get the approximateoptimal budget and then compare EPA with the otheralgorithms under this budget. For target node attack,we select some representative nodes as targets. For eachexperiment, we run 10 times and record the mean re-sult. Our experimental environment consists of i7-87003.2GHz (CPU), GTX 1050Ti 4GB (GPU), 16GB memoryand Windows 10.

A. Datasets

To evaluate the attack effect of EPA, we use threecommunity detection algorithms, i.e., greedy (GRE) [13],Infomap Algorithm (INF) [21] and Louvain (LOU) [37],on the six synthetic networks and three real networks,described as follows, with their basic properties pre-sented in TABLE III and TABLE IV, respectively. The

8

descriptions of parameters for generating a syntheticnetwork are listed in TABLE V for convenience.• The synthetic networks [38]: These networks are

generated by LFR benchmark, all of which are undi-rected and unweighted networks.

• The USA college football (Football) [14]: Thisnetwork represents the matches between Americanfootball teams during the season of 2000.

• email-Eu-core network (Email) [39]: The networkrepresents the emails between members of a largeEuropean research institution.

• Political blogs (Pol.Blogs) [40]: This network rep-resents the political leaning collected from blog di-rectories.

TABLE III: The basic properties of the six syntheticnetworks.

Network N k max k µ min c max c

N-1000-µ-0.3 1000 10 50 0.3 50 100N-1000-µ-0.5 1000 10 50 0.5 50 100N-3000-µ-0.3 3000 10 50 0.3 100 200N-3000-µ-0.5 3000 10 50 0.5 100 200N-5000-µ-0.3 5000 15 100 0.3 100 200N-5000-µ-0.5 5000 15 100 0.5 100 200

TABLE IV: The basic properties of the three networks.

Network #Nodes #Links #Communities

Football 115 613 12Email 1005 25571 42

Pol.Books 1490 19090 2

TABLE V: The meaning of parameters in synthetic net-works.

Parameter Meaning

N number of nodesk average degree

max k maximum degreeµ mixing parameter

min c minimum community sizemax c maximum community size

In experiments, we only consider undirected networksand remove the isolated nodes from data sets since theyare meaningless in community detection. The three com-munity detection algorithms we adopt are also brieflyintroduced as follows to make the paper self-contained.• GRE [13]: Each node is considered as a separate

community initially and the communities are fusedin the direction of maximum increment of modular-ity Q. The complexity is O(|V|log2(|V|)).

• INF [21]: It’s an information theory based algorithmwhich strive to compress the average descriptionlength for a random walk. Its complexity is O(|E|).

• LOU [37]: This is a modularity based algorithmwhich can generate a hierarchical community struc-ture by compressing the communities continuously.This algorithm runs in time O(|V|log(|V|)).

• WAL [22]: WalkTrap algorithm merges nodes basedon the transition probability of random walk model.The complexity is O(|V|2log(|V|)).

• EIG [41]: This algorithm finds communities basedon the fidler vector of Laplacian matrix. The com-plexity is O(|V|(|V|+ |E|)).

• SPI [42]: it’s a semi-supervised community detec-tion algorithm which reduces community detectionto the problem of finding the ground state of aninfinite spin glass. Its complexity is O(|V|3.2).

B. Baseline Attack Methods

Inspired by various community detection algorithms,we use the following four attack methods as the base-lines for global attack. Due to the poor performance ofrandom attack, we introduce heuristic strategies on therandom attack and propose two heuristic attack methodsAB and AD.

• AQ: A GA based method where the modularity Qis used to design the fitness function [31].

• AS: Rather than using the entropy-based fitnessfunction, here we use the average of deceptionscore [30] as the fitness function and the rest is thesame as EPA.

• AB: Deleting the links with the highest betweennesscentrality, while adding the links between the nodeswith the longest distance.

• AD: Deleting the links with the largest sum ofdegrees of their terminal nodes, while adding thelinks between the nodes with the longest distance.

For target community attack, we use the safeness baseddeception algorithm Ds [30] and random algorithmDw [29] as the baseline attack methods, which caneffectively hide the target community against differentcommunity detection algorithms, and they are brieflyintroduced as follows.

• Ds: Both link addition and deletion aim to maximizethe safeness of target community defined in [30].

• Dw: This method randomly adds links inter differentcommunities while deletes links intra communities.

For target node attack, we propose a heuristic algorithmDr which randomly add links between target node andthe nodes in other communities.

C. Performance Metrics

In order to verify the effectiveness of our EPA, wecompare it with other baseline attack methods by usinga series of metrics.

For global attack, we use Normalized Mutual Informa-tion (NMI) and Adjusted Rand Index (ARI) to evaluatethe community detection results. Then, we further eval-uate the attack effects by comparing their values beforeand after attacks. NMI and ARI are two of the mostwidely used metrics of community detection [43], [44],

9

and NMI is also used in community detection attack [30],[31]. In particular, NMI and ARI are defined as

NMI =−2 ∑Na

i=1 ∑Nbj=1 mij log(

mijnMi Mj

)

∑Nai=1 Mi log(Mi

n ) + ∑Nbj=1 Mj log(

Mjn )

, (31)

ARI =∑ij (

mij2 )− [∑i (

Mi2 )∑i (

Mj2 )]/(n

2)

12 [∑i (

Mi2 ) + ∑j (

Mj2 )]− [∑i (

Mi2 )∑j (

Mj2 )]/(n

2), (32)

where Mi and Mj are the sums over row i and column j,respectively. Na and Nb are the numbers of communitiesin control group (i.e., real community number) andtest group (i.e., community number detected by certaincommunity detection algorithm), respectively.

For target community attack, in addition to the fitness,we also use the deception score H [30] to evaluate theattack effect, which is defined as

H =

[1− |S(C)| − 1

|C| − 1

]×[

12(1−max(R)) +

12(1− P)

], (33)

where |S(C)| is the number of connected components inthe subgraph induced by the members in C. P and R arethe mean precision and recall rate, respectively.

For target node attack, we just use the percentage oftarget node degree increment in the attack to evaluatethe results.

D. Experimental Results

1) Global Attack: In global attack, we fix the budgetand rewire k% of links to compare our EPA with fourbaseline methods including AQ, AS, AB and AD. Wechoose LOU as the basic community detection algorithm,and meanwhile we also use the GRE and INF algorithmsto verify the black-box attack effect. The populationused in the experiment is 100, the maximum numberof iterations is 200, the crossover and mutation ratesare 0.6 and 0.1, respectively. TABLE VI represents thecommunity detection results before attack.

The community detection results, in terms of NMIand ARI, on different datasets obtained by variouscommunity detection algorithms, after the attacks byEPA and the four baseline attack methods for variouspercentages of rewired links, are presented in Fig. 4. Andthe decrease of NMI on Pol.Blogs network is shown inFig. 5. Generally, we can find that, our EPA has the bestattack effect on each performance metric, i.e., leading tosmaller NMI and ARI, in most cases.

More specifically, we find that EPA performs especiallywell on Pol.Blogs network, i.e., rewiring 4% links candecrease both NMI and ARI to near 0.3. This maybe because LOU performs quite well in revealing thecommunity structure of this network, and meanwhilethis network is also easy to be disturbed since theconnections are much sparse. In fact, none of the at-tack methods performs well on LFR generated networkswith small mixing parameter, i.e., NMI and ARI still

keep relatively high after any kind of attack, since thecommunities of these networks are relatively dense andthus are easier to be detected. Indeed, for larger mixingparameter, all attacks behave better, leading to smallerNMI and ARI. By comparison, EPA outperforms all theothers on synthetic networks, no matter for large orsmall mixing parameters. Besides, we can also find thatwhite-box attack (LOU) behaves better than black-boxattacks (other community detection algorithms), whichis quite intuitive since all the attack methods here arebased on the LOU algorithm. More interestingly, bycomparison, the communities detected by GRE are moreaffected than other algorithms, under the attack of LOUbased EPA. This may be because both GRE and LOU aremodularity-based algorithms, while the others are not.In addition, the communities detected by EIG on LFRnetworks are quite robust, which may be because EIGperforms very poor compared with other communitydetection algorithms on LFR network.

Now, let’s focus on the influence of the attenuationfactor c which controls the degree that the budget βpenalizes the fitness and ultimately controls the approxi-mate optimal budget generate by EPA. We thus comparethe attack results under different values of c, and fix theother parameters. For each value, we run the experiment10 times and report the mean results in TABLE VII,where we can see that as c increases from 3 to 6, thefinal budget generated by EPA significantly decreaseswith a little bit sacrifice of attack effects, i.e., NMI andARI increases slightly as c increases. We thus suggest touse relatively large value of c if we want to get a moreconcealed attack, while use relatively small value of c ifthe approximate optimal attack effect is pursued.

2) Target Community Attack: Target community attackis also known as community deception. As an example,we visualize the attack effect of our EPA on the Dolphinnetwork using t-SNE algorithm [45], as shown in Fig. 6.Because the low-dimensional vectors generated by t-SNE algorithm will preserve the feature of networkcommunity structure, the position of nodes in Fig. 6can reflect the result of community detection. Here, wecompare EPA with safeness based deception algorithmDs [30] and random algorithm Dw [29]. In order to makefair comparison, we fix the budget β for each algorithm,i.e., we first use our EPA to find the approximate optimalbudget and then use it as the input of Ds and Dw. Again,we record the mean values of fitness and deceptionscore H for all detected communities with at least 10nodes on each network, as presented in TABLE VIII-TABLE IX. Note that here, for synthetic networks, weonly give the results of INF and LOU algorithms, sinceother algorithms is of high time complexity and thus isquite time-consuming on networks including more thanthousands of nodes, especially we need to attack eachcommunity in a network.

We find that, again, our EPA behaves significantlybetter, in terms of much higher fitness and deceptionscore H, than both Ds and Dw. And such results are

10

EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD EPA AQ AS AB AD

GRE .67 .72 .72 .73 .76 .63 .69 .69 .78 .72 .59 .69 .71 .71 .73 .37 .43 .39 .41 .41 .32 .40 .35 .37 .38 .29 .38 .34 .31 .32 .57 .62 .61 .65 .71 .42 .49 .46 .61 .68 .32 .36 .36 .60 .65

INF .88 .92 .90 .92 .92 .86 .92 .88 .92 .92 .81 .90 .86 .92 .92 .61 .64 .61 .63 .63 .57 .60 .60 .60 .60 .54 .58 .60 .59 .61 .44 .47 .48 .51 .51 .33 .37 .38 .52 .52 .28 .34 .33 .51 .50

LOU .78 .80 .83 .89 .85 .75 .77 .79 .89 .88 .71 .73 .78 .89 .86 .51 .51 .57 .56 .53 .48 .48 .55 .50 .51 .42 .42 .53 .49 .52 .53 .58 .57 .60 .63 .34 .45 .43 .58 .58 .29 .35 .36 .51 .49

WLK .82 .85 .86 .86 .89 .79 .82 .80 .86 .89 .79 .81 .80 .85 .89 .53 .55 .58 .54 .55 .51 .52 .53 .52 .54 .49 .50 .53 .51 .52 .57 .62 .57 .63 .63 .39 .50 .43 .62 .62 .30 .41 .32 .58 .51

EIG .68 .70 .72 .72 .70 .64 .65 .67 .70 .72 .61 .65 .65 .68 .72 .44 .46 .46 .50 .50 .43 .44 .46 .48 .49 .40 .42 .41 .41 .43 .64 .65 .66 .66 .64 .59 .61 .64 .66 .63 .55 .57 .59 .66 .58

SPI .89 .91 .90 .91 .90 .89 .89 .90 .91 .90 .87 .89 .90 .90 .89 .52 .53 .54 .55 .54 .48 .49 .48 .49 .49 .47 .48 .50 .50 .50 .52 .53 .54 .60 .60 .39 .41 .42 .59 .57 .34 .36 .36 .52 .51


GRE .42 .44 .50 .49 .53 .35 .49 .41 .55 .48 .30 .48 .48 .42 .50 .14 .16 .15 .16 .16 .12 .15 .14 .15 .15 .10 .14 .13 .11 .12 .73 .76 .76 .74 .82 .58 .65 .65 .70 .79 .47 .53 .52 .68 .78

INF .85 .90 .86 .90 .90 .84 .90 .85 .90 .90 .79 .88 .83 .90 .90 .28 .30 .28 .29 .29 .26 .29 .28 .27 .28 .23 .25 .26 .26 .29 .57 .60 .60 .61 .61 .46 .50 .51 .61 .61 .41 .47 .44 .58 .56

LOU .56 .61 .65 .80 .72 .53 .54 .58 .80 .72 .47 .47 .57 .82 .71 .25 .23 .34 .32 .25 .24 .25 .32 .26 .25 .22 .18 .31 .25 .27 .68 .69 .72 .69 .71 .38 .62 .58 .65 .70 .31 .38 .48 .56 .54

WLK .62 .71 .71 .71 .82 .58 .62 .59 .71 .82 .57 .60 .59 .70 .82 .18 .18 .24 .19 .20 .17 .18 .20 .19 .21 .17 .19 .20 .19 .19 .70 .73 .70 .73 .73 .54 .64 .58 .71 .72 .43 .54 .44 .67 .57

EIG .41 .47 .49 .54 .46 .34 .37 .37 .52 .53 .33 .38 .39 .51 .50 .21 .22 .21 .26 .25 .21 .21 .23 .24 .25 .16 .20 .18 .18 .19 .76 .77 .77 .78 .76 .69 .71 .72 .78 .75 .66 .67 .68 .76 .68

SPI .82 .85 .82 .85 .84 .82 .83 .84 .85 .84 .74 .79 .83 .83 .82 .24 .25 .26 .29 .24 .22 .22 .23 .23 .25 .21 .23 .23 .24 .25 .68 .70 .70 .69 .69 .56 .58 .58 .66 .63 .49 .50 .50 .56 .56


GRE .71 .73 .78 .80 .76 .65 .67 .70 .69 .68 .62 .69 .68 .68 .67 .63 .67 .66 .67 .67 .58 .65 .62 .63 .62 .57 .61 .61 .60 .62 .65 .76 .70 .65 .67 .63 .75 .66 .65 .63 .61 .70 .65 .64 .62

INF .93 .96 .96 .94 .94 .90 .94 .94 .92 .93 .88 .90 .92 .90 .92 .94 .95 .96 .95 .95 .90 .93 .95 .92 .93 .86 .90 .92 .89 .93 .98 1.0 1.0 .98 .98 .95 .99 1.0 .96 .97 .92 .98 .94 .95 .95

LOU .95 .95 .98 .98 .97 .92 .92 .96 .95 .94 .90 .89 .95 .95 .93 .98 .98 .99 .99 .99 .97 .97 .98 .99 .98 .95 .94 .97 .98 .97 .99 1.0 1.0 1.0 .99 .98 1.0 .99 .99 .98 .95 .99 .99 .97 .96

WLK .80 .81 .82 .84 .84 .77 .78 .79 .78 .83 .74 .74 .78 .75 .83 .75 .79 .80 .78 .80 .71 .76 .77 .77 .79 .66 .72 .73 .74 .79 .89 .92 .93 .92 .93 .88 .89 .92 .88 .92 .84 .85 .88 .87 .91

EIG .39 .41 .41 .45 .48 .32 .36 .43 .41 .45 .29 .30 .43 .39 .47 .28 .44 .29 .29 .38 .22 .32 .15 .20 .33 .21 .25 .27 .28 .33 .19 .33 .23 .20 .20 .15 .23 .23 .16 .25 .14 .12 .21 .12 .28

SPI .96 .98 .98 .96 .97 .93 .97 .96 .94 .94 .92 .96 .94 .92 .92 .97 .98 .99 .98 .98 .94 .98 .98 .96 .96 .93 .97 .97 .94 .95 .91 .94 .93 .92 .93 .89 .94 .94 .91 .90 .85 .93 .93 .86 .88


GRE .51 .54 .60 .61 .59 .45 .49 .56 .56 .55 .45 .52 .51 .56 .52 .37 .44 .39 .40 .42 .31 .39 .36 .37 .36 .30 .36 .34 .34 .36 .31 .46 .36 .32 .34 .28 .44 .32 .31 .29 .27 .38 .31 .30 .28

INF .92 .95 .95 .94 .94 .88 .92 .93 .90 .91 .86 .88 .90 .88 .89 .93 .94 .95 .94 .95 .89 .92 .94 .91 .93 .84 .88 .91 .88 .93 .98 1.0 1.0 .98 .98 .94 .99 .99 .95 .95 .90 .98 .91 .91 .92

LOU .91 .92 .98 .97 .96 .86 .90 .96 .95 .93 .84 .82 .95 .95 .92 .94 .94 .99 .99 .99 .93 .93 .98 .98 .97 .91 .89 .96 .98 .96 .97 1.0 1.0 .99 .98 .95 .99 .99 .99 .95 .93 .99 .99 .95 .94

WLK .76 .77 .80 .81 .82 .74 .74 .75 .76 .81 .70 .70 .74 .72 .80 .72 .75 .77 .74 .76 .70 .71 .73 .73 .74 .61 .65 .67 .68 .74 .88 .90 .91 .90 .92 .85 .85 .89 .85 .89 .79 .81 .85 .83 .87

EIG .16 .23 .20 .28 .26 .10 .16 .27 .23 .26 .10 .11 .23 .20 .23 .07 .14 .09 .09 .15 .03 .08 .04 .04 .12 .03 .05 .10 .07 .10 .04 .06 .05 .04 .04 .03 .05 .05 .03 .08 .03 .03 .04 .03 .08

SPI .96 .98 .98 .96 .97 .93 .97 .96 .94 .94 .89 .96 .94 .90 .91 .97 .98 .98 .98 .98 .91 .98 .98 .93 .96 .90 .96 .97 .91 .95 .76 .78 .77 .77 .77 .71 .78 .78 .72 .74 .65 .78 .78 .66 .71


GRE .33 .38 .37 .37 .34 .29 .32 .35 .35 .31 .27 .28 .31 .36 .31 .27 .28 .35 .31 .29 .24 .25 .29 .32 .28 .20 .21 .19 .24 .19 .37 .51 .46 .41 .42 .30 .44 .44 .38 .35 .25 .41 .40 .35 .37

INF .75 .76 .76 .77 .78 .72 .74 .74 .74 .77 .67 .71 .73 .70 .79 .68 .69 .70 .68 .71 .65 .67 .67 .67 .70 .61 .64 .66 .64 .71 .89 .92 .94 .91 .94 .85 .90 .91 .88 .93 .81 .86 .89 .85 .92

LOU .60 .60 .75 .70 .79 .51 .53 .71 .63 .69 .48 .45 .58 .60 .67 .49 .50 .67 .63 .59 .43 .45 .56 .60 .62 .39 .44 .51 .50 .46 .92 .93 .97 .94 .95 .87 .92 .94 .91 .93 .85 .90 .94 .89 .92

WLK .45 .49 .46 .47 .46 .43 .47 .44 .45 .48 .40 .42 .43 .42 .48 .42 .43 .44 .44 .44 .40 .42 .43 .40 .42 .37 .42 .43 .40 .44 .53 .56 .56 .56 .57 .50 .52 .53 .53 .59 .49 .49 .51 .53 .60

EIG .21 .19 .22 .27 .23 .20 .20 .21 .21 .23 .19 .22 .20 .21 .18 .13 .17 .14 .15 .12 .13 .14 .13 .14 .14 .13 .14 .13 .14 .15 .17 .17 .19 .17 .21 .13 .16 .15 .14 .15 .12 .13 .18 .14 .14

SPI .88 .89 .89 .89 .88 .80 .84 .85 .83 .82 .75 .80 .80 .77 .85 .85 .87 .89 .86 .87 .78 .84 .87 .80 .87 .76 .81 .81 .77 .87 .86 .91 .88 .87 .89 .83 .87 .85 .85 .87 .82 .84 .84 .84 .86


GRE .17 .21 .23 .25 .31 .16 .17 .21 .21 .25 .15 .15 .18 .22 .23 .15 .15 .22 .20 .11 .12 .13 .17 .21 .15 .09 .11 .09 .14 .08 .13 .20 .21 .15 .15 .10 .16 .19 .13 .12 .08 .14 .15 .12 .13

INF .66 .67 .67 .69 .69 .61 .64 .63 .64 .67 .55 .60 .63 .59 .70 .54 .55 .56 .54 .57 .50 .52 .53 .53 .57 .45 .48 .50 .48 .59 .86 .90 .91 .89 .92 .79 .86 .88 .83 .89 .70 .81 .86 .76 .86

LOU .44 .44 .65 .62 .72 .36 .37 .58 .49 .55 .34 .32 .42 .46 .55 .33 .34 .50 .52 .43 .27 .33 .40 .46 .44 .24 .31 .38 .37 .32 .72 .76 .93 .86 .88 .71 .83 .85 .79 .85 .68 .82 .85 .75 .80

WLK .25 .30 .26 .27 .27 .23 .26 .24 .27 .33 .21 .21 .23 .25 .32 .19 .19 .20 .21 .21 .14 .18 .18 .18 .20 .14 .15 .17 .16 .22 .39 .42 .40 .41 .44 .33 .34 .35 .38 .46 .28 .33 .29 .35 .47

EIG .10 .08 .09 .12 .10 .09 .08 .09 .09 .10 .08 .10 .08 .09 .05 .05 .06 .05 .05 .03 .05 .05 .05 .05 .05 .05 .05 .04 .05 .06 .03 .04 .05 .03 .07 .03 .04 .04 .03 .04 .03 .03 .05 .03 .03

SPI .84 .88 .88 .87 .85 .75 .81 .82 .79 .76 .67 .75 .75 .69 .79 .82 .83 .88 .82 .83 .74 .82 .85 .76 .85 .68 .79 .76 .73 .85 .64 .74 .66 .65 .69 .61 .66 .63 .65 .69 .60 .61 .66 .65 .68

k =3 k =5

k =5

k =1 k =3 k =5

k =1 k =3 k =5

k =1

k =3 k =5

k =1 k =3

k =1 k =3 k =5

k =5

k =1 k =3 k =5 k =1 k =3 k =5 k =1 k =3 k =5

k =3 k =5

k =5 k =1 k =3 k =5Algorithm

Algorithm

Algorithm

Algorithm

Algorithm

k =1 k =3 k =5

k =1 k =3 k =5

k =1 k =3 k =5

k =1

Algorithmk =1 k =3 k =5 k =1

Football Email Pol.Blogs

N-5000-μ-0.3

N-5000-μ-0.5

N-3000-μ-0.3

N-3000-μ-0.5

N-1000-μ-0.3

N-1000-μ-0.5

k =1 k =3

k =3 k =5 k =1 k =3

k =1

ARI

NMI

Metric

Metric

Metric

Metric

Metric

NMI

ARI

ARI

NMI

Metric

Fig. 4: The attack effects, reflected by the change of NMI and ARI, obtained by EPA and the four baseline attackmethods by rewiring k% of links, with k varies from 1 to 5, for the three real-world networks and six syntheticnetworks. Darker colors represent better attack performances and the best performance of each k is shown in redbox.

TABLE VI: The community detection results before attack.

NetworkGRE INF LOU WAL EIG SPI

NMI ARI NMI ARI NMI ARI NMI ARI NMI ARI NMI ARI

Football 0.73 0.49 0.92 0.90 0.73 0.49 0.89 0.82 0.70 0.46 0.91 0.85Email 0.45 0.16 0.68 0.30 0.58 0.33 0.57 0.19 0.50 0.26 0.55 0.25

Pol.Blog 0.69 0.79 0.52 0.65 0.64 0.77 0.69 0.78 0.64 0.76 0.63 0.77N-1000-µ-0.3 0.73 0.54 0.97 0.95 0.99 0.98 0.85 0.82 0.45 0.26 0.99 0.99N-1000-µ-0.5 0.37 0.19 0.78 0.70 0.85 0.74 0.47 0.27 0.23 0.10 0.92 0.90N-3000-µ-0.3 0.63 0.34 0.96 0.96 1.0 1.0 0.81 0.77 0.28 0.06 0.99 0.99N-3000-µ-0.5 0.28 0.14 0.71 0.56 0.57 0.41 0.44 0.20 0.14 0.05 0.89 0.85N-5000-µ-0.3 0.75 0.44 1.00 1.00 1.0 1.0 0.94 0.93 0.22 0.04 0.94 0.78N-5000-µ-0.5 0.42 0.15 0.95 0.93 0.98 0.95 0.56 0.44 0.21 0.04 0.89 0.66

TABLE VII: The global attack results obtained by EPA under various attenuation factor c.

Network c=3 c=4 c=5 c=6

β NMI ARI β NMI ARI β NMI ARI β NMI ARI

Football 12.8 0.76 0.53 10.6 0.77 0.54 8.6 0.77 0.55 7.8 0.78 0.57Email 566.6 0.48 0.22 506.4 0.49 0.23 134.5 0.51 0.25 113.7 0.53 0.26

Pol.Blogs 1947.1 0.33 0.35 1690.9 0.34 0.35 1172.8 0.37 0.36 1125.5 0.38 0.39

11

Fig. 5: The attack effects of the five attack strategies on six community detection algorithms and Pol.Blogs network,in term of NMI, for various numbers of attacks.

(a) Before attack (b) After attack

Fig. 6: The t-SNE visualization of community deceptionobtained by EPA on Dolphin network. The nodes ofsame color belongs to the same community detected byGRE, and the target community is marked by blue. (a)Before attack; (b) After attack.

quite robust. For instance, for the Football network, itseems that Ds and Dw lose their attack effects on INFand LOU, i.e., the values of Fitness and H are quitesmall by comparing with those on GRE, as presentedin TABLE IX. However, by using our EPA, their valuesstill keep relatively large for all the six communitydetection algorithms, indicating that EPA is effective inboth white-box and black-box situations. In addition, wecan also find that EPA is not effective on SPI, whichmay be because SPI is a semi-supervised algorithm withrelatively strong robustness. Moreover, it seems that thesynthetic networks with smaller mixing parameters µalways have lower fitness in the corresponding cases.The reason may be that the networks of smaller µ tendto greater differences between communities, making itmore difficult to hide the target community, i.e., corre-

TABLE VIII: Deception results on synthetic networks.

dataset Alg INF Lou

Fitness H Fitness H

1000-0.3EPA 0.26 0.51 0.34 0.68Ds 0.14 0.42 0.14 0.60Dw 0.04 0.18 0.08 0.55

1000-0.5EPA 0.32 0.52 1.06 0.80Ds 0.12 0.36 0.54 0.61Dw 0.10 0.32 0.52 0.61

3000-0.3EPA 0.16 0.41 0.26 0.72Ds 0.10 0.39 0.10 0.68Dw 0.04 0.23 0.06 0.63

3000-0.5EPA 0.18 0.48 1.08 0.84Ds 0.12 0.40 0.50 0.60Dw 0.10 0.39 0.50 0.60

5000-0.3EPA 0.06 0.27 0.08 0.51Ds 0.04 0.24 0.08 0.48Dw 0.02 0.18 0.02 0.46

5000-0.5EPA 0.14 0.43 0.34 0.82Ds 0.10 0.37 0.12 0.70Dw 0.04 0.26 0.08 0.67

sponding to a smaller fitness value, according to Eq. (23).Compared with Ds algorithm, EPA algorithm needs

complete knowledge about network structure, but theeffect of community deception is much better than Dsalgorithm. In addition, because the fitness of EPA fortarget community attack only considers the change ofthe target community, it can attack the target commu-nity as long as the community detection algorithm canaccurately divide the target community before attack. Inthe other word, EPA can still attack the target community

12

TABLE IX: Deception results on real-world networks.

Network Alg GRE INF Lou WAL EIG SPI

Fitness H Fitness H Fitness H Fitness H Fitness H Fitness H

FootballEPA 0.62 0.58 0.48 0.41 0.54 0.46 0.33 0.41 0.66 0.56 0.29 0.34Ds 0.26 0.33 0.04 0.08 0.10 0.16 0.18 0.32 0.38 0.40 0.15 0.27Dw 0.16 0.22 0.00 0.00 0.06 0.09 0.05 0.08 0.30 0.34 0.05 0.09

EmailEPA 0.68 0.57 0.54 0.45 0.78 0.66 0.48 0.44 0.68 0.55 0.34 0.49Ds 0.40 0.52 0.06 0.13 0.46 0.54 0.28 0.34 0.34 0.49 0.28 0.41Dw 0.32 0.44 0.02 0.06 0.36 0.47 0.13 0.25 0.30 0.41 0.27 0.43

Pol.BlogsEPA 0.38 0.47 0.76 0.58 0.54 0.53 0.56 0.39 0.93 0.42 0.42 0.44Ds 0.26 0.43 0.18 0.52 0.32 0.42 0.51 0.32 0.67 0.33 0.32 0.38Dw 0.22 0.40 0.10 0.33 0.30 0.38 0.49 0.24 0.43 0.29 0.33 0.38

Target node

Target node

(a) Before attack (b) After attack

Fig. 7: The t-SNE visualization of target node attackobtained by EPA on Dolphin network. The target nodeoriginally belongs to the blue community. After theattack, it belongs to the red community. The communitiesare detected by GRE. (a) Before attack; (b) After attack.

even if the knowledge of other communities is missingpartly.

3) Target Node Attack: Similarly, for target node attack,we also visualize the attack effect of our EPA on theDolphin network using t-SNE algorithm, as shown inFig. 7. In addition to the three real-wrold datasets, wealso do target node attack on the 9/11 terrorist net-works [46], in which terrorists may try to hide them-selves in terrorist community. The 9/11 terrorist networkconsists of 37 nodes and 85 links. Here, we focus onattacking hub and bridge nodes in a network sincethey are always more important in various applications.In particular, we first rank the nodes in each networkbased on their degree and betweenness, from large tosmall, respectively. Then, in addition to selecting thetwo nodes with the largest degree and betweenness,respectively, as the target nodes, we also sum the twoorders for each node and choose the top one as ouranother target node. We name these three target nodesas T1,T2 and T3, respectively. Finally, we compare EPAwith the Dr algorithm that randomly adds links betweentarget nodes and the nodes in different communities.

The percentages of degree increment in the attack fordifferent strategies are presented in TABLE X-TABLE X.We can find that, in general, EPA performs significantlybetter than Dr, in terms of smaller percentage of degreeincrement, especially in the football network. Moreover,the bridge nodes of large betweenness are relatively

TABLE X: Target node attack on synthetic networks.

dataset Node INF Lou

EPA Dr EPA Dr

1000-0.3T1 67% 336% 53% 406%T2 64% 412% 43% 349%T3 70% 409% 55% 445%

1000-0.5T1 43% 385% 53% 227%T2 46% 320% 22% 58%T3 29% 265% 28% 117%

3000-0.3T1 142% 340% 40% 625%T2 114% 408% 47% 522%T3 194% 493% 54% 596%

3000-0.5T1 242% 302% 16% 138%T2 60% 279% 8% 19%T3 78% 321% 16% 26%

5000-0.3T1 126% 625% 264% 1853%T2 97% 535% 224% 1280%T3 338% 691% 416% 2261%

5000-0.5T1 60% 560% 55% 706%T2 70% 558% 52% 653%T3 128% 672% 83% 815%

easy to be attacked, since the neighbors of bridge nodesare always distributed in different communities, makingthem quite sensitive to link changes. On the contrary, it’srelatively difficult to hide a node with both high degreeand betweenness, since as hub nodes, a much largenumber of links always need to be added to change theircommunities. In addition, we can also find that both EPAand Dr algorithms fail to attack GRE algorithm, whichmeans that the community that target node T2 belongsto can’t be changed even if T2 is connected to a numberof nodes in other communities. It may be due to the factthat, based on GRE algorithm, the community that T2belongs to is not only very densely connected, but alsoall the other nodes except T2 of this community has fewlinks to the other communities. Note that, for real-worldnetworks, based on EPA, the number of added links ismuch smaller than the degree of target nodes, indicatingour method performs well in target node attack.

IV. CONCLUSIONS

In this paper, we propose a novel Evolutionary Per-turbation Attack (EPA) method, based on Genetic Algo-rithm (GA), to disturb community detection algorithms

13

TABLE XI: Target node attack on real-world networks.

Network Node GRE INF LOU WAL EIG SPI

EPA Dr EPA Dr EPA Dr EPA Dr EPA Dr EPA Dr

FootballT1 8.3% 186% 67% 367% 33% 658% 42% 67% 8.3% 17% 67% 375%T2 9.1% 155% 27% 182% 18% 145% 9.1% 45% 9.1% 18% 17% 47%T3 33% 199% 50% 550% 42% 237% 25% 50% 17% 33% 38% 52%

EmailT1 8.4% 15% 10% 35% 0.3% 0.87% 19% 122% 4.3% 18% 1.7% 11%T2 117% 23% 6.0% 29% 0.5% 2.31% 17% 116% 1.4% 2.9% 1.4% 10%T3 123% 32% 23% 61% 5.2% 12% 25% 123% 24% 39% 16% 34%

Pol.BlogsT1 39% 83% 34% 51% 37% 52% 16% 30% 109% 118% 20% 66%T2 38% 75% 16% 33% 17% 27% 38% 47% 7.6% 121% 32% 41%T3 32% 71% 24% 60% 26% 48% 20% 25% 211% 227% 12% 61%

TERT1 6.7% 55% 27% 93% 13% 39% 47% 72% 13% 94% 27% 69%T2 False False 30% 84% 30% 42% 30% 71% 10% 182% 20% 32%T3 14% 41% 50% 121% 14% 44% 29% 54% 14% 38% 21% 54%

in three scales, from local to global, by only changing asmall fraction of links. In particular, we design a non-equal crossover strategy to treat the length of chromo-some as a variable and naturally integrate it into GA; andwe also integrate the network structural information,such as network distance and betweenness, into thealgorithm to guide the search so as to find the betterapproximate optimal attack strategy more quickly. Nu-merical experiments on six synthetic networks and threereal-world validate the effectiveness and transferabilityof our EPA method on attacking various communitydetection algorithms, i.e., by comparing with other attackmethods of different scales, EPA behaves the best inmost cases, achieving the state-of-the-art attack effects.However, our EPA is of relatively high time complex-ity, making it time consuming to deal with large-scalenetworks.

In the future, we will expand this work in the follow-ing three directions. First, we will propose new networkcoding methods and also utilize more network structuralproperties to improve the efficiency of EPA; second, wewill try to integrate network embedding and deep learn-ing graph models to improve the attack performance;third, we will do more experiments on more variousnetworks, and further check the effectiveness of ourEPA on the downstream algorithms based on communitydetection, to see the potential influence of EPA on manyreal applications.

ACKNOWLEDGMENT

The authors would like to thank all the membersin our IVSN research group in Zhejiang University ofTechnology for the valuable discussion about the ideasand technical details presented in this paper.

REFERENCES

[1] W. Chen and S.-H. Teng, “Interplay between social influence andnetwork centrality: A comparative study on shapley centralityand single-node-influence centrality,” in Proceedings of the 26thInternational Conference on World Wide Web. International WorldWide Web Conferences Steering Committee, 2017, pp. 967–976.

[2] C. Ding, F. Xia, G. Gopalakrishnan, W. Qian, and A. Zhou, “Team-gen: An interactive team formation system based on professionalsocial network,” in International Conference on World Wide WebCompanion, 2017, pp. 195–199.

[3] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, andQ. Xuan, “Link weight prediction using supervised learningmethods and its application to yelp layered network,” IEEETransactions on Knowledge and Data Engineering, 2018.

[4] Q. Xuan, Z.-Y. Zhang, C. Fu, H.-X. Hu, and V. Filkov, “Socialsynchrony on complex networks,” IEEE transactions on cybernetics,vol. 48, no. 5, pp. 1420–1431, 2018.

[5] K. An, Y.-C. Chiu, X. Hu, and X. Chen, “A network partitioning al-gorithmic approach for macroscopic fundamental diagram-basedhierarchical traffic network management,” IEEE Transactions onIntelligent Transportation Systems, vol. 19, no. 4, pp. 1130–1139,2018.

[6] J. Bao, T. He, S. Ruan, Y. Li, and Y. Zheng, “Planning bike lanesbased on sharing-bikes’ trajectories,” in Proceedings of the 23rdACM SIGKDD international conference on knowledge discovery anddata mining. ACM, 2017, pp. 1377–1386.

[7] Q. Xuan, F. Du, Y. Li, and T.-J. Wu, “A framework to model thetopological structure of supply networks,” IEEE Transactions onAutomation Science and Engineering, vol. 8, no. 2, pp. 442–446, 2011.

[8] S. He, L. Huang, J. Shen, G. Gao, G. Wang, X. Chen, and L. Zhu,“Time synchronization network for east poloidal field powersupply control system based on ieee 1588,” IEEE Transactions onPlasma Science, 2018.

[9] F. D. Malliaros and M. Vazirgiannis, “Clustering and communitydetection in directed networks: A survey,” Physics Reports, vol.533, no. 4, pp. 95–142, 2013.

[10] M. R. Kamdar and M. A. Musen, “Phlegra: Graph analytics inpharmacology over the web of life sciences linked open data,” inProceedings of the 26th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2017, pp. 321–329.

[11] M. E. Newman and M. Girvan, “Finding and evaluating commu-nity structure in networks,” Physical review E, vol. 69, no. 2, p.026113, 2004.

[12] M. Newman, “Community detection in networks: Modularityoptimization and maximum likelihood are equivalent,” arXivpreprint arXiv:1606.02319, 2016.

[13] M. E. Newman, “Fast algorithm for detecting community struc-ture in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.

[14] M. Girvan and M. E. Newman, “Community structure in socialand biological networks,” Proceedings of the national academy ofsciences, vol. 99, no. 12, pp. 7821–7826, 2002.

[15] A. Clauset, “Finding local community structure in networks,”Physical review E, vol. 72, no. 2, p. 026132, 2005.

[16] P. Schuetz and A. Caflisch, “Multistep greedy algorithm identi-fies community structure in real-world and computer-generatednetworks,” Physical Review E, vol. 78, no. 2, p. 026112, 2008.

[17] R. Guimera and L. A. N. Amaral, “Functional cartography ofcomplex metabolic networks,” nature, vol. 433, no. 7028, p. 895,2005.

14

[18] D. Zhou and X. Wang, “A neighborhood-impact based commu-nity detection algorithm via discrete pso,” Mathematical Problemsin Engineering, vol. 4, pp. 1–15, 2016.

[19] F. Folino and C. Pizzuti, “An evolutionary multiobjective ap-proach for community discovery in dynamic networks,” IEEETransactions on Knowledge and Data Engineering, vol. 26, no. 8, pp.1838–1852, 2014.

[20] S. Fortunato, “Community detection in graphs,” Physics reports,vol. 486, no. 3-5, pp. 75–174, 2010.

[21] M. Rosvall and C. T. Bergstrom, “Maps of random walks oncomplex networks reveal community structure,” Proceedings of theNational Academy of Sciences, vol. 105, no. 4, pp. 1118–1123, 2008.

[22] P. Pons and M. Latapy, “Computing communities in large net-works using random walks,” in International symposium on com-puter and information sciences. Springer, 2005, pp. 284–293.

[23] M. Alamgir and U. Von Luxburg, “Multi-agent random walks forlocal clustering on graphs,” in Data Mining (ICDM), 2010 IEEE10th International Conference on. IEEE, 2010, pp. 18–27.

[24] X. Fu, C. Wang, Z. Wang, and Z. Ming, “Threshold randomwalkers for community structure detection in complex networks,”Journal of Software, vol. 8, no. 2, pp. 286–295, 2013.

[25] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time al-gorithm to detect community structures in large-scale networks,”Physical review E, vol. 76, no. 3, p. 036106, 2007.

[26] M. J. Barber and J. W. Clark, “Detecting network communities bypropagating labels under constraints,” Physical Review E, vol. 80,no. 2, p. 026129, 2009.

[27] X. Liu and T. Murata, “Advanced modularity-specialized labelpropagation algorithm for detecting communities in networks,”Physica A: Statistical Mechanics and its Applications, vol. 389, no. 7,pp. 1493–1500, 2010.

[28] S. Nagaraja, “The impact of unlinkability on adversarial com-munity detection: effects and countermeasures,” in InternationalSymposium on Privacy Enhancing Technologies Symposium. Springer,2010, pp. 253–272.

[29] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan,“Hiding individuals and communities in a social network,” Na-ture Human Behaviour, vol. 2, no. 2, p. 139, 2018.

[30] V. Fionda and G. Pirro, “Community deception or: How to stopfearing community detection algorithms,” IEEE Transactions onKnowledge and Data Engineering, vol. 30, no. 4, pp. 660–673, 2018.

[31] J. Chen, L. Chen, Y. Chen, M. Zhao, S. Yu, Q. Xuan, and X. Yang,“Ga-based q-attack on community detection,” IEEE Transactionson Computational Social Systems, vol. 6, no. 3, pp. 491–503, 2019.

[32] J. Li, H. Zhang, Z. Han, Y. Rong, H. Cheng, and J. Huang, “Ad-versarial attack on community detection by hiding individuals,”in Proceedings of The Web Conference 2020, 2020, pp. 917–927.

[33] D. Zugner, A. Akbarnejad, and S. Gunnemann, “Adversarialattacks on neural networks for graph data,” in Proceedings of the24th ACM SIGKDD International Conference on Knowledge Discovery& Data Mining. ACM, 2018, pp. 2847–2856.

[34] A. Bojchevski and S. Gunnemann, “Adversarial attacks on nodeembeddings via graph poisoning,” in International Conference onMachine Learning, 2019, pp. 695–704.

[35] J. Chen, Y. Wu, X. Xu, Y. Chen, H. Zheng, and Q. Xuan,“Fast gradient attack on network embedding,” arXiv preprintarXiv:1809.02797, 2018.

[36] S. Yu, M. Zhao, C. Fu, J. Zheng, H. Huang, X. Shu, Q. Xuan, andG. Chen, “Target defense against link-prediction-based attacks viaevolutionary perturbations,” IEEE Transactions on Knowledge andData Engineering, 2019.

[37] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre,“Fast unfolding of communities in large networks,” Journal ofstatistical mechanics: theory and experiment, vol. 2008, no. 10, p.P10008, 2008.

[38] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmarkgraphs for testing community detection algorithms,” Physicalreview E, vol. 78, no. 4, p. 046110, 2008.

[39] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution:Densification and shrinking diameters,” ACM Transactions onKnowledge Discovery from Data (TKDD), vol. 1, no. 1, p. 2, 2007.

[40] L. A. Adamic and N. Glance, “The political blogosphere and the2004 us election: divided they blog,” in Proceedings of the 3rdinternational workshop on Link discovery. ACM, 2005, pp. 36–43.

[41] M. E. Newman, “Finding community structure in networks usingthe eigenvectors of matrices,” Physical review E, vol. 74, no. 3, p.036104, 2006.

[42] J. Reichardt and S. Bornholdt, “Statistical mechanics of commu-nity detection,” Physical review E, vol. 74, no. 1, p. 016110, 2006.

[43] X. Liu, H.-M. Cheng, and Z.-Y. Zhang, “Evaluation of communitydetection methods,” IEEE Transactions on Knowledge and DataEngineering, 2019.

[44] A. Ghorbanian and B. Shaqaqi, “A genetic algorithm for modu-larity density optimization in community detection,” InternationalJournal of Economy, Management and Social Sciences, vol. 4, no. 1,pp. 117–122, 2015.

[45] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,”Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605,2008.

[46] V. E. Krebs, “Mapping networks of terrorist cells,” Connections,vol. 24, no. 3, pp. 43–52, 2002.

Multiscale Evolutionary Perturbation Attack on Community ...

Documents

Transcript of Multiscale Evolutionary Perturbation Attack on Community ...