FirmTruss Community Search in Multilayer Networks - arXiv
-
Upload
khangminh22 -
Category
Documents
-
view
2 -
download
0
Transcript of FirmTruss Community Search in Multilayer Networks - arXiv
FirmTruss Community Search in Multilayer Networks
Ali Behrouzโ University of British Columbia
Farnoosh Hashemiโ University of British Columbia
Laks V.S. LakshmananUniversity of British Columbia
ABSTRACTIn applications such as biological, social, and transportation net-works, interactions between objects span multiple aspects. Foraccurately modeling such applications, multilayer networks havebeen proposed. Community search allows for personalized com-munity discovery and has a wide range of applications in largereal-world networks. While community search has been widelyexplored for single-layer graphs, the problem for multilayer graphshas just recently attracted attention. Existing community models inmultilayer graphs have several limitations, including disconnectiv-ity, free-rider effect, resolution limits, and inefficiency. To addressthese limitations, we study the problem of community search overlarge multilayer graphs. We first introduce FirmTruss, a novel densestructure in multilayer networks, which extends the notion of trussto multilayer graphs. We show that FirmTrusses possess nice struc-tural and computational properties and bring many advantagescompared to the existing models. Building on this, we present a newcommunity model based on FirmTruss, called FTCS, and show thatfinding an FTCS community is NP-hard. We propose two efficient2-approximation algorithms, and show that no polynomial-timealgorithm can have a better approximation guarantee unless P = NP.We propose an index-basedmethod to further improve the efficiencyof the algorithms. We then consider attributed multilayer networksand propose a new community model based on network homophily.We show that community search in attributed multilayer graphsis NP-hard and present an effective and efficient approximationalgorithm. Experimental studies on real-world graphs with ground-truth communities validate the quality of the solutions we obtainand the efficiency of the proposed algorithms.PVLDB Reference Format:Ali Behrouz, Farnoosh Hashemi, and Laks V.S. Lakshmanan. FirmTrussCommunity Search in Multilayer Networks. PVLDB, 16(1): XXX-XXX, 2023.doi:XX.XX/XXX.XX
1 INTRODUCTIONCommunity detection is a fundamental problem in network science,and has been traditionally addressed with the aim of determiningan organization of a given network into subgraphs that expressdense groups of nodes well connected to each other [25]. Recently, aquery-dependent community discovery problem, called communitysearch (CS) [66], has attracted much attention due to its ability toโ These authors contributed equally.
This work is licensed under the Creative Commons BY-NC-ND 4.0 InternationalLicense. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy ofthis license. For any use beyond those covered by this license, obtain permission byemailing [email protected]. Copyright is held by the owner/author(s). Publication rightslicensed to the VLDB Endowment.Proceedings of the VLDB Endowment, Vol. 16, No. 1 ISSN 2150-8097.doi:XX.XX/XXX.XX
discover personalized communities. It has several applications likesocial contagion modeling [71], content recommendation [10], andteam formation [27]. The CS problem seeks a cohesive subgraphcontaining the query vertices given a graph and a set of query nodes.
Significant research effort has been devoted to the study of com-munity search over large graphs, where the major focus has beenon networks with a single type of connection, also known as single-layer networks. However, in applications featuring complex net-works such as social, biological, and transportation networks, theinteractions between objects tend to span multiple aspects. E.g.,interactions between people can be social or professional, and pro-fessional interactions can in turn differ according to topics. Mul-tilayer (ML) networks [50], where nodes can have interactions inmultiple layers, have been proposed for accurately modeling suchapplications. Recently, ML networks have gained popularity inan array of applications in social and biological networks and inopinion dynamics [13, 58, 61, 65], due to their more informativerepresentation than single-layer graphs.Example 1. Figure 1(a) is an ML network showing a group of re-searchers collaborating in various topics, where each layer representscollaborations in an individual topic.
To find cohesive communities in single-layer graphs, many mod-els have been proposed, e.g.,๐-core [66, 67],๐-truss [43],๐-plex [74],and ๐-clique [14]. However, existing methods for finding cohesivestructures in ML networks are inefficient. As a result, there is a lackof practical density-based community models in ML graphs. Indeed,there have been a number of studies on cohesive structures in MLnetworks [28, 38, 56, 84]. However, existing works suffer from twomain limitations. (1) The decomposition algorithms [28, 38, 56]based on these models have an exponential running time complexityin the number of layers, making them prohibitive for finding co-hesive structures needed for community search. (2) These modelshave a hard constraint that nodes/edges need to satisfy in all layers.It has been noted that ML networks may contain noisy/insignificantlayers [28, 35]. These noisy/insignificant layers may be different foreach node/edge. Therefore, this hard constraint could result in miss-ing some dense structures [35]. Recently, FirmCore structure [35]in ML graphs has been proposed to address these limitations. How-ever, a connected FirmCore can be disconnected by just removingone edge, and it might have an arbitrarily large diameter. Both ofthese properties are undesirable for community models.
In addition to the above drawbacks of cohesive structures in MLnetworks, existing community search methods in ML graphs (e.g.,[29, 44, 59]) suffer from some important limitations. (1) Free-ridereffect [76]: some cohesive structure, irrelevant to the query vertices,could be included in the answer community. (2) Lack of connec-tivity: a community, at a minimum, is expected to be a connectedsubgraph [43, 77], but existing community models in ML graphsare not guaranteed to be connected. Natural attempts to enforce
arX
iv:2
205.
0074
2v1
[cs
.SI]
2 M
ay 2
022
v8
v2 v3 v4
v6v8
v5v1
v7
v2 v3v4
v8v6
v5v1
v7v2
v1
v7
v4v5v1
v2 v3 v4
v6v8
v5v1
v7
v3v4
v8v6
v5v1
v7
v9
v9
v10v11
v12v13
v10
v10
v11
v11
v12
v12
v13
v13
Data Mining
Algorithms
Machine Learning
v9
v6
v2
v2
v3
(a) Multilayer network๐บ
v12 v9 v9 v5 v7 v7 v2
Path schema:v12 v9 v5 v7
v1
v2 v1
Diameter:
(b) Diameter (its path schema) of๐บ
Figure 1: An example of an ML collaboration network.
connectivity in existing ML community models lead to additionalcomplications (see ยง 5.2 for a detailed comparison with previouscommunity models). (3) Resolution Limit [26]: in a large network,communities smaller than a certain size may not be detected. (4)Failure to scale: to be applicable to large networks, a communitymodel must admit scalable algorithms. To the best of our knowledge,all existing models suffer from these main limitations.
To address the above limitations of existing studies, we study theproblem of community search over multilayer networks. First of all,we propose the notion of (๐, _)-FirmTruss, based on the truss struc-ture in simple graphs, as a subgraph (not necessarily induced) inwhich every two adjacent nodes in at least _ individual layers are inat least ๐โ2 common triangles within the subgraph.We show that itinherits the nice properties of trusses in simple graphs, viz., unique-ness, hierarchical structure, bounded diameter, edge-connectivity,and high density. Based on FirmTruss, we formally define our prob-lem of FirmTruss Community Search (FTCS). Specifically, given aset of query nodes, FTCS aims to find a connected subgraph which(1) contains the query nodes; (2) is a FirmTruss; and (3) has theminimum diameter. We formally show that the diameter constraintin FTCS definition avoids the so-called "free-rider effect".
In real-world networks, nodes are often associatedwith attributes.For example, node attributes could represent a summary of theuserโs profile in social networks, or the molecular functions, orcellular components of a protein in protein-protein interactionnetworks. This rich information can help us find communities ofsuperior quality. While there are several studies on single-layerattributed graphs, to the best of our knowledge, the problem ofcommunity search in multilayer attributed networks has not beenstudied. Unfortunately, even existing methods in single-layer at-tributed graphs suffer from significant limitations. Existing modelsrequire users to input query attributes; however, users may not befamiliar with the attribute distribution in the entire network, limit-ing their ability to specify proper query attributes. Moreover, thesestudies only focus on one particular type of attribute (e.g., keyword),while most real-world graphs involve more complex attributes. E.g.,attributes of proteins can be a multidimensional vectors [36]. Therecently proposed VAC model [57] for single-layer graphs does notrequire users to input query attributes. However, it is limited tometric similarity measures. To mitigate these limitations, we ex-tend our FTCS model to attributed multilayer graphs, call it AFTCS,and present a novel community model leveraging the well-known
phenomenon of network homophily. This approach is based onmaximizing the ๐-mean of similarities between users within thecommunity and does not require users to input any query attribute.At the same time, should a user wish to specify query attributes(say for exploration), AFTCS can easily support them. Moreover, itnaturally handles a vector of attributes, handling complex features.
Since multilayer graphs provide more complex and richer infor-mation than single-layer graphs, they can benefit typical applica-tions of single-layer community search [21] (e.g., event organiza-tion, friend recommendation, advertisement in e-commence, etc.),delivering better solutions. On the other hand, below we illustratean exclusive application for multilayer community search.
Brain Networks. Detecting and monitoring functional systems inthe human brain is an important and fundamental task in neuro-science [6, 62]. A brain network is a graph in which nodes repre-sent the brain regions, and edges represent co-activation betweenregions. A brain network generated from an individual subjectcan be noisy and incomplete, however using brain networks frommany subjects helps us identify important structures more accu-rately [52, 59]. A multilayer brain network is a multilayer graph inwhich each layer represents the brain network of a different person.A community search method in multilayer graphs can be used to (1)identify functional systems of each brain region; (2) identify com-mon patterns between peopleโs brains affected by diseases or underthe influence of drugs; (3) select features in order to discriminatediseased patients from healthy individuals.
In summary, we make the following contributions: (1) We intro-duce a novel dense subgraph model for ML graphs, FirmTruss, andshow that it retains the nice structural properties of Trusses (ยง 4).(2)We then formulate the problem of FirmTruss-based CommunitySearch (FTCS) in ML graphs, and show the FTCS problem is NP-hard and cannot be approximated in PTIME within a factor betterthan 2 of the optimal diameter, unless P = NP (ยง 5). (3) We developtwo efficient 2-approximation algorithms (ยง 6), and propose an in-dexing method to further improve efficiency (ยง 7). (4)We extend theFirmTruss-based community to attributed networks and propose anovel homophily-based community model. We propose an exactalgorithm for a special case of the problem and an approximationalgorithm for the general case (ยง 8). (5)We conduct extensive exper-iments on real-world ML graphs with ground-truth communities.The results show that our algorithms can efficiently and effectivelydiscover communities, significantly outperforming baselines (ยง 9).For lack of space, some proofs are sketched. Complete details of allproofs and additional details can be found in Appendix.
2 RELATEDWORK
Community Search.Community search, which aims to find query-dependent communities in a graph, was introduced by Sozio andGionis [67]. Since then, various community models have been pre-sented, which can be categorized based on different dense sub-graphs [21], including ๐-core [66, 67], ๐-truss [2, 41, 43], quasi-clique [14], ๐-plex [74], and densest subgraph [79]. Wu et al. [76]discussed an undesirable phenomenon, called free-rider effect, andpropose query biased density to reduce the free-rider effect for thereturned community. More recently, community search has alsobeen investigated formore complex graphs, such as directed [22, 23],
2
weighted [82], geo-social [34, 83], temporal [55], multi-valued [54],and labeled [18] graphs. Recently, graph neural network-based com-munity search is introduced [30, 46], which needs a time-consumingstep for training. All these models are different from our communi-ties as they focus on a single type of interactions.
Attributed Community Search. Given a set of query nodes, at-tributed community search finds the query-dependent communitiesin which nodes share attributes [19, 42]. Based on query type, mostexisting works on attributed single-layer graphs can be classifiedinto two categories. The first category takes both nodes and at-tributes as query input [12, 20]. The second category takes onlyattributes as query input, and returns the community related to thequery attributes [11, 85]. All these existing studies (1) require usersto specify attributes as query input, and (2) consider only simpleattributes (e.g., keywords), which limits their applications. Most re-cently, Liu et al. [57] introduced VAC in single-layer graphs, whichdoes not require users to input query attributes. However, theyassume that the similarity space of users is metric, which can limitapplications. All these models are limited to single-layer graphs.
Community Search and Detection in Multilayer Networks.Surprisingly, the problem of community search in ML networksis relatively less explored. Interdonato et al. [44], design a greedystrategy to maximize the ratio of similarity between nodes insideand outside of the local community over all layers. Galimberti et al.[29] propose a communitymodel based on theML k-core [5]. Finally,recently, Luo et al. [59] design a random walk strategy to find localcommunities in multi-domain networks. Also, some methods havebeen proposed for community detection inML networks [39, 40, 70].However, they focus on detecting all communities, which are time-consuming and are independent of given query nodes.
Dense Structures inMLGraphs. Jethava et al. [45] formulate thedensest common subgraph problem. Azimi et al. [5] propose a newdefinition of core, k-core, over ML graphs. Galimberti et al. [28] pro-pose algorithms to find all possible k-cores, and define the densestsubgraph problem in ML graphs. Zhu et al. [84] introduce the prob-lem of diversified coherent ๐-core search. Liu et al. [56] propose theCoreCube problem for computing ML ๐-core decomposition on allsubsets of layers. Hashemi et al. [35] propose a new dense structure,FirmCore, and develop a FirmCore-based approximation algorithmfor the problem of ML densest subgraph. Recently, Huang et al. [38]define TrussCube problem in ML graphs, which aims to find a sub-graph in which each edge has support โฅ ๐ โ 2 in all selected layers,which is different from the concept of FirmTruss. ML graphs can berelated to heterogeneous graphs, where communities [24, 68, 78] aredefined based on meta patterns. However, these models emphasizeheterogeneous types of entities connected by different relationships,which is different from the concept of ML graphs.
3 PRELIMINARIESWe let๐บ = (๐ , ๐ธ, ๐ฟ) denote anML graph, where๐ is the set of nodes,๐ฟ is the set of layers, and ๐ธ โ ๐ ร๐ ร๐ฟ is the set of intra-layer edges.In this paper, we follow the common definition of ML networks [50],and consider inter-layer edges between two instances of identicalvertices in successive layers. The set of neighbors of node ๐ฃ โ ๐in layer โ โ ๐ฟ is denoted ๐โ (๐ฃ) and the degree of ๐ฃ in layer โ is
degโ (๐ฃ) = |๐โ (๐ฃ) |. For a set of nodes ๐ป โ ๐ , ๐บ [๐ป ] = (๐ป, ๐ธ [๐ป ], ๐ฟ)represents the subgraph of ๐บ induced by ๐ป , ๐บโ [๐ป ] = (๐ป, ๐ธโ [๐ป ])denotes this subgraph in layer โ , and deg๐ปโ (๐ฃ) denotes the degreeof ๐ฃ in this subgraph. We abuse notation and denote ๐บโ [๐ ] and๐ธโ [๐ ] as ๐บโ and ๐ธโ , respectively. We next define some backgroundnotions used in this paper.
Edge Schema. Connections (i.e., relationships) between objectsin ML networks can have multiple types; by the edge schema of aconnection, we mean the connection ignoring its type.
Definition 1 (Edge Schema). Given anML network๐บ = (๐ , ๐ธ, ๐ฟ)and an intra-layer edge ๐ = (๐ฃ,๐ข, โ) โ ๐ธ, the edge schema of ๐ is thepair ๐ = (๐ฃ,๐ข), which represents the relationship between two nodes,๐ฃ and ๐ข, without considering its type. We denote by E the set of alledge schemas in ๐บ , E = {(๐ฃ,๐ข) |โโ โ ๐ฟ : (๐ฃ,๐ข, โ) โ ๐ธ}.
Given an edge schema ๐ = (๐ฃ,๐ข), we abuse the notation anduse ๐โ to refer to the relationship between ๐ฃ and ๐ข in layer โ , i.e.,๐โ = (๐ฃ,๐ข, โ), whenever (๐ฃ,๐ข, โ) โ ๐ธ.Distance in ML Networks. For consistency, we use the commondefinition of ML distance [3] in the literature. However, our algo-rithms are valid for any definition of distance that is metric.
Definition 2 (Path inMultilayerNetworks). Let๐บ = (๐ , ๐ธ, ๐ฟ)be an ML graph and ๐ฃโ represent a node ๐ฃ in layer โ โ ๐ฟ. A path in๐บ is a sequence of nodes P : ๐ฃ1
โ1โ ๐ฃ2
โ2โ ยท ยท ยท โ ๐ฃ๐
โ๐such that every
consecutive pair of nodes is connected by an intra-layer or inter-layeredge, i.e., ๐ฃ๐ = ๐ฃ๐+1 or [โ๐ = โ๐+1 & (๐ฃ๐ , ๐ฃ๐+1, โ๐ ) โ ๐ธ]. The path schema๐ of P is obtained by removing inter-layer edges from path P.
Wedefine the distance of two nodes ๐ฃ and๐ข, denoted by๐๐๐ ๐ก (๐ฃ,๐ข),as the length of the shortest path between them. The diameter ofa subgraph ๐บ [๐ป ], ๐๐๐๐(๐บ [๐ป ]), is the maximum distance betweenany pair of nodes within the ๐บ [๐ป ].1
Example 2. In Figure 1(a), the diameter of ML graph ๐บ is 7, whichcorresponds to the path (path schema) shown in Figure 1(b).
Density in ML Networks. In this study, we use a common defi-nition of density in multilayer graphs proposed in [29].
Definition 3 (Density). [29] Given an ML graph ๐บ = (๐ , ๐ธ, ๐ฟ),a non-negative real number ๐ฝ , the density function is a real-valuedfunction ๐๐ฝ : 2๐ โ R+, defined as:
๐๐ฝ (๐) = max๏ฟฝฬ๏ฟฝโ๐ฟ
minโโ๏ฟฝฬ๏ฟฝ
|๐ธโ [๐] ||๐ | |๏ฟฝฬ๏ฟฝ |
๐ฝ .
Free-Rider Effect. Prior work has identified an undesirable phe-nomenon known as the "free-rider effect" [76]. Intuitively, if a com-munity definition admits irrelevant subgraphs in the discoveredcommunity, we refer to the irrelevant subgraphs as free riders. Typ-ically, a community definition is based on a goodness metric ๐ (๐)for a subgraph ๐ : subgraphs with the highest (lowest) ๐ (.) valueare identified as communities.
Definition 4 (Free-Rider Effect). Given an ML graph ๐บ =
(๐ , ๐ธ, ๐ฟ), a non-empty set of query vertices ๐ , let ๐ป be a solution to1For convenience, we refer to both the longest shortest path distance as well as anypath with that length as diameter.
3
a community definition that maximizes (resp. minimizes) goodnessmetric ๐ (.), and ๐ปโ be a (global or local) optimum solution when ourquery set is empty. If ๐ (๐ปโ โช๐ป ) โฅ ๐ (๐ป ) (resp. ๐ (๐ปโ โช๐ป ) โค ๐ (๐ป )),we say that the community definition suffers from free rider effect.
GeneralizedMeans.Given a finite set of positive real numbers ๐ =
{๐1, ๐2, . . . , ๐๐}, and a parameter ๐ โ Rโช{โโ, +โ}, the generalizedmean (๐-mean) of ๐ is defined as
๐๐ (๐) =ยฉยญยซ 1|๐ |
|๐ |โ๏ธ๐=1(๐๐ )๐
ยชยฎยฌ1/๐
.
For ๐ โ {โโ, 0, +โ}, the mean can be defined by taking limits, sothat๐+โ (๐) = max๐๐ ,๐0 (๐)= (
โ |๐ |๐=1 ๐๐ )
1/ |๐ | , and๐โโ (๐)=min๐๐ .
4 FIRMTRUSS STRUCTUREIn this section, we first recall the notion of ๐-truss in single-layernetworks and then present FirmTruss structure in ML networks.Definition 5 (Support). Given a single-layer graph ๐บ = (๐ , ๐ธ),the support of an edge ๐ = (๐ข, ๐ฃ) โ ๐ธ, denoted ๐ ๐ข๐ (๐,๐บ), is definedas |{โณ๐ข,๐ฃ,๐ค : ๐ข, ๐ฃ,๐ค โ ๐ }|, where โณ๐ข,๐ฃ,๐ค , called triangle of ๐ข, ๐ฃ, and๐ค , is a cycle of length three containing nodes ๐ข, ๐ฃ, and๐ค .
The ๐-truss of a single-layer graph ๐บ is the maximal subgraph๐ป โ ๐บ , such thatโ๐ โ ๐ป , ๐ ๐ข๐ (๐, ๐ป ) โฅ (๐โ2). Since each layer of anML network can be counted as a single-layer network, one possibleextension of truss structure is to consider different truss numbersfor each layer, separately. However, this approach forces all edgesto satisfy a constraint in all layers, including noisy/insignificantlayers. This hard constraint would result in missing some densestructures [35]. Next, we suggest FirmTruss, a new family of cohe-sive structures based on the ๐-truss of single-layer networks.Definition 6 (FirmTruss). Given an ML graph ๐บ = (๐ , ๐ธ, ๐ฟ),its edge schema set E, an integer threshold 1 โค _ โค |๐ฟ |, and aninteger ๐ โฅ 2, the (๐, _)-FirmTruss of ๐บ ((๐, _)-FT for short) is amaximal subgraph ๐บ [๐ฝ_
๐] = (๐ฝ_
๐, ๐ธ [๐ฝ_
๐], ๐ฟ) such that for each edge
schema ๐ โ E[๐ฝ_๐] there are at least _ layers {โ1, ..., โ_} โ ๐ฟ such
that ๐โ๐ โ ๐ธ [๐ฝ_๐ ] and sup(๐โ๐ ,๐บโ๐ [๐ฝ_๐ ]) โฅ (๐ โ 2).Example 3. In Figure 1(a), let ๐ = 4, _ = 2, the union of blue andpurple nodes is a (4, 2)-FirmTruss, as every two adjacent nodes in atleast 2 layers are in at least 2 common triangles within the subgraph.
For each edge schema ๐ = (๐ข, ๐ฃ) โ E, we consider an |๐ฟ |-dimensional support vector, denoted S๐ , in which ๐-th element,S๐๐ , denotes the support of the corresponding edge of ๐ in ๐-thlayer. We define the Top-_ support of ๐ as the _-th largest valuein ๐ โs support vector. Next, we show that not only is the maximal(๐, _)-FirmTruss unique, it also has the nested property.Property 1 (Uniqeness). The (๐, _)-FirmTruss of ๐บ is unique.
Property 2 (Hierarchical Structure). Given a threshold _ โ N+,and an integer ๐ โฅ 0, the (๐ + 1, _)-FT and (๐, _ + 1)-FT of ๐บ aresubgraphs of its (๐, _)-FT.Property 3 (Minimum Degree). Let ๐บ = (๐ , ๐ธ, ๐ฟ) be an MLgraph, and ๐ป = ๐บ [๐ฝ_
๐] be its (๐, _)-FT. Then โ node ๐ข โ ๐ฝ_
๐, there are
at least _ layers {โ1, ..., โ_} โ ๐ฟ such that deg๐ปโ๐ (๐ข) โฅ ๐ โ1, 1 โค ๐ โค _.
In ML networks, the degree of a node ๐ฃ is an |๐ฟ |-dimensionalvector whose ๐-th element is the degree of node ๐ฃ in ๐-th layer. LetTop-_ degree of ๐ฃ be the _-th largest value in the degree vector of ๐ฃ .By Property 3, each node in a (๐, _)-FirmTruss has a Top-_ degreeof at least ๐ โ 1. That means, each (๐, _)-FirmTruss is a (๐ โ 1, _)-FirmCore [35]. Like trusses, FirmTrusses may be disconnected, andwe refer to its connected components as connected FirmTrusses.
Trusses are known to be dense, cohesive, and stable structures.These important characteristics of trusses make them popular formodeling communities [43]. Next, we discuss the density, closeness,and edge connectivity of the FirmTrusses. Detailed proofs of theresults and tightness examples can be found in Appendix A.2.
Theorem 1 (Density Lower Bound). Given an ML graph ๐บ =
(๐ , ๐ธ, ๐ฟ), the density of a (๐, _)-FirmTruss, ๐บ [๐ฝ_๐] โ ๐บ , satisfies:
๐๐ฝ (๐ฝ_๐ ) โฅ(๐ โ 1)2|๐ฟ | max
b โZ,0โคb<_(_ โ b) (b + 1)๐ฝ .
Example 4. Let ๐ฝ = 0, in Figure 1(a), the entire graph is a (4,1)-FirmTruss with the density of 25
13 ร 1 โ 1.92. Theorem 1 provides alower bound of 3
2ร3 ร 1 = 0.5 on their density.
Theorem 2 (Diameter Upper Bound). Given an ML graph ๐บ =
(๐ , ๐ธ, ๐ฟ), the diameter of a connected (๐, _)-FirmTruss, ๐บ [๐ฝ_๐] โ ๐บ ,
is no more than ๐ ร โ 2 | ๐ฝ_๐|โ2
๐โ, where ๐ = 1 + 1
โ |๐ฟ ||๐ฟ |โ_ โ.
Proof Sketch. We show that if P is the diameter of the (๐, _)-FT, and ๐ก
๐ก+1 |๐ฟ | > _ โฅ ๐กโ1๐ก |๐ฟ |, then its path schema,๐, has a length
at least ๐ก๐ก+1 ร|P|. Then we consider every ๐ก consecutive edges in the
diameter as a block and construct a path, with the same path schemaas P such that edges in each block are in the same layer. Next, weuse edge schema supports to bound its length in each block. โก
Example 5. In Figure 1(a), the union of blue and purple nodes is aconnected (4, 2)-FirmTruss with diameter 2. Theorem 2 provides theupper bound of โ 43 ร โ
2ร6โ24 โโ = โ 83 โ = 2 for its diameter.
Theorem 3 (Edge Connectivity). For anML graph๐บ = (๐ , ๐ธ, ๐ฟ),any connected (๐, _)-FirmTruss๐บ [๐ฝ_
๐] โ ๐บ remains connected when-
ever fewer than _ ร (๐ โ 1) intra-layer edges are removed.
5 FIRMTRUSS-BASED COMMUNITY SEARCHIn this section, we propose a community model based on FirmTrussstructure in ML networks. Generally, a community in a network isidentified as a set of nodes that are densely connected. Accordingly,we use the notion of FirmTruss for modeling a densely connectedcommunity in ML graphs, which inherits several desirable struc-tural properties in communities, such as high density (Theorem 1),bounded diameter (Theorem 2), edge connectivity (Theorem 3), andhierarchical structure (Property 2).
Problem 1 (FirmTruss Community Search). Given an ML net-work ๐บ = (๐ , ๐ธ, ๐ฟ), two integers ๐ โฅ 2 and _ โฅ 1, and a set of queryvertices ๐ โ ๐ , the FirmTruss community search (FTCS) is to find aconnected subgraph ๐บ [๐ป ] โ ๐บ satisfying:
(1) ๐ โ ๐ป ,(2) ๐บ [๐ป ] is a connected (๐, _)-FirmTruss,(3) diameter of๐บ [๐ป ] should be theminimum among all subgraphs
satisfying conditions (1) and (2).4
Here, Condition (1) requires that the community contains thequery vertex set ๐ , Condition (2) makes sure that the communityis densely connected through a sufficient number of layers, andCondition (3) requires that each vertex in the community be as closeto other vertices as possible, which excludes irrelevant verticesfrom the community. Together, all three conditions ensure that thereturned community is a cohesive subgraph with good quality.
Example 6. In the graph shown in Figure 1, let ๐ฃ1 be the query node,๐ = 4, and _ = 2. The union of purple and blue nodes is a (4, 2)-FirmTruss, with diameter 2. The FTCS community removes purplenodes to reduce the diameter. Let ๐ฃ6 be the query node, ๐ = 4, and_ = 1, the entire graph is a (4, 1)-FirmTruss, with diameter 7. TheFTCS community removes blue and green nodes to reduce the diameter.
Why FirmTruss Structure? Triangles are fundamental buildingblocks of networks, which show a strong and stable relationshipamong nodes [75]. In ML graphs, every two nodes can have differ-ent types of relations, and a connection can be counted as strongand stable if it is a part of a triangle in each type of interaction.However, forcing all edges to be a part of a triangle in every interac-tion type is too strong a constraint. Indeed, TrussCube [38], whichis a subgraph in which each edge has support ๐ โ 2 in all selectedlayers, is based on this strong constraint. In Figure 1, the greennodes are densely connected. However, while this subgraph is a(4, 1)-FirmTruss, due to the hard constraint of TrussCube, greennodes are a 2-TrussCube, meaning that this model misses it. That is,even if the green subgraph were to be far less dense and have no tri-angles in it, it would still be regarded as 2-TrussCube. Furthermore,in some large networks, there is no non-trivial TrussCube whenthe number of selected layers is more than 3 [38]. In addition tothese limitations, the exponential-time complexity of its algorithmsmakes it impractical for large ML graphs. By contrast, FirmTrusseshave a polynomial-time algorithm, with guaranteed high density,bounded diameter, and edge connectivity.
5.1 Problem AnalysisNext we analyze the hardness of the FTCS problem and show notonly that it is NP-hard, but it cannot be approximated within afactor better than 2. Thereto, we define the decision version of theFTCS, ๐-FTCS, to test whether ๐บ contains a connected FirmTrusscommunity with diameter at most ๐ , that contains ๐ .
Theorem 4 (๐-FTCS Hardness). The ๐-FTCS problem is NP-hard.
Given the NP-hardness of the FTCS problem, we next analyzeits approximability.
Hardness of Approximation. Given ๐ผ โฅ 1 and the optimalsolution to FTCS problem, ๐บ [๐ปโ], an algorithm achieves an ๐ผ-approximation to FTCS problem if it outputs a connected (๐, _)-FirmTruss,๐ป , such that๐ โ ๐ป and๐๐๐๐(๐บ [๐ป ]) โค ๐ผร๐๐๐๐(๐บ [๐ปโ]).Theorem 5 (FTCS Non-Approximability). For any ๐ > 0, theFTCS-problem cannot be approximated in polynomial-time within afactor (2 โ ๐) of the optimal solution, unless ๐ = ๐๐ .
In ยง 6, we provide a 2-approximation algorithm for FTCS, thusessentially matching this lower bound.
Free-rider Effect. In the following, we show that FTCS-problemavoids the problem of the free-rider effect.
Theorem 6 (FTCS Free-Rider Effect). For any multilayer net-work ๐บ = (๐ , ๐ธ, ๐ฟ) and query vertices ๐ โ ๐ , there is a solution๐บ [๐ป ] to the FTCS problem such that for all query-independent opti-mal solutions ๐บ [๐ปโ], either ๐ปโ = ๐ป , or ๐บ [๐ป โช ๐ปโ] is disconnected,or ๐บ [๐ป โช ๐ปโ] has a strictly larger diameter than ๐บ [๐ป ].
5.2 Comparison of CS Models in ML NetworksIn this section, we compare FirmTruss with the existing communitysearch models in ML networks and highlight its advantages.
Cohesiveness. Commonly in the literature, communities are iden-tified as a subgraph for which the nodes are cohesive and denselyconnected. Accordingly, cohesiveness, i.e., high density, is an im-portant metric to measure the quality of found communities. Itis shown that FirmCore can find subgraphs with higher densitythan the ML k-core [35]. Since each (๐, _)-FirmTruss is a (๐ โ 1, _)-FirmCore (Property 3), FirmTruss structure is more cohesive thanML k-core. ML-LCD model [44] maximizes the similarity of nodeswithin the subgrpah. RWM [59] is a random walk-based approachand minimizes the conductance. Both of these models do not con-trol the minimum degree or density of the subgraph. Accordingly,one node may have degree 1 within the subgraph, which means itmight find non-cohesive structures.
Connectivity. A minimal requirement for a community is to be aconnected subgraph. Surprisingly, ML k-core, ML-LCD, and RWM(with multiple query nodes) community search models do not guar-antee connectivity! Natural attempts to enforce connectivity inthese communitymodels lead to additional complications andmightchange the hardness of the problem. Even after enforcing connec-tivity, these models can be disconnected by just removing one intra-layer edge, which is undesirable for community models [37]. OurFirmTruss community model forces the subgraph to be connected,and guarantees that after removing up to _ ร (๐ โ 1) intra-layeredges, the (๐, _)-FirmTruss is still connected (Theorem 3).
EdgeRedundancy. InML networks, the rich information about nodeconnections leads to repetitions, meaning edges between the samepair of nodes repeatedly appear in multiple layers. Nodes withrepeated connections are more likely to belong to the same commu-nity [80]. Also, without such redundancy of connections, the tightconnection between objects inML networksmay not be representedeffectively and accurately. While none of the models ML k-core, ML-LCD, and RWM guarantees edge redundancy, in a (๐, _)-FirmTruss,each edge is required to appear in at least _ layers.
Hierarchical Structure. The hierarchical structure is a desirableproperty for community searchmodels as it represents a communityat different levels of granularity, and can also avoid the ResolutionLimit problem as is discussed in [26]. While FirmTruss has a hier-archical structure, none of the existing models has this property.
6 FTC ONLINE SEARCHGiven the hardness of the FTCS problem, we propose two online2-approximation algorithms in top-down and bottom-up manner.
6.1 Global SearchWe start by defining query distance in multilayer networks.
5
Algorithm 1: FTCS Global SearchInput :An ML graph๐บ = (๐ , ๐ธ, ๐ฟ) , a set of query vertices๐ โ ๐ ,
and two integers ๐ โฅ 2 and _ โฅ 1Output :A connected (๐, _)-FT containing๐ with a small diameter
1 ๐บ0 โ Find a maximal connected (๐, _)-FirmTruss containing๐ ;2 ๐ โ 0; ๐min โ 1; ๐max โ dist๐บ0 (๐บ0,๐) ; G โ ๐บ0;3 while ๐min < ๐max do4 ๐๐๐ฃ๐ โ โ๐min+๐max
2 โ;๐บโฒ โ G5 ๐ โ set of vertices with ๐๐๐ฃ๐ โค dist๐บโฒ (๐ข,๐) ;6 Delete nodes in ๐ and their incident edges from๐บโฒ in all layers;7 Maintain๐บโฒ as (๐, _)-FirmTruss by removing vertices/edges;8 if ๐ โ ๐บโฒ or๐บโฒ is disconnected or ๐max < dist๐บโฒ (๐บโฒ,๐) then9 ๐min โ 1 + ๐๐๐ฃ๐ ;
10 else11 ๐max โ dist๐บโฒ (๐บโฒ,๐) ;12 Let the remaining graph๐บโฒ as G;
13 return G;
Definition 7 (Query Distance). Given an multilayer network๐บ = (๐ , ๐ธ, ๐ฟ), a subgraph ๐บ [๐ป ] โ ๐บ , a set of query vertices ๐ โ ๐ป ,and a vertex set ๐ โ ๐ป , the query distance of ๐ in๐บ [๐ป ],๐๐๐ ๐ก๐บ [๐ป ] (๐,๐),is defined as the maximum length of the shortest path from ๐ข โ ๐ toa query vertex ๐ โ ๐ , i.e., ๐๐๐ ๐ก๐บ [๐ป ] (๐,๐) = max๐ขโ๐,๐โ๐ ๐๐๐ ๐ก (๐ข, ๐).
For a graph๐บ , we use๐๐๐ ๐ก๐บ (๐ข,๐) to denote the query distance fora vertex๐ข โ ๐ . Previousworks (e.g., see [18, 43]) use a simple greedyalgorithm which iteratively removes the nodes with maximumdistance to query nodes, in order to minimize the query distance. Aweakness of this approach is that in the worst case, it reduces thequery distance by just 1 in each iteration, which is inefficient. Weinstead employ a binary search on the query distance of a subgraph.
We present the details of the FTCS Global algorithm in Algo-rithm 1. It first finds a maximal connected (๐, _)-FirmTruss ๐บ0containing ๐ . We keep our best found subgraph in G, through thealgorithm. Then in each iteration, we make a copy of G,๐บ โฒ, and foreach vertex ๐ข โ ๐ [๐บ โฒ], we compute the query distance of ๐ข. Then,we conduct a binary search on the value of ๐๐๐ฃ๐ and delete verticeswith query distance at least ๐๐๐ฃ๐ and all their incident edges, in alllayers. Then from the resulting graph we remove edges/verticesto maintain ๐บ โฒ as a (๐, _)-FirmTruss (lines 6 and 7). We maintainthe (๐, _)-FirmTruss by deleting the edge schemas whose Top-_support is smaller than ๐ โ 2. Finally, the algorithm terminates andreturns a subgraph G, with the smallest query distance.
The procedure for finding the maximal FirmTruss containing ๐is given in Algorithm 2. Notice, a (๐, _)-FirmTruss (see Def. 6) is amaximal subgraph ๐บ [๐ฝ_
๐] in which each edge schema ๐ โ E[๐ฝ_
๐]
has Top-_ support โฅ ๐ โ 2. The algorithm first uses Property 3, andremoves all vertices with Top-_ degree < ๐ โ 1. It then iterativelydeletes all instances of disqualified edge schemas in all layers fromthe original graph๐บ , and then updates the support of their adjacentedges. To efficiently update the Top-_ support of their adjacentedges, we use the following fact:
Fact 1. If two edge schemas๐ and๐ โฒ are adjacent in layer โ , removingedge schema ๐ cannot affect Topโ_(S๐โฒ), unless Topโ_(S๐โฒ) = Sโ
๐โฒ .
Thus, in lines 12-20, we update the Top-_ support of those edgeschemas whose Top-_ support may be affected by removing ๐ .
Algorithm 2:Maximal (๐, _)-FirmTruss containing ๐Input :An ML graph๐บ = (๐ , ๐ธ, ๐ฟ) , a set of query nodes๐ โ ๐ ,
and integers ๐ โฅ 2 and _ โฅ 1Output :A maximal connected (๐, _)-FirmTruss containing๐
1 ๐บโฒ โ Remove all vertices with Top-_ degree less than ๐ โ 1;2 Compute Sโ๐ = ๐ ๐ข๐ (๐โ ,๐บ
โฒโ ) for each edge schema ๐ โ E and โ โ ๐ฟ;
3 ๐, ๐ต โ โ ;4 forall ๐ โ E [๐บโฒ] do5 ๐ผ [๐ ] โ Top-_ (S๐ ) + 2;6 if ๐ผ [๐ ] < ๐ then7 ๐ โ ๐ โช {๐ };
8 while ๐ โ โ do9 Pick and remove ๐ = (๐ฃ,๐ข) from ๐ ;
10 forall (๐ฃ, ๐ค, โ) โ ๐ธ [๐บโฒ] and ๐ผ [ (๐ฃ, ๐ค) ] โฅ ๐ and ๐โ โ ๐ธ do11 if (๐ข, ๐ค, โ) โ ๐ธ [๐บโฒ] and ๐ผ [ (๐ข, ๐ค) ] โฅ ๐ then12 if Sโ(๐ฃ,๐ค) + 2 = ๐ผ [ (๐ฃ, ๐ค) ] then13 ๐ต โ ๐ต โช {(๐ฃ, ๐ค) };14 if Sโ(๐ข,๐ค) + 2 = ๐ผ [ (๐ข, ๐ค) ] then15 ๐ต โ ๐ต โช {(๐ข, ๐ค) };16 Sโ(๐ฃ,๐ค) โ Sโ(๐ฃ,๐ค) โ 1; S
โ(๐ข,๐ค) โ Sโ(๐ข,๐ค) โ 1;
17 forall ๐โฒ = (๐ค, ๐ก ) โ ๐ต do18 Update ๐ผ [๐โฒ];19 if ๐ผ [๐โฒ] < ๐ then20 ๐ โ ๐ โช {๐โฒ };
21 Remove all instance of ๐ from๐บโฒ in all layers;
22 ๐ป โ The connected component of๐บโฒ containing๐ ;23 return ๐ป ;
Finally, we use BFS traversal from a query node ๐ โ ๐ to find theconnected component including query vertices. We omit the detailsof FirmTruss maintenance since it can use operations similar tothose in lines 8-21 of Algorithm 2.
Example 7. In Figure 1, let ๐ = 4, _ = 1, and ๐ฃ1 be the querynode. Algorithm 1 starts from the entire graph as ๐บ0. Since the querydistance is 7, it sets ๐๐๐ฃ๐ = 7+1
2 = 4, removes all vertices with querydistance at least 4, and maintains the remaining graph as (4, 1)-FirmTruss. The remaining graph includes blue, purple, and red nodes.Next, it sets๐๐๐ฃ๐ = โ 3+12 โ = 2, removes all vertices with query distanceat least 2, and maintains the remaining graph as (4, 1)-FirmTruss.The remaining graph includes blue nodes and Algorithm 1 terminatesand returns this subgraph as the solution.
Next, we analyze the approximation quality and complexity ofthe FTCS Global algorithm.
Theorem 7 (FTCS-GlobalQualityApproximation). Algorithm 1achieves 2-approximation to an optimal solution ๐บ [๐ปโ] of the FTCSproblem, that is, the obtained (๐, _)-FirmTruss, ๐บ [๐ป ] satisfies
๐๐๐๐(๐บ [๐ป ]) โค 2 ร ๐๐๐๐(๐บ [๐ปโ]).
Lemma 1. Algorithm 2 takes O(โโโ๐ฟ |๐ธโ |1.5 + |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |)time, and O(|๐ธ | |๐ฟ |) space.
Theorem 8 (FTCS-Global Complexity). FTCS-Global algorithmtakes O(๐พ ( |๐ | |๐ธ [๐บ0] |+
โโโ๐ฟ |๐ธโ |1.5)+ |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) time, and
O(|๐ธ | |๐ฟ |) space, where ๐พ = log(๐๐๐ ๐ก๐บ0 (๐บ0, ๐)
).
6
Algorithm 3: FTCS Local SearchInput :An ML graph๐บ = (๐ , ๐ธ, ๐ฟ) , a set of query vertices๐ โ ๐ ,
and two integers ๐ โฅ 2 and _ โฅ 1Output :A connected (๐, _)-FT containing๐ with a small diameter
1 ๐min โ 1; ๐mid โ 1;๐บout โ โ ; ๐max โโ;๐ โฒ = โ ;2 while ๐min < ๐max and๐ โฒ โ ๐ do3 ๐ โฒ โ ๐ โช {๐ข โ ๐ |dist๐บ (๐ข,๐) โค ๐mid };4 ๐บโฒ โ Induced subgraph of๐บ by vertices๐ โฒ;5 ๐บโฒ โ Find maximal (๐, _)-FirmTruss of๐บโฒ containing๐ ;6 while๐บโฒ โ โ do7 ๐ โ โ ;8 for ๐ข โ ๐ [๐บโฒ] do9 if dist๐บโฒ (๐ข,๐) > ๐mid then10 ๐ โ ๐ โช {๐ข };
11 if ๐ = โ then12 ๐max โ ๐mid; ๐mid โ โ๐min+๐max
2 โ;13 ๐บout โ ๐บโฒ;14 Break; //Break in the inner while loop
15 else16 Delete๐ and their incidents edges in all layers from๐บโฒ;17 Maintain๐บโฒ as (๐, _)-FirmTruss;
18 if ๐บโฒ = โ then19 ๐min โ ๐mid + 1; ๐mid โ 2 ร ๐mid;
20 return๐บout;
6.2 Local SearchThe Global algorithm finds the FirmTruss in a top-down manner.In massive networks, this top-down approach might cause manyunnecessary computations. In this section, we present another algo-rithm, called FTCS Local (Algorithm 3), which works in a bottom-upmanner to address this limitation.
One potential approach is first to collect all vertices whose querydistances are no greater than๐ into๐ โฒ (line 3) and then construct๐บ โฒas the induced subgraph of๐บ by๐ โฒ (line 4). Next, given ๐ , examinewhether๐บ โฒ contains a (๐, _)-FirmTruss whose query distance is๐ . Ifsuch a FirmTruss exists, it is returned as the solution, and otherwise,the algorithm increases ๐ by 1 for a new iteration. However, onedrawback of this approach is that it increases the query distanceonly by 1 in each iteration, which is not efficient. Accordingly, herewe conduct a binary search on the value of ๐ . One challenge is thelack of upper bound on the value of ๐ . A trivial upper bound, whichis the query distance in the entire graph, might lead to consideringalmost the entire graph in the first iteration, which defeats thepurpose of the bottom-up method, i.e., increased efficiency. Weinstead use a doubling searchwherebywe double the query distance๐ in every iteration until a solution is found. Then by consideringthe query distance of this solution as an upper bound of ๐ , weconduct a binary search. Algorithm 3 shows the details.Theorem 9 (FTCS-LocalQuality Approximation). Algorithm 3achieves 2-approximation to an optimal solution ๐บ [๐ปโ] of the FTCSproblem, that is, the obtained (๐, _)-FirmTruss, ๐บ [๐ป ] satisfies
๐๐๐๐(๐บ [๐ป ]) โค 2 ร ๐๐๐๐(๐บ [๐ปโ]).Proof Sketch. We first prove that the binary search method
finds a solution with a smaller query distance than the optimalsolutionโs. Next, based on the triangle inequality, we show that the
Algorithm 4: FirmTruss DecompositionInput :An ML graph๐บ = (๐ , ๐ธ, ๐ฟ)Output :Skyline FirmTruss index of each edge schema
1 Compute Sโ๐ = ๐ ๐ข๐ (๐โ ,๐บโ ) for each edge schema ๐ โ E in eachlayer โ โ ๐ฟ;
2 forall _ = 1, 2, . . . , |๐ฟ | do3 reinitialize supports, Sโ๐ ;4 forall ๐ โ E do5 ๐ผ [๐ ] โ Top-_ (S๐ ) + 2;6 ๐ต [๐ผ [๐ ] ] โ ๐ต [๐ผ [๐ ] ] โช {๐ };7 forall ๐ = 2, 3, . . . , |๐ | do8 while ๐ต [๐ ] โ โ do9 Pick and remove ๐ = (๐ฃ,๐ข) from ๐ต [๐ ];
10 SFT(๐) โ SFT(๐) โช (๐, _) , ๐ โ โ ;11 forall (๐ฃ, ๐ค, โ) โ ๐ธ and ๐ผ [ (๐ฃ, ๐ค) ] > ๐ and ๐โ โ ๐ธ do12 if (๐ข, ๐ค, โ) โ ๐ธ and ๐ผ [ (๐ข, ๐ค) ] > ๐ then13 if Sโ(๐ฃ,๐ค) + 2 = ๐ผ [ (๐ฃ, ๐ค) ] then14 ๐ โ ๐ โช {(๐ฃ, ๐ค) };15 if Sโ(๐ข,๐ค) + 2 = ๐ผ [ (๐ข, ๐ค) ] then16 ๐ โ ๐ โช {(๐ข, ๐ค) };17 Sโ(๐ฃ,๐ค) โ Sโ(๐ฃ,๐ค) โ 1; S
โ(๐ข,๐ค) โ Sโ(๐ข,๐ค) โ 1;
18 forall ๐โฒ = (๐ค, ๐ก ) โ ๐ do19 Remove ๐โฒ from ๐ต [๐ผ [๐โฒ] ];20 Update ๐ผ [๐โฒ];21 ๐ต [๐ผ [๐โฒ] ] โ ๐ต [๐ผ [๐โฒ] ] โช {๐โฒ };22 Remove all instance of ๐ from๐บ in all layers;
23 Remove all dominated indices in SFT(๐) for each ๐ โ E;
diameter of the found solution is at most two times of the optimal.The detailed proof can be found in Appendix A.2. โก
Theorem 10 (FTCS-Local Complexity). FTCS-Local algorithmtakes O(๐พ ( |๐ | |๐ธ | + โ
โโ๐ฟ |๐ธโ |1.5) + |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) time, andO(|๐ธ | |๐ฟ |) space, where ๐พ = log
(๐๐๐ ๐ก๐บ0 (๐บ0, ๐)
).
7 INDEX-BASED ALGORITHMBoth online algorithms need to find FirmTruss from scratch. How-ever, for each query set, computing the maximal FirmTruss fromscratch can be inefficient for large multilayer networks. In thissection, we discuss how to employ FirmTruss decomposition to ac-celerate our algorithms, by storing maximal FirmTrusses as they areidentified into an index structure. We first present our FirmTrussdecomposition algorithm and then describe how the index can beused for efficient retrieval of the maximal FirmTruss given a query.
7.1 FirmTruss DecompositionIn this section, we define the Skyline FirmTrussness index. Foran edge schema ๐ โ E, we let ๐น๐ ๐ผ (๐) denote the set {(๐, _) |๐ is in a (๐, _)-FirmTruss}. We will use the following notion of in-dex dominance.
Definition 8 (Index Dominance). Given two pairs of numbers(๐1, _1) and (๐2, _2), we say (๐1, _1) dominates (๐2, _2), denoted(๐2, _2) โชฏ (๐1, _1), provided ๐1 โฅ ๐2 and _1 โฅ _2.
Clearly, (๐น๐ ๐ผ (๐), โชฏ) is a partial order.7
Algorithm 5: Index-based Maximal FirmTruss FindingInput :An ML graph๐บ = (๐ , ๐ธ, ๐ฟ) , a set of query vertices๐ โ ๐ ,
and two integers ๐ โฅ 2 and _ โฅ 1Output :A maximal connected (๐, _)-FirmTruss containing๐
1 ๐บ0 โ โ ; ๐ โ ๐ ;2 while ๐ โ โ do3 Pick and remove ๐ข โ ๐ ;4 for each unvisited edge schema ๐ = (๐ข, ๐ฃ) do5 Mark ๐ as visited;6 for each skyline FirmTruss index (๐๐ , _๐ ) โ SFT(๐) do7 if (๐, _) โชฏ (๐๐ , _๐ ) then8 add ๐ฃ and ๐ข with all their incident edges into๐บ0;9 ๐ โ ๐ โช {๐ฃ };
10 return๐บ0
Definition 9 (Skyline FirmTrussness). Let ๐ โ E be an edgeschema. The skyline FirmTrussness of ๐ , denoted SFT(๐), containsthe maximal elements of ๐น๐ ๐ผ (๐).
In order to find all possible FirmTrusses, we only need to computethe skyline FirmTrussness for every edge schema in a multilayergraph๐บ . To this end, we present the details of FirmTruss algorithmin Algorithm 4. For a given edge schema ๐ , if Topโ_(S๐ ) = ๐ โ 2,then it cannot be a part of a (๐ โฒ, _)-FirmTruss, for ๐ โฒ > ๐ . Therefore,given _, we can consider Topโ_(S๐ ) + 2 as an upper bound on theFirmTruss index of ๐ (line 5). In the FirmTruss decomposition, werecursively pick an edge schema ๐ with the lowest Topโ_(S๐ ),assign its FirmTruss index as Topโ_(S๐ ) + 2, and then remove itfrom the graph. After that, to efficiently update the Top-_ supportof its adjacent edges, we use Fact 1 (lines 13-16). At the end of thealgorithm, we remove all dominated indices in SFT(๐) for each๐ โ E to only store skyline indices (line 23). We can show:
Theorem 11 (FirmTruss Decomposition Complexity). Algo-rithm 4 takes O(โโโ๐ฟ |๐ธโ |1.5 + |๐ธ | |๐ฟ |2) time.
7.2 Index-based Maximal FirmTruss FindingUsing the FirmTruss decomposition, we can find all skyline FirmTrussindices for a given edge schema and query vertex set. Algorithm 5shows the procedure. The algorithm starts from the query verticesand by using a breath-first search, checks for each neighbor whetherits corresponding edge schema has a skyline FirmTruss index thatdominates the input (๐, _). We have:
Theorem 12. Algorithm 5 takes O(|๐ธ [๐บ0] |) time.
This indexing approach can be used in Algorithm 1 to find themaximal ๐บ0, as well as in Algorithm 3 so that we only need toadd edges whose corresponding edge schema has an index thatdominates (๐, _). We refer to these variants of Global and Local asiGlobal and iLocal, respectively.
8 ATTRIBUTED FIRMTRUSS COMMUNITYOften networks come naturally endowed with attributes associatedwith their vertices. For example, in DBLP, vertices which correspondto authors may have areas of interest as attributes. In protein-protein interaction networks where vertices represent proteins,the attributes may correspond to biological processes, molecular
functions, or cellular components of a protein in the GO databasesmade available through the Gene Ontology (GO) project [4]. It isnatural to impose some level of similarity between members of acommunity, based on their attributes.
Network homophily is a phenomenon which states similar nodesmay be more likely to attach to each other than dissimilar ones.Inspired by this โbirds of a feather flock togetherโ phenomenon,in social networks, we argue that users remain engaged with theircommunity if they feel enough similarity with others, while userswho feel dissimilar from a community may decide to leave thecommunity. Accordingly, for each vertex, we measure how similarit is to the communityโs members and use it to define the homophilyin the community.
We show that surprisingly, use of homophily in a definitionof attributed community offers an alternative means to avoid thefree rider effect. In this section, we extend the definition of theFirmTruss-based community to attributed ML networks, where weassume each vertex has an attribute vector. In order to capture ver-tex similarity, we propose a new function to measure the homophilyin a subgraph. We show that this function not only guarantees ahigh correlation between attributes of vertices in a community butalso avoids the free-rider effect. Unlike previous work [19, 42, 81],our model allows for continuous valued attributes. E.g., in a PPInetwork, the biological process associated with a protein may havea real value, as opposed to just a boolean or a categorical value.
Let A = {๐ด1, ..., ๐ด๐ } be a set of attributes. An attributed mul-tilayer network ๐บ = (๐ , ๐ธ, ๐ฟ,ฮจ), where (๐ , ๐ธ, ๐ฟ) is a multilayernetwork and ฮจ : ๐ โ R๐โฅ0 is a non-negative function that assignsa ๐-dimensional vector to each vertex, with ฮจ(๐ฃ) [๐] representingthe strength of attribute ๐ด๐ in vertex ๐ฃ . Let โ(๐ฃ,๐ข) be a symmetricand non-negative similarity measure based on attribute vectorsof ๐ข and ๐ฃ . E.g., โ(๐ฃ,๐ข) can be the cosine similarity between ฮจ(๐ข)and ฮจ(๐ฃ). Let ๐ be a community containing ๐ฃ . We define โ๐ (๐ฃ),capturing the aggregate similarity between ๐ฃ and members of ๐ :
โ๐ (๐ฃ) =โ๏ธ๐ขโ๐๐ขโ ๐ฃ
โ(๐ฃ,๐ข).
The higher the valueโ๐ (๐ฃ) themore similar user ๐ฃ โfeels" they arewith the community ๐ . While cosine similarity of attribute vectorsis a natural way to compute the similarity โ(๐ฃ,๐ข), any symmetricand non-negative measure can be used in its place.
Based on โ๐ (๐ฃ), we define the homophily score of community๐ as follows. Let ๐ โ R โช {+โ,โโ} be any number. Then thehomophily score of ๐ is defined as:
ฮ๐ (๐) =(1|๐ |
โ๏ธ๐ฃโ๐
โ๐ (๐ฃ)๐)1/๐
.
The parameter ๐ gives flexibility for controlling the emphasison similarity at different ends of the spectrum. When ๐ โ +โ(resp. ๐ โ โโ) , we have higher emphasis on large (resp. small)similarities. This flexibility allows us to tailor the homophily scoreto the application at hand.
8.1 Attributed FirmCommunity ModelProblem 2 (Attributed FirmTruss Community Search). Givenan attributed ML network๐บ = (๐ , ๐ธ, ๐ฟ,ฮจ), two integers ๐ โฅ 2, _ โฅ 1,
8
a parameter ๐ โ Rโช{+โ,โโ}, and a set of query vertices๐ โ ๐ , theattributed FirmTruss community search (AFTCS) is to find a connectedsubgraph ๐บ [๐ป ] โ ๐บ satisfying:
(1) ๐ โ ๐ป ,(2) ๐บ [๐ป ] is a connected (๐, _)-FirmTruss,(3) ฮ๐ (๐ป ) is the maximum among all subgraphs satisfying (1) and
(2).
HardnessAnalysis.Next we analyze the complexity of the AFTCSproblem and show that when ๐ is finite, it is NP-hard.
Theorem 13 (AFTCS Hardness). The AFTCS problem is NP-hard,whenever ๐ is finite.
Proof Sketch. The crux of our proof idea comes from the hard-ness of finding the densest subgraphwith at least๐ vertices in single-layer graphs [48]. Given an instance of this problem, ๐บ = (๐ , ๐ธ),we consider a complete ML attributed graph and provide an ap-proach to construct an attribute vector of each node such that forevery two vertices ๐ข and ๐ฃ with (๐ข, ๐ฃ) โ ๐ธ : โ(๐ข, ๐ฃ) = 1
2 |๐ | and if(๐ข, ๐ฃ) โ ๐ธ : โ(๐ข, ๐ฃ) = 0. So the densest subgraph with at least ๐vertices in ๐บ is a solution for AFTCS, and vice versa. โก
Free-rider Effect. In the following, we show that the AFTCS prob-lem avoids the free-rider effect.
Theorem 14 (AFTCS Free-Rider Effect). For any attributed MLnetwork๐บ = (๐ , ๐ธ, ๐ฟ,ฮจ) and query vertices๐ โ ๐ , there is a solution๐บ [๐ป ] to the AFTCS problem such that for all query-independentoptimal solutions๐บ [๐ปโ], either๐ปโ = ๐ป , or๐บ [๐ปโช๐ปโ] is disconnected,or ๐บ [๐ป โช ๐ปโ] has a strictly smaller homophily score than ๐บ [๐ป ].
Analogously to Theorem 6, the theorem shows that use of at-tribute based homophily leads to avoidance of the free-rider-effect.
8.2 AlgorithmsIn this section, we propose an efficient approximation algorithmfor the AFTCS problem. We show that when ๐ = +โ, or โโ, thisalgorithm finds the exact solution. We can show that our objectivefunction ฮ๐ (.) is neither submodular nor supermodular (proof inAppendix C), offering some evidence that this problem may be hardto approximate, for some values of ๐ .
Peeling Approximation Algorithm.We divide the problem intotwo cases: (i) ๐ > 0, and (ii) ๐ < 0. For finite ๐ > 0, argmax ฮ๐ (๐) =argmax ฮ๐๐ (๐), so for simplicity, we focus on maximizing ฮ
๐๐ (.).
Similarly, for finite ๐ < 0, we focus on minimizing ฮ๐๐ (.). Note that,
for any finite ๐ an ๐ผ-approximate solution for optimizing ฮ๐๐ (.)
provides an ๐ผ1/๐ -approximate solution for optimizing ฮ๐ (.).Consider a set of vertices ๐ โ ๐ . Our approximation algorithm
is to greedily remove nodes ๐ข โ ๐ that may improve the objective.Since removing any node ๐ข โ ๐ will change the denominator ofฮ๐๐ (๐) in the same way, we can choose the node that leads to theminimum (maximum) drop in the numerator. Let us examine thechange to the ฮ๐๐ (๐) from dropping ๐ข โ ๐ :
ฮ๐๐ (๐\{๐ข}) =
โ๐ฃโ๐\{๐ข } โ๐\{๐ข } (๐ฃ)๐
|๐ | โ 1 =1
|๐ | โ 1
(|๐ | ยท ฮ๐๐ (๐) โ ฮ๐ข (๐)
),
Algorithm 6: AFTCS-ApproxInput :An attributed ML graph๐บ = (๐ , ๐ธ, ๐ฟ,ฮจ) , a set of query
vertices๐ โ ๐ , and two integers ๐ โฅ 2 and _ โฅ 1Output :A connected (๐, _)-FT containing๐ with a large ฮ๐ (.)
1 ๐ โ 0;2 ๐บ0 โ Find a maximal connected (๐, _)-FirmTruss containing๐ ;3 Calculate โ๐ [๐บ0 ] (๐ข) for all ๐ข โ ๐ [๐บ0 ];4 while๐ โ ๐ [๐บ๐ ] do5 if ๐ > 0 then6 ๐ข โ argmin๐ขโ๐ [๐บ๐ ] ฮ๐ข (๐ [๐บ๐ ]) ;7 else8 ๐ข โ argmax๐ขโ๐ [๐บ๐ ] ฮ๐ข (๐ [๐บ๐ ]) ;9 Delete vertex ๐ข and its incident edges from๐บ๐ in all layers;
10 Maintain๐บ๐ as (๐, _)-FirmTruss by removing vertices/edges;11 Let the remaining graph as๐บ๐+1; ๐ โ ๐ + 1;12 return argmax๐ปโ{๐บ0,...,๐บ๐โ1} ฮ๐ (๐ป ) ;
where
ฮ๐ข (๐) = โ๐ (๐ข)๐ +ยฉยญยซ
โ๏ธ๐ฃโ๐\{๐ข }
โ๐ (๐ฃ)๐ โ [โ๐ (๐ฃ) โ โ(๐ฃ,๐ข)]๐ยชยฎยฌ .
Notice that ฮ๐ข (๐) represents the exact decrease in the numeratorof ฮ๐๐ (๐) resulting from removing ๐ข. Based on this observation,in Algorithm 6, we recursively remove a vertex with a minimum(maximum) ฮ value, and maintain the remaining subgraph as a(๐, _)-FirmTruss. We have the following result:
Theorem 15 (AFTCS-Approx Complexity). Algorithm 6 takesO(๐ |๐0 |2 + ๐ก ( |๐0 | + |๐ธ0 |) +
โโโ๐ฟ |๐ธโ |1.5 + |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) time,
and O(|๐ธ | |๐ฟ | + |๐0 |2) space, where ๐ก is the number of iterations, ๐0and ๐ธ0 are the vertex set and edge set of maximal (๐, _)-FirmTruss.
As for the approximation quality, we can show the followingwhen ๐ โฅ 1. The detailed proof and tightness example can be foundin Appendix A.2 and B.
Theorem 16 (AFTCS-ApproxQualityApproximation). Let๐ โฅ 1,Algorithm 6 returns a (๐ + 1)1/๐ -approximation solution of AFTCSproblem.
Proof Sketch. Let ๐ปโ be the optimal solution. Since removinga node ๐ขโ โ ๐ปโ will produce a subgraph with homophily score atmost ฮ๐๐ (๐ปโ), we have ฮ
๐๐ (๐ปโ) โค ฮ๐ขโ (๐ปโ). Next, we show that the
first removed node ๐ขโ โ ๐ปโ by the algorithm cannot be removed bymaintaining FirmTruss, so it was a node with a minimum ฮ. Then,we use the fact that the minimum value of ฮ is less than the averageof ฮ over all nodes and provide an upper bound of (๐ + 1)ฮ๐๐ (๐) forthe average of ฮ over ๐ . Finally, we show that function |๐ |ฮ๐๐ (๐)is supermodular for ๐ โฅ 1, and based on its increasing differencesproperty, we conclude the approximation guarantee. โก
Remark 1. (1) How much good can Algorithm 6 work? As ๐ increases,Algorithm 6 has a better approximation factor. In the worst case,(๐ = 1), we get approximation factor = 2, and when ๐ โ โ, ourapproximation factor has a limit of 1. This limit of the approximationfactor intuitively matches the fact that when ๐ = +โ the optimal so-lution is trivial to obtain by the maximal FirmTruss. (2) Although ourmethod provides no formal guarantees when ๐ < 1, it achieves resultsof good empirical quality in practice, as validated in our experiments.
9
Table 1: Network Statistics
Dataset |๐ | |๐ธ | |๐ฟ | Attribute Ground Truth
Terrorist 79 2.2K 14RM 91 14K 10FAO 214 319K 364Brain 190 934K 520DBLP 513K 1.0M 10Obama 2.2M 3.8M 3YouTube 15K 5.6M 4Amazon 410K 8.1M 4YEAST 4.5K 8.5M 4Higgs 456K 13M 4Friendfeed 510K 18M 3StackOverflow 2.6M 47.9M 24Google+ 28.9M 1.19B 4
Exact Algorithm when ๐ = +โ, or โโ. The case ๐ = +โis straightforward, where we just want to maximize ฮ+โ (๐) =
max๐ฃโ๐ โ๐ (๐ฃ). The solution of this case is the maximal subgraphthat satisfies the conditions (1) and (2) in Problem 2. In the ๐ = โโcase, we want to maximize ฮโโ (๐) = min๐ฃโ๐ โ๐ (๐ฃ). We can recur-sively remove a vertex with minimum value of โ๐ and maintainthe remaining subgraph such that satisfies conditions (1) and (2) inProblem 2. The pseudocode is identical to Algorithm 6, except inlines 5-8, we recursively remove a vertex with a minimum value ofโ๐ . We refer to this modified peeling algorithm as Exact-MaxMin.
Theorem 17 (Correctness of Exact-MaxMin). Exact-MaxMinreturns the exact solution to the AFTCS problem with ๐ = โโ.
9 EXPERIMENTSIn this section, we conduct experiments to evaluate proposed com-munity search models and algorithms. Additional experiments onefficiency and parameter sensitivity can be found in Appendix F.
Setup. All algorithms are implemented in Python and compiled byCython. The experiments are performed on a Linux machine withIntel Xeon 2.6 GHz CPU and 128 GB RAM.
BaselineMethods.We compare our FTCSwith the state-of-the-artcommunity search methods in multilayer networks. ML k-core [29]uses an objective function to automatically choose a subset of layersand then finds a subgraph such that the minimum of per-layer min-imum degrees, across selected layers, is maximized. ML-LCD [44]uses a greedy strategy to maximize the ratio of Jaccard similarity be-tween nodes inside and outside of the local community. RWM [59]sends random walkers in each layer to obtain the local proximityw.r.t. the query nodes and returns the set of nodes with the small-est conductance. We also compare our approach with CTC [43],which finds the closest truss community in single-layer graphs, andVAC [57], an attributed variant of CTC, also on single-layer graphs.For baselines, we tune the parameters according to their originalpapers and report the best results.
Datasets. We perform extensive experiments on thirteen real net-works [1, 9, 15โ17, 28, 31, 49, 51, 53, 60, 63] covering social, genetic,co-authorship, financial, brain, and co-purchasing networks, whosemain characteristics are summarized in Table 1. While Terroristand DBLP datasets naturally have attributes, for RM and YouTube,we chose one of the layers, embedded it using node2vec [33], andused the vector representation of each node as its attribute vector.
Queries and Evaluation Metrics.We evaluate the performanceof all algorithms using different queries by varying the number of
Terrorist RM Brain DBLP YouTube Amazon0.0
0.2
0.4
0.6
0.8
1.0
F1-S
core
FTCS RWM ML-LCD ML k-core CTC
Figure 2: Quality evaluation on ground-truth networks.
Table 2: Evaluation of FTCS with the state-of-the-art meth-ods on datasets without ground truth.
CS Model FAO Obama YEAST HiggsDensity Diameter Density Diameter Density Diameter Density Diameter
FTCS 979.71 1 9.81 1.84 177.27 1.52 65.14 1.93ML k-core - - 8.13 โ 159.94 โ 59.41 โML-LCD 952.88 1.09 4.87 2.46 - - - -RWM 911.94 1.12 4.62 3.07 25.45 1.84 24.99 3.16CTC 733.85 1 5.35 1.99 139.03 1.92 35.18 2.05
query nodes, and the parameters ๐ , _, and ๐ . To evaluate the qualityof found communities, we measure their F1-score to grade theiralignment with the ground truth. To evaluate the efficiency, wereport the running time. In reporting results, we cap the runningtime at 5 hours and memory footprint at 100 GB. For index-basedmethods, we cap the construction time at 24 hours. Unless statedotherwise, we run our algorithms over 100 random query sets witha random size between 1 and 10, and report the average results. Werandomly set ๐ and _ to one of the common skyline indices of edgeschemas incident to query nodes.
Quality. We evaluate the effectiveness of different communitysearch models over multilayer networks. Figure 2 reports the av-erage F1-scores of all methods on datasets with the ground-truthcommunity. We observe that our approach achieves the highestF1-score on all networks against baselines. The reason is two-fold.First, in our problem definition, we enforce the minimum-diameterrestriction, effectively removing the irrelevant vertices from the re-sult. Second, FirmTruss requires each edge schema to have enoughsupport in a sufficient number of layers, ensuring that the foundsubgraphs are cohesive and densely connected. While CTC alsominimizes the diameter, it is a single-layer approach and missessome structure due to ignoring the type of connections.
We also evaluate all algorithms in terms of other goodness met-rics โ density (๐ฝ = 1), and diameter. Table 2 reports the resultson FAO, Obama, YEAST, and Higgs datasets. The results on otherdatasets are similar, and are omitted for lack of space. We observethat our approach achieves the highest density, and lowest diameteron all networks against baselines.
Since there are no prior models for attributed community searchin ML networks, we compare the quality of AFTCS with our MLunattributed baselines as well as VAC. Table 3 reports the F1-scoreand density of communities found, over four datasets with ground-truth communities. AFTCS consistently beats the baselines. Noticethat AFTCS has a higher F1-score than FTCS in all but one case, asthe existence of both attributes and structure is richer informationthan only structure. Accordingly, AFTCS is better able to distinguishmembers from non-members of a ground-truth community.
Efficiency.Weevaluate the efficiency of different community searchmodels on multilayer graphs. Figure 3 shows the query processingtime of all methods. All of our methods terminate within 1 hour,
10
Table 3: Evaluation of AFTCSwith the state-of-the-art meth-ods on attributed datasets with ground-truth.
CS Model Terrorist RM DBLP YoutubeF1 Density F1 Density F1 Density F1 Density
AFTCS
๐ = 2 0.52 15.29 0.79 61.24 0.61 8.22 0.45 11.64๐ = 1 0.61 15.22 0.83 64.31 0.60 7.91 0.45 11.59๐ = 0 0.61 15.18 0.82 63.98 0.64 8.11 0.43 10.88๐ = โ1 0.59 13.76 0.81 63.19 0.61 8.19 0.44 11.24๐ = โโ 0.57 14.08 0.85 62.46 0.62 7.97 0.46 11.49
FTCS 0.59 10.23 0.84 60.52 0.61 8.69 0.47 10.36ML k-core 0.35 8.43 0.53 55.98 0.46 5.53 0.26 8.78ML-LCD 0.32 7.82 0.49 47.26 0.50 6.49 - -RWM 0.37 5.45 0.65 39.81 0.48 5.12 0.35 7.46VAC 0.41 7.51 0.48 52.50 0.35 5.27 0.24 4.34
Terrorist FAO Brain DBLPYouTube
Amazon HiggsFriendFeed
StackOverflowGoogle+
10 210 1100101102103104
Tim
e (s
)
iLocal Local iGlobal Global RWM ML-LCD ML k-core
Figure 3: Efficiency Evaluation.
2 3 4 5 6 7 8 9 10 k
10 20.1
110
102103104
Tim
e (s
)
GlobaliGlobal
LocaliLocal
(a) Effect of ๐ (DBLP)
1 2 3 4 5 6 7 8 9 10 10 2
0.11
10102103104
Tim
e (s
)
GlobaliGlobal
LocaliLocal
(b) Effect of _ (DBLP)
Terrorist RM DBLPYouTube
10 2
10 1
100
101
102
Tim
e (s
)
p = 2p = 1
p = 0p = -1
p =
(c) Effect of ๐
Figure 4: Parameter Sensitivity Evaluation.
except Global on the two largest datasets, as it generates a large can-didate graph ๐บ0. Our algorithms Local and iLocal run much fasterthan Online-Global. Overall, iLocal achieves the best efficiency, andit can deal with a search query within a second on most datasets.Local is the only algorithm that scales to graphs containing billionsof edges. Bars for iGlobal and iLocal are missing for Google+, asindex construction time exceeds our threshold.
Parameter Sensitivity.We evaluate the sensitivity of algorithmefficiency to the parameters ๐, _, and ๐ , varying one parameter at atime. Figures 4(a) and (b) show the running time as a function of๐ and _ on DBLP. The larger ๐ and _ for Global and iGlobal resultin lower running time since the algorithms generate a smaller๐บ0.However, the larger ๐ and _ increase the running time of Local andiLocal since they need to count more nodes in the neighborhoodof query nodes to find a (๐, _)-FirmTruss. Figure 4(c) shows therunning time as a function of ๐ . We observe that AFTCS-Approxachieves a stable efficiency on different finite values of ๐ . Notice,when ๐ = โโ, this algorithm takes less time as it does not need tocalculate ฮ๐ข (๐) for each node and can simply remove the vertexwith minimum โ๐ (๐ข) in each iteration.
Scalability. We test our algorithms using different versions ofStackOverflow obtained by selecting a variable #layers from 1 to 24and also with different subsets of edges. Figure 5 shows the resultsof the index-based Global, Local Search, and AFTCS-Approx algo-rithms. The results Global and iLocal are similar, and are omitted forlack of space (see Appendix F). The running time of all approachesscales linearly in #layers. By varying #edges, all algorithms scale
|E|
10 102103
104105
106
|L|3
69
1215
1821
24
Tim
e (s
)
10 2
0.1
1
10
(a) iGlobal
|E|
10 102103
104105
106
|L|3
69
1215
1821
24
Tim
e (s
)
10 3
10 2
0.1
1
10
(b) Local Search
|E|
10 102103
104105
106
|L|3
69
1215
1821
24
Tim
e (s
)
10 2
0.1
1
10
(c) AFTCS-Approx
Figure 5: Scalability of proposed algorithms with varyingthe number of layers and the number of edges.
Terrorist RM FAO Brain DBLP ObamaYouTube
AmazonYEAST Higgs
FriendFeedStackOverflow
0.11
10102103104105106
Tim
e (s
)
Time Memory
0.1110102103104105106
Mem
ory
(MB
)
Figure 6: Index Construction Costs.
gracefully. As expected, the Local algorithm is less sensitive tovarying #edges than #layers.
Index Construction Costs. Figure 6 reports the SFT index con-struction time and size. For all datasets, the SFT index can be builtwithin 24 hours, and its size is within 2.6ร of the original graphsize. The result shows the efficiency of SFT index construction.
Case Studies: Identify Functional Systems inBrainNetworks.Detecting and monitoring functional systems in the human brainis a primary task in neuroscience. However, the brain network gen-erated from an individual can be noisy and incomplete. Using brainnetworks from many individuals can help to identify functionalsystems more accurately. A community in a multilayer brain net-work, where each layer is the brain network of an individual, canbe interpreted as a functional system in the brain. In this case study,to show the effectiveness of the FTCS, we compare its detectedfunctional system with ground truth. In this experiment, we focuson the โvisual processingโ task in the brain. As the โOccipital Poleโis primarily responsible for visual processing [47], we use one of itsrepresenting nodes as the query node. Figure 7 reports the foundcommunities by FTCS and baselines. The identified communitiesare highlighted in red, and the query node is green. Results showthe effectiveness of FTCS as the community detected by our methodis very similar to the ground truth with F1-score of 0.75. RWM,which is a random walk-based community model, includes manyfalse-positive nodes that cause F1-score of 0.495. On the other hand,some nodes in the boundary region are missed by ML-LCD thatcaused low F1-score of 0.4. The result of the ML k-core is omittedas it does not terminate even before one week.
Case Studies: Classification on Brain Networks. Behavioraldisturbances in attention deficit hyperactivity disorder (ADHD)are considered to be caused by the dysfunction of spatially dis-tributed, interconnected neural systems [32]. In this section, weemploy our FTCS to detect common structures in the brain func-tional connectivity network of ADHD individuals and typicallydeveloped (TD) people. Our dataset is derived from the functionalmagnetic resonance imaging (fMRI) of 520 individuals with thesame methodology used in [52]. It contains 190 individuals in the
11
(a) Ground truth (b) FirmTruss (c) ML-LCD (d) RWM
Figure 7: Detected functional systems.
(a)๐ถADHD (b)๐ถTD
Figure 8: FirmTruss community in TD and ADHD groups.
condition group, labeled ADHD, and 330 individuals in the con-trol group, labeled TD. Here, each layer is the brain network of anindividual person, where nodes are brain regions, and each edgemeasures the statistical association between the functionality of itsendpoints. Since โTemporal Poleโ is known as the part of the brainthat plays an important role in ADHD symptoms [64, 69], we use asubset of its representing nodes as the query nodes.
Next, we randomly chose 230 individuals labeled TD and 90 indi-viduals labeled ADHD to construct two multilayer brain networksand then found the FirmTruss communities associated with โTem-poral Poleโ in each group separately, referred to as๐ถTD and๐ถADHDin Figure 8. In the second step, for each individual unseen brainnetwork, we find the associated communities to the query nodesusing the FTCS model, setting |๐ฟ | = 1. In order to classify an unseenbrain network, we calculate the similarity of its found communitieswith ๐ถTD and ๐ถADHD and then predict its label as the label of thecommunity with maximum similarity. Here, we use the overlapcoefficient [72] as the similarity measure between two communities.
To ensure that the result is statistically significant, we repeatthis process for 1000 trials and report the mean, and its relativestandard deviation of accuracy, precision, recall and F1-score inTable 4. Interestingly, not only does our FTCS outperform baselinecommunity search models, but it also achieves results comparablewith the state-of-the-art ADHD classification model [32], basedon SVM, which reports an accuracy of 76%. It is noteworthy thatthis comparable result is achieved by the FTCS method, which is awhite-box model, and can be explained.
Case Studies: DBLP.We conduct a case study on the DBLP datasetto judge the quality of the AFTCS model and to show the effective-ness of the homophily score in removing free riders. The multilayerDBLP dataset is a collaboration network derived following themethodology in [8]. In this dataset, each node is a researcher, anedge shows collaboration, and each layer is a topic of research. Foreach author, we consider the bag of words resulting from the titlesof all their papers and apply LDA topic modeling [7] to automati-cally identify 240 topics. The attribute of each author is the vector
Table 4: Results of the ADHD classification task.CS Model Accuracy Precision Recall F1-score
FTCS 76.56 ยฑ 0.72 75.73 ยฑ 1.00 83.77 ยฑ 1.21 77.54 ยฑ 0.66ML-LCD 55.70 ยฑ 1.25 55.43 ยฑ 1.15 78.91 ยฑ 1.64 64.13 ยฑ 1.06RWM 50.47 ยฑ 0.18 53.03 ยฑ 2.08 55.09 ยฑ 0.41 45.59 ยฑ 1.18
(a) AFTCS Community
Security
Systems
Graph AlgorithmDatabases
Network
ML
Data Analysis
Health Informatics Cognitive Science
NLP
Free-riders Community
(b) Average attribute
Figure 9: Case study of DBLP.
that describes the distribution of their papers in these 240 topics.We use "Brian D. Athey" as the query node. The maximal (8, 2)-FirmTruss, including the query node, has 44 nodes with a minimumhomophily score of 0.08, shown in Figure 9(a). While generally,researchers in this FirmTruss work on "Data," the aspect of their re-search is different. The community found by AFTCS (๐ = โโ) is an(8, 2)-FirmTruss with a minimum homophily score of 0.28, whichresulted by removing 28 nodes as free-riders. The found commu-nity is shown in larger circle, and smaller circle shows free riders.We compute the average attributes of community members andfree-riders and then cluster their non-zero elements into ten knownresearch topics. Results are shown in Figure 9(b). While researchersin the found community have focused more on "Health Informat-ics," removed researchers have focused more on โDatabases.โ Theconnection of these two communities, which results in their unionbeing an (8, 2)-FirmTruss, is the collaboration of โBrian D. Athey,โfrom the โHealth Informaticsโ community with some researchersin โDatabasesโ community. AFTCS divides the maximal FirmTrussinto two communities with more correlations in each of them.
10 CONCLUSIONSWe propose and study a novel extended notion of truss decompo-sition in ML networks, FirmTruss, and establish its nice proper-ties. We then study a new problem of FirmTruss-based communitysearch over ML graphs. We show that the problem is NP-hard. Totackle it efficiently, we propose two 2-approximation algorithmsand prove that our approximations are tight. To further improvetheir efficiency, we propose an index and develop fast index-basedvariants of our approximation algorithms.We extend the FirmTruss-based community model to attributed ML networks and proposea homophily-based model making use of generalized ๐-mean. Weprove that this problem is also NP-hard for finite value of ๐ and tosolve it efficiently, we develop a fast greedy algorithm which hasa quality guarantee for ๐ โฅ 1. Our extensive experimental resultson large real-world networks with ground-truth communities con-firm the effectiveness and efficiency of our proposed models andalgorithms, while our case studies on brain networks and DBLPillustrate their practical utility.
12
REFERENCES[1] Carlo Abrate and Francesco Bonchi. 2021. Counterfactual Graphs for Explainable
Classification of Brain Networks. In Proceedings of the 27th ACM SIGKDD Con-ference on Knowledge Discovery; Data Mining (Virtual Event, Singapore) (KDDโ21). Association for Computing Machinery, New York, NY, USA, 2495โ2504.https://doi.org/10.1145/3447548.3467154
[2] Esra Akbas and Peixiang Zhao. 2017. Truss-Based Community Search: A Truss-Equivalence Based Indexing Approach. Proc. VLDB Endow. 10, 11 (aug 2017),1298โ1309. https://doi.org/10.14778/3137628.3137640
[3] Alberto Aleta and Yamir Moreno. 2019. Multilayer networks in a nutshell. AnnualReview of Condensed Matter Physics 10 (2019), 45โ62.
[4] M. Ashburner, C.A. Ball, and J.A. Blake et al. 2000. Gene Ontology: Tool for theUnification of Biology. Nature Genetics 25, 1 (2000), 25โ29.
[5] N. Azimi-Tafreshi, J. Gomez-Garde, and S. N. Dorogovtsev. 2014. k-corepercolation on multiplex networks. Physical Review E 90, 3 (Sep 2014).
[6] Bharat B Biswal, Maarten Mennes, Xi-Nian Zuo, Suril Gohel, Clare Kelly, Steve MSmith, Christian F Beckmann, Jonathan S Adelstein, Randy L Buckner, Stan Col-combe, et al. 2010. Toward discovery science of human brain function. Proceedingsof the National Academy of Sciences 107, 10 (2010), 4734โ4739.
[7] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent DirichletAllocation. J. Mach. Learn. Res. 3, null (mar 2003), 993โ1022.
[8] Francesco Bonchi, Aristides Gionis, Francesco Gullo, Charalampos E. Tsourakakis,and Antti Ukkonen. 2015. Chromatic Correlation Clustering. ACM Trans. Knowl.Discov. Data 9, 4, Article 34 (jun 2015), 24 pages. https://doi.org/10.1145/2728170
[9] Fabio Celli, F Marta L Di Lascio, Matteo Magnani, Barbara Pacelli, and Luca Rossi.2010. Social Network Data and Practices: the case of Friendfeed. In InternationalConference on Social Computing, Behavioral Modeling and Prediction (Lecture Notesin Computer Science). Springer Berlin Heidelberg, New York, NY, USA, โ.
[10] Tanmoy Chakraborty, Sikhar Patranabis, Pawan Goyal, and Animesh Mukherjee.2015. On the Formation of Circles in Co-Authorship Networks. In Proceedings ofthe 21th ACM SIGKDD International Conference on Knowledge Discovery and DataMining (Sydney, NSW, Australia) (KDD โ15). Association for Computing Machin-ery, New York, NY, USA, 109โ118. https://doi.org/10.1145/2783258.2783292
[11] Lu Chen, Chengfei Liu, Kewen Liao, Jianxin Li, and Rui Zhou. 2019. ContextualCommunity Search Over Large Social Networks. In 2019 IEEE 35th InternationalConference on Data Engineering (ICDE). IEEE, Macau SAR, China, 88โ99. https://doi.org/10.1109/ICDE.2019.00017
[12] Lu Chen, Chengfei Liu, Rui Zhou, Jianxin Li, Xiaochun Yang, and Bin Wang. 2018.Maximum Co-Located Community Search in Large Scale Social Networks. Proc.VLDB Endow. 11, 10 (2018), 1233โ1246. https://doi.org/10.14778/3231751.3231755
[13] Shuntong Chen, Huaien Qian, Yanping Wu, Chen Chen, and Xiaoyang Wang.2019. Efficient Adoption Maximization in Multi-layer Social Networks. In 2019International Conference on Data Mining Workshops (ICDMW). IEEE, Beijing,China, 56โ60. https://doi.org/10.1109/ICDMW.2019.00017
[14] Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, and Wei Wang. 2013. Onlinesearch of overlapping communities. In Proceedings of the 2013 ACM SIGMODinternational conference on Management of data. Association for ComputingMachinery, New York, NY, USA, 277โ288.
[15] M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. 2013. The Anatomy of aScientific Rumor. Scientific Reports 3, 1 (2013), โ.
[16] M. De Domenico, M. A. Porter, and A. Arenas. 2014. MuxViz: a tool for multilayeranalysis and visualization of networks. Journal of Complex Networks 3, 2 (Oct2014), 159โ176. https://doi.org/10.1093/comnet/cnu038
[17] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora. 2015. Structural reducibilityof multilayer networks. Nature communications 6 (2015), 6864.
[18] Zheng Dong, Xin Huang, Guorui Yuan, Hengshu Zhu, and Hui Xiong. 2021.Butterfly-Core Community Search over Labeled Graphs. Proc. VLDB Endow. 14,11 (jul 2021), 2006โ2018. https://doi.org/10.14778/3476249.3476258
[19] Yixiang Fang, Reynold Cheng, Yankai Chen, Siqiang Luo, and Jiafeng Hu. 2017.Effective and efficient attributed community search. The VLDB Journal 26, 6(2017), 803โ828.
[20] Yixiang Fang, Reynold Cheng, Siqiang Luo, and Jiafeng Hu. 2016. EffectiveCommunity Search for Large Attributed Graphs. Proc. VLDB Endow. 9, 12 (aug2016), 1233โ1244. https://doi.org/10.14778/2994509.2994538
[21] Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng,and Xuemin Lin. 2019. A Survey of Community Search Over Big Graphs. https://doi.org/10.48550/ARXIV.1904.12539
[22] Yixiang Fang, Zhongran Wang, Reynold Cheng, Hongzhi Wang, and Jiafeng Hu.2019. Effective and Efficient Community Search Over Large Directed Graphs.IEEE Transactions on Knowledge and Data Engineering 31, 11 (2019), 2093โ2107.
[23] Yixiang Fang, Zhongran Wang, Reynold Cheng, Hongzhi Wang, and Jiafeng Hu.2019. Effective and Efficient Community Search Over Large Directed Graphs (Ex-tended Abstract). In 2019 IEEE 35th International Conference on Data Engineering(ICDE). IEEE, Macau SAR, China, 2157โ2158. https://doi.org/10.1109/ICDE.2019.00273
[24] Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin, and Xin Cao. 2020.Effective and Efficient Community Search over Large Heterogeneous Infor-mation Networks. Proc. VLDB Endow. 13, 6 (Feb. 2020), 854โ867. https://doi.org/10.14778/3380750.3380756
[25] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3(2010), 75โ174. https://doi.org/10.1016/j.physrep.2009.11.002
[26] Santo Fortunato and Marc Barthรฉlemy. 2007. Resolution limit incommunity detection. Proceedings of the National Academy of Sci-ences 104, 1 (2007), 36โ41. https://doi.org/10.1073/pnas.0605965104arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.0605965104
[27] Amita Gajewar and Atish Das Sarma. 2012. Multi-skill Collaborative Teams basedon Densest Subgraphs. In Proceedings of the 2012 SIAM International Conference onData Mining (SDM). Society for Industrial and Applied Mathematics, California,USA, 165โ176. https://doi.org/10.1137/1.9781611972825.15
[28] Edoardo Galimberti, Francesco Bonchi, and Francesco Gullo. 2017. Core Decom-position and Densest Subgraph in Multilayer Networks. In Proceedings of the2017 ACM on Conference on Information and Knowledge Management (Singapore,Singapore) (CIKM โ17). Association for Computing Machinery, New York, NY,USA, 1807โ1816. https://doi.org/10.1145/3132847.3132993
[29] Edoardo Galimberti, Francesco Bonchi, Francesco Gullo, and Tommaso Lanciano.2020. Core Decomposition in Multilayer Networks: Theory, Algorithms, andApplications. ACM Trans. Knowl. Discov. Data 14, 1, Article 11 (2020), 40 pages.https://doi.org/10.1145/3369872
[30] Jun Gao, Jiazun Chen, Zhao Li, and Ji Zhang. 2021. ICS-GNN: LightweightInteractive Community Search via Graph Neural Network. Proc. VLDB Endow.14, 6 (feb 2021), 1006โ1018. https://doi.org/10.14778/3447689.3447704
[31] Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov,Vyas Sekar, and Dawn Song. 2012. Evolution of Social-Attribute Networks:Measurements, Modeling, and Implications Using Google+. In Proceedings of the2012 Internet Measurement Conference (Boston, Massachusetts, USA) (IMC โ12).Association for Computing Machinery, New York, NY, USA, 131โ144.
[32] Kristi R. Griffiths, Taylor A. Braund, Michael R. Kohn, Simon Clarke, Leanne M.Williams, and Mayuresh S. Korgaonkar. 2021. Structural brain network topologyunderpinning ADHD and response to methylphenidate treatment. TranslationalPsychiatry 11, 1 (02 Mar 2021), 150. https://doi.org/10.1038/s41398-021-01278-x
[33] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning forNetworks. arXiv:1607.00653 [cs.SI]
[34] Fangda Guo, Ye Yuan, Guoren Wang, Xiangguo Zhao, and Hao Sun. 2021.Multi-attributed Community Search in Road-social Networks. In 2021 IEEE37th International Conference on Data Engineering (ICDE). 109โ120. https://doi.org/10.1109/ICDE51399.2021.00017
[35] Farnoosh Hashemi, Ali Behrouz, and Laks V.S. Lakshmanan. 2022. FirmCoreDecomposition of Multilayer Networks. In Proceedings of the ACM Web Confer-ence 2022 (Virtual Event, Lyon, France) (WWW โ22). Association for ComputingMachinery, New York, NY, USA, 1589โ1600. https://doi.org/10.1145/3485447.3512205
[36] Allen L. Hu and Keith C. C. Chan. 2013. Utilizing Both Topological and AttributeInformation for Protein Complex Identification in PPI Networks. IEEE/ACMTransactions on Computational Biology and Bioinformatics 10, 3 (2013), 780โ792.https://doi.org/10.1109/TCBB.2013.37
[37] Jiafeng Hu, Xiaowei Wu, Reynold Cheng, Siqiang Luo, and Yixiang Fang. 2016.Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs.In Proceedings of the 25th ACM International on Conference on Information andKnowledge Management (Indianapolis, Indiana, USA) (CIKM โ16). Association forComputing Machinery, New York, NY, USA, 1241โ1250. https://doi.org/10.1145/2983323.2983748
[38] Hongxuan Huang, Qingyuan Linghu, Fan Zhang, Dian Ouyang, and Shiyu Yang.2021. Truss Decomposition on Multilayer Graphs. In 2021 IEEE InternationalConference on Big Data (Big Data). IEEE, virtual event, 5912โ5915. https://doi.org/10.1109/BigData52589.2021.9671831
[39] Ling Huang, Chang-Dong Wang, and Hong-Yang Chao. 2019. Higher-OrderMulti-Layer Community Detection. In Proceedings of the Thirty-Third AAAI Con-ference on Artificial Intelligence and Thirty-First Innovative Applications of ArtificialIntelligence Conference and Ninth AAAI Symposium on Educational Advances inArtificial Intelligence (Honolulu, Hawaii, USA) (AAAIโ19/IAAIโ19/EAAIโ19). AAAIPress, -, Article 1271, 2 pages. https://doi.org/10.1609/aaai.v33i01.33019945
[40] Xinyu Huang, Dongming Chen, Tao Ren, and Dongqi Wang. 2021. A surveyof community detection methods in multilayer networks. Data Mining andKnowledge Discovery 35, 1 (2021), 1โ45.
[41] Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. QueryingK-Truss Community in Large and Dynamic Graphs. In Proceedings of the 2014ACM SIGMOD International Conference on Management of Data (Snowbird, Utah,USA) (SIGMOD โ14). Association for Computing Machinery, New York, NY, USA,1311โ1322. https://doi.org/10.1145/2588555.2610495
[42] Xin Huang and Laks VS Lakshmanan. 2017. Attribute-driven community search.Proceedings of the VLDB Endowment 10, 9 (2017), 949โ960.
[43] Xin Huang, Laks V. S. Lakshmanan, Jeffrey Xu Yu, and Hong Cheng. 2015. Ap-proximate Closest Community Search in Networks. Proc. VLDB Endow. 9, 4 (dec
13
2015), 276โ287. https://doi.org/10.14778/2856318.2856323[44] Roberto Interdonato, Andrea Tagarelli, Dino Ienco, Arnaud Sallaberry, and Pascal
Poncelet. 2017. Local community detection in multilayer networks. Data Miningand Knowledge Discovery 31, 5 (2017), 1444โ1479.
[45] V. Jethava and N. Beerenwinkel. 2015. Finding Dense Subgraphs in RelationalGraphs. In Machine Learning and Knowledge Discovery in Databases, AnnalisaAppice, Pedro Pereira Rodrigues, Vรญtor Santos Costa, Joรฃo Gama, Alรญpio Jorge,and Carlos Soares (Eds.). Springer International Publishing, Cham, 641โ654.
[46] Yuli Jiang, Yu Rong, Hong Cheng, Xin Huang, Kangfei Zhao, and Junzhou Huang.2021. Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed. https://doi.org/10.48550/ARXIV.2104.03583
[47] Eileanoir B. Johnson, Elin M. Rees, Izelle Labuschagne, Alexandra Durr, Blair R.Leavitt, Raymund A.C. Roos, Ralf Reilmann, Hans Johnson, Nicola Z. Hobbs,Douglas R. Langbehn, Julie C. Stout, Sarah J. Tabrizi, and Rachael I. Scahill. 2015.The impact of occipital lobe cortical thickness on cognitive task performance:An investigation in Huntingtonโs Disease. Neuropsychologia 79 (2015), 138โ146.https://doi.org/10.1016/j.neuropsychologia.2015.10.033
[48] Samir Khuller and Barna Saha. 2009. On Finding Dense Subgraphs. In Proceedingsof the 36th International Colloquium on Automata, Languages and Programming(ICALP โ09). Springer-Verlag, -, 597โ608.
[49] Jungeun Kim and Jae-Gil Lee. 2015. Community Detection in Multi-Layer Graphs:A Survey. SIGMOD Rec. 44, 3 (dec 2015), 37โ48. https://doi.org/10.1145/2854006.2854013
[50] Mikko Kivelรค, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno,and Mason A. Porter. 2014. Multilayer networks. Journal of Complex Networks 2,3 (07 2014), 203โ271. https://doi.org/10.1093/comnet/cnu016
[51] Jรฉrรดme Kunegis. 2013. KONECT: The Koblenz Network Collection. In Proceedingsof the 22nd International Conference on World Wide Web (Rio de Janeiro, Brazil)(WWW โ13 Companion). Association for Computing Machinery, New York, NY,USA, 1343โ1350. https://doi.org/10.1145/2487788.2488173
[52] Tommaso Lanciano, Francesco Bonchi, and Aristides Gionis. 2020. ExplainableClassification of Brain Networks via Contrast Subgraphs. In Proceedings of the26th ACM SIGKDD International Conference on Knowledge Discovery; Data Mining(Virtual Event, CA, USA) (KDD โ20). Association for Computing Machinery, NewYork, NY, USA, 3308โ3318. https://doi.org/10.1145/3394486.3403383
[53] Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. 2007. The Dynamicsof Viral Marketing. ACM Trans. Web 1, 1 (May 2007), 5โes.
[54] Rong-Hua Li, Lu Qin, Fanghua Ye, Jeffrey Xu Yu, Xiaokui Xiao, Nong Xiao, andZibin Zheng. 2018. Skyline Community Search in Multi-Valued Networks. InProceedings of the 2018 International Conference on Management of Data (Houston,TX, USA) (SIGMOD โ18). Association for Computing Machinery, New York, NY,USA, 457โ472. https://doi.org/10.1145/3183713.3183736
[55] Rong-Hua Li, Jiao Su, Lu Qin, Jeffrey Xu Yu, and Qiangqiang Dai. 2018. Persis-tent community search in temporal networks. In 2018 IEEE 34th InternationalConference on Data Engineering (ICDE). IEEE, Paris, France, 797โ808.
[56] Boge Liu, Fan Zhang, Chen Zhang, Wenjie Zhang, and Xuemin Lin. 2019. Core-Cube: Core Decomposition in Multilayer Graphs. In Web Information SystemsEngineering โ WISE 2019, Reynold Cheng, Nikos Mamoulis, Yizhou Sun, and XinHuang (Eds.). Springer International Publishing, Cham, 694โ710.
[57] Qing Liu, Yifan Zhu, Minjun Zhao, Xin Huang, Jianliang Xu, and Yunjun Gao.2020. VAC: Vertex-Centric Attributed Community Search. In 2020 IEEE 36thInternational Conference on Data Engineering (ICDE). IEEE, Dallas, Texas, USA,937โ948. https://doi.org/10.1109/ICDE48307.2020.00086
[58] Xueming Liu, Enrico Maiorino, Arda Halu, Kimberly Glass, Rashmi B. Prasad,and Joseph Loscalzo. 2020. Robustness and lethality in multilayer biologicalmolecular networks. Nature Communications 11 (2020), โ.
[59] Dongsheng Luo, Yuchen Bian, Yaowei Yan, Xiao Liu, Jun Huan, and Xiang Zhang.2020. Local community detection in multiple networks. In Proceedings of the 26thACM SIGKDD international conference on knowledge discovery & data mining.Association for Computing Machinery, New York, NY, USA, 266โ274.
[60] Elisa Omodei, Manlio De Domenico, and Alex Arenas. 2015. Characterizinginteractions in online social networks during exceptional events. Frontiers inPhysics 3 (2015), 59. https://doi.org/10.3389/fphy.2015.00059
[61] Kaiyan Peng, Zheng Lu, Vanessa Lin, Michael R. Lindstrom, Christian Parkinson,Chuntian Wang, Andrea L. Bertozzi, and Mason A. Porter. 2021. A MultilayerNetwork Model of the Coevolution of the Spread of a Disease and CompetingOpinions. arXiv:2107.01713 [cs.SI]
[62] Jonathan D. Power, Alexander L. Cohen, Steven M. Nelson, Gagan S. Wig,Kelly Anne Barnes, Jessica A. Church, Alecia C. Vogel, Timothy O. Laumann,Fran M. Miezin, Bradley L. Schlaggar, and Steven E. Petersen. 2011. Functionalnetwork organization of the human brain. Neuron 72, 4 (17 Nov 2011), 665โ678.https://doi.org/10.1016/j.neuron.2011.09.006 S0896-6273(11)00792-6[PII].
[63] N. Roberts and S. Everton. 2011. The Noordin Top Terrorist Network Data. (2011).http://www.thearda.com/Archive/Files/Descriptions/
[64] Jacqueline F. Saad, Kristi R. Griffiths, Michael R. Kohn, Simon Clarke, Leanne M.Williams, and Mayuresh S. Korgaonkar. 2017. Regional brain network or-ganization distinguishes the combined and inattentive subtypes of Attention
Deficit Hyperactivity Disorder. NeuroImage: Clinical 15 (2017), 383โ390. https://doi.org/10.1016/j.nicl.2017.05.016
[65] Pramod Shinde and Sarika Jalan. 2015. A multilayer protein-protein interac-tion network analysis of different life stages in Caenorhabditis elegans. EPL(Europhysics Letters) 112, 5 (dec 2015), 58001. https://doi.org/10.1209/0295-5075/112/58001
[66] Mauro Sozio and Aristides Gionis. 2010. The Community-Search Problem andHow to Plan a Successful Cocktail Party. In Proceedings of the 16th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining (Washington,DC, USA). Association for Computing Machinery, New York, NY, USA, 939โ948.
[67] Mauro Sozio and Aristides Gionis. 2010. The Community-Search Problem andHow to Plan a Successful Cocktail Party. In Proceedings of the 16th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining (Washington,DC, USA) (KDD โ10). Association for Computing Machinery, New York, NY, USA,939โ948. https://doi.org/10.1145/1835804.1835923
[68] Yizhou Sun and Jiawei Han. 2013. Mining Heterogeneous Information Networks:A Structural Analysis Approach. SIGKDD Explor. Newsl. 14, 2 (apr 2013), 20โ28.https://doi.org/10.1145/2481244.2481248
[69] Yunkai Sun, Lei Zhao, Zhihui Lan, Xi-Ze Jia, and Shao-Wei Xue. 2020. Dif-ferentiating Boys with ADHD from Those with Typical Development Basedon Whole-Brain Functional Connections Using a Machine Learning Approach.Neuropsychiatric disease and treatment 16 (10 Mar 2020), 691โ702. https://doi.org/10.2147/NDT.S239013 PMC7071874[pmcid].
[70] Andrea Tagarelli, Alessia Amelio, and Francesco Gullo. 2017. Ensemble-BasedCommunity Detection in Multilayer Networks. Data Min. Knowl. Discov. 31, 5(sep 2017), 1506โ1543. https://doi.org/10.1007/s10618-017-0528-8
[71] Johan Ugander, Lars Backstrom, Cameron Marlow, and Jon Kleinberg. 2012.Structural diversity in social contagion. Proceedings of the National Academyof Sciences 109, 16 (2012), 5962โ5966. https://doi.org/10.1073/pnas.1116502109arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.1116502109
[72] MK Vijaymeena and K Kavitha. 2016. A survey on similarity measures in textmining. Machine Learning and Applications: An International Journal 3, 2 (2016).
[73] Jia Wang and James Cheng. 2012. Truss Decomposition in Massive Networks.Proc. VLDB Endow. 5, 9 (2012), 812โ823. https://doi.org/10.14778/2311906.2311909
[74] Yue Wang, Xun Jian, Zhenhua Yang, and Jia Li. 2017. Query Optimal k-PlexBased Community in Graphs. (2017). https://doi.org/10.1007/s41019-017-0051-3
[75] Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of โsmall-worldโ networks. Nature 393, 6684 (01 Jun 1998), 440โ442. https://doi.org/10.1038/30918
[76] YubaoWu, Ruoming Jin, Jing Li, and Xiang Zhang. 2015. Robust Local CommunityDetection: On Free Rider Effect and Its Elimination. Proc. VLDB Endow. 8, 7 (feb2015), 798โ809. https://doi.org/10.14778/2752939.2752948
[77] Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating NetworkCommunities Based on Ground-Truth. In Proceedings of the ACM SIGKDDWorkshop on Mining Data Semantics (Beijing, China) (MDS โ12). Associationfor Computing Machinery, New York, NY, USA, Article 3, 8 pages. https://doi.org/10.1145/2350190.2350193
[78] Yixing Yang, Yixiang Fang, Xuemin Lin, Wenjie Zhang, and Yixiang Fang. 2020.Effective and Efficient Truss Computation over Large Heterogeneous InformationNetworks. In 2020 IEEE 36th International Conference on Data Engineering (ICDE).901โ912. https://doi.org/10.1109/ICDE48307.2020.00083
[79] Long Yuan, Lu Qin, Wenjie Zhang, Lijun Chang, and Jianye Yang. 2018. Index-Based Densest Clique Percolation Community Search in Networks. IEEE Trans-actions on Knowledge and Data Engineering 30, 5 (2018), 922โ935. https://doi.org/10.1109/TKDE.2017.2783933
[80] Xuemeng Zhai, Wanlei Zhou, Gaolei Fei, Weiyi Liu, Zhoujun Xu, Chengbo Jiao,Cai Lu, and Guangmin Hu. 2018. Null model and community structure in multi-plex networks. Scientific reports 8, 1 (2018), 1โ13.
[81] Zhiwei Zhang, Xin Huang, Jianliang Xu, Byron Choi, and Zechao Shang. 2019.Keyword-centric community search. In 2019 IEEE 35th International Conferenceon Data Engineering (ICDE). IEEE, 422โ433.
[82] Zibin Zheng, Fanghua Ye, Rong-Hua Li, Guohui Ling, and Tan Jin. 2017. Findingweighted k-truss communities in large networks. Information Sciences 417 (2017),344โ360. https://doi.org/10.1016/j.ins.2017.07.012
[83] Qijun Zhu, Haibo Hu, Cheng Xu, Jianliang Xu, and Wang-Chien Lee.2017. Geo-Social Group Queries with Minimum Acquaintance Constraint.arXiv:1406.7367 [cs.DB]
[84] Rong Zhu, Zhaonian Zou, and Jianzhong Li. 2018. Diversified Coherent CoreSearch on Multi-Layer Graphs. In 2018 IEEE 34th International Conference on DataEngineering (ICDE). IEEE, 701โ712. https://doi.org/10.1109/ICDE.2018.00069
[85] Yuanyuan Zhu, Qian Zhang, Lu Qin, Lijun Chang, and Jeffrey Xu Yu. 2018.Querying Cohesive Subgraphs by Keywords. In 2018 IEEE 34th InternationalConference on Data Engineering (ICDE). https://doi.org/10.1109/ICDE.2018.00141
14
A PROOFSA.1 Properties
Property 1 (Uniqueness).
Proof. Suppose ๐ฝ_๐and ๐ฝ
โฒ_๐
are two distinct (๐, _)-FirmTrusses of๐บ . By Definition 6, ๐ฝ_
๐is a maximal subgraph such that โ๐ โ E[๐ฝ_
๐]
there are โฅ _ layers where the supports of ๐ in each of theselayers is โฅ ๐โ2. Similarly, ๐ฝ
โฒ_๐
is a maximal subgraph with the sameproperty. Then the subgraph ๐ฝ_
๐โช๐ฝ โฒ_
๐trivially satisfies the FirmTruss
conditions, contradicting the maximality of ๐ฝ_๐and ๐ฝ
โฒ_๐. โก
Property 2 (Hierarchical Structure).
Proof. (a) The property follows from the definition of FirmTrussand this fact that in a subgraph, every edge schema ๐ โ E that hasat least ๐โ1 supports, also has at least ๐โ2 supports. (b) Similarly, ifedge schema ๐ in at least _+1 layers has no less than ๐โ2 supports,in at least _ layers it has at least ๐ โ 2 supports as well. โก
Property 3 (Minimum Degree).
Proof. Given an edge schema (๐ฃ,๐ข) โ E, where ๐ข, ๐ฃ โ ๐ , since(๐ฃ,๐ข) in at least _ layers has at least ๐ โ 2 supports, ๐ข and ๐ฃ has atleast ๐ โ 2 common neighbours in at least _ layers. Therefore, eachhas degree at least ๐ โ 1 in at least _ layers. โก
A.2 Theorems and Lemmas
Theorem 1 (Density Lower Bound).
Proof. First, we provide a lower bound for the mean of average
degree over all layers:โ|๐ฟ |
โ=1 |๐ธโ [๐ฝ _๐ ] ||๐ฟ | | ๐ฝ _
๐| . We can rewrite the numerator
as half of the sum of degrees of each node in each layer. In fact,
|๐ฟ |โ๏ธโ=1|๐ธโ [๐ฝ_๐ ] | =
12
ยฉยญยญยซ|๐ฟ |โ๏ธโ=1
โ๏ธ๐ขโ๐ฝ _
๐
deg๐ฝ _๐
โ(๐ข)
ยชยฎยฎยฌ . (1)
Since ๐ฝ_๐is a FirmTruss, each edge schema ๐ = (๐ฃ,๐ข) in at least _
layers has at least ๐ โ 2 supports. Accordingly, ๐ฃ and ๐ข each in atleast _ layers have at least ๐ โ 1 neighbours. Therefore,
|๐ฟ |โ๏ธโ=1
โ๏ธ๐ขโ๐ฝ _
๐
deg๐ฝ _๐
โ(๐ข) โฅ _(๐ โ 1) |๐ฝ_
๐|. (2)
Based on Equations (1) and (2), we can conclude that the meanof average degree over all layers is at least _ (๐โ1)
2 |๐ฟ | . Since it is theaverage over all layers, there exist a layer โ1 โ ๐ฟ such that:
|๐ธโ1 [๐ฝ_๐ ] |
|๐ฝ_๐|โฅ _(๐ โ 1)
2|๐ฟ | .
Now, ignoring this layer, and exploiting the definition of ๐ฝ_๐, each
edge schema ๐ = (๐ฃ,๐ข) in at least _ โ 1 layers has at least ๐ โ 2supports. Accordingly, ๐ฃ and ๐ข each in at least _ โ 1 layers haveat least ๐ โ 1 neighbours. Therefore, the mean of average degreeover remaining layers, ๐ฟ \ {โ1}, is at least (_โ1) (๐โ1)2 |๐ฟ | . Again, we
can conclude that there is a layer โ2 โ โ1 such that|๐ธโ2 [๐ฝ
_๐] |
| ๐ฝ _๐| โฅ
(_โ1) (๐โ1)2 |๐ฟ | . So there is a subset of ๐ฟ, ๏ฟฝฬ๏ฟฝ = {โ1, โ2}, with two layers
such that each layer has density at least (_โ1) (๐โ1)2 |๐ฟ | . By iteratingthis process, we can conclude that โ๏ฟฝฬ๏ฟฝ โ ๐ฟ, such that each layerโ โ ๏ฟฝฬ๏ฟฝ has density at least (_โ|๏ฟฝฬ๏ฟฝ |+1) (๐โ1)2 |๐ฟ | . Based on the definition ofthe density in ML graphs, we can conclude that:
๐๐ฝ (๐ฝ_๐ ) โฅ(_ โ |๏ฟฝฬ๏ฟฝ | + 1) (๐ โ 1)
2|๐ฟ | |๏ฟฝฬ๏ฟฝ |๐ฝ .
The right hand side of the above inequality is a function of |๏ฟฝฬ๏ฟฝ |,and since the above inequality is valid for each arbitrary integer1 โค |๏ฟฝฬ๏ฟฝ | โค _, ๐๐ฝ (๐ฝ_๐ ) is not less than the maximum of this functionexhibited by these values. โก
Theorem 2 (Diameter Upper Bound).
Proof. Let path
P : ๐ฃ1โ1 โ ๐ฃ2โ2 โ ยท ยท ยท โ ๐ฃ๐ผโ๐ผ,
be the diameter of๐บ [๐ฝ_๐], and for each edge schema ๐ โ E[๐ฝ_
๐], we
define ๐ฟ๐ = {โ โฒ1, . . . , โโฒ_โฒ} as the set of layers such that in each layer
โ โฒ๐, ๐โโฒ
๐has at least ๐ โ 2 supports. Note that _โฒ โฅ _. Let ๐ก โ N such
that ๐ก๐ก+1 |๐ฟ | > _ โฅ ๐กโ1
๐ก |๐ฟ |, then we show that there is a path
P โฒ : ๐ค1โโฒโฒ1โ ๐ค2
โโฒโฒ2โ ยท ยท ยท โ ๐ค๐ผโฒ
โโฒโฒ๐ผโฒ,
such that its path schema is the same as the path schema of P, andfor each 0 โค ๐ โค โ |P
โฒ |๐ก โ โ 1 all {๐ค
๐๐ก+1,๐ค๐๐ก+2, . . . ,๐ค๐๐ก+๐ก } are in thesame layer โโ๐ such that:
โโ๐ โ๐กโ1โ๐=1
๐ฟ(๐ฃ๐๐ก+๐ ,๐ฃ๐๐ก+๐+1) .
To show that, let๐ be the path schema of P, and ๐ข1, ๐ข2, . . . , ๐ข๐ก bethe ๐ก consecutive vertices in the path schema. For each edge schema๐๐ = (๐ข๐ , ๐ข๐+1), we know that there are at least _ layers โ โ ๐ฟ๐๐ inwhich ๐๐
โhas at least ๐ โ 2 support. Thus, since _ โฅ ๐กโ1
๐ก |๐ฟ |, basedon the pigeonhole principle there is a layer โโ such that all ๐๐ inโโ has at least ๐ โ 2 supports. Therefore, we have constructed pathP โฒ such that for each 0 โค ๐ โค โ |๐ |+1๐ก โ โ 1:
(1) vertices {๐ข๐๐ก+1, ๐ข๐๐ก+2, . . . , ๐ข๐๐ก+๐ก } of the path are in the samelayer โโ๐ ,
(2) we have an inter-layer edge in the path that ๐ข๐๐ก+๐กโโ๐โ ๐ข๐๐ก+๐ก
โโ๐+1
.
Due to the number of inter-layer edges, which is at most |๐ |๐ก ,the length of the P โฒ is at most (1 + 1/๐ก) |๐|.
Next we show that the length of ๐ is bounded by โ 2 | ๐ฝ_๐|โ2
๐โ.
Given the part of ๐, ๐๐ : ๐ข๐๐ก+1 โ ๐ข๐๐ก+2 โ ยท ยท ยท โ ๐ข๐๐ก+๐ก , that alledges are in layer โโ๐ , for any 0 โค ๐ โค โ |๐ |+1๐ก โ. We know that eachedge (๐ข๐๐ก+๐ , ๐ข๐๐ก+๐+1, โโ๐ ) has at least ๐ โ 2 supports. Thus, each pair(๐ข๐๐ก+๐ , ๐ข๐๐ก+๐+1) has at least ๐ โ 2 common neighbours in layer โโ๐ .We divide the set of common neighbours into two sets ๐ด๐
๐and ๐ต๐
๐:
each ๐ด๐๐holds those vertices not in ๐๐ that are adjacent to both
๐ข๐๐ก+๐ and ๐ข๐๐ก+๐+1, and to no other vertices in P โฒ๐ ; each ๐ต๐๐holds
those vertices that are adjacent to all three of ๐ข๐๐ก+๐โ1, ๐ข๐๐ก+๐ and15
๐ข๐๐ก+๐+1. Note that the ๐ต๐๐sets are disjoint, else we can find a path
with smallest length. Let
|๐โโ๐ | = |๐๐ | + 1 +|๐๐ |โ1โ๏ธ๐=0|๐ด๐
๐ | +|๐๐ |โ1โ๏ธ๐=1|๐ต๐๐ |.
For each ๐ , |๐ด๐๐| = ๐ โ 2 โ |๐ต๐
๐| and also |๐ต๐
๐| + |๐ต๐
๐+1 | โค ๐ โ 2. Thus:
|๐โโ๐ | โฅ 1 + |๐๐ | (๐ โ 1) โ โ|๐๐ |2โ (๐ โ 2)
โโ๏ธ๐
|๐โโ๐ | โฅ โ|๐| + 1
๐กโ + (๐ โ 1)
โ๏ธ๐
|๐๐ | โ (๐ โ 2)โ๏ธ๐
โ |๐๐ |2โ
โฅ โ |๐| + 1๐กโ + |๐| (๐ โ 1) โ โ |๐|
2โ (๐ โ 2)
Note that each node cannot be counted in two ๐โโ๐ , else we can finda path with smallest length. Therefore,
|๐ฝ_๐| โฅ
โ๏ธ๐
|๐โโ๐ | โฅ โ|๐| + 1
๐กโ + |๐| (๐ โ 1) โ โ |๐|
2โ (๐ โ 2)
โ |๐| โค โ2|๐ฝ_
๐| โ 2๐
โ
Overall, we have:
(1 + 1/๐ก) โ2|๐ฝ_
๐| โ 2๐
โ โฅ (1 + 1/๐ก) |๐| โฅ |P โฒ | โฅ |P|.
The last (right) inequality comes from the fact that the start andend of paths P and P โฒ are the same. Accordingly, if the length ofP โฒ is less than P, then P โฒ is the shortest path between the startand end nodes, which contradicts the assumption that P is thediameter. โก
Theorem 3 (Edge Connectivity).
Proof. Let ๐ถ be a minimal set of intra-layer edges whose re-moval will divide the ๐บ [๐ฝ_
๐] into separate components, and ๐ =
(๐ข, ๐ฃ, โ) โ ๐ถ . When all of the edges in ๐ถ are removed, the vertices๐ฃ and ๐ข will be in separate components, else the removal of ๐ wasunnecessary, violating the minimality of ๐ถ . Since (๐ฃ,๐ข) โ E(๐ฝ_
๐)
there are at least _ layers ๏ฟฝฬ๏ฟฝ = {โ1, . . . , โ_} in which ๐ฃ and ๐ข arejoined by at least ๐ โ 2 independent paths of length 2. Therefore, ineach layer โ๐ โ ๏ฟฝฬ๏ฟฝ, at least ๐ โ 1 edges must be removed to place ๐ฃand ๐ข in separate components. Therefore, |๐ถ | โฅ _(๐ โ 1). โก
Theorem 4 (๐-FTCS Hardness).
Proof. We reduce the well-known NP-hard problem of Maxi-mum Clique (decision version) to ๐-FTCS problem. Given a single-layer graph๐บ = (๐ , ๐ธ) and integer๐ , theMaximumClique Decisionproblem is to check whether๐บ contains a clique of size ๐ . From this,construct an instance ๐บ โฒ = (๐ โฒ, ๐ธ โฒ, ๐ฟ), where ๐ โฒ = ๐ , |๐ฟ | is an arbi-trary number โฅ _, ๐ธ โฒ
โ= ๐ธ for each โ โ ๏ฟฝฬ๏ฟฝ = {โ1, . . . , โ_} and ๐ธ โฒ
โ= โ
for โ โ ๐ฟ/๏ฟฝฬ๏ฟฝ, parameters ๐ and ๐ = 1, and the empty set of querynodes ๐ = โ . We show that the instance of the Maximum Clique(decision version) problem is a YES-instance iff the correspondinginstance of ๐-FTCS problem is a YES-instance.(โ) : A set of vertices ๐ป โ ๐ that in any layer โ โ ๏ฟฝฬ๏ฟฝ is
a clique with at least ๐ vertices is a connected (๐, _)-FirmTruss
with diameter 1 since each two node have at least ๐ โ 2 commonneighbours in _ layers.(โ) : Given a solution ๐ป for ๐-FTCS problem, so each edge
schema in at least _ layers has at least ๐ โ 2 supports, but since forโ โ ๐ฟ/๏ฟฝฬ๏ฟฝ the set of edges in layer โ is empty, all edge schema eachin layers โ โ ๏ฟฝฬ๏ฟฝ has supports of at least ๐ โ 2. Given a layer โ โ ๏ฟฝฬ๏ฟฝ,the distance of two arbitrary nodes ๐ฃ โ ๐ข โ ๐ป , in ๐บ โฒ
โ[๐ป ] is 1 since
if their shortest path contains an inter-layer edge, then its length isat least 2. Therefore, ๐บ โฒ
โ[๐ป ] contains a clique of size at least ๐ . โก
Theorem 5 (FTCS Non-Approximability).
Proof. We prove it by contradiction. Assume that there ex-ists a (2 โ ๐)-approximation algorithm for the FTCS problem inpolynomial-time complexity, no matter how small the ๐ โ (0, 1)is. Therefore, there is an algorithm A that provides a solution ๐ป
with an approximation factor (2โ๐) of the optimal solution๐ปโ. Letthe set of query nodes empty ๐ = โ , ๐บ = (๐ , ๐ธ) be an instance ofMaximum Clique Decision problem, and construct๐บ โฒ = (๐ โฒ, ๐ธ โฒ, ๐ฟ),where ๐ โฒ = ๐ , |๐ฟ | = _, and for each โ โ ๐ฟ : ๐ธ โฒ
โ= ๐ธ.
(โ) : Clearly, if A outputs a solution ๐บ โฒ[๐ป ] that is (๐, _)-FirmTruss with diameter 1, then ๐บโ [๐ป ] is a clique of size at least ๐for each โ โ ๐ฟ.(โ) : Suppose ๐๐๐๐(๐บ โฒ[๐ป ]) โฅ 2, so:
2๐๐๐๐(๐บ โฒ[๐ปโ]) > (2 โ ๐)๐๐๐๐(๐บ โฒ[๐ปโ]) โฅ ๐๐๐๐(๐บ โฒ[๐ป ]) โฅ 2.
Since diameter is an integer, we deduce that ๐๐๐๐(๐บ โฒ[๐ปโ]) โฅ 2.In this case, ๐บ โฒ
โ[๐ปโ] cannot possibly contain a clique of size ๐ , for
if it did, that clique would be the optimal solution to the FTCSproblem with diameter of 1. Therefore, since ๐บ โฒ
โ[๐ปโ] = ๐บ [๐ป ], we
can distinguish between the YES and NO instances of the MaximumCliqueDecision problem in polynomial-time, which is contradiction.
โก
Theorem 6 (FTCS Free-Rider Effect).
Proof. Let๐บ [๐ป ] be the maximal subgraph among all optimal so-lutions to the FTCS problem, and๐บ [๐ปโ] be any query-independentoptimal solution.We prove this theorem by contradiction. If๐บ [๐ปโ]\๐บ [๐ป ] โ โ , ๐บ [๐ป โช ๐ปโ] is a connected (๐, _)-FirmTruss containing ๐ , and๐๐๐๐(๐บ [๐ป โช ๐ปโ]) โค ๐๐๐๐(๐บ [๐ป ]), then ๐บ [๐ป โช ๐ปโ] is an optimalsolution to the FTCS problem while ๐บ [๐ป ] โ ๐บ [๐ป โช ๐ปโ], violatingthe maximality of ๐บ [๐ป ]. โก
Theorem 7 (FTCS-Global Quality Approximation).
Proof. Motivate by [43], we first prove:
๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐) .
Algorithm 1 outputs a sequence of intermediate graphs (in line12) ๐บ = {๐บ0,๐บ1, . . . ,๐บ๐โ1}, which are (๐, _)-FirmTruss containingquery vertices ๐ . Therefore, we have ๐บ๐โ1 โ ยท ยท ยท โ ๐บ1 โ ๐บ0 โ๐บ and also since the optimal solution is also a (๐, _)-FirmTruss,๐บ [๐ปโ] โ ๐บ0. We consider two cases:
(I) ๐บ [๐ปโ] โ ๐บ๐โ1. Suppose the first deleted vertex ๐ขโ โ ๐ปโ
happens in graph ๐บ๐ , this vertex must be deleted because of the16
distance constraint but not the FirmTruss maintenance. Accord-ingly, ๐๐๐ ๐ก๐บ๐
(๐บ๐ , ๐) = ๐๐๐ ๐ก๐บ๐ (๐ขโ, ๐) = ๐๐๐ ๐ก๐บ๐
(๐ปโ, ๐). Since ๐ป hasthe smallest query distance in ๐บ and ๐บ [๐ปโ] โ ๐บ๐ , we have:
๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค ๐๐๐ ๐ก๐บ๐ (๐บ๐ , ๐) = ๐๐๐ ๐ก๐บ๐
(๐ปโ, ๐) โค ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐)(II)๐บ [๐ปโ] โ ๐บ๐โ1. We prove this case by contradiction. Assume
that ๐๐๐ ๐ก๐บ๐โ1 (๐บ๐โ1, ๐) > ๐๐๐ ๐ก๐บ๐โ1 (๐ปโ, ๐), then there exist a vertex๐ขโ โ ๐บ๐โ1\๐ปโ with the largest query distance ๐๐๐ ๐ก๐บ๐โ1 (๐ขโ, ๐) =๐๐๐ ๐ก๐บ๐โ1 (๐บ๐โ1, ๐) > ๐๐๐ ๐ก๐บ๐โ1 (๐ปโ, ๐). In the next iteration, Algo-rithm 1 will delete ๐ขโ from๐บ๐โ1, and maintain the FirmTruss struc-ture of๐บ๐โ1. Since ๐บ [๐ปโ] โ ๐บ๐โ1, the output of this iteration is afeasible FirmTruss. This contradicts that ๐บ๐โ1 is the last feasibleFirmTruss. Therefore, ๐๐๐ ๐ก๐บ๐โ1 (๐บ๐โ1, ๐) โค ๐๐๐ ๐ก๐บ๐โ1 (๐ปโ, ๐), so wehave:
๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค ๐๐๐ ๐ก๐บ๐โ1 (๐บ๐โ1, ๐) โค ๐๐๐ ๐ก๐บ๐โ1 (๐ปโ, ๐)
โค ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐).
From abovewe have proved that๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐).Not that based on the triangle inequality
๐๐๐๐(๐บ [๐ป ]) โค 2๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) .Thus, we can conclude that:
๐๐๐๐(๐บ [๐ป ]) โค 2๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค 2๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐)โค 2๐๐๐๐(๐บ [๐ปโ]).
โก
Lemma 1 (Time Complexity of Maximal (๐, _)-FirmTruss).
Proof. Using a heap, we can find the _-th largest element ofvector S in O(|๐ฟ | + _ log |๐ฟ |) time so lines 4-7 takeO(|๐ธ | ( |๐ฟ | + _ log |๐ฟ |)) time. We remove each edge schema exactlyone time from the buckets, taking time O(|๐ธ |). For โ โ ๐ฟ, let๐๐โโฅ (๐ข) = {๐ฃ | (๐ฃ,๐ข, โ) โ ๐ธ,๐๐๐โ (๐ฃ) โฅ ๐๐๐โ (๐ข)}, lines 10-16 takeO(โ(๐ฃ,๐ข) โE โ
โโ๐ฟ ๐๐๐โ (๐ข) |๐๐โโฅ (๐ข) |). For each layer โ , we know that|๐๐โโฅ (๐ข) | โค 2
โ๏ธ|๐ธโ | [73], so we can conclude that lines 10-16 take
O(โโโ๐ฟ |๐ธโ |1.5). Finally, to update the Top-_ support for a givenedge schema we need O(|๐ฟ |), so lines 17-20 takes O(|๐ธ | |๐ฟ |) time. Inthis algorithm, we only need to store and keep the support vectorof each edge schema so it takes O(|๐ธ | |๐ฟ |) space. โก
Theorem 8 (FTCS-Global Complexity).
Proof. First, Algorithm 1 finds the maximal (๐, _)-FirmTrussof ๐บ , which takes O(โโโ๐ฟ |๐ธโ |1.5 + |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) time asanalyzed in Lemma 1. Then, it iteratively deletes the vertex withthe largest query distance starting from ๐บ0 with |๐ธ0 | edges. Ineach iteration, the algorithm needs to compute the query distanceusing O(|๐ | |๐ธ0 |) time and to perform FirmTruss maintenance us-ing O(โโโ๐ฟ |๐ธโ |1.5). Therefore, for ๐ก iterations the time cost isO(๐ก ( |๐ | |๐ธ0 | +
โโโ๐ฟ |๐ธโ |1.5)). In total, FTCS-Global algorithm takes
O(๐ก ( |๐ | |๐ธ0 | +โโโ๐ฟ |๐ธโ |1.5) + |๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) time. The space
complexity of FTCS-Global algorithm is dominated by finding ๐บ0,which takes O(|๐ธ | |๐ฟ |) space (Lemma 1). โก
Theorem 9 (FTCS-Local Quality Approximation).
Proof. By contradiction, we first show that ๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐). Let๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) > ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐), so thereexists another FirmTruss with smallest diameter, so when ๐ =
๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐), the algorithm must find a FirmTruss with querydistance of ๐ , which is contradiction as ๐ = ๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) is thesmallest value of ๐ that the algorithm finds a non-empty FirmTruss.Thus, ๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค ๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐). Based on triangle in-equality, ๐๐๐๐(๐บ [๐ป ]) โค 2๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐). Overall:
๐๐๐๐(๐บ [๐ป ]) โค 2๐๐๐ ๐ก๐บ [๐ป ] (๐ป,๐) โค 2๐๐๐ ๐ก๐บ [๐ป โ ] (๐ปโ, ๐)โค 2๐๐๐๐(๐บ [๐ปโ]).
โกTheorem 10 (FTCS-Local Complexity).
Proof. The proof is similar to analysis of time complexity ofFTCS-Global algorithm. โก
Theorem 11 (FirmTruss Decomposition Complexity).
Proof. By efficient computing of Topโ_ degree for all valuesof _, lines 4-6 take O(|๐ธ | |๐ฟ | log |๐ฟ |)) time. Given _, we removeeach edge schema exactly one time from the buckets, so lines 9-10take O(|๐ธ | |๐ฟ |). For โ โ ๐ฟ, let ๐๐โโฅ (๐ข) = {๐ฃ | (๐ฃ,๐ข, โ) โ ๐ธ,๐๐๐โ (๐ฃ) โฅ๐๐๐โ (๐ข)}. So for a given _, lines 11-17 takeO(โ(๐ฃ,๐ข) โE โ
โโ๐ฟ ๐๐๐โ (๐ข) |๐๐โโฅ (๐ข) |). For each layer โ , we know that|๐๐โโฅ (๐ข) | โค 2
โ๏ธ|๐ธโ | [73], so we can conclude that lines 11-17 take
O(โโโ๐ฟ |๐ธโ |1.5). Finally, to update the buckets for a given edgeschema we need O(|๐ฟ |), so line 20 takes O(|๐ธ | |๐ฟ |2) time. โก
Theorem 13 (AFTCS Hardness).
Proof. We reduce the problem of finding the densest subgraphwith at least ๐ vertices in single-layer graphs. Here, for simplicity,we assume that ๐ = 1, we can easily extend this proof to any finite๐ . Given a single-layer graph ๐บ = (๐ , ๐ธ), construct an instance๐บ โฒ = (๐ โฒ, ๐ธ โฒ, ๐ฟ,ฮจ), where ๐ โฒ = ๐ , and for each โ โ ๐ฟ, ๐บ โฒ
โis a
complete graph over ๐ โฒ. Assume that ๐ โฒ = {๐ข1, ๐ข2, . . . , ๐ข๐}, where๐ = |๐ โฒ |. For each vertex ๐ข๐ โ ๐ โฒ, we construct a vector of size๐2 + ๐ as their attribute vector as follows:
ฮจ(๐ข๐ ) ๐ =
1 if 1 โค ๐ โค ๐2 and (๐ข๐ , ๐ข ๐โ(๐โ1)ร๐) โ ๐ธor (๐ข ๐ , ๐ข๐โ( ๐โ1)ร๐) โ ๐ธโ๏ธ
2๐ โ 2Nb(๐ข๐ ) if ๐ = ๐2 + ๐
0 otherwise
,
where Nb(๐ข) is the number of ๐ขโs neighbours in๐บ . Now for eachtwo vertices ๐ข and ๐ฃ that (๐ข, ๐ฃ) โ ๐ธ : โ(๐ข, ๐ฃ) = 1
2๐ and if (๐ข, ๐ฃ) โ ๐ธ :โ(๐ข, ๐ฃ) = 0. Now, a solution to AFTCS on๐บ โฒ is the densest subgraphwith at least ๐ vertices in ๐บ . Also, each densest subgraph with atleast ๐ vertices is a (๐, _)-FirmTruss with the maximum homophilyscore. โก
Theorem 14 (AFTCS Free-Rider Effect).
Proof. Let๐บ [๐ป ] be the maximal subgraph among all optimal so-lutions to theAFTCS problem, and๐บ [๐ปโ] be any query-independentoptimal solution.We prove this theorem by contradiction. If๐บ [๐ปโ]\๐บ [๐ป ] โ โ , ๐บ [๐ป โช ๐ปโ] is a connected (๐, _)-FirmTruss containing ๐ , and
17
ฮ๐ (๐บ [๐ปโช๐ปโ]) โฅ ฮ๐ (๐บ [๐ป ]), then๐บ [๐ปโช๐ปโ] is an optimal solutionto the AFTCS problem while ๐บ [๐ป ] โ ๐บ [๐ป โช ๐ปโ], violating themaximality of ๐บ [๐ป ]. โก
Theorem 15 (AFTCS-Approx Complexity).
Proof. The time complexity of finding ๐บ0 is O(โโโ๐ฟ |๐ธโ |1.5 +
|๐ธ | |๐ฟ | + |๐ธ |_ log |๐ฟ |) (Lemma 1). Let ๐ก be the number of iterations,and๐0 and ๐ธ0 be the vertex set and edge set of๐บ0. The computationof homophily score for all vertices takes O(๐ |๐0 |2) time, where ๐is the dimension of the attribute vector. The time complexity ofmaintenance algorithm is O(๐ก |๐0 | + |๐ธ0 |) in ๐ก iterations, since eachedge in ๐บ0 is removed at most one time in all iteration and in eachiteration we need to check the Top-_ support of each edge schema.Since in each iteration we delete at least one node and its incidentedges in all layers, which is at least (๐ โ 1) ร _, the total number ofiterations ๐ก is O(min{|๐0 | โ ๐, |๐ธ0 |
_.(๐โ1) }). โก
Theorem16 (AFTCS-ApproxQualityApproximation).Wefirstmention two useful lemmas:
Lemma 2. Given ๐, ๐ โ R+ and ๐ โฅ ๐, and a real value ๐ โฅ 1:
๐๐ โ (๐ โ ๐)๐ โค ๐๐๐โ1๐.
Proof. Given ๐ โฅ 1 and ๐, ๐ โ R+, let ๐ (๐ฅ) = ๐ฅ๐ , โ๐ฅ โ R. Basedon the mean value theorem, there is a ๐ โ [๐ โ ๐, ๐] such that:
๐ โฒ(๐) = ๐ (๐) โ ๐ (๐ โ ๐)๐
โ ๐๐๐โ1 โฅ ๐๐๐โ1 = ๐ โฒ(๐) = ๐๐ โ (๐ โ ๐)๐๐
โก
Lemma 3. Given ๐ โฅ 1, function ฮ(๐) = |๐ |ฮ๐๐ (๐) =โ
๐ฃโ๐ โ๐ (๐ฃ)๐ issupermodular.
Proof. We know that function ๐ (๐ฅ) = ๐ฅ๐ for ๐ โฅ 1 and ๐ฅ โฅ 0is increasing convex function. Since function โ๐ (๐ฃ) is a modularfunction, its combination with ๐ is supermodular. Accordingly, ๐-thpower of โ๐ (๐ฃ) is a supermodular function. Finally, we know thatthe summation of supermodular functions is supermodular. โก
Based on lemmas 2 and 3, we can prove the theorem:
Proof. Let ๐ปโ be the optimal solution and ๐ป be the output ofAlgorithm 6. Since ๐ปโ is the optimal solution, removing a node๐ขโ will produce a subgraph with homophily score at most ฮ๐๐ (๐ปโ),therefore, we have:
1|๐ปโ | โ 1
(|๐ปโ |ฮ๐๐ (๐ปโ) โ ฮ๐ขโ (๐ปโ)
)โฅ ฮ
๐๐ (๐ปโ)
โ ฮ๐๐ (๐ปโ) โค ฮ๐ขโ (๐ปโ) .
Note that ๐ขโ is an arbitrary vertex in ๐ปโ and ๐ปโ\{๐ขโ} is not neces-sarily a (๐, _)-FirmTruss. Since |๐ |ฮ๐๐ (๐) is a supermodular functionfor ๐ โฅ 1, when๐ปโ โ ๐ ,ฮ๐ขโ (๐ปโ) โค ฮ๐ขโ (๐) (increasing differences).Let ๐ denote the subgraph maintained by the algorithm right beforethe first node ๐ขโ โ ๐ปโ is removed. This node cannot be removedby maintaining the subgraph (line 10) as ๐ขโ โ ๐ปโ and ๐ปโ โ ๐ is a(๐, _)-FirmTruss. Since ๐ขโ is the first node to be removed from ๐ปโ,๐ปโ โ ๐ and so ฮ๐ขโ (๐ปโ) โค ฮ๐ขโ (๐). Since ๐ โฅ 1, in each iteration weremove a node with minimum value of ฮ, ๐ขโ = argmin๐ขโ๐ ฮ๐ข (๐),
so ฮ๐ขโ (๐) is smaller than the average value of ฮ๐ข (๐) across ๐ข โ ๐ .Therefore, we have:
ฮ๐๐ (๐ปโ) โค ฮ๐ขโ (๐ปโ) โค ฮ๐ขโ (๐) โค
1|๐ |
โ๏ธ๐ขโ๐
ฮ๐ข (๐)
=1|๐ |
ยฉยญยซโ๏ธ๐ขโ๐
โ๐ (๐ข)๐ +โ๏ธ๐ขโ๐
โ๏ธ๐ฃโ๐/{๐ข }
โ๐ (๐ฃ)๐ โ [โ๐ (๐ฃ) โ โ(๐ฃ,๐ข)]๐ยชยฎยฌ
โค 1|๐ |
ยฉยญยซโ๏ธ๐ขโ๐
โ๐ (๐ข)๐ +โ๏ธ๐ขโ๐
โ๏ธ๐ฃโ๐/{๐ข }
๐โ๐ (๐ฃ)๐โ1โ(๐ฃ,๐ข)ยชยฎยฌ
=1|๐ |
(โ๏ธ๐ขโ๐
โ๐ (๐ข)๐ + ๐โ๏ธ๐ขโ๐
โ๐ (๐ฃ)๐)= (๐ + 1)ฮ๐๐ (๐)
The one before last step follows from the inequality in Lemma 2. Ac-cordingly, ฮ๐ (๐ปโ) โค (๐ + 1)1/๐ฮ๐ (๐). Since ๐ โ {๐บ0,๐บ1, . . . ,๐บ๐โ1},and ๐ป is the output of the algorithm, we have:
ฮ๐ (๐ปโ) โค (๐ + 1)1/๐ฮ๐ (๐) โค (๐ + 1)1/๐ฮ๐ (๐ป ) .
โก
Theorem 17 (Correctness of Exact-MaxMin AFTCS).
Proof. Let ๐ป be the found subgraph by Exact-MaxMin algo-rithm and ๐ปโ be the optimal solution, we prove this theorem bycontradiction. If๐ป โ ๐ปโ, let ๐ be the first time that a vertex๐ขโ โ ๐ปโis removed. Accordingly, ๐ปโ โ ๐บ๐ and since ๐ขโ is removed, it mustbe removed when we remove a vertex with a minimum value ofโ๐ but ๐ปโ is a subgraph with the maximum minimum homophilyscore. Thus, ฮโโ (๐ปโ) โค โ๐ป โ (๐ขโ) โค โ๐บ๐
(๐ฃ) for any ๐ฃ โ ๐ [๐บ๐ ].Therefore, ฮโโ (๐ปโ) โค ฮโโ (๐บ๐ ), which is contradiction. โก
B TIGHTNESS EXAMPLES
Density Lower Bound. For simplicity assume that ๐ฝ = 0 (for othervalues of ๐ฝ the example is similar), for a given ๐ and _, we constructa multilayer graph ๐บ = (๐ , ๐ธ, ๐ฟ), such that |๐ฟ | = _ and for eachโ โ ๐ฟ, ๐บโ is a ๐-clique. In this case, ๐๐ฝ (๐บ) = ๐โ1
2 and lower boundfor the density of ๐บ is ๐โ1
2_ ร _ = ๐โ12 . Which shows that the lower
bound is tight.
Diameter Upper Bound. First, the diameter of a connected ๐-truss with ๐ vertices in a single layer graph is no more than โ 2๐โ2
๐โ,
and this is a tight upper bound. Therefore, for a given ๐ , thereexists a single layer ๐-truss, ๐ป๐ , with diameter โ 2๐โ2
๐โ. Now, for
any arbitrary ๐ and _, we construct a multilayer graph๐บ = (๐ , ๐ธ, ๐ฟ),such that |๐ฟ | = _ and for each โ โ ๐ฟ, ๐บโ = ๐ป๐ . In this case ๐ = 1and diameter of G is โ 2 |๐บ |โ2
๐โ.
Edge Connectivity. For a given ๐, _, we construct a multilayergraph ๐บ = (๐ , ๐ธ, ๐ฟ), such that |๐ฟ | = _ and for each โ โ ๐ฟ, ๐บโ is a๐-clique. To make๐บ disconnected, we have to make๐บโ for all โ โ ๐ฟdisconnected. Since the edge connectivity of a ๐-clique is ๐ โ 1, wehave to remove at least _ ร (๐ โ 1) intra-layer edges to make ๐บdisconnected.
Approximation Factor of AFTCS-Approx. In the proof of The-orem 13, we see the following corollary:
18
Corollary 1. Given a set of tuples E โ ๐ ร ๐ , we can constructattribute vectors for each ๐ข โ ๐ such that โ(๐ข, ๐ฃ) โ E : โ(๐ข, ๐ฃ) = ๐ ,and โ(๐ข, ๐ฃ) โ E : โ(๐ข, ๐ฃ) = 0, where ๐ is a constant number.
Given a complete multilayer graph ๐บ = (๐ , ๐ธ, ๐ฟ), we want toconstruct the set of tuples E โ ๐ ร ๐ such that AFTCS-Approxreturns an approximation solution that asymptotically approaches(๐ + 1)1/๐ of the exact solution of AFTCS problem.
For a subgraph ๐ , we let ๐ฟ (๐) = min๐ขโ๐ ฮ๐ข (๐). In order to con-struct the tightness example, we assume that ๐ = ๐1 โช๐2, and soE = E1 โช E2 such that E1 โ ๐1 ร ๐1 and E2 โ ๐2 ร ๐2. Now weneed to construct each ๐1 and ๐2, and so E1 and E2 accordingly.Given an arbitrary number ๐ โ N, we let ๐1 be a set of nodes withsize ๐ + 1 and also E1 = ๐1 ร๐1. Accordingly, based on Corollary 1,we have:
๐ฟ (๐1) = min๐ขโ๐1
ฮ๐ข (๐1) = (๐๐)๐ + ๐๐๐ (๐๐ โ (๐ โ 1)๐ ), (3)
ฮ๐ (๐1) = ๐๐ (4)
Now let ๐ and ๐ก be two arbitrary integers such that ๐ก < ๐2 , we
consider a set of nodes ๐2 = {๐ข1, ๐ข2, . . . , ๐ข๐ } such that ๐1 โฉ๐2 = โ ,and |๐2 | = ๐ . In order to construct E2, for each node ๐ข๐ โ ๐2, weadd (๐ข๐ , ๐ข ๐ ) to E2 for each ๐ + 1 โค ๐ โค max{๐ + ๐ก, ๐}. Accordingly,we can see:
๐ฟ (๐2) = ฮ๐ข1 (๐2) = (๐๐ก)๐ + ๐๐(๐ก+1โ๏ธ๐=2(๐ก + ๐ โ 1)๐ โ (๐ก + ๐ โ 2)๐
)= (๐๐ก)๐ + ๐๐
((2๐ก)๐ โ ๐ก๐
)= ๐๐ (2๐ก)๐ , (5)
and
ฮ๐ (๐2) =๐๐โ๐
(๐กโ๏ธ๐=1(๐ก + ๐ โ 1)๐ +
๐โ๐กโ๏ธ๐=๐ก+1(2๐ก)๐ +
๐โ๏ธ๐=๐โ๐ก+1
(๐ก + ๐ โ 1)๐)1/๐(6)
Note that ๐ is an arbitrary number. So, we let ๐ = โ 2๐ก(๐+1)1/๐ โ + 1,
and accordingly, we have:
๐ฟ (๐1) = (๐๐)๐ + ๐๐๐ (๐๐ โ (๐ โ 1)๐ ) โฅ ๐๐ (2๐ก)๐ = ๐ฟ (๐2). (7)
Now consider a complete attributed multilayer graph ๐บ = (๐1 โช๐2, ๐ธ, ๐ฟ,ฮจ). Let E = E1โชE2, based on Corollary 1, we can constructฮจ such that โ(๐ข, ๐ฃ) โ E : โ(๐ข, ๐ฃ) = ๐ , and โ(๐ข, ๐ฃ) โ E : โ(๐ข, ๐ฃ) = 0,where ๐ is a constant number. Since ๐ฟ (๐1) โฅ ๐ฟ (๐2), AFTCS-Approxalgorithm starts from removing all nodes from ๐2, while based onthe value of ฮ๐ (๐1) and ฮ๐ (๐2), we know that the exact solutionis the induced subgraph by ๐2. Therefore, the found solution byAFTCS-Approx algorithm is the induced subgraph by๐1. In order tosee the approximation ratio for this example, we need to calculateฮ๐ (๐2)ฮ๐ (๐1) . To this end, we have:
ฮ๐ (๐2)ฮ๐ (๐1)
=
๐
โ๏ธ(โ๐ก๐=1 (๐ก + ๐ โ 1)๐ +
โ๐โ๐ก๐=๐ก+1 (2๐ก)๐ +
โ๐๐=๐โ๐ก+1 (๐ก + ๐ โ 1)
๐)
๐ ร ๐โ๐
โฅ๐
โ๏ธ(1 โ 2๐ก
๐) (2๐ก)
๐โฅ
(๐
โ๏ธ1 โ 2๐ก
๐
) ยฉยญยซ๐โ๐ + 1
1 +๐โ๐+1๐ก
ยชยฎยฌ = ๐๐ (๐, ๐ก)
Let ๐ก โ O(๐) and fixed ๐ โฅ 1, we have lim๐โโ๐๐ (๐, ๐ก) =๐โ๐ + 1, whichmeans the approximation factor converges to ๐
โ๐ + 1.
C PROPERTIES OF HOMOPHILY SCOREFUNCTION
Positive Influence of Similar Vertices. The first property of ho-mophily score is positive influence of similar nodes. Aswe discussed,ฮ๐ข (๐) is the value that droping ๐ข โ ๐ changes the numerator of thehomophily score:
ฮ๐๐ (๐/{๐ข}) =
1|๐ | โ 1
(|๐ |ฮ๐๐ (๐) โ ฮ๐ข (๐)
),
where
ฮ๐ข (๐) = โ๐ (๐ข)๐ +ยฉยญยซ
โ๏ธ๐ฃโ๐/{๐ข }
โ๐ (๐ฃ)๐ โ [โ๐ (๐ฃ) โ โ(๐ฃ,๐ข)]๐ยชยฎยฌ . (8)
Similarly, ฮ๐ข (๐ โช {๐ข}) is the value that adding ๐ข to ๐ channgesthe numerator of the homophily score. Next fact, illustrates whenadding a vertex can increase the homophily score.
Fact 2. Adding a vertex ๐ข to a subgraph ๐ increases the homophilyscore of subgraph ๐ iff ฮ๐ข (๐ โช {๐ข})1/๐ > ฮ๐ (๐).
Example 8. Let ๐ = 1, Fact 2 states that adding a node ๐ข with node-homophily score โ๐ (๐ข) > 1
2 |๐ |โ
๐ฃโ๐ โ๐ (๐ฃ) to subgraph ๐ , increasesthe homophily score of the subgraph.
Negative Influence of Dissimilar Vertices. Adding dissimilarvertices to the community decreases the homophily score.
Fact 3. Adding a vertex ๐ข to a subgraph ๐ decreases the homophilyscore of subgraph ๐ iff ฮ๐ข (๐ โช {๐ข})1/๐ < ฮ๐ (๐).
Example 9. Let ๐ = 1, Fact 3 states that adding a node ๐ข with node-homophily score โ๐ (๐ข) < 1
2 |๐ |โ
๐ฃโ๐ โ๐ (๐ฃ) to subgraph ๐ , decreasesthe homophily score of the subgraph.
Non-monotone Property. Non-monotonicity is a desirable prop-erty for the community model. Assume that if the attribute correla-tion measure for the community model is monotone and increasing,then always the maximal FirmTruss is the optimal solution. At thesame time, we know that maximal FirmTruss may include some freeriders. The introduced homophily score function is non-monotone:as we discussed in Facts 2 and 3, adding a vertex might increase ordecrease the homophily score.
Non-submodularity and Non-supermodularity. Optimizationproblems over submodular or supermodular functions lend them-selves to efficient approximation. We thus study whether our ho-mophily score function is submodular or supermodular with respectto the set of vertices. By a contradictory example, we show that ourhomophily score function is neither submodular nor supermodular.
Example 10. Consider a graph๐บ with๐ = {๐ข1, ๐ข2, ๐ฃ1, ๐ฃ2}. Let ๐ = 1.Assume that โ(๐ฃ1, ๐ข2) = โ(๐ฃ2, ๐ข1) = 0.1, โ(๐ฃ2, ๐ข2) = 0.5, โ(๐ฃ1, ๐ข1) =0.2, โ(๐ข1, ๐ข2) = 0.3 and โ(๐ฃ1, ๐ฃ2) = 0. Now consider the sets ๐ = {๐ข1}and ๐ = {๐ข1, ๐ข2}. Let us compare the marginal gains ฮ๐ (๐ โช {๐ฃโ}) โฮ๐ (๐) and ฮ๐ (๐ โช {๐ฃโ}) โ ฮ๐ (๐ ) , from adding the new vertex ๐ฃโ โ ๐to ๐ and ๐ . Suppose ๐ฃโ = ๐ฃ1, then we have ฮ๐ (๐ โช {๐ฃ1}) โ ฮ๐ (๐) =(2ร0.2)/2โ0 = 0.2 > ฮ๐ (๐โช{๐ฃ1})โฮ๐ (๐ ) = 2(0.1+0.2+0.3)/3โ(2ร0.3)/2 = 0.1, violating supermodularity. On the other hand, suppose
19
๐ฃโ = ๐ฃ2. Then we have ฮ๐ (๐ โช {๐ฃ2}) โ ฮ๐ (๐) = (2 ร 0.1)/2 โ 0 =
0.1 < ฮ๐ (๐ โช {๐ฃ2}) โ ฮ๐ (๐ ) = 2(0.1+ 0.5+ 0.3)/3โ (2ร 0.3)/2 = 0.3,which violates submodularity. Thus, this function is non-submodularand non-supermodular.
D REPRODUCIBILITYAll algorithm implementations and code for our experiments areavailable at Github.com/joint-em/FTCS.
E DATASETSWe perform extensive experiments on thirteen real networks in-cluding social, genetic, co-authorship, financial, and co-purchasingnetworks, whose main characteristics are summarized in Table 1.Using the raw Noordin terrorist network data [63], we constructedthe multilayer terrorist relationship network. Each node in thisnetwork represents a terrorist, and each layer represents a typeof relationship (e.g., friendships, organizations, educational insti-tutions, businesses, and religious institutions, etc.). RM [49] has10 networks, each with 91 nodes. Nodes represent phones andone edge exists if two phones detect each other under a mobilenetwork. Each network describes connections between phones ina month. Phones are divided into two communities according totheir ownersโ affiliations. Brain is derived from the functional mag-netic resonance imaging (fMRI) of 520 individuals with the samemethodology used in [52]. It contains 190 individuals in the con-dition group, labelled as ADHD and 330 individuals in the controlgroup, labelled as TD. Here, each layer is the brain network ofan individual, nodes are brain regions, and an edge measures thestatistical association between the functionality of nodes. Each com-munity is a functional system in the brain. YEAST is a biologicalnetwork, in which the layers correspond to interaction networks ofgenes in Saccharomyces cerevisiae (which was obtained via a syn-thetic genetic-array methodology) and correlation-based networksin which genes with similar interaction profiles are connected toeach other [16]. FAO represents different types of trade relation-ships among countries inwhich layers represent products, nodes arecountries and edges at each layer represent import/export relation-ships among countries [17]. Obama network represent re-tweeting,mentioning, and replying among Twitter users, focusing on BarackObamaโs visit in 2013 [60]. DBLP is a co-authorship network until2014 that is used in [28]. Amazon is a co-purchasing network, inwhich each layer corresponds to one of its four snapshots betweenMarch and June 2003 [53]. Higgs is a two-layer network wherethe first layer represents tracking the spread of news about thediscovery of the Higgs boson on Twitter, and the second layer is forthe following relation [15]. Friendfeed contains commenting, liking,and following interactions among users of Friendfeed [9]. Stack-Overflow represents user interactions from the StackExchange sitewhere each layer corresponds to interactions in one hour of theday [51]. Finally, Google+ is a billion-scale network consisting of 4Google+ snapshots between July and October 2011 [31].
F ADDITIONAL EXPERIMENTSF.1 Scalability of Other AlgorithmsWe test our algorithms using different versions of StackOverflowobtained by selecting a variable #layers from 1 to 24 and also with
|E|
10 102103
104105
106
|L|
14
710
1316
20
Tim
e (s
)
10 2
0.1110100
(a) Global
|E|
10 102103
104105
106
|L|
14
710
1316
20
Tim
e (s
)
10 3
10 2
0.1 1
(b) iLocal
Figure 10: Scalability of Global and iLocal algorithms withvarying the number of layers and the number of edges.
1 2 3 4 5 6 7 8 9 10 |Q|
0.4
0.6
0.8
1.0
F1-s
core
FTCS AFTCS
(a) Terrorist dataset
1 2 3 4 5 6 7 8 9 10 |Q|
0.4
0.6
0.8
1.0
F1-s
core
FTCS AFTCS
(b) RM dataset
Figure 11: Effect of the query set size, |Q|, on F1-score.
1 2 3 4 5 6 7 8 9 10 |Q|
10 2
10 1
Tim
e (s
)
Global iGlobal Local iLocal
(a) Terrorist dataset
1 2 3 4 5 6 7 8 9 10 |Q|
10 1
100
101
Tim
e (s
)
Global iGlobal Local iLocal
(b) DBLP dataset
Figure 12: Effect of the query set size, |Q|, on running time.
different subsets of edges. Figure 10 shows the results of the Global,and iLocal algorithms. The running time of these algorithms scaleslinearly in #layers. By varying #edges, all algorithms scale gracefully.As expected, the iLocal algorithm is less sensitive to varying #edgesthan #layers.
F.2 Effect of Query Set Size
Effectiveness.We evaluate the effect of the query set size on thequality of found communities by FTCS and AFTCS. Figure 11(a)and (b) show the F1-score as a function of query set size, |๐ |, onTerrorist and RM datasets. We observe that increasing the size ofthe query set first results in increasing F1-score, but after it achievesits maximum, the F1-score decreases.
Efficiency. We evaluate the effect of the query set size on theefficiency of proposed algorithms. Figure 12(a) and (b) show thequery time as a function of query set size, |๐ |, on Terrorist andDBLP datasets. We observe that increasing the size of the query setresults in increasing the running time.
20
Table 5: Case study of DBLP: Effect of ๐ and _ on Quality.
Query node ๐ = 2 ๐ = 3 ๐ = 4 ๐ = 5 ๐ = 6 ๐ = 7 ๐ = 8 ๐ = 9_ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2 _ = 1 _ = 2
Jiawei Han๐บ0 Size 952305 952305 367471 308285 231953 104779 148625 29297 100171 18 70236 17 50111 - 37476 -Community Size 440 440 284 264 225 102 169 34 113 10 83 10 60 - 34 -#Free-riders 951865 951865 367187 308021 231728 104677 148456 29233 100058 8 70153 7 50051 - 37442 -
Samuel Madden๐บ0 Size 952305 952305 367471 308285 231953 104779 148625 29297 100171 5107 70236 17 50111 - 37476 -Community Size 210 210 173 152 144 99 122 59 113 17 100 17 83 - 67 -#Free-riders 952095 952095 367298 308133 231809 104680 148543 29238 100058 5090 70136 0 50028 - 37409 -
Yoshua Bengio๐บ0 Size 952305 952305 367471 308285 231953 104779 148625 29297 100171 17 70236 17 50111 17 37476 17Community Size 116 116 109 69 81 34 58 23 36 17 23 17 17 17 17 17#Free-riders 952189 952189 367362 308216 231872 104745 148567 29274 100135 0 70213 0 50094 0 37459 0
2 3 4 5 6 7 8 9 10 k
10 20.1
110
102103104
Tim
e (s
)
GlobaliGlobal
LocaliLocal
(a) Effect of ๐
1 2 3 4 5 6 7 8 9 10 10 2
0.11
10102103104
Tim
e (s
)
GlobaliGlobal
LocaliLocal
(b) Effect of _
Figure 13: Effect of ๐ and _ on running time (Brain).
F.3 Effect of ๐ and _ on EfficiencyFigure 13 shows the result of the effect of varying ๐ and _ onrunning time. As we discussed earlier, the larger ๐ and _ for Globalalgorithms results in less running time since algorithms generatea smaller ๐บ0. However, the larger ๐ and _ increases the runningtime of Local algorithms since it needs to count more nodes in theneighbourhood of query nodes to find a (๐, _)-FirmTruss.
F.4 Effect of ๐ and _ on Found CommunitiesSince ๐ and _ are given by the user, one might ask what happensif inappropriate values of ๐ and _ are given? We conduct a casestudy on the DBLP dataset to evaluate the effect of ๐ and _, and toshow the effectiveness of the FTCS in removing free riders. Table 5shows the ๐บ0 size, the size of found community by FTCS, andthe # free-riders for different values of ๐ and _, and for differentquery nodes, โJiawei Hanโ, โSamuel Maddenโ, โYoshua Bengioโ.These results show that giving inappropriate values of ๐ and _
results in increasing the size of๐บ0. However, since FTCS minimizesthe diameter, it is able to remove far nodes from the query nodeand always finds a community with appropriate size and quality.Accordingly, FTCS is robust to inappropriate values of ๐ and _.
We know that given _ โ N+ and ๐ โฅ 0, the (๐ + 1, _)-FT and(๐, _ + 1)-FT of a multilayer graph are subgraphs of its (๐, _)-FT.Another property of FTCS that is shown in Table 5 is its hierarchicalstructure. This is an important property for a community model asby varying the value of ๐ and _ it can find communities at differentlevels of granularity.
F.5 Efficiency of FirmTruss DecompositionWe compare the efficiency of FirmTruss decomposition algorithmwith the state-of-the-art dense strcture mining algorithms, Firm-Core [35], ML k-core [28], and TrussCube [38], in multilayer graphs.Figure 14 shows the running time of these algorithms on different
Terrorist RM FAO Brain DBLP ObamaYouTube
AmazonYEAST Higgs
FriendFeedStackOverflow
10 210 1100101102103104105
Tim
e (s
)
FirmTruss FirmCore ML k-core TrussCube
Figure 14: Efficiency Evaluation of FirmTruss.
datasets. FirmTruss outperforms all other algorithms, except Firm-Core, and can be scaled to graphs with hundred thousands of edges.As we previously discussed, although FirmCore is more efficientthat FirmTruss, FirmTrusses are more denser than FirmCores. It isa trade-off between high density and efficiency.
21