A Machine Learning Framework for Sleeping Cell Detection in ...

14
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identier 10.1109/ACCESS.2019.DOI A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure ORESTES MANZANILLA-SALAZAR 1 , (Member, IEEE), FILIPPO MALANDRA 2 , (Member, IEEE), HAKIM MELLAH 3 , CONSTANT WETT 4 , BRUNILDE SANS 5 , (Senior Member, IEEE) 1 Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected]) 2 Department of Electrical Engineering, University at Buffalo, Buffalo, NY 14260 - 1920 USA (e-mail: [email protected]) 3 Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected]) 4 Ericsson, Stockholm, 164 40 Sweden (e-mail: [email protected]) 5 Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected]@polymtl.ca) Corresponding author: Orestes Manzanilla-Salazar (e-mail: [email protected]). ABSTRACT The smooth operation of largely deployed Internet of Things (IoT) applications will depend on, among other things, effective infrastructure failure detection. Access failures in wireless network Base Stations (BSs) produce a phenomenon called sleeping cells, which can render a cell catatonic without triggering any alarms or provoking immediate effects on cell performance, making them difcult to discover. To detect this kind of failure, we propose a Machine Learning (ML) framework based on the use of Key Performance Indicators (KPIs) statistics from the BS under study, as well as those of the neighboring BSs with propensity to have their performance affected by the failure. A simple way to dene neighbors is to use adjacency in Voronoi diagrams. In this paper, we propose a much more realistic approach based on the nature of radio-propagation and the way devices choose the BS to which they send access requests. We gather data from large-scale simulators that use real location data for BSs and IoT devices and pose the detection problem as a supervised binary classication problem. We measure the effects on the detection performance by the size of time aggregations of the data, the level of trafc and the parameters of the neighborhood denition. The Extra Trees and Naive Bayes classiers achieve Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) scores of 0:996 and 0:993, respectively, with False Positive Rates (FPRs) under 5%. The proposed framework holds potential for other pattern recognition tasks in smart-city wireless infrastructures, that would enable the monitoring, prediction and improvement of the Quality of Service (QoS) experienced by IoT applications. INDEX TERMS failure detection, IoT, M2M communications, machine learning, sleeping cells, smart cities, wireless networks. I. INTRODUCTION T He deployment of the Internet of Things (IoT) in urban areas is enabling the creation of so-called smart cities where city life will be improved by using large amounts of information coming from hundreds of thousands of geographically distributed communicating devices. This information will lead to the automation of some systems and the creation of new applications that will enhance city living. Smart parking, smart pedestrian crossings, intelligent transportation systems, and intelligent power distribution are just a few of the new types of innovations that can be put in place with the effective exchange of information between city IoT devices. IoT-enabled data and services in smart cities rely on either (a) users interacting with smart devices connected to the Internet or (b) users using network services that depend on IoT devices serving as sensors or actuators [1]. In both cases, communications are essential for the IoT applications to work. Even though several telecommunication technologies have been proposed for the deployment of different IoT applications in cities [2], [3], the ubiquity of cellular com- munications is making operators and standardization entities such as the3 rd Generation Partnership Project (3GPP) push for a common cellular infrastructure for smart cities based VOLUME 4, 2016 1 arXiv:1910.01092v2 [eess.SP] 4 Feb 2020

Transcript of A Machine Learning Framework for Sleeping Cell Detection in ...

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.

Digital Object Identifier 10.1109/ACCESS.2019.DOI

A Machine Learning Framework forSleeping Cell Detection in a Smart-cityIoT Telecommunications InfrastructureORESTES MANZANILLA-SALAZAR1, (Member, IEEE), FILIPPO MALANDRA2, (Member,IEEE), HAKIM MELLAH3, CONSTANT WETTÉ4, BRUNILDE SANSÒ5, (Senior Member, IEEE)1Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected])2Department of Electrical Engineering, University at Buffalo, Buffalo, NY 14260 - 1920 USA (e-mail: [email protected])3Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected])4Ericsson, Stockholm, 164 40 Sweden (e-mail: [email protected])5Polytechnique Montreal, Montreal, QC H3T 1J4 Canada (e-mail: [email protected]@polymtl.ca)

Corresponding author: Orestes Manzanilla-Salazar (e-mail: [email protected]).

ABSTRACT The smooth operation of largely deployed Internet of Things (IoT) applications will dependon, among other things, effective infrastructure failure detection. Access failures in wireless networkBase Stations (BSs) produce a phenomenon called “sleeping cells”, which can render a cell catatonicwithout triggering any alarms or provoking immediate effects on cell performance, making them difficultto discover. To detect this kind of failure, we propose a Machine Learning (ML) framework based onthe use of Key Performance Indicators (KPIs) statistics from the BS under study, as well as those ofthe neighboring BSs with propensity to have their performance affected by the failure. A simple way todefine neighbors is to use adjacency in Voronoi diagrams. In this paper, we propose a much more realisticapproach based on the nature of radio-propagation and the way devices choose the BS to which they sendaccess requests. We gather data from large-scale simulators that use real location data for BSs and IoTdevices and pose the detection problem as a supervised binary classification problem. We measure theeffects on the detection performance by the size of time aggregations of the data, the level of traffic and theparameters of the neighborhood definition. The Extra Trees and Naive Bayes classifiers achieve ReceiverOperating Characteristic (ROC) Area Under the Curve (AUC) scores of 0.996 and 0.993, respectively,with False Positive Rates (FPRs) under 5%. The proposed framework holds potential for other patternrecognition tasks in smart-city wireless infrastructures, that would enable the monitoring, prediction andimprovement of the Quality of Service (QoS) experienced by IoT applications.

INDEX TERMS failure detection, IoT, M2M communications, machine learning, sleeping cells, smartcities, wireless networks.

I. INTRODUCTION

THe deployment of the Internet of Things (IoT) inurban areas is enabling the creation of so-called “smart

cities” where city life will be improved by using largeamounts of information coming from hundreds of thousandsof geographically distributed communicating devices. Thisinformation will lead to the automation of some systemsand the creation of new applications that will enhance cityliving. Smart parking, smart pedestrian crossings, intelligenttransportation systems, and intelligent power distribution arejust a few of the new types of innovations that can be putin place with the effective exchange of information between

city IoT devices. IoT-enabled data and services in smartcities rely on either (a) users interacting with smart devicesconnected to the Internet or (b) users using network servicesthat depend on IoT devices serving as sensors or actuators[1]. In both cases, communications are essential for the IoTapplications to work.

Even though several telecommunication technologieshave been proposed for the deployment of different IoTapplications in cities [2], [3], the ubiquity of cellular com-munications is making operators and standardization entitiessuch as the 3rd Generation Partnership Project (3GPP) pushfor a common cellular infrastructure for smart cities based

VOLUME 4, 2016 1

arX

iv:1

910.

0109

2v2

[ee

ss.S

P] 4

Feb

202

0

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

on 4th Generation of broadband cellular network technology(4G) enhancements and 5th Generation (5G).

Even with the use of a common communication infras-tructure, there are several drawbacks of smart-city large-scale implementation. First, it heavily depends on reliabletelecommunications, as even banal failures may lead to themassive malfunctioning of key automated systems. Second,the type of telecommunication traffic produced in smartcities will mostly be produced by IoT machines insidethose automated systems. The problem is that the statisticalbehaviour of this traffic is quite different from that producedby humans [4], and the lack of direct human interactionwill make it even more difficult to detect telecommunicationfailures. Finally, the distributed nature of the applicationsand the large number of devices and connections will alsohinder failure detection.

One of the most difficult types of failure to detect incellular networks is the so-called “sleeping cell” failure. Itconsists of failures that will not set-off alarms even if thecell is malfunctioning. In human cellular communications,a sleeping cell will cause users to react to the lack ofservice, change location and eventually notify the operator.This failure can be prolonged, in some cases days, beforebeing detected by the operator, and corrective measuresare taken [5]. The influence of sleeping cell failures isgreatly amplified in smart cities, where many automatedsystems may depend on the normal function of a particularcell. Thus, the city does not have the luxury of waitingseveral days for the malfunctioning cell to be detected.The delay constraints of essential smart-city applicationsmight be difficult to satisfy even with fully-operational BaseStations (BSs) due to the massive number of devices thatare expected to request access [6].

The objective of this paper is to present a MachineLearning (ML) framework to detect sleeping cells in asmart-city IoT context. The framework is based on thefollowing:• the introduction of a novel concept of neighborhood

between BSs, and• the use of aggregated Key Performance Indicators

(KPIs) over time intervals for different types of IoTapplications.

The data used to feed our framework were extractedfrom a large-scale IoT infrastructure simulator that takesas its input a real city database of geographical locations ofpotential IoT devices and the current locations and featuresof the BSs of several service providers.

In the remainder of this paper, we present the state of theart in Section II. In Section III, we present the modelingof the system, emphasizing the relationship between the in-frastructure technology and the locations of the IoT devices.The ML framework is detailed in Section IV, starting withnovel definitions of the cell neighborhood and proximitythat are at the core of the framework, followed by thesimulation and ML methodologies, and ending with someremarks on the implementation. Numerical results are shown

and commented on Section V, and conclusions are presentedin Section VI.

II. STATE OF THE ARTFailure detection of network elements is one of the mainconcerns of mobile network operators. Several papers in theliterature address this problem using real network operatordata at the BS level [7]–[11]. This approach produces veryaccurate results for the specific networks, but the solutionsare not easily generalized due to the difficulty in retrievingreal cellular network data. As a consequence, most authorsuse simulated data, such as in [7], [10], [12]–[19], thoughemulations based on real data can also be found [20]. Inthis work, we employ simulated network data generatedwith a large-scale network simulator [21] (an extension of[4]), which employs real data on the positions of networkelements and the parameters of the communicating nodes.

There has been much work dedicated to solving problemsin wireless networks using ML; for an extensive survey, see[22]. A common approach in failure detection is to learnstandard traffic patterns and quickly detect deviations fromnormal behavior (i.e., unsupervised learning). In particular,existing research focuses on i) anomaly detection [7], [23],[24], ii) KPIs [25], iii) clustering [11], [14], [16], andiv) dimensionality-reduction techniques [14], [26]. Otherauthors exploit known properties of cellular networks toperform supervised learning to detect faulty elements in anetwork [11], [13], [18], [19], [24], [27].

A complementary approach to ML proposed by [28] isto acquire data from troubleshooting (human) experts inmobile networks and to use their experience and knowledgeto improve fault detection. In addition to the proposedtechniques, fuzzy models can be used for failure detection,as presented in [17], [20].

Finally, some authors propose detecting failures in a net-work element by looking at anomalies in the traffic and KPIsfrom neighboring cells [10], [13], [19]. This is particularlypowerful when the traffic generated in a defected cell doesnot present remarkable anomalies in its KPIs, such as inthe case of Random-Access Channel (RACH)-sleeping cells,where new users cannot connect but existing users in the cellcan continue to transmit regularly during a failure.

In this paper, we propose using well-known supervisedlearning techniques for BS failure detection in a smart-citycellular infrastructure. In particular, for each cell, KPIs fromneighboring cells are analyzed to highlight anomalies anddetect defective BSs. Different from the reviewed literature,i) we consider advanced propagation models based notonly on distance but also on other parameters, such asReceived Signal Strength (RSS), the bandwidth, frequency,and antenna orientation, and ii) we define different neighborcategories to improve failure detection.

III. SYSTEM MODELLINGLet us first mention that we provide in Table 1 a summaryof the mathematical notation used in our modelling of

2 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

TABLE 1: Summary of mathematical notation.

Symbol DescriptionB Set of M BSs {b1, ..., bM}.G Set of potential geographical locations for an IoT device.

rg(i) RSS from a bi ∈ B as measured from location g ∈ G

sg

Ordered list of size ξ of the bg,i ∈ B withhighest rg(g, i) values in decreasing order{bg,1, ..., bg,ξ}. Subscript i indicates priority.

f(i) Failure indicator for bi ∈ B.Pr(X = x) Probability of event x.

CniSet of bj ∈ B belonging to theNeighborhood Category n of bi ∈ B.

Iu−vOrdered list of consecutive positions{u, u+ 1, ..., v}, such that ξ ≥ v ≥ u ≥ 1.

cni,jCategory n neighborhood indicatorfor bi, bj ∈ B.

Pu−v(bi, bj)Indicator of u− v Proximity for a pair ofantennas bi, bj ∈ B.

xi,t Feature vector for bi ∈ B in interval t.o(xi,t) Target value used for training of ML models.N Backbone of the cellular network.

the system as well as in the description of the proposedframework.

A. COMMUNICATION INFRASTRUCTUREThe cellular network model is composed of a set B of basestations enumerated as {b1, . . . , bM}, a set of IoT devices ingeographical locations {g1, . . . , gL} ⊂ G , a backbone N ,and a Data Management Center (DMC). Only the uplinkperformance is considered; the core and metropolitan partof the network are modeled as a black box.

We assume a limited number of wireless channels (i.e.,the Resource Blocks (RBs) in Long-Term Evolution (LTE))can be used to transmit data between users and BSs. This isdone through dedicated control channels allocated through arandom access procedure (RACH), based on preamble trans-missions. The available preambles are limited and mightcollide, triggering retransmission and introducing additionaldelay in the communications between devices and BSs.

The key parameters in this study are the collision proba-bility and the access delay, i.e., the time required for a userpacket to be received by the associated BS. In particular,high-order statistics on those two parameters are used todetect sleeping cells. Further details on the methodologyare provided in Section IV.

B. TOPOLOGY DEFINITIONThe framework was built with real telecommunications andurban data from the city of Montreal (see details in [4]). InFig. 1, a toy example of a smart-city cellular system is dis-played: network users are represented by IoT devices, suchas cars, buses, traffic lights, and security cameras. Details onthe types of IoT device considered and their characteristicscan be found in a previous work [29], where six differentIoT applications are presented. The rectangle represents thegeographical boundaries of a smart city, in which three BSsbi, bj , and bk are installed and provide network access to theIoT devices. The geographical position and other features

BS i

BS j

BS k

FIGURE 1: A sample scheme of the proposed architecturewith three BSs and a large number of IoT devices.

of the BSs, such as the bandwidth, transmitted power, andorientation, were retrieved from [30].

To characterize the links between users and BSs we i)define a threshold on the received power, ii) compute thepower received from each of the BSs by each IoT device,and iii) determine the list of BSs that cover each IoT device.A threshold of −100 dBm is considered in this study. Thereceived power is computed according to the Cost-Hata(Cost-231 defined in [31]) propagation model, which allowscomputing the path loss based on key parameters, suchas the frequency, distance, and height. This propagationmodel is also combined with the corresponding radiationpatterns for each BS. This leads to computing the EquivalentIsotropic Radiated Power (EIRP), based on the elevation,gain and inclination of the antennas, which are also availableat [30]. The list of BSs covering a certain IoT devicecan be very large, especially in a densely populated urbanscenario like Montreal, and this can lead to computationalinefficiencies and large execution times. As a consequence,this list is limited to the ξ BSs with the highest receivedpower. The list is used, as described in Section IV-A, tocombine the analysis of one cell’s KPIs with those ofneighbor cells, and ultimately to detect the sleeping cellswith high accuracy.

C. THE SLEEPING CELL PROBLEMA sleeping cell is usually defined as a cell that is notentirely operational and whose malfunctioning is not easilydetectable by the network operator, as highlighted in [25].This term is generally used to describe a wide variety ofhardware and software failures, which degrade the Qualityof Service (QoS) and Quality of Experience (QoE) andcan remain hidden to the network operator for a long time(days or even weeks) [32]. In this study, we address aparticular type of sleeping cells that affects the RACH inLTE networks [25]. On the one hand, this type of problemaffects new users who are not able to complete the accessprocedure and consequently cannot access the network.On the other hand, existing users, which were alreadyconnected to the BS when the problem manifested, continueto transmit. As a consequence, standard methods based on

VOLUME 4, 2016 3

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

Base

Station

A

Device

Locations

A

B

A

B

C

B

A

C

B

C

A

C

B

A

C

BPriority

Lists

Base

Station

B

Base

Station

C

FIGURE 2: Priority list example.

traffic monitoring fail to detect the problem, because thenetwork operator continues to monitor updated statisticscoming from the RACH-sleeping BS. Progressively, all theongoing connections end, and the cell ceases all activity.

IV. A FRAMEWORK FOR SLEEPING CELL DETECTIONA. NEIGHBORHOOD/CLOSENESS DEFINITIONWhen trying to detect if a particular BS has failed, our keyidea is to include data from its “neighborhood”. Howeverhow does one determine which BSs can be considered as“neighbors”? Though the BS distance can be used, as donein [19], we now propose a richer definition: a neighboringBS is actually one whose performance KPIs are likely tobe affected by the access failure in the BS under study.Accordingly, we base our definition on the following:

1. Antenna priority and RSS: when sending access requests,an IoT device at location g ∈ G will choose the BS bm ∈ Bsuch that:

m = argmax1≥j≥M

{rg(j)} (1)

where rg(j) is the signal strength of a BS bj measuredat location g. The set of all the BSs considered as optionsfor a device located at g ∈ G is depicted as a priority listsg of size ξ, defined as the following sorted list of BSs:

sg = [bg,1, bg,2, ..., bg,ξ] (2)

where the RSS values of the BSs b1 = bg,1, b2 =bg,2, ..., bξ = bg,ξ, hold the following relationship:

rg(1) ≥ rg(2) ≥ ... ≥ rg(ξ) (3)

A device at location g will send its access request to BSbg,1 first. When the BS fails, the next BS on the list (bg,2)is considered, and so on. Fig. 2 shows examples of BSs andtheir positions in priority lists of size ξ = 3 for a series oflocations. Note that in our implementation the 12 antennaswith the highest RSS are considered (ξ = 12). In somelocations, the priority list can be shorter, as a consequenceof the threshold mentioned in Section III-B. The readershould be aware that even if RSS decay is presented in Fig.

2 as decreasing linearly with distance, this is done only tosimplify the explanation.

2. Directional antennas: antenna tilt and orientation areimportant factors in the computation of the RSS in thepropagation model used in our simulator. Therefore, thestrongest received signal might not come from the closestBS. In Fig. 3, where a set of BSs numbered from 1 to 17 areshown. We observe that even though BS 6 is closer than BS15 to the BS under study, it might not be the second BS inthe priority list for any location, while BS 15 effectively is.In our modeling of the system, this might happen because nolocation receives the signals of BS 6 and the one under studyas the two strongest ones, due to the antenna directionalityand power values.

3. KPIs availability: it is feasible to obtain aggregations ofKPIs for all packets processed by a BS during any periodof time.

The type of failure we analyze in this paper is that ofthe sleeping cells, which are BSs whose access functionbecomes inactive, affecting the performance of “close” BSs.This process can be described as follows:

• Step 1: The observed BS’s access function fails. On-going transmissions continue to be served by the BS.

• Step 2: Idle devices that usually would request accessto the failed BS, choose as an alternative the one withthe second-highest RSS.

• Step 3: The additional traffic, produced by the “new”devices requesting access, induces a performancedegradation in the chosen BS.

Because of its degradation, it is of interest to includethe BS chosen in step 2 as a neighbor in the patternanalysis process. This is why we base our definition ofneighborhood on the notion of the probability of experien-cing a performance degradation. Note also that, even thoughsimultaneous failures of nearby BSs may not be frequent,they cannot be ruled out. Therefore devices around the failedlocation may go down their priority list until they find anoperational BS. The farther down a BS is in the priority list,the less likely it is to receive the extra traffic, as it wouldrequire the simultaneous failure of all the BSs positioned“above” it in the priority list.

Using probabilities to define neighborhood categoriesWe define now a novel idea to determine whether two BSsare “neighbors” or not, based on a threshold on the proba-bility of each of the BSs affecting the other’s performancein case of a failure. The following example with ξ = 4illustrates the intuition behind this approach. Let p be theprobability of access failures of any BS during any giventime interval of duration T . Let us also assume that failuresin different BSs and time intervals are independent. Giventhe device location g ∈ G, let sg = [bg,1, bg,2, bg,3, bg,4]be the priority list containing the 4 BSs with the highest

4 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

received power at g. When bg,1 fails, one of the followingoccurs:• bg,2 is operational with probability (1− p) and it will

receive all the traffic from devices at g with probability1(1− p).

• bg,2 is asleep, which will happen with probability p,bg,3 is operational with probability (1 − p), and therequests for access of the device at g will be handledby bg,3. This will occur with probability p(1− p).

• both bg,2 and bg,3 are asleep with probability p2, bg,4is operational with probability (1−p), and bg,4 will bethe alternative BS receiving the access requests. Thiswill happen with probability p2(1− p).

This example shows the intuition behind our definition ofneighborhood of category n of BS bk. A BS is a categoryn neighbor of BS bk when it is the n-th option to requestaccess if BS bk fails.

To formalize the relationship between the probability ofreceiving traffic normally served by bg,1 ∈ sg , and thecondition of being a neighbor of category n, let us firstdefine the failing state indicator of the RACH function ofBS bi ∈ B as:

f(bi) =

{1 if access function of bi is inactive0 otherwise

(4)

Given an ongoing access failure in bg,1 ∈ sg , the probabili-ties of failure for the BSs in sg are:

Pr(f(bg,1) = 1) = 1 (5)Pr(f(bg,n = 1) = p;∀n = 2, ..., ξ (6)

When the access function of BS bi = bg,1 fails, we candefine Q(i, j) as the probability of BS bj = bg,n ∈ sgreceiving traffic normally served by BS bi. This probabilitycan be modeled as:

Q(i, j) = Pr(f(bg,1 = 1) ∩ ...... ∩ Pr(f(bg,n−1 = 1) ∩ Pr(f(bg,n = 0) (7)

= pn−1(1− p)We now define Cn(bi), the neighborhood category n as:

Cn(bi) = {bj 6=i|Q(i, j) ≥ pn−1(1− p)} (8)

As a consequence of Equation (8), each neighborhoodis nested inside those of higher categories, following thestructure of a set of Russian dolls (a.k.a. Matryoshkas orBabushkas):

C1(bi) ⊂ ... ⊂ Cξ−1(bi) (9)

Applying the definitions to the example, assuming thatthere is only one device location in the system g ∈ G, weobtain the following possible neighborhood category sets forbi = bg,1:

BS

under

study

BS

under

study

Neighborhood

Category 1

Neighborhood

Category 212

34

5

6

78

9

10

11 12

13

14

15 16

17

FIGURE 3: Example of neighborhood categories 1 and 2.

C1(bg,1) = {bg,2}C2(bg,1) = {bg,2, bg,3} (10)

C3(bg,1) = {bg,2, bg,3, bg,4}where bg,1, bg,2, bg,3, bg,4 ∈ sg ⊂ B.Note that in Fig. 3, BS 7 is the target of this analysis,

and the BSs in neighborhood category 1 (dark gray) are partof the set of category 2 (light gray), as a consequence ofEquation (9). To highlight the differences between the pro-posed neighboring structure and the classical geographicalone, in this example, an immediate neighbor, such as BS6, is excluded from the neighborhood of category 1, andthe more distant BS 4 instead belongs to it. This choice isto emphasize the fact that distance is not the criterion todetermine the membership of a neighborhood set, which isactually determined by the position in priority lists.

The u-v proximityTo find the set of neighbors for each target BS, we needto first compute the u− v proximity. Two BSs have u− v(u < v) proximity if there exists at least one device locationsuch that in its priority list, the positions occupied by thetwo BSs are between the uth and vth positions (includingthe extremes).

Based on this definition, multiple u − v proximities canbe defined for a single pair of BS if there is more thanone location whose list contains antennas from both BS.This occurs because a pair of BSs can occupy very diversepositions in the priority lists in different device locations.At a location well positioned to receive signals from bothBSs, both might occupy the first two positions of the list.At a location far from both BSs, they might occupy the twolast positions of the list.

VOLUME 4, 2016 5

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

rangeu-v

Decreasing received power

... ... ... ... ...

Location

g

Base

Station

A

Base

Station

B

priority

1

priority

u

priority

i

priority

j

priority

v

priority�

I

FIGURE 4: Range u-v of proximity between BSs (i,j).

The existence of multiple u− v proximities for the samepair of BSs is not necessarily a problem. Their usefulnessbecomes evident when we consider that the signals froma single pair of BSs might not be received with enoughstrength in any device to have 1−2 proximity, for example,but are received strongly enough in at least one device tohave 2− 3 proximity. This allows us to say that these BSsdo not belong to each other’s neighborhood category 1 butthat they belong to each other’s neighborhood category 2.No device normally connected to one of them will have as afirst choice the other BS in case of an access failure, unlessthere are two simultaneous failures.

We can formally define the u − v range of priorities asthe following ordered set:

Iu−v = {u, u+ 1, ..., v} (11)

where ξ ≥ v > u ≥ 1.Let the u−v proximity indicator of a pair of BSs bi, bj ∈

B be defined as follows:

Pu−v(bi, bj) =

{1 if ∃g ∈ G : bi, bj ∈ {bg,u, ..., bg,v}0 otherwise

(12)where

i 6= j;u, u+ 1, ..., v ∈ Iu−v;bi, bj ,∈ B; andbg,u, ..., bg,v ∈ sg ⊂ B.

In Fig. 4, to visually show the u−v proximity concept, thefollowing are displayed: location g ∈ G; the set of machinesinstalled in g; and some of the ξ BSs in the priority list sg .Note that, in this example, u, i, j, and v all belong to therange Iu−v . Therefore, applying the definition in (12), wehave that Pu−v(A,B) = 1, for BSs A = bg,i and B = bg,j .

Note that the u− v proximity is a symmetrical property,under the general assumption of the existence of at least oneother location where the positions that both BSs occupy ina priority list are inverted. Therefore, for any pair of BSsbi, bj ∈ B, we have that:

TABLE 2: Relation between priority, probability, categoryand proximity

Alternative Position Pr Neighborhood u− vBSs in sg Categories intervalbg,2 2 1(1− p) C1 1− 2bg,3 3 p(1− p) C2 1− 3bg,4 4 p2(1− p) C3 1− 4

Pu−v(bi, bj) = Pu−v(bj , bi),∀i 6= j (13)

The u − v proximity is a property that can be usedto determine whether any pair of BSs has high, mediumor low propensity of affecting each other’s performanceby considering a u − v range that covers positions at thebeginning, middle or last part of the priority list. In ourspecific implementation, we are interested in identifyingthose BSs that belong to a specific neighborhood category.We can see in Equation (8) that the BSs of interest for aparticular Cn are receiving traffic from the target BS witha probability of pn−1(1− p) or higher. This means that theu − v range for this application starts in the first position(u = 1).

Table 2 illustrates where the u−v range ends, in relation-ship to a particular neighborhood category. We can observethe relationship between:

1 The position of the BSs in a priority list,2 The probability they have of receiving access requests

typically served by the BS in the first position if itfails,

3 The neighborhood of lowest category that includeseach BS, and

4 the u− v interval associated to each case.Table 2 was built following the toy example presented insection IV-A to illustrate the intuition behind our approach,which was also used in Equation (10). Under the assumptionthat bg,1 fails, each of the three BSs has decreasing proba-bilities of receiving access requests originally intended forthe BS in first position. We can appreciate as a general rulethat for a neighborhood category n:• BSs in positions 2, ..., n+ 1 are included in it.• The lower bound of the probability these BSs have of

receiving access requests from bg,1 is pn−1(1− p).• The associated proximity range ends in n+ 1.

We conclude that the u− v range associated to a neighbor-hood category n is 1− (n+ 1).

Formally:

bj 6=i ∈ Cn(bi)⇔ P 1−(n+1)(bi, bj) = 1 (14)

for at least one location g ∈ G.We can illustrate this relationship with the following

example: if BS bj belongs to the neighborhood category3 of bi (bj ∈ C3(bi)), it means that bi and bj have 1 − 4proximity (P 1−4(bi, bj) = 1).

6 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

Because of the association defined between the notions ofproximity and neighborhood, the symmetry defined in (13)also implies a symmetry in neighborhood relationships suchthat:

bj ∈ Cn(bi)⇔ bi ∈ Cn(bj),∀i 6= j (15)

Neighborhood matricesIn the proposed framework, we aggregate KPIs of the“neighbors” of a BS whose failing state we wish to study. Toidentify the neighbor BSs, we use a neighborhood indicatorfor each pair of BSs. We arrange these indicators in anM ×M neighborhood matrix, where M is the number ofBSs in the system.

For a neighborhood category n, each element of the M×M neighborhood matrix Cn is defined as:

cni,j =

{1 if bj ∈ Cn(bi), j 6= i

0 otherwise(16)

Note that cni,j = 0 when i = j and that cni,j = cnj,i becauseof (15).

The neighborhood matrices C1 and C5 computed in ourimplementation are partially shown in Figs. 5a and 5b. Wecan observe that the BS identified with a 0 is a neighbor ofBSs 2 and 4 when considering a neighborhood of category1 (Fig. 5a). If we consider a neighborhood of category5, BS 0 is also a neighbor of BS 5. We can observe asimilar situation for BSs 3 and 5, which have one moreneighbor when the category is augmented to 5. For thecellular network simulated, the matrices Cn are of size479× 479, as there are 479 BSs in the city.

The neighborhood matrices are computed once per cellu-lar infrastructure and recomputed only if there is a change inthe RSS values or if antennas or BSs are removed or added.The matrices are evaluated when aggregating the neighbor-hood KPIs for each of the BSs as part of the constructionof the “feature vectors” that allow the use of ML for failuredetection. As mentioned, a neighborhood is associated withthe probability of experiencing a performance degradationas a consequence of a failure, whose pattern we intend todetect. The specifics on how KPIs are aggregated can befound in Section IV-C.

B. NETWORK SIMULATIONWe use an LTE simulator1similar to the one described in[4]. The way IoT devices gain access to the BSs is basedon the computation of priority lists (ξ = 12) for each deviceserved by the mobile infrastructure. The simulator allowsfor several propagation models to compute the RSS valuesinvolved in the construction of the priority lists. Becausethis model encompasses the uplink RACH procedure andthe transmission until reception at the Evolved Packet Core(EPC), its output is composed of counters and statisticsrelated to both phases:• Number of packets created.

TABLE 3: Simulation scenarios

Type of devicesmart meters, surveillance cameras, bus stops,

traffic lights, parking lots, microPMUs*

Levels of traffic High / lowSimulation duration (h) 12Total BSs 479Failing BSs 50

* Micro-phasor measurement units are power monitoring devices.

• Number of packets transmitted.• Number of RACH collisions.• Number of RACH attempts.• Minima, maxima and average of the RACH delays.• Minima, maxima and average of the transmission times

(from RACH completion to reception at the EPC).

The priority lists are computed considering as options theantennas, instead of the BSs. In the preprocessing to com-pute the neighborhood matrices, the antennas identificationcodes in these lists is replaced by the identification of theBS where the antennas are installed.

The RACH failures are modeled as affecting simulta-neously all the antennas of a particular BS. Because BSsgenerally do not have the same number of antennas, mostof the time, the probabilities for the positions in the prioritylists may have values higher than the theoretical minimumdescribed in Section IV-A for a particular neighborhoodcategory. We purposely choose to omit the implementationof countermeasures in the preprocessing to address the“noise” introduced by it, as the effect on the results doesnot hinder the methodology.

1) Simulated scenariosIn Table 3, we show the devices types, levels of traffic,duration of the simulation, total number of BSs and numberof failing BSs.

We considered two scenarios in our simulations: hightraffic and low traffic. In the high-traffic scenario, smartmeters, parking slots, bus stops, surveillance cameras, andtraffic lights generate packets at double rate with respectto the low-traffic scenario. Fire alarms and Micro-PhasorMeasuring Units (microPMUs) generated packets at thesame rate for both scenarios.

2) RACH failure generationIn a random sample of 50 BSs, total RACH failures wereparameterized to initiate at the beginning of each 1-hoursimulation, with a duration of 30 minutes. This processwas repeated 12 times (with different random seeds). Inevery simulation, the first 10 minutes are used to initializethe network and their results are omitted from the KPIsaggregation.

C. AN ML FRAMEWORK

1Smart cities M2M: https://www.trafficm2modelling.com/

VOLUME 4, 2016 7

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

(a) Neighborhood category 1. (b) Neighborhood category 5.

FIGURE 5: Example of neighborhood matrices.

1) PreprocessingThe output of the simulator was aggregated in three ways:

i Across non-overlapping time-bins or intervals (ac-cording to the size of the time aggregations),

ii across the antennas of each BS, andiii across the IoT devices.

As a result, the data set contained the statistics at theBS level and considered generic traffic (without distinc-tion among the traffic generated by the different de-vices/applications). The results were preprocessed aggre-gating the data at the BS level in time intervals of 5, 10,15 and 30 minutes. This allowed us to study the effect ofaggregation size on detection performance.

A fundamental part of the preprocessing was the compu-tation of the neighborhood matrices for each of the neighbor-hood categories. This process involves the following steps:

• Analyzing the priority lists for each location in thenetwork.

• Processing the priority lists to obtain the u − v prox-imities between each pair of BSs.

• Using the u−v proximities to obtain the neighborhoodmatrices.

Our aggregating procedure consisted of computing thefollowing statistics: average, variance, skewness, kurtosis,percentile (5, 25, 50, 75, 95), minimum, maximum andrange.

After the data were aggregated for each interval/BS, anormalization step is used to force the values to lie withinthe range (0, 1).

The feature vectors xi,t to be used in the ML algorithmsare built concatenating, for each aggregation interval t and“target” BS bi two vectors:

• Aggregation of the statistics of BS bi during interval t.• Aggregation of the statistics of all the neighboring BSs

in Cn(bi) during interval t.

To perform supervised classification, the data set is com-pleted by associating each feature vector xi,t to a categorylabel, a class indicator or a “target value” o(xi,t), definedas:

o(xi,t) =

{1 if f(bi) = 1 at interval t0 otherwise

(17)

This procedure is repeated using the data generated by thesimulator for each time aggregation size and neighborhoodcategory. The whole process, for a particular interval of time,is described in the diagram in Fig. 6.

2) Models and training strategyWe report our results for the following binary classifiers:

i Naive Bayesii Logistic Regression

iii Linear, Quadratic, Cubic and Radial Basis Function(RBF) Support Vector Machines

iv Decision Treesv Extra Trees

vi Bagged Decision Treesvii Random Forest

viii Shallow (Single Hidden Layer) Neural NetworksFor each of the simulation scenarios and preprocessing

strategies, the data set is randomly split into training (70%)and testing (30%) sets. Parameter tuning for each of the clas-sification models is performed via 10-fold cross-validation(within the data from the training set).

The detection (classification) performance was mainlyevaluated via the Receiver Operating Characteristic (ROC)Area Under the Curve (AUC) score. However, failure in-vestigation activities associated with a false alarm representa considerable operational cost for telco providers. Conse-quently, we also computed the False Positive Rates (FPRs)as a performance index.

In Fig. 6, we explain the ML framework for failuredetection. Notice that gray boxes represent processes oractions and that white boxes represent their products, in-termediate products or inputs. The process starts with theRSS measurements, by probing in a deployed network orby simulation. In our implementation, RSS values werecomputed for each antenna in each location g ∈ G andused to construct the priority lists sg . After the lists werecreated, each pair of BSs was considered one at a time,and the list of each location was checked to see if itcontained the pair of BSs and in which positions of priority.By observing these positions, a practitioner could computeeach of the u − v proximities for the pair of BSs. Oncethe practitioner decided which neighborhood category toconsider, u − v proximities of each pair of BSs can beused to determine whether or not they are neighbors. When

8 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

Received powercomputation

Antenna, altitude,location, frequency,

power, and orientation

Devicelocationg ∈ G

Device’spriority

list sg ⊂ B

Evaluation of u − vproximity for eachpair of BS bi ∈ B

Neighborhoodmatrix Cn

Aggregation ofneighbor BSs KPIs

for a target BS

Feature vectorxi,t

Aggregation oftarget BSs KPIs

BinaryClassification

Model

Prediction(Failure/No-Failure)

Desiredneighbor category

n

Errorminimization

training

Trained classifier(Failure detector)

Class indicator(Failure/No-Failure)

o(xi,t)

FIGURE 6: Graphical description of the proposed framework for failure detection.

the feature vector was built for a specific target BS, theKPIs vectors of its neighbors were concatenated to the KPIsvector of the target BS. The process was repeated for all theBSs, for all the time intervals under study, to build the dataset. Then, the training process took place by minimizingsome norm of the difference between the prediction and thereal target value. The target value was obtained from thesimulation parameters. If there was an ongoing failure inthe BS in a specific interval, the target value was defined as1. Otherwise, the target value was 0.

D. IMPLEMENTATION AND SCALABILITY DETAILSThe proposed framework is based on four main processes(see Fig. 7):

i Retrieval of RSS values at each potential devicelocation and creation of priority lists.

ii Computation of neighborhood sets based on the pri-ority lists.

iii Data aggregation (both at the BS level and of neigh-borhoods based on the neighborhood matrices).

iv Training and evaluation of ML models using theaggregated data.

The real-world application of the framework requiressome efforts from an operator:

i RSS retrieval is a process that can be done via probinga deployed network at each of the potential IoTdevice locations or via simulation of the BSs’ signalpropagation.

ii Neighborhood computation requires finding the u −v proximities for each pair of BSs and the furthercreation of neighborhood sets for each neighborhoodcategory.

iii Data aggregation.iv All the MLs models to be trained and evaluated are

well-known models, whose complexities and difficul-ties are well known, and there exists a plethora ofpipe-line design strategies that can be used.

In what follows, the first two operator’s challenges areanalyzed.

To obtain the RSS measurements, the telco operatorwould first need to identify the potential IoT device locationsand undertake a project to probe, at each location, the

signal strength of the BSs in the range of each location.An alternative approach would be to use simulations, as inthis work. Instead of considering only the potential locationsfor devices, we divided the space of the city into a gridof squares. The positions of the IoT devices inside a gridsquare are approximated with the center of the grid square.The RSS of every antenna of every BS in the simulatednetwork was computed taking into account the antenna tilt,orientation and power, as well as distances and frequencies.

The outcome of this simulation is a data structure oflength ξG (worst-case scenario), where G is the totalnumber of squares in the grid and ξ is the size of the prioritylists of antennas at each square. Each item in the structureis a vector containing information regarding the location ofthe square, the identification of both the BS and the antennaand the RSS value.

Another important implementation task is the computa-tion of the neighborhood matrices. To study the scalabilityof this process so that an operator can estimate its feasibility,we show the computational complexity associated with theprocess. Assuming that M is the total number of BSs in thecity, the computational complexity of the procedure is givenby the following polynomial expression:

O(M2 +G) (18)

A distance-based approach, in contrast, involves the com-putation of the Voronoi regions around each of the BSs, aprocess the computational complexity of which is, in general[33]:

O(M logM) (19)

Compared to Equation (19), the complexity of the neigh-borhood matrices can be considered more computationallyexpensive. However, being polynomial, the method can beconsidered scalable. If the number M of BSs of the networkis considered constant, Equation (18) becomes O(G), whichis linear with respect to the number G of potential IoTdevice locations.

V. NUMERICAL RESULTSWe now discuss the results obtained after using four classi-fiers in the following task: to determine if a specific vector

VOLUME 4, 2016 9

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

ProbingObtain

RSS values

Simulationresults

Prioritylists

sg∀g ∈ GComputation ofneighborhood sets

NeighborhoodmatricesCn(bi)∀bi ∈ B

NeighborhoodKPI aggregation

KPIs aggregatedat BS level

NeighborhoodKPIs

Feature vectorxi,t

Training ofML models

FIGURE 7: Implementation process.

TABLE 4: Minimum and average ROC AUC for eachclassifier.

ROC AUCClassification model Average MinimumExtra Trees 0.997 0.971Random Forests 0.985 0.921Decision Trees 0.984 0.897Naive Bayes 0.983 0.938Bagged Decision Trees 0.981 0.879Linear Support Vector Machine (SVM) 0.959 0.912Shallow Neural Network 0.955 0.898Quadratic SVM 0.954 0.891Logistic Regression 0.954 0.877RBF SVM 0.953 0.902Cubic SVM 0.952 0.875

of aggregated KPIs taken during a time interval at a specificBS was produced or not during an ongoing failure. This wasachieved without knowledge of the past behavior of the BS.

According to their classification performance both fromthe point of view of the ROC AUC and FPR, in all of ourexperiments, we could identify two groups of supervisedclassifiers:

• Group 1: consisting of all the SVM classifiers, alongwith Logistic Regressions and the Shallow NeuralNetworks, which achieve on average an AUC scorebelow 0.981.

• Group 2: consisting of the ensemble learners (BaggedDecision Trees, Random Forests and Extra Trees),Decision Trees and Naive Bayes, the average AUC ofwhich is higher than 0.98. The Extra Trees classifiers inparticular, in the worst performance, achieved an AUChigher than 0.97, and the average score was above 0.99.We indicate the classifiers of this group with italics inTable 4.

The behavior of the performance of these groups is shownin Figs. 8, 10, and 9. It can be argued that the pattern iseasily separable, as a low-complexity classifier like NaiveBayes performs very well. While Extra Trees is not asimple classifier, its building strategy is less prone to over-

11109

0

1 2 3 4

Neighborhood Category

5 6 7 8

1

4

6

8

10

12

14

Fals

e P

ositiv

e R

ate

(%

)

FIGURE 8: Effect of neighborhood category on falsepositive rate per classifier.

fitting than traditional one-hidden-layer Neural Networksand kernel-based methods (SVMs), which might explain itsdominance over those models.

It is important to keep in mind that these AUC and FPRvalues are obtained without any BS-related information (BSID, time, coordinates, number of devices producing traffic,application types, etc).

Among all the experiments, the average effect of increas-ing the traffic intensity is mild (never higher than 1%),though in most classifiers, the effect is slightly negative.Extra Trees is an exception, showing a slightly positiveaverage reaction.

A. EFFECT OF NEIGHBORHOOD CATEGORYIn Fig. 8 we show the FPR observed when applying theclassification models on the 11 data sets generated by aggre-gating the simulated data considering the 11 neighborhoodmatrices (for categories 1, ..., 11). The reader should notethat the neighborhood definition affects which (and how

10 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

many) BSs’ KPIs are included to detect the failure. The FPRcomputed for each classifier in this graph is the average ofthe FPRs obtained under the two traffic levels for each ofthe time aggregation sizes, to observe only the effect of thechange in the neighborhood definition.

It is notable that, in general, increasing the category ofthe neighborhood definition has a detrimental effect on theFPR values for classifiers of group 1. The FPR values forclassifiers of group 2, on the other hand, exhibit neither aclear improvement nor a clear deterioration when increasingthe size of the neighborhoods considered. The results sug-gest that there is no justification for using neighborhoods ofcategories higher than 3 when considering FPRs values.

In Fig. 9, we observe the response of the ROC AUCvalues with respect to changes in the neighborhood def-inition. A clear distinction in the behavior of classifiersof Groups 1 and 2 can also be observed in terms of theAUC, and group 2 does not appear to benefit from having aneighborhood category higher than 3. The performance ofclassification models in group 1 also deteriorates in termsof the AUC, progressively losing the separation ability asthe neighborhood size increases.

B. EFFECT OF AGGREGATION SIZEIn Fig. 10, we show how the average ROC AUC valuesfor each group of classifiers responds to increasing sizes ofthe aggregation time bins. To build this figure, we averagedthe AUC values obtained by each classifier over the datagenerated under the two levels of traffic and preprocessedunder the 11 neighborhood categories to observe only theeffect produced by the size of the aggregation bins.

We find that increasing aggregation size from 5 to 10minutes has effects that range from mild (for classifiersof Group 2, which already have AUC values higher than0.97) to clearly positive (for classifiers of Group 1) (seeFig. 10). With the exception of Bagged Trees, all modelsin Group 2 clearly benefit from the increase in aggregationfrom 10 to 15 minutes. Increasing the aggregation to 30minutes, however, appears to blur the patterns and provokea deterioration of their performance. Models of Group 1had no performance improvement when augmenting theaggregation size to 15 minutes and had no uniform responseto the increase in aggregation size to 30 minutes.

When averaging to observe the ROC AUC and FPRresponses to all the neighborhood categories and all theaggregations sizes, Extra Trees consistently showed the bestperformance. Naive Bayes, being a less complex classifier,had similar results on average, and might be a sound enoughchoice for an operator in the scenarios at hand.

C. INTERACTION BETWEEN AGGREGATION SIZE ANDNEIGHBORHOOD CATEGORYFigs. 11 and 12 show the average AUC scores for the twobest models: Extra Trees and Naive Bayes, respectively. Inthese figures, to analyze the joint effect of time aggregationsize and neighborhood category, we created a heatmap,

Extra Trees

0.940

1

Neighborhood Category

10

0.960

0.970

0.980

RO

C A

UC

0.950

0.990

Naive Bayes

Group 2

0.930

2 3 4 5 6 7 8 9 11

FIGURE 9: Effect of neighborhood category on AUC foreach classifier.

Extra Trees

30

0.940

5

Aggregation size (min)

10 15

0.945

0.955

0.960

0.970

0.980

0.985

RO

C A

UC

0.950

0.965

0.975

0.990

FIGURE 10: Effect of aggregation on AUC for eachclassifier.

in which lighter colors represent higher AUC scores andconsequently better detection performance.

To compute these values, the AUC values were averagedamong only the two traffic levels to observe the interactionbetween the neighborhood definition and time aggregation.

In the aforementioned figures, it can be noted that bothmethods produced AUC scores close to 1, making evidentthat the models have a high separation capacity for thesedata sets. However, neighborhood categories 2, 3 and 4obtained the best scores, especially when the size of thetime aggregations was 10, 15 and 30.

In particular, the best results were achieved with 15 min-utes of aggregation and neighborhood category 2, allowingthe Extra Trees classifiers to achieve an AUC score of 0.996,and the Naive Bayes classifier a score of 0.993.

VOLUME 4, 2016 11

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

FIGURE 11: Joint effect of proximity and aggregationlevels on Extra Trees AUC score.

FIGURE 12: Joint effect of proximity and aggregationlevels on Naive Bayes AUC score.

VI. CONCLUSIONS AND FUTURE WORKIn this paper, we proposed a supervised learning frameworkto detect RACH-related sleeping cells in a smart-city cellularinfrastructure. We used well-known binary classificationtechniques to detect network elements at fault based on theanalysis of aggregated KPIs such as the RACH collisionprobability and the delay.

RACH-related sleeping cells are difficult to detect dueto the lack of evidence in the KPIs from a faulty cell. Toovercome this problem, we have proposed to jointly considerthe KPIs of one cell with those from the neighboring cells.We have also proposed a novel definition for neighbors of acell, not choosing the nodes geographically closer to a cellbut rather those that would be more likely impacted by itsfailure.

We used data obtained from a large-scale IoT networksimulator that employs real data on the telecommunicationinfrastructure and on the position of IoT nodes in a smart-city environment. Although LTE was chosen to obtainnumerical results, the proposed framework can easily be

adapted to other cellular technologies, such as 5G.Different time aggregation interval sizes were tested for

the KPIs: 15 minutes resulted in the aggregation interval thatpermitted achieving the highest AUC. This aggregation levelpermits heavily reducing the amount of data to be analyzedby a network operator to detect faulty elements, resulting inlarge potential savings. The numerical results also provedExtra Trees and Naive Bayes to be the most effective binaryclassification techniques among the ones considered in thiswork. Broadly speaking, the results suggest that simple andensemble models, known for being less prone to overfitting,are superior to kernel models and neural networks.

The preprocessing approach based on aggregations andthe inclusion of information regarding the “neighborhood”of a BS has shown its value in the classification task. Weare currently working on how to adapt this strategy forforecasting and anomaly detection.

REFERENCES[1] J. Jin, J. Gubbi, S. Marusic, and M. Palaniswami, “An information frame-

work for creating a smart city through internet of things,” IEEE Internet ofThings journal, vol. 1, no. 2, pp. 112–121, 2014.

[2] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet ofthings for smart cities,” IEEE Internet of Things journal, vol. 1, no. 1, pp.22–32, 2014.

[3] J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao, “A surveyon internet of things: Architecture, enabling technologies, security andprivacy, and applications,” IEEE Internet of Things Journal, vol. 4, no. 5,pp. 1125–1142, 2017.

[4] F. Malandra, L. Chiquette, L.-P. Lafontaine-Bédard, and B. Sansò, “Trafficcharacterization and LTE performance analysis for M2M communicationsin smart cities,” Pervasive and Mobile Computing, no. 48, pp. 59–68, 2018.

[5] R. Barco, P. Lazaro, and P. Munoz, “A unified framework for self-healingin wireless networks,” IEEE Communications Magazine, vol. 50, no. 12,2012.

[6] M. Polese, M. Centenaro, A. Zanella, and M. Zorzi, “M2M massive accessin lte: Rach performance evaluation in a smart city scenario,” in 2016 IEEEInternational Conference on Communications (ICC). IEEE, 2016, pp. 1–6.

[7] A. Coluccia, A. D’Alconzo, and F. Ricciato, “Distribution-based anomalydetection via generalized likelihood ratio test: A general maximum entropyapproach,” Computer Networks, vol. 57, no. 17, pp. 3446–3462, 2013.

[8] M. Z. Shafiq, L. Ji, A. X. Liu, J. Pang, and J. Wang, “A first look at cellularmachine-to-machine traffic: large scale measurement and characteriza-tion,” in ACM SIGMETRICS Performance Evaluation Review, vol. 40.ACM, 2012, Conference Proceedings, pp. 65–76.

[9] ——, “Large-scale measurement and characterization of cellular machine-to-machine traffic,” Networking, IEEE/ACM Transactions on, vol. 21,no. 6, pp. 1960–1973, 2013.

[10] I. de-la Bandera, R. Barco, P. Munoz, and I. Serrano, “Cell outagedetection based on handover statistics,” IEEE Communications Letters,vol. 19, no. 7, pp. 1189–1192, 2015.

[11] S. Rezaei, H. Radmanesh, P. Alavizadeh, H. Nikoofar, and F. Lahouti,“Automatic fault detection and diagnosis in cellular networks using ope-rations support systems data,” in Network Operations and ManagementSymposium (NOMS), 2016 IEEE/IFIP. IEEE, 2016, pp. 468–473.

[12] R. M. Khanafer, B. Solana, J. Triola, R. Barco, L. Moltsen, Z. Altman,and P. Lazaro, “Automated diagnosis for UMTS networks using Bayesiannetwork approach,” IEEE Transactions on vehicular technology, vol. 57,no. 4, pp. 2451–2461, 2008.

[13] C. M. Mueller, M. Kaschub, C. Blankenhorn, and S. Wanke, “A cell out-age detection algorithm using neighbor cell list reports,” in InternationalWorkshop on Self-Organizing Systems. Springer, 2008, pp. 218–229.

[14] F. Chernogorov, J. Turkka, T. Ristaniemi, and A. Averbuch, “Detectionof sleeping cells in lte networks using diffusion maps,” in VehicularTechnology Conference (VTC Spring), 2011 IEEE 73rd. IEEE, 2011,pp. 1–5.

12 VOLUME 4, 2016

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

[15] M. Panda and P. M. Khilar, “Distributed soft fault detection algorithmin wireless sensor networks using statistical test,” in 2012 2nd IEEEInternational Conference on Parallel, Distributed and Grid Computing.IEEE, 2012, pp. 195–198.

[16] Y. Ma, M. Peng, W. Xue, and X. Ji, “A dynamic affinity propagationclustering algorithm for cell outage detection in self-healing networks,”in Wireless Communications and Networking Conference (WCNC), 2013IEEE. IEEE, 2013, pp. 2266–2270.

[17] A. Gómez-Andrades, P. Muñoz, E. J. Khatib, I. de-la Bandera, I. Serrano,and R. Barco, “Methodology for the design and evaluation of self-healingLTE networks,” IEEE Transactions on Vehicular Technology, vol. 65,no. 8, pp. 6468–6486, 2015.

[18] M. Sun, H. Qian, K. Zhu, D. Guan, and R. Wang, “Ensemble learning andsmote based fault diagnosis system in self-organizing cellular networks,”in GLOBECOM 2017-2017 IEEE Global Communications Conference.IEEE, 2017, pp. 1–6.

[19] O. Manzanilla-Salazar, F. Malandra, and B. Sansò, “eNodeB failure detec-tion from aggregated performance KPIs in smart-city LTE infrastructures,”in 15th International Conference on the Design of Reliable Communica-tion Networks, DRCN 2019, Coimbra, Portugal, March 19-21, 2019, 2019,pp. 51–58.

[20] E. J. Khatib, R. Barco, A. Gómez-Andrades, P. Muñoz, and I. Serrano,“Data mining for fuzzy diagnosis systems in LTE networks,” ExpertSystems with Applications, vol. 42, no. 21, pp. 7549–7559, 2015.

[21] “Smart cities M2M traffic characterization and performance analysis,”https://www.trafficm2modelling.com/home, accessed: 2019-02-01.

[22] H. Zhang, Y. Ren, K.-C. Chen, L. Hanzo et al., “Thirty years of machinelearning: The road to pareto-optimal next-generation wireless networks,”arXiv preprint arXiv:1902.01946, 2019.

[23] B. Cheung, S. Fishkin, G. Kumar, and S. Rao, “Method of monitoringwireless network performance,” Mar. 23 2006, uS Patent App. 10/946,255.

[24] Q. Liao, M. Wiczanowski, and S. Stanczak, “Toward cell outage detectionwith composite hypothesis testing,” in Communications (ICC), 2012 IEEEInternational Conference on. IEEE, 2012, pp. 4883–4887.

[25] F. Chernogorov, S. Chernov, K. Brigatti, and T. Ristaniemi, “Sequence-based detection of sleeping cell failures in mobile networks,” WirelessNetworks, vol. 22, no. 6, pp. 2029–2048, 2016.

[26] A. Gómez-Andrades, P. Muñoz, I. Serrano, and R. Barco, “Automatic rootcause analysis for LTE networks based on unsupervised techniques,” IEEETransactions on Vehicular Technology, vol. 65, no. 4, pp. 2369–2386,2016.

[27] X. Liu, G. Chuai, W. Gao, and K. Zhang, “GA-AdaBoostSVM classifierempowered wireless network diagnosis,” EURASIP Journal on WirelessCommunications and Networking, vol. 2018, no. 1, p. 77, 2018.

[28] E. J. Khatib, R. Barco, P. Muñoz, and I. Serrano, “Knowledge acquisitionfor fault management in LTE networks,” Wireless Personal Communica-tions, vol. 95, no. 3, pp. 2895–2914, 2017.

[29] F. Malandra, P. Potvin, S. Rochefort, and B. Sansò, “A case study forM2M traffic characterization in a smart city environment,” in InternationalConference on Internet of Things and Machine Learning (IML 2017),Liverpool, UK, Oct. 2017, pp. 1–9.

[30] “Spectrum management system data,” https://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00010.html, accessed: 2019-09-09.

[31] R. V. Akhpashev and A. V. Andreev, “Cost 231 hata adaptation model forurban conditions in lte networks,” in 2016 17th International Conferenceof Young Specialists on Micro/Nanotechnologies and Electron Devices(EDM), June 2016, pp. 64–66.

[32] S. Hämäläinen, H. Sanneck, and C. Sartori, LTE self-organising networks(SON): network management automation for operational efficiency. JohnWiley & Sons, 2012.

[33] F. Aurenhammer, “Voronoi diagrams—a survey of a fundamental geo-metric data structure,” ACM Computing Surveys (CSUR), vol. 23, no. 3,pp. 345–405, 1991.

ORESTES G. MANZANILLA-SALAZAR(M’19) received his B.S. (2002) in productionengineering and his M.S. (2005) in systems engi-neering from Simón Bolívar University (USB) inVenezuela. He is currently working on his Ph.Din electrical engineering at Ecole Polytechniquede Montreal, Montreal, QC, Canada.

From 2008 to 2015, he was an AssistantLecturer at USB. He taught courses in the areasof mathematical programming, decision science

and operations management and focused his research on linear and integerprogramming models to create supervised classifiers. He worked 6 years inseveral positions for private firms in the areas of engineering, technologyand finance. His present research focus is on machine learning and analyticstools for telco networks management and on network-enabled systems.

FILIPPO MALANDRA (M’17) received B.Eng.and M.Eng. degrees in telecommunications engi-neering from the Politecnico di Milano, Milan, in2008 and 2011, respectively, and a Ph.D. degreein electrical engineering from Ecole Polytech-nique de Montreal, Montreal, QC, Canada, in2016.

He is currently an Assistant Professor of Re-search with the Department of Electrical Engi-neering at the University at Buffalo. His research

interests include the performance analysis and simulation of mobile net-works (4G/5G) for smart grids, smart cities and IoT applications.

HAKIM MELLAH received his B.S (1996) andM.Sc.(1999) in electrical engineering and hisPh.D. in electrical and computer engineering fromConcordia University, Montreal, QC, Canada, in2018.

He is currently a postdoc at Ecole Polytech-nique de Montreal. His research interests focusmainly on machine quality of experience for IoTapplications in a smart-city environment.

CONSTANT WETTE holds an M.Sc. in com-puter engineering from Ecole Polytechnique deMontreal, Montreal, QC, Canada (2000), anM.Eng. in electrical engineering from EcolePolytechnique de Yaoundé, Cameroon (1996),an MBA from HEC Montreal, Montreal, QC,Canada (2010) and a Graduate Certificate in datascience from Harvard University (2016).

He is a Senior System Developer at BusinessArea Digital Services, Ericsson. He joined Erics-

son in 2000 and has led research projects, product development, innovationand new business development in different technology domains includingdata analytics, IoT/M2M, IMS and telecommunication cloud architectures.

VOLUME 4, 2016 13

Manzanilla-Salazar et al.: A Machine Learning Framework for Sleeping Cell Detection in a Smart-city IoT Telecommunications Infrastructure

BRUNILDE SANSÒ (M’92–SM’17) is a FullProfessor of telecommunication networks withthe Department of Electrical Engineering, EcolePolytechnique de Montreal, and the Director ofthe LORLAB, a research group dedicated to de-veloping methods for the design and performanceof wireless and wireline telecommunication net-works.

Over her career of more than 25 years, she hasreceived several awards and honors, has published

extensively in the telecommunication networks and operations researchliterature, and has been a consultant for telecommunication operators,equipment and software manufacturers, and the mainstream media.

14 VOLUME 4, 2016