
HELSINKI UNIVERSITY OF TECHNOLOGY
Faculty of Electrical Engineering

Ville Makkonen

NEURAL NETWORKS IN DIGITAL EXCHANGE ALARM CLASSIFICATION

Thesis submitted for the degree of Master of Science in Engineering in Espoo 12th September 1997

Supervisor: Professor Olli Simula

Instructor: B.Sc. Väinö Reinikainen, Nokia Telecommunications

HELSINKI UNIVERSITY OF TECHNOLOGY Abstract of the Master’s Thesis

Author: Ville Makkonen

Name of the Thesis: Neural networks in digital exchange alarm classification

Date: 12th September 1997 Number of pages: 49

Faculty: Faculty of Computer and Information Science

Professorship: Information technology Code: Tik-61

Supervisor: Professor Olli Simula

Instructor: B.Sc. Väinö Reinikainen, Nokia Telecommunications

The DX 200 telephone exchange is such a large and complex system that the real reasons behind error situations may be difficult to discern. The number of alarm messages delivered to the operator needs to be decreased so that the operating state of the exchange can be seen easily.

The Self-Organizing Map (SOM) is a popular neural network model. The ability of SOM to find patterns in high-dimensional data is interesting in the case of reducing alarm flow by clustering the alarms. The excellent visualization capabilities of SOM allow for easy manual examination of large data sets.

This work evaluates several possible applications of neural network algorithms to the alarm reduction problem, concentrating on the SOM. The use of the SOM in data mining is discussed in more depth: alarm data can be analyzed to find correlations that in turn can be used to form filtering rules for the DX 200 built-in alarm filtering system.

Furthermore, an alarm data mining tool is designed and a prototype version implemented as a part of the work. The tool combines a view of the data as it is and a SOM-created representation of it. The user can jump between the corresponding points in these two representations. The tool was tested on sample alarm histories, and potential alarm groups for filtering could easily be located.

Keywords: neural network, self-organizing map, exchange, alarm, data mining


TEKNILLINEN KORKEAKOULU - Abstract of the Master's Thesis (in Finnish)

Author: Ville Makkonen

Name of the Thesis: The use of neural networks in digital exchange alarm classification

Date: 12th September 1997 Number of pages: 49

Department: Department of Computer Science

Professorship: Information technology Code: Tik-61

Supervisor: Professor Olli Simula

Instructor: B.Sc. Väinö Reinikainen, Nokia Telecommunications

The DX 200 telephone exchange is a large and complex system, so the real causes of fault situations can be difficult to determine. The number of alarms delivered to the operating personnel must be limited so that the operating state of the exchange can be easily established.

The Self-Organizing Map (SOM) is a popular neural network model. Its ability to find patterns in high-dimensional data sets makes it interesting for grouping alarms. The good visualization properties of the self-organizing map enable large amounts of data to be examined easily.

The work evaluates several possibilities for using neural networks to limit the number of alarms, concentrating in particular on the self-organizing map. The use of the map in data mining is treated in more detail: alarm data can be analyzed in order to find correlations and to create filtering rules for the built-in alarm filtering system of the DX 200 exchange.

The work also includes the design of an alarm data exploration tool and the implementation of a prototype. The tool displays the alarm history both in its original form and as a self-organizing map. The user can move between corresponding points in these two representations. The tool was tried on a few alarm histories, and alarm groups suitable for filtering were found easily.

Keywords: neural network, self-organizing map, telephone exchange, alarm, data mining

ACKNOWLEDGEMENTS

This master’s thesis has been done while working at Nokia Telecommunications Research and Development unit, System Software department. I would like to thank my employer for financing my work and for the opportunity to focus on my studies.

Special thanks to my former section manager Matti Oosi for coming up with an idea for the subject of this thesis, and to my instructor Väinö Reinikainen for providing support and expertise. I would also like to thank my supervisor professor Olli Simula for his interest and encouraging attitude.

Additional thanks to the alarm system engineers Arto Tikkanen and Eero Kauria for an introduction to the field, and to Kimmo Hätönen in Nokia Research Center for valuable comments and ideas.

Helsinki, 12th September 1997

Ville Makkonen


TABLE OF CONTENTS

ABSTRACT
TIIVISTELMÄ - ABSTRACT IN FINNISH
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
TERMS AND ABBREVIATIONS

1. INTRODUCTION

2. DX 200 DIGITAL EXCHANGE
   2.1 Architecture
      2.1.1 Hardware
      2.1.2 Software
   2.2 System maintenance
      2.2.1 Supervision system
      2.2.2 Alarm system
      2.2.3 Recovery system
      2.2.4 Fault locating system - diagnostics
   2.3 Alarm system
      2.3.1 Alarm filtering

3. NEURAL NETWORKS
   3.1 General
      3.1.1 Introduction to neural networks
      3.1.2 ANN design
      3.1.3 Categories of ANNs
      3.1.4 Historical perspective
   3.2 Competitive-learning networks
      3.2.1 Self-organizing map (SOM)
      3.2.2 Learning vector quantization (LVQ)

4. INSPECTION OF ALTERNATIVES
   4.1 Background
      4.1.1 Available data
      4.1.2 Alarm system requirements
      4.1.3 Neural networks in telecommunications fault management
      4.1.4 SOM and LVQ algorithms
   4.2 Alarm grouping
   4.3 Alarm filtering
      4.3.1 Experiment with LVQ
   4.4 State monitoring
      4.4.1 Overview
      4.4.2 Application example
      4.4.3 Possibilities
   4.5 Data mining
      4.5.1 Overview
      4.5.2 Application examples
      4.5.3 Possibilities
   4.6 Choices for further study

5. DATA MINING TOOL REQUIREMENTS
   5.1 Introduction
   5.2 Description of features
      5.2.1 Data gathering and display
      5.2.2 Filtering
      5.2.3 Feature selection and scaling
      5.2.4 SOM initialization and training
      5.2.5 SOM visualization
   5.3 User interface
      5.3.1 Data file view
      5.3.2 Map construction view
      5.3.3 Map view
   5.4 Functional specification
   5.5 Further development

6. ACCORD PROTOTYPE VERSION
   6.1 Introduction
   6.2 Implementation of features
      6.2.1 Data gathering and display
      6.2.2 Feature selection and scaling
      6.2.3 SOM initialization and training
      6.2.4 SOM visualization
   6.3 User interface
      6.3.1 Data file view
      6.3.2 Map construction view
      6.3.3 Map view
   6.4 Further development
   6.5 Alarm classification results

7. CONCLUSIONS

REFERENCES


TERMS AND ABBREVIATIONS

AI - Artificial intelligence: the ability of an artificial system to perform "intelligent" tasks

ANN - Artificial neural network

BCD - Binary coded decimal: a coding method in which a decimal number is binary coded one digit at a time

BMU - Best matching unit

cluster - A set of data points close together in the data space

DMC - Direct Memory Communications, an internal processor bus of one computer unit in the DX 200 system

DX 200 - Telephone exchange product family of Nokia Telecommunications

GA - Genetic algorithm: a learning method utilizing artificial evolution, where generations of solutions compete against each other

GUI - Graphical user interface

KDD - Knowledge discovery in databases: a field of research, the purpose of which is to locate new information within large data spaces

LVQ - Learning vector quantization: a supervised-learning ANN algorithm

MLP - Multi-layer perceptron: a feedforward type ANN

NN - Neural network

OMU - Operation and maintenance unit

SOM - Self-organizing map: an unsupervised clustering ANN algorithm

trunk circuit - A circuit connecting two exchanges

u-matrix - Unified distance matrix: a method of displaying data clusters with SOM


1. INTRODUCTION

The DX 200 digital exchange, which may consist of hundreds of computer units, is such a large system that the real reasons behind error situations are not always clear. The maintenance personnel are faced with a constant stream of alarm messages, many of which are not relevant to any real problem, while many different messages may be symptoms of the same fault. There is a need to decrease the volume of the alarms to enable the operators to see the state of the exchange more easily.

Neural networks are parallel networks of simple processing units called neurons. The network acquires experiential knowledge through a learning process and builds an implicit model of the available data. Neural networks have been successfully used in problems relating to process monitoring /Kohonen 95/ and fault management /Lau/, /Sone/. The ability to find patterns in high-dimensional data is interesting in the case of reducing alarm flow either by filtering out unnecessary alarms or clustering the alarms into broader groups.

The goal of this work is to find and evaluate neural network methods that could be used in alarm classification. The neural network algorithms considered are the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), developed by Professor Kohonen. The alarm classification should result in a reduction of the number of alarm messages, or make more descriptive information available on exchange failures.

• In chapter 2 the DX 200 telephone exchange is presented, with special emphasis on fault management and the alarm system.

• Chapter 3 introduces neural networks in general, describes their use and relation to other information processing methods, and gives detailed information on the SOM and LVQ algorithms.

• In chapter 4 possible alternatives for using neural networks in the alarm system are discussed, evaluating their usefulness. Relevant applications of neural nets, the requirements set by the designers and users of the alarm system, and the nature of the data itself are presented as background.

• Chapter 5 specifies the requirements for a data mining tool, used to locate correlations in alarm history data, based on which filtering rules can be created.

• In chapter 6 a prototype tool is described, and properties to be added to the final version of the tool are discussed. Results achieved with the tool are demonstrated as well.


2. DX 200 DIGITAL EXCHANGE

The DX 200 telephone switch is presented with special emphasis on maintenance services. The alarm system is discussed in more depth, as its properties are crucial in developing and training the neural network.

2.1 Architecture

The overall structure of the DX 200 system is briefly described to create a necessary background of terms and concepts.

2.1.1 Hardware

The DX 200 is basically a multiprocessor computer system, in which many similar computer units, dedicated to different tasks, communicate over a message bus. The units are based on Intel x86 microprocessors and may have varying amounts of memory or different peripherals depending on their task. Each functional unit (including the computer unit, power supply, peripherals and the DMC local bus) is assembled in a cartridge frame. The cartridges are then placed in cartridge racks, which contain the cabling for buses and telephone connections.

Figure 2.1 : DX 200 Hardware a) plug-in unit b) cartridge c) rack

DX 220 is the high-capacity switch in the DX 200 product family. It contains all the usual unit types, some of which are multiplied in order to achieve higher reliability and processing ability. The units and their connections are illustrated in figure 2.2. Other DX 200 systems have similar, but less extensive, structures.



Figure 2.2 : DX 220 High capacity exchange block diagram

Table 2.1 : Abbreviations used in figure 2.2

CCMU - Common Channel Signalling Management Unit
CCSU - Common Channel Signalling Unit (provides signalling to lines)
CHU - Charging Unit (collects call charging information)
CLO - Clock System
CM - Central Memory
DCU - Data Communication Unit
ET - External Terminal (connects lines with digital signals)
GSW - Group Switch (connects incoming and outgoing lines)
I/O - Input and Output Interfaces (terminals, printers, etc.)
LSU - Line Signalling Unit
M - Marker (controls GSW)
MFSU - Multi-Frequency Service Unit
OMU - Operation and Maintenance Unit
PAU - Primary Rate Access Unit (ISDN signalling)
RSS - Remote Subscriber Stage (collects local telephone lines together)
RSU - Remote Switching Unit (performs some local switching)
SSU - Subscriber Signalling Unit
STU - Statistical Unit (collects traffic statistics)
SUB - Subscriber Unit (connects lines with analog signals)


2.1.2 Software

As with the hardware, the software in the DX 200 is highly modular. The software is organized in hierarchical elements with well-defined interfaces, called platforms. The platforms allow customized systems to be built with less effort, because the same software basis can be used for many different configurations.

Figure 2.3 : Platforms, and some products based on them

Table 2.2 : Abbreviations used in figure 2.3

DX 220 - large exchange
DX 210 - small/medium exchange
RSU - Remote Switching Unit
RSS - Remote Subscriber Stage
MSC - GSM Mobile Services Switching Centre
BSC - GSM Base Station Controller
DCS - Digital Cross-connect system
SXC - Service Cross-connect system
OMC - Operation and Maintenance Centre
HLR - GSM Home Location Register

The platforms are further divided into system blocks, each of which contains a collection of service blocks. The service blocks define groups of services offered by the platform. The service blocks in turn consist of program blocks, which offer services or parts of services.


Figure 2.4 : System blocks and service blocks in the computing platform (System Maintenance: alarm system, system supervision, diagnostics, configuration management, recovery; Basic Computing Services: operating system, I/O services, file services, protocols, database management)

2.2 System maintenance

The major function of the DX 200 maintenance system is to ensure that the call switching ability of the exchange is not reduced in fault situations. The design adheres to CCITT usability standards, according to which the exchange is permitted an average of 40 minutes of non-operation in a 20 year period.

In a fault situation the faulty unit is isolated from the rest of the system and replaced by a standby unit, if one exists. The user then receives information concerning the faulty unit and the location of the fault, with precision ranging from the exact plug-in unit to a few plug-in units. The maintenance system is divided into fields as shown in figure 2.5. /NTC 93/

Figure 2.5 : Fields of maintenance (supervision system, alarm system, recovery system, and fault locating system, all reporting to the user)


2.2.1 Supervision system

The supervision system performs continuous testing to detect irregularities. It sends supervision messages to all processes and expects a reply: if the process does not reply in time, it is restarted. Monitoring is performed by test programs, error rate counters, and some hardware failure signals (such as power failure signals). Supervision produces disturbance and fault observations and notifies the alarm system about them.
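The supervision principle can be sketched as a simple polling loop; all names and the deadline below are illustrative assumptions, not actual DX 200 interfaces:

```c
#include <stdbool.h>
#include <time.h>

typedef struct {
    int    id;
    time_t last_reply;                 /* time of the last supervision reply */
} proc_t;

#define REPLY_DEADLINE 5               /* assumed: seconds a process may stay silent */

extern void send_supervision_msg(proc_t *p);   /* assumed messaging primitive */
extern bool reply_received(proc_t *p);         /* assumed reply check         */
extern void restart_process(proc_t *p);        /* assumed recovery action     */

/* Poll every supervised process; restart any that misses its deadline. */
void supervise(proc_t *procs, int n)
{
    for (int i = 0; i < n; i++) {
        send_supervision_msg(&procs[i]);
        if (reply_received(&procs[i]))
            procs[i].last_reply = time(NULL);
        else if (time(NULL) - procs[i].last_reply > REPLY_DEADLINE)
            restart_process(&procs[i]);        /* no reply in time */
    }
}
```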

2.2.2 Alarm system

The alarm system attempts to identify the faulty functional unit on the basis of the information received from the supervision system. Information on the detected unit is forwarded to the recovery system. The alarm system is discussed in depth in section 2.3.

2.2.3 Recovery system

The task of the recovery system is to eliminate the effects of faults. The faulty unit is separated and a redundant spare unit takes its place. This operation, called switchover, is quick because the spare unit is synchronized with the active unit.

The units may be in five basic working states: WO-EX is the state of the active unit, SP-EX the spare unit, TE-EX a unit under testing, SE-OU a separated faulty unit, and SE-NH a unit that has been created but whose hardware is not present. The states and transitions are shown in figure 2.6, and a small code sketch follows the figure.

Figure 2.6 : Unit state transitions (WO-EX = working/executing, SP-EX = spare/executing, TE-EX = test/executing, SE-OU = separated/out of use, SE-NH = separated/no hardware)
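The five states and the switchover operation of section 2.2.3 can be illustrated with a small sketch; the types and the single transition shown are assumptions drawn only from the description above, not actual DX 200 code:

```c
/* The five basic working states of a functional unit. */
typedef enum {
    WO_EX,   /* working - executing     */
    SP_EX,   /* spare - executing       */
    TE_EX,   /* test - executing        */
    SE_OU,   /* separated - out of use  */
    SE_NH    /* separated - no hardware */
} unit_state_t;

typedef struct {
    int          id;
    unit_state_t state;
} unit_t;

/* Switchover: the synchronized spare takes over and the faulty unit is
 * separated and handed to diagnostics (section 2.2.4). */
void switchover(unit_t *faulty, unit_t *spare)
{
    if (spare->state == SP_EX) {
        spare->state  = WO_EX;
        faulty->state = TE_EX;
    }
}
```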

2.2.4 Fault locating system - diagnostics

After recovery, the faulty unit is set to the test state TE-EX for more precise fault location. Diagnostics then attempts to pinpoint the exact plug-in unit that needs to be replaced. (The alarm system only singles out a functional unit, which usually consists of multiple plug-in units.)


Fault location utilizes test programs that expect certain results from each unit: if the result is different, the unit in question is faulty. The system tests each plug-in unit, checking for the most evident faults first, until the first fault is located. The system does not consider multiple faults.

2.3 Alarm system

/Karling/, /NTC 93/

The DX 200 alarm system is divided into a distributed part (DPA) and a centralized part (CPA). Every computer unit has its own DPA, provided by the ASYLIB library and the DPALAR program block. The single CPA, provided by the ALARMP program block, resides in the Operation and Maintenance Unit (OMU) and the Central Memory unit (CM).

Figure 2.7 : Alarm system principle (each unit's DPA reports to the CPA, which delivers information to the user via the alarm printer)

DPA receives disturbance and fault observations from applications running in the same computer unit. According to these, the system creates, supports, prevents, and destroys fault hypotheses. Each setting of an observation supports one or more fault hypotheses by a predefined amount. Resetting of the observation weakens the same hypotheses an equal amount. DPA forms observation hypotheses directly from observations without defined support values. DPA notifies the CPA and initiates local recovery for I/O devices.
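The hypothesis bookkeeping can be sketched as simple support accumulation; the data layout, weights, and threshold below are illustrative assumptions:

```c
#define MAX_HYPOTHESES   64     /* assumed table size    */
#define REPORT_THRESHOLD 100    /* assumed trigger level */

static int support[MAX_HYPOTHESES];    /* accumulated support per hypothesis */

/* An observation type supports one or more hypotheses by predefined amounts. */
typedef struct {
    int hypothesis;
    int weight;
} obs_link_t;

/* Setting an observation adds its support to each linked hypothesis. */
void observation_set(const obs_link_t *links, int n_links)
{
    for (int i = 0; i < n_links; i++)
        support[links[i].hypothesis] += links[i].weight;
}

/* Resetting the observation weakens the same hypotheses by an equal amount. */
void observation_reset(const obs_link_t *links, int n_links)
{
    for (int i = 0; i < n_links; i++)
        support[links[i].hypothesis] -= links[i].weight;
}

/* A hypothesis whose support exceeds the threshold is reported onwards. */
int hypothesis_triggered(int h)
{
    return support[h] >= REPORT_THRESHOLD;
}
```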

CPA receives all fault hypotheses from the DPAs and forms its own hypotheses on faults over the entire exchange. CPA notifies the recovery system and the user via the alarm printer. The structure of the printout is described in figure 2.8.


Figure 2.8 : Structure of the alarm printout /NTC 96/ (the numbered fields 1-20 are explained below)

1. Type of alarm printout: normal (empty) or update (<UPDT>)
2. Name of the exchange
3. Name of the remote object (if the object unit of the alarm is located in one)
4. Computer sending the alarm
5. Alarm equipment type:
   SWITCH - switching
   O&M - operation and maintenance
   TRANSM - transmission
   POWER - power
   EXTERN - external
6. Date and time
7. Urgency level:
   *** - immediate action
   ** - action during normal working hours
   * - no action required
8. Printout type:
   ALARM - fault situation
   CANCEL - fault terminated
   DISTUR - disturbance
   NOTICE - notice
9. Alarm object: the unit which is the object of the alarm
10. Position coordinates of the alarm object (row/rack/vert/horiz)
11. Program block issuing the alarm
12. Alarm originated from trial configuration (TRIAL)
13. Automatic recovery initiated (*RECOV*)
14. Alarm set before the distributed alarm system started (LIB)
15. Consecutive number of the failure printout
16. Alarm number
17. Text description of the alarm
18. Supplementary information fields
19. Supplementary text (in some alarms)
20. Operating instructions (if the user has defined any)


Example of an alarm printout:

DX220-LAB            OMU    SWITCH    1995-06-27 14:30:36.73
**  ALARM   CLAB-1   1A153-62   H1RECE   *RECOV*
(0015)  2761  CLAB FAILURE
07 00

The ALARMP program block sends the alarm information to the printer control process and stores it in an alarm history file. The data is stored in the list_alarm_t data structure, a simplified version of which is presented in table 2.3.

Table 2.3 : list_alarm_t data structure

alarm_state - 0 = setting of alarm, 1 = cancelling of alarm
printing_info - 0 = normal case, 1 = recovery actions initiated
alarm_number - BCD number 0001-6999
object_type - type of the object unit (word)
object_index - index of the object unit (word)
used_info_count - used length of the extra_info field
extra_info - 16 byte array for alarm-specific information
setting_family - process family id of the setting program block (word)
setting_time - struct of BCD numbers: year, month, day, hours, minutes, seconds, 1/100 seconds
sending_unit_type - type of the unit where the observation was set (word)
sending_unit_index - index of the unit where the observation was set (word)
obj_unit_coordinates - position coordinate array (6 bytes)
alarm_class - 1 = notice, 2 = disturbance, 4 = * alarm, 8 = ** alarm, 16 = *** alarm
consecutive_nr - sequential number (0001-9999) assigned to all * / ** / *** alarms
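Rendered as a C structure, the simplified record could look as follows. The field names follow table 2.3; the exact types and packing of the real DX 200 structure are not given here, so the word/byte widths below are assumptions:

```c
#include <stdint.h>

typedef struct {
    uint8_t  alarm_state;             /* 0 = setting, 1 = cancelling           */
    uint8_t  printing_info;           /* 0 = normal, 1 = recovery initiated    */
    uint16_t alarm_number;            /* BCD number 0001-6999                  */
    uint16_t object_type;             /* type of the object unit (word)        */
    uint16_t object_index;            /* index of the object unit (word)       */
    uint8_t  used_info_count;         /* used length of extra_info             */
    uint8_t  extra_info[16];          /* alarm-specific information            */
    uint16_t setting_family;          /* process family id of setting block    */
    uint8_t  setting_time[7];         /* BCD: y, m, d, h, min, s, 1/100 s      */
    uint16_t sending_unit_type;       /* unit where the observation was set    */
    uint16_t sending_unit_index;
    uint8_t  obj_unit_coordinates[6]; /* position coordinate array             */
    uint8_t  alarm_class;             /* 1, 2, 4, 8, 16 = notice ... *** alarm */
    uint16_t consecutive_nr;          /* 0001-9999 for * / ** / *** alarms     */
} list_alarm_t;
```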


2.3.1 Alarm filtering

The alarm system includes a way to filter the alarm flow by use of filtering rules. The four different rule types are described below:

• Prevention rule prevents the display of selected alarms of minor significance if a certain critical alarm is set.

Figure 2.9 : Use of prevention rule

• Support rule causes a new, more severe alarm when the combined supporting rates of certain less severe alarms exceed a threshold value.

Figure 2.10 : Use of support rule

• Informing delay postpones the setting of alarms. It can be used to filter out short failures.

Figure 2.11 : Use of informing delay

• Cancelling delay postpones the cancelling of alarms. It combines several repetitions of an alarm into a single occurrence (sketched in code below).

Figure 2.12 : Use of cancelling delay
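As an illustration of the last rule type, a minimal sketch of a cancelling-delay filter follows; the event model, types, and the 60-second delay are assumptions, not the actual DX 200 implementation:

```c
#include <stdbool.h>
#include <time.h>

typedef struct {
    int    alarm_number;
    time_t cancel_time;          /* when the pending CANCEL arrived */
    bool   pending;
} cancel_slot_t;

#define CANCEL_DELAY 60          /* assumed delay in seconds */

/* Returns true if a new setting of 'alarm_number' should be merged into
 * the previous occurrence: the pending cancel and the repeat setting are
 * both swallowed, so several repetitions appear as one alarm. */
bool merge_repetition(cancel_slot_t *slot, int alarm_number, time_t now)
{
    if (slot->pending &&
        slot->alarm_number == alarm_number &&
        now - slot->cancel_time < CANCEL_DELAY) {
        slot->pending = false;   /* alarm continues: drop cancel + setting */
        return true;
    }
    return false;
}
```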


3. NEURAL NETWORKS

3.1 General

3.1.1 Introduction to neural networks

/Haykin/, /Oja/

When talking about neural networks, the more precise term artificial neural network (ANN) should be used to distinguish them from biological neural networks, such as the brain. In practice, however, it is customary to drop "artificial".

An artificial neural network is a massively parallel network of interconnected, simple, adaptive processing units - neurons - that has a propensity for storing experiential knowledge and making it available for use. It resembles an animal brain in two respects:

1. Knowledge is acquired through a learning process.
2. Knowledge is stored in the form of connection strengths between the neurons.

The term knowledge refers here to stored information or models the ANN uses to understand and interact with its environment. The task of the ANN is thus to create a model of its environment and to maintain that model sufficiently consistent to provide useful data to the application of interest. Knowledge of the environment consists of two kinds of information:

1. The known state of the environment (prior information).
2. Observations (measurements) of the environment. These provide the pool of examples used to train the ANN.

In supervised learning each example consists of an input-output pair: an input signal and the corresponding desired output. A subset of examples is used to train the ANN by means of a suitable algorithm. The performance of the trained network is evaluated with previously unseen data: an input signal is presented to the network, but it is not told the desired output for that signal. The result reported by the ANN is then compared with the actual result. The input-output pairs can be used to solve a variety of problems:

• Classification: inputs are patterns, outputs are classes.
• Prediction: inputs are present and past values, outputs are future values.
• Control: inputs are measurements, outputs are control signals.
• Diagnosis: inputs are symptoms, outputs are decisions.


In unsupervised learning, on the other hand, there are no known outputs. The neural network is used to arrange the information into meaningful patterns on the basis of similarity or dissimilarity of measurements (inputs) only. The result is a clustering of the environment. The clusters can later be given appropriate labels, and be used for classification, for example. Unsupervised learning is useful when knowledge of the environment is incomplete.

The differences between neural networks and "classical" artificial intelligence (rule-based expert systems) are summarized in table 3.1.

Table 3.1 : Artificial intelligence vs. neural networks

Neural networks:
• Patterns, nonsymbolic data processing
• Signal transformations
• Massive parallelism
• Learning, self-organization
• Approximations also tolerated

Artificial intelligence:
• Logic, symbolic data processing
• Search in large data spaces
• Serial inference
• Programming
• Exact solutions required

3.1.2 ANN design

/Haykin/

The design of a neural network is based directly on real data, with the data set itself being used to build an implicit model of the environment. Thus an ANN "builds its own model" when attempting to recognize the desired properties of the environment. However, some rules should be observed by the designer regarding the data representation in the ANN:

1. Similar inputs should produce similar results and should thus have similar representations inside the network.

2. Items to be classified to separate categories should have widely different representations in the network.

3. If a particular feature is important, there should be a large number of neurons representing that feature in the network.

4. Prior information and invariances should be incorporated in the network design, so that the network does not have to learn them.

The designer has to worry about the first two rules only in the case of supervised learning. In unsupervised learning the similarity of inputs is the main criterion for clustering and similar items automatically end up close together on the network.

The last rule is important for all neural nets, because the more specialized the ANN is, the better results it yields. A specialized network is smaller, faster, more accurate, and can be trained more easily. Building prior information and invariances into a neural network is accomplished through feature extraction. The raw data is pre-processed in such a way as to produce a few transformation-invariant features, which represent the important properties of the environment (for example rotation- and translation-invariant image data, or pitch- and volume-invariant speech data).

It is also important to know when not to use a neural network. ANNs are not suitable for every type of problem - part of the designer’s expertise must lie in the ability to choose the best method for each problem. Figure 3.1 lists some suitable information processing systems for tasks of different kinds.

Figure 3.1 : Comparison of suitable systems for different AI tasks /Oja/ (axes: amount of knowledge & expertise vs. amount of empirical data; physical models and rule-based expert systems suit problems with much knowledge and little data, statistical methods and neural networks suit problems with much data and little prior knowledge, and fuzzy logic lies in between)

3.1.3 Categories of ANNs

Signal-transfer networks. The output signal values depend uniquely on the input signals: a typical example is a layered feedforward network, such as the multilayer perceptron (MLP) /Haykin/. In the MLP, teaching takes place with the backpropagation algorithm: the error between the actual and the desired output value is propagated backwards through the network and the neuron weights are adjusted.

State-transfer networks. Feedback is very strong. The input value is an initial state, and the network quickly converges to another stable state, which is then the result value. Examples of this category include the Hopfield network /Haykin/ and the Boltzmann machine /Haykin/.

Competitive learning networks. The cells of the network receive identical information and then compete in their activities. Each cell acts as a decoder for a domain. This category includes the Self-organizing map (SOM) /Kohonen 95/ and the Learning vector quantization (LVQ) /Kohonen 95/.


3.1.4 Historical perspective

/Haykin/, /Oja/

The first pioneering work on neural networks was done by McCulloch and Pitts in 1943. They described a logical calculus for ANNs and introduced the concept of the formal neuron. The McCulloch-Pitts formal neuron is described in figure 3.2.

$$c(t+1) = \begin{cases} 1, & \text{if } b_1(t) = b_2(t) = \dots = b_n(t) = 0 \ \text{ and } \ a_1(t) + a_2(t) + \dots + a_n(t) > 0 \\ 0, & \text{otherwise} \end{cases}$$

Figure 3.2 : McCulloch-Pitts formal neuron (excitatory inputs $a_i$, inhibitory inputs $b_i$, output $c$)

The next step in ANN development was Hebb's book on learning in 1949. He presented the idea that learning means adding new connections between the synapses of the brain. Another pioneer, Rosenblatt, proposed in 1958 an improved design of the formal neuron for pattern recognition purposes, called the perceptron. He also devised one of the first learning algorithms.

The 1960s and 1970s were a transitional period for neural network research. The next major advancements were seen in 1981, when Kohonen published his first paper on self-organizing maps. In the following year, Hopfield introduced the idea of using energy functions from statistical physics as an analogy for the neural network learning process.

One of the most popular ANN designs, the multi-layer perceptron (MLP), and its backpropagation learning algorithm were introduced by Rumelhart, Hinton, and Williams in 1986. Since then, there have been few really new network designs: research has focused more on applying ANNs to different problems.


3.2 Competitive-learning networks

3.2.1 Self-organizing map (SOM)

/Kohonen 95/

The basic idea of the SOM algorithm is to fit a number of ordered discrete reference vectors to a distribution of vectorial input samples. The reference vectors - the neurons of the network - interact locally, and thus a topologically ordered map of the input signal space is formed. The neurons develop into specific detectors of their respective signal domains.

In practice, the topology-preserving property means that "similar" input vectors are mapped onto neighboring units in the map lattice. Similarity is assumed to be spatial: two vectors are similar if their distance from each other is small. The learning process according to the SOM algorithm is as follows:

1. An input vector x is compared with all reference vectors $m_i$, and the location of the best match c is defined by some metric (e.g. Euclidean distance):

$$\|x - m_c\| = \min_i \|x - m_i\|$$

2. All nodes that are topologically close to the best-matching unit (BMU) will also be activated. These nodes form the neighborhood Nc(t) of the BMU and are defined by some neighborhood function hci(t). The size of the neighborhood is usually reduced during the learning process.

Figure 3.3 : Example of a topological neighborhood (the neighborhood shrinks during learning: $N_c(t_1) \supset N_c(t_2) \supset N_c(t_3)$)

3. Every reference vector in the neighborhood of the BMU is moved closer to the input vector (figure 3.4). The distance moved is a fraction of the total, defined by the learning rate parameter $\alpha(t)$:

$$m_j(t+1) = m_j(t) + \alpha(t)\,[x(t) - m_j(t)], \quad j \in N_c$$
$$m_j(t+1) = m_j(t), \quad j \notin N_c$$


Figure 3.4 : Updating BMU and its neighborhood towards input sample 'x'

Training can be divided into an ordering phase and a fine-tuning phase. In ordering, the neighborhood consists of a significant portion of the entire map and the learning rate is large. This is done to ensure that the ordering becomes global: similar units, even if initially widely scattered, can be moved closer to each other. Fine tuning, on the other hand, concentrates on small neighborhoods, and the learning rate is small. Thus the already achieved ordering is not disrupted; only the accuracy is improved.
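The whole training step can be condensed into a short routine. The sketch below uses the 5x4 map of figure 3.5, squared Euclidean distance, and a rectangular neighborhood; the map size and the schedule by which alpha and the radius shrink (large in the ordering phase, small in fine tuning) are illustrative assumptions:

```c
#include <stdlib.h>

#define ROWS 5
#define COLS 4
#define DIM  9

double m[ROWS][COLS][DIM];               /* reference vectors on the map grid */

static double dist2(const double *a, const double *b)
{
    double d = 0.0;
    for (int k = 0; k < DIM; k++)
        d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;                            /* squared Euclidean distance */
}

/* One SOM training step for input x with learning rate alpha and
 * neighborhood radius (both reduced as training proceeds). */
void som_train_step(const double x[DIM], double alpha, int radius)
{
    int cr = 0, cc = 0;
    double best = dist2(x, m[0][0]);

    for (int r = 0; r < ROWS; r++)       /* 1. locate the best match c */
        for (int c = 0; c < COLS; c++) {
            double d = dist2(x, m[r][c]);
            if (d < best) { best = d; cr = r; cc = c; }
        }

    for (int r = 0; r < ROWS; r++)       /* 2. move every unit in N_c  */
        for (int c = 0; c < COLS; c++)   /*    towards the input       */
            if (abs(r - cr) <= radius && abs(c - cc) <= radius)
                for (int k = 0; k < DIM; k++)
                    m[r][c][k] += alpha * (x[k] - m[r][c][k]);
}
```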

Because the neurons are placed on a (usually two-dimensional) map grid, the SOM is well suited for visualization purposes. One of the interesting properties of the visualized data is clustering. Clusters are "areas" in the data where the representative feature vectors are similar to each other, while dissimilar to all other vectors. The SOM translates similarity between input vectors into distance on the map. The most often used method of displaying clusters is the unified distance matrix, or u-matrix /Ultsch/, in which the distances between neighboring unit weight vectors are calculated.

Figure 3.5 shows a u-matrix representation in which the values are shown as gray shades, with black corresponding to the highest distance and white to the smallest distance. Thus, light areas depict clusters and dark areas the borders between clusters. The shade of a neuron depicts its "data density", and the shade of the bar between neurons shows how far the neurons are from each other. In the figure two distinct clusters can be seen: one in the lower left and another in the upper right corner of the map. The identities of the clusters can be found by labeling corresponding input vectors and plotting them on the map.

Figure 3.5 : U-matrix representation of a 5x4 SOM network
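A u-matrix of this kind can be computed directly from the reference vectors. The sketch below averages, for each unit, the distances to its grid neighbours; the 4-neighbourhood and the 5x4 map size are illustrative assumptions:

```c
#include <math.h>

#define ROWS 5
#define COLS 4
#define DIM  9

static double vdist(const double a[DIM], const double b[DIM])
{
    double d = 0.0;
    for (int k = 0; k < DIM; k++)
        d += (a[k] - b[k]) * (a[k] - b[k]);
    return sqrt(d);
}

/* Average distance from each unit to its grid neighbours: drawn as gray
 * shades, high values (dark) mark cluster borders and low values (light)
 * cluster interiors, as in figure 3.5. */
void u_matrix(const double m[ROWS][COLS][DIM], double u[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            double sum = 0.0;
            int n = 0;
            if (r > 0)        { sum += vdist(m[r][c], m[r-1][c]); n++; }
            if (r < ROWS - 1) { sum += vdist(m[r][c], m[r+1][c]); n++; }
            if (c > 0)        { sum += vdist(m[r][c], m[r][c-1]); n++; }
            if (c < COLS - 1) { sum += vdist(m[r][c], m[r][c+1]); n++; }
            u[r][c] = sum / n;
        }
}
```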

A somewhat simpler visualization method relies on the component planes of the reference vectors /Iivarinen/. The value of a single component is taken from each reference vector and depicted on the map as a shade. A component plane can be thought of as a horizontal slice of the entire map, at a predetermined depth. This representation shows the distribution of component values and makes it possible to see correlations between components as overlapping component planes. A component plane representation is depicted in figure 3.6, showing high component values as lighter shades.

Figure 3.6 : A component plane view

In time-dependent applications, the goal of visualization is often to determine the direction the current input vector is taking in relation to past values. Trajectories - the paths that subsequent BMUs form on the map - are used for this purpose. A sample trajectory is shown in figure 3.7.

Figure 3.7 : A sample trajectory


3.2.2 Learning vector quantization (LVQ)

/Kohonen 95/

Vector Quantization (VQ) is a classical signal-approximation method, in which a finite number of codebook vectors $m_i$ are placed in the signal space. A given vector x is then approximated by finding the codebook vector $m_c$ closest to x. The signal space is partitioned into regions, each defined by one reference vector which is the "nearest neighbor" to any vector within the region. This kind of partitioning is also called Voronoi tessellation.

Figure 3.8 : Voronoi tessellation of a two-dimensional space

LVQ is a supervised learning method of assigning optimal positions to the reference vectors in terms of classification error. Its purpose is to define class regions in the input data space for statistical classification. The codebook vectors are assigned class labels, and input vectors are classified according to the class of the nearest codebook vector.

The LVQ algorithm is as follows:

1. Compare the input vector x with all reference vectors $m_i$, and locate the best matching unit c. This step is identical to the SOM algorithm.

2. Update the BMU by moving it closer to the input if both represent the same class, or moving it away if the classes are not the same.

$$m_c(t+1) = m_c(t) + \alpha(t)\,[x(t) - m_c(t)], \quad \text{if } x \text{ and } m_c \text{ represent the same class}$$
$$m_c(t+1) = m_c(t) - \alpha(t)\,[x(t) - m_c(t)], \quad \text{if } x \text{ and } m_c \text{ represent different classes}$$
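This basic (LVQ1) step can be sketched in a few lines; the codebook size of 400 and input dimension of 16 merely echo the experiment of section 4.3.1 and are otherwise arbitrary:

```c
#define NCODES 400
#define DIM    16

typedef struct {
    double w[DIM];               /* codebook vector      */
    int    label;                /* assigned class label */
} code_t;

/* One LVQ training step: find the nearest codebook vector (as in the SOM),
 * then attract it towards the input if the classes agree, or repel it if
 * they differ. */
void lvq_train_step(code_t cb[NCODES], const double x[DIM], int label,
                    double alpha)
{
    int c = 0;
    double best = -1.0;

    for (int i = 0; i < NCODES; i++) {       /* 1. locate the BMU */
        double d = 0.0;
        for (int k = 0; k < DIM; k++)
            d += (x[k] - cb[i].w[k]) * (x[k] - cb[i].w[k]);
        if (best < 0.0 || d < best) { best = d; c = i; }
    }

    double sign = (cb[c].label == label) ? 1.0 : -1.0;
    for (int k = 0; k < DIM; k++)            /* 2. attract or repel */
        cb[c].w[k] += sign * alpha * (x[k] - cb[c].w[k]);
}
```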


4. INSPECTION OF ALTERNATIVES

4.1 Background

4.1.1 Available data

The DX 200 alarm system uses the list_alarm_t data structure (table 2.3) for internal representation of alarms. The structure contains all information available on the alarm in question. Alarm data is stored into a history file, as described in section 2.3, and can be fetched from the exchange. The different fields of the list_alarm_t structure are discussed below from the viewpoint of classification.

Alarm number is the identifying code for each alarm. The first digit of the number identifies a broad group for the alarm, but otherwise there is no structure in the numbering. For practical purposes, often only alarm numbers below 3000 need to be considered. Alarm number is the primary information used to classify alarms; the groups are listed below, followed by a small grouping sketch.

• 0001-0999 notice
• 1000-1999 disturbance
• 2000-2999 "normal" alarm
• 3000-3999 not used
• 4000-5999 external (operator defined) alarm
• 6000-6999 base station alarm (only used in some exchange types)
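Since the grouping is carried entirely by the number ranges, it can be recovered with a plain range test:

```c
/* Maps an alarm number to its broad group according to the ranges above. */
const char *alarm_group(int alarm_number)
{
    if (alarm_number >=    1 && alarm_number <=  999) return "notice";
    if (alarm_number >= 1000 && alarm_number <= 1999) return "disturbance";
    if (alarm_number >= 2000 && alarm_number <= 2999) return "normal alarm";
    if (alarm_number >= 4000 && alarm_number <= 5999) return "external alarm";
    if (alarm_number >= 6000 && alarm_number <= 6999) return "base station alarm";
    return "unused/unknown";
}
```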

Setting time. The absolute date and time themselves are not very interesting (unless one wishes to see if certain faults often occur at the same time of day). However, time information is necessary to place the alarms in proper order and to select alarms belonging to a certain time interval.

Object type and index identify the computer unit (or other device, such as a magnetic tape unit or printer driver) to which the alarm applies. Usually, but not always, the fault causing the alarm is also located in the indicated unit. These fields, as pointers to the location of the fault, are important in classification.

Setting / cancelling of alarm and consecutive number. For alarms with an urgency class there are always two entries; first when the alarm is set and again when the alarm is cancelled (notices and disturbances are not separately cancelled). The set and cancel messages are linked by a common consecutive number. This information is needed to determine the time a particular alarm has been "on". If one is interested only in the number of alarms and not their lengths, the cancelling messages must be disregarded.


Object unit coordinates tell the physical location of the target functional unit within the exchange. The positioning of the units is largely irrelevant when considering faults: the only interesting cases would be power supply failures. If an entire cartridge rack loses power, the affected units could be identified by rack coordinates.

Alarm urgency class should give an indication of the severity of the fault. The field actually indicates how critical operator reaction is for the fault in question. Thus, faults requiring immediate operator attention might not affect the operation of the exchange very much, while less urgent alarms can have a much worse effect.

Extra info field carries additional information about the cause of the alarm. However, as the contents of the field are defined separately for each alarm, the information is not useful for comparing different alarms. The only case where the extra info needs to be considered is when determining if two occurrences of the same alarm are completely identical.

Recovery initiated informs of a case where automatic recovery action has started. There is little in common among recoverable faults, so this field is not very useful.

Sending unit, index, and family. These fields identify the functional unit and the process family that noticed the fault and set the alarm. The information is not very useful for classification, except in a case where a certain unit is producing large numbers of alarms.

The information contained in a single alarm is quite often insufficient to make any meaningful classification. The real interest lies in alarm sequences: groups of alarms that occur sequentially. What makes alarm sequence identification problematic is that alarms are filed in the order they are set. Thus alarms from completely unrelated sources occur consecutively, making it difficult to point out which alarms in a chosen group are symptoms of the same fault.

4.1.2 Alarm System Requirements

An important goal for DX 200 maintenance is to enable the operator to easily determine the state of the exchange, so that any faults which could degrade the performance of the telecommunications network can be quickly dealt with. For this reason all messages received from the exchange, including alarms, should be as informative as possible. Unfortunately, the volume of alarms is usually very high, making it hard to notice the truly important alarms among the flow. Even when all notice and disturbance messages are blocked, there is still an inordinate number of alarms to keep track of.

There should be a radical reduction in the number of alarms displayed to the maintenance personnel. DX 200 contains a rule-based alarm filtering system as described in section 2.3.1, but creating filtering rules requires considerable expertise and effort. The operators do not always have enough knowledge to form their own rules, and alarm system designers have little time to spare, nor relevant alarm data, to start forming more filtering rules. Each operator may also want to customize the filtering to suit their own needs, even to the point of having different rules in different exchanges. The different configuration schemes in different exchanges also pose problems in creating universal filtering rules.

Another way to give important alarms more visibility is to categorize the alarms according to their impact on the exchange operation. The alarm urgency class can be used for this purpose. It is not always a reliable indicator, however, as it is intended for indicating only the need of operator action.

The goals for alarm classification can be summarized as follows:

1. Reduce the number of alarms:
   • help in forming filtering rules
   • perform the filtering itself

2. Make alarms more informative:
   • categorize alarms by severity
   • identify actual faults from alarm sequences

4.1.3 Neural networks in telecommunications fault management

British Telecom has investigated the use of a neural net in telecommunication access network fault diagnostics. The ANN is used both alone and as a pre-processor for an expert system. By itself, the neural net's performance was about the same as that of an expert system. When the ANN output value was used as an input rule for the expert system, the performance of the overall system was somewhat increased. /Chattell/

Ericsson has integrated an ANN central processor hardware fault diagnosis system into the AXE 10 switch. The neural net classifies input bits from error registers into 64 classes. The system is operating in addition to standard fault detection, and results from both systems are shown to the operator. /Johansson/

Lau et al. at Hong Kong University have built a hybrid intelligent classifier for telephone network hardware error message classification. The system consists of a neural net and a rule-based expert system. The expert system rules are improved by the use of genetic algorithms. The ANN gives the system flexibility and error tolerance, and the expert system outputs classification rules. /Lau/

Cabletron Systems Inc. has researched the use of a neural net in obtaining index rules for a Case-based reasoning (CBR) communications network fault resolution system. CBR operates by maintaining a case library of previous problems and adapts the old cases for use in new problems. Developing rules for fetching the relevant cases is an arduous process, and a neural network could be used to learn the rules from raw data. However, the performance of the neural net was superior to a rule-based learning scheme only with continuous inputs and noisy data. /Lewis/


NTT has developed a neural network approach for switching system fault identification. The system employs a distributed ANN architecture of several micro­viewing networks and one macro-viewing network, plus an expert system for integrating the ANN outputs. A dummy fault generator is used to generate large amounts of high-quality training data. The neural net yielded a significantly better fault recognition rate than that achieved with traditional statistical methods. /Sone/

All the above applications are based on the traditional Multi-Layer Perceptron (MLP) network, trained with the backpropagation learning algorithm. The MLP is used to classify input data into a certain number of predetermined classes, using training samples classified by hand. The performance of such networks is highly dependent on the amount and quality of training data. Many applications couple neural nets with expert systems, especially as pre-processing elements. The noise insensitivity of ANNs seems to improve the performance of rule-based expert systems, which themselves are very sensitive to even slight variations in the inputs.

4.1.4 SOM and LVQ algorithms

/Kohonen 95/

SOM was developed in the first place for the visualization of non-linear relations of multidimensional data. It has turned out that even very abstract relations may become clearly visible with the SOM. The LVQ methods, on the other hand, were developed for statistical pattern recognition, especially for dealing with very noisy high-dimensional data. The main benefit is a radical reduction of computing operations when compared to traditional statistical methods.

LVQ is used in fairly straightforward supervised-learning applications (pattern recognition), and its performance is similar to that of feedforward networks such as the MLP. LVQ and SOM may be combined: a SOM is first formed for optimal neuron allocation, and the map is then used as the starting point for teaching the LVQ. The benefit of such a combination is that less labeled teaching data is needed for a validated classification.

The uses for SOM are more diverse. There have been applications in such fields as process state monitoring, speech recognition, texture analysis, text semantics, robot arm control, and so on. SOM is used for unsupervised learning, mostly for clustering unlabelled data. In pattern recognition the clusters are later labelled by using a small set of sample data. The final result is usually intended for human visual processing.

The following sections outline some possible uses of SOM and LVQ algorithms for alarm classification.


4.2 Alarm grouping

The most obvious use of neural nets would be to perform basic grouping. There have been attempts to this end elsewhere, as described in section 4.1.3. Supervised learning - meaning hand-labeled training samples - is used to train the network to classify one alarm or a series of alarms into certain groups.

An intriguing possibility would be to create a separate, parallel alarm display that describes the state of the exchange in general terms. Alarms could be grouped to informative categories representing the fault in question rather than just a single symptom of it. Some categories could be:

• power failures
• broken hardware
• software faults
• configuration conflicts
• network connection faults
• overload

A separate neural net could be trained to recognize alarms of each category. The outputs of these nets are then compared to decide the final grouping of the alarm in question. The comparator could be another neural net, or a rule-based system. When using separate nets for each category, the features extracted from the alarm data can be optimized to best suit the category in question, and the structure of the neural net itself can be tailored to achieve optimal performance. It is also easy to add new fault categories later, as no changes are needed in the existing neural nets (except the comparator, if it is implemented with a neural net).

As with any supervised learning system, a large amount of high-quality training data would be required. Faults of desired categories must be created on the exchange and the relevant alarms labeled to the alarm history file. Unfortunately, it is not possible to easily create all types of faults, such as hardware breakdowns, so such situations must be simulated, either by hand or by creating a program for that purpose (such as a "dummy fault generator" /Sone/.) In any case, gathering and processing the training data is a long and arduous process, requiring much expertise to achieve correct labeling.

The actual usefulness of such an alarm classifier to the operator must be carefully weighed against the amount of design work to determine the feasibility of the system. Demand for the system should be quite high to justify the large amount of work that would go into building it.


4.3 Alarm filtering

A simplified form of alarm clustering is filtering. There are only two clusters: "filtered" and "not filtered". Alarms from the exchange are fed into the trained network in real time, and filtered out according to the output obtained from the net.

What makes neural nets flexible in filtering applications is the lack of filtering rules: only the desired results, not the rules that produce them, must be specified. The lack of rules is also a disadvantage, because one can never be completely sure that every occurrence of a certain case will be filtered. One hundred per cent accuracy is almost impossible to achieve with neural nets.

4.3.1 Experiment with LVQ

The LVQ algorithm was tested in a simple filtering task: if the same alarm was set again within one minute of cancelling, the second (and further) settings were to be filtered out. The functionality was identical to the cancelling delay explained in section 2.3.1. The data was classified by hand, marking the alarms to be filtered. The features used are listed in table 4.1.

Table 4.1 : Features used in LVQ alarm filtering

For each of the n previous alarms (n = history size):
• time difference from the current alarm, 1 hour normalized to [-1, 1]
• difference in alarm numbers (-1 smaller, 0 equal, 1 larger)
• urgency class
• difference in target units (-1 smaller, 0 equal, 1 larger)
• difference in consecutive numbers (-1 smaller, 0 equal, 1 larger)
• difference in extra info fields (-1 smaller, 0 equal, 1 larger)
• difference in physical coordinates (-1 smaller, 0 equal, 1 larger)

For the current alarm:
• urgency class
• target unit type

The features were selected by experimentation. While keeping the history size fixed, different fields were removed from the feature vector one at a time and the impact to the classification accuracy was checked. The same method was used to determine the best normalization scheme for the selected fields, and whether to use pure field values or differences from the latest alarm.
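To make the layout of table 4.1 concrete, the following sketch assembles the feature vector for history size n. The alarm_t fields and helper names are hypothetical; in particular, representing the extra info field and the physical coordinates as precomputed comparable integers is an assumption:

```c
#include <stddef.h>

typedef struct {
    double time;          /* setting time in seconds     */
    int    number;        /* alarm number                */
    int    urgency;       /* urgency class               */
    int    unit;          /* target unit                 */
    int    consecutive;   /* consecutive number          */
    int    extra;         /* comparable extra info value */
    int    coords;        /* comparable coordinate value */
} alarm_t;

/* Three-way comparison code: -1 smaller, 0 equal, 1 larger. */
static double cmp3(int past, int cur)
{
    return (past < cur) ? -1.0 : (past > cur) ? 1.0 : 0.0;
}

/* Fills f with 7*n + 2 features: 7 per history entry plus the urgency
 * class and target unit type of the current alarm. With n = 2 this gives
 * the 16-dimensional vector of table 4.2. */
void make_features(const alarm_t *hist, int n, const alarm_t *cur, double *f)
{
    size_t j = 0;
    for (int i = 0; i < n; i++) {
        double dt = (cur->time - hist[i].time) / 3600.0;  /* 1 hour -> 1.0 */
        f[j++] = (dt > 1.0) ? 1.0 : (dt < -1.0) ? -1.0 : dt;
        f[j++] = cmp3(hist[i].number,      cur->number);
        f[j++] = (double)hist[i].urgency;
        f[j++] = cmp3(hist[i].unit,        cur->unit);
        f[j++] = cmp3(hist[i].consecutive, cur->consecutive);
        f[j++] = cmp3(hist[i].extra,       cur->extra);
        f[j++] = cmp3(hist[i].coords,      cur->coords);
    }
    f[j++] = (double)cur->urgency;
    f[j++] = (double)cur->unit;
}
```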

The data was classified using history sizes from 1 to 5, and two different codebook sizes. The 400-vector codebook was trained for 10000 epochs with the optimized LVQ algorithm. The training length for the 600-vector codebook was 20000 epochs. The LVQ was initialized in proportion to the a priori probabilities of the classes (the propinit program) /Kohonen 96a/. Results are shown in table 4.2 and figure 4.1.


Table 4.2 : Recognition accuracy of the simple filtering task

Test data:
                             codebook size 400                 codebook size 600
history  feature vector   not filtered  filtered   total    not filtered  filtered   total
size     dimension        accuracy      accuracy   accuracy accuracy      accuracy   accuracy
1         9               99,44 %       36,75 %    94,68 %  99,47 %       37,18 %    94,74 %
2        16               98,42 %       71,37 %    96,37 %  98,24 %       71,37 %    96,20 %
3        23               98,88 %       37,61 %    94,23 %  97,02 %       53,42 %    93,71 %
4        30               98,60 %        5,98 %    91,57 %  98,31 %        5,13 %    91,24 %
5        37               98,07 %        6,84 %    91,14 %  97,75 %        7,26 %    90,88 %

Training data:
                             codebook size 400                 codebook size 600
history  feature vector   not filtered  filtered   total    not filtered  filtered   total
size     dimension        accuracy      accuracy   accuracy accuracy      accuracy   accuracy
1         9               98,91 %       49,36 %    95,16 %  98,88 %       49,79 %    95,17 %
2        16               98,74 %       78,11 %    97,18 %  98,74 %       78,54 %    97,21 %
3        23               99,16 %       56,22 %    95,91 %  98,38 %       68,24 %    96,10 %
4        30               99,54 %       36,91 %    94,80 %  99,65 %       39,48 %    95,10 %
5        37               99,61 %       36,48 %    94,84 %  99,58 %       39,48 %    95,03 %

[Figure: line chart "Recognition accuracy of simple filtering task": accuracy (0 % to 80 %) versus history size (1 to 5), with curves for train data 600 vectors, train data 400 vectors, test data 600 vectors, and test data 400 vectors.]

Figure 4.1 : Recognition accuracy with varying history sizes

The performance of the system was also evaluated with a SOM used to initialize codebooks of different sizes. A history length of 2 was used in this case, and the results were compared to those obtained with propinit. The SOM was trained in two phases: first with a neighborhood size equal to the largest dimension of the map, and then for a longer period with a neighborhood size of 3. The map dimensions are larger than the final codebook sizes because non-labelled units were removed from the SOM before starting the LVQ training.
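A minimal sketch of this initialization scheme, assuming the trained SOM codebook and the labelled training data are available as NumPy arrays (the actual work was done with the SOM_PAK and LVQ_PAK programs):

    import numpy as np

    def label_and_prune(som_codes, data, labels):
        # Label each SOM unit by a majority vote of the training vectors
        # hitting it, then drop units no vector ever hit (non-labelled units).
        votes = [{} for _ in som_codes]
        for x, y in zip(data, labels):
            bmu = int(np.argmin(np.linalg.norm(som_codes - x, axis=1)))
            votes[bmu][y] = votes[bmu].get(y, 0) + 1
        keep = [i for i, v in enumerate(votes) if v]
        cb_labels = [max(votes[i], key=votes[i].get) for i in keep]
        return som_codes[keep], cb_labels   # initial LVQ codebook and labels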

Table 4.3 : Recognition accuracy with SOM initialization

codebook size   phase 1          phase 2          map size   filtered accuracy   filtered accuracy
(approx.)       training length  training length             (SOM)               (propinit)
100             4000             20000            15 x 8      8,97 %             38,89 %
200             8000             40000            18 x 13    10,68 %             72,22 %
300             12000            60000            22 x 17    72,22 %             72,22 %
400             16000            80000            28 x 22    73,08 %             71,37 %
500             20000            100000           32 x 26    67,52 %             71,37 %

[Figure: chart "Initializing codebook with SOM": filtered-alarm accuracy (0 % to 80 %) versus codebook size (100 to 500), comparing SOM and propinit initialization.]

Figure 4.2 : Recognition accuracy with different initialization schemes

The results show that straightforward filtering with LVQ is unusable for this application, because the classification error is too large, even for a simple task. In practical use, the classification scheme would have to be much more complex. Also, the possibility of filtering out a critical alarm (one that must always be visible) because of a classification error is unacceptable.

It is obvious that one should not use a neural net if a simple rule-based approach is available. If more complex filtering is needed, or exact rules cannot be determined, neural classification might be useful. However, the poor performance of LVQ in this case suggests that filtering of any complexity would yield poor results. The temporal correlation inherent in the application is the probable cause of the lack of success.


4.4 State monitoring

4.4.1 Overview

Consider a physical system, such as a machine or a process, whose state can be defined by taking measurements (e.g. temperatures, pressures, or electrical variables at different points). These measurements are normalized to a common range of values and a feature vector is formed. Data collected from a variety of working states (or simulated data) can then be used to train a SOM. The map will, in essence, become a two-dimensional image of the high-dimensional state space of the system. The variables from the system can then be plotted onto the trained map in real time, forming a trajectory of states (figure 3.7 in section 3.2.1). /Kohonen 95/
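Forming the trajectory itself is straightforward: each incoming measurement vector is mapped to its best-matching unit. A sketch, assuming the measurements arrive as normalized NumPy vectors and the map is stored row by row:

    import numpy as np

    def bmu(codebook, x):
        # Index of the best-matching unit for measurement vector x
        return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

    def state_trajectory(codebook, stream, xdim):
        # Map a stream of measurement vectors to (column, row) coordinates;
        # plotted in order, these points form the state trajectory.
        coords = []
        for x in stream:
            b = bmu(codebook, x)
            coords.append((b % xdim, b // xdim))
        return coords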

The advantage of such a map is that the operator receives a view of the system as a whole, not only of independent state variables. The map makes it easy to understand complex system states intuitively, and, with the help of the trajectories, even dynamic behavior such as start-up or shutdown. Dangerous or undesirable states can also be labelled on the map, giving the operator an immediate indication that the system is not functioning normally.

4.4.2 Application example

ABB has incorporated a SOM-based application as a supplementary tool into a power plant monitoring system /Otte/. It uses an 11-dimensional measurement vector, reduced from a total of 40 process variables by correlation studies. The visualization of the finished map is not based on the u-matrix, because it proved insufficient for clearly showing the desired process states. Instead, a winner-rate map is used: the more often a neuron becomes the BMU of a training vector, the higher its winner rate. The resulting display has light areas corresponding to frequently occurring real states (for which training data has been available), and a dark background that the process state should not normally enter.
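The winner-rate display can be computed as a simple hit histogram over the training data; the following sketch shows the idea (/Otte/ does not give the exact implementation):

    import numpy as np

    def winner_rates(codebook, data):
        # Count how often each neuron is the BMU of a training vector; the
        # normalized rates give the light "safe" areas and dark background.
        hits = np.zeros(len(codebook))
        for x in data:
            hits[int(np.argmin(np.linalg.norm(codebook - x, axis=1)))] += 1.0
        return hits / hits.max()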

Trajectories are plotted on the map in real time, enabling the monitoring personnel both to follow steady-state plant operation, and to notice deviations in transient operations by comparing the actual trajectory to a reference operation superimposed on the same map. The light, high winner-rate areas of the map function as "safe channels" for the trajectories, further aiding fault identification.

Similar applications for monitoring and modelling industrial processes are being developed in the MENES (Methods for Demanding Neurocomputing Applications) research project, funded by the TEKES Adaptive and Intelligent Systems Applications program. /TEKES/


Figure 4.3 : Example of "safe channel" in process map /Otte/

4.4.3 Possibilities

In the framework of alarm classification, the state monitoring idea could be used in a way similar to the above. If the alarms are thought of as separate measurements of the state of the entire exchange, a map could be formed to give an indication of the general working state. By observing the state trajectory, the operator could discern at a glance whether the exchange is operating within normal parameters.

The problem is that alarm messages are much more complex than simple measurements, so a suitable feature vector may be hard to compose. Also, "normal operation" is harder to define for an exchange, and cannot necessarily be derived from alarms alone. Therefore, the alarm data should be augmented as much as possible with other operating information about the exchange, such as traffic measurements.


4.5 Data mining

4.5.1 Overview

Data mining is a part of the process of finding useful knowledge in large amounts of data, called knowledge discovery in databases (KDD) and shown in figure 4.4. The goal of data mining is to extract patterns from raw data, but without proper pre- and postprocessing steps the patterns may be false or useless. KDD, as well as data mining, is an interactive and usually iterative process. /Fayyad/

[Figure: block diagram of the KDD process: data -> selection -> target data -> preprocessing -> preprocessed data -> transformation -> transformed data -> data mining -> patterns -> interpretation -> knowledge.]

Figure 4.4 : Steps of the KDD process /Fayyad/

Human interaction is needed in the KDD process in two forms. First, there must be expertise on the data being explored, so that as much prior knowledge as possible can be applied; informative features of the data must be chosen and processed in an intelligent manner. Second, the data mining step, performed on the processed data, requires an expert in pattern extraction; the algorithm and parameters used must be optimized for the best results.

KDD and data mining tools can likewise be grouped in two. Tools intended for data experts have many data filtering and transformation features, but a relatively fixed way of running the pattern extraction. Tools intended for KDD experts, on the other hand, assume that the data is already processed, and let the user experiment with a multitude of algorithms and parameters to find patterns. /Vesanto/

Data mining can be performed with many algorithms, one of which is the SOM. The visualization gained with the SOM is especially useful when the analysis of patterns is done manually, instead of extracting properties automatically.


4.5.2 Application examples

Entire. A general-purpose SOM data mining tool prototype has been developed for Jaakko Pöyry Consulting at Helsinki University of Technology /Vesanto/. The tool, called Entire, uses the SOM_PAK software package /Kohonen 96b/ as its SOM platform, with an X Window System GUI. Entire offers a range of input vector processing capabilities, such as scaling, labelling, and grouping vector components, and filtering out unwanted vectors by specifying acceptable component value ranges.

The most important features are the map visualization capabilities. Entire can display the SOM in either u-matrix view or by vector component plane values. Several component planes can be shown at the same time. The number of input vectors mapped to each node can be displayed, as well as BMU trajectories. The nodes can be labelled by a variety of means.

Entire was used to analyze pulp and paper industry data from around the world. The feature vectors were constructed by hand and the clusters formed on the finished map were inspected visually. The project is part of the TEKES Adaptive and Intelligent Systems Applications program. /TEKES/

CorDex, developed at Motorola, is another general-purpose knowledge discovery tool based on the SOM /Leivian/. It can use any spreadsheet or database information (even Boolean or text category values), with fields normalized to one byte for fast processing (the program does not have to use floating point operations). Other concessions to training speed are also made: the SOM uses a rectangular topology and the Manhattan distance instead of the Euclidean.

CorDex displays the data either by placing item names or icons on their respective BMU's on the map, or by color-coding the map nodes according to a selected component value. There are features to help in finding data patterns: the user can define and name "regions" by using conventional Boolean operations on the components. The regions can be outlined and overlaid over different component plane views.

The tool was used on some very high-dimensional data sets (for example, a semiconductor production yield data set of 132 dimensions and over 17000 records) with very good results. Visual attributes of promising cluster structures are described in /Leivian/.

NDA, the Neural Data Analysis environment, is a SOM-based data analysis tool developed at the University of Jyväskylä /Häkkinen/. It uses a variant of the basic SOM: a tree-shaped SOM or TS-SOM, which consists of several SOMs of different resolutions in a hierarchical structure. The TS-SOM has advantages in data analysis, such as high computational effectiveness and no need to adjust training parameters for an optimized result; thus, the user does not need deep expertise in the SOM structure. Other, non-neural methods of data reduction are also supported.


NDA gives the user a multitude of visualization schemes, with possibilities to mix different views (such as bar graphs overlaid on the u-matrix). The interface is implemented as a kernel that performs transformations on selected data according to a command language. The environment does not include a display program or GUI; these are created separately for each application. The applications can use NDA as directly linked code, as a library, or over a network from a separate server.

NDA is the supporting software platform of the project Stella - Computationally intelligent analysis of large data sets. Stella is a part of the MENES project. /TEKES/

4.5.3 Possibilities

Data mining could be applied to the alarm history data in order to find clusters, which in turn correspond to alarm sequences. With a suitable choice of feature vector elements, one could thus locate frequently occurring alarm sequences and use them to form alarm filtering rules manually. The applications described above suggest that even complex sequences can be found by visualizing the data with a SOM.

There has already been some research into using data mining on alarms. TASA is an alarm rule-finding application based on traditional non-neural methods /Hätönen/. TASA is targeted at the whole telecommunications network level, and the topology of the network plays an important role in decision making.

The ability to gain a visual representation of the data as a whole is an invaluable asset in exploration, and a tool to visualize alarm history would undoubtedly be useful to the persons designing alarm filtering rules. The tool could be used in conjunction with others, such as TASA, to gain a more complete understanding of the data. Designing a visualization tool does not require much expertise on the alarms, as opposed to using the tool (i.e. gaining useful results from the visual representation). Neither would it constitute an overly large effort, especially if "off-the-shelf" components could be used directly for routine functions.


4.6 Choices for further study

It is quite clear that applications using supervised learning require a great deal of expert alarm knowledge in their design if any concrete benefits are to be obtained. People with such expertise, even if available, can rarely be spared from their other work for the necessary length of time. It can also be argued that the benefits gained from using neural nets in filtering, instead of the already well-established and functioning methods, would be marginal.

The branch to probe further is then unsupervised learning, where only raw data is initially required. Applications using the SOM for clustering data and forming a visual representation can be considered, namely monitoring and data mining. These fields overlap somewhat, as a visualization of the data is formed in both, but it is used for different purposes.

A feasible way to continue is to create a tool for visualizing alarm history data. The image can then be used to analyze either the data itself (data mining) or the placement of subsequent alarms on it (monitoring). The main focus of the tool would be feature selection and processing of the results; the target user would be an alarm expert. SOM training and visualization features have already been implemented in other software, such as SOM_PAK /Kohonen 96b/.


5. DATA MINING TOOL REQUIREMENTS

5.1 Introduction

When determining alarm filtering rules, the designer faces a considerable effort: the alarm sequences where filtering would be useful are often hard to locate. To make this task easier, a data mining tool can be used. The tool assists the designer in analyzing alarm history data and locating alarms or alarm groups suitable for rule creation.

Alarm history is analyzed by specifying desired features and creating a self-organizing map from the data. The alarm feature map is displayed in u-matrix form, enabling the user to visually locate clusters of frequently occurring alarm sequences. By inspecting these sequences, the designer is then able to form rules pertaining to the situations in question.

The tool is a Windows NT application, with the development name Accord (alarm correlation discovery tool).

5.2 Description of features

Accord covers the knowledge discovery process, which is explained in section 4.5.1. A block diagram of the process, where the KDD terminology is replaced with application specific terms, is shown in figure 5.1.

[Figure: process flow from data gathering through rule-based filtering and feature selection to visualizing with the SOM; the resulting map image is used for creating new rules manually.]

Figure 5.1 : Process for finding new alarm filtering rules


5.2.1 Data gathering and display

Alarm history data consists of the information contained in the list_alarm_t structure (table 2.3). It is acquired from the DX 200 exchange in the form of one or more files, which the user must copy to the PC workstation. Accord reads the file(s) and converts the data into a single file in its own format for storage and further internal processing. The ASCII format must contain all fields of list_alarm_t and be compact, yet user-readable and possible to create by hand.

The user can view the alarm history data file directly in "raw" form, or formatted to a display similar to the alarm printout on the DX 200. Searching for specified values of chosen data fields is also possible.

This feature corresponds to the data selection step in the KDD process.

5.2.2 Filtering

To reduce the amount of alarm data, the designer may wish to perform filtering. Accord supports the use of all alarm system filtering rules (section 2.3.1). Filtering out alarms with specified parameter value ranges is also possible. Thus, only alarms that the current rules do not filter can be considered when trying to find candidates for new rules. The filtering function creates a new file for the filtered output.

This feature covers both data selection and preprocessing steps of KDD.

5.2.3 Feature selection and scaling

The alarm information supplied to the SOM, as well as its form, differs according to the desired output. Accord offers a choice of which alarm parameters to use and how to scale their values. The chosen fields are labelled to make it easier to interpret feature values. The scaling must also operate in reverse, displaying the original alarm information when a feature vector is viewed.

In addition, Accord supports calculating histograms of chosen alarm information fields within a specified time window. To reduce the dimensionality of the resulting feature vector, histogram entries below a selected threshold value can be omitted.

This feature corresponds to KDD data transformation.

5.2.4 SOM initialization and training

All parameters pertaining to the structure of the SOM and the manner of its training are modifiable. These include:

• Map dimensions
• Lattice type (default hexagonal)
• Neighborhood function (default step function)
• Random number seed for initialization
• Lengths, learning rates, and neighborhood radii for the two training phases
• Feature vector dimension and data used for training

Accord calculates the quantization error of the completed map in relation to the training data. The value can be used to assess the "goodness" of the mapping. The map, along with its parameter information, is saved to a disk file for later use.
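The quantization error is, in its standard form, the mean distance from each training vector to its best-matching unit; a sketch of this definition (the exact computation inside the tool is not detailed here):

    import numpy as np

    def quantization_error(codebook, data):
        # Mean Euclidean distance from each training vector to its BMU;
        # a smaller value indicates a tighter fit of the map to the data.
        return float(np.mean([np.linalg.norm(codebook - x, axis=1).min()
                              for x in data]))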

This feature is part of the KDD data mining step.

5.2.5 SOM visualization

Accord forms the u-matrix or component plane representation of the SOM, which is displayed to the user. Chosen feature vectors can be labelled, and these labels plotted on the map to give names to clusters. A list of the feature vectors mapped to each map node is also maintained. The designer is able to jump back and forth between corresponding parts of the data and the map image. The neurons can be labelled either by hand or automatically.

This feature is part of the KDD data mining step, and a tool to assist in the pattern interpretation step.

5.3 User interface

Accord is intended for the alarm system designer: the user does not have to be an expert on the SOM or neural nets, but must have wide knowledge of the alarm data. Although it is possible to adjust most parameters and scale the features in a variety of ways, very little knowledge of the workings of the SOM is needed. For the most part, the user can rely on templates or default values.

The user interface of Accord can be divided into three parts, each covering a distinct part of the rule finding process. The parts are centered around a view of the essential information in that stage of the process.

5.3.1 Data file view

This part of the user interface covers the beginning of the KDD process. The alarm history file contents are displayed on screen, with controls to select the desired data file and perform searches on the data. Filtering controls are also located in the data file view. If a SOM is trained based on the data, the user can jump to the BMU of the selected alarm on the map view.


5.3.2 Map construction view

The map construction part displays the selection, scaling and labelling of features, and the chosen parameters of the SOM to be trained. Training is started from this view, with controls to select the data file to use and the file in which to save the finished map. Pre-defined parameter templates can be selected to make the map construction simpler.

5.3.3 Map view

The u-matrix view of a selected SOM is displayed in this part of the user interface. The user can inspect the values of the reference vectors (back-scaled to their original contents) and jump back to the data file view to check which alarms are mapped into the selected neuron. A component plane display can also be selected instead of the u-matrix view.

5.4 Functional specification

Accord is implemented on the Microsoft Windows NT platform. It consists of a collection of individual command-line programs, each devoted to a simple task. The parts are then gathered together under a single graphical user interface (GUI). This makes it easy to include software from many sources and customize the program for different uses. The programs, excluding the GUI, are implemented in ANSI C, so porting to other platforms is possible with minimal changes.

The implementation of the prototype version of Accord is described in chapter 6.

5.5 Further development

Some potentially useful features are intentionally left out of the initial version of Accord. The goal is to achieve a basic functionality that can be enhanced later. Among the features to be added later are the following:

• Trajectories
• Winner-rate display
• Integration with other KDD tools (common file formats, etc.)

Other desirable features may be discovered when actually using the tool, and these should of course also be included in later releases.


6. ACCORD PROTOTYPE VERSION

6.1 Introduction

A prototype version of the Accord alarm data mining tool was built according to the requirements presented in the previous chapter. Not all features were included, however, as the objective of the prototype is only to show the usefulness of such a tool and to serve as a basis for development of the full version. The feasibility of some planned features became questionable during the design process. Conversely, new useful features became evident and were included in the program.

The prototype was used to classify some alarm history sequences with the goal of identifying possible candidates for filtering rules. The potential users found Accord useful in giving a single-glance view of the alarm history.

6.2 Implementation of features

6.2.1 Data gathering and display

For alarm history data, three files are required from the DX 200 exchange. ALHISTGX.ALF contains the actual alarm history, but some fields are incomplete. SCDFLEGX.IMG is needed to map message bus addresses to computer unit codes and indexes. An equipment list file, in Comma Separated Values (CSV) format, must be separately created on the exchange with the appropriate command /NTC 95/ to find the location coordinates of the functional units. A fourth file, UNFILEGX.IMG, is required to convert functional unit codes to the names of the units. The contents of this file are the same for all exchanges, however, so it is included in the Accord package.

The files are transferred from the exchange to a PC. A conversion program (bin2ahi.exe) is then used to read them and convert the information into an intermediate format (table 6.1) for further processing. It is possible that the equipment list file is not available for conversion; if the file is missing, the location coordinate information is simply left out.

Accord displays the intermediate format files in a separate "Alarm history" window (figure 6.1). As the alarm-number-specific information needed to decode the extra info field cannot be acquired from the DX 200 at this stage, only fields 1-11 are shown. Alarm system designers preferred this display scheme to the alarm printout format when viewing large amounts of alarm data.


Table 6.1 : Alarm data intermediate format (*.AHI)

The first row contains the total number of data rows; rows starting with '#' are comments. Field formats: X = alphanumeric, 9 = numeric.

 1. X            Alarm label
 2. 9999         Alarm number
 3. XXXX-999     Object unit type & index
 4. X            Recovery (R)
 5. XXX          Urgency class (NOT / DIS / AL1 / AL2 / AL3 / CAN)
 6. 99999        Consecutive number
 7. 9X999-999    Unit location coordinates (?????-?? if not known)
 8. XXXX-999     Sender unit type & index
 9. XXXX         Setting family
10. 9999-99-99   Date
11. 99:99:99:99  Time
12. X...X        Extra info (max. 32 characters)

Some alphanumeric fields are also stored numerically for easy decoding:

13. XX           Object unit type (= field 3 in hexadecimal)
14. XX           Sender unit type (= field 8 in hexadecimal)
15. 9            Printing info (= field 4 in hexadecimal)
16. 9            Alarm class (= field 5 in hexadecimal)

6.2.2 Feature selection and scaling

Feature selection is performed by a conversion program (ahi2dat.exe), which reads the chosen feature extraction type from a parameter file. Accord compiles the parameters and executes the conversion, using user-supplied values from an input dialog box (figure 6.2).

The features are picked by calculating various histograms over a selected time period. The maximum feature vector dimension is selected to keep the map training time within reasonable bounds. The items placed into the vector are obtained by sorting all items in descending order of frequency over the entire data, and choosing the selected number of most frequent items. The feature vectors are normalized to unit length.
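A minimal sketch of this extraction scheme (the record layout, the key function, and the fixed window stepping are assumptions made for the illustration; the actual conversion is performed by the extraction program):

    from collections import Counter
    import numpy as np

    def histogram_features(alarms, window, key, max_dim):
        # 'alarms' is a time-ordered list of records with timestamp 't' in
        # seconds; 'key' picks the histogram field, e.g. the alarm number.
        overall = Counter(key(a) for a in alarms)
        items = [it for it, _ in overall.most_common(max_dim)]  # components
        index = {it: i for i, it in enumerate(items)}
        vectors = []
        for start in np.arange(alarms[0]["t"], alarms[-1]["t"], window):
            v = np.zeros(len(items))
            for a in alarms:
                if start <= a["t"] < start + window and key(a) in index:
                    v[index[key(a)]] += 1.0
            norm = np.linalg.norm(v)
            if norm > 0:
                vectors.append(v / norm)    # normalize to unit length
        return items, vectors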

The alarm data fields on which the histogram is calculated are selected from a list of available alternatives (table 6.2). These alternatives are based on suggestions from alarm system designers. More histogram types can easily be included later according to demand.


Table 6.2 : Available histogram types

Histogram type               Histogram entry fields
Alarm number & object unit   alarm number, object unit type and index
Alarm number                 alarm number
Object unit type             object unit type
Consecutive alarms           alarm numbers of two consecutive alarms within the
                             same time window (2-dimensional co-occurrence matrix)
Set-cancel pairs             alarm number, object unit type & index, counted only
                             if the alarm is both set and cancelled within the
                             same time window

Descriptive labels for the feature vector components are picked by the extraction program and saved into the data file header. These labels are used when displaying neuron contents. Information on the chosen parameters, as well as the original alarm history file name, is also stored in the data file. The data file is intended for use with SOM_PAK /Kohonen 96b/.

The option of compiling the feature vector from the scaled data fields of a single alarm, instead of calculating a histogram, was deemed unnecessary. The information conveyed by the relations within a single alarm appears to serve no purpose in filtering rule creation.

6.2.3 SOM initialization and training

The SOM training parameters are supplied by the user through the training dialog box (figure 6.2). Once the training parameters are chosen, Accord executes both feature extraction and map training. The SOM is trained by calling the appropriate programs from the SOM_PAK package /Kohonen 96b/. After the training is finished, a link file containing the BMU of each feature vector is created.

Accord can suggest feasible training parameters, so that the user does not have to be familiar with the SOM and can concentrate on selecting the best feature extraction for the task at hand.

6.2.4 SOM visualization

The trained SOM is displayed in u-matrix form, along with the original alarm history (figure 6.3). The component values of user-selected reference vectors are displayed as percentages; the percentage display fits well in this context, as the feature vectors all have unit length. Alarms for which the selected neuron is the BMU (if any exist) are highlighted in the alarm history view. The user can also determine the BMU of a selected alarm from the alarm history.
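The u-matrix values themselves are easy to derive from the reference vectors: each position gets the average distance between neighbouring reference vectors /Ultsch/. A simplified sketch for a rectangular lattice (the default lattice in Accord is hexagonal, which only changes the neighbour set):

    import numpy as np

    def umatrix(codes, xdim, ydim):
        # Average distance from each neuron's reference vector to its
        # immediate lattice neighbours; high values mark cluster borders.
        grid = codes.reshape(ydim, xdim, -1)
        u = np.zeros((ydim, xdim))
        for r in range(ydim):
            for c in range(xdim):
                d = [np.linalg.norm(grid[r, c] - grid[rr, cc])
                     for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                     if 0 <= rr < ydim and 0 <= cc < xdim]
                u[r, c] = np.mean(d)
        return u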


The u-matrix can be switched to a component plane display of a selected component. Higher component values are shown as lighter shades. The vector value display and alarm highlighting continue to operate identically.

The neurons can be labelled automatically. Accord determines the label by examining the component values. A neuron is named according to the label of a component whose value is above a selected threshold (default 80 %). If two or three components share similarly high values (defaults 40 % and 20 %, respectively), the neuron is given multiple labels.
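One plausible reading of the labelling rule as code (a sketch; the component values are the percentages shown in the display, here as fractions):

    def autolabel(components, labels, t1=0.80, t2=0.40, t3=0.20):
        # One label if the strongest component exceeds t1; two labels if the
        # two strongest both exceed t2; three if the three strongest exceed t3.
        ranked = sorted(zip(components, labels), reverse=True)
        if ranked[0][0] >= t1:
            return [ranked[0][1]]
        if len(ranked) >= 2 and min(v for v, _ in ranked[:2]) >= t2:
            return [l for _, l in ranked[:2]]
        if len(ranked) >= 3 and min(v for v, _ in ranked[:3]) >= t3:
            return [l for _, l in ranked[:3]]
        return []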

6.3 User interface

6.3.1 Data file view

Alarm history is shown in a separate child window. The user can scroll both the display and selected alarm (darker highlight in figure 6.1) separately with the mouse and keyboard. The BMU of the selected alarm is shown on the map display, and other alarms sharing the same BMU are shown on the alarm history window (lighter highlight in figure 6.1).

Figure 6.1 : Alarm history window

When a neuron is selected on the map window, the alarm history display jumps to the first occurrence of an alarm with that neuron as BMU.

6.3.2 Map construction view

The parameters for both feature extraction and SOM training are given in a dialog box. Default values for map dimensions are given, and the user can obtain suitable parameters for training lengths, radii, and learning rates with the "Suggest" button.


Figure 6.2 : Feature extraction and SOM training dialog box

The default training parameters are chosen for a fast training process and ease of use, but if an interesting feature configuration is found, a user familiar with SOM may experiment by varying training parameters.

6.3.3 Map view

[Figure: screenshot of the Accord main window; the map window shows the u-matrix of a 10 x 6 neuron map (15 fields, 5-second window) with the reference vector component values of the selected neuron listed beside it, and the alarm history window (alarm 324 of 6243) below with the corresponding alarms highlighted.]

Figure 6.3 : Accord main window with u-matrix display


The map is shown in u-matrix form in a separate window. The user can select a neuron by clicking it with the mouse. The selected neuron is highlighted with a sphere, or by changing the color of its label. The coordinates and component values of the neuron are shown in the adjacent window. The component labels are generated by the feature extraction program.

By clicking a component label, the display can be changed to show the component plane view of the selection. The u-matrix view can be restored by clicking on the window background. Alternatively, the display options can be set with a dialog box.

[Figure: screenshot of the Accord map window showing the component plane display of a selected component on a 10 x 6 neuron map, with the component values of the neuron at column 4, row 2 listed in the adjacent window.]

Figure 6.4 : Component plane display

Initially, the map is not labelled. The user can use an autolabelling function from the Map menu. The neurons are labelled according to the labels of components with high values.

6.4 Further development

The SOM training and visualization features are quite thoroughly implemented in the prototype version of Accord. Development of further features takes place mainly in the data viewing and preprocessing steps.

The most important missing feature is filtering capability. The user should be able to generate standard DX 200 filtering rules and execute them on the alarm history. This enables an iterative approach in defining new rules: the designer creates a map on only those alarms that are not yet filtered out.

There should be more possibilities to navigate the alarm history window. Functions such as jumping to the next highlighted alarm, and performing searches on chosen alarm fields, would make finding desired locations in the file much easier. The user might also want to label some alarms in the history file before training, to see where they end up in the SOM.

Features desirable in the map visualization window include manual labelling and/or highlighting of selected neurons, to make it possible to see overlapping areas between different component plane views. There could also be more autolabelling options, such as changing the threshold values for labels.

Finally, an extensive help file should be created, preferably including an introduction to self-organizing maps. An explanation of the effect of the training parameters and the different feature extraction schemes on the resulting map should also be available.

6.5 Alarm classification results

Figure 6.5 : SOM on alarm number & target unit

Figure 6.5 shows a sample SOM based on an alarm history file from a DX 220 exchange (Pelangi, Malaysia). Three distinct clusters can be located (1-3 in the figure), cluster 1 being clearly the largest. The cluster shows that the exchange in question has had problems on a certain trunk circuit connected to the unit ET-440. Alarm numbers 2122 and 2123 describe faults at the local and remote ends of the trunk circuit, respectively. These alarms seem to occur together, within a single five-second window, as can be seen from the following extract from the alarm history.

2123 ET-440 . AL2 07835 2K029-37 LSU-0 74 1997-03-01 10:51:05:53
2122 ET-440 . AL2 07836 2K029-37 LSU-0 74 1997-03-01 10:51:05:54
2122 ET-440 . CAN 07836 2K029-37 LSU-0 74 1997-03-01 10:51:06:05
2123 ET-440 . CAN 07835 2K029-37 LSU-0 74 1997-03-01 10:51:06:05


Clusters 2 and 3 are not as interesting from the viewpoint of alarm correlation, as they show occurrences of single alarms. However, cluster 3 may point out a possible fault in the virtual printer VPP-1, which is changing operating state repeatedly (notice number 0690), as shown below.

0690 VPP-1 . NOT ............ ?????-?? OMU-0 F 1997-03-01 10:32:23:24
0690 VPP-1 . NOT ............ ?????-?? OMU-0 F 1997-03-01 10:32:23:43

The node marked "4" in the figure is interesting as it suggests a connection between three different exchange terminals. These units seem to experience trunk circuit failures very close together. In general, the user should concentrate on the map nodes with multiple labels which point out alarms occurring in groups: exactly the information needed in designing prevention and support rules. An alarm history extract of such a case is shown below.

2122 ET-95 . AL2 07947 2K125-37 LSU-0 74 1997-03-01 12:33:11:82
2122 ET-98 . AL2 07948 2K091-01 LSU-1 74 1997-03-01 12:33:11:80
2122 ET-92 . AL2 07949 2K125-37 LSU-1 74 1997-03-01 12:33:11:80
2122 ET-98 . CAN 07948 2K091-01 LSU-1 74 1997-03-01 12:33:12:31
2122 ET-95 . CAN 07947 2K125-37 LSU-0 74 1997-03-01 12:33:12:33
2122 ET-92 . CAN 07949 2K125-37 LSU-1 74 1997-03-01 12:33:12:32

The clusters shown in figure 6.5 could be highlighted by choosing a component plane display of the components of interest. Different feature extraction schemes can be used to search for other filtering rule possibilities. The set-cancel pair feature is especially useful in finding alarms that are cancelled very quickly after setting: prime candidates for applying an informing delay. Several clusters of such alarms can be seen in figure 6.6; the alarm history extract above corresponds to the cluster labelled "2122 ET-95".

Figure 6.6 : SOM on set-cancel pairs


When looking at these results, one should remember that they cannot necessarily be generalized to other exchanges. If generally applicable filtering rules are required, the designer must analyze alarm histories from many sources and pick only the alarm groups inherent in every source. However, exchange-specific filtering rules can be defined to cope with cases that appear only in the exchange in question.


7. CONCLUSIONS

The volume of alarms in the DX 200 exchange needs to be decreased to make it easier for the operators to determine its operating state. The ability of neural networks, especially the Self-Organizing Map, to find patterns in high-dimensional data is interesting for reducing the alarm flow. The principal possibilities for neural network use in alarm classification can be categorized and evaluated as follows:

• Alarm grouping (section 4.2). Alarms could be grouped into informative categories representing the "real" fault rather than just a symptom. A separate neural net could be trained with labelled inputs to recognize alarms of each category; the outputs of these nets would then be compared to decide the final grouping of an alarm. The demand for the system would have to be quite high to justify the large amount of work that would go into building the necessary training samples.

• Alarm filtering (section 4.3). Alarms can be fed into a trained network in real time and filtered out according to the output. What makes neural nets flexible in filtering is that only the desired results must be specified. The LVQ algorithm was tested in a simple filtering task similar to cancelling delay; the results show that the classification error is too large. Neural nets obviously cannot replace rule-based alarm filtering methods in cases where simple rules exist. If more complex filtering is needed, neural classification might be useful, but labelling the training samples would be an arduous job.

• State monitoring (section 4.4). Data collected from a variety of working states of a system can be used to train a SOM. The variables from the system can then be plotted onto the map in real time, forming a trajectory of states. ABB has incorporated such an application as a supplementary tool into a power plant monitoring system. If alarms are thought of as measurements of the state of the exchange, a map could be formed, enabling the operator to discern whether the exchange is operating within normal parameters. The problem is that the state of an exchange cannot necessarily be derived from alarms alone, and should be augmented by other information such as traffic measurements.

• Data mining (section 4.5). The goal of data mining is to interactively extract patterns from raw data. There are several applications using the SOM for data mining, such as Entire by Jaakko Pöyry Consulting and Helsinki University of Technology, CorDex by Motorola, and NDA by the University of Jyväskylä. Data mining could be applied to the alarm history to locate frequently occurring alarm sequences and use them to form alarm filtering rules. The above applications have shown quite promising results on similar problems.


The data mining application was the ideal choice for further investigation, as it is based on unsupervised learning, has no real-time requirements, can be run on a workstation, and does not need huge amounts of alarm history data. The results from a data mining tool can easily be incorporated into the DX 200 system in the form of filtering rules; no new software for the exchange is needed.

Accord, an alarm history analysis tool, was the result of the investigation into data mining. While the other, "generic" SOM-based data mining applications presented in this work focus on the visualization aspects of the SOM, Accord has extensive feature extraction and preprocessing functions and is intended solely for alarm history inspection. The results obtained with the prototype version of Accord were very promising, even taking into account the relatively few alarm history cases it was tested on.

However, even more preprocessing features need to be added to future versions of Accord, the foremost being the ability to form filtering rules directly from the alarm history. Only then can the tool be fully used as a help utility for the alarm filtering rule designer. Other, smaller additions should focus on making Accord easier to use.

As in many other data mining applications, the SOM can be said to have proved its worth in the field of telephone exchange alarm inspection.


REFERENCES

/Chattel/ Chattel, A. D. and Brook, J. B., 1993. A Neural Network Preprocessor for a Fault Diagnosis Expert System. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications. Hillsdale (NJ), Lawrence Erlbaum.

/Fayyad/ Fayyad, U. M. et al., editors, 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park (CA), AAAI Press / The MIT Press.

/Haykin/ Haykin, S., 1994. Neural Networks - A Comprehensive Foundation. Englewood Cliffs (NJ), Macmillan.

/Häkkinen/ Häkkinen, E. and Koikkalainen, P., 1997. The Neural Data Analysis environment. In Proceedings of the Workshop on Self-Organizing Maps (WSOM'97). Espoo, Helsinki University of Technology.

/Hätönen/ Hätönen, K. et al. 1996. Rule Discovery in Alarm Databases. Report C-1996-7, University of Helsinki.

/Iivarinen/ Iivarinen, J., Kohonen, T., Kangas, J., and Kaski, S., 1994. Visualizing the clusters on the self-organizing map. In Proceedings of the Conference on Artificial Intelligence Research in Finland. Helsinki, Finnish Artificial Intelligence Society.

/Johansson/ Johansson, J., 1995. Fault diagnosis with an artificial neural network in AXE 10. In Proceedings of the International Conference on Artificial Neural Networks. Paris, European Neural Network Society.

/Karling/ Karling, T., 1990. Digitaalisen puhelinkeskuksen hälytysjärjestelmä. Master's thesis, Helsinki University of Technology.

/Kohonen 95/ Kohonen, T. 1995. Self-Organizing Maps. Berlin, Springer.

/Kohonen 96a/ Kohonen, T. et al., 1996. LVQ_PAK: The Learning Vector Quantization Program Package. Technical Report A30, Helsinki University of Technology.

/Kohonen 96b/ Kohonen, T. et al., 1996. SOM_PAK: The Self-Organizing Map Program Package. Technical Report A31, Helsinki University of Technology.

48

/Lau/ Lau, H. C. et al., 1995. A Hybrid Expert System for Error Message Classification. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications. Hillsdale (NJ), Lawrence Erlbaum.

/Leivian/ Leivian, R., Peterson, W. and Gardner, M., 1997. CorDex: a Knowledge Discovery Tool. In Proceedings of the Workshop on Self-Organizing Maps (WSOM'97). Espoo, Helsinki University of Technology.

/Lewis/ Lewis, L. and Sycamore, S., 1993. Learning Index Rules and Adaptation Functions for a Communications Network Fault Resolution System. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications. Hillsdale (NJ), Lawrence Erlbaum.

/NTC 93/ Anonymous, 1993. DX200 Maintenance. Training document, Nokia Telecommunications.

/NTC 95/ Anonymous, 1995. Equipment Management MML program (EQADMI). Product documentation, Nokia Telecommunications.

/NTC 96/ Tikkanen, A., 1996. Alarm Print-out Manual. General document CAG 20037 en, Nokia Telecommunications.

/Oja/ Oja, E., 1996. Tik-61.161: Introduction to neural computing. Course material, Helsinki University of Technology.

/Otte/ Otte, R. and Goser, K., 1997. New Approaches of Process Visualization and Analysis in Power Plants. In Proceedings of the Workshop on Self-Organizing Maps (WSOM'97). Espoo, Helsinki University of Technology.

/Sone/ Sone, T., 1993. Using Distributed Neural Networks to Identify Faults in Switching Systems. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications. Hillsdale (NJ), Lawrence Erlbaum.

/TEKES/ Adaptive and Intelligent Systems Applications 1994-1999. Technology program, Finnish Technology Development Centre (TEKES). URL: http://www.kareltek.fi/opp

/Ultsch/ Ultsch, A. and Siemon, H. P., 1990. Kohonen's self organizing feature maps for exploratory data analysis. In Proceedings of the International Neural Network Conference. Dordrecht, Kluwer.

/Vesanto/ Vesanto, J., 1997. Data Mining Techniques Based on the Self- Organizing Map. Master’s thesis, Helsinki University of Technology.
