Intelligent management of computer networks: the GIR proposal

8

Transcript of Intelligent management of computer networks: the GIR proposal

Intelligent Management of Computer Networks: The GIR Proposal�

Alceu Britto Jr., Emerson Paraiso, Edson Scalabrin, Celso Kaestner, Br�aulio �Avila

LASINy| PPGIAz

PUC PR | Ponti�cal Catholic University of Paran�a

R. Imaculada Concei�c~ao, 1155

80215-901 Curitiba - PR, Brazil

falceu, paraiso, scalabrin, kaestner, [email protected]

Abstract

This paper presents a system proposed for the intelli-gent management of computer networks. The system isbased on the use of Arti�cial Intelligence techniques |data mining, expert systems and multi-agent systems.The work is based on the following lines of actions: (a)the use of distributed agents for the intelligent searchof information in the network, supplying it in a moreabstract way, adapted to the decision-making task; (b)the use of machine learning and data-mining techniquesthat, starting from the log �les which register previousproblems and their solution, allow the use of experiencethus obtained in the solution new problems; and (c) theuse of heuristics and conduction rules supplied by ex-perts, through a decision support system, as an advisorelement to network operators. This paper presents adiscussion about the techniques employed along thesethree research lines and the results already obtained.

1. Introduction

The migration from centralized to distributed sys-tems is a reality. Indications of this new reality are thepopularization of network technology, the increased re-liability of servers and the enhanced processing capac-ity of workstations. In large companies this behav-ior may be easily evidenced by the acceleration of thedownsizing process and the massive use of distributedprocessing systems.

�This system integrates the \GIR | Gerencia Inteligente deRedes" (Intelligent Management of Computer Nets) Project, car-ried out under angreement between PUC PR and Siemens Tele-comunications.

yIntelligent Systems LaboratoryzGraduate Program in Applied Informatics

If on the one hand these technologies bring unques-tionable bene�ts, the management of large computernetworks is not a trivial task. Some problems maybe readily identi�ed, particularly those concerning thedynamic access to the resources of a system, such asservers, routers, etc., which are distributed in the newmodel. Manager or operator support systems typicallyo�er limited help at the operational level, most of themconsisting in event monitoring, communication and log-ging.

The development of e�cient and e�cacious tools iscalled for to help in the management of the resourcesavailable in the network. A few quite complex toolswere developed for this purpose, such as OpenViewTM

[12] and OptivityTM [15], for HPTM , SUNTM andother platforms. The main problem inherent to suchtools is the overload they impose on the operator, bothin operational terms, when trying to manage resourceson-line, and in terms of knowledge handling - cognitiveoverload, as a consequence of the quantity of informa-tion handled at each decision-making [7], [11].

The GIR (Gerencia Inteligente de Redes de Com-putadores | Intelligent Management of ComputerNetworks) system is proposed in this paper to mon-itor and manage computer networks, relying on theuse of Arti�cial Intelligence Techniques. The maintechniques employed by GIR involve the applicationof Multi-Agent Systems [29] [28] [10] [16] [26] [4] [1]Machine Learning and Data-Mining [21] [22] [8] [2] [9][3] and Expert Systems [24] [19] [27].

The paper is structured as follows: the next sectionpresents an overview of the proposed system and itscomponents; section 3 presents the subsystem in chargeof information collection relying on multi-agent con-cepts; section 4 is dedicated to the presentation of themachine learning and data-mining systems employedand the description of the decision-making support sys-

tem that operates as an integrating mechanism andprovides an intelligent interface for the network man-ager; �nally, section 5 presents the conclusions and per-spectives of this work.

2. Proposed Solution

The problem of network monitoring and manage-ment may be described as follows: a network operatoror manager must be able to monitor and manage, froma given workstation, a myriad of resources and equip-ment that may be seen as belonging to an outside en-vironment.

The objective of this e�ort is to ensure the perfor-mance of several users' tasks in a safe and e�cient way.The environment is subject to failures and unforeseenevents and su�ers constant change, since the networkcon�guration is seldommaintained for a long time. Theoperation must be continuous, and the e�ects of a com-plete halt or recon�guration always result in problems.Time constraints are present either to ensure assistanceto critical tasks or to maintain the service at an ade-quate level.

Figure 1 depicts the structure of a conventional net-work management system: the data concerning net-work devices and performance is captured by a moni-toring system that provides it to the network operator;the entire data interpretation and decision-making taskrests solely with the system operator.

Figure 1. Conventional Network ManagementSystem Structure.

All these characteristics suggest that the adequatemodel for the automation of the problem is a controlsystem in which the best control vector for the currentsituation, relying on data collected in the environment,is to be provided by the system. However, the diversityof elements involved and the complexity of the prob-lem prevent the construction of proper algorithms tocarry out that task. An alternative seems to be takinginto account the experience of network operators andmanagers in hands-on problem solving.

This is precisely the strategy employed by the GIRsystem, using the information available in an intelligent

manner, along three basic lines:

� distribution of agents that search the network forinformation in an intelligent manner, providingthem in a more abstract and adequate way fordecision-making;

� the use of machine learning and data-mining tech-niques that, starting from the log �les which reg-ister previous problems and their solution, allowthe use of experience obtained from previous situ-ations;

� the use of heuristics to the conduction rules sup-plied by experts, through a decision-support sys-tem.

Figure 2 outlines the structure proposed for an in-telligent system of network management.

The data on equipment and other devices is obtainedin a selective way and at the adequate abstraction level,by means of a set of agents operating in a distributedway. A set of data mining tools allows the analysis ofprevious problem situations and the steps taken to cor-rect them. A decision-making support system allowsthe operator to diagnose the situation on the basis ofknowledge provided by network management experts,or by the identi�cation of similar situations previouslysolved. The �nal decision concerning action still be-longs to the system operator, but it can be made nowwith the support and suggestion o�ered by the auto-mated system.

Figure 2. Intelligent Management NetworkSystem Structure

3. Information collection and handling

agents

The �rst task needed in a management process isinformation collection. This process may include not

Type Skill

InformationTo provide information on a given re-source (e.g. server, router).

Task

To �lter and synthesize informationprovided by the Information agents,according to the di�erent abstractionlevels

Interface

To interpret and submit to the man-ager the present network situation,on the basis of knowledge providedby the Task agents, and facilitatethe communication between man-ager and system.

Table 1. Agent types and skills.

only the pure and simple search for data, but also thetreatment suited to the intended use.

In the GIR system data is collected in the computernetwork by a group of agents. This is a natural alterna-tive, since the network environment is inherently dis-tributed and many interactions may be needed in theprocess. On the other hand, the utilization of agentsand their communication reduces the volume of use-less information in the network, thereby reducing theoverhead imposed by the management system.

The tasks are carried out by a set of agents organizedinto three di�erent categories: Information, Task andInterface, the respective scope of which is shown inTable 1.

Agents act according to the following stages: (a)direct acquisition of information on the resources man-aged by a network; and (b) information treatment, per-formed by a �ltration and synthesis process to obtaininformation with a higher level of abstraction to meetthe requirement.

In computing terms, agents are processes distributedin the network, and they exchange information by send-ing messages. It is important to note that there areseveral agents of the same type, allowing parallel infor-mation acquisition and handling.

Some Information agents are SNMP managers thatsearch information from SNMP agents (managed ob-jects) [23] present in the system. Others were built toprovide information from the workstations existing ineach network segment, that is, they work on the basisof stimuli and answers [5].

Interface and Task agents in turn operate in multi-thread, allowing them to interact with several Infor-mation agents in parallel. Those agents have objec-tives and a pro-active behavior | being in this senseintelligent [28] [29].

Other tasks performed by the association of agentsare: (1) the creation of logs containing relevant datasuch as route, response time, and time of log gener-ation; such data are later used in the knowledge ac-quisition process; (2) the collection of MIBs from man-ageable agents already existing in the network environ-ment, to provide them to the Decision Support service.

3.1. Example of operation

To illustrate the operation, a performance monitorprocess in a network segment is provided as an exam-ple (see Figure 3). There are several network segments,and each one of them has a Task agent. Initially, In-terface agents query Task agents about the situationin their segment. One of these agents is the Ping Taskagent. This agent builds an ICMP type message andsend it to each route leading to Information agents thatare, in general, servers scattered throughout the net-work. Thus, after the message returns, a description ofthe situation of each route between Task and Informa-tion agents is obtained, allowing the determination ofthe performance of the segment for each route. Inter-face agents are charged with providing such informa-tion to the operator.

When a too long response time is detected in a givensegment or route, the Task agents takes over the roleof SNMP managers, de�ned as the software elementsinstalled in network equipment able to provide informa-tion on their interaction with the network [12]. Thoseagents collect from the equipment located on the routeunder study information that must be taken into ac-count in problem solving. Such information is passedon to the Interface agents and later to the DecisionSupport module.

Figure 3. Positioning of Interface, Task andInformation agents in a Computer Network.

3.2. Implementation

The system is implemented in C++ on a WindowsNT platform, and employs RPC to carry messages be-tween di�erent agents [17]. To recover and check in-formation about routes and overloads on network seg-ments, ICMP packages (ping), socket raw and check-sum were employed [18]. Finally, graphic interfaceswere employed to present the manager with the net-work situation in the form of gauges, as shown in Fig-ure 4.

Figure 4. Representation of segment perfor-mance by means of gauges.

3.3. Agent Manager

A special type of Interface agent was built to allowagent management. It is the Manager, allowing graphicdisplay of agent location in the network and automaticchange of location of machines and segments in thenetwork. Therefore such system agents are consideredmobile so that they can act in di�erent sites throughouttheir existence. With the use of agents they may beenabled to monitor speci�c network points with greaterease.

Agents can also be con�gured remotely from theManager, by de�ning their scope according to a pre-established model. The interaction among agents isbeing performed by the information exchange languageKQML [13].

4. The Decision Support System

This module consists in a Knowledge-Based Systemto support the decision-making process. In the contextof computer network management, DSS tries to help

the manager or operator with the task, providing sug-gestions about what to do in the face of a given event inthe network, and suggesting action to enhance his/herperformance.

One of the main advantages of using DSS is the re-duction of the cognitive overload on the operator, sincein many situations the number of variables to be con-sidered in the search for a solution exceeds the memo-rization and handling capability of the human mind.

As seen in Figure 2, DSS integrates directly withthe other system components, resulting in an intelli-gent user interface. Its entries are events or problems inthe network originating from: (a) a conventional net-work monitoring system (HP OpenView e Optivity);(b) the \intelligent agents" scattered in the networkperforming some kind of monitoring; or (c) the net-work operator himself, in case he/she intends to carryout some sort of simulation in the environment. Theevents are then used to trigger the inference processthat aims to de�ne an action to be suggested to thenetwork operator.

4.1. The DSS Structure

In general lines this module follows the classic struc-ture of a rules-based production system [27], made upof a language to represent knowledge and one (or sev-eral) mechanisms to perform the inference. However, afew particular characteristics stand out: (1) the orga-nization of the knowledge about the problem domainin contexts | groups of rules applicable to a particu-lar situation, ful�lling the need to minimize the e�ortspent during a likely maintenance of the knowledge ba-sis; and (2) the acquisition of \raw" knowledge in theform of rules originating from data mining techniques.

Basically, it is possible to split the DSS structure inthree levels:

Data level

At the data level is found the momentary knowledgerepresenting the current status of the system duringthe resolution of a problem. This is constantly fed bythe agents charged with searching relevant informationin the network, as well as by context information (ormeta-information) that is of the utmost importance insolving a problem and suggesting an action.

Knowledge level

This level is represented by a knowledge base com-posed of production rules in the IF-THEN format.Each rule represents an atomic article of knowledge

and belongs to a certain context. Thus, each context iscomposed of a set of related rules concerning the scopeof a particular problem, the selection of which is drivenby events originating in the network or by the inferenceprocess itself.

This division facilitates the process of knowledge ac-quisition and the maintenance of the system's knowl-edge base, and contributes to enhance system perfor-mance, since only the knowledge parcel more adequateto the situation is employed.

The natural style of production rules facilitatesthe inclusion of explanations about the reasoning em-ployed, such as \how" and \why" questions about agiven decision.

Control level

The control level contains the control strategy tohandle knowledge and may be referred to in general asinference engine.

There are two possible approaches: (a) forward rea-soning, starting from the evidence available (currentstatus) in search of a solution; (b) backward reason-ing, using a set of pre-de�ned assumptions as startingpoint and trying to determine which one of those as-sumptions is supported by the evidence. Certainly thede�nition of the best strategy depends on the problemin question [27]. The GIR system implements bothabove mentioned approaches, and each situation con-text may use a di�erent strategy merely by indicatingthe desired type of engine (forward or backward).

4.2. Example of operation

Figure 5 shows an example of an information blocthat makes up a context concerning a communicationfailure problem. In this �gure the following is observedas to the particular context structure: (a) the de�ni-tion of the inference direction employed, in this caseforward (DIRECTION: FORWARD); (b) the presenceof context-speci�c variables; and (c) the set of rulesthat make up the context. As to the syntax of the de-�ned language, the presence of QUERY and CHANGEcommands stands out, enabling to request informationon the network environment from system agents andthe change of context.

A main context is used to start the inference pro-cess. Starting from the variables concerning the prob-lem (events and alarms), a set of rules de�nes the nextcontext to be used. This expert context is then trig-gered, and it is possible to move to another contextif so indicated by the inference process that continuesuntil a solution to the problem is found.

Figure 5. Context with Knowledge about com-munication fault.

The DSS also possesses a user-friendly graphic inter-face designed to facilitate its interaction with the net-work operator. At the same time, the system possessesa simple protocol allowing communication with net-work monitoring agents. During the interaction withthe agent, the network manager or operator may dis-card the action suggested and provide another con-sidered more adequate. However, the system alwaysrecords the problem and the action chosen in a properlog �le of the triple type (problem, suggested action,action taken). This is important to allow feedback inthe decision-making process, taking into account pre-vious cases as a guiding mechanism in new, similar sit-uations.

4.3. Building the knowledge-base

In order to ful�ll the DSS knowledge-base, the ac-quisition of knowledge follows two approaches: (1) clas-sical, by interviewing network experts, managers andoperators [27]; and (2) automatic knowledge acquisi-tion, based on data-mining techniques [8] and MachineLearning [14].

The classical approach emphasized the use ofinterview-based techniques, allowing the mastering ofthe terminology employed in computer network man-agement, and the de�nition of heuristics indicated bynetwork operators and managers on the basis of theirexperience.

However, the main factor distinguishing DSS is theuse of automatically acquired knowledge. In di�erentapplication areas, data mining techniques have beenemployed to extract knowledge from major informationrepositories, such as the examples reported by [2].

The use of data mining techniques is possible in thecontext of computer network management, since thelog �les generated by management assistance systemslike HP OpenView and Optivity make up a bulk of datathat may contain relevant information for managers inthe form of logged events. Most of those events concernproblems occurred in the network.

Thus, a knowledge acquisition system is applied tothe available log �les in DSS, in two main stages: in-formation pre-treatment and data-mining proper.

Information Pre-Treatment

The initial stage consists in selecting, cleaning, en-riching and formatting data, using as ancillary tool anappropriate interface to navigate the log �le. Figure 6depicts the main interface window.

As already mentioned, important information onnetwork events is recorded in the log �les. However,those events are typically not related to the steps takenby the expert to solve the problem at that time. Oneof the characteristics of that interface is precisely thatit adds ancillary information such as the problem classand the respective action on the part of the expert atthe time the problem arose/was solved.

Figure 7 (a) contains a schematic depiction of infor-mation pre-treatment, showing the log �le reading andthe interaction with the expert to indicate the actiontaken after the recorded events. Thus the association(event-class-action) is obtained in an explicit way. Animportant by-product of this stage is the creation of anontology of the area, needed to standardize the termsused in the statement of problem classes and actionsduring the pre-treatment process.

The �nal result of this stage is a training base ad-equately formatted and �ltered to employ automaticknowledge acquisition algorithms to be used in the nextstage of the process.

Data-mining

Like in a mining process [8] the training base ob-tained in the pre-treatment is searched | or mined |

Figure 6. Interface for navigating the log file.

for a precious thing: the knowledge about computernetwork management that will allow solving the prob-lem.

The literature presents several data-mining pro-cesses. Algorithms for automatic knowledge acquisi-tion such as CN2 [6], C45 and C50 [22] are especiallyemployed to obtain classi�ers (event-action) based onproduction rules and decision trees [20]. Such algo-rithms induce production rules from example �les [19][21].

As can be seen in Figure 7 (b) acquisition algorithmsgenerate a knowledge basis (set of rules) from the train-ing base generated during information pre-treatment.The �nal product of this module is pure knowledgeabout network management.

5. Conclusions and Perspectives

This paper describes a proposal for a system employ-ing modern Arti�cial Intelligence techniques to supportcomputer network management. Its main goal is tohelp the conduction (situation assessment) of the sys-tem operator as to the best alternative of action foreach problem detected. As in most systems of this type,the goal is to decrease the cognitive overload falling onthe operator, allowing an automatic analysis of the sev-eral alternatives presented. The distributed, selective,and intelligent information collection carried out by anassociation of expert agents may also be pointed out.

The knowledge employed by the system derives fromtwo main sources: (a) heuristic, obtained from ex-perts in network conduction that make up a Knowl-edge Based System; and (b) information obtained via

Figure 7. (a) Information pre-processing and(b) Data Mining.

automatic knowledge acquisition in a data-mining pro-cess carried out in �les containing system logs; the aimis to detect similar situations and identify the actionstaken by the operator in previous cases.

Systems with this focus take on an active role inmodern computer network management. The intelli-gent participation ensures better use of the informationo�ered by monitoring systems, as well as the standard-ization of actions among several managers, and facili-tates training new operators.

The GIR system is undergoing the �nal implemen-tation steps; most of the software modules describedabove have already been developed. The future stagesof the program are the following:

� increase in the number of parameters manageableby information agents, granting network operatorsmore exibility;

� interaction with SNMP agents, enriching manage-ment possibilities by means of utilization of theinformation available in this protocol widely usedin distributed environments;

� tests performed in the machine learning environ-ment checking the actions suggested in typical fail-ure situations;

� continuous acquisition of heuristic knowledge de-riving from experts in the area, aiming to create amajor Data Base on the domain; and

� performance of system operation tests in a realworking environment, represented by a large cor-porate computer network.

References

[1] Abe, J.M.; �Avila, B.C.; Prado, J.P.A., Multi-Agents

and Inconsistency, ICCIMA'98 International Confer-

ence on Computacional Intelligence and Multime-

dia Applications, Monash University, World Scien-

ti�c Publishing Co. Pte., Ltd., Australia, pp. 137-142,

February, 1998.

[2] Adriaans, P.; Zantinge, D. Data Mining, Addison-

Wesley, England, 1996.

[3] �Avila, B.C., Data Mining, Livro da VI Escola Re-

gional de Inform�atica da Regi~ao Sul, SBC | So-

ciedade Brasileira de Computa�c~ao, pp. 87-106, Brazil,

Maio, 1998. (in portuguese)

[4] Barth�es, J-P.A.; Scalabrin, E.E., Cognitive Agents

and Exchange Protocols, MASTA Workshop, Coim-

bra, Portugal, October, 1997

[5] Brooks, R.A., A Robust Layered Control System for a

Mobile Robot, IEEE Journal of Robotics and Automa-

tion, 2(1), pp. 14-33, March, 1986.

[6] Clark, P.; Niblett, T., The CN2 Induction Algorithm,

The Turing Institute, Glasgow, UK, 1988.

[7] D'Ambrosio, B.; Fehling, M.; Forest, S.; Raulefs P.;

Wilber M., Real-Time Process Management for Mate-

rials Composition in Chemical Manufacturing, IEEE

Expert, pp.80-93, Summer. 1987.

[8] Fayyad, U.M.; Shapiro, G.P.; Smyth, P.; Uthurusamy,

R.,Advances in Knowledge Discovery and Data Min-

ing, AAAI/MIT Press, 1995.

[9] F., Yongjian, Discovery of Multiple-Level Rules from

Large Databases, PhD Thesis, School of Computing

Science, Simon Fraser University, 1996.

[10] Genesereth, M.R.; Ketchpel, S.P., Software Agents,

Communications of the ACM, Vol. 37(7), pp. 48-53,

July, 1994.

[11] Jacob, F.; Suslenschi, P., Situation Assessment for

Process Control, IEEE Expert, pp. 49-59, April, 1990.

[12] OpenView, HP OpenView: Using Network Node Man-

ager, Hewlett Packard Part No. J1169-90002, October,

1995.

[13] McGuire, J.; Pelavin, R.; Shapiro, S.; Finin, T.; We-

ber, J.; Wiederhold, G.; Genesereth, M.; Fritizson, R.;

Mckay, D., Draft Speci�cation of the KQML Agent-

Communication Language, 1993.

[14] Michalski, R.S., A theory and methodology of indutive

learning, In Michalski et al. (editor) Machine Learn-

ing: An Arti�cial Intelligence Approach, Vol. 1, pages

83-134, Morgan Kaufmann, 1983.

[15] Optivity, Optivity LAN 7.0 for UNIX, Bay Networks,

Part no. 893-568-G, April, 1996.

[16] Paraiso, E.; Kaestner, C.; Ramos, M., MASC: A

Multi-Agent System for Control and Supervision of

Industrial Plants, Engineering of Intelligent Systems,

Tenerife, Spain, February, 1998.

[17] Paraiso, E.; Vigo, J.; Ditzel, C., Implementa�c~ao e teste

desempenho da linguagem KQML, Relat�orio T�ecnico

LASIN-002/97, Pontif�icia Universidade Cat�olica do

Paran�a, Mestrado em Inform�atica Aplicada, Curitiba,

Brazil, 1997. (in portuguese)

[18] Paraiso, E.; Vigo, J.; Camargo, G., PING: Um Moni-

tor de Tempo de Resposta, Relat�orio T�ecnico LASIN-

003/97, Pontif�icia Universidade Cat�olica do Paran�a,

Mestrado em Inform�atica Aplicada, Curitiba, Brazil,

1997. (in portuguese)

[19] Quinlan, J.R., Induction, Knowledge and Expert Sys-

tems, Proceedings of the Australian Joint Conference

on Arti�cial Intelligence, Sydney, Australia, 1987.

[20] Quinlan, J.R., Generating production rules from de-

cision trees, In J. McDermott (editor), IJCAI-87, pp.

304-307, 1987.

[21] Quinlan J.R.; Compton, P.J.; Horn, K.A.; Lazarus, L.,

Inductive Knowledge Acquisition: a case study, In Ap-

plications of Expert Systems, Addison-Wesley, Wok-

ingham, UK, pp. 157-173, 1987.

[22] Quinlan J.R., C4.5 Programs for Machine Learning,

Morgan Kaufmann Publishers, San Mateo, California,

1993.

[23] Rech Filho, A., Estudos para Implanta�c~ao de uma

Gerencia de Rede Corporativa Utilizando Arquitetura

de Protocolos Abertos, Tese de Mestrado, CEFET-PR,

Maio, 1996. (in portuguese)

[24] Russel, S.J.; Norvig, P., Arti�cial Intelligence: a mod-

ern approach, Prentice-Hall, New Jersey, 1995.

[25] Santos, A.; Carvilhe, C.; Capeline, K.; Britto Jr., A.,

Um estudo comparativo dos algoritmos CN2 e C4.5,

Relat�orio T�ecnico LASIN-001/97, Pontif�icia Universi-

dade Cat�olica do Paran�a, Mestrado em Inform�atica

Aplicada, Curitiba, Brazil, 1997. (in portuguese)

[26] Scalabrin, E.E., Conception et R�ealisation d'environ-

nement de d�eveloppement de syst�emes d'agents cog-

nitifs, Ph.D. Thesis, Universit�e de Technologie de

Compi�egne, France, 169 p., 1996. (in french)

[27] Ste�k, M., Introduction to Knowledge Systems, Mor-

gan Kaufmann, 1995.

[28] Sycara, K.; Decker, K.; Williamson, M.; Pannu, A.,

Distributed Intelligent Agents, IEEE Expert, July,

1996.

[29] Wooldridge, M.J.; Jennings, N.R., Agent Theories,

Architectures, and Languages: A Survey, Workshop

on Agent Theories, Architectures and Languages,

ECAI'94, Amsterdam, 1994.