Expert systems in telecommunications

Expert Systems With Applications, Vol. 1, pp. 127-136, 1990 0957-4174/90 $3.00 + .00 Printed in the USA. © 1990 Pergamon Press plc

JON R. WRIGHT AND GREGG T. VESONDER

AT&T Bell Laboratories, Warren, NJ

Abstract--Expert systems have been successfully applied to many maintenance, provisioning, and administrative tasks in telecommunications networks. Given that they can be appropriately integrated with the existing base of software applications, expert systems will play an important role in the future. We review nearly 40 current projects, which run the gamut from research prototype to finished product.

1. INTRODUCTION

ONE MEASURE OF the growth in the application of expert systems to telecommunications is the number of published reports on the subject, and there is clear evidence of growth over the past several years. In 1986, a survey of expert systems by Waterman (1986) listed only two projects in telecommunications. They were COMPASS (Prerau, Gunderson, Reinke, & Goyal, 1985) and ACE (Vesonder, Stolfo, Zielinski, Miller, & Copp, 1983). More recent sources such as Liebowitz (1988), industry newsletters (SFCG Highlights, 1988; Spang-Robinson Report, 1988), and specialized conferences on expert systems applications (Attard, 1989; Teitell, 1988) collectively contain descriptions of nearly 40 expert system projects.

Table 1 shows the number of telecommunications expert systems by the year in which reports were first published. Although Table 1 is based on publicly available reports and contains systems that are in many different stages of the project life cycle, ranging from research prototype to mature product, it reflects steady growth. There have been new projects reported every year since ACE was first described. We think the telecommunications domain continues to be fruitful ground for expert systems, and feel that the work here is forward looking in several ways. This is partly a consequence of our view that expert systems are a natural extension of the methods employed by software developers in the past, and that they can greatly extend the capabilities of existing computer applications.

No telecommunications network can work efficiently or cost effectively without extensive support in the form of specialized computer applications. Ten years ago, there was already extensive automation in support of the public switched network in the United States (for an example, see the Summer, 1983 issue of the Bell System Technical Journal devoted to the Automated Repair Service Bureau). Today, these computer systems--often called Operations Support Systems or simply Operations Systems (OS)--form a strong base of core applications.

Requests for reprints should be sent to Jon R. Wright, AT&T Bell Laboratories, 184 Liberty Corner Road, P.O. Box 4908, Warren, NJ 07060-0908.

OS in the telecommunications domain are both specialized and complex. The forces driving OS development have been primarily, although not exclusively, economic in nature. Technological innovation has been important in the history of these systems, but real business problems lie behind every project. In many cases, the final solution to these problems required a rearrangement of work--a division of labor, so to speak--between computers and people. Although automation was carried as far as was technically and economically feasible, some work remained. Frequently, new jobs, sometimes of a fundamentally different nature, were created to handle the remaining work.

We make this point because it is our view that the technology underlying expert systems can be used to address a wider range of problems than is typical of most present day OS--as long as they can be made to work effectively within the framework of the existing OS environment. Expert systems, appropriately conceived and applied, can push automation further than previously has been feasible. This kind of clean-cut economic benefit encourages the development of new expert systems.

An example of the relationship between conventional OS and expert systems is the following. Telecommunications networks generate huge amounts of data because of the volume of traffic they handle. Telephone-switching machines, for example, print diagnostic messages whenever an attempted call cannot be completed. Sometimes these messages represent normal conditions in the network but other times they are caused by real problems that must be repaired.

TABLE 1 Expert Systems in Telecommunications by Year of First Publication and Application Domain

                            Year of first publication
Application domain        1983  1984  1985  1986  1987  1988  1989
Maintenance                  1     1     2     5     3     9     4
Provisioning                 -     -     -     1     1     3     1
Network Administration       -     -     2     1     -     3     1
Total                        1     1     4     7     4    15     6

It often takes an expert who understands the switch to know the difference between normal conditions and serious trouble. Further, messages frequently must be grouped together like pieces of a puzzle before their meaning becomes clear. There are both monitoring and diagnostic aspects to this problem and expert systems have been applied successfully to both (see the review in section 2).

However, such systems must have access to the raw messages. Typically, this is achieved by tying the expert system to an OS that has direct access to the network. The OS provides access to the message traffic, manages the communications, and may supply some record-keeping or report-generating capability. The expert system discards false alarms, groups related messages together in some meaningful way, and constructs an interpretation or diagnosis that helps technicians repair the trouble.

In some cases, the expert system might go so far as to issue commands to the network (via the OS) to correct the problem or provide a temporary solution until repair technicians can be dispatched. The Starkeeper ® Network Troubleshooter (Marques, 1988a; 1988b) is an example of an expert system that does exactly this by using the Starkeeper Network Management OS as a go-between.

A key point to notice in this scenario is the relationship between the OS and the expert system, a relationship that we think is typical of many successful expert system applications. Basically, the OS provides a core application and the expert system automates tasks surrounding the use of that application. Often the expert system will employ the user interface of an existing OS to gain access to the functionality it requires. Interestingly enough, it is more often the lack of access to a core OS application that holds back expert system deployment rather than the ability to build a working expert system itself.

This scenario is repeated so frequently that we think that the ability to integrate expert systems with standard computing environments is one of the most critical factors, if not the most critical factor, in a project's success. Vesonder (1988) describes some of the underlying rationale for this position and discusses the integration of rule-based expert systems with the UNIX ® operating system.

2. EXPERT SYSTEMS IN SUPPORT OF NETWORK OPERATIONS

In telecommunications, expert systems have primarily been applied to what is best described as network operations support. Basically, network operations can be divided into three broad categories. First, maintenance functions keep the components of the network working properly. Testing, equipment repair, troubleshooting, trouble report processing, and preventive maintenance are included in this category. Next, provisioning is the process of forecasting demand for network services, planning and engineering changes to the network to satisfy that demand, and installing new equipment or rearranging the network accordingly. Finally, network administration encompasses a diverse group of functions that sustain network services once they have been provided. Among these are traffic management, routing, billing, facility assignment, and record keeping.

Maintenance experts were the first kind of expert system to appear and are still the most common kind of expert system in telecommunications. These systems are directed at the activities that lead up to the actual repair of equipment. These up-front tasks consist largely of monitoring, diagnosis, data interpretation, and fault localization problems, which are, for the most part, amenable to known, well-established techniques in the expert system community.

This contrasts sharply with provisioning applications, for example, where planning problems predominate and where expertise in routing algorithms or performance models is sometimes required. Planning problems are difficult to organize properly, and often involve searching large problem spaces. However, a few expert systems in the provisioning domain are beginning to appear in the literature. These systems address those portions of the provisioning task that can be solved with known techniques.

® Starkeeper is a trademark of AT&T. ® UNIX is a trademark of AT&T.

In the network administration area, the most frequent kind of expert system application is traffic routing. Some of these expert systems have routing algorithms or performance models at their core, and use these models to make decisions about the network. Such systems are good examples of the eclectic approach often needed in successful expert system projects.

2.1. Maintenance Experts

There are essentially four aspects to any maintenance operation: (1) monitoring or trouble report processing, (2) trouble-shooting, (3) diagnosis, and (4) repair. In the telecommunications domain, expert systems have been applied successfully to the first three and, in some specialized cases, to the fourth.

Switch and Network Monitors. Telecommunications networks are rather data intensive creatures. Large switching machines, for example, process hundreds of thousands of calls daily. When a call fails, diagnostic messages are generated that provide information about the call failure. These messages could point to problems at any place in the network--interoffice trunks, outside plant, terminal equipment, or a component of the switch itself.

Switching machines have a great deal of redundancy built into their design. This redundancy permits the switches to continue processing calls even when key components are failing. Because of network redundancy, customers may either not be aware of troubles, or they may perceive the troubles as transient and so choose not to report them. As a result, switch maintenance is heavily dependent on the internal diagnostics produced as the switch processes calls.

The detective work needed to properly maintain a switching machine is complicated by the fact that some messages are false alarms, and, in fact, represent normal conditions in the network. For example, messages are generated whenever someone places a call to a destination that has been disconnected. In other cases, the same messages might represent a discontinuity in the outside plant or perhaps in the switch itself (colloquially called an open)--a genuine trouble that must be repaired.

Usually, the underlying problems represented by these messages can only be uncovered by examining groups of related messages over time. An expert with intimate knowledge of the switching network can use these patterns to identify faulty equipment and schedule the appropriate repair work. On the other hand, locating meaningful patterns out of hundreds and often thousands of messages is tedious and difficult work even for the best experts.

Message monitoring has been a prime candidate for automation using expert systems. The pattern-matching techniques developed in the artificial intelligence community have proved to be a natural way to express the necessary relationships among messages needed by monitoring systems. Production system languages such as OPS5 (Forgy, 1981) or OPS83 (Forgy, 1986) or one of the commercially available multiparadigm shells that allow knowledge to be expressed as production rules are commonly used.

Knowledge is encoded in production rules by describing patterns of messages on the left-hand side of production rules. When the left-hand side of a rule is matched and the rule fires, actions on the rule's right-hand side alert users to the trouble conditions. Each production rule represents a solution in a sometimes large problem space. However, the number of solutions is typically not large (perhaps several hundred), and is collected through a combination of experience and knowledge of the network technology. This problem-solving technique, known as match, does not involve search and is very efficient.
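The match technique described above can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems, which typically used OPS5, OPS83, or a commercial shell. The message fields, the rule name, and the threshold and window values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Message:
    """One diagnostic message; the fields are hypothetical."""
    kind: str    # e.g. "CALL_FAIL"
    trunk: str   # equipment the message refers to
    minute: int  # arrival time in minutes


@dataclass
class Rule:
    name: str
    lhs: Callable[[List[Message]], Optional[str]]  # pattern: returns matched equipment or None
    rhs: Callable[[str], str]                      # action: builds the alert text


def repeated_failures(threshold: int, window: int):
    """Build a left-hand side that matches when one trunk shows at least
    `threshold` CALL_FAIL messages within `window` minutes."""
    def lhs(messages: List[Message]) -> Optional[str]:
        by_trunk = {}
        for m in messages:
            if m.kind == "CALL_FAIL":
                by_trunk.setdefault(m.trunk, []).append(m.minute)
        for trunk, times in sorted(by_trunk.items()):
            times.sort()
            for i in range(len(times) - threshold + 1):
                if times[i + threshold - 1] - times[i] <= window:
                    return trunk
        return None
    return lhs


def run_rules(rules: List[Rule], messages: List[Message]) -> List[str]:
    """Fire every rule whose left-hand side matches; no search is involved."""
    alerts = []
    for rule in rules:
        matched = rule.lhs(messages)
        if matched is not None:
            alerts.append(rule.rhs(matched))
    return alerts


RULES = [
    Rule("suspect-trunk",
         repeated_failures(threshold=3, window=10),
         lambda trunk: f"Suspected fault on trunk {trunk}"),
]

messages = [
    Message("CALL_FAIL", "T1", 0),
    Message("CALL_FAIL", "T2", 1),
    Message("CALL_FAIL", "T1", 4),
    Message("CALL_FAIL", "T1", 9),
]
alerts = run_rules(RULES, messages)
```

A single failure on trunk T2 is treated as a probable false alarm, whereas three failures on T1 within ten minutes fire the rule. A production system would match incrementally as messages arrive rather than rescanning the whole list.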

There are at least 13 expert system projects that monitor diagnostic messages for various kinds of networks. The kinds of networks to which these expert systems are applied vary considerably--from public telephone networks to packet switch networks to the audio networks used in broadcast studios. Frequently, the monitoring expert also provides assistance with other kinds of maintenance functions such as troubleshooting or diagnosis.

Some of these projects have had a fairly significant impact on their respective maintenance operations. ACE (Wright & Siegfried, 1985; Wright, Zielinski, & Horton, 1988), for example, monitors and diagnoses trouble in the local loop using a database of maintenance records. There are more than 100 ACE systems working in the Regional Bell Operating Companies in the United States.

ACE is not the only successful application. NET/ADVISOR (Mantleman, 1986), which monitors local area networks, has been a commercial product since 1986, and CENTAURE (Benicourt, Arnaud, & Vincent Crommelynck, 1989), which monitors the telecommunications network of SNCF, the French National Railway, reached full deployment in mid-1989. TERESA (Corn, Dube, McMichael, & Tsay, 1988) (previously called TOPAS-ES) has had successful product trials, and is working in multiple sites. The same is true of GTE's COMPASS (Prerau et al., 1985), NEMESYS (Macleish, Theidke, & Vennergrund, 1986), and PROPHET (Prerau, Gunderson, & Levine, 1988), which all monitor switch messages of various kinds. In addition, BELLCORE has several switch-monitoring projects (Loberg, 1988; Slawsky & Sassa, 1988; Sutter, 1986).

Packet-switching networks have been the target of expert system projects on several occasions. IAS (Ferrata, Giovannini, & Paschetta, 1989), which monitors the Italian ITELPAC network, and the DPN Monitor (Baird & White, 1989), which monitors Bell Canada's packet switching network, have both had successful product trials and are moving into a deployment stage in 1990. Another packet switch application is DAD (Rabie, Rau-Chapin, & Shibahara, 1988).

Monitoring experts are used in a wide variety of networks. British Telecom's AMF (Thandasseri, 1986) monitors TXE4 exchanges. TXE4 switches are used by broadcast studios for routing audio signals. REACT (Fox, 1988) and RTS (Mantleman, 1986) monitor diagnostic messages generated by the No. 1 ESS switch in the United States. The FIESTA prototype (Miksell et al., 1988) monitors satellite communications for the U.S. federal government.

Troubleshooting. Troubleshooting is distinct from both monitoring and diagnosis. The goal of a troubleshooter is to identify an unknown faulty component or subsystem. Seldom in a troubleshooting task is all the relevant data immediately available. The troubleshooter actively seeks out new data by interacting with the system that is in trouble. Usually it is possible to collect far more data than is really needed or feasible, so that careful choices are necessary. There is clearly a dynamic aspect to troubleshooting--the path selected is not predictable from its initial inputs; rather, it depends on what is discovered as the troubleshooting process unfolds.

MAD (Peacocke, 1988; Built, Peacocke, Rabie, & Starr, 1987) is a good example of a troubleshooting system that uses a human operator as a go-between. It is an advisor for the DMS family of switching machines developed by Northern Telecom. Diagnostic messages from the DMS switches are received by a technician over a printed maintenance channel. The technician reads the messages and provides input to MAD. MAD asks the technician for additional information when needed, and the technician obtains this information by interacting with the switch through the maintenance channel. Basically, MAD is an advice giver that depends on a human to intercede between it and the system that is in trouble.

Some expert systems have direct interfaces to the systems they troubleshoot, frequently using the same command interface that human troubleshooters use, and sometimes even resolving or temporarily relieving trouble on their own initiative. The Starkeeper Network Troubleshooter (Marques, 1988a; 1988b), an expert system developed by AT&T Bell Laboratories for the Datakit ® network, is a good example of this kind of system.

The Starkeeper Network Troubleshooter has an interesting approach to troubleshooting. It keeps a historical database of component failures along with the evidence that was available during each troubleshooting session. When new troubles arise, Troubleshooter obtains a description of the pathway associated with the network failure. Because any link or node along the pathway could be faulty, Troubleshooter then produces a plan or agenda describing what components should be tested and in what order. The plan is generated by combining historical frequency of failure records with current symptoms using Bayes' Rule.

® Datakit is a trademark of AT&T.

Finally, Troubleshooter invokes specialist modules that do qualitative analyses on specific components. When the cause of the reported trouble is uncovered, Troubleshooter updates its historical records with new frequencies. This gives the Starkeeper Troubleshooter the ability to adapt to local conditions, something that is unusual for most working expert systems.
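The planning and updating steps above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the Troubleshooter's actual implementation: it assumes a single faulty component, treats symptoms as conditionally independent given the fault (a naive Bayes simplification), and uses hypothetical component names, symptom names, likelihood values, and a hypothetical default likelihood for unlisted symptoms.

```python
def rank_components(failure_counts, likelihoods, symptoms, default=0.01):
    """Order components on a pathway by posterior probability of being faulty.

    failure_counts: confirmed historical failures per component (the priors).
    likelihoods:    likelihoods[c][s] = P(symptom s observed | c is faulty).
    symptoms:       symptoms observed for the current trouble.
    default:        assumed likelihood for a symptom not listed for a component.
    """
    total = sum(failure_counts.values())
    unnormalized = {}
    for comp, count in failure_counts.items():
        p = count / total  # prior from historical frequency of failure
        for s in symptoms:
            p *= likelihoods.get(comp, {}).get(s, default)
        unnormalized[comp] = p
    evidence = sum(unnormalized.values())  # denominator in Bayes' Rule
    posterior = {c: p / evidence for c, p in unnormalized.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


def record_outcome(failure_counts, faulty_component):
    """Update the historical record once the true cause is confirmed."""
    failure_counts[faulty_component] = failure_counts.get(faulty_component, 0) + 1


# Hypothetical pathway of two nodes and the link between them.
counts = {"node_a": 6, "link_ab": 3, "node_b": 1}
likelihoods = {
    "node_a": {"no_carrier": 0.1},
    "link_ab": {"no_carrier": 0.9},
    "node_b": {"no_carrier": 0.2},
}
plan = rank_components(counts, likelihoods, ["no_carrier"])
test_order = [component for component, _ in plan]
record_outcome(counts, "link_ab")
```

Although node_a fails most often historically, the observed symptom shifts the plan toward testing link_ab first. Recording each confirmed cause is what lets the priors drift toward local conditions over time.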

Some troubles reported to the Troubleshooter are administrative in nature and may not require physical repair. By using commands available through Datakit's Starkeeper Network Management System, the Troubleshooter is sometimes able to restore service to users on its own.

Troubleshooter's specialist modules provide an example of how troubleshooting knowledge can be captured as a set of production rules. Each situation in which it is appropriate to run a particular test is captured on the left-hand side of a production rule, and the right-hand side executes that test. The left-hand side might describe node or link configurations and specific evidence, symptoms, or data produced from previous tests. The approach works because the situations represented by the production rules are generic enough to handle most reported Datakit troubles. Relatively speaking, the Troubleshooter is able to capture knowledge about these situations in a manageable number of rules (about 1400).

Troubleshooting experts have been developed for a fairly wide variety of networks. ARTEX (Fleischanderl, Friedrich, & Retti, 1989), written in PROLOG, shoots troubles on the same kinds of audio-routing systems monitored by AMF. COMNET (Reddy & Uppuluri, 1986) is a troubleshooter for digital and analog data circuits. ExT (Yudkin, 1987) shoots troubles on digital special services circuits, and the Network Trouble Shooting Consultant (Hannan, 1987) is a troubleshooter for DECnet and Ethernet LANs. Other expert systems with troubleshooting features are FIESTA (Miksell et al., 1988), a troubleshooter for satellite networks, TERESA (Callahan, 1988; Corn, Dube, McMichael, & Tsay, 1988), an expert system that both monitors and shoots trouble on interoffice trunks, AUTOTEST-2 (Ackroff, Surko, & Wright, 1988; Ackroff, Surko, Vesonder, & Wright, 1990), and IRA (Horton, Hsiao, & Zielinski, 1988).

Diagnosis. If one thinks of troubleshooting as the process of uncovering evidence on a network trouble, diagnosis is the process of placing an interpretation on that evidence. Most maintenance expert systems provide diagnoses or data interpretation along with the other functions they perform. AUTOTEST-2 (Ackroff et al., 1988), which is basically a troubleshooter for special services circuits, gives us a good example of what trouble diagnosis is all about.

AUTOTEST-2 has access to test equipment installed on special services circuits (Foreign Exchanges, WATS lines, conditioned data lines, etc.) through an interface to the SARTS (Switched Access Remote Test System) OS. Basically, AUTOTEST-2 tests circuits using strategies and methods that are very similar to those of human testers. Circuit records stored in SARTS are used to determine the kind of circuit being tested and the location and type of test equipment installed on the circuit. AUTOTEST-2 does both sectionalization and diagnosis of trouble, and its recommendations are good enough to dispatch repair technicians directly from its output. Like the Datakit Troubleshooter, AUTOTEST-2 uses the same commands available to users through a standard interface.

Diagnosis is a critical part of AUTOTEST-2 for several reasons. First, misleading test measurements are frequently generated during the course of troubleshooting, and AUTOTEST-2 depends on its ability to diagnose to prevent these measurements from generating false alarms on equipment that is working properly. Second, the diagnoses themselves imply who should be dispatched to repair the trouble, and they are used to route troubles to the appropriate work group. These routing decisions form the heart of AUTOTEST-2's economic and service benefits.

Diagnostic knowledge is encoded as individual production rules. Each diagnostic rule describes a pattern of measurements that is known, either through experience or technical understanding of the domain, to represent meaningful troubles. The use of production rules is appropriate because the individual diagnoses are relatively few in number (a hundred or less). To give an example, the simplest diagnosis occurs when metallic faults in the network prevent terminal equipment from responding properly. There are two distinct sets of measurements: one set indicates a metallic fault, and another set indicates faulty terminal equipment. In this case, AUTOTEST-2 recommends that technicians repair the metallic fault but is able to disregard the measurements that suggest the terminal equipment should be replaced.
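The metallic-fault example can be expressed as a small Python sketch in which one diagnosis explains away another. The finding names and the `masks` mechanism are hypothetical, chosen only to illustrate the idea; the source does not describe how AUTOTEST-2 represents this internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRule:
    diagnosis: str
    pattern: frozenset            # findings that must all be present
    masks: frozenset = frozenset()  # diagnoses this one explains away


# Hypothetical rules; real rules would describe patterns of test measurements.
RULES = (
    DiagnosticRule(
        diagnosis="metallic fault",
        pattern=frozenset({"low_insulation_resistance"}),
        masks=frozenset({"faulty terminal equipment"}),
    ),
    DiagnosticRule(
        diagnosis="faulty terminal equipment",
        pattern=frozenset({"no_terminal_response"}),
    ),
)


def diagnose(findings):
    """Return the diagnoses supported by the findings, dropping any that a
    stronger matched diagnosis accounts for."""
    findings = frozenset(findings)
    matched = [rule for rule in RULES if rule.pattern <= findings]
    masked = set()
    for rule in matched:
        masked |= rule.masks
    return [rule.diagnosis for rule in matched if rule.diagnosis not in masked]


both = diagnose({"low_insulation_resistance", "no_terminal_response"})
terminal_only = diagnose({"no_terminal_response"})
```

When both sets of measurements are present, only the metallic fault is reported, so the trouble is routed to the work group that repairs the line rather than the one that replaces terminal equipment.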

2.2. Provisioning Experts

The term provisioning is usually thought to apply only to large-scale public switched networks. However, provisioning is clearly part of the management of all kinds of networks, if sometimes on a smaller scale. Anyone responsible for managing a local area data network, for example, will recognize similarities in what we are about to describe as network provisioning.

Provisioning encompasses a broad, somewhat open-ended, set of tasks. There are four principal activities: planning, design, configuration, and implementation. Provisioning usually takes place within the context of an overall master plan that describes how a network will evolve over a period of years, and there are often short-term plans that describe how networks are changing to meet new levels of customer demand. Changes to the network must be designed, taking into account currently available, and sometimes future, technology. Adding nodes and links to a network are examples of design level changes.

Once a design is completed, the individual nodes and links must be configured. The configuration stage takes node and link specifications generated in the design stage and produces equipment lists and instructions for connecting them together properly. Finally, the right equipment must be acquired and installed on site.

No project, to our knowledge, has taken on provisioning in its full complexity. In particular, network planning is an important area that no working expert system has addressed successfully. There are, however, working expert systems that successfully address limited portions of the design and configuration subtasks. These systems typically play the role of assistants, automating tedious or time-consuming aspects of an expert's job.

DesigNet (Bernstein, 1987; Mantleman, 1986), developed by BBN Laboratories, is perhaps the most advanced graphical design system for networks. Operators using DesigNet work with a graphical display to construct a tentative network design. Certain low level details of this task, such as the construction of a minimum spanning tree in the beginning design stages, are handled automatically by DesigNet.
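As an illustration of the minimum spanning tree step, the following Python sketch uses Kruskal's algorithm with a simple union-find structure. The source does not say which algorithm the BBN system uses, so this is one standard choice, and the node names and link costs are hypothetical.

```python
def minimum_spanning_tree(nodes, links):
    """Kruskal's algorithm.

    nodes: iterable of node names.
    links: iterable of (cost, node_a, node_b) candidate links.
    Returns the chosen links in the order they were accepted.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for cost, a, b in sorted(links):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:       # the link joins two separate components
            parent[root_a] = root_b
            tree.append((cost, a, b))
    return tree


# Hypothetical candidate links between four sites.
sites = ["A", "B", "C", "D"]
candidates = [
    (1, "A", "B"),
    (4, "A", "C"),
    (2, "B", "C"),
    (5, "C", "D"),
    (3, "B", "D"),
]
backbone = minimum_spanning_tree(sites, candidates)
total_cost = sum(cost for cost, _, _ in backbone)
```

The result connects every site at the lowest total link cost, which gives the designer a cheap starting topology to refine against delay and loading requirements.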

Once an initial design is completed, DesigNet uses a mathematical simulation to provide feedback to the user. DesigNet provides users with a variety of important metrics such as minimum delays, loading on individual network elements, and cost. Users make changes to the network design with the use of a mouse and observe effects through DesigNet's simulation. This capability makes it easy for users to perform tedious but important tasks, such as the identification and elimination of expensive links that do not contribute their share to network performance. BBN Laboratories has announced its intention to develop a configurator that would configure node sites based on output from DesigNet (Mantleman, 1986).

XTEL (Feinstein et al., 1988) is a network designer for military applications. It has simulation models that provide feedback to the human designer, but does not have the sophisticated graphical interface of DesigNet. XTEL uses a knowledge base of production rules to evaluate the network design. The evaluation rules are based on military criteria such as the survivability of individual nodes. XTEL also actively makes recommendations to the user, such as adding nodes or changing interconnections that improve on key metrics, and can provide explanations as to why such changes are desirable.

Two other graphically oriented systems in the provisioning domain are KAT (Clark, 1987) and the System Configurator (Lutticke, 1989). KAT is described as a knowledge-based information display tool. KAT manages an object-oriented description of a network design for a human designer. The System Configurator, also based on object-oriented methods, has both design and configuration features. It allows users to develop designs for computer networks interactively through a graphical interface, but generates a detailed equipment list that we would normally associate with a node or link configurator. Neither KAT nor System Configurator goes as far as DesigNet or XTEL in helping users evaluate their designs.

LEIS (Salasoo, 1988) and SLEEK (Spang-Robinson, 1988) address a small, but nevertheless important, part of provisioning. The two systems, the first developed by BELLCORE and the second by AMERITECH, both assist designers in selecting the appropriate digital technology for loop subscriber applications.

2.3. Network Administration Experts

Network administration encompasses a broad group of diversified functions that are difficult to characterize as a whole. Network administration can be thought of as a kind of grab bag, where applications that are clearly not provisioning or maintenance can be placed.

The most common kind of expert system application in this area is traffic routing or traffic management. Demand for network services fluctuates over a wide range. In general, however, it is too expensive to design for the highest levels of demand, especially since those levels are often confined to local areas and occur infrequently. The most cost-effective way of dealing with demand peaks is to recognize local congestion in the network when it occurs and reroute traffic along paths that have lower demand. Significant benefits can be gained through the ability to properly route traffic through the available resources.

In the long-distance public switched network, experts called traffic managers monitor the network and reroute traffic when the occasion demands. These experts have available large amounts of data from switching machines, signal transfer points, and other network entities. When congestion is detected by monitoring these data, traffic managers issue commands to the network switches called switching controls that alter call processing and therefore temporarily relieve network congestion.

The NEMESYS expert system project (Guattery & Villareal, 1985) is an attempt to automate some of the decision-making abilities of the traffic manager. The routing decisions NEMESYS makes are rather complex, depending in part on the local topology of the network and the probable cause of the congestion.

Other network administration experts that handle traffic management are XTRAL (Chang & Gross, 1985), NetManager (Cross & Dillon, 1989), ATN (Spang-Robinson, 1988), and NCAI (Benson, 1986). NCAI, in particular, has an interesting approach to traffic management. It is a military application that manages packet-switched radio networks whose topography is subject to constant change. This might happen when military commanders are attempting to send and receive messages from field units. Normally, the approach is to have relay stations that serve specific geographical areas. By allowing all field units to be mobile, the NCAI approach decreases the risk of detection.

In NCAI, each network node has its own model of the network implemented as OPS5 working memory. The nodes derive connectivity information from message headers, enabling each node to maintain databases on complete paths and known links in the network. NCAI has conventions for allowing individual nodes to request paths from their neighbors when their own databases fail to contain specific path information, for updating node databases with new information, and for informing originating nodes about network connectivity outside their local area.
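The conventions described above can be sketched in Python. NCAI itself keeps each node's model in OPS5 working memory, so this is only an illustration of the idea. It assumes that a message header records the relay route the message followed, which the source does not state, and the class and method names are hypothetical.

```python
from collections import deque


class Node:
    """A network node holding its own partial model of the network."""

    def __init__(self, name):
        self.name = name
        self.links = set()    # known undirected links, stored as frozenset pairs
        self.neighbors = []   # directly reachable Node objects

    def observe_header(self, route):
        """Learn links from the relay route recorded in a message header."""
        for a, b in zip(route, route[1:]):
            if a != b:
                self.links.add(frozenset((a, b)))

    def local_path(self, dest):
        """Breadth-first search over the links this node already knows."""
        frontier = deque([[self.name]])
        seen = {self.name}
        while frontier:
            path = frontier.popleft()
            here = path[-1]
            if here == dest:
                return path
            for link in self.links:
                if here in link:
                    (nxt,) = link - {here}
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(path + [nxt])
        return None

    def find_path(self, dest):
        """Use the local database first, then ask neighbors for a path."""
        path = self.local_path(dest)
        if path is not None:
            return path
        for neighbor in self.neighbors:
            tail = neighbor.local_path(dest)
            if tail is not None:
                full = [self.name] + tail
                self.observe_header(full)  # update the local database
                return full
        return None


a, b = Node("A"), Node("B")
a.neighbors = [b]
a.observe_header(["A", "B"])        # A has only heard directly from B
b.observe_header(["B", "C", "D"])   # B has relayed traffic toward D
before = a.local_path("D")
learned = a.find_path("D")
after = a.local_path("D")
```

Node A initially knows no path to D, obtains one from its neighbor B, and stores the new links so that later requests are answered from its own database.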

Two other expert systems in the network administration area are MES (Rosales & Mehrotra, 1988) and ASSIGN (Farenci et al., 1989). Both are concerned with record keeping in the public switched network. When customers request new services, they are assigned equipment that provides them with that service. Telephone companies keep extensive records on who is assigned what equipment so that they know what to repair if a problem occurs, and when equipment is free for reassignment. In some cases, customers may request services for which equipment is not available, thus some action on the part of the phone company is required to provide that service, for example, recovering or repairing existing equipment that does not work properly, or installing new equipment. ASSIGN helps this process by giving advice on the appropriate action.

Similarly, telephone companies keep detailed database records on most circuits--descriptions of end points, terminal equipment, test devices, and so on. For special services and interoffice trunks, expected measurement values are also recorded as database entries. These database entries can be complex and difficult to enter accurately. MES assists users in determining the correct database entries for new circuits.

2.4. Some New Applications

Two new systems have appeared recently that do not fit easily into the operations support framework discussed so far. Both systems could be described as sales assistants and they represent a new class of application that could become commonplace.

The Service Definition Expert System (SDES) (Mehrotra, Erfani, Lee, & Sachar, 1988) is a production rule-based expert that supports sales proposals by matching the needs of customers to available telecommunications services, features, and options. Customers provide input via a structured dialogue and SDES translates that information into a technical description of services and equipment. It is currently in prototype form.

ENS (Ferguson, Rabie, Kennedy, & Peacocke, 1987) is another production rule prototype developed jointly by BNR and Bell Canada. ENS is intended to be used by Bell Canada's sales representatives. It produces a network configuration and pricing from a high level description of customer requirements for data communications.

Both systems represent a response to the growing complexity of telecommunications services. Sales representatives often have responsibility for broad product lines and it is difficult for them to maintain the technical depth and understanding needed to make effective sales proposals. Expert systems encoded with technical knowledge about product configurations and pricing could reduce the work load of sales representatives and allow them to spend more time focusing on their customers. We think that systems like SDES and ENS will be seen more frequently in the future.

2.5. Specialized Inference Engines

Most of the expert systems discussed so far use either commercially available development shells or well-established AI languages (LISP, OPS5, OPS83). These shells and languages support general expert system or artificial intelligence programming methods and are by no means limited to the telecommunications domain. However, one of the interesting trends that has emerged over the past several years is the appearance of special purpose problem-solving or inference frameworks that are tailored to particular classes of problems. The range of problems to which these tools may be applied is narrow, but their focused approach permits the use of specialized representations and methods of inference that provide the applications developer with considerable leverage. Several of these special purpose frameworks, although still in an experimental stage, may have applicability to telecommunications in the future.

LES (Laffey, Perkins, & Nguyen, 1986) is a troubleshooting framework for telecommunications networks that is based on production rules and a hypothesis-driven or backward-chaining control strategy. The production rules operate on a database that contains both topological information and a description of the expected behavior of the components of the network. Production rules use the network description to generate a sequence of tests that locate the trouble. The advantage LES offers is the specialized database it has for describing network structure. An expert system developer still must write production rules about how to gather evidence from the network components and how to infer failures from the evidence. In LES, such rules are specific to each network.
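As a rough illustration of hypothesis-driven control over a network description, the sketch below tries each fault hypothesis in turn and runs a test only when a hypothesis needs its result. It is not LES code: the three-component network, the UPSTREAM and EXPECTED tables, and the single diagnostic rule (a component is at fault if its output is wrong while its input is right) are all assumptions made for the example.

```python
# Hypothetical network description: topology plus expected behavior.
UPSTREAM = {"terminal": "line", "line": "multiplexer", "multiplexer": None}
EXPECTED = {"terminal": "carrier", "line": "carrier", "multiplexer": "carrier"}


def troubleshoot(observe):
    """Return (faulty component or None, ordered list of tests that were run)."""
    results = {}    # component -> True if its output matched expectations
    tests_run = []  # the test sequence generated while proving hypotheses

    def output_ok(component):
        # Evidence-gathering rule: run a test only when a goal needs it.
        if component not in results:
            tests_run.append(component)
            results[component] = observe(component) == EXPECTED[component]
        return results[component]

    def is_faulty(component):
        # Diagnostic rule: faulty if the output is wrong while the input is right.
        if output_ok(component):
            return False
        feeder = UPSTREAM[component]
        return feeder is None or output_ok(feeder)

    for hypothesis in UPSTREAM:  # try each fault hypothesis in turn
        if is_faulty(hypothesis):
            return hypothesis, tests_run
    return None, tests_run


# Simulated measurements: the line corrupts the signal it receives.
readings = {"terminal": "noise", "line": "noise", "multiplexer": "carrier"}
fault, tests = troubleshoot(readings.get)
```

The topology and expected-behavior tables play the role of the specialized database, while the two inner functions stand in for the network-specific rules a developer would still have to write.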

A somewhat different approach is taken by FIS (Pipitone, 1984; Pipitone, DeJong, Spears, & Marrone, 1988). FIS is the culmination of several years of research directed at developing an inference engine for troubleshooting electronics equipment. To date, it has been applied as a driver of the automated test equipment built into U.S. Navy sonar and radar equipment. Its approach is similar to that taken by the Starkeeper Network Troubleshooter and is one that could be important for telecommunications networks.

Simply stated, FIS provides a control structure for driving a sequence of tests based on a description of the topology of the equipment, the fault probabilities of each module, and a description of the expected input/output behavior of each module, called causal rules in the FIS terminology. Bayes' Rule is used to accumulate evidence on the relative likelihood of failure for individual components following each test. Among other things, FIS can generate a decision tree for controlling the sequence of tests performed by automated test equipment. Typically, programmers would construct such a decision tree by hand using a procedural language.
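The evidence accumulation step can be sketched with a plain application of Bayes' Rule. The example below is not taken from FIS: it assumes exactly one faulty module, and the module names, prior fault probabilities, and test likelihoods are invented for illustration.

```python
def bayes_update(prior, likelihood):
    """Apply Bayes' Rule once.

    prior:      module -> P(module is the faulty one) before the test
    likelihood: module -> P(observed test result | module is the faulty one)
    Returns the posterior fault probabilities, normalized to sum to 1.
    """
    unnormalized = {m: prior[m] * likelihood[m] for m in prior}
    total = sum(unnormalized.values())
    return {m: p / total for m, p in unnormalized.items()}


# Hypothetical signal chain: amp -> filter -> mixer, with assumed fault priors.
prior = {"amp": 0.5, "filter": 0.3, "mixer": 0.2}

# Test 1: a probe at the filter output FAILS. A fault in the amp or the filter
# would explain this; a mixer fault would not (0.1 allows for a false alarm).
after_test1 = bayes_update(prior, {"amp": 0.9, "filter": 0.9, "mixer": 0.1})

# Test 2: a probe at the amp output PASSES, which is unlikely if the amp is bad.
after_test2 = bayes_update(after_test1, {"amp": 0.1, "filter": 0.9, "mixer": 0.9})

most_likely = max(after_test2, key=after_test2.get)
```

Each test result reweights the candidates, so a controller could pick the next test by asking which one is expected to separate the remaining suspects best.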

Finally, DANTES (Van Cotthem, Mathonet, & Vanryckeghem, 1987) is an expert system framework that is intended for use in network-monitoring applications. The problem one faces in network monitoring is that the time-critical nature of the application often prevents the best known knowledge representation methods from being used. DANTES is an attempt to develop a generic rule-based architecture for network monitors that work in real time. Basically, it achieves speed by taking advantage of small improvements derived from its limited application domain. For example, it does some of its own physical memory management. In addition, it limits the data considered by the rule system with the use of filters and has its own special purpose conflict resolution algorithm.
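The combination of filtering and conflict resolution can be sketched as follows. The source does not describe how DANTES implements either step, so everything here is an assumption: the event fields, the severity threshold used as a filter, and the choice of a simple highest-priority-wins policy for conflict resolution.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    priority: int                      # assumed policy: higher priority wins
    condition: Callable[[dict], bool]  # True if the rule matches an event


def prefilter(events, keep):
    """Drop events before the rule system ever sees them."""
    return [event for event in events if keep(event)]


def select_rule(rules, event):
    """Conflict resolution: of all matching rules, pick the highest priority."""
    matching = [rule for rule in rules if rule.condition(event)]
    return max(matching, key=lambda rule: rule.priority, default=None)


def monitor(events, keep, rules):
    """Return (event id, fired rule name) for every filtered event that matches."""
    fired = []
    for event in prefilter(events, keep):
        rule = select_rule(rules, event)
        if rule is not None:
            fired.append((event["id"], rule.name))
    return fired


events = [
    {"id": 1, "severity": 1, "kind": "heartbeat"},
    {"id": 2, "severity": 4, "kind": "link_down"},
    {"id": 3, "severity": 5, "kind": "link_down"},
]
rules = [
    Rule("log_link_event", 1, lambda e: e["kind"] == "link_down"),
    Rule("page_operator", 9, lambda e: e["severity"] >= 5),
]
alerts = monitor(events, lambda e: e["severity"] >= 3, rules)
```

Filtering first keeps low-value traffic out of the match step, which is usually the most expensive part of a rule system.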

We think that specialized inference engines will prove to be an important development. There are several issues here. The first is whether or not there is enough common structure in the problems addressed by expert systems in telecommunications to support special tool development. In our judgment, the answer is a qualified yes. At the highest level, problems in the telecommunications domain fall neatly into separate buckets--monitoring, troubleshooting, diagnosis, configuration, and so on. However, it is not clear how similar the applications are beneath that top level. For example, the problem space for troubleshooting switches may or may not be similar enough to that for troubleshooting transmission links to support a common troubleshooting tool.

Second, the common structure in these problems must be communicated properly to those who can do something about it. While there are people who are capable of building specialized tools, there are not many that are also willing to dig out the necessary detailed knowledge in a complex applications domain. Success in this area requires delicate interplay between application and research. Concise statements of key problems in the domain would be a good beginning.

3. DISCUSSION

Computer applications are indispensable to the proper operation of all kinds of communications networks, whether public, private, voice, or data. Expert systems as yet play a small role in the overall picture. Nevertheless, that role is growing and will become more important over time. We have described our view of network operations and how a sample of the existing expert system projects fits within that framework. This gives us a good opportunity to try to understand what these expert systems are and how they relate to the existing body of computer applications.

In some circles, expert systems are thought to represent a unique technological category, a discontinuity from the methods and techniques of the past. Perhaps because both authors have experience in so-called conventional projects, we have never been comfortable with this position, especially when colleagues challenge us to define the difference between the expert systems we develop and the conventional systems others work on. Basically, we have come to understand expert systems and conventional systems as different points on a set of continua. In fact, we see them as natural extensions of the tasks, techniques, and tools that have been applied to the problems of the telecommunications industry in the recent past.

The continuity between expert systems and conventional systems is never clearer than when one is discussing working systems within the same application domain. For most of the expert systems discussed, for example, one can find a conventionally based, working system that does something similar. We refer to this as task continuity. While the feel and functionality of expert versus conventional systems is invariably different, successful projects addressing similar applications have been structured around both technologies.

To give just one example, monitoring and processing of diagnostic messages has been problematic in the telecommunications industry for many years. There are existing systems, developed using conventional techniques, that address the same kinds of problems as the monitoring experts described in section 2.1. One of the authors worked on such a system for several years (Boggs & Wright, 1985). That system, called Predictor, arrives at a solution that is different from the expert system projects discussed in this paper, but one that is effective nevertheless. Basically, Predictor works in cooperation with a human expert, called an analyzer, by structuring incoming message data in such a way that the human expert finds it easy to identify meaningful groups of messages.

The monitoring experts discussed in section 2.1 take steps in addition to those taken by Predictor. Messages are separated into meaningful groups, false alarms are discarded, and the underlying trouble is diagnosed. These activities are quite different from simply providing conveniently structured data to a human analyzer. By combining the monitoring expert with some routing capability, one could send diagnosed troubles directly to repair technicians with accompanying savings in cost and improvement in customer service. By and large, one could not achieve the same effect by simply routing reports consisting of raw data, no matter how well structured. In this sense, the monitoring experts represent an extension to the message-monitoring application addressed by Predictor and systems like Predictor.

Next, we think there is also continuity in the techniques applied in the software development process. Take, for example, the method that is most frequently identified with expert system development, that of knowledge engineering. Many conventional projects, including several in which the authors have been involved, have recognized the importance of having a domain expert available as a consultant, someone who could advise and shape the growth of important system features. To provide a concrete example, the algorithms used by Predictor were developed by studying preexisting manual procedures used by human experts. A domain expert was taken on as part of the project team during Predictor's development and contributed significantly to its success.

The consultation of a domain expert does not, in and of itself, make a project an expert system. For that matter, neither does the fact that a project involved the automation of preexisting manual procedures. However, there is a point at which these automated procedures take on the character of something like expertise. We wish to avoid arguments over what is or is not an expert system. Our point is simply that there are touchstones between the techniques employed in expert system projects today and those employed by conventional development projects in the recent past.

Finally, we think there is a growing continuity in technology and tools. The clearest example of this is the current interest in object-oriented methods. Several of the expert systems reviewed use object-oriented programming extensively--it seems to be the natural way to encode network models, for example. At the same time, conventional projects are also beginning to use object-oriented languages. Other tools, such as production system languages, are also finding their way into otherwise conventional projects.

All this suggests that the difference between expert systems and so-called conventional programs is becoming obscured. It is inadequate to define expert systems in terms of the tools used to build them, and there are no generic criteria that can be applied to clarify exactly what is meant by expert performance. We are not at all disturbed by this trend--in some sense it may be an indicator of success. Clearly, there is a growing class of important software systems that are narrow and highly customized. The first rule for success in developing such systems is to know the application domain thoroughly, and it is advantageous to use high level tools that allow developers to concentrate on the application itself rather than on the underlying systems technology. Object-oriented languages, production system languages, and other tools pioneered in the expert system and AI communities provide techniques for writing programs close to the language and structure of these specialized application domains. For this reason, we think that in the future many non-expert system projects will find these tools worth using.

REFERENCES

Ackroff, J.M., Surko, P.T., Vesonder, G.T., & Wright, J.R. (1990). SARTS AutoTest-2. In M.A. Bramer (Ed.), Practical experience in building expert systems. New York: John Wiley & Sons.

Ackroff, J.M., Surko, P.T., & Wright, J.R. (1988). AutoTest-2: An expert system for special services. In M. Teitell (Ed.), Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Technology Conference (pp. 503-508).

Attard, R. (Technical Chairman). (1989). Proceedings of the Ninth International Workshop of Expert Systems and Their Applications. Nanterre, France: ECCAI.

Baird, C., & White, T. (1989). A real time network monitor. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 35-41). Nanterre, France: ECCAI.

Benicoart, A., & Crommelynck, V. (1989). CENTAURE: Le systeme expert de surveillance du reseau national de teleinformatique de la SNCF. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 17-34). Nanterre, France: ECCAI.

Benson, P. (1986). Artificial intelligence assisted packet radio connectivity. Electrical Communication, 60(2).

Bernstein, S. (1987). DesignNet: An intelligent system for network design and modelling. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE. Seattle, WA.

Boggs, P.S., & Wright, J.R. (1985). Knocking potential problems for a loop. AT&T Bell Laboratories Record (January), 22-26.

Built, T., Peacocke, R., Rabie, S., & Snarr, V. (1987). An interactive expert system for switch maintenance. In E. J. Glennor (Ed.), International Switching Symposium. New York: IEEE.

Callahan, P.H. (1988). Expert systems for AT&T switched network maintenance. AT&T Technical Journal, 67(1), 93-103.

Chang, D., & Gross, S. (1985). Telecommunications resource allocation: A knowledge-based system. In K. N. Karna (Ed.), Expert systems in government (pp. 666-675). New York: IEEE.

Clark, C.E. (1987). A knowledge-based information display tool for network planning. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE. Seattle, WA.

Corn, P.A., Dube, R., McMichael, A.F., & Tsay, J.L. (1988). An autonomous distributed expert system for switched network maintenance. In R. Blake (Ed.), Proceedings of the IEEE Global Telecommunications Conference, pp. 1530-1537. New York: IEEE.

Cross, UM., & Dillon, T.S. (1989). A knowledge-based approach to network traffic management in a national telecommunications network. In R. Attard (Ed.), Proceedings of the Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 45-62). Nanterre, France: ECCAI.

Farenci, R., Vorce, D., Hahn, E.A., Hogan, J., Daminski, J.S., & Lee, W. (1989). ASSIGN: A qualitative approach to outside plant design engineering. BELLCORE Technical Memorandum. Morristown, NJ: BELLCORE.

Feinstein, J.L., Siems, F., Popolizio, J., Bailey, D., & Wang, A. (1988). XTEL: An expert system for designing theaterwide telecommunications architectures. In J. Liebowitz (Ed.), Expert systems applications to telecommunications (pp. 161-190). New York: John Wiley & Sons.

Ferguson, I., Rabie, J., Kennedy, J., & Peacocke, R. (1987). A knowledge-based sales assistant for data communications networks. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE.

Ferrara, F., Giovannini, F., & Paschetta, E. (1989). IAS: An expert system for packet-switched network monitoring and repair assistance. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 185-197). Nanterre, France: ECCAI.

Fleischanderi, G., Friedrich, G., & Retti, J. (1989). Model-driven fault localization in audio routing systems. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 173-183). Nanterre, France: ECCAI.

Forgy, C.L. (1981). The OPS5 user's manual (Tech. Rep. CMU-CS-81-135). Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.

Forgy, C.L. (1986). The OPS83 user's manual. Pittsburgh, PA: Production Systems Technologies.

Fox, J.R. (1988). Tackling a real-time monitoring problem. In BELLCORE Artificial Intelligence Symposium (June) (pp. 25-30). Morristown, NJ: BELLCORE.

Guattery, S., & Villareal, F. (1985). NEMESYS: An expert system for fighting congestion in the long distance network. In K. N. Karna (Ed.), IEEE Symposium on Expert Systems in Government (October). Washington, DC: IEEE.

Hannan, J. (1987). Network solutions employing expert systems. In D. Friesen & F. Colshani (Eds.), Phoenix Computers and Communications Conference (PCCC-87) (pp. 543-547). Washington, DC: IEEE.

Horton, E.M., Hsiao, J., & Zielinski, J.E. (1988). Interactive repair assistant: A knowledge-based system for providing advice to field technicians. IEEE Communications, 26(3), 21-24.

Laffey, T.J., Perkins, W.A., & Nguyen, T.A. (1986). Reasoning about fault diagnosis with LES. IEEE Expert (Spring), 13-20.

Liebowitz, J. (Ed.). (1988). Expert systems applications to telecommunications. New York: John Wiley & Sons.

Loberg, G. (1988). SMART II, principled design of knowledge-based systems. In BELLCORE Artificial Intelligence Symposium (June) (pp. 7-12). Morristown, NJ: BELLCORE.

Lutticke, B., McArthur, D., Neuhaus, A., Sachs, S., & Swanson, A. (1989). An interactive graphical configurator for networked systems. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 119-129). Nanterre, France: ECCAI.

Macleish, K., Theidke, S., & Vennergrund, D. (1986). Expert systems in central office maintenance. IEEE Communications Magazine, 24(9).

Mantleman, L. (1986). AI carves inroads: Network design, testing, and management. Data Communications, (July), 106-123.

Marques, T.E. (1988a). A symptom-driven expert system for isolating and correcting network faults. IEEE Communications, 26(3), 6-13.

Marques, T.E. (1988b). Starkeeper Network Troubleshooter: An expert system product. AT&T Technical Journal, 67(6), 137-154.

Mehrotra, P., Erfani, S., Lee, Y.P., & Sachar, H. (1988). Design of an on-line telecommunication service definition tool based on expert system technology. In R. V. Milekkileni (Ed.), Proceedings of the IEEE Network Operations and Management Symposium. New York: IEEE.

Miksell, S., Quillin, R., Wilkinson, W.M., Matteson, N., Smisko, M., Zakrzewski, E., & Lowe, D. (1988). Expert system fault isolation in a satellite communications network. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Nilson, M.E. (1989). Towards the knowledge-based creation of telecommunications services. BELLCORE Technical Memorandum. Morristown, NJ: BELLCORE.

Peacock, D. (1988). On-line expertise for telecommunications. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Prerau, D.S., Gunderson, A.S., & Levine, S.P. (1988). The prophet expert system: Pro-active maintenance of telephone company outside plant. In M. Teitell (Ed.), Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Conference (pp. 384-389). New York: IEEE.

Prerau, D., Gunderson, A.S., Reinke, R.E., & Goyal, S.K. (1985). The COMPASS expert system: Verification, technology transfer, and expansion. In J. K. Aggarwal (Ed.), Second Conference on Artificial Intelligence Applications (pp. 597-602). Washington, DC: IEEE.

Pipitone, F. (1984). An expert system for electronics troubleshooting based on function and connectivity. In R. M. Haralick (Ed.), Proceedings of the First International Conference on Artificial Intelligence Applications (pp. 133-138). Washington, DC: IEEE.

Pipitone, F., DeJong, K., Spears, W., & Marrone, M. (1988). The FIS electronics troubleshooting project. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Rabie, S., Rau-Chapin, A., & Shibahara, T. (1988). DAD: A real-time expert system for monitoring data packet networks. IEEE Networks Magazine (September).

Reddy, Y., & Uppuluri, S. (1986). Intelligent systems technology in network operations management. In D. J. Sassa (Ed.), International Communications Conference 1986. New York: IEEE. (pp. 1220-1224).

Rosales, S., & Mehrotra, P.K. (1988). MES: An expert system for reusing models of transmission equipment. In R. M. Haralick (Ed.), Proceedings of the Fourth IEEE Conference on Artificial Intelligence Applications. New York: IEEE.

Ruddock, D., & Gersho, M. MAVEN: A knowledge-based system for common language equipment code assignment. In BELLCORE Artificial Intelligence Symposium (June) (pp. 37-42). Morristown, NJ: BELLCORE.

Salasoo, A. (1988). Expert system enhancements to loop planning tools: a prototype for digital technology choice. In BELLCORE Artificial Intelligence Symposium (June) (pp. 31-36). Morristown, NJ: BELLCORE.

SFCG Highlights. (1988). Expert systems: Making a place in the telecom lineup. SFCG Highlights, 4(5), 1-8.

Slawsky, G.M., & Sassa, D.J. (1988). Expert systems for network management and control in telecommunications at BELLCORE. In J. Liebowitz (Ed.), Expert systems applications to telecommunications (pp. 191-199). New York: John Wiley & Sons.

Spang-Robinson Report. (1988). Telecommunications systems. The Spang-Robinson Report on AI, 4(5), 2-5.

SuRer, M. (1986). The SMART project: An approach to expert system integration and evaluation in the BOCs. In D. J. Sassa (Ed.), International Communications Conference 1986. New York: IEEE. (pp. 1230-1232).

Teitell, M. (Program Chair). (1988). Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Technology Conference, IEEE, Long Beach, CA.

Thandasseri, M. (1986). Expert systems for TXE4A exchanges. Electrical Communication, 60(2).

Van Cotthem, H., Mathonet, R., & Vanryckeghem, L. (1987). DANTES: An expert system shell dedicated to real-time network troubleshooting. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE. (June).

Vesonder, G. T. (1988). Rule based programming in the UNIX system. AT&T Technical Journal, 67(1), 69-80.

Vesonder, G.T., Stolfo, S.J., Zielinski, J.E., Miller, F.D., & Copp, D.H. (1983). ACE: An expert system for telephone cable maintenance. IJCAI, 8, 116-120.

Waterman, D. A. (1986). A guide to expert systems. Reading, MA: Addison-Wesley.

Wright, J.R., & Siegfried, E.M. (1985). ACE: Going from prototype to product. In T. Bernold (Ed.), Expert systems and knowledge engineering. Essential elements of advanced information technology (pp. 121-131). New York: North-Holland Press.

Wright, J.R., Zielinski, J.E., & Horton, E.M. (1988). Expert systems development: The ACE system. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Yudkin, R.O. (1987). ExT: An expert tester. In R. M. Haralick (Ed.), Proceedings of the Fourth Conference on Artificial Intelligence Applications (pp. 452-458). New York: IEEE.

Zeldin, P.E., Miller, F.D., Siegfried, E.M., & Wright, J.R. (1986). Knowledge-based loop maintenance: The ACE system. ICC'86 (pp. 1241-1243).
