Expert systems in telecommunications



Expert Systems With Applications, Vol. 1, pp. 127-136, 1990 0957-4174/90 $3.00 + .00 Printed in the USA. © 1990 Pergamon Press plc

Expert Systems in Telecommunications

JON R. WRIGHT AND GREGG T. VESONDER

AT&T Bell Laboratories, Warren, NJ

Abstract--Expert systems have been successfully applied to many maintenance, provisioning, and administrative tasks in telecommunications networks. Given that they can be appropriately integrated with the existing base of software applications, expert systems will play an important role in the future. We review nearly 40 current projects, which run the gamut from research prototype to finished product.

1. INTRODUCTION

ONE MEASURE OF the growth in the application of expert systems to telecommunications is the number of published reports on the subject, and there is clear evidence of growth over the past several years. In 1986, a survey of expert systems by Waterman (1986) listed only two projects in telecommunications. They were COMPASS (Prerau, Gunderson, Reinke, & Goyal, 1985) and ACE (Vesonder, Stolfo, Zielinski, Miller, & Copp, 1983). More recent sources such as Liebowitz (1988), industry newsletters (SFCG Highlights, 1988; Spang-Robinson Report, 1988), and specialized conferences on expert systems applications (Attard, 1989; Teitell, 1988) collectively contain descriptions of nearly 40 expert system projects.

Table 1 shows the number of telecommunications expert systems by the year in which reports were first published. Although Table 1 is based on publicly available reports and contains systems that are in many different stages of the project life cycle, ranging from research prototype to mature product, it reflects steady growth. There have been new projects reported every year since ACE was first described. We think the telecommunications domain continues to be fruitful ground for expert systems, and feel that the work here is forward looking in several ways. This is partly a consequence of our view that expert systems are a natural extension of the methods employed by software developers in the past, and that they can greatly extend the capabilities of existing computer applications.

No telecommunications network can work efficiently or cost effectively without extensive support in the form of specialized computer applications. Ten years ago, there was already extensive automation in support of the public switched network in the United States (for an example, see the Summer, 1983 issue of the Bell System Technical Journal devoted to the Automated Repair Service Bureau). Today, these computer systems--often called Operations Support Systems or simply Operations Systems (OS)--form a strong base of core applications.

Requests for reprints should be sent to Jon R. Wright, AT&T Bell Laboratories, 184 Liberty Corner Road, P.O. Box 4908, Warren, NJ 07060-0908.

OS in the telecommunications domain are both specialized and complex. The forces driving OS development have been primarily, although not exclusively, economic in nature. Technological innovation has been important in the history of these systems, but real business problems lie behind every project. In many cases, the final solution to these problems required a rearrangement of work--a division of labor, so to speak--between computers and people. Although automation was carried as far as was technically and economically feasible, some work remained. Frequently, new jobs, sometimes of a fundamentally different nature, were created to handle the remaining work.

We make this point because it is our view that the technology underlying expert systems can be used to address a wider range of problems than is typical of most present day OS--as long as they can be made to work effectively within the framework of the existing OS environment. Expert systems, appropriately conceived and applied, can push automation further than previously has been feasible. This kind of clean-cut economic benefit encourages the development of new expert systems.

An example of the relationship between conventional OS and expert systems is the following. Telecommunications networks generate huge amounts of data because of the volume of traffic they handle. Telephone-switching machines, for example, print diagnostic messages whenever an attempted call cannot be completed. Sometimes these messages represent normal conditions in the network but other times they are caused by real problems that must be repaired.

TABLE 1
Expert Systems in Telecommunications by Year of First Publication and Application Domain

                                 Year of first publication
Application domain       1983  1984  1985  1986  1987  1988  1989
Maintenance                 1     1     2     5     3     9     4
Provisioning                -     -     -     1     1     3     1
Network Administration      -     -     2     1     -     3     1
Total                       1     1     4     7     4    15     6

It often takes an expert who understands the switch to know the difference between normal conditions and serious trouble. Further, messages frequently must be grouped together like pieces of a puzzle before their meaning becomes clear. There are both monitoring and diagnostic aspects to this problem and expert systems have been applied successfully to both (see the review in section 2).

However, such systems must have access to the raw messages. Typically, this is achieved by tying the expert system to an OS that has direct access to the network. The OS provides access to the message traffic, manages the communications, and may supply some record-keeping or report-generating capability. The expert system discards false alarms, groups related messages together in some meaningful way, and constructs an interpretation or diagnosis that helps technicians repair the trouble.

In some cases, the expert system might go so far as to issue commands to the network (via the OS) to correct the problem or provide a temporary solution until repair technicians can be dispatched. The Starkeeper ® Network Troubleshooter (Marques, 1988a; 1988b) is an example of an expert system that does exactly this by using the Starkeeper Network Management OS as a go-between.

A key point to notice in this scenario is the relationship between the OS and the expert system, a relationship that we think is typical of many successful expert system applications. Basically, the OS provides a core application and the expert system automates tasks surrounding the use of that application. Often the expert system will employ the user interface of an existing OS to gain access to the functionality it requires. Interestingly enough, it is more often the lack of access to a core OS application that holds back expert system deployment rather than the ability to build a working expert system itself.

This scenario is repeated so frequently that we think that the ability to integrate expert systems with standard computing environments is one of the most critical factors, if not the most critical factor, in a project's success. Vesonder (1988) describes some of the underlying rationale for this position and discusses the integration of rule-based expert systems with the UNIX ® operating system.

2. EXPERT SYSTEMS IN SUPPORT OF NETWORK OPERATIONS

In telecommunications, expert systems have primarily been applied to what is best described as network operations support. Basically, network operations can be divided into three broad categories. First, maintenance functions keep the components of the network working properly. Testing, equipment repair, troubleshooting, trouble report processing, and preventive maintenance are included in this category. Next, provisioning is the process of forecasting demand for network services, planning and engineering changes to the network to satisfy that demand, and installing new equipment or rearranging the network accordingly. Finally, network administration encompasses a diverse group of functions that sustain network services once they have been provided. Among these are traffic management, routing, billing, facility assignment, and record keeping.

Maintenance experts were the first kind of expert system to appear and are still the most common kind of expert system in telecommunications. These systems are directed at the activities that lead up to the actual repair of equipment. These up-front tasks consist largely of monitoring, diagnosis, data interpretation, and fault localization problems, which are, for the most part, amenable to known, well-established techniques in the expert system community.

This contrasts sharply with provisioning applications, for example, where planning problems predominate and where expertise in routing algorithms or performance models is sometimes required. Planning problems are difficult to organize properly, and often involve searching large problem spaces. However, a few expert systems in the provisioning domain are beginning to appear in the literature. These systems address those portions of the provisioning task that can be solved with known techniques.

® Starkeeper is a trademark of AT&T. ® UNIX is a trademark of AT&T.


In the network administration area, the most frequent kind of expert system application is traffic routing. Some of these expert systems have routing algorithms or performance models at their core, and use these models to make decisions about the network. Such systems are good examples of the eclectic approach often needed in successful expert system projects.

2.1. Maintenance Experts

There are essentially four aspects to any maintenance operation: (1) monitoring or trouble report processing, (2) trouble-shooting, (3) diagnosis, and (4) repair. In the telecommunications domain, expert systems have been applied successfully to the first three and, in some specialized cases, to the fourth.

Switch and Network Monitors. Telecommunications networks are rather data intensive creatures. Large switching machines, for example, process hundreds of thousands of calls daily. When a call fails, diagnostic messages are generated that provide information about the call failure. These messages could point to problems at any place in the network--interoffice trunks, outside plant, terminal equipment, or a component of the switch itself.

Switching machines have a great deal of redundancy built into their design. This redundancy permits the switches to continue processing calls even when key components are failing. Because of network redundancy, customers may either not be aware of troubles, or they may perceive the troubles as transient and so choose not to report them. As a result, switch maintenance is heavily dependent on the internal diagnostics produced as the switch processes calls.

The detective work needed to properly maintain a switching machine is complicated by the fact that some messages are false alarms, and, in fact, represent normal conditions in the network. For example, messages are generated whenever someone places a call to a destination that has been disconnected. In other cases, the same messages might represent a discontinuity in the outside plant or perhaps in the switch itself (colloquially called an open)--a genuine trouble that must be repaired.

Usually, the underlying problems represented by these messages can only be uncovered by examining groups of related messages over time. An expert with intimate knowledge of the switching network can use these patterns to identify faulty equipment and schedule the appropriate repair work. On the other hand, locating meaningful patterns out of hundreds and often thousands of messages is tedious and difficult work even for the best experts.

Message monitoring has been a prime candidate for automation using expert systems. The pattern-matching techniques developed in the artificial intelligence community have proved to be a natural way to express the necessary relationships among messages needed by monitoring systems. Production system languages such as OPS5 (Forgy, 1981) or OPS83 (Forgy, 1986) or one of the commercially available multiparadigm shells that allow knowledge to be expressed as production rules are commonly used.

Knowledge is encoded in production rules by describing patterns of messages on the left-hand side of production rules. When the left-hand side of a rule is matched and the rule fires, actions on the rule's right-hand side alert users to the trouble conditions. Each production rule represents a solution in a sometimes large problem space. However, the number of solutions is typically not large (perhaps several hundred), and is collected through a combination of experience and knowledge of the network technology. This problem-solving technique, known as match, does not involve search and is very efficient.
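The match technique described above can be sketched in a few lines. The sketch is in Python rather than OPS5, and the message fields (`code`, `trunk`), the message codes, the single rule, and its threshold of three are all hypothetical, chosen only for illustration; none of them comes from the systems reviewed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    code: str   # hypothetical diagnostic message type
    trunk: str  # hypothetical identifier of the equipment involved


@dataclass
class Rule:
    name: str
    # Left-hand side: returns the equipment for which the pattern matches.
    condition: Callable[[List[Message]], List[str]]
    # Right-hand side: builds the alert shown to the user.
    action: Callable[[str], str]


def repeated_failures(code: str, threshold: int):
    """Build a left-hand side: equipment with >= threshold messages of one type."""
    def lhs(messages: List[Message]) -> List[str]:
        counts = Counter(m.trunk for m in messages if m.code == code)
        return sorted(t for t, n in counts.items() if n >= threshold)
    return lhs


RULES = [
    Rule("possible-open",
         repeated_failures("NO_ANSWER_SUPERVISION", 3),
         lambda trunk: f"ALERT: possible open on {trunk}"),
]


def run_rules(messages: List[Message], rules: List[Rule] = RULES) -> List[str]:
    """Fire every rule whose left-hand side matches; no search is involved."""
    alerts = []
    for rule in rules:
        for equipment in rule.condition(messages):
            alerts.append(rule.action(equipment))
    return alerts


messages = [Message("NO_ANSWER_SUPERVISION", "T17")] * 3 + [
    Message("NO_ANSWER_SUPERVISION", "T04"),  # isolated: below the threshold
    Message("CALL_TO_DISCONNECTED", "T17"),   # normal condition, matches no rule
]
print(run_rules(messages))
```

Each rule is evaluated once against the current set of messages, which is why the cost grows with the number of rules rather than with the size of a search space.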

There are at least 13 expert system projects that monitor diagnostic messages for various kinds of networks. The kinds of networks to which these expert systems are applied vary considerably--from public telephone networks to packet switch networks to the audio networks used in broadcast studios. Frequently, the monitoring expert also provides assistance with other kinds of maintenance functions such as troubleshooting or diagnosis.

Some of these projects have had a fairly significant impact on their respective maintenance operations. ACE (Wright & Siegfried, 1985; Wright, Zielinski, & Horton, 1988), for example, monitors and diagnoses trouble in the local loop using a database of maintenance records. There are more than 100 ACE systems working in the Regional Bell Operating Companies in the United States.

ACE is not the only successful application. NET/ADVISOR (Mantleman, 1986), which monitors local area networks, has been a commercial product since 1986, and CENTAURE (Benicourt, Arnaud, & Vincent Crommelynck, 1989), which monitors the telecommunications network of SNCF, the French National Railway, reached full deployment in mid-1989. TERESA (Corn, Dube, McMichael, & Tsay, 1988) (previously called TOPAS-ES) has had successful product trials, and is working in multiple sites. The same is true of GTE's COMPASS (Prerau et al., 1985), NEMESYS (Macleish, Theidke, & Vennergrund, 1986), and PROPHET (Prerau, Gunderson, & Levine, 1988), which all monitor switch messages of various kinds. In addition, BELLCORE has several switch-monitoring projects (Loberg, 1988; Slawsky & Sassa, 1988; Sutter, 1986).

Packet-switching networks have been the target of expert system projects on several occasions. IAS (Ferrata, Giovannini, & Paschetta, 1989), which monitors the Italian ITELPAC network, and the DPN Monitor (Baird & White, 1989), which monitors Bell Canada's packet switching network, have both had successful product trials and are moving into a deployment stage in 1990. Another packet switch application is DAD (Rabie, Rau-Chapin, & Shibahara, 1988).

Monitoring experts are used in a wide variety of networks. British Telecom's AMF (Thandasseri, 1986) monitors TXE4 exchanges. TXE4 switches are used by broadcast studios for routing audio signals. REACT (Fox, 1988) and RTS (Mantleman, 1986) monitor diagnostic messages generated by the No. 1ESS switch in the United States. The FIESTA prototype (Miksell et al., 1988) monitors satellite communications for the U.S. federal government.

Troubleshooting. Troubleshooting is distinct from both monitoring and diagnosis. The goal of a troubleshooter is to identify an unknown faulty component or subsystem. Seldom in a troubleshooting task is all the relevant data immediately available. The troubleshooter actively seeks out new data by interacting with the system that is in trouble. Usually it is possible to collect far more data than is really needed or feasible, so that careful choices are necessary. There is clearly a dynamic aspect to troubleshooting--the path selected is not predictable from its initial inputs; rather, it depends on what is discovered as the troubleshooting process unfolds.

MAD (Peacock, 1988; Built, Peacocke, Rabie, & Starr, 1987) is a good example of a troubleshooting system that uses a human operator as a go-between. It is an advisor for the DMS family of switching machines developed by Northern Telecom. Diagnostic messages from the DMS switches are received by a technician over a printed maintenance channel. The technician reads the messages and provides input to MAD. MAD asks the technician for additional information when needed, and the technician obtains this information by interacting with the switch through the maintenance channel. Basically, MAD is an advice giver that depends on a human to intercede between it and the system that is in trouble.

Some expert systems have direct interfaces to the systems they troubleshoot, frequently using the same command interface that human troubleshooters use, and sometimes even resolving or temporarily relieving trouble on their own initiative. The Starkeeper Network Troubleshooter (Marques, 1988a; 1988b), an expert system developed by AT&T Bell Laboratories for the Datakit ® network, is a good example of this kind of system.

The Starkeeper Network Troubleshooter has an interesting approach to troubleshooting. It keeps a historical database of component failures along with the evidence that was available during each troubleshooting session. When new troubles arise, Troubleshooter obtains a description of the pathway associated with the network failure. Because any link or node along the pathway could be faulty, Troubleshooter then produces a plan or agenda describing what components should be tested and in what order. The plan is generated by combining historical frequency of failure records with current symptoms using Bayes' Rule.

® Datakit is a trademark of AT&T.
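One way to read the agenda-building step is as a ranking of the components on the pathway by posterior probability of failure. The following is a minimal sketch under assumptions that the source does not state: a single faulty component, symptoms treated as conditionally independent given the fault, and a small default likelihood for symptoms with no history. The function name `build_agenda`, the component names, and all numbers are hypothetical.

```python
def build_agenda(failure_counts, likelihoods, symptoms, default=0.01):
    """Rank pathway components by P(component faulty | symptoms).

    failure_counts: {component: historical number of failures}   (the prior)
    likelihoods:    {component: {symptom: P(symptom | component faulty)}}
    symptoms:       symptoms currently observed
    default:        assumed likelihood for a symptom never seen with a component
    """
    total = sum(failure_counts.values())
    if total == 0:
        raise ValueError("no failure history to build a prior from")

    scores = {}
    for component, count in failure_counts.items():
        score = count / total  # prior from historical frequency of failure
        for symptom in symptoms:
            score *= likelihoods.get(component, {}).get(symptom, default)
        scores[component] = score

    norm = sum(scores.values())  # Bayes' Rule denominator
    posterior = {c: s / norm for c, s in scores.items()}
    # Most probable culprit first: this ordering is the test agenda.
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


failure_counts = {"node_A": 2, "link_AB": 6, "node_B": 2}
likelihoods = {
    "node_A": {"crc_errors": 0.1},
    "link_AB": {"crc_errors": 0.8},
    "node_B": {"crc_errors": 0.1},
}
agenda = build_agenda(failure_counts, likelihoods, ["crc_errors"])
for component, probability in agenda:
    print(f"{component}: {probability:.3f}")
```

Incrementing the relevant entry in `failure_counts` after each confirmed repair would shift the prior toward locally common failures.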

Finally, Troubleshooter invokes specialist modules that do qualitative analyses on specific components. When the cause of the reported trouble is uncovered, Troubleshooter updates its historical records with new frequencies. This gives the Starkeeper Troubleshooter the ability to adapt to local conditions, something that is unusual for most working expert systems.

Some troubles reported to the Troubleshooter are administrative in nature and may not require physical repair. By using commands available through Datakit's Starkeeper Network Management System, the Troubleshooter is sometimes able to restore service to users on its own.

Troubleshooter's specialist modules provide an example of how troubleshooting knowledge can be captured as a set of production rules. Each situation in which it is appropriate to run a particular test is captured on the left-hand side of a production rule, and the right-hand side executes that test. The left-hand side might describe node or link configurations and specific evidence, symptoms, or data produced from previous tests. The approach works because the situations represented by the production rules are generic enough to handle most reported Datakit troubles. Relatively speaking, the Troubleshooter is able to capture knowledge about these situations in a manageable number of rules (about 1400).

Troubleshooting experts have been developed for a fairly wide variety of networks. ARTEX (Fleischanderl, Friedrich, & Retti, 1989), written in PROLOG, shoots troubles on the same kinds of audio-routing systems monitored by AMF. COMNET (Reddy & Uppuluri, 1986) is a troubleshooter for digital and analog data circuits. ExT (Yudkin, 1987) shoots troubles on digital special services circuits, and the Network Trouble Shooting Consultant (Hannan, 1987) is a troubleshooter for DECnet and Ethernet LANs. Other expert systems with troubleshooting features are FIESTA (Miksell et al., 1988), a troubleshooter for satellite networks, TERESA (Callahan, 1988; Corn, Dube, McMichael, & Tsay, 1988), an expert system that both monitors and shoots trouble on interoffice trunks, AUTOTEST-2 (Ackroff, Surko, & Wright, 1988; Ackroff, Surko, Vesonder, & Wright, 1990), and IRA (Horton, Hsiao, & Zielinski, 1988).

Diagnosis. If one thinks of troubleshooting as the process of uncovering evidence on a network trouble, diagnosis is the process of placing an interpretation on that evidence. Most maintenance expert systems provide diagnoses or data interpretation along with the other functions they perform. AUTOTEST-2 (Ackroff et al., 1988), which is basically a troubleshooter for special services circuits, gives us a good example of what trouble diagnosis is all about.

AUTOTEST-2 has access to test equipment installed on special services circuits (Foreign Exchanges, WATS lines, conditioned data lines, etc.) through an interface to the SARTS (Switched Access Remote Test System) OS. Basically, AUTOTEST-2 tests circuits using strategies and methods that are very similar to human testers. Circuit records stored in SARTS are used to determine the kind of circuit being tested and the location and type of test equipment installed on the circuit. AUTOTEST-2 does both sectionalization and diagnosis of trouble, and its recommendations are good enough to dispatch repair technicians directly from its output. Like the Datakit Troubleshooter, AUTOTEST-2 uses the same commands available to users through a standard interface.

Diagnosis is a critical part of AUTOTEST-2 for several reasons. First, misleading test measurements are frequently generated during the course of troubleshooting, and AUTOTEST-2 depends on its ability to diagnose to prevent these measurements from generating false alarms on equipment that is working properly. Second, the diagnoses themselves imply who should be dispatched to repair the trouble, and they are used to route troubles to the appropriate work group. These routing decisions form the heart of AUTOTEST-2's economic and service benefits.

Diagnostic knowledge is encoded as individual production rules. Each diagnostic rule describes a pattern of measurements that is known, either through experience or technical understanding of the domain, to represent meaningful troubles. The use of production rules is appropriate because the individual diagnoses are relatively few in number (a hundred or fewer). To give an example, the simplest diagnosis occurs when metallic faults in the network prevent terminal equipment from responding properly. There are two distinct sets of measurements: one set indicates a metallic fault, and another set indicates faulty terminal equipment. In this case, AUTOTEST-2 recommends that technicians repair the metallic fault but is able to disregard the measurements that suggest the terminal equipment should be replaced.
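The metallic-fault example can be written as a small ordered set of rules in which the more fundamental finding masks its side effects. This is only an illustrative sketch: the finding labels, the work-group names, and the function `diagnose` are hypothetical and are not taken from AUTOTEST-2.

```python
def diagnose(findings):
    """Interpret a set of test findings and choose a work group.

    findings: iterable of hypothetical labels derived from test measurements,
              e.g. "metallic_fault" or "terminal_no_response".
    """
    findings = set(findings)

    if "metallic_fault" in findings:
        # A metallic fault can keep terminal equipment from responding, so a
        # terminal-equipment indication is treated as a side effect here.
        return {
            "diagnosis": "metallic fault",
            "dispatch": "outside plant repair",      # hypothetical work group
            "disregarded": sorted(findings & {"terminal_no_response"}),
        }

    if "terminal_no_response" in findings:
        return {
            "diagnosis": "faulty terminal equipment",
            "dispatch": "terminal equipment repair",  # hypothetical work group
            "disregarded": [],
        }

    return {"diagnosis": "no trouble found", "dispatch": None, "disregarded": []}


result = diagnose(["metallic_fault", "terminal_no_response"])
print(result["diagnosis"], "->", result["dispatch"])
```

The rule order carries the domain knowledge: checking for the metallic fault first is what prevents a needless dispatch to replace working terminal equipment.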

2.2. Provisioning Experts

The term provisioning is usually thought to apply only to large-scale public switched networks. However, provisioning is clearly part of the management of all kinds of networks, if sometimes on a smaller scale. Anyone responsible for managing a local area data network, for example, will recognize similarities in what we are about to describe as network provisioning.

Provisioning encompasses a broad, somewhat open-ended, set of tasks. There are four principal activities: planning, design, configuration, and implementation. Provisioning usually takes place within the context of an overall master plan that describes how a network will evolve over a period of years, and there are often short-term plans that describe how networks are changing to meet new levels of customer demand. Changes to the network must be designed, taking into account currently available, and sometimes future, technology. Adding nodes and links to a network are examples of design level changes.

Once a design is completed, the individual nodes and links must be configured. The configuration stage takes node and link specifications generated in the design stage and produces equipment lists and instructions for connecting them together properly. Finally, the right equipment must be acquired and installed on site.

No project, to our knowledge, has taken on provisioning in its full complexity. In particular, network planning is an important area that no working expert system has addressed successfully. There are, however, working expert systems that successfully address limited portions of the design and configuration subtasks. These systems typically play the role of assistants, automating tedious or time-consuming aspects of an expert's job.

DesigNet (Bernstein, 1987; Mantleman, 1986), developed by BBN Laboratories, is perhaps the most advanced graphical design system for networks. Operators using DesigNet work with a graphical display to construct a tentative network design. Certain low level details of this task, such as the construction of a minimum spanning tree in the beginning design stages, are handled automatically by DesigNet.
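For readers unfamiliar with the minimum spanning tree step mentioned above, the sketch below builds one with Kruskal's algorithm. The source does not say which algorithm the BBN system uses, so the choice of Kruskal's algorithm is an assumption, as are the node names and link costs.

```python
def minimum_spanning_tree(nodes, links):
    """Return the cheapest set of links that connects all nodes.

    nodes: iterable of node names
    links: iterable of (cost, node_a, node_b) candidate links
    Uses Kruskal's algorithm with a simple union-find structure.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the set representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for cost, a, b in sorted(links):      # cheapest candidate links first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:              # the link joins two separate pieces
            parent[root_a] = root_b
            tree.append((cost, a, b))
    return tree


nodes = ["A", "B", "C", "D"]
links = [(4, "A", "B"), (1, "B", "C"), (3, "A", "C"), (2, "C", "D"), (5, "B", "D")]
tree = minimum_spanning_tree(nodes, links)
print(tree, "total cost:", sum(cost for cost, _, _ in tree))
```

A tree of this kind gives a designer a cheap connected starting point, to which links can then be added for capacity or redundancy.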

Once an initial design is completed, DesigNet uses a mathematical simulation to provide feedback to the user. DesigNet provides users with a variety of important metrics such as minimum delays, loading on individual network elements, and cost. Users make changes to the network design with the use of a mouse and observe effects through DesigNet's simulation. This capability makes it easy for users to perform tedious but important tasks, such as the identification and elimination of expensive links that do not contribute their share to network performance. BBN Laboratories has announced their intentions of developing a configurator that would configure node sites based on output from DesigNet (Mantleman, 1986).

XTEL (Feinstein et al., 1988) is a network designer for military applications. It has simulation models that provide feedback to the human designer, but does not have the sophisticated graphical interface of DesigNet. XTEL uses a knowledge base of production rules to evaluate the network design. The evaluation rules are based on military criteria such as the survivability of individual nodes. XTEL also actively makes recommendations to the user, such as adding nodes or changing interconnections that improve on key metrics, and can provide explanations as to why such changes are desirable.

Two other graphically oriented systems in the provisioning domain are KAT (Clark, 1987) and the System Configurator (Lutticke, 1989). KAT is described as a knowledge-based information display tool. KAT manages an object-oriented description of a network design for a human designer. The System Configurator, also based on object-oriented methods, has both design and configuration features. It allows users to develop designs for computer networks interactively through a graphical interface, but generates a detailed equipment list that we would normally associate with a node or link configurator. Neither KAT nor System Configurator goes as far as DesigNet or XTEL in helping users evaluate their designs.

LEIS (Salasoo, 1988) and SLEEK (Spang-Robinson, 1988) address a small, but nevertheless important, part of provisioning. The two systems, the first developed by BELLCORE and the second by AMERITECH, both assist designers in selecting the appropriate digital technology for loop subscriber applications.

2.3. Network Administration Experts

Network administration encompasses a broad group of diversified functions that are difficult to characterize as a whole. Network administration can be thought of as a kind of grab bag, where applications that are clearly not provisioning or maintenance can be placed.

The most common kind of expert system application in this area is traffic routing or traffic management. Demand for network services fluctuates over a wide range. In general, however, it is too expensive to design for the highest levels of demand, especially since those levels are often confined to local areas and occur infrequently. The most cost-effective way of dealing with demand peaks is to recognize local congestion in the network when it occurs and reroute traffic along paths that have lower demand. Significant benefits can be gained through the ability to properly route traffic through the available resources.

In the long-distance public switched network, experts called traffic managers monitor the network and reroute traffic when the occasion demands. These experts have available large amounts of data from switching machines, signal transfer points, and other network entities. When congestion is detected by monitoring these data, traffic managers issue commands, called switching controls, to the network switches. These controls alter call processing and thereby temporarily relieve network congestion.
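The rerouting idea can be illustrated with a minimal Python sketch. It is not drawn from any of the systems reviewed here; the function name `reroute`, the utilization figures, and the 0.8 congestion threshold are all assumptions chosen for illustration. Links whose utilization exceeds the threshold are treated as congested and excluded, and the remaining links are searched for the path with the lowest combined hop count and load.

```python
import heapq


def reroute(links, src, dst, congestion_threshold=0.8):
    """Return a path from src to dst that avoids congested links, or None.

    links maps an undirected link (a, b) to its utilization in [0, 1].
    Both the data layout and the threshold are illustrative assumptions.
    """
    graph = {}
    for (a, b), utilization in links.items():
        if utilization >= congestion_threshold:
            continue  # congested link: do not route new traffic over it
        graph.setdefault(a, []).append((b, utilization))
        graph.setdefault(b, []).append((a, utilization))

    # Dijkstra's algorithm; each link costs 1 + utilization, so short,
    # lightly loaded paths are preferred.
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, utilization in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(
                    queue, (cost + 1.0 + utilization, neighbor, path + [neighbor])
                )
    return None


# Hypothetical network: the direct A-B link is congested.
example_links = {
    ("A", "B"): 0.95,
    ("A", "C"): 0.30,
    ("C", "B"): 0.20,
    ("A", "D"): 0.50,
    ("D", "B"): 0.60,
}
print(reroute(example_links, "A", "B"))  # prints ['A', 'C', 'B']
```

A real traffic manager weighs far more than link load, as the discussion of NEMESYS below makes clear; the sketch only shows the basic shape of detecting congestion and choosing an alternative path.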

The NEMESYS expert system project (Guattery & Villareal, 1985) is an attempt to automate some of the decision-making abilities of the traffic manager. The routing decisions NEMESYS makes are rather complex, depending in part on the local topology of the network and the probable cause of the congestion.

Other network administration experts that handle traffic management are XTRAL (Chang & Gross, 1985), NetManager (Cross & Dillon, 1989), ATN (Spang-Robinson, 1988), and NCAI (Benson, 1986). NCAI, in particular, has an interesting approach to traffic management. It is a military application that manages packet-switched radio networks whose topography is subject to constant change. This might happen when military commanders are attempting to send and receive messages from field units. Normally, the approach is to have relay stations that serve specific geographical areas. By allowing all field units to be mobile, the NCAI approach decreases the risk of detection.

In NCAI, each network node has its own model of the network implemented as OPS5 working memory. The nodes derive connectivity information from message headers, enabling each node to maintain databases on complete paths and known links in the network. NCAI has conventions for allowing individual nodes to request paths from their neighbors when their own databases fail to contain specific path information, for updating node databases with new information, and for informing originating nodes about network connectivity outside their local area.
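The bookkeeping described above might look roughly like the following Python sketch. NCAI itself holds this information in OPS5 working memory; the plain Python class here is a stand-in, and the class name `Node`, its methods, and the assumption that a header carries the full relay route over symmetric links are all hypothetical.

```python
class Node:
    """One radio node with its own partial model of the network."""

    def __init__(self, name):
        self.name = name
        self.links = set()   # known links, each stored as frozenset({a, b})
        self.paths = {}      # destination -> complete path starting at this node
        self.neighbors = []  # Node objects currently within radio range

    def observe_header(self, route):
        """Learn links and paths from the relay route in a message header."""
        for a, b in zip(route, route[1:]):
            self.links.add(frozenset((a, b)))
        if self.name not in route:
            return
        i = route.index(self.name)
        for j in range(i + 1, len(route)):   # nodes downstream of this one
            self.paths[route[j]] = route[i:j + 1]
        for j in range(i):                   # nodes upstream (links assumed symmetric)
            self.paths[route[j]] = route[j:i + 1][::-1]

    def request_path(self, dest, asked=None):
        """Look up a path locally, otherwise ask neighbors and cache the answer."""
        asked = set() if asked is None else asked
        asked.add(self.name)
        if dest in self.paths:
            return self.paths[dest]
        for neighbor in self.neighbors:
            if neighbor.name in asked:
                continue
            sub_path = neighbor.request_path(dest, asked)
            if sub_path is not None and self.name not in sub_path:
                path = [self.name] + sub_path
                self.paths[dest] = path  # update the local database
                return path
        return None


# Hypothetical usage: node a has never heard of d, but its neighbor b has.
a, b = Node("a"), Node("b")
a.neighbors, b.neighbors = [b], [a]
b.observe_header(["b", "c", "d"])
print(a.request_path("d"))  # prints ['a', 'b', 'c', 'd']
```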

Two other expert systems in the network administration area are MES (Rosales & Mehrotra, 1988) and ASSIGN (Farenci et al., 1989). Both are concerned with record keeping in the public switched network. When customers request new services, they are assigned equipment that provides them with that service. Telephone companies keep extensive records on who is assigned what equipment so that they know what to repair if a problem occurs, and when equipment is free for reassignment. In some cases, customers may request services for which equipment is not available. The phone company must then take some action to provide the service, for example, recovering or repairing existing equipment that does not work properly, or installing new equipment. ASSIGN helps this process by giving advice on the appropriate action.

Similarly, telephone companies keep detailed database records on most circuits--descriptions of end points, terminal equipment, test devices, and so on. For special services and interoffice trunks, expected measurement values are also recorded as database entries. These database entries can be complex and difficult to enter accurately. MES assists users in determining the correct database entries for new circuits.

2.4. Some New Applications

Two new systems have appeared recently that do not fit easily into the operations support framework discussed so far. Both systems could be described as sales assistants and they represent a new class of application that could become commonplace.

The Service Definition Expert System, or SDES (Mehrotra, Erfani, Lee, & Sachar, 1988), is a production rule-based expert that supports sales proposals by matching the needs of customers to available telecommunications services, features, and options. Customers provide input via a structured dialogue and SDES translates that information into a technical description of services and equipment. It is currently in prototype form.

ENS (Ferguson, Rabie, Kennedy, & Peacocke, 1987) is another production rule prototype developed jointly by BNR and Bell Canada. ENS is intended to be used by Bell Canada's sales representatives. It produces a network configuration and pricing from a high level description of customer requirements for data communications.

Both systems represent a response to the growing complexity of telecommunications services. Sales representatives often have responsibility for broad product lines and it is difficult for them to maintain the technical depth and understanding needed to make effective sales proposals. Expert systems encoded with technical knowledge about product configurations and pricing could reduce the work load of sales representatives and allow them to spend more time focusing on their customers. We think that systems like SDES and ENS will be seen more frequently in the future.

2.5. Specialized Inference Engines

Most of the expert systems discussed so far use either commercially available development shells or well-established AI languages (LISP, OPS5, OPS83). These shells and languages support general expert system or artificial intelligence programming methods and are by no means limited to the telecommunications domain. However, one of the interesting trends that has emerged over the past several years is the appearance of special purpose problem-solving or inference frameworks that are tailored to particular classes of problems. The range of problems to which these tools may be applied is narrow, but their focused approach permits the use of specialized representations and methods of inference that provide the applications developer with considerable leverage. Several of these special purpose frameworks, although still in an experimental stage, may have applicability to telecommunications in the future.

LES (Laffey, Perkins, & Nguyen, 1986) is a troubleshooting framework for telecommunications networks that is based on production rules and a hypothesis-driven, or backward-chaining, control strategy. The production rules operate on a database that contains both topological information and a description of the expected behavior of the components of the network. Production rules use the network description to generate a sequence of tests that locate the trouble. The advantage LES offers is the specialized database it has for describing network structure. An expert system developer still must write production rules about how to gather evidence from the network components and how to infer failures from the evidence. In LES, such rules are specific to each network.
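A hypothesis-driven control strategy of this general kind can be sketched in a few lines of Python. The sketch is not LES and makes no use of its network database; the rule format, the fault and test names, and the function `prove` are hypothetical, and the rules are assumed to be acyclic. Starting from a fault hypothesis, the procedure works backward through the rules and runs a test only when a rule actually needs its result, which is what generates the sequence of tests.

```python
def prove(goal, rules, run_test, tests_run=None):
    """Try to establish goal by backward chaining.

    rules maps a hypothesis to a list of alternative condition lists.
    Any goal with no rule is treated as a test on the network and is
    evaluated by calling run_test(goal). Returns (result, tests_run),
    where tests_run lists the tests performed, in order.
    """
    tests_run = [] if tests_run is None else tests_run
    if goal not in rules:
        tests_run.append(goal)
        return bool(run_test(goal)), tests_run
    for conditions in rules[goal]:
        # all() stops at the first failed condition, so later tests are skipped.
        if all(prove(c, rules, run_test, tests_run)[0] for c in conditions):
            return True, tests_run
    return False, tests_run


# Hypothetical rules and test results for a single subscriber line.
example_rules = {
    "line_card_fault": [["no_dial_tone", "loop_ok"]],
    "loop_ok": [["loop_resistance_normal"]],
}
example_results = {"no_dial_tone": True, "loop_resistance_normal": True}

result, sequence = prove(
    "line_card_fault", example_rules, lambda test: example_results[test]
)
print(result, sequence)  # prints True ['no_dial_tone', 'loop_resistance_normal']
```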

A somewhat different approach is taken by FIS (Pipitone, 1984; Pipitone, DeJong, Spears, & Marrone, 1988). FIS is the culmination of several years of research directed at developing an inference engine for troubleshooting electronics equipment. Currently, it has been applied as a driver of the automated test equipment built into U.S. Navy sonar and radar equipment. Its approach is similar to that taken by the Starkeeper Network Troubleshooter and one that could be important for telecommunications networks.

Simply stated, FIS provides a control structure for driving a sequence of tests based on a description of the topology of the equipment, the fault probabilities of each module, and a description of the expected input/output behavior of each module, called causal rules in the FIS terminology. Bayes' Rule is used to accumulate evidence on the relative likelihood of failure for individual components following each test. Among other things, FIS can generate a decision tree for controlling the sequence of tests performed by automated test equipment. Typically, programmers would construct such a decision tree by hand using a procedural language.
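To make the evidence-accumulation step concrete, the following Python sketch applies Bayes' Rule to a set of module fault probabilities after one test result. It is a simplified illustration, not the FIS algorithm: it assumes exactly one module is faulty, and the module names, prior probabilities, and likelihoods are invented for the example.

```python
def update_fault_beliefs(prior, likelihood):
    """Apply Bayes' Rule after one test result.

    prior maps module -> P(module is the faulty one) before the test.
    likelihood maps module -> P(observed test result | that module is faulty).
    Assumes a single fault, so the priors sum to 1 (an assumption of this sketch).
    """
    unnormalized = {m: prior[m] * likelihood[m] for m in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("test result is impossible under every fault hypothesis")
    return {m: p / total for m, p in unnormalized.items()}


# Hypothetical three-module signal chain: amplifier -> mixer -> filter.
prior = {"amplifier": 0.5, "mixer": 0.3, "filter": 0.2}

# A bad reading at the mixer output is likely if the amplifier or mixer has
# failed, and unlikely if only the downstream filter has failed.
bad_reading_at_mixer_output = {"amplifier": 0.9, "mixer": 0.9, "filter": 0.1}

posterior = update_fault_beliefs(prior, bad_reading_at_mixer_output)
for module, probability in posterior.items():
    print(module, round(probability, 3))
```

Repeating the update after each test narrows the set of plausible faults, and a test-selection policy built on top of it could be unrolled in advance into a decision tree of the kind mentioned above.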

Finally, DANTES (Van Cotthem, Mathonet, & Vanryckeghem, 1987) is an expert system framework that is intended for use in network-monitoring applications. The problem one faces in network monitoring is that the time-critical nature of the application often prevents the best known knowledge representation methods from being used. DANTES is an attempt to develop a generic rule-based architecture for network monitors that work in real time. Basically, it achieves speed by taking advantage of small improvements derived from its limited application domain. For example, it does some of its own physical memory management. In addition, it limits the data considered by the rule system with the use of filters and has its own special purpose conflict resolution algorithm.
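The two ideas of filtering data before rule matching and resolving conflicts with a purpose-built policy can be sketched as follows. The source does not describe how DANTES implements either, so everything in this Python fragment is an assumption made for illustration: the event and rule layouts, the function `run_monitor`, and in particular the fixed-priority conflict resolution.

```python
def run_monitor(events, filters, rules):
    """Return (event id, action) pairs for events that survive filtering.

    filters is a list of predicates; an event must pass all of them.
    rules is a list of dicts with "when", "priority", and "action" keys.
    """
    fired = []
    for event in events:
        # Filters discard data before the rule system ever sees it.
        if not all(keep(event) for keep in filters):
            continue
        matching = [rule for rule in rules if rule["when"](event)]
        if not matching:
            continue
        # Conflict resolution (assumed policy): fire only the highest-priority
        # matching rule; max() keeps the earliest rule when priorities tie.
        winner = max(matching, key=lambda rule: rule["priority"])
        fired.append((event["id"], winner["action"]))
    return fired


# Hypothetical alarm stream.
example_events = [
    {"id": 1, "severity": 1, "kind": "link_down"},
    {"id": 2, "severity": 5, "kind": "link_down"},
    {"id": 3, "severity": 5, "kind": "heartbeat"},
]
example_filters = [lambda e: e["severity"] >= 3]
example_rules = [
    {"when": lambda e: e["kind"] == "link_down",
     "priority": 1, "action": "log"},
    {"when": lambda e: e["kind"] == "link_down" and e["severity"] >= 5,
     "priority": 10, "action": "page_technician"},
]
print(run_monitor(example_events, example_filters, example_rules))
# prints [(2, 'page_technician')]
```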

We think that specialized inference engines will prove to be an important development. There are several issues here. The first is whether or not there is enough common structure in the problems addressed by expert systems in telecommunications to support special tool development. In our judgment, the answer is a qualified yes. At the highest level, problems in the telecommunications domain fall neatly into separate buckets--monitoring, troubleshooting, diagnosis, configuration, and so on. However, it is not clear how similar the applications are beneath that top level. For example, the problem space for troubleshooting switches may or may not be similar enough to that for troubleshooting transmission links to support a common troubleshooting tool.

Second, the common structure in these problems must be communicated properly to those who can do something about it. While there are people who are capable of building specialized tools, there are not many who are also willing to dig out the necessary detailed knowledge in a complex applications domain. Success in this area requires delicate interplay between application and research. Concise statements of key problems in the domain would be a good beginning.

3. DISCUSSION

Computer applications are indispensable to the proper operation of all kinds of communications networks, whether public, private, voice, or data. Expert systems as yet play a small role in the overall picture. Nevertheless, that role is growing and will become more important over time. We have described our view of network operations and how a sample of the existing expert system projects fits within that framework. This gives us a good opportunity to try to understand what these expert systems are and how they relate to the existing body of computer applications.

In some circles, expert systems are thought to represent a unique technological category, a discontinuity from the methods and techniques of the past. Perhaps because both authors have experience in so-called conventional projects, we have never been comfortable with this position, especially when colleagues challenge us to define the difference between the expert systems we develop and the conventional systems others work on. Basically, we have come to understand expert systems and conventional systems as different points on a set of continua. In fact, we see them as natural extensions of the tasks, techniques, and tools that have been applied to the problems of the telecommunications industry in the recent past.

The continuity between expert systems and conventional systems is never clearer than when one is discussing working systems within the same application domain. For most of the expert systems discussed, for example, one can find a conventionally based, working system that does something similar. We refer to this as task continuity. While the feel and functionality of expert versus conventional systems are invariably different, successful projects addressing similar applications have been structured around both technologies.

To give just one example, monitoring and processing of diagnostic messages has been problematic in the telecommunications industry for many years. There are existing systems, developed using conventional techniques, that address the same kinds of problems as the monitoring experts described in section 2.1. One of the authors worked on such a system for several years (Boggs & Wright, 1985). That system, called Predictor, arrives at a solution that is different from the expert system projects discussed in this paper, but one that is effective nevertheless. Basically, Predictor works in cooperation with a human expert, called an analyzer, by structuring incoming message data in such a way that the human expert finds it easy to identify meaningful groups of messages.

The monitoring experts discussed in section 2.1 take steps in addition to those taken by Predictor. Messages are separated into meaningful groups, false alarms are discarded, and the underlying trouble is diagnosed. These activities are quite different from simply providing conveniently structured data to a human analyzer. By combining the monitoring expert with some routing capability, one could send diagnosed troubles directly to repair technicians with accompanying savings in cost and improvement in customer service. By and large, one could not achieve the same effect by simply routing reports consisting of raw data, no matter how well structured. In this sense, the monitoring experts represent an extension to the message-monitoring application addressed by Predictor and systems like Predictor.

Next, we think there is also continuity in the techniques applied in the software development process. Take, for example, the method that is most frequently identified with expert system development, that of knowledge engineering. Many conventional projects, including several in which the authors have been involved, have recognized the importance of having a domain expert available as a consultant, someone who could advise and shape the growth of important system features. To provide a concrete example, the algorithms used by Predictor were developed by studying preexisting manual procedures used by human experts. A domain expert was taken on as part of the project team during Predictor's development and contributed significantly to its success.

The consultation of a domain expert does not, in and of itself, make a project an expert system. For that matter, neither does the fact that a project involved the automation of preexisting manual procedures. However, there is a point at which these automated procedures take on the character of something like expertise. We wish to avoid arguments over what is or is not an expert system. Our point is simply that there are touchstones between the techniques employed in expert system projects today and those employed by conventional development projects in the recent past.

Finally, we think there is a growing continuity in technology and tools. The clearest example of this is the current interest in object-oriented methods. Several of the expert systems reviewed use object-oriented programming extensively--it seems to be the natural way to encode network models, for example. At the same time, conventional projects are also beginning to use object-oriented languages. Other tools, such as production system languages, are also finding their way into otherwise conventional projects.

All this suggests that the difference between expert systems and so-called conventional programs is becoming obscured. It is inadequate to define expert systems in terms of the tools used to build them, and there are no generic criteria that can be applied to clarify exactly what is meant by expert performance. We are not at all disturbed by this trend--in some sense it may be an indicator of success. Clearly, there is a growing class of important software systems that are narrow and highly customized. The first rule for success in developing such systems is to know the application domain thoroughly, and it is advantageous to use high level tools that allow developers to concentrate on the application itself rather than on the underlying systems technology. Object-oriented languages, production system languages, and other tools pioneered in the expert system and AI communities provide techniques for writing programs close to the language and structure of these specialized application domains. For this reason, we think that in the future many non-expert system projects will find these tools worth using.

REFERENCES

Ackroff, J.M., Surko, P.T., Vesonder, G.T., & Wright, J.R. (1990). SARTS AutoTest-2. In M.A. Bramer (Ed.), Practical experience in building expert systems. New York: John Wiley & Sons.

Ackroff, J.M., Surko, P.T., & Wright, J.R. (1988). AutoTest-2: An expert system for special services. In M. Teitell (Ed.), Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Technology Conference (pp. 503-508).

Attard, R. (Technical Chairman). (1989). Proceedings of the Ninth International Workshop of Expert Systems and Their Applications. Nanterre, France: ECCAI.

Baird, C., & White, T. (1989). A real time network monitor. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 35-41). Nanterre, France: ECCAI.

Benicoart, A., & Crommelynck, V. (1989). CENTAURE: Le systeme expert de surveillance du reseau national de teleinformatique de la SNCF. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 17-34). Nanterre, France: ECCAI.

Benson, P. (1986). Artificial intelligence assisted packet radio connectivity. Electrical Communication, 60(2).

Bernstein, S. (1987). DesignNet: An intelligent system for network design and modelling. In D. J. Sassa (Ed.), International Communications Conference 1987, Seattle, WA. New York: IEEE.

Boggs, P.S., & Wright, J.R. (1985). Knocking potential problems for a loop. AT&T Bell Laboratories Record (January), 22-26.

Built, T., Peacocke, R., Rabie, S., & Snarr, V. (1987). An interactive expert system for switch maintenance. In E. J. Glennor (Ed.), International Switching Symposium. New York: IEEE.

Callahan, P.H. (1988). Expert systems for AT&T switched network maintenance. AT&T Technical Journal, 67(1), 93-103.

Chang, D., & Gross, S. (1985). Telecommunications resource allocation: A knowledge-based system. In K. N. Karna (Ed.), Expert systems in government (pp. 666-675). New York: IEEE.

Clark, C.E. (1987). A knowledge-based information display tool for network planning. In D. J. Sassa (Ed.), International Communications Conference 1987, Seattle, WA. New York: IEEE.

Corn, P.A., Dube, R., McMichael, A.F., & Tsay, J.L. (1988). An autonomous distributed expert system for switched network maintenance. In R. Blake (Ed.), Proceedings of the IEEE Global Telecommunications Conference, pp. 1530-1537. New York: IEEE.

Cross, UM., & Dillon, T.S. (1989). A knowledge-based approach to network traffic management in a national telecommunications network. In R. Attard (Ed.), Proceedings of the Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 45-62). Nanterre, France: ECCAI.

Farenci, R., Vorce, D., Hahn, E.A., Hogan, J., Daminski, J.S., & Lee, W. (1989). ASSIGN: A qualitative approach to outside plant design engineering. BELLCORE Technical Memorandum. Morristown, NJ: BELLCORE.

Feinstein, J.L., Siems, F., Popolizio, J., Bailey, D., & Wang, A. (1988). XTEL: An expert system for designing theaterwide telecommunications architectures. In J. Liebowitz (Ed.), Expert systems applications to telecommunications (pp. 161-190). New York: John Wiley & Sons.

Ferguson, I., Rabie, J., Kennedy, J., & Peacocke, R. (1987). A knowledge-based sales assistant for data communications networks. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE.

Ferrara, F., Giovannini, F., & Paschetta, E. (1989). IAS: An expert system for packet-switched network monitoring and repair assistance. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 185-197). Nanterre, France: ECCAI.

Fleischanderi, G., Friedrich, G., & Retti, J. (1989). Model-driven fault localization in audio routing systems. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 173-183). Nanterre, France: ECCAI.

Forgy, C.L. (1981). The OPS5 user's manual (Tech. Rep. CMU-CS-81-135). Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.

Forgy, C.L. (1986). The OPS83 user's manual. Pittsburgh, PA: Production Systems Technologies.

Fox, J.R. (1988). Tackling a real-time monitoring problem. In BELLCORE Artificial Intelligence Symposium (June) (pp. 25-30). Morristown, NJ: BELLCORE.

Guattery, S., & Villareal, F. (1985). NEMESYS: An expert system for fighting congestion in the long distance network. In K. N. Karna (Ed.), IEEE Symposium on Expert Systems in Government (October). Washington, DC: IEEE.

Hannan, J. (1987). Network solutions employing expert systems. In D. Friesen & F. Colshani (Eds.), Phoenix Computers and Communications Conference (PCCC-87) (pp. 543-547). Washington, DC: IEEE.

Horton, E.M., Hsiao, J., & Zielinski, J.E. (1988). Interactive repair assistant: A knowledge-based system for providing advice to field technicians. IEEE Communications, 26(3), 21-24.

Laffey, T.J., Perkins, W.A., & Nguyen, T.A. (1986). Reasoning about fault diagnosis with LES. IEEE Expert (Spring), 13-20.

Liebowitz, J. (Ed.). (1988). Expert systems applications to telecommunications. New York: John Wiley & Sons.

Loberg, G. (1988). SMART II, principled design of knowledge-based systems. In BELLCORE Artificial Intelligence Symposium (June) (pp. 7-12). Morristown, NJ: BELLCORE.

Lutticke, B., McArthur, D., Neuhaus, A., Sachs, S., & Swanson, A. (1989). An interactive graphical configurator for networked systems. In R. Attard (Ed.), Proceedings of Conference on Artificial Intelligence, Telecommunications, and Computer Systems (pp. 119-129). Nanterre, France: ECCAI.

Macleish, K., Theidke, S., & Vennergrund, D. (1986). Expert systems in central office maintenance. IEEE Communications Magazine, 24(9).

Mantleman, L. (1986). AI carves inroads: Network design, testing, and management. Data Communications, (July), 106-123.

Marques, T.E. (1988a). A symptom-driven expert system for isolating and correcting network faults. IEEE Communications, 26(3), 6-13.

Marques, T.E. (1988b). Starkeeper Network Troubleshooter: An expert system product. AT&T Technical Journal, 67(6), 137-154.

Mehrotra, P., Erfani, S., Lee, Y.P., & Sachar, H. (1988). Design of an on-line telecommunication service definition tool based on expert system technology. In R. V. Milekkileni (Ed.), Proceedings of the IEEE Network Operations and Management Symposium. New York: IEEE.

Miksell, S., Quillin, R., Wilkinson, W.M., Matteson, N., Smisko, M., Zakrzewski, E., & Lowe, D. (1988). Expert system fault isolation in a satellite communications network. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Nilson, M.E. (1989). Towards the knowledge-based creation of telecommunications services. BELLCORE Technical Memorandum. Morristown, NJ: BELLCORE.

Peacock, D. (1988). On-line expertise for telecommunications. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Prerau, D.S., Gunderson, A.S., & Levine, S.P. (1988). The prophet expert system: Pro-active maintenance of telephone company outside plant. In M. Teitell (Ed.), Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Conference (pp. 384-389). New York: IEEE.

Prerau, D., Gunderson, A.S., Reinke, R.E., & Goyal, S.K. (1985). The COMPASS expert system: Verification, technology transfer, and expansion. In J. K. Aggarwal (Ed.), Second Conference on Artificial Intelligence Applications (pp. 597-602). Washington, DC: IEEE.

Pipitone, F. (1984). An expert system for electronics troubleshooting based on function and connectivity. In R. M. Haralick (Ed.), Proceedings of the First International Conference on Artificial Intelligence Applications (pp. 133-138). Washington, DC: IEEE.

Pipitone, F., DeJong, K., Spears, W., & Marrone, M. (1988). The FIS electronics troubleshooting project. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Rabie, S., Rau-Chapin, A., & Shibahara, T. (1988). DAD: A real-time expert system for monitoring data packet networks. IEEE Networks Magazine (September).

Reddy, Y., & Uppuluri, S. (1986). Intelligent systems technology in network operations management. In D. J. Sassa (Ed.), International Communications Conference 1986 (pp. 1220-1224). New York: IEEE.

Rosales, S., & Mehrotra, P.K. (1988). MES: An expert system for reusing models of transmission equipment. In R. M. Haralick (Ed.), Proceedings of the Fourth IEEE Conference on Artificial Intelligence Applications. New York: IEEE.

Ruddock, D., & Gersho, M. MAVEN: A knowledge-based system for common language equipment code assignment. In BELLCORE Artificial Intelligence Symposium (June) (pp. 37-42). Morristown, NJ: BELLCORE.

Salasoo, A. (1988). Expert system enhancements to loop planning tools: a prototype for digital technology choice. In BELLCORE Artificial Intelligence Symposium (June) (pp. 31-36). Morristown, NJ: BELLCORE.

SFCG Highlights. (1988). Expert systems: Making a place in the telecom lineup. SFCG Highlights, 4(5), 1-8.

Slawsky, G.M., & Sassa, D.J. (1988). Expert systems for network management and control in telecommunications at BELLCORE. In J. Liebowitz (Ed.), Expert systems applications to telecommunications (pp. 191-199). New York: John Wiley & Sons.

Spang-Robinson (1988). Telecommunications systems. The Spang Robinson Report on AI, 4(5), 2-5.

SuRer, M. (1986). The SMART project: An approach to expert system integration and evaluation in the BOCs. In D. J. Sassa (Ed.), International Communications Conference 1986. New York: IEEE. (pp. 1230-1232).

Teitell, M. (Program Chair). (1988). Proceedings of the Fourth Annual Artificial Intelligence and Advanced Computer Technology Conference, IEEE, Long Beach, CA.

Thandasseri, M. (1986). Expert systems for TXE4A exchanges. Electrical Communication, 60(2).

Van Cotthem, H., Mathonet, R., & Vanryckeghem, L. (1987). DANTES: An expert system shell dedicated to real-time network troubleshooting. In D. J. Sassa (Ed.), International Communications Conference 1987, New York: IEEE. (June).

Vesonder, G.T. (1988). Rule based programming in the UNIX system. AT&T Technical Journal, 67(1), 69-80.

Vesonder, G.T., Stolfo, S.J., Zielinski, J.E., Miller, F.D., & Copp, D.H. (1983). ACE: An expert system for telephone cable maintenance. IJCAI 8, 116-120.

Waterman, D. A. (1986). A guide to expert systems. Reading, MA: Addison-Wesley.

Wright, J.R., & Siegfried, E.M. (1985). ACE: Going from prototype to product. In T. Bernold (Ed.), Expert systems and knowledge engineering. Essential elements of advanced information technology (pp. 121-131). New York: North-Holland Press.

Wright, J.R., Zielinski, J.E., & Horton, E.M. (1988). Expert systems development: The ACE system. In J. Liebowitz (Ed.), Expert systems applications to telecommunications. New York: John Wiley & Sons.

Yudkin, R.O. (1987). ExT: An expert tester. In R. M. Haralick (Ed.), Proceedings of the Fourth Conference on Artificial Intelligence Applications (pp. 452-458). New York: IEEE.

Zeldin, P.E., Miller, F.D., Siegfried, E.M., & Wright, J.R. (1986). Knowledge-based loop maintenance: The ACE system. ICC'86 (pp. 1241-1243).