
IEEE TRANSACTIONS ON COMPUTERS, VOL. c-32, NO. 1, JANUARY 1983

An Integrated Instrumentation Environment for Multiprocessors

ZARY SEGALL, AJAY SINGH, RICHARD T. SNODGRASS, MEMBER, IEEE, ANITA K. JONES, SENIOR MEMBER, IEEE, AND DANIEL P. SIEWIOREK, FELLOW, IEEE

Abstract-This paper introduces the concept of an integrated instrumentation environment (IIE) for multiprocessors. The primary objective of such an environment is to assist the user in the process of experimentation. The emphasis in an IIE is on experiment management (including stimulus generation, monitoring, data collection, and analysis), rather than on techniques for program development as in conventional programming environments. We believe the functionality of the two environments should eventually be provided in one comprehensive environment.

An experiment schema is introduced as an appropriate structuring concept for experiment management purposes. Schema instances capture the results of an experiment for later analysis. An example is developed in some detail to demonstrate the potential benefits of such an approach. The three primary components of the IIE, namely, the schema manager, the stimulus generator, and the monitor, are briefly described. A preliminary implementation of the design on the Cm* multiprocessor is briefly discussed.

Index Terms-Automated testing, experimentation, experiment management, instrumentation, monitoring, multiprocessor performance evaluation, programming environment, stimulus generation, workload generation.

I. INTRODUCTION

MULTIPROCESSOR designs have long been proposed to meet the need for powerful, cost-effective computers. Several multiprocessors have been built to study the various tradeoffs inherent in this approach [1]-[9]. An important objective of experimentation in performance evaluation and reliability is to provide evidence to validate the design decisions of these systems. Due to the increased number of independent components in multiprocessors, the space of possible experiments for such machines is orders of magnitude larger than for conventional uniprocessors. There is, therefore, a need to approach the problem of experimentation on multiprocessors in a structured manner.

Manuscript received February 2, 1982; revised July 22, 1982. This work was supported by the Ballistic Missile Defense Advanced Technological Center under Contract DASG60-81-0077 and in part by a National Science Foundation Fellowship. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the B.M.D.A.T.C. or the U.S. Government.

Z. Segall, A. K. Jones, and D. P. Siewiorek are with the Department of Computer Science, Carnegie-Mellon University, Schenley Park, Pittsburgh, PA 15213.

R. T. Snodgrass was with the Department of Computer Science, Carnegie-Mellon University, Schenley Park, Pittsburgh, PA 15213. He is now with the Department of Computer Science, University of North Carolina, Chapel Hill, NC 27514.

A. Singh was with the Department of Electrical Engineering, Carnegie-Mellon University, Schenley Park, Pittsburgh, PA 15213. He is now with Tandem Computers, 19333 Vallco Parkway, Cupertino, CA 95014.

Instrumentation of the machine is the first important step. Typical instruments discussed in the literature include software, hardware, hybrid, and computer network monitors, natural and synthetic workload generators, data compaction tools, and data analysis packages [10]-[18]. Most systems possess multiple instruments which have been built independently over a period of time with little effort toward integration. This unstructured approach has several disadvantages. First, an experimenter has to communicate with each instrument through its unique user interface, requiring familiarity with several sets of conflicting conventions in syntax and data formats. Second, data from one tool have to be converted manually to the format requirements of any subsequent tools. Furthermore, to make correlations across experiments, the experimenter has to manually keep track of experiment dates, input parameters, monitored results, system configuration, and so forth. Finally, getting the tools to interact during the course of the experiment is usually impossible.

Work in the direction of integrating instruments is found primarily in the area of computer network monitors [16]. Nutt [19] observes that the techniques for gathering measurement data have not been effectively used. Although the raw power of existing tools is quite adequate, the use of these tools is often so complex that experiments cannot fully utilize their functionality.

This paper recognizes the need for better human-engineered environments for experimentation with multiprocessors. It introduces the concept of an integrated instrumentation environment (IIE) as a structured approach to facilitate the process of experimentation. The design presented emphasizes the integration of several instrumentation tools, including stimulus generation and monitoring, into a unified experiment management environment. An experiment script (a schema) is introduced as an appropriate structuring concept for experiment-management purposes. Schema instances are introduced to capture the results of an experiment for later analysis. A preliminary implementation of the design on the Cm* multiprocessor [1] under both the StarOS [20] and Medusa [21] operating systems is briefly discussed.

Although some program management concepts have been borrowed from conventional programming environments (PE's) [22]-[24], the thrust in the IIE is substantially different from what is typically discussed in conjunction with programming environments. The emphasis in an IIE is on experiment management, from stimulus generation to monitoring, data collection, and analysis, rather than on techniques for program development. We believe the functionality of the two should eventually be provided in one comprehensive environment. The IIE draws on the functions provided by PE's such as program specification and translation, version control, multiple programmer support, and module management. This paper assumes the existence of a PE and will therefore not discuss such functionality in the IIE.

0018-9340/83/0100-0004$01.00 © 1983 IEEE

Section I-A presents the functions to be performed by the IIE. The basic components of the design of the IIE are presented in Section II. Stimulus specification and representation is discussed in Section II-A. Section II-B discusses the techniques used to collect and process monitoring information. The run-time environment, presented in Sections II-C and II-D, is a system-specific component permitting remote monitoring and stimulus control. Section II-E discusses the schema manager as the central control component supporting the execution of experiments. A preliminary implementation of the IIE on Cm* is discussed in Section III. Relevant portions of an example are discussed throughout the paper to illustrate some of the concepts.

A. Functionality of the IIE

An integrated instrumentation environment (IIE) consists of a set of tools which cooperate closely and present the user with a single uniform interface in order to assist and partially automate the process of experimentation. The general objective of an experiment is to inquire about performance, reliability, or any of a number of interesting properties of a computation. In the context of a computer system, an experiment is the execution of an instrumented program in a controlled environment allowing measurement, collection, and analysis. An experiment may involve multiple executions of the instrumented program with different input parameters or within different environments.

The IIE supports the notion of an experiment schema as the high level unit of experimentation management. Each schema specifies a related collection of runs, that is, executions of an instrumented program. Intuitively, a schema can be seen as a parameterized experiment script, describing the experimentation process. A schema specifies the instrumented program, the monitoring directives, the specifications of the run-time environment, and the input parameters for each run.

The result of an execution of a schema is captured in a schema instance, containing measurements, values of schema parameters, and environmental information. This is a data structure representing the unit of management for the experiment results. Schema instances are archived in a database for later analysis.
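As a concrete, deliberately simplified illustration, the schema and schema-instance abstractions can be sketched in Python. Every name below, such as Schema, execute, and fake_measure, is an illustrative stand-in, not an IIE interface:

```python
from dataclasses import dataclass

@dataclass
class Schema:
    # Parameterized experiment script: the stimulus plus one
    # dict of parameter values per run.
    stimulus: str
    runs: list

@dataclass
class SchemaInstance:
    # Unit of management for the results of one run.
    parameters: dict
    measurements: dict

def execute(schema, measure):
    # Every run of the schema yields a schema instance,
    # archived for later analysis.
    instances = []
    for params in schema.runs:
        data = measure(schema.stimulus, params)
        instances.append(SchemaInstance(parameters=params, measurements=data))
    return instances

def fake_measure(stimulus, params):
    # Stand-in for an instrumented run on the target machine.
    return {"service_rate": 100 / params["request_period"]}

schema = Schema(stimulus="MPX", runs=[{"request_period": 2}, {"request_period": 4}])
instances = execute(schema, fake_measure)
```

The point of the sketch is only the separation of concerns: the schema varies per-run parameters, while each schema instance pairs those parameters with the measurements they produced.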

By using the generic notions of schema and schema instance, the experimentation process can be expressed as in Fig. 1. Each phase of the experimentation process will be discussed in detail in the following sections.

An IIE requires software to support the several phases of experimentation, including

* translation of collections of user-defined modules and predefined synthetic actions into instrumented parallel programs;

Schema = DESIGN(Experiment)
WHILE (Not End of Experiment) DO
BEGIN
    EXECUTE(Schema)
    CREATE(Schema Instances)
END
ANALYZE(Schema Instances)

Fig. 1. Experimentation process in the IIE.

* creation of the schema by merging the instrumented parallel stimulus, the monitoring directives, and the environment information;

* schema interpretation and run-time control;
* creation of schema instances; and
* analysis of schema instances.

In order to further illustrate the experimentation process described above, we will follow an example through in some detail. This example shows how the IIE, at each stage, interacts with the user, performs the required actions, and generates its outputs. The example stimulus, called simply "a multiprocessor experiment" or MPX, involves a single initiator and multiple servers communicating through a shared buffer or mailbox. The initiator repeatedly sends requests through the buffer to one or more servers, which operate on those requests concurrently. When the buffer is empty, the servers wait for further requests; when the buffer is full, the initiator waits for a request to be removed by a server.

The servers perform identical functions, so a request can be satisfied by any server. Additionally, the servers communicate with each other via shared memory. The goals of the proposed experiment are to investigate

* the interaction between the request rate (expressed as the average number of requests per unit time) and the number of servers, and

* the effect of the request rate and the number of servers on the average buffer queue length, and the average waiting time in the buffer.

There are two interesting steady-state behaviors that have different average queue length and service rate. In the first case, the request rate exceeds the aggregate processing rate of the servers, and hence, the buffer will always be full. In the other case, the buffer will always contain at most one request. The aggregate service rate will be approximately constant, yet radically different, in both cases. This analysis assumes a constant individual service rate by independent servers. However, in Cm*, accessing shared data perturbs the performance of both the servers and the buffer insert/remove operations in nonobvious ways, greatly complicating analytical modeling of the queue length and waiting time. As was shown above, the boundary between the two cases is quite distinct if contention is ignored. The experiment will investigate the boundary in the presence of contention.
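The two contention-free regimes are easy to reproduce in a toy discrete-time model. The following Python sketch is not part of the IIE or the MPX stimulus; it assumes unit time steps, deterministic service times, no contention, and an initiator whose blocked requests are simply skipped:

```python
def simulate_mpx(request_period, service_time, n_servers, buffer_size, ticks):
    # One initiator inserts a request every request_period ticks,
    # blocking while the buffer is full; each server removes one
    # request at a time and works on it for service_time ticks.
    queue, busy, total = 0, [0] * n_servers, 0
    for t in range(ticks):
        if t % request_period == 0 and queue < buffer_size:
            queue += 1                      # initiator inserts a request
        for i in range(n_servers):
            if busy[i] == 0 and queue > 0:
                queue -= 1                  # idle server removes a request
                busy[i] = service_time
            if busy[i] > 0:
                busy[i] -= 1                # one tick of service
        total += queue
    return total / ticks                    # average buffer queue length

# Request rate above the aggregate service rate: buffer stays nearly full.
saturated = simulate_mpx(request_period=1, service_time=5, n_servers=2,
                         buffer_size=8, ticks=2000)
# Request rate below the aggregate service rate: at most one queued request.
light = simulate_mpx(request_period=5, service_time=2, n_servers=2,
                     buffer_size=8, ticks=2000)
```

Under these assumptions the average queue length jumps between the two regimes exactly as the analysis predicts; the experiment's interest lies in how Cm* contention blurs this boundary.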

To summarize our approach, experiments are described as schemata, and the result of executing a schema is a schema instance. The primary functions of the IIE are the creation of schemata, the management, execution, and control of schemata, and the creation, management, and analysis of schema instances. The next section presents the design of an IIE supporting these functions.


Fig. 2. IIE components. The arcs indicate transfer of data or control.

II. DESIGN OF THE IIE

The IIE contains several components: a schema manager, a run-time environment, an instrumented stimulus and operating system, a database, and a monitor (see Fig. 2). The monitor consists of a resident monitor, which gathers the data from the system under test, and a relational monitor, which aggregates and correlates the data into a high-level form. The user interacts directly with the schema manager, which communicates with the run-time environment and the monitor, which in turn interacts with the instrumented program (the stimulus) and the database. The IIE interacts with the PE through the database.

The schema manager is responsible for supporting the schema and schema-instance abstractions. The monitor initializes the schema instance with information specifying this environment, including details on the hardware configuration, the version of the operating system, support software and stimulus, and the values of the parameters to remain constant for this execution of the schema. The schema manager then cycles through the runs as indicated in the schema, initializing parameters that vary on a per-run basis, starting the stimulus, and collecting the monitoring data. Finally, data concerning the runs as a whole is collected or computed, and stored in the schema instance for later study. Note that not all the IIE components should necessarily reside and execute in the same machine. In fact, the Cm* IIE implementation spans several computer systems. The run-time system and the stimulus are resident in Cm*, whereas the schema manager, the database, and the relational monitor are remotely located in a VAX 11/780. The two computer systems are connected by an Ethernet link.

One motivation for partitioning the components of the IIE into a run-time environment and a remote environment is that only the run-time environment is constrained to any particular hardware or software configuration. Care has been taken to make the remote components as system independent as possible. Currently, two preliminary implementations exist for the run-time environment for two different operating systems, while only one implementation of the remote components was necessary (see Section III).

The stimulus controller component provides a well-defined

interface to the instrumented stimulus. The functions it supports include modifying parameters within the stimulus both before and during the run, generating initial control events for the stimulus, reporting errors back to the schema manager, and controlling the clock. Similarly, the resident monitor provides a uniform interface for the relational monitor. The resident monitor is responsible for enabling and disabling sensors and for sending the information back to the relational monitor in a format convenient for further processing. The sensors are embedded in the stimulus, in the stimulus controller, in the operating system, and in the resident monitor itself. The relational monitor controls the resident monitor and computes derived information which is then stored in a schema instance in the database.

The database serves an important role in the IIE because the information contained in the database is the end result of the entire experimentation process. Additionally, the interaction between the IIE and the PE occurs via the database by having one environment create objects in the database for the other environment to use. For instance, schemata are initially created in the PE, to be interpreted by the schema manager. Schema instances, created by the IIE, are managed using the version-control facilities of the PE. By using a common database, it is possible to make use of the functionality provided by the PE. This approach allows the designers of an IIE to concentrate on those operations unique to experiment management.

A. The Instrumented Stimulus: Representation and Specification

The stimulus is an arbitrary set of processes executing in parallel. The stimulus itself may incorporate sensors; in addition, sensors reside in the operating system and in the resident monitor. We have developed tools to aid in the rapid development of a stimulus. One of them is a workload generator. A user specifies the behavior of his parallel program in a special high-level behavior-description language, the B-language. This behavior is specified as a directed data flow graph, similar to a complex bigraph [25], [26]. The nodes of the graph represent subtasks, or processes, that execute in parallel with other subtasks. Each subtask is composed of actions, parametrized program fragments that may be predefined or user-defined, repeated at certain rates. Associated with each arc is a buffer which may hold data variables or control tokens flowing from one subtask to another. Each subtask has an associated control tuple (i, o) where i corresponds to the in-firing rule for the subtask and o corresponds to the out-firing rule. This set of firing rules characterizes the precedence relationship between the subtasks of the graph. A B-language program is compiled into an executable version as illustrated in Fig. 3. This section gives a brief overview of the B-language; a more detailed discussion can be found in [27].

The B-language thus represents the interaction of parallel processes via the graph model of computation. A typical example is shown in Fig. 4. Subtask A1 is fired by the arrival of a token in buffer B1, which corresponds to the entry arc of the graph. Upon completion, subtask A1 fires either of subtasks A2 or A3 by placing control tokens in either buffers B2 or B3, respectively. There is a certain probability associated with the OR-output logic of subtask A1 (designated by the "+"). Finally, subtask A4 fires if it receives a token either from A2 or A3. Upon completion, it places a token in buffer B6, which corresponds to the exit arc of the graph and represents the end of a single execution of the parallel synthetic program. The B-language subtask declarations for this example are as follows.

SUBTASK A1 {INLOGIC: B1; OUTLOGIC: %40(B2) OR %60(B3)}

SUBTASK A2 {INLOGIC: B2; OUTLOGIC: B4}

SUBTASK A3 {INLOGIC: B3; OUTLOGIC: B5}

SUBTASK A4 {INLOGIC: B4 OR B5; OUTLOGIC: B6}
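Read literally, these declarations describe a token game. The Python sketch below (not IIE code; the dict-of-buffers representation is purely an illustrative assumption) traces one token through the graph of Fig. 4 according to the firing rules:

```python
import random

def run_graph(seed=0):
    # Buffers hold control tokens; subtasks fire per their in/out logic.
    random.seed(seed)
    buf = {b: 0 for b in ["B1", "B2", "B3", "B4", "B5", "B6"]}
    buf["B1"] = 1                               # entry-arc token
    # A1: in-fires on B1; out-fires B2 with probability 0.4, else B3
    if buf["B1"] > 0:
        buf["B1"] -= 1
        target = "B2" if random.random() < 0.4 else "B3"
        buf[target] += 1
    # A2: B2 -> B4;  A3: B3 -> B5
    for src, dst in [("B2", "B4"), ("B3", "B5")]:
        if buf[src] > 0:
            buf[src] -= 1
            buf[dst] += 1
    # A4: OR-input on B4 or B5; out-fires the exit arc B6
    if buf["B4"] > 0 or buf["B5"] > 0:
        src = "B4" if buf["B4"] > 0 else "B5"
        buf[src] -= 1
        buf["B6"] += 1
    return buf

final = run_graph()
```

Whichever branch the OR-output takes, exactly one token reaches B6, marking the end of one execution of the synthetic program.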

Notice that the buffers B1-B6 correspond to the arcs of the parallel synthetic program. The delimiter "%" is used to specify the branching probabilities for the arcs of an OR output.

The specification of parallel synthetic programs in the B-language is based on the object model supported by both operating systems on Cm* [20], [21]. The objects represented directly in the B-language include the following.

* The task force object: The task force abstraction, a collection of processes that cooperate to achieve a single logical task, is represented by a set of subtasks.

* The subtask object: This is the sequential computation unit that cooperates with similar user-defined objects to compute the overall stipulated multiprocess task.

* The buffer object: The buffer object is a conventional queue of messages and is used by the subtasks to communicate with each other.

* The semaphore object: Semaphores synchronize requests for shared resources.

* The file object: Files represent a sequence of bytes.

* The shared data object: Variables specified in the shared data object are globally shared by all the subtasks of the task force. This allows communication of data and control through shared memory.

* The table object: Tables implement functions varying with time.

Within a subtask, the basic building block is an action. To capture the cyclic nature of synthetic workloads, an action aj itself is described by an action-repetition tuple (specified as (ai, ri)). This tuple specifies that the action ai is repeated sequentially ri times, constituting action aj. An action may be arbitrarily complex, and may be further composed of action-repetition tuples. Also, both the a and the r can be parametrized. Other control constructs within a subtask include composition and conditional and probabilistic branching.

The library of actions consists of a collection of predefined and user-defined program fragments, programmed in the systems programming language and stored as part of the system database. Examples of predefined actions include sending or receiving messages via a buffer, inputting or outputting to a file, referencing local memory, blocking on a semaphore, and accessing a shared resource. The user gains flexibility by being able to include his own special program fragment among the actions in the library. An example of a user-programmed action is the code for a disk process in a database application running on a specific multiprocessor. Hence, the library of actions is specific for a particular multiprocessor system. The B-language should be viewed as a portable framework into which system specific actions are inserted from a library of actions.
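The (ai, ri) semantics can be sketched as a small recursive expansion. This is illustrative Python, not IIE code; the action names are hypothetical and the use of a list for composition is an assumption:

```python
def expand(action):
    """Flatten an action-repetition tree into the sequence of
    primitive actions executed, per the (a_i, r_i) tuple semantics."""
    if isinstance(action, str):        # predefined or user-defined fragment
        return [action]
    if isinstance(action, list):       # composition of actions
        seq = []
        for a in action:
            seq += expand(a)
        return seq
    inner, reps = action               # an (action, repetitions) tuple
    return expand(inner) * reps

# Two rounds of: three local references, then one shared access.
trace = expand(([("LocalRef", 3), "SharedAccess"], 2))
```

Because tuples may nest arbitrarily, short specifications expand into long cyclic workloads, which is precisely the property the construct exploits.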

Special control constructs are included in the B-language so that the schema manager may control the user's workload at run-time as specified in the schema. The control commands initiated by the schema manager are executed by the stimulus controller component of the run-time system. The VARY construct in the language permits the stimulus controller to vary parameters on a per-run basis. The language also allows one to specify that the parameters are to vary in real time. This is accomplished by binding a real-time function to a run-time variable on a per-run basis. The real-time function is defined by a table object and an associated interval of time. The stimulus controller forces the run-time variable to take on successive values from the table during successive time intervals.

Using the MSGEVENT construct, the language permits the stimulus controller to initiate variable time-driven events in the stimulus on a per-run basis. This construct requests the stimulus controller to deliver messages to a buffer with inter-message time periods as specified by successive entries of a table. The stimulus controller can associate a different table object, or a constant time period, with the MSGEVENT variable on a per-run basis.
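The table-driven timing of MSGEVENT amounts to a running sum of inter-message periods, sketched below in Python. Every name here is illustrative, and the cyclic reuse of table entries when more messages than entries are needed is an assumption, not something the paper specifies:

```python
def delivery_times(period_table, n_messages):
    # Successive table entries give the inter-message periods, so
    # message k is delivered at the running sum of the periods.
    times, t = [], 0
    for k in range(n_messages):
        t += period_table[k % len(period_table)]  # assumed cyclic reuse
        times.append(t)
    return times

# Alternate inter-request periods of 2 and 5 time units.
sched = delivery_times([2, 5], 4)
```

A constant period is then just the degenerate one-entry table, which is why the construct can accept either form on a per-run basis.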


Fig. 3. Stimulus generation steps.

Fig. 4. A parallel synthetic program-graph representation.

To allow measurement of the generated workload, a special SENSOR construct permits a user to embed sensors into his program. Sensors allow specified information as well as a timestamp to be sent to the monitor as event records. In addition to user-defined sensors, the B-language program has some built-in sensors. For example, the start time and end time for each execution of a subtask are automatically recorded in the event record. Furthermore, instrumentation available in the operating system and the IIE run-time system allows the schema manager to access information not explicitly specified in the B-language program. An example is information regarding the interaction of the stimulus and the operating system.

The B-language translator constructs special data structures allowing the stimulus controller to exercise external control over the experiment as specified in the B-language program.

TASKFORCE MPXperiment
BUFFER
    RequestBuffer { SIZE: 512 }
SEMAPHORE
    GDSemaphore { INITIAL: 1 }
SHARED
    GlobalData[512]
VARY
    RequestPeriod
MSGEVENT
    RequestService = RequestBuffer @ RequestPeriod
SENSOR
    StartGlobalPhase

SUBTASK Servers[1..5] { INLOGIC: RequestBuffer }
VARY
    SharedDataAccess
BEGIN
    <$DoLocalWork : 10>,
    StartGlobalPhase,
    <$AccessSharedData(GDSemaphore, GlobalData) : SharedDataAccess>
END

Fig. 5. B-language program for the MPX.

The translator also generates sensor descriptions (see Section II-D) for all programmed and predefined sensors in the B-language program. These descriptions are used by the relational monitor to sort out event records flowing from the resident monitor.

As an example, consider the B-language program (Fig. 5) for the single-requester, multiple-server experiment discussed in Section II. The task force consists of an array of five identical server subtasks that wait on the RequestBuffer for queued service requests. The RequestBuffer is associated with the message-event generator via the MSGEVENT construct. This allows an experimenter to vary the request rate by changing the time (RequestPeriod) between successive firings of servers on a per-run basis. The BEGIN and END constructs mark the service loop of each subtask, which is executed each time its in-firing rule is satisfied. In this example, each server does ten units worth of work local to its processor, and then does some variable number of accesses to global data, which is arbitrated by a semaphore. A sensor, StartGlobalPhase, is embedded in each subtask and sends an event record to demarcate the transition from local work to global work. Built-in sensors record the beginning and end of each subtask and the firing of request tokens by the message-event generator. The variable parameters of the experiment are the number of active servers, the request rate, and the amount of global work done by each server. This program will be specified in the schema as the stimulus for the MPX.
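The sensor-to-event-record path just described can be pictured with a toy resident monitor. This is illustrative Python only; the class and method names are hypothetical, not the IIE interface:

```python
import time
from dataclasses import dataclass

@dataclass
class EventRecord:
    sensor: str       # sensor name, matched against a sensor description
    timestamp: float  # supplied when the record is generated
    data: dict        # sensor-specified information

class ResidentMonitor:
    """Toy sketch of the resident-monitor role: enable or disable
    sensors, and forward timestamped event records."""
    def __init__(self):
        self.enabled = set()
        self.records = []           # stands in for the relational-monitor link
    def enable(self, sensor):
        self.enabled.add(sensor)
    def fire(self, sensor, **data):
        if sensor in self.enabled:  # disabled sensors generate no records
            self.records.append(EventRecord(sensor, time.time(), dict(data)))

mon = ResidentMonitor()
mon.enable("StartGlobalPhase")
mon.fire("StartGlobalPhase", server=3)
mon.fire("SomeOtherSensor", x=1)    # dropped: not enabled
```

The enable/disable gate is the essential point: the same instrumented stimulus can run with much or little monitoring overhead, decided per run rather than at compile time.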


B. Relational Monitor

In the IIE, each time the experiment schema is interpreted, and the stimulus executed one or more times, various monitoring information is collected and stored in the database in a schema instance.

The model of the monitoring data adopted in the IIE is a variant of the relational model used in conventional relational databases [28]. Information is recorded as a collection of two-dimensional tables, called relations. Each row, called a tuple, records a particular relationship between entities named in the columns, called domains, of the tuple. For example, the relation Running (Process, Processor), with two domains, may contain the tuple (MyProcess, ProcessorA), indicating that the process called MyProcess is running on the processor called ProcessorA. Relations used in monitoring are temporal in that each tuple records relationships that are true at an instant of time or over some interval of time. A relation involving instants of time is called an event relation; each tuple records the occurrence of a particular event. A period relation, on the other hand, records a relationship that exists for an interval of time. Periods are delimited by events; each tuple (period) in the Running relation is associated with two other event tuples, one in the Start relation and one in the Stop relation. Time is included in an implicit domain manipulated by the monitor.

The Running relation is an example of a primitive relation because the information contained in the relation is a direct translation of a set of recorded events. Primitive relations may be divided into three categories: operating system, stimulus control, and user-defined. The first category is concerned with information involving the operations and data structures supported by the operating system. The Running relation is in this category. The second category involves the actions performed by the run-time portion of the IIE. Examples of event relations from the MPX include

• RequestService(TokenID): the sending of a MsgEvent token to RequestBuffer; the TokenID identifies the token;

• ServersStart(Index, TokenID): the in-firing of a server subtask; the Index identifies the server; the TokenID identifies the token causing the firing;

• ServersEnd(Index, TokenID): the out-firing of a server subtask.

The one user-defined primitive relation specified in the MPX, StartGlobalPhase, is also an event relation and contains only the implicit time domain. This relation was declared as a sensor in the B-language program for the MPX (see Fig. 5), and records the time at which the server subtask finished its local work and started the shared data access.
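The relationship between event relations and period relations can be made concrete with a small Python sketch. The tuples below are invented illustration data, not measurements; only the relation names (Start, Stop, Running) come from the text.

```python
# Event relations: each tuple is true at an instant; the final element plays
# the role of the implicit time domain manipulated by the monitor.
Start = [("MyProcess", "ProcessorA", 0.0), ("MyProcess", "ProcessorB", 5.0)]
Stop = [("MyProcess", "ProcessorA", 4.0), ("MyProcess", "ProcessorB", 9.0)]


def periods(start_rel, stop_rel):
    """Derive a period relation: pair each Start event with the earliest
    matching Stop event on the same (Process, Processor) entities."""
    out = []
    for proc, cpu, t0 in start_rel:
        stops = [t1 for p, c, t1 in stop_rel if (p, c) == (proc, cpu) and t1 >= t0]
        if stops:
            out.append((proc, cpu, t0, min(stops)))  # period delimited by two events
    return out


# Running is a period relation: (Process, Processor, start-time, stop-time).
Running = periods(Start, Stop)
```

Each Running tuple is thus associated with exactly two event tuples, one from Start and one from Stop, as the text describes.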

Given a collection of primitive relations, new relations can be defined as the result of operations performed on existing relations. These derived relations are specified using a relational query language. The query language used in the IIE is a version of Quel [29] augmented with additional temporal constructs and is discussed elsewhere [30]. Fig. 6 illustrates the definition of the derived relations AverageQLength and ServiceRate used in the MPX. The former relation has one domain, AvQL, with the tuples specifying this value for the various time intervals. Similarly, the ServiceRate relation will have one domain, SRate, containing values varying over time. These queries will be referred to by the schema for the MPX, and will specify both the primitive relations to be monitored and the calculations to be performed on the data in the event records.
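The computations these derived relations perform can be approximated in ordinary Python over synthetic event tuples. This is a sketch of the semantics only, not of the IIE's temporal Quel; the sample (TokenID, time) tuples are invented, and the sampling-based average is a crude stand-in for the monitor's instantaneous average.

```python
# Synthetic event relations: (TokenID, time).
RequestService = [(1, 0.0), (2, 1.0), (3, 2.0)]  # request made
StartServers = [(1, 0.5), (2, 2.5), (3, 3.0)]    # a server starts the request
StopServers = [(1, 2.0), (2, 4.0), (3, 5.0)]     # the server finishes

req, start, stop = dict(RequestService), dict(StartServers), dict(StopServers)

# WaitingInQueue: a period from the request being made until a server starts it.
WaitingInQueue = {tid: (req[tid], start[tid]) for tid in req}


def qlength(t):
    """QLength at time t: number of requests outstanding in the buffer."""
    return sum(1 for t0, t1 in WaitingInQueue.values() if t0 <= t < t1)


# AverageQLength: QLength averaged over sample points spanning the run.
samples = [qlength(t / 10) for t in range(0, 50)]
AverageQLength = sum(samples) / len(samples)

# TotalWaiting: from the request being made until the server stops;
# ServiceRate is the reciprocal of the average duration.
durations = [stop[tid] - req[tid] for tid in req]
ServiceRate = 1 / (sum(durations) / len(durations))
```

With these sample events, two requests overlap in the buffer around t = 2.2, and the mean request lifetime is 8/3 s, giving a service rate of 0.375 requests per second.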

C. Stimulus Controller

The stimulus controller component of the run-time system is a set of utilities that permit control of the stimulus as specified in the schema. While the schema manager provides experiment management through the management of the schema abstraction, the stimulus controller provides low-level experiment control through the management of a single run. The motivation was to separate the low-level control functions from the experiment-management functions so that different management strategies could be carried out using common control primitives. The functions exported by the stimulus controller are therefore geared towards the initialization and execution of a single run.

One responsibility of the stimulus controller is to ensure the repeatable behavior of a run by eliminating side effects from one run that might perturb the next run. An example of a side effect is the presence of tokens left over in the edges (buffers) as a result of the previous run. The stimulus controller ensures that all data structures are in a well-defined state at the beginning of a run. For example, buffers are emptied and all semaphores are initialized as specified in the B-language program.

The stimulus controller is also responsible for the variation of parameters on a per-run basis and in real time during a run. The variation of parameters on a per-run basis involves the VARY parameters of the B-language program (see Section II-A), and the variation of the graph-structure representation of the program. A typical modification of the graph structure involves changing the number of active subtasks for a particular run. This is particularly useful in real-time experimentation, where one wants to determine the number of subtasks necessary to meet real-time constraints. The variation of parameters in real time during a run involves the variation of the run-time variables of the B-language program according to some function of time expressed as a table object and an associated interval of time.

The stimulus controller must provide a well-defined mechanism to start the run. In the graphical representation of the program this corresponds to firing the entry node, that is, placing a token on the entry arc of the graph. To start a run, the stimulus controller delivers a specified number of control tokens into a system-defined buffer, called the IgnitionBuffer, which corresponds to the entry arc of the data-flow graph. The user may now use this source of tokens to start any desired subtask, by specifying the IgnitionBuffer appropriately in the in-firing rule of that subtask. Similarly, to detect the end of a run, the stimulus component watches a system-defined TerminationBuffer for a specified number of tokens.

The stimulus controller has four major subcomponents. The first subcomponent executes basic control functions, including initialize, to initialize the instrumented program before each


range of R is RequestService              ; references to R will indicate the
                                          ; RequestService relation
range of S is StartServers
range of Sp is StopServers
define WaitingInQueue (R.TokenID)         ; one domain, the request's TokenID
    where R.TokenID = S.TokenID           ; the request is being serviced
                                          ; by a server
    start R                               ; the waiting begins when the request
    stop S                                ; is made, and ends when the server
                                          ; starts
range of W is WaitingInQueue
define QLength (L = Count(W))             ; count the number of outstanding
                                          ; requests in the buffer
range of Q is QLength
define AverageQLength (AvQL = Average(Q)) ; instantaneous average
define TotalWaiting (W.TokenID)
    where Sp.TokenID = W.TokenID
    start W                               ; total waiting time begins when the
    stop Sp                               ; request was made, and ends when the
                                          ; server stops
range of TW is TotalWaiting
define ServiceRate (SRate = 1 / Average(Duration(TW)))

Fig. 6. Queries for the MPX.

run; fire, to fire a specified number of tokens into the IgnitionBuffer; vary, to permit the variation of VARY parameters on a per-run basis; display, to display the value of a VARY parameter; enable/disable, to enable or disable subtasks on a per-run basis; and status, to return the status of the program. Observe that functions such as display and status are interactive in nature and can be used during the interactive creation of a schema (see Section II-D).

With the second, a message-event generator delivers token messages to prespecified buffers according to prespecified functions of time. Control functions performed by this subcomponent include start generator, to start the message-event generator for a particular run; stop generator; and set message event, to allow the association of either a table object or a constant with a buffer.

With the third, a run-time variable driver ensures that each run-time variable varies in real time as specified by its associated table object and time interval. The main control function of this module is to allow the association of different table objects and time intervals with a run-time variable on a per-run basis.

Finally, with the fourth, a clock module permits access to a set of clocks distributed over the system. This module is used by the message-event generator, the sensors, and the run-time variable driver.

Additional functionality in the instrumented program may be added by augmenting the stimulus controller. For example, a set of components used for experimentation related to reliability has been designed and partially implemented. This includes software-implemented voters, and accelerated fault-insertion and configuration-control modules.
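The per-run control cycle described in this subsection can be sketched as follows. The function names mirror the exported control functions named in the text (initialize, fire, vary), but this Python class and its buffer mechanics are assumptions for illustration, not the stimulus controller's real interface.

```python
import queue

IgnitionBuffer = queue.Queue()     # entry arc of the data-flow graph
TerminationBuffer = queue.Queue()  # watched to detect the end of a run


class StimulusController:
    def __init__(self, vary_params):
        self.vary_params = dict(vary_params)  # VARY parameters of the program

    def initialize(self):
        """Restore a well-defined state: empty left-over tokens from buffers."""
        for buf in (IgnitionBuffer, TerminationBuffer):
            while not buf.empty():
                buf.get_nowait()

    def vary(self, name, value):
        """Per-run variation of a VARY parameter."""
        self.vary_params[name] = value

    def fire(self, n):
        """Start the run: deliver n control tokens onto the entry arc."""
        for _ in range(n):
            IgnitionBuffer.put("control-token")

    def await_termination(self, n):
        """The run ends once n tokens reach the TerminationBuffer."""
        for _ in range(n):
            TerminationBuffer.get()
```

A schema manager built on such primitives could carry out different management strategies (iteration orders, parameter sweeps) without touching the low-level control code, which is the separation the text motivates.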

D. The Resident Monitor

The monitoring information is collected as event records, generated by sensors in the instrumented stimulus, the run-time system, the operating system, or the hardware. Each event record contains an indication of the operation being monitored, the name of the component performing the operation, and the name of the object the operation is being performed on. The event record may optionally contain a timestamp and other information germane to the event. For instance, a sensor located in a file-system process might generate event records for file reads. In this case, the event record would include the name of this process, the name of the file being read, an indication that this is a file-read event, the timestamp, and perhaps the block number being read.
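The fields enumerated above suggest a record layout like the following sketch. The field and example names are assumptions (the paper does not give a concrete encoding); only the set of fields comes from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventRecord:
    operation: str                    # what is being monitored, e.g. a file read
    component: str                    # who performed it, e.g. a file-system process
    obj: str                          # what it was performed on, e.g. a file
    timestamp: Optional[float] = None  # optional, per the text
    extra: dict = field(default_factory=dict)  # other germane information


# The file-read example from the text (names hypothetical):
rec = EventRecord("FileRead", "FileSystemProcess", "LogFile",
                  timestamp=12.5, extra={"block": 7})
```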

Highly selective filtering of the event records is necessary to constrain the flow of event records into the monitor. Enabling and filtering directives are encapsulated in data structures called receptacles, associated with either active components, such as a file-system process, or passive objects, such as a file. Receptacles contain event-enable switches as well as a buffer for temporarily storing event records. The resident monitor (and thus, indirectly, the relational monitor) has the ability to enable switches in each receptacle. The flexibility in associating receptacles with either processes or objects provides a mechanism for filtering the event records. For example, if the receptacle were associated with the file, and the file-read event was enabled, event records for all file reads performed on the file would be written into the receptacle. On the other hand, if the receptacle were associated with a file-system process, event records for all file reads performed by the process on any file would be written into the receptacle.

A task force is instrumented by specifying the sensors, events, and object types in a file, called a sensor description. The operating system and stimulus controller, being task forces themselves, are also associated with sensor descriptions. A sensor description is generated automatically when a B-language program is processed. Users may also write their own sensor descriptions if they so desire. Fig. 7 illustrates the sensor description generated from the B-language program for the MPX given in Fig. 5. This description includes a sensor-process definition for each subtask and for the stimulus controller, and events for the start and end of execution of each subtask and the start of each run. Another program takes the sensor description and produces optimized code for each software-implemented sensor, based on the specifications in the sensor description. Sensor descriptions thus allow users to specify their own sensors, which will utilize the same mechanisms for event-record generation as the sensors embedded in the run-time and operating systems.
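The receptacle mechanism described above can be sketched in a few lines. This Python rendering and its method names are assumptions; only the ingredients (event-enable switches, a temporary buffer, filtering at the point of association) come from the text.

```python
class Receptacle:
    """Encapsulates enabling and filtering directives for event records."""

    def __init__(self):
        self.enabled = set()  # event-enable switches
        self.buffer = []      # temporary store for event records

    def enable(self, event_name):
        """Set by the resident monitor (and, indirectly, the relational monitor)."""
        self.enabled.add(event_name)

    def write(self, record):
        """Accept a record only if its event is enabled: filtering at the source."""
        if record["event"] in self.enabled:
            self.buffer.append(record)


# A receptacle associated with a passive object (a file): all reads performed
# on that file, by any process, are candidates for capture.
file_receptacle = Receptacle()
file_receptacle.enable("FileRead")


def file_read(process, filename):
    file_receptacle.write({"event": "FileRead",
                           "process": process, "file": filename})
```

Associating the receptacle with the file-system process instead would simply move the `write` call (and hence the filtering) to the process side, capturing its reads on any file.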

It is important to note that the user never needs to be concerned about receptacles or event records. Instead, the IIE (through the monitor component) presents to the user the view


; Standard prelude
; MsgEvents
; SubTasks
; User-defined sensors

Fig. 7. Sensor description for the MPX.

of a database composed of temporal relations. New relations can be derived using the query language (identified in Section II-B). As a result of executing a query, the appropriate operations (locating and enabling receptacles, processing event records, and generating the schema instances) are performed automatically.

The use of receptacles and sensors may extend from sensors implemented in hardware, to sensors embedded in the operating system, to sensors placed in the user's program. It is the resident monitor's responsibility to extract the event records from the receptacle and send them to the relational monitor. By the time the relational monitor receives the event records, they are in an identical format regardless of how they were generated.

E. Schema Management

The central management and control of the schema and the schema instances is performed by the schema manager. Functions of the schema manager fall into two broad categories: the creation, manipulation, and execution of the schema, and the creation, archiving, and cross analysis of schema instances. The schema manager is organized in three main functional parts, as follows.

1) A user interface provides a uniform view of the various components of the IIE. Schemata can be created using conventional text editors, or incrementally, by directing the IIE to perform a series of runs. In the latter case, the corresponding schema and schema instance are automatically generated and archived. This incremental mode is particularly helpful in the tuning of experiments. The user interface also directly supports monitoring queries and database queries, thereby allowing a user to manipulate and analyze schema instances.

2) A schema interpreter scans the schema and sends control directives to the run-time system, including global initialization commands for the entire experiment along with commands to set up, start, and terminate each run.

3) A schema-instance generator interacts with the relational monitor to ensure that an instance is created and placed

SCHEMA (<invocation parameters>)
    <system configuration>
    <stimulus>
    <monitoring directives>
    <initial conditions>
    <experiment directives>
END SCHEMA

Fig. 8. High level organization of a schema.

SCHEMA MPX (RequestPeriod, SDA)

    SYSTEMCONFIGURATION <configuration data>;

    TASKFORCE <B-language program>;

    MONITORQUERIES <relational queries>;

    RESULTRELATIONS AverageQLength, ServiceRate;

    VARY SharedDataAccess[I] = SDA WHERE I FROM 1 TO 5;

    VARY NoOfServers FROM 1 TO 5 DO
        BEGINEXPERIMENT
            ENABLE Server[I] WHERE I FROM 1 TO NoOfServers;
            TERMINATE AFTER 30 seconds
        ENDEXPERIMENT
    OD

ENDSCHEMA

Fig. 9. The schema for the MPX.

in the database. Both predefined and user-defined relations are created and stored in the schema instance as a result of interpreting the schema.

The schema contains all the necessary information to perform a complete experiment. It consists of five major components: the system configuration, the stimulus, monitoring directives, initial experiment conditions, and experiment directives (see Fig. 8). The system configuration completely defines the environment the experiment is to be performed in. The stimulus is in the form of a translated B-language program containing controlling parameters and data-collection sensors, as described in Section II-A. The monitoring directives are in the form of a collection of queries, as described in Section II-B. The initial experiment conditions consist of a set of invocation parameters and the required resources (i.e., hardware and operating-system configuration and instrumentation, stimulus version, etc.). Invocation parameters can be used to initialize parameter values for experiments and are typically specified at schema-interpretation time. The experiment directives are interpreted by the schema manager and specify how the stimulus should be executed. Specifications are provided for the iteration of the stimulus over the experiment runs along with the variation of parameters for each run.

During schema execution, the relational monitor creates a schema instance to hold the results of the experiment. The monitor collects all the resulting event records together with the schema identification and environment information and creates an object to be managed by the PE. By using standard relational database queries, the user can then perform analyses across schema instances. The data in the instance, which are collected automatically, provide the user with enough information to replicate any particular execution of the schema to verify the results.

In order to illustrate the use of the schema and schema instance, consider the schema describing the MPX, shown in Fig. 9. The schema has two invocation parameters, RequestPeriod and SDA. The configuration data specify the resources requested

; Fig. 7 (continued): sensor description generated for the MPX.
(Taskforce (Name MPXExperiment)
  (SensorProcess (Name StimulusControl)
    (Event (Name PerRun)
      (Domains (Domain (Name RunNumber) (Type Integer))
               (Domain (Name RequestPeriod) (Type Integer))
               (Domain (Name ServerCount) (Type Integer)))
      (Timestamp yes))
    (Event (Name RequestService)
      (Location StimulusControl)
      (Domains (Domain (Name TokenID) (Type Integer)))
      (Timestamp yes)))
  (SensorProcess (Name Servers)
    (Event (Name ServersStart)
      (Location Servers)
      (Domains (Domain (Name Index) (Type Integer))
               (Domain (Name TokenID) (Type Integer)))
      (Timestamp yes))
    (Event (Name ServersEnd) ...)
    (Event (Name StartGlobalPhase) ...)))


[Plot: service rate (requests/s) versus time, 0-35 s, with five curves for 1 through 5 servers.]

Fig. 10. Service rate versus time for a variable number of servers.

by this experiment, including the versions of the operating system and IIE components, the hardware components, data files to be read by the stimulus, and initial tests to be used later to calibrate the results.

The experiment directives are in the form of a loop which generates the execution of five runs. Each run will have its own value for the NoOfServers parameter. The execution of this schema will terminate when 30 s have passed for each run.
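The effect of interpreting the Fig. 9 directives can be sketched as a loop over runs. This Python function is an illustration of the schema's control flow only; the interpreter's real interface, and the `run_one` callback here, are assumptions.

```python
def interpret_mpx_schema(run_one, request_period, sda):
    """One interpretation of SCHEMA MPX -> one schema instance (five runs)."""
    instance = {"RequestPeriod": request_period, "SDA": sda, "runs": []}
    shared = {i: sda for i in range(1, 6)}   # VARY SharedDataAccess[I] = SDA
    for n in range(1, 6):                    # VARY NoOfServers FROM 1 TO 5 DO
        enabled = [f"Server[{i}]" for i in range(1, n + 1)]  # ENABLE Server[I]
        relations = run_one(enabled, shared,
                            terminate_after=30)  # TERMINATE AFTER 30 seconds
        instance["runs"].append({"NoOfServers": n, "relations": relations})
    return instance
```

Each call to `run_one` stands for one execution of the stimulus under the stimulus controller, returning the result relations (AverageQLength, ServiceRate) that the monitor stores in the schema instance.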

During execution, the sensors implanted in the B-language program will generate data, which are collected according to the monitor queries.

Each time this schema is interpreted, a schema instance will be automatically created in the database by the IIE. Each instance will have the following components:

• the date, time, and user identification;
• the values of the invocation parameters;
• exact version numbers of all software used in the experiment;
• a detailed description of the hardware configuration;
• results of the initial tests as specified in the system configuration; and
• the system- and user-defined relations (in this case, the PerRun, AverageQLength, and ServiceRate relations).

Once the instances have been created, additional analysis can be performed on the instances individually or as a group. Fig. 10 shows the relationship between average service rate and time for a RequestPeriod of 200 ms and a SharedDataAccess value of 400 accesses per request. Initially the service rate is high, since the buffer is empty. For five servers, the buffer never contains many requests, so the average service rate remains high. However, for fewer than three servers, the buffer fills up quickly, causing the average service rate to plummet. The behavior with three or four servers is more involved, and further analysis is necessary using different values for the request period and the SDA.

III. IMPLEMENTATION

A. Background

Our research vehicle is the Cm* multiprocessor. Cm* is a 50-processor multiprocessor developed and implemented at Carnegie-Mellon University. Two operating systems, MEDUSA and STAROS, have been developed for Cm*. In addition, substantial utility software built for Cm* runs on other general-purpose computers.

B. Status

An initial version of the IIE has been partially implemented for Cm*. Two versions of the run-time system have been developed, one for each operating system [30], [27]. A substantial library of actions has accumulated for both operating systems, and work is proceeding on implementing the B-language translator. An initial version of the relational monitor has been developed, including the sensor-description of the


processor, although substantial effort is still needed before general queries and multiple schema-instance analysis can be executed [30]. The schema manager is in the final design stages. It is expected that a full implementation of the IIE will be completed by the end of 1982.

IV. CONCLUSION

The IIE constitutes a systematic approach to the task of experimentation on multiprocessors. This approach emphasizes the integration of the tools used for such experimentation and the development of techniques for experiment management. The tools incorporated into the initial design of the IIE have been oriented primarily toward performance measurement. Work is proceeding in the area of reliability experimentation, specifically to enhance the monitor so that it can function across system failures and to implement fault insertion into the stimulus in a controlled fashion. Another interesting use of the IIE is in automated testing of revised modules in the framework of version control. Future research areas include the integration of the IIE with a multiprocessor PE, the incorporation of hardware monitors and other tools into the IIE, and the development of an IIE supporting experimentation on real-time systems.

ACKNOWLEDGMENT

The authors would like to acknowledge the contributions, to some of the concepts and to the implementation, of the other members of the Multiprocessor Performance Evaluation Group: X. Castillo, R. Chansler, I. Durham, P. Highnam, E. Gehringer, and P. Sindhu.

REFERENCES

[1] A. K. Jones and E. F. Gehringer, Eds., "The Cm* multiprocessor project: A research review," Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, Tech. Rep., July 1980.

[2] R. J. Swan, "The switching structure and addressing architecture of an extensible multiprocessor, Cm*," Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, Aug. 1978.

[3] S. H. Fuller and S. P. Harbison, "The C.mmp multiprocessor," Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-78-148, 1978.

[4] M. Satyanarayanan, Multiprocessors: A Comparative Study. Englewood Cliffs, NJ: Prentice-Hall, 1980.

[5] S. A. Ward, "The MuNet: A multiprocessor message-passing system architecture," in Proc. 7th Texas Conf. Computing Systems, Nov. 1978, pp. 7-21.

[6] D. Siewiorek, M. Canepa, and S. Clark, "C.vmp: The architecture of a fault-tolerant multiprocessor," in Proc. 7th Annu. Int. Conf. Fault-Tolerant Computing, June 1977, pp. 37-43.

[7] R. G. Arnold and E. W. Page, "A hierarchical restructurable multi-microprocessor architecture," in Proc. 3rd Annu. Symp. Comput. Arch., Jan. 1976, pp. 44-45.

[8] T. B. Smith and A. L. Hopkins, "Architectural description of a fault-tolerant multiprocessor engineering prototype," in Dig. Papers, 8th Annu. Conf. Fault-Tolerant Computing, June 1978.

[9] S. H. Fuller, J. K. Ousterhout, L. Raskin, P. Rubinfeld, P. S. Sindhu, and R. J. Swan, "Multi-microprocessors: An overview and working example," Proc. IEEE, vol. 66, Feb. 1978.

[10] I. Gertner, "Performance evaluation of communicating processes," Ph.D. dissertation, Univ. Rochester, Rochester, NY, 1980.

[11] H. C. Lucas, Jr., "Performance evaluation and monitoring," Comput. Surveys, vol. 3, pp. 79-91, Sept. 1971.

[12] W. Buchholz, "A synthetic job for measuring system performance," IBM Syst. J., pp. 309-318, 1969.

[13] P. F. McGehearty, "Performance evaluation of multiprocessors under interactive workloads," Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, 1980.

[14] K. Sreenivasan and A. J. Kleinman, "On the construction of a representative synthetic workload," Commun. Ass. Comput. Mach., 1974.

[15] D. Ferrari, "Workload characterization and selection in computer performance measurement," IEEE Computer, pp. 18-24, July-Aug. 1972.

[16] D. E. Morgan, W. Banks, D. P. Goodspeed, and R. Kolanko, "A computer network monitoring system," IEEE Trans. Software Eng., vol. SE-1, Sept. 1975.

[17] L. Raskin, "Performance evaluation of multiple processor systems," Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, Aug. 1978.

[18] H. Kobayashi, Modelling and Analysis: An Introduction to System Performance Evaluation Methodology. Reading, MA: Addison-Wesley, 1978.

[19] G. J. Nutt, "A survey of remote monitors," NBS Special Publ. 500-42, Jan. 1979.

[20] A. K. Jones, R. J. Chansler, Jr., I. Durham, K. Schwans, and S. R. Vegdahl, "STAROS: A multiprocess operating system for the support of task forces," in Proc. 7th Symp. O.S. Principles, Sept. 1979, pp. 117-127.

[21] J. K. Ousterhout, D. A. Scelza, and P. S. Sindhu, "Medusa: An experiment in distributed operating system structure," Commun. Ass. Comput. Mach., vol. 23, pp. 92-105, Feb. 1980.

[22] A. N. Habermann, "An overview of the Gandalf project," Carnegie-Mellon Univ., Comput. Sci., Res. Rep. 1978-1979, 1980.

[23] T. Teitelbaum, "The Cornell program synthesizer: A syntax directed programming environment," Dep. Comput. Sci., Cornell Univ., Ithaca, NY, Tech. Rep. TR80-421, May 1980.

[24] R. Medina-Mora and P. H. Feiler, "An incremental programming environment," IEEE Trans. Software Eng., Sept. 1981.

[25] V. Cerf, "Multiprocessors, semaphores and a graph model of computation," Dep. Comput. Sci., Univ. California, Los Angeles, Tech. Rep. 7223, Apr. 1972.

[26] K. P. Gostelow, "Flow of control, resource allocation, and the proper termination of programs," Ph.D. dissertation, Univ. California, Los Angeles, Dec. 1971.

[27] A. Singh, "Pegasus: A workload generator for multiprocessors," Master's thesis, Dep. Elec. Eng., Carnegie-Mellon Univ., Pittsburgh, PA, 1981.

[28] J. D. Ullman, Principles of Database Systems. Potomac, MD: Computer Science Press, 1980.

[29] M. Stonebraker, E. Wong, P. Kreps, and G. Held, "The design and implementation of INGRES," ACM Trans. Database Syst., vol. 1, pp. 189-222, Sept. 1976.

[30] R. Snodgrass, "Monitoring distributed systems," Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, to be published, 1982.

Zary Segall received the E.E. degree from the Polytechnical Institute, Bucharest, Romania, and the M.Sc. and Ph.D. degrees in computer science from the Technion, Haifa, Israel.

Presently, he is on the faculty of the Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, and is the head of the Cm* project. His research interests are in the areas of parallel computation, fault tolerance, performance evaluation, and design automation.

Ajay Singh received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kanpur, in 1980, and the M.S. degree in computer engineering from Carnegie-Mellon University, Pittsburgh, PA, in 1981.

He is currently working with the Operating Systems Group at Tandem Computers Inc., Cupertino, CA. His professional interests include distributed operating systems and performance evaluation.


Richard T. Snodgrass (S'79-M'81) received the B.A. degree in physics from Carleton College, Northfield, MN, and the M.S. degree in computer science from Carnegie-Mellon University, Pittsburgh, PA. He is currently completing his studies for the Ph.D. degree at Carnegie-Mellon University, in the area of monitoring distributed systems.

At present he is an Assistant Professor at the University of North Carolina, Chapel Hill. His interests include programming environments, databases, expert systems, and user interfaces.

Anita K. Jones (SM'82) received the Ph.D. degree from Carnegie-Mellon University, Pittsburgh, PA, in 1973.

She is presently an Associate Professor with the Department of Computer Science, Carnegie-Mellon University, and a Senior Scientist with Tartan Laboratories Inc., Pittsburgh, PA. She has been a member of program committees for several recent conferences, as well as serving as the Operating Systems Editor for the Communications of the Association for Computing Machinery. She is also the Editor-in-Chief of the ACM Transactions on Computer Systems, which will commence publication in 1983. Her professional interests include operating systems, distributed systems, protection, parallel algorithms, and the manufacture of software.

Dr. Jones is a member of the Association for Computing Machinery, Sigma Xi, and the IEEE Computer Society.

Daniel P. Siewiorek (S'67-M'72-SM'79-F'81) was born in Cleveland, OH, on June 2, 1946. He received the B.S. degree in electrical engineering (summa cum laude) from the University of Michigan, Ann Arbor, in 1968, and the M.S. and Ph.D. degrees in electrical engineering (minor in computer science) from Stanford University, Stanford, CA, in 1969 and 1972, respectively.

He joined the Departments of Computer Science and Electrical Engineering, Carnegie-Mellon University, in 1972, where he is presently Professor of Computer Science and Electrical Engineering. While at Carnegie-Mellon University, he helped to initiate and guide the Cm* project that culminated in an operational 50-processor multi-microprocessor system. He also participated in the Army/Navy Military Computer Family (MCF) project to select a standard instruction set. The ISP (Instruction Set Processor) hardware description language was extensively developed to support the MCF program. He also directed the construction of C.vmp, a triply redundant fault-tolerant computer, and formulated a hardware design automation project. His current research interests include computer architecture, reliability modeling, fault-tolerant computing, modular design, and design automation. He has served as Associate Editor of the Computer Systems Department of the Communications of the Association for Computing Machinery and is currently serving as Chairman of the IEEE Technical Committee on Fault-Tolerant Computing.

Dr. Siewiorek was recognized with an Honorable Mention Award as Outstanding Young Electrical Engineer for 1977, given by Eta Kappa Nu (National Electrical Engineering Honorary Society). He was elected a Fellow of the IEEE in 1981 for "contributions to the design of modular computing systems." He is a member of the Association for Computing Machinery, Tau Beta Pi, Eta Kappa Nu, and Sigma Xi.