Independence and mappings in model-based decision support systems

Decision Support Systems 10 (1993) 341-358, North-Holland

Richard G. Ramirez Iowa State University, Ames IA, USA

Chee Ching and Robert D. St. Louis Arizona State University, Tempe AZ, USA

Independence of applications from logical and physical data structures is one of the cornerstones of modern database systems. Similar concepts may be applied to model management in a decision support system (DSS) to facilitate model portability, sharing, and multi-purpose application. In this paper we define the concepts of model/data and model/solver independence, present an extended architecture for DSS, and show its implementation. The architecture supports separate solver, model, and data bases and uses mappings to integrate them. Computationally-equivalent solvers support portability, while non-computationally-equivalent solvers allow a model to be used without modification for different purposes (what-if, goal seeking, optimization). The implementation integrates an SQL database system with a mathematical modeling language.

Keywords: Model management; Linear programming; Data independence; Structured modeling; Decision support systems

Richard G. Ramirez is an Assistant Professor of Information Systems at Iowa State University. He received his Ph.D. degree from Texas A&M in 1986. His current research interests are the integration of mathematical programming languages and databases for decision support systems. He has published research on relational views and expert databases. He is a member of ACM, IEEE Computer Society and TIMS.

Chee Ching is an Assistant Professor of Decision & Information Systems in the College of Business at Arizona State University. She received her Ph.D. degree from Purdue University in 1988. Her research interests are organizational computing and model management systems, and she has published research on coordination and organizational learning and on distributed decision making.

Robert D. St. Louis is an Associate Professor of Decision & Information Systems in the College of Business at Arizona State University. He received his Ph.D. degree from Purdue University in 1912. Dr. St. Louis currently is conducting research in the areas of productivity measurement, MIS design and DSS design. He has published articles in a variety of journals, including the Academy of Management Journal, Industrial and Labor Relations Review, and the Journal of Human Resources.

Correspondence to: Dr. Richard G. Ramirez, College of Business, Iowa State University, Ames, IA 50011-2065, USA.

0167-9236/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

1. Introduction

A typical implementation of a decision support system (DSS) is a single package that integrates the functionality of the three subsystems in the traditional framework for DSS: database, model base, and user interface or dialog. Popular systems such as Lotus 1-2-3 and IFPS use this approach. Users learn a single language and need not be aware of the DSS internals. This approach is very effective in many applications, particularly those developed from scratch or involving a single user.

As decision problems increase in complexity, the single-package approach becomes less effective and the DSS must support the data component as a separate system, so that DSS applications now run concurrently with 'pure' database applications. Many databases contain a large number of data files, some of them very large, and users increasingly need to access corporate data and share results with other users and applications. Commercial systems respond to this need by providing interfaces to the more popular DBMSs, such as 1-2-3 links to ORACLE, INFORMIX, and dBase IV, SAS interfaces to IMS and DB2, and IFPS interfaces to ORACLE. This approach has the advantage of bringing to the DSS all the capabilities of a generalized database management system. Note, however, that most of these interfaces do little other than upload and download data, and the DSS must still include the data management subsystem.

As important as shared access and full DBMS functionality are, the separation of the data component from a DSS has far more important consequences for DSS researchers and users. Since the DSS no longer 'owns' the database, interfaces (or mappings) must be designed to relate the dialog and model components to the database. Bonczek et al. [2] introduced three interfaces: the user/model interface, the model/data interface, and the user/data interface. Sprague and Carlson [20] identified model/data and model/dialog interfaces. These interfaces were, however, proposed as part of a more general framework and did not provide enough guidelines for implementation. Liang [14] proposed model/data, schema/model, and model/tool 'links' in more detail. He used an "external model schema" to define a model's inputs and outputs. Inputs are described using a relational table with two columns: model name and "input" variable. A similar table defines outputs. By consistently using the same names in models and database tables, it is possible to map model variables to database tables and columns.

A distinction is made between a model (or model schema) and the data values that instantiate elements in the model such as coefficients, parameters, and variables. A model together with these data values is a model instance. The data values are part of a dataset stored as a collection of tables or files in a database. The separation of model and dataset specifications is referred to as model/data independence [9,3,16]. We propose the notion of mappings to allow models to be reused without change with multiple datasets, possibly having different file formats, and also to allow a dataset to be used with multiple models and computer programs. Models can be more easily integrated because the outputs of one model can become the inputs of another without changing any model specifications. Different levels of model/data independence exist. We characterize them as value, dimension, and data structure independence in section 2.

We show how the concept of independence is carried over to solvers. A solver is a computational procedure (e.g., a computer program) used to apply the model to some dataset. In systems with little or no independence such as spreadsheet software, the program becomes the model and the model becomes the program. Because programs must be updated continuously to reflect current assumptions and obtain new results, the program's code often replaces other formulations of the model as a communication tool. It is not uncommon in practice to find the model being discussed by looking at the program's listing, and not at an algebraic or graphical representation. This situation forces modelers and users to become 'programmers' and to depend on a particular software package. Such dependence limits model sharing to those users having access to the same package, and model integration to models written using the same package. A higher level of model/solver independence allows a model to be used for different purposes without changing the model representation. The same model may be used in optimization mode to estimate the 'best' production schedule for a factory, in "what-if" mode to determine the effect of specific scheduling decisions, and in "goal-seeking" mode to determine the necessary actions to reach predetermined production goals. These and other modes may be necessary in a single session, as the decision maker tries to understand the situation in the real world and explores alternatives.

This paper presents the data and algebraic management system (DAMS) that implements model/data and model/solver independence. Flexible mappings between solvers, models, and datasets are defined. DAMS is built on top of the INGRES relational database system and provides model and data management facilities for a DSS. DAMS supports the SM/DB modeling language and a superset of SQL to provide model and data management capabilities.

The following section defines model/data and model/solver independence. The DAMS system is introduced in section 3, and the SM/DB language for modeling is described in section 4. In section 5 the differences between model definitions in SM/DB and SML [8,11] are briefly described. Finally, we conclude the paper and discuss further research directions in section 6.

2. Model independence from data and solvers

A model representation is a computer-readable formalization of a user's problem [after 13]. Models have different representations according to their use. Our use of the term corresponds to the modeler's form of Fourer [6]. This form is meant to be used by people and must be understandable, concise, general, and symbolic [6]. A model representation includes entities such as variables, constants, parameters, and coefficients. The solver's form (or algorithm's form), on the other hand, is convenient (to the solver) rather than understandable, redundant rather than concise, specific rather than general, and numeric rather than symbolic. The solver's form should be obtainable by a translation process that instantiates entities in the modeler's form with values stored in a dataset.

Independence from actual data and solver separates the modeler's form from the solver's form. In this section, we first discuss model/data independence, giving the necessary conditions and illustrating them using algebraic notation for a linear program. Second, we discuss model/solver independence, giving the necessary conditions and illustrating them with examples from a few popular modeling languages.

2.1. Model/data independence

In its simplest form, model/data independence means that data can be stored in files external to the model. With the exception of spreadsheet software such as Lotus 1-2-3 and EXCEL, most modeling software provides this simple form of model/data independence, allowing the data to reside in either an internal file (managed by the modeling system), an external file (managed by the operating system), or files in a database system such as DB2 or ORACLE. Unfortunately, most modeling software requires that the data files be designed specifically for the model being processed, and thereby imposes constraints on the format of the files. For example, a model for sales commissions may require a 12-month sales history to be stored as 12 values per record. Alternatives such as one record per month or one file per month are not allowed or require modifying the model representation. This is a very limited form of model/data independence.

2.2. Sufficient conditions for model/data independence

We define model/data independence by postulating value independence, dimension independence, and data structure independence as sufficient conditions. The first two are adapted from Geoffrion [9]. If these conditions are satisfied, then any dataset in a DSS may be modified or replaced without affecting the model representation, and the model representation can be changed without affecting the dataset.

Value independence. This condition states that the data values for a variable or a constant can be changed without affecting the model representation. In a model to compute student grades, value independence implies that the names of the students, the assignment scores of the students, and the weighting of the students' assignment scores can be changed without affecting the model representation.

Dimension independence. This condition states that the number of variables and the number of values a variable takes on can be changed without affecting the model representation. In the grade book example, dimension independence implies that the number of students and the number of assignments can be changed arbitrarily without having to change the model.

Data Structure Independence, the third condition, requires the following:

(1) Separate specification of names and types. Variable names and data types (e.g., integer, decimal) of entities in the model can be different from the ones in the dataset. A mixed integer programming model may specify some of its variables as integer, but all variables may be stored as character strings. It is possible that the conversion between data types may cause some integrity problems when the model is solved. However, this condition stipulates that models should not dictate the storage form of data values for model/data independence to exist.

(2) Separate specification of logical record and file structures. The model representation imposes no constraints on the format of the files in the database. As a consequence, the record structure for an entity in the model can be very different from its counterpart in the dataset. A matrix in a model may be stored in the database using a single record for the entire matrix, a record for each row of the matrix, or a record for each element of the matrix.

(3) Separate specification of base entities. Entities in the model do not necessarily correspond to physically stored ('base') entities in the database. Values of entities in the model may come from 'virtual' records generated at run time. For example, entities in the model can be obtained by accumulating values from multiple records, such as total product sales from individual invoices.
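Conditions (1) and (3) can be sketched in a few lines of Python using the standard `sqlite3` module. This is only an illustration under stated assumptions: the `invoice` table, the `total_sales` view, and the `load_total_sales` function are hypothetical names, not part of DAMS. Amounts are stored as character strings, and the model entity (total sales per product) exists only as a virtual record defined by a view.

```python
import sqlite3

# Hypothetical schema: individual invoices are the only stored ("base") records,
# and amounts are stored as character strings rather than integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (product TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [("widget", "10"), ("widget", "15"), ("gadget", "7")],
)

# A view supplies the 'virtual' entity the model needs: total sales per product.
conn.execute(
    "CREATE VIEW total_sales AS "
    "SELECT product, SUM(CAST(amount AS INTEGER)) AS total "
    "FROM invoice GROUP BY product"
)


def load_total_sales(connection):
    """Mapping: fetch the model entity TOTAL_SALES[product] from the view."""
    rows = connection.execute("SELECT product, total FROM total_sales")
    return {product: int(total) for product, total in rows}


total_sales = load_total_sales(conn)
print(total_sales)
```

The model only ever sees the dictionary returned by the mapping, so the invoice table could be restructured without touching the model, as long as the view is redefined accordingly.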

2.3. An illustration of model/data independence

The following examples show algebraic representations of a model with varying degrees of model/data independence.

No model/data independence. No separate dataset exists. The model representation combines the model and specific data values. The dataset cannot be modified without also changing the model representation.

$$
\begin{aligned}
\max\ & Z = 3x + 5y, \\
\text{s.t.}\ & 5x + 2y \le 12, \\
& 7x - 3y \le 15, \\
& x, y \ge 0.
\end{aligned}
$$

Value independence only. The model and the dataset exist separately. The dataset can be modified without affecting the model representation. However, the dimensions of the model are restricted to the limits shown in the model representation, i.e., only two variables and two constraints.

$$
\begin{aligned}
\max\ & Z = c_1 x_1 + c_2 x_2, \\
\text{s.t.}\ & a_{11} x_1 + a_{12} x_2 \le b_1, \\
& a_{21} x_1 + a_{22} x_2 \le b_2, \\
& x_1, x_2 \ge 0,
\end{aligned}
$$

$$
c = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \quad
a = \begin{pmatrix} 5 & 2 \\ 7 & -3 \end{pmatrix}, \quad
b = \begin{pmatrix} 12 \\ 15 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
$$

Dimension and value independence. The model representation exists separately from the dataset and allows an arbitrary number of variables and constraints. Although the dataset is similar to the previous case, additional variables and constraints can be added (or dropped) without having to change the model representation. The representation in this example is fully dimension independent since there can be an arbitrary number of both variables and constraints.

$$
\begin{aligned}
\max\ & \sum_{j=1}^{n} c_j x_j, \\
\text{s.t.}\ & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \ldots, m, \\
& x_1, x_2, \ldots, x_n \ge 0.
\end{aligned}
$$
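To make dimension independence concrete, the following Python sketch writes the objective and the constraints once, over whatever index ranges the dataset supplies. The function names and the enlarged dataset `large` are hypothetical and introduced only for illustration; `small` is the dataset from the example above.

```python
def objective(c, x):
    """Z = sum_j c[j] * x[j], for any number of variables."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(a, b, x):
    """Check sum_j a[i][j] * x[j] <= b[i] for every row i, and x >= 0."""
    rows_ok = all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi
        for row, bi in zip(a, b)
    )
    return rows_ok and all(xj >= 0 for xj in x)


# Dataset from the example: two variables, two constraints.
small = {"c": [3, 5], "a": [[5, 2], [7, -3]], "b": [12, 15]}

# Hypothetical enlarged dataset: a third variable and a third constraint.
large = {"c": [3, 5, 4], "a": [[5, 2, 1], [7, -3, 2], [1, 1, 1]], "b": [12, 15, 8]}

# The same two functions serve both datasets without modification.
z_small = objective(small["c"], [1, 2])
ok_small = is_feasible(small["a"], small["b"], [1, 2])
z_large = objective(large["c"], [1, 2, 1])
ok_large = is_feasible(large["a"], large["b"], [1, 2, 1])
```

Neither function mentions the number of variables or constraints, which is the property the summation notation expresses algebraically.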

Data structure independence. This condition of independence allows any file format for storing the dataset. The model representation and data in these examples are the same as for dimension and value independence; only the file formats used for the dataset vary. A model representation that explicitly or implicitly requires a fixed format fails to satisfy data structure independence.

(1) Store each vector and matrix in a separate file.

Vector c:

| c |
|---|
| 3 |
| 5 |

Matrix a:

| Column 1 | Column 2 |
|---|---|
| 5 | 2 |
| 7 | -3 |

Vector b:

| b |
|---|
| 12 |
| 15 |

(2) Store all vectors and matrices in a single file.

| Vector | Column 1 | Column 2 |
|---|---|---|
| c | 3 | |
| c | 5 | |
| a | 5 | 2 |
| a | 7 | -3 |
| b | 12 | |
| b | 15 | |

(3) Store vectors c and b together in one file. Store matrix a in a separate file, one element per record with explicit subscripting.

| Vector | Value |
|---|---|
| c | 3 |
| c | 5 |
| b | 12 |
| b | 15 |

Matrix a

| Row | Column | Value |
|---|---|---|
| 1 | 1 | 5 |
| 1 | 2 | 2 |
| 2 | 1 | 7 |
| 2 | 2 | -3 |
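The three storage layouts can be related to a single model representation by one small mapping per layout. The Python sketch below is illustrative only: the in-memory structures stand in for the files, and the `map_layout*` function names are hypothetical. Each mapping yields the same `(c, a, b)` triple, so the model never needs to know which layout was used.

```python
# Layout 1: one "file" per vector or matrix.
layout1 = {"c": [3, 5], "a": [[5, 2], [7, -3]], "b": [12, 15]}

# Layout 2: a single file; each record is (vector name, column 1, column 2).
layout2 = [
    ("c", 3, None), ("c", 5, None),
    ("a", 5, 2), ("a", 7, -3),
    ("b", 12, None), ("b", 15, None),
]

# Layout 3: vectors c and b share one file; matrix a has one element per record.
layout3_vectors = [("c", 3), ("c", 5), ("b", 12), ("b", 15)]
layout3_matrix = [(1, 1, 5), (1, 2, 2), (2, 1, 7), (2, 2, -3)]


def map_layout1(files):
    return files["c"], files["a"], files["b"]


def map_layout2(records):
    c = [col1 for name, col1, _ in records if name == "c"]
    a = [[col1, col2] for name, col1, col2 in records if name == "a"]
    b = [col1 for name, col1, _ in records if name == "b"]
    return c, a, b


def map_layout3(vector_records, matrix_records):
    c = [value for name, value in vector_records if name == "c"]
    b = [value for name, value in vector_records if name == "b"]
    n_rows = max(row for row, _, _ in matrix_records)
    n_cols = max(col for _, col, _ in matrix_records)
    a = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in matrix_records:
        a[row - 1][col - 1] = value  # subscripts in the file are 1-based
    return c, a, b


instances = [
    map_layout1(layout1),
    map_layout2(layout2),
    map_layout3(layout3_vectors, layout3_matrix),
]
```

Adding a fourth storage format would mean adding a fourth mapping function, with no change to the model representation.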

2.4. Model/solver independence

A solver is a computer program in a language suitable for direct execution, or execution after automatic compilation, interpretation, and/or linking. Two solvers are computationally equivalent if they provide essentially the same results and allow one to substitute for the other. Solvers that are strongly computationally equivalent can be substituted for one another transparently to the user. For example, the new release of a solver may have a faster algorithm for matrix inversion but still provide identical numerical results. Or a solver may run on a Cray supercomputer using floating-point vector processing, while an otherwise identical version may run on an IBM PC using software emulation for floating-point computations. Solvers that are strongly computationally equivalent may produce results that differ in numeric precision. In practice, many solvers are weakly computationally equivalent, and obtain results that are similar in purpose but not in form. For example, SAS/OR, LINDO, and MPSX all solve linear programming problems represented in the MPS format but do not provide outputs for exactly the same variables, and their listings are very different in form.

Model/solver independence allows a model instance to be processed without modification using solvers that are not necessarily computationally equivalent. This facilitates model portability, sharing, and multi-purpose application. Typical purposes are what-if (or "evaluation"), goal-seeking, simulation, optimization, and others. We postulate selection, purpose, and representation independence as sufficient conditions for model/solver independence. If these conditions exist, then the solver may be modified or replaced without affecting the model, and the model may be fully utilized with any given set of solvers.

Representation independence. This condition states that the model representation cannot be affected by changes in the solver's code. It requires the model representation to be separated from the solver, and makes it possible to modify the solver's code in any manner without affecting the model representation.

Selection independence. This condition states that the model representation imposes no constraints on the choice of the solver. This makes it possible to process the model using any one of a collection of solvers, choosing the one most appropriate to the particular situation.

Purpose independence. This condition states that the model representation imposes no constraints on the "direction of computation" by describing relationships among model elements rather than computational steps. This makes it possible to process the model in any feasible mode that is useful to the decision maker.

Representation independence requires solvers and models to be represented and manipulated as two different entities. Changes made to the solver should not necessarily require a change in the model.

Selection independence extends representation independence by allowing multiple solvers for the same model. The model representation may not, implicitly or explicitly, require a particular solver. Note that representation independence alone is not enough to allow a substitution of solvers that are not strongly computationally equivalent, since a model may still require a specific solver to be used.

Purpose independence allows users to explore relationships among model elements in multiple modes such as "what-if", "goal-seeking", or "optimization" provided the requisite data and solvers exist. Purpose independence relates to the concept that a model representation should not specify the time of instantiation of variables and parameters. That is, it should not classify its entities as being part of input or output subsets. An entity is in the input subset if it must be instantiated before the solver can be used to process a model instance. The output subset is formed by those entities that are instantiated as a result of processing the model instance with a solver.
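The following Python sketch illustrates selection and purpose independence on the two-variable linear program of section 2.3. It is a minimal sketch under stated assumptions, not the DAMS implementation: the `SOLVERS` registry and both solver functions are hypothetical, and the optimizer simply enumerates vertices, which only works for two variables and a non-empty, bounded feasible region. The model instance itself does not say whether x is an input or an output; each solver decides.

```python
from fractions import Fraction
from itertools import combinations

# Model instance: max c.x subject to a.x <= b, x >= 0 (data from section 2.3).
instance = {"c": [3, 5], "a": [[5, 2], [7, -3]], "b": [12, 15]}


def _feasible(inst, x):
    return all(xj >= 0 for xj in x) and all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi
        for row, bi in zip(inst["a"], inst["b"])
    )


def what_if(inst, x):
    """Evaluation solver: x is an input; Z and feasibility are outputs."""
    z = sum(cj * xj for cj, xj in zip(inst["c"], x))
    return z, _feasible(inst, x)


def optimize_2d(inst):
    """Optimization solver: x is an output, found by vertex enumeration.

    Assumes two variables and a non-empty, bounded feasible region.
    """
    # Each boundary line is (p, q, r), meaning p*x + q*y = r; the axes are included.
    lines = [(row[0], row[1], bi) for row, bi in zip(inst["a"], inst["b"])]
    lines += [(1, 0, 0), (0, 1, 0)]
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(lines, 2):
        det = p1 * q2 - p2 * q1
        if det == 0:
            continue  # parallel lines define no vertex
        x = Fraction(r1 * q2 - r2 * q1, det)  # Cramer's rule
        y = Fraction(p1 * r2 - p2 * r1, det)
        if _feasible(inst, (x, y)):
            z, _ = what_if(inst, (x, y))
            if best is None or z > best[0]:
                best = (z, (x, y))
    return best


# Hypothetical solver base: the same instance can be handed to either entry.
SOLVERS = {"what-if": what_if, "optimization": optimize_2d}

z_trial, trial_ok = SOLVERS["what-if"](instance, (1, 2))
z_best, x_best = SOLVERS["optimization"](instance)
```

Replacing `optimize_2d` with a call to a general-purpose LP code would change only the registry entry, which is the substitution that selection independence is meant to permit.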

2.5. An illustration of model/solver independence

We first illustrate model/solver independence with dedicated modeling systems written in Fortran and similar programming languages. These systems make no attempt to explicitly separate model and solver. We show how statements in the solver for functions such as reading external data, printing reports, displaying results on the screen, or controlling the flow of execution prevent the solver from adequately representing the model. We then discuss two levels of modeling systems that include packages such as SAS and SPSS and "executable modeling languages" such as GAMS and IFPS. We show that even in these cases the amount of information related to computer execution detracts considerably from a straightforward model specification.

Modeling with programming languages. Conventional programming languages make no distinction between the notions of model and solver. Consider a financial model to obtain the present value of a future payment, with the following algebraic formulation: $P = F/(1 + i)^n$, where P is the present value, F is the future payment in dollars, i is the annual interest rate (a percentage), and n is the number of years. Consider now the following Fortran program to describe and solve the model.

      DOUBLE PRECISION P, F, I, DENOM
      READ *, F, I, N
      DENOM = (1 + I)**N
      P = F / DENOM
      PRINT *, P
      END

In this example there is no separate representation of the model being manipulated. If the statement DENOM = (1 + I)**N is changed to DENOM = (1 + I/100.0)**N (to allow percentages to be typed as integers), the model representation has changed. Because there is no separate model representation, there also is no choice with respect to the solver, and even the uses of the model are limited. The solver can only compute the value of P given F, I and N. It is not possible to use this model representation to compute the value for I given values for P, F and N. This example shows no model/solver independence: none of the necessary conditions is satisfied.

Modeling with subroutines. Subprogram facilities may be used in conventional programming languages to isolate the model specification from other statements and provide representation independence. The following Fortran program uses a function subprogram to describe the present-value model while the main program deals with reading and printing data. The program also allows an arbitrary number of present-value problems to be solved.

      DOUBLE PRECISION P, F, I, PRESVAL
  100 READ (*, *, END = 900) F, I, N
      P = PRESVAL(F, I, N)
      PRINT *, P
      GO TO 100
  900 STOP
      END

      DOUBLE PRECISION FUNCTION PRESVAL(F, I, N)
      DOUBLE PRECISION F, I
      PRESVAL = F / (1 + I)**N
      RETURN
      END

Although the model specification appears in a separate subroutine, there is still no model representation apart from the code of the subroutine. For simple models such as this one, the subroutine code may serve as the model representation. For models of realistic complexity, the code becomes so convoluted and difficult to follow that the model representation is lost. This is the case for subroutine libraries such as IMSL and IBM's OSL. Such libraries provide solvers for commonly-used models, while leaving the coding of input-output routines and other environmental procedures to the users. In practice, these libraries cannot be used as model representations at all and their vendors do not offer them as such.

Using subprograms to represent models also fails to provide purpose independence since the "direction of computing" is predefined. Consequently, the solver still can only compute the value of P given F, I and N. It is not yet possible to use the model to compute the value for I given P, F and N.
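For contrast, a purpose-independent formulation of the same present-value model can be sketched in Python. The sketch states the relationship as a residual that must equal zero, without naming any entity as input or output, and leaves the direction of computation to a generic solver. The names `residual` and `solve_for` are hypothetical, and bisection is used here only because it is short; it assumes the residual changes sign exactly once on the supplied bracket.

```python
def residual(v):
    """Present-value relationship written as P * (1 + i)**n - F = 0."""
    return v["P"] * (1.0 + v["i"]) ** v["n"] - v["F"]


def solve_for(unknown, known, lo, hi, tol=1e-10):
    """Generic solver: bisection on the residual for whichever entity is unknown.

    Assumes the residual changes sign exactly once on [lo, hi].
    """
    def g(value):
        return residual({**known, unknown: value})

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2.0


# Forward direction: P given F, i and n (i is entered as a fraction, not a percent).
p = solve_for("P", {"F": 121.0, "i": 0.10, "n": 2}, lo=0.0, hi=121.0)

# Reverse direction: i given P, F and n, using the same model representation.
i = solve_for("i", {"P": 100.0, "F": 121.0, "n": 2}, lo=0.0, hi=1.0)
```

The function `residual` is the only place where the model appears, and it is untouched when the unknown changes from P to i.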

High-level languages. High-level languages provide a higher degree of model/solver independence than conventional programming languages. These languages attempt to provide a model specification as simple as possible, and also to require minimal programming knowledge on the part of the user. Consider for example the regression model $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_4 X_4 + e_i$ and the SAS statements that compute the $\beta$ coefficients for the 'best' subset of the X's.

    DATA;
    INPUT Y X1 X2 X3 X4;
    CARDS;
    10 1 2 4 5 6 7 4 9 1 2 1 0 3 7 6 0 9 2

    PROC STEPWISE;
    MODEL Y = X1 X2 X3 X4 / FORWARD;

The forward specification indicates the variable selection method that will be used to obtain the 'best' subset of X's. In addition to forward, SAS provides backward, stepwise, maxr, and minr. Each variable selection method is a distinct solver.

Some separation between models and solvers is achieved in high-level languages. Unfortunately, solver choices are limited to the subroutines available in the system, typically only one for each possible case. Moreover, although the model specification appears as a separate statement, there is no model representation apart from the language statements (the SAS code in the example) that very often include input and output options that are not part of the model.

Modeling languages. Modeling languages, such as GAMS and IFPS, attempt to separate the model from the solver and thereby achieve some purpose independence, i.e., they do not predefine the direction of computation. Modeling languages differ from "high-level languages" in that they attempt to provide a complete description of the model rather than just simplify the programming task. In GAMS the user defines a model as a collection of "equations", with each equation defined on some "variables". To solve a model, the user explicitly states both the model and the "solution procedure" (i.e., the solver) to be used. For example, the following statements define a transportation model (variable and other definitions have been omitted for brevity).

    (variable, set, and parameter definitions)

    EQUATIONS
        COST        the objective function
        SUPPLY(I)   supply from plant I
        DEMAND(J)   demand at location J ;

    COST ..       Z =E= SUM((I,J), C(I,J)*X(I,J)) ;
    SUPPLY(I) ..  SUM(J, X(I,J)) =L= A(I) ;
    DEMAND(J) ..  SUM(I, X(I,J)) =G= B(J) ;

    MODEL TRANSPORT /COST, SUPPLY, DEMAND/ ;

To solve this model, the user issues the statement

    SOLVE transport USING lp MINIMIZING z;

where lp is the name of a solution procedure (i.e., the solver). GAMS provides lp for linear programming, nlp for nonlinear, mip for mixed integer, and rmip for relaxed mixed integer programming.

IFPS explicitly allows the use of a model for different purposes. The SOLVE command evaluates all functions and formulas for a given set of inputs. The WHAT-IF option of SOLVE allows temporary modification of formulas and variables in the model. The GOAL-SEEKING option allows the user to specify values for the 'output' variables; IFPS then determines the values of the input variables necessary to obtain those outputs.

The separation of model and solvers in modeling systems such as GAMS and IFPS still leaves the following problems unsolved:

(1) As a single modeling paradigm is assumed, extensions are often awkward. For example, GAMS assumes mathematical programming and IFPS assumes a spreadsheet format. The model representation uses a specialized language that, because of the single modeling paradigm, often includes assumptions, defaults, and terminology that may differ considerably from common usage in other paradigms.

(2) Only those solvers provided in the system or specially written for it are available. Thus, while the GAMS program may be augmented with user-written solvers (not a trivial task), it cannot easily utilize SAS or non-customized Fortran subroutines as additional solvers.

(3) In the interest of increased modeling power, modeling languages allow the mixing of data and model manipulation statements together with model definition statements. A consequence is that a 'model' in these languages usually includes not only the model itself but also statements to read data from external files, solve the model, manipulate the results provided by the solver, and display the outputs. Models begin to resemble conventional programs.

For these reasons, unless very strict programming (modeling) discipline is enforced, model representations defined in languages such as GAMS and IFPS should be considered computer programs rather than model specifications. This is perhaps the reason that GAMS uses the term "program" to refer to a complete set of statements, which include one or more "model" statements [4].

3. The DAMS data and model management system

The Data and Algebraic Management System, DAMS, is an integrated data and model management system that supports model/data and model/solver independence [18]. In particular, DAMS provides:

(1) mappings to support model/data and model/solver independence;

(2) multi-purpose use of models through a separate solver base and a modeling language that is independent of solvers;

(3) a manipulation language that adds model-specific operations to a generalized DBMS;

(4) an implementation architecture that allows compatibility with commercial database and modeling software.

3.1. The DAMS languages

At the heart of DAMS are the sublanguages shown in fig. 1. All sublanguages share a common syntax and a common programming interface. They can be mixed arbitrarily in an interactive session or invoked from a program by making calls to C subroutines.

[Figure: MDL (Model Definition Sublanguage), DSL (Model/Data Sublanguage), and MML (Model Manipulation Sublanguage).]

Fig. 1. The DAMS languages.

DAMS uses SQL and SQL/OBJ as its database languages. In DAMS, a database statement that is not a SQL/OBJ statement is, by default, a "host SQL" statement. Host SQL statements are not processed directly by DAMS but only passed on to the host DBMS. Outputs from the host are displayed by the DAMS interface. SQL/OBJ is a nested relation extension to SQL [15] that supports matrices, arrays, and nested relations. These extensions facilitate the manipulation of data such as the hierarchical dataset for steel mills in Fourer [7].

The modeling language, SM/DB (structured modeling/data base) [19], supports a modeling environment. Table 1 summarizes the SM/DB statements. The MDL sublanguage allows the definition of new models. DSL defines model instances and maps them to relational tables. DSL also facilitates the browsing and display of tables containing model data. MML provides commands to solve models. The DISPLAY statements are a subset of MML that provide access to model information without using SQL.

3.2. Synergy of integration

The integration of modeling and database languages provides DAMS with considerable synergy. One aspect of this synergy relates to two extreme approaches to modeling and application development, shown in table 2. The model-centric approach assumes that the model is the object of interest and that data exists only as it pertains to the model; the modeling language provides statements to manipulate the data. Two models share data only by replicating it. The data-centric approach takes the opposite view: Data is the object of interest and is shared by multiple models. No model 'owns' the data; models are treated as 'programs' that are external to the database. A data language exists rather than a modeling language.

Table 1
SM/DB statements

MDL statements          DSL statements          MML statements      DISPLAY statements
CREATE/DROP MODEL       CREATE/DROP INSTANCE    EVALUATE MODEL      DISPLAY GENUS
ALTER MODEL             ALTER INSTANCE          EVALUATE INSTANCE   DISPLAY MODEL
CREATE SOLVER REP       VALIDATE INSTANCE       EVALUATE GENUS      DISPLAY INSTANCE
CREATE/DROP EXECUTABLE  VALIDATE GENUS          SOLVE               DISPLAY STATUS
                        STORE GENUS

DAMS supports the data-centric and model-centric approaches plus any mixture of both. DAMS makes no explicit distinction between the two approaches, and allows them to be mixed at any time. A modeler interested in a single model will follow the model-centric approach, without concern for the structure of the database or even for the existence of a database system. DAMS will automatically generate the table formats and translate references to model variables to database elements.

The data-centric approach is most beneficial to users of multiple models based on largely the same data and/or models with large datasets that require extensive manipulation and reporting. The data-centric approach is supported by allowing relational tables to be created and manipulated independently of any model. The user must specify the mappings between model elements and relational tables.

3.3. The implementation environment of DAMS

DAMS is implemented using a relational database system (DBMS). All database query languages and interfaces (such as query by forms) are available to the user. The relational system is called the "host" system. DAMS is similar to GPLAN [1] in that both allow the host DBMS query language to be used to manipulate the data used in models. Fig. 2 shows the architecture of DAMS. In the current implementation the host is INGRES version 6.03 running on a VAX under VMS. DAMS is written in C and uses embedded SQL to access the host DBMS.

IDAMS, or interactive DAMS, is the component that deals with interactive users. Conceptually, its functions are very simple. It reads SM/DB and SQL statements from the user and sends them to the language processor. It then displays whatever output is received from DAMS. IDAMS only checks that the initial keywords (i.e., the name of the command) are correct and that the command is properly finished (semicolon and/or END clause). IDAMS maintains a buffer with the last command typed by the user. The command may be edited, printed, saved to a file, resubmitted, or erased. IDAMS also supports "immediate" statements, for functions such as creating a log file and browsing the output from a command.

Table 2
Model-centric and data-centric approaches to modeling

Model-centric
  Characteristics: A model is defined. Data for the model is collected and then stored. The model defines the format of the data.
  Manipulation: The database is accessed through the modeling language, or by importing and exporting it to an external database system.

Data-centric
  Characteristics: A database already exists, possibly shared by multiple users. A portion of the database will be used in a model. The database cannot be changed to accommodate the model.
  Manipulation: The modeling language is treated as an external program, in the same way as a COBOL program. The database cannot be guaranteed to conform to model specifications, e.g. inconsistencies may arise.

[Figure: the IDAMS interactive interface, the DAMS spreadsheet interface, and the DAMS application program interface; the DAMS language processor; the DAMS query processor; and the host DBMS (e.g., Ingres, Oracle, DB2).]

Fig. 2. The DAMS system.

There are two other interfaces. The DAMS application program interface (API) is a collection of subroutines written in C that can be called from any application program. The application program generates a command and sends it to the language processor. The syntax of commands is the same as for IDAMS but the user's program is responsible for all user interaction. The spreadsheet interface, which is under development, allows users to issue SM/DB statements and view model data from a spreadsheet.

The DAMS language processor receives statements from any of the interfaces and determines their processing. Database statements are sent to the host DBMS without further processing. SM/DB statements are sent to the DAMS query processor. Output from the host DBMS or the DAMS query processor is returned to the calling interface.
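The routing decision can be sketched in a few lines of Python. This is only an illustration: the paper does not describe how the language processor recognizes statements, so the keyword list below (taken from the SM/DB statement names in table 1, plus GENERATE) and the function name route are assumptions, and SQL/OBJ statements are ignored.

```python
# Leading keywords of SM/DB statements, following table 1 (assumed list).
SMDB_PREFIXES = [
    "CREATE MODEL", "DROP MODEL", "ALTER MODEL", "CREATE SOLVER",
    "CREATE EXECUTABLE", "DROP EXECUTABLE", "CREATE INSTANCE",
    "DROP INSTANCE", "ALTER INSTANCE", "VALIDATE", "STORE",
    "EVALUATE", "SOLVE", "DISPLAY", "GENERATE",
]


def route(statement):
    """Return the component that should process a statement."""
    # Normalize case and whitespace before comparing leading keywords.
    words = " ".join(statement.upper().split())
    for prefix in SMDB_PREFIXES:
        if words == prefix or words.startswith(prefix + " "):
            return "DAMS query processor"
    # Anything else is treated as host SQL and passed on unchanged.
    return "host DBMS"


print(route("SOLVE feedmix USING lpsolver MINIMIZING totcost VARIABLES q;"))
print(route("CREATE TABLE nutrients (nutrient CHAR(8));"))
```

Because only the leading keywords are inspected, CREATE MODEL and CREATE TABLE are routed differently even though both begin with CREATE.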

Communication between user programs and DAMS (as well as between DAMS modules) uses the DAMS communication area (DAMSCA). DAMSCA is similar to the SQLCA communication area: it stores execution codes and error messages. Currently, the SQLCA also stores the execution code for SM/DB commands, but it was decided to have a separate area for portability.

4. Modeling with DAMS and SM/DB

DAMS supports a number of modeling objects. An object in DAMS is a named entity with an existence of its own. Objects are created using a CREATE statement such as CREATE TABLE or CREATE MODEL. A DROP statement is used to delete objects, and an ALTER statement modifies object definitions. The supported objects are:

SQL objects                          Modeling objects
Tables                               Models
Views                                Model instances
Other as supported by the host       Solver representations
system (procedures, constraints,     Model/solver mappings
etc.)                                Executable models

4.1. Defining models in DAMS

Model definition in DAMS is based on structured modeling [10] and utilizes the same basic concepts. A model in DAMS corresponds to the notion of model schema in structured modeling. Models are defined using the CREATE MODEL statement. An example using the classical FEEDMIX model [8] is given in fig. 3. Models are defined by aggregating genera. There are five genus types: Primitive entities (PE), compound entities (CE), attributes (ATT or VA), functions (FUNC), and tests (TEST). Modules are syntactical groupings of genera and are optional. Genera may be grouped in modules in any desired fashion.

CREATE MODEL feedmix WITH CANONICAL INSTANCE
BEGIN
    MODULE nutrients
    BEGIN
        PE nutr CHAR(8);
        ATT min (nutr) REAL;
    END MODULE nutrients;

    MODULE materials
    BEGIN
        PE material CHAR(10);
        ATT ucost (material) REAL;
        ATT analysis (nutr, material) REAL;
    END MODULE materials;

    MODULE formulas
    BEGIN
        VA q (material) REAL;
        FUNC nlevel (nutr; analysis, q) := SUM(analysis * q);
        TEST tnlevel (nutr; nlevel, min) := nlevel >= min;
        FUNC totcost (; ucost, q) := SUM(ucost * q);
    END MODULE formulas;
END MODEL feedmix;

Fig. 3. CREATE MODEL for the FEEDMIX model.

Fig. 3 illustrates the major points of model definition in SM/DB. While SM/DB is not case sensitive, uppercase has been used to denote SM/DB keywords and lowercase for user-defined names. The first genus, NUTR, is a primitive entity. The second genus, MIN, is an attribute of NUTR and is said to be indexed by NUTR. The indexing genus is shown in parentheses. ANALYSIS is indexed by both the NUTR and MATERIAL primitive entities. An attribute, function, or test without an index specification may be instantiated by a single value.

Each function or test is defined with an algebraic expression following the := symbol. Tests are Boolean functions that return TRUE or FALSE. Functions and tests require the specification of the parameters used in the algebraic expression. Parameters are indicated in the parentheses after the indexing genera following a semicolon. The function NLEVEL requires the values of ANALYSIS and Q and is indexed by NUTR. TOTCOST is an unindexed function with parameters UCOST and Q.

4.2. Model instances and datasets

Model instances can be created at the time a model is defined or at any time after. They are given a name and are objects in their own right. A model instance in DAMS is formed by a pair (model, dataset). A dataset is a subset of the tables of a relational (or nested relational) database. Tables may be virtual relations (views) or base tables. Multiple instances for the same model can be created. Datasets are not necessarily disjoint. A database will usually store datasets for multiple models. A dataset is not an object in DAMS and thus it cannot be created or dropped in the same sense as a model or an instance. It is possible, however, for DAMS to automatically create and drop the tables used in a given instance.

Note that the term "model instance" refers only to an SM/DB object that associates a model and a dataset. There is no requirement that data actually exists in the dataset or that these data satisfy any constraints implied in the model (VALIDATE can be used to verify correctness of the instance). This usage of the term instance may seem strange. It is, however, consistent with the SQL usage, where CREATE TABLE defines a new empty table.

NUTRIENTS
NUTRIENT || MINIMUM | NLEVEL | TNLEVEL
Protein  || 16      | 15.00  | FALSE
Calcium  || 4       | 4.50   | TRUE

MATERIALS
MATERIAL || UCOST | QTY
standard || 1.20  | 2.00
additive || 3.00  | .50

ANALYSIS
NUTRIENT | MATERIAL || ANALYSIS
Protein  | standard || 4.00
Protein  | additive || 14.00
Calcium  | standard || 2.00
Calcium  | additive || 1.00

TOTCOST
TOTCOST
3.90

Fig. 4. The sample dataset for the FEEDMIX model.

The clause WITH CANONICAL INSTANCE in fig. 3 creates a canonical instance with the same name as the model. Canonical indicates that the dataset tables are created automatically. These tables are initially empty and the modeler must insert data using SQL and/or SM/DB commands. Tables and columns for a canonical instance are created according to an algorithm that essentially groups genera with the same indexes in the same table and uses the genus names as column names. Functions and tests are assigned to tables separate from attributes. Model/data mappings for a canonical instance are automatically generated. The canonical instance created in fig. 3 would have the following tables and columns (table names are actually prefixed to avoid duplication; the actual name for the table NUTR will be something like XX35NUTR):

Table       Columns

NUTR        NUTR, MIN
MATERIAL    MATERIAL, UCOST
ANALYSIS    NUTR, MATERIAL, ANALYSIS
Q           MATERIAL, Q
NLEVEL      NUTR, NLEVEL, TNLEVEL
TOTCOST     TOTCOST
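The grouping rule can be sketched in Python as follows. The paper gives only an informal description of the algorithm, so several points are assumptions inferred from the table listing above: a primitive entity is treated as indexing itself, variable attributes (VA) are kept apart from other attributes, a table takes the name of the first genus placed in it, and the prefixing of table names is omitted.

```python
# (genus name, genus type, indexing genera) for the FEEDMIX model of fig. 3.
GENERA = [
    ("nutr", "PE", ()),
    ("min", "ATT", ("nutr",)),
    ("material", "PE", ()),
    ("ucost", "ATT", ("material",)),
    ("analysis", "ATT", ("nutr", "material")),
    ("q", "VA", ("material",)),
    ("nlevel", "FUNC", ("nutr",)),
    ("tnlevel", "TEST", ("nutr",)),
    ("totcost", "FUNC", ()),
]

# Assumed categories: genera are grouped only within the same category.
CATEGORY = {"PE": "data", "CE": "data", "ATT": "data",
            "VA": "variable", "FUNC": "computed", "TEST": "computed"}


def canonical_tables(genera):
    """Group genera with the same category and index into one table."""
    tables = {}
    for name, kind, index in genera:
        # A primitive entity is its own key; other genera use their index.
        key_cols = (name,) if kind == "PE" else tuple(index)
        group = (CATEGORY[kind], key_cols)
        if group not in tables:
            tables[group] = {"name": name.upper(),
                             "columns": [c.upper() for c in key_cols]}
        if name.upper() not in tables[group]["columns"]:
            tables[group]["columns"].append(name.upper())
    return {t["name"]: t["columns"] for t in tables.values()}


for table, columns in canonical_tables(GENERA).items():
    print(table, columns)
```

Under these assumptions the sketch yields the six tables listed above, with NLEVEL and TNLEVEL sharing a table because both are computed genera indexed by NUTR.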

An instance can also be defined using a dataset created independently of the model. The model/data mapping must be explicitly given. Consider fig. 4 (double lines separate key and non-key attributes). This dataset differs from the canonical instance in several aspects. The names of columns and tables do not coincide with the corresponding genera. The tables Q and NLEVEL are no longer used. The SM/DB statement that creates an instance and the model/data mapping for the FEEDMIX model is:

CREATE INSTANCE sample_instance FOR MODEL feedmix
    MAP nutr TO nutrients (nutrient),
    MAP min (nutr) TO nutrients (minimum, nutrient),
    MAP material TO materials (material),
    MAP ucost (material) TO materials (ucost, material),
    MAP analysis (nutr, material) TO analysis (analysis, nutrient, material),
    MAP q (material) TO materials (qty, material),
    MAP nlevel (nutr) TO nutrients (nlevel, nutrient),
    MAP tnlevel (nutr) TO nutrients (tnlevel, nutrient),
    MAP totcost TO totcost (totcost);

4.3. Default instances and partial model/data mappings

One of the model instances is the default instance. This allows users to issue SM/DB statements such as SOLVE and DISPLAY (explained later) without explicitly indicating the instance. The canonical instance of fig. 3 becomes the default instance since it is the only instance defined for that model. Unless otherwise specified, the instance last created is the default instance.

Instances may be created in which the model/data mapping is only partially specified. For example, a dataset may be defined to store only the inputs to a linear programming solver without any table or column assigned to store solver outputs. This is a common situation when the solver's outputs are sent directly to a printer or to the screen but are not stored in the database. If some genera must be instantiated prior to their use with a specific solver, an error message will be issued when attempting to use this solver.

It is also possible to define a model/data mapping where some of the tables already exist and others must be automatically generated for this instance. Assume, for example, that there exists a dataset for NUTR, MIN, MATERIAL, UCOST and ANALYSIS. The following SM/DB command uses the REST AS CANONICAL clause to automatically create three tables, Q, NLEVEL, and TOTCOST, to store the genera not listed in a MAP clause.

CREATE INSTANCE mixed_instance FOR MODEL feedmix
    MAP nutr TO nutrients (nutrient),
    MAP min (nutr) TO nutrients (minimum, nutrient),
    MAP material TO materials (material),
    MAP ucost (material) TO materials (ucost, material),
    MAP analysis (nutr, material) TO analysis (analysis, nutrient, material),
    REST AS CANONICAL;


All instances are dropped when their base model is dropped. Tables for a canonical instance are dropped when the base model is dropped, as are tables created using the REST AS CANONICAL clause. Tables referenced in a MAP clause are not dropped automatically.

4.4. Supporting the model-centric approach

The model-centric approach is supported in DAMS by essentially 'hiding' the database manipulation statements. In a pure model-centric approach, the modeler defines a model and creates a single canonical instance using CREATE MODEL. Data are inserted and manipulated in the instance using SM/DB commands. No SQL commands are needed. Since a canonical instance has its dataset and model/data mapping automatically created, there is no need for the modeler to explicitly refer to any relational table. In addition, for each session, DAMS maintains a model as the 'current' model and interprets unqualified references to a genus as pertaining to the current model.

The DISPLAY GENUS statement is used to retrieve model data. It is similar to the SELECT statement in SQL. Genus names are used, instead of column and table names, to provide a model-centric view. Thus, to retrieve the names of all nutrients in the FEEDMIX model, the statement DISPLAY nutr is used. NUTR is assumed to be a genus of the current model and its default instance. SM/DB internally translates DISPLAY GENUS statements to SELECT statements. The STORE statement plays a similar role for insertion of values for a genus. It is translated to a SQL INSERT statement.
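A minimal Python sketch of this translation is given below, using the standard sqlite3 module in place of the host DBMS. The layout of the mapping dictionary, the function names, and the generated SQL text are assumptions for illustration; the table and column names follow the model/data mapping of sample_instance, and the paper does not show the SQL that DAMS actually generates.

```python
import sqlite3

# genus -> (table, value column, key columns), following sample_instance.
MAPPING = {
    "nutr":     ("nutrients", "nutrient", ()),
    "min":      ("nutrients", "minimum", ("nutrient",)),
    "material": ("materials", "material", ()),
    "ucost":    ("materials", "ucost", ("material",)),
    "q":        ("materials", "qty", ("material",)),
}


def display_genus_sql(genus):
    """Translate DISPLAY <genus> into a SELECT on the mapped table."""
    table, value_col, key_cols = MAPPING[genus]
    columns = ", ".join(key_cols + (value_col,))
    return f"SELECT {columns} FROM {table}"


def store_genus_sql(genus):
    """Translate STORE <genus> into a parameterized INSERT."""
    table, value_col, key_cols = MAPPING[genus]
    columns = key_cols + (value_col,)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nutrients (nutrient TEXT, minimum REAL)")
# STORE nutr: insert the values of the primitive entity.
conn.executemany(store_genus_sql("nutr"), [("Protein",), ("Calcium",)])
# DISPLAY nutr: retrieve them without naming any table or column.
names = [row[0] for row in conn.execute(display_genus_sql("nutr"))]
print(display_genus_sql("min"))
print(names)
```

The modeler refers only to genus names; the mapping supplies the table and column names, which is what keeps the model independent of the dataset layout.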

Other forms of the DISPLAY statement exist. DISPLAY MODEL displays general information about a model, including the names of its instances. DISPLAY DESCRIPTION lists the formulation of the model. DISPLAY INSTANCE lists model/data mappings.

4.5. Automatic generation of data for model instances

SM/DB provides the GENERATE GENUS statement to automatically generate data taking advantage of information in the model definition. It is useful in simulations and to avoid the typing of values that can be computed, as in the following examples:

(1) for a primitive entity YEARS to take the values 1983, 1984, ..., 1992, use:
    GENERATE GENUS years USING FORMULA 1983 <= years <= 1992;

(2) to generate all possible values for a compound entity such as LINKS in a transportation model, use:
    GENERATE GENUS links USING DISPLAY links FROM plants, markets;

(3) to randomly generate values for the genus SALES using a normal distribution with mean 2500 and a variance of 400, use:
    GENERATE GENUS sales USING FORMULA normal(2500, 400);
As many values will be generated as there are values for the indexing genus.
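The three cases can be mimicked in Python as a rough illustration of what each statement produces. The function names, the plant and market names, and the choice of YEARS as the indexing genus of SALES are hypothetical. Note that Python's random.gauss takes a standard deviation, so the variance of 400 is converted to a standard deviation of 20.

```python
import random
from itertools import product


def generate_range(first, last):
    """Case (1): consecutive integer values, both bounds included."""
    return list(range(first, last + 1))


def generate_links(plants, markets):
    """Case (2): every (plant, market) pair, i.e. the Cartesian product."""
    return list(product(plants, markets))


def generate_normal(index_values, mean, variance, seed=None):
    """Case (3): one normally distributed value per value of the index."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    return {key: rng.gauss(mean, sigma) for key in index_values}


years = generate_range(1983, 1992)
links = generate_links(["plant_a", "plant_b"], ["east", "west", "south"])
sales = generate_normal(years, 2500, 400, seed=1)
print(years)
print(links)
print(len(sales))
```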

4.6. Model solving and evaluation

DAMS provides two fundamental operations on models: Evaluation and solving. The EVALUATE command provides a straight computation of functions and tests. The SOLVE statement invokes a 'solver' to manipulate a model instance. SOLVE is discussed in the next subsection.

The first form of EVALUATE deals with an entire model instance. When EVALUATE MODEL model-name or EVALUATE INSTANCE instance-name are used, all functions and tests in that instance are evaluated. The default instance is evaluated when EVALUATE MODEL is used. The statement EVALUATE MODEL feedmix would use the canonical instance defined in fig. 3 and compute values for NLEVEL, TNLEVEL, and TOTCOST. Note that Q must have been previously instantiated in order to compute NLEVEL.
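As an illustration of evaluating the three computed genera with SQL, the following Python sketch uses the standard sqlite3 module and the table layout of the sample dataset of fig. 4 (not the canonical instance). The SQL statements are an assumption about what such a translation could look like; they are not the statements DAMS generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nutrients (nutrient TEXT, minimum REAL, nlevel REAL, tnlevel INTEGER);
CREATE TABLE materials (material TEXT, ucost REAL, qty REAL);
CREATE TABLE analysis  (nutrient TEXT, material TEXT, analysis REAL);
CREATE TABLE totcost   (totcost REAL);
INSERT INTO nutrients (nutrient, minimum) VALUES ('Protein', 16), ('Calcium', 4);
INSERT INTO materials VALUES ('standard', 1.20, 2.00), ('additive', 3.00, 0.50);
INSERT INTO analysis VALUES ('Protein', 'standard', 4), ('Protein', 'additive', 14),
                            ('Calcium', 'standard', 2), ('Calcium', 'additive', 1);
""")


def evaluate_model(conn):
    """Compute NLEVEL, TNLEVEL and TOTCOST and store them in the dataset."""
    # nlevel (nutr; analysis, q) := SUM(analysis * q)
    conn.execute("""
        UPDATE nutrients SET nlevel = (
            SELECT SUM(a.analysis * m.qty)
            FROM analysis a JOIN materials m ON m.material = a.material
            WHERE a.nutrient = nutrients.nutrient)""")
    # tnlevel (nutr; nlevel, min) := nlevel >= min  (stored as 0 or 1)
    conn.execute("UPDATE nutrients SET tnlevel = (nlevel >= minimum)")
    # totcost (; ucost, q) := SUM(ucost * q), replacing any existing value
    conn.execute("DELETE FROM totcost")
    conn.execute("INSERT INTO totcost SELECT SUM(ucost * qty) FROM materials")


evaluate_model(conn)
for row in conn.execute("SELECT nutrient, nlevel, tnlevel FROM nutrients"):
    print(row)
print(conn.execute("SELECT totcost FROM totcost").fetchone())
```

The order of the statements matters: TNLEVEL is computed from the NLEVEL values stored by the first update, which is why Q must already be instantiated.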

The second form, EVALUATE GENUS, computes an individual function or test genus. It optionally computes values for all genera required as parameters. EVALUATE returns an error if any required genus is incorrectly instantiated; no tables are updated. Such an error occurs when the number of values for an attribute is less than the number of values in its index set. The form of the statement is

EVALUATE GENUS genus-name [FOR MODEL/INSTANCE name]
    [INSTANTIATE PREVIOUS | WITHOUT INSTANTIATING PREVIOUS];


INSTANTIATE PREVIOUS, the default, has the effect of storing the values for all genera required to compute this genus. If EVALUATE GENUS tnlevel FOR feedmix is used, the values of TNLEVEL and NLEVEL are computed and stored in the database, replacing any existing values. If EVALUATE GENUS tnlevel FOR feedmix WITHOUT INSTANTIATING PREVIOUS is used, values for TNLEVEL are computed using whatever values already exist for NLEVEL.
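The difference between the two options can be made concrete with a small Python sketch. Everything here is an illustrative assumption: genera are held in a dictionary rather than in database tables, the function names are hypothetical, and the stored NLEVEL values are deliberately made stale so that the two options give different answers.

```python
def sample_store():
    """FEEDMIX data with deliberately stale NLEVEL values."""
    return {
        "analysis": {("Protein", "standard"): 4.0, ("Protein", "additive"): 14.0,
                     ("Calcium", "standard"): 2.0, ("Calcium", "additive"): 1.0},
        "q": {"standard": 2.0, "additive": 0.5},
        "min": {"Protein": 16.0, "Calcium": 4.0},
        "nlevel": {"Protein": 20.0, "Calcium": 1.0},  # stale values
    }


def nlevel(store):
    # nlevel (nutr; analysis, q) := SUM(analysis * q)
    return {n: sum(v * store["q"][m]
                   for (n2, m), v in store["analysis"].items() if n2 == n)
            for n in store["min"]}


def tnlevel(store):
    # tnlevel (nutr; nlevel, min) := nlevel >= min
    return {n: store["nlevel"][n] >= store["min"][n] for n in store["min"]}


# genus -> (parameters, formula); only functions and tests appear here.
FORMULAS = {"nlevel": (("analysis", "q"), nlevel),
            "tnlevel": (("nlevel", "min"), tnlevel)}


def evaluate_genus(name, store, instantiate_previous=True):
    """Compute one genus, optionally recomputing its computed parameters."""
    params, formula = FORMULAS[name]
    if instantiate_previous:
        for p in params:
            if p in FORMULAS:
                evaluate_genus(p, store, True)
    store[name] = formula(store)
    return store[name]


fresh = sample_store()
print(evaluate_genus("tnlevel", fresh))                              # recomputes NLEVEL
stale = sample_store()
print(evaluate_genus("tnlevel", stale, instantiate_previous=False))  # uses stored NLEVEL
```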

4.7. The SOLVE statement

SOLVE is the most powerful statement in processing a model. It invokes a program (the solver) to manipulate a model instance. The format is the following.

SOLVE model-spec [USING solver-spec]
    [MINIMIZING | MAXIMIZING genus-name]
    VARIABLES genus-name [, genus-name...]
    OMIT genus-name [, genus-name...]
    RESTRICT GENUS genus-name TO
        INTEGER [BETWEEN min AND max] | BINARY | POSITIVE [BETWEEN min AND max] | REAL

model-spec may be the name of a model instance, a model (the default instance will be solved), or an executable model (defined later). Most clauses are optional, such as MINIMIZING/MAXIMIZING. The genus to be minimized or maximized must be a non-indexed function genus. OMIT indicates genera to be discarded. The RESTRICT clause provides additional information to the solver such as bounds for a variable. It may also relax a constraint, treating an attribute as real instead of integer. RESTRICT and OMIT have no effect beyond the particular SOLVE execution. The following uses a linear programming solver on the FEEDMIX model.

SOLVE feedmix USING lpsolver MINIMIZING totcost VARIABLES q;
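To make the effect of this statement concrete, the following Python sketch solves the FEEDMIX linear program for the data of fig. 4. It is not how lpsolver works: since the model has only two variables, the sketch uses brute-force vertex enumeration (intersecting constraint boundaries pairwise and keeping the cheapest feasible point), which is a stand-in technique chosen to keep the example self-contained. Non-negativity of Q is assumed, and all function and variable names are hypothetical.

```python
from itertools import combinations

UCOST = {"standard": 1.20, "additive": 3.00}
MIN = {"Protein": 16.0, "Calcium": 4.0}
ANALYSIS = {("Protein", "standard"): 4.0, ("Protein", "additive"): 14.0,
            ("Calcium", "standard"): 2.0, ("Calcium", "additive"): 1.0}
MATERIALS = ["standard", "additive"]


def solve_min_2var(cost, constraints, eps=1e-9):
    """Minimize cost.x subject to a.x >= b for every (a, b), two variables.

    Enumerates the intersection points of all pairs of constraint
    boundaries and returns (value, point) for the cheapest feasible one.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel boundaries do not meet in a single point
        # Cramer's rule for the 2x2 system a1.x = b1, a2.x = b2.
        x = ((b1 * a2[1] - a1[1] * b2) / det,
             (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] >= b - eps for a, b in constraints):
            value = cost[0] * x[0] + cost[1] * x[1]
            if best is None or value < best[0]:
                best = (value, x)
    return best


cost = [UCOST[m] for m in MATERIALS]
# One constraint per nutrient: SUM(analysis * q) >= min.
constraints = [([ANALYSIS[(n, m)] for m in MATERIALS], MIN[n]) for n in MIN]
# Assumed non-negativity of the quantities q.
constraints += [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
totcost, q = solve_min_2var(cost, constraints)
print(totcost, dict(zip(MATERIALS, q)))
```

The minimizing point lies where both nutrient constraints are tight, so it differs from the quantities stored in fig. 4, for which the protein test is FALSE.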

4.8. Solvers and executable models

DAMS considers two types of solvers: internal and external. An internal solver is a program written specifically to interface with DAMS. The mechanism is similar to GAMS. The above example uses "lpsolver" as an internal solver. A limited number of internal solvers is provided for the more common cases, such as linear programming. An external solver is an arbitrary computer program that can be automatically run and interfaced using files or the host database system. A solver representation defines the communication with the solver via its inputs and outputs. Model/solver mappings relate a model to an external solver. This architecture allows DAMS to be extended with arbitrary solvers without extensive recoding since only the mapping needs to be defined. It also allows the use of solvers that cannot be easily modified, either because of proprietary restrictions for commercial software, or because of technical reasons.

A realized model is formed by a model instance and a solver, i.e., it is the triple (model, dataset, solver). A realized model that is "ready to run" is called an executable model and includes model/data and model/solver mappings, as shown in fig. 5.

[Figure: DATA, MODEL, and SOLVER combined into an EXECUTABLE MODEL.]

Fig. 5. The composition of an executable model.

End users need not be aware of all these components. The model and the executable model can be prepared by an expert, while data preparation, model execution, and analysis of the model can be carried out by the end user. The distinction between executable models and internal and external solvers is transparent to the user. The SOLVE statement allows the name of an executable model to be used in the model-spec clause. This DAMS structure is not unlike that of a Fortran program using embedded SQL to access a database, where the user does not need to be aware that the application program and the DBMS are separate entities.

Each solver is given a representation that defines the types of models that can be manipulated. The representation includes a world view (entities manipulated by the solver), pre- and post-conditions that are true before and after execution of the solver, and a file interface. A discussion of world view and pre- and post-conditions is given in Eck et al. [5].

The file interface defines the files through which the solver receives its inputs and displays its outputs. Most solvers use a file structure that is more complex than a relational database and cannot be described in terms of just tables and columns. In DAMS, a file interface is as follows. Inputs and outputs are defined as separate collections of files. Files are not allowed to be used for both input and output. Files are 'logical' and may be assigned to a physical device such as keyboard, screen display, or disk. For each file, a file descriptor describes parameters such as name, device, type (sequential, host database table, ISAM), and record length (fixed, variable, number of bytes).

The file header is an optional series of constant records that allows the inclusion of statements required by operating systems and solvers, such as the statement // EXEC MPSX for the MVS operating system. For SAS, the file header could include the OPTIONS statement to define the width of printer reports. The file footer is similar to the file header: it is also a series of constant records.

Between header and footer, there are one or more segments. Records in a segment all have the same general format (or "scheme" in the relational terminology). Like the file, a segment has a descriptor, headers, and footers. The segment descriptor defines the format of records in the segment. A record is composed of fields. All records in a segment have the same fields. Fields can be constants or variables. A constant field has a fixed value defined in the segment record descriptor. A variable field takes its value from an external file or is assigned a value by the solver.

The following example illustrates the use of external solvers and model/solver mappings. Consider a Fortran program that minimizes a linear program with ≥ constraints and produces only printed output. The program reads a file with array variables C (cost coefficients) and B (available resources), and a matrix A (substitution coefficients). All records have a fixed format of 80 characters. Records containing values for C and B are identified by the constants 'C' and 'RHS' respectively. The program is run on the MVS system. This is a somewhat simplified situation, but it illustrates the major points and limits the amount of code in the example. More complex situations such as equality and ≤ constraints can be handled through the mappings or additional clauses. The (partial) solver representation is

CREATE SOLVER REP lpfortran SYSTEM MVS;
BEGIN
    WORLD VIEW
    BEGIN
        PE i;
        PE j;
        ATT c(j);
        ATT b(i);
        ATT a(i, j);
        ATT x(j);
    END WORLD VIEW;

    FILE DESCRIPTOR onlyfile INPUT RECORD SIZE 80
    BEGIN
        FILE HEADER '//DAMS JOB';
        FILE HEADER '//EXEC LPSOLVER';
        FILE HEADER '//SYSIN DD*';
        SEGMENT onlyfile
        BEGIN
            RECORD costs
            BEGIN
                CONSTANT 'C';
                ARRAY c;
            END RECORD costs;

            RECORD righthand
            BEGIN
                CONSTANT 'RHS';
                ARRAY b;
            END RECORD righthand;

            RECORD coeffs
            BEGIN
                MATRIX a;
            END RECORD coeffs;
        END SEGMENT;
        FILE FOOTER '//';
    END FILE DESCRIPTOR;
END;


The world view defines the entities and attributes that the solver manipulates and that can be mapped to a model. The SYSTEM MVS clause specifies the computer on which the solver is run. Currently, DAMS allows solvers to reside on the VAX and on an IBM 3090 running MVS. VAX solvers are executed by 'escaping' to a shell, and MVS jobs are sent over a network to the IBM 3090. In the example, the file header contains the JCL statements to execute an MVS job. The file interface has three segments, one for each record type.

File interfaces and solver representations cannot handle all possible cases. For example, it is not possible to specify the inputs for LINDO using the free-form algebraic style. It is possible, however, to specify a large subset of the MPSX format as a file interface. Solvers with multiple options are better (sometimes only) described through multiple solver representations, one for each related group of options.

To create a model/solver mapping for the FEEDMIX model and the LPFORTRAN solver representation, the following code can be used. Note that the mapping of Q to X is not necessary since this particular solver produces only printed output and does not store any results in the database.

CREATE SOLVER MAPPING feedsolvermap FOR MODEL feedmix AND SOLVER lpfortran
    MAP nutr TO j;
    MAP material TO i;
    MAP ucost TO c;
    MAP analysis (nutr, material) TO a (j, i);
    MAP q TO x;

To create an executable model that solves the SAMPLE_INSTANCE of the FEEDMIX model using LPSOLVER:

CREATE EXECUTABLE MODEL feedexec USING INSTANCE sample_instance AND SOLVER MAPPING feedsolvermap;

The statement SOLVE feedexec is now all that is needed to solve the FEEDMIX model. In this particular case, DAMS will 'download' the dataset to a single VAX file with the format specified in the solver representation (lpfortran). The first records in the file would be the constants in the file headers and the last record would be as specified in the file footer. This file will then be sent over the network for execution on the IBM system. The output will be returned as a text file to the user.
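A rough Python sketch of such a download step is shown below. The solver representation fixes only the record size, the header and footer constants, and the record types; the layout of fields within a record (blank-separated values, padded with blanks to 80 characters), the exact spacing of the header constants, and the assignment of FEEDMIX data to the arrays c, b, and a are assumptions made for illustration.

```python
RECORD_SIZE = 80

# Header and footer constants of the solver representation (spacing assumed).
FILE_HEADERS = ["//DAMS JOB", "//EXEC LPSOLVER", "//SYSIN DD*"]
FILE_FOOTER = "//"


def record(*fields):
    """Join fields with blanks and pad to the fixed record length."""
    text = " ".join(str(f) for f in fields)
    if len(text) > RECORD_SIZE:
        raise ValueError("record exceeds %d characters" % RECORD_SIZE)
    return text.ljust(RECORD_SIZE)


def build_input_file(c, b, a):
    """Return the records of the solver input file, in order."""
    lines = [record(h) for h in FILE_HEADERS]
    lines.append(record("C", *c))        # RECORD costs: constant 'C' + array c
    lines.append(record("RHS", *b))      # RECORD righthand: constant 'RHS' + array b
    lines.extend(record(*row) for row in a)  # RECORD coeffs: one matrix row each
    lines.append(record(FILE_FOOTER))
    return lines


# Hypothetical assignment of FEEDMIX data: c = ucost, b = min, a = analysis.
lines = build_input_file([1.2, 3.0], [16, 4], [[4, 14], [2, 1]])
for line in lines:
    print(line.rstrip())
```

Because the file is assembled entirely from the solver representation and the dataset, the Fortran program itself needs no modification to be used with DAMS.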

5. SM/DB and SML

SM/DB shares with SML [8,11] the modeling concepts of structured modeling, and in fact the initial development of DAMS used SML as the model definition language. This section briefly describes the differences between the two languages.

First, SML is a model definition language. SM/DB adds facilities to manipulate models (SOLVE, EVALUATE, DISPLAY), datasets (DSL statements), model definitions (DROP and ALTER MODEL), and other modeling objects. SML allows implementors to define their own interfaces and commands. In FW/SM [12], the facilities of the Framework III system are used to provide user interface and database manipulation.

SML and SM/DB view models and databases in a different way. In SML, a model is composed of a model schema and a set of elemental detail tables (EDTs). A model schema corresponds to a 'model' in SM/DB. The EDTs store the data that instantiates the model schema. To create two model instances for the same model, in SML one must (at least conceptually) create two model schemas. EDTs differ from SM/DB tables in that EDTs obey a strict set of rules defining their format. EDTs are normalized to Boyce-Codd normal form [17] and are named after the genera they instantiate. EDT data is 'typed' (integer, real, etc.) according to the model schema. Moreover, data in an EDT is assumed to be in order. SM/DB tables are strictly relational (or nested relational); they assume no order and impose no constraint on naming. SML assumes that every genus has a corresponding table and column in an EDT while SM/DB allows genera that are not mapped to a relational table.

Model definition also differs in some respects. SM/DB allows single-instance attributes to be defined without an index set. This is useful for constants such as π, or a fixed "rate of return". For syntactical convenience, SM/DB allows initial values to be given in the model definition, although they are not considered part of the model definition.

Indexing and subscripting are treated differently in SM/DB. In SML, it is possible to indicate that, for example, a compound entity includes precisely the Cartesian product of two primitive entities. It is also possible to indicate the rules to form a subset of the Cartesian product. These two cases are treated identically in SM/DB, and it is left to the actual instance to define whether a subset is used and, if so, which one. In return for this reduced expressive power, SM/DB has simpler statements. In SM/DB the formulas for functions and tests are defined without subscripting, using set-oriented statements similar in concept to relational algebra.
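A small sketch may make the point about compound entities concrete. The tables `plant`, `cust`, and `link` and the helper `load_links` are hypothetical, and SQLite is used only as a convenient stand-in for an SQL host. The schema says only that a link is a (plant, customer) pair; whether the instance holds the full Cartesian product or a subset is decided by the rows that are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plant (plant TEXT PRIMARY KEY);
    CREATE TABLE cust  (cust  TEXT PRIMARY KEY);
    -- Compound entity: the definition does not say which pairs exist.
    CREATE TABLE link  (plant TEXT, cust TEXT, PRIMARY KEY (plant, cust));
    INSERT INTO plant VALUES ('P1'), ('P2');
    INSERT INTO cust  VALUES ('C1'), ('C2'), ('C3');
""")

def load_links(conn, pairs):
    """Replace the link instance with the given pairs; return its size."""
    conn.execute("DELETE FROM link")
    conn.executemany("INSERT INTO link VALUES (?, ?)", pairs)
    return conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]

# Instance A: every plant serves every customer (full Cartesian product).
all_pairs = conn.execute(
    "SELECT plant, cust FROM plant CROSS JOIN cust"
).fetchall()
n_full = load_links(conn, all_pairs)

# Instance B: same definition, but only a subset of the pairs is present.
n_sparse = load_links(conn, [("P1", "C1"), ("P2", "C3")])
```

Any set-oriented formula written against `link` applies unchanged to both instances, which is the trade-off described above: the definition cannot state the subset rule, but the statements stay simple.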

6. Summary and conclusions

We have presented an overview of the DAMS data and model management system. DAMS attempts to support complex decision making in an organizational environment by providing the ability to share components among users, and to integrate simpler components to form larger, more complex systems. The framework for DAMS is based on model/data and model/solver independence, and for this reason, we have defined the conditions for model representations that make possible the reuse and combination of models, databases, and programs. These conditions were illustrated using algebraic notation and modeling languages. To support model independence in the DSS architecture, DAMS incorporates mappings and solver bases. Solver bases contain general purpose programs rather than just the small collection of specialized solvers found in most DSSs. Mappings relate the now independent components of a DSS, allowing them to be combined and reused as required by the decision situation at hand.

A prototype version of DAMS is implemented. DAMS is a collection of C programs making SQL calls to the multi-user INGRES database system. Interactive and API interfaces are operational. Model evaluation is done using INGRES: function and test formulas are translated into SQL statements. An internal solver provides optimization for a restricted class of models by generating calls to SAS/OR. External solvers are also supported for a limited class of models. The current version is uneven in its support for features and user-friendliness and has the following restrictions. A genus is restricted to a maximum of four indexing genera. Test values are stored as 0/1 values and formulas must be stated as arithmetic expressions instead of logical expressions. The processing of solver representations and model/solver mappings is not yet integrated into the SM/DB language processor; separate programs must be run, or the representations and mappings must be entered manually into the model and solver bases.
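As an illustration of evaluating function and test formulas in SQL, the following sketch computes a nutrient-level function and a 0/1 test for a small feed-mix style dataset. It is an assumption-laden example, not the SQL that DAMS generates: the table and column names (`nutr`, `material`, `analysis`, `min_req`, `amount`) and the data are invented here, SQLite replaces INGRES, and the `CASE` expression is just one portable way of producing a 0/1 test value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nutr     (nutr TEXT PRIMARY KEY, min_req REAL);
    CREATE TABLE material (material TEXT PRIMARY KEY, q REAL);
    CREATE TABLE analysis (nutr TEXT, material TEXT, amount REAL);
    INSERT INTO nutr     VALUES ('protein', 16), ('calcium', 4);
    INSERT INTO material VALUES ('std', 2), ('add', 0.5);
    INSERT INTO analysis VALUES
        ('protein', 'std', 4), ('protein', 'add', 14),
        ('calcium', 'std', 2), ('calcium', 'add', 1);
""")

# Function: nlevel = sum over materials of amount * q, per nutrient.
# Test: 1 if nlevel meets the minimum requirement, 0 otherwise.
rows = conn.execute("""
    SELECT n.nutr,
           SUM(a.amount * m.q) AS nlevel,
           CASE WHEN SUM(a.amount * m.q) >= n.min_req
                THEN 1 ELSE 0 END AS t_nlevel
    FROM nutr n
    JOIN analysis a ON a.nutr = n.nutr
    JOIN material m ON m.material = a.material
    GROUP BY n.nutr, n.min_req
    ORDER BY n.nutr
""").fetchall()

for nutrient, level, test in rows:
    print(nutrient, level, test)
```

The formula is expressed over whole tables with a join and a grouped sum, with no explicit subscripts, and the test result is an ordinary numeric column that can be stored like any other attribute.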

The development of DAMS has been a rewarding experience for us and our students and has provided us with multiple lessons. Using the host DBMS as both a data manager and an implementation vehicle has removed much of the drudgery, allowing us to test many ideas without writing code at all. For instance, we were able to test the model/data and model/solver mappings using only the SQL language, before writing the generalized versions in C. The multi-user environment facilitated group development of software.

The synergy of integration of model and data management in DAMS goes beyond the conceptual level. All the resources of the host database system are available and it is not necessary to switch from the DAMS to the host environment to access them. The host database system is a multi-user system widely used in industry on a range of computers. This allows DAMS to be used with existing databases that are simultaneously being used for conventional applications and supports a data-centric approach to modeling. It must be noted that DAMS is not dependent on any specific features of the current host. INGRES can be replaced by ORACLE, DB2, or any other DBMS that supports SQL without affecting the modeling language and functions.

An initial attempt was made to use SML as the modeling language. Our approach was the translation of SML to an intermediate form that could be easily compiled and executed. Our limited experience in compiler writing and the comprehensiveness of SML made the translator too difficult. The current CREATE MODEL statement is the result of our efforts to define a subset of SML that would be easier to implement and still retain most of the expressive power.

The concept of a workspace seems necessary to facilitate dynamic modification of models and datasets. Consider, for example, the addition of a function to compute the sum of Q in the FEEDMIX model. Currently this will cause modification of all canonical instance definitions and mappings, including the creation of new tables. These are expensive operations that may be unnecessary for a one-time computation. It seems preferable to use an interpretive approach during a session and only commit the results to the databases at the end of the session.
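The workspace idea can be sketched as follows. This is a hypothetical design, not part of DAMS: the `Workspace` class, its methods, and the `material` table are assumptions made for illustration, with SQLite standing in for the host database. A function added during a session is kept as a pending definition and evaluated on demand; tables are created only if the user commits at the end of the session.

```python
import sqlite3

class Workspace:
    """Hypothetical session workspace: definitions are interpreted on
    demand and persisted only when commit() is called."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # function name -> SELECT statement text

    def define(self, name, select_sql):
        # Record the definition; nothing in the database changes yet.
        self.pending[name] = select_sql

    def evaluate(self, name):
        # Interpretive evaluation: run the query, store nothing.
        return self.conn.execute(self.pending[name]).fetchone()[0]

    def commit(self):
        # Only now pay for the expensive step of creating tables.
        for name, select_sql in self.pending.items():
            self.conn.execute(f"CREATE TABLE {name} AS {select_sql}")
        self.conn.commit()
        self.pending.clear()

def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE material (material TEXT PRIMARY KEY, q REAL);
    INSERT INTO material VALUES ('std', 2), ('add', 0.5);
""")

ws = Workspace(conn)
ws.define("totq", "SELECT SUM(q) AS totq FROM material")
total_q = ws.evaluate("totq")          # one-time computation
tables_before = table_names(conn)      # schema is still untouched
ws.commit()
tables_after = table_names(conn)       # result table exists after commit
```

For a one-time computation the session would simply end without calling `commit()`, leaving instance definitions, mappings, and tables unchanged.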

A second version of DAMS is planned using Microsoft Windows and ORACLE Version 6. Our intent is to have a better development environment and to provide modelers with 'live' links between applications. Windows provides these facilities and it is relatively easy to interface spreadsheet, database, and graphical packages. We decided against using a UNIX workstation environment because of the high cost of software and the more complicated procedures to obtain and install network versions of software.

References

[1] R.H. Bonczek, C.W. Holsapple and A.B. Whinston, Mathematical Programming Within the Context of a Generalized Data Base Management System, R.A.I.R.O. Recherche Operationnelle/Operations Research 12, No. 2 (July 1978).

[2] R.H. Bonczek, C.W. Holsapple and A.B. Whinston, The Evolving Roles of Models in Decision Support Systems, Decision Sciences 11 (1980) 337-356.

[3] G.H. Bradley and R.D. Clemence, A Type Calculus for Executable Modeling Languages, IMA Journal of Mathematics in Management 3 (1988).

[4] A. Brooke, D. Kendrick and A. Meeraus, GAMS: A User's Guide (The Scientific Press, 1988).

[5] R. Eck, A. Philippakis and R.G. Ramirez, Solver Representation using Structured Modeling, Proceedings of the IEEE International Hawaii Systems Conference (HICSS-23) January 1990.

[6] R. Fourer, Modeling Languages Versus Matrix Generators for Linear Programming, ACM Transactions on Mathematical Software 9, No. 2 (June 1983) 143-183.

[7] R. Fourer, Database Structures for a Class of Mathematical Programming Models, Proceedings of the Twenty-Fourth Annual Hawaii International Conference on System Sciences, Vol. III (January 1991) 306-316.

[8] A.M. Geoffrion, An Introduction to Structured Modeling, Management Science 33, No. 5 (May 1987).

[9] A.M. Geoffrion, Indexing In Mathematical Programming Languages, Working Paper No. 371, Western Management Science Institute, UCLA, 1989.

[10] A.M. Geoffrion, The Formal Aspects of Structured Modeling, Operations Research 37, No. 1 (January-February 1989) 30-51.

[11] A.M. Geoffrion, SML: A Model Definition Language for Structured Modeling, Working Paper No. 360, Western Management Science Institute, UCLA, August 1990.

[12] A.M. Geoffrion, S. Maturana, L. Neustadter, Y. Tsai and F. Vicuña, User Documentation for FW/SM, John E. Anderson Graduate School of Management, UCLA, June 1990.

[13] C.W. Holsapple and A.B. Whinston, Model Management Issues and Directions, Paper No. 7, 1988, Kentucky Institute for Knowledge Management.

[14] T.P. Liang, Integrating Model Management with Data Management in Decision Support Systems, Decision Support Systems 1 (1985) 221-232.

[15] K.A. Moser, R.G. Ramirez and R.D. St. Louis, Complex Object Databases for Model Management Systems, Proceedings of the 23rd International Hawaii Systems Science Conference (HICSS-23), Hawaii, January 3-7, 1990.

[16] W.A. Muhanna and R.A. Pick, Composite Models in SYMMS, Proceedings of the 21st Annual Hawaii International Conference on System Sciences (January 1988) 418-527.

[17] L. Neustadter, On The Structure of Data in SML Models, Research Paper, John E. Anderson Graduate School of Management, UCLA, March 1990.

[18] R.G. Ramirez, Architecture and Implementation of the DAMS System, ISUMMS Project Technical Report #4, Iowa State University, August 1991.

[19] R.G. Ramirez, The SM/DB Language: Reference Manual, ISUMMS Project Technical Report #3, Iowa State University, September 1991.

[20] R.H. Sprague and E.D. Carlson, Building Effective Decision Support Systems, (Prentice-Hall International, Englewood Cliffs, NJ, 1982).