
COMPUTER SCIENCE & TECHNOLOGY

DATA BASE DIRECTIONS—The Conversion Problem


NBS Special Publication 500-64

U.S. DEPARTMENT OF COMMERCE
National Bureau of Standards

NATIONAL BUREAU OF STANDARDS

The National Bureau of Standards [1] was established by an act of Congress on March 3, 1901.

The Bureau's overall goal is to strengthen and advance the Nation's science and technology

and facilitate their effective application for public benefit. To this end, the Bureau conducts

research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific

and technological services for industry and government, (3) a technical basis for equity in

trade, and (4) technical services to promote public safety. The Bureau's technical work is per-

formed by the National Measurement Laboratory, the National Engineering Laboratory, and

the Institute for Computer Sciences and Technology.

THE NATIONAL MEASUREMENT LABORATORY provides the national system of

physical and chemical and materials measurement; coordinates the system with measurement

systems of other nations and furnishes essential services leading to accurate and uniform

physical and chemical measurement throughout the Nation's scientific community, industry,

and commerce; conducts materials research leading to improved methods of measurement,

standards, and data on the properties of materials needed by industry, commerce, educational

institutions, and Government; provides advisory and research services to other Government

agencies; develops, produces, and distributes Standard Reference Materials; and provides

calibration services. The Laboratory consists of the following centers:

Absolute Physical Quantities — Radiation Research — Thermodynamics and

Molecular Science — Analytical Chemistry — Materials Science.

THE NATIONAL ENGINEERING LABORATORY provides technology and technical ser-

vices to the public and private sectors to address national needs and to solve national

problems; conducts research in engineering and applied science in support of these efforts;

builds and maintains competence in the necessary disciplines required to carry out this

research and technical service; develops engineering data and measurement capabilities;

provides engineering measurement traceability services; develops test methods and proposes

engineering standards and code changes; develops and proposes new engineering practices;

and develops and improves mechanisms to transfer results of its research to the ultimate user.

The Laboratory consists of the following centers:

Applied Mathematics — Electronics and Electrical Engineering [2] — Mechanical

Engineering and Process Technology [2] — Building Technology — Fire Research — Consumer Product Technology — Field Methods.

THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts

research and provides scientific and technical services to aid Federal agencies in the selection,

acquisition, application, and use of computer technology to improve effectiveness and

economy in Government operations in accordance with Public Law 89-306 (40 U.S.C. 759),

relevant Executive Orders, and other directives; carries out this mission by managing the

Federal Information Processing Standards Program, developing Federal ADP standards

guidelines, and managing Federal participation in ADP voluntary standardization activities;

provides scientific and technological advisory services and assistance to Federal agencies; and

provides the technical foundation for computer-related policies of the Federal Government.

The Institute consists of the following centers:

Programming Science and Technology — Computer Systems Engineering.

[1] Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted; mailing address Washington, DC 20234.

[2] Some divisions within the center are located at Boulder, CO 80303.


COMPUTER SCIENCE & TECHNOLOGY:

DATA BASE DIRECTIONS—The Conversion Problem

Proceedings of the Workshop of the

National Bureau of Standards and the

Association for Computing Machinery,

held at Fort Lauderdale, Florida,

November 1 - 3, 1977


John L. Berg, Editor

Center for Programming Science and Technology

Institute for Computer Sciences and Technology

National Bureau of Standards

Washington, D.C. 20234

Daniel B. Magraw, General Chairperson

Working Panel Chairpersons:

Milt Bryce, James H. Burrows,

James P. Fry, Richard L. Nolan

Sponsored by:

National Bureau of Standards

Association for Computing Machinery

U.S. DEPARTMENT OF COMMERCE, Philip M. Klutznick, Secretary

Luther H. Hodges, Jr., Deputy Secretary

Jordan J. Baruch, Assistant Secretary for Productivity, Technology and Innovation

NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director

Issued September 1980

Reports on Computer Science and Technology

The National Bureau of Standards has a special responsibility within the Federal

Government for computer science and technology activities. The programs of the

NBS Institute for Computer Sciences and Technology are designed to provide ADP standards, guidelines, and technical advisory services to improve the effectiveness of

computer utilization in the Federal sector, and to perform appropriate research and

development efforts as foundation for such activities and programs. This publication

series will report these NBS efforts to the Federal computer community as well as to

interested specialists in the academic and private sectors. Those wishing to receive

notices of publications in this series should complete and return the form at the end of

this publication.

National Bureau of Standards Special Publication 500-64

Nat. Bur. Stand. (U.S.), Spec. Publ. 500-64, 178 pages (Sept. 1980)

CODEN: XNBSAV

Library of Congress Catalog Card Number: 80-600129

U.S. GOVERNMENT PRINTING OFFICE

WASHINGTON: 1980

For sale by the Superintendent of Documents, U.S. Government Printing Office
Washington, D.C. 20402 - Price $5.50

TABLE OF CONTENTS

Page

1. INTRODUCTION 3
1.1 THE FIRST DATA BASE DIRECTIONS WORKSHOP 3
1.2 PLANNING FOR SECOND CONFERENCE 4
1.3 DATA BASE DIRECTIONS II 4
1.4 CONCLUSION 5

2. EVOLUTION IN COMPUTER SYSTEMS 7
2.1 QUESTIONS 7
2.2 HARDWARE CHANGES 9
2.3 SOFTWARE CHANGES 10
2.4 EVOLUTIONARY APPLICATION DEVELOPMENT 11
2.5 MIGRATION TO A NEW DBMS 13

3. ESTABLISHING MANAGEMENT OBJECTIVES 19
3.1 OVERVIEW 20
3.2 CONVERSION TO A DATA BASE ENVIRONMENT 24
3.2.1 Impact On the Application Portfolio 25
3.2.2 Impact On the EDP Organization 29
3.2.3 Impact On Planning and Control Systems 35
3.2.4 Impact of Conversion On User Awareness 38
3.3 MINIMIZING THE IMPACT OF FUTURE CONVERSIONS 39
3.3.1 Institutionalization of the DBA Function 39
3.3.2 DBMS Independent Data Base Design 40
3.3.3 Insulate Programs From the DBMS 40
3.4 CONVERSION FROM ONE DBMS TO ANOTHER DBMS 41
3.4.1 Reasons for Conversion 42
3.4.2 Economic Considerations 43
3.4.3 Conversion Activities and Their Impact 44
3.4.4 Developing a Conversion Strategy 48
3.5 SUMMARY 50

4. ACTUAL CONVERSION EXPERIENCES 53
4.1 INTRODUCTION 54
4.2 PERSPECTIVES 55
4.3 FINDINGS 57
4.3.1 Industrial/Governmental Practices 58
4.3.2 The First DBMS Installation 60
4.3.3 Desired Standards 61
4.3.4 Desired Technology 61
4.4 TOOLS TO AID IN THE CONVERSION PROCESS 62
4.4.1 Introduction 62
4.4.2 Changing From Non-DBMS To DBMS 62
4.4.3 Changing From One DBMS To Another 64
4.4.4 Changing Hardware Environment 66
4.4.5 Centralized Non-DBMS--Distributed DBMS 66
4.4.6 Centralized DBMS--Distributed DBMS 67
4.5 GUIDELINES FOR YOUR FUTURE CONVERSIONS 67
4.5.1 General Guidelines 68
4.5.2 Important Considerations 68
4.5.3 Tight Control 69
4.5.4 Precise Planning/Pre-planning 69
4.5.5 Important Actions 70
4.6 REPRISE 73
4.7 ANNEX: CONVERSION EXPERIENCES 73
4.7.1 Conversion: File To DBMS 73
4.7.2 Conversion: Manual Environment To DBMS 79
4.7.3 Conversion: Batch File System To a DBMS 84
4.7.4 Conversion: DBMS-1 To DBMS-2 87

5. STANDARDS 93
5.1 INTRODUCTION 93
5.1.1 Objectives 93
5.1.2 What Is a Standard? 94
5.1.3 Background 94
5.2 POTENTIAL BENEFITS THROUGH STANDARDIZATION 97
5.3 SOFTWARE COMPONENTS IN CONVERSION PROCESS 98
5.3.1 Scenario 1 98
5.3.2 Scenario 2 99
5.3.3 Scenario 3 100
5.3.4 Scenario 4 100
5.3.5 Miscellaneous Standards Necessary 100
5.3.6 Non-software Components Necessary 100
5.4 RECOMMENDATIONS 101
5.4.1 The Development of a Standard DBMS 101
5.4.2 Generalized Dictionary/Directory System 101
5.5 CONCLUSION 103
5.6 REFERENCES 103

6. CONVERSION TECHNOLOGY--AN ASSESSMENT 105
6.1 INTRODUCTION 106
6.1.1 The Scope of the Conversion Problem 106
6.1.2 Components of the Conversion Process 107
6.2 CONVERSION TECHNOLOGY 109
6.2.1 Data Conversion Technology 110
6.2.2 Application Program Conversion 120
6.2.3 Prototype Conversion Systems Analysis 129
6.3 OTHER FACTORS AFFECTING CONVERSION 138
6.3.1 Lessening the Conversion Effort 138
6.3.2 Future Technologies/Standards Impact 145

BIBLIOGRAPHY 150

7. PARTICIPANTS 161

PREFACE

In 1972 the National Bureau of Standards (NBS) and the Association

for Computing Machinery (ACM) initiated a series of workshops and

conferences which they jointly sponsored and which treated issues

such as computer security, privacy, and data base systems. The

three-day Workshop, DATA BASE DIRECTIONS--The Conversion Problem, reported herein continues that series. This Workshop

was held in Fort Lauderdale, Florida, on November 1-3, 1977, and is

the second in the DATA BASE DIRECTIONS series. The first, DATA

BASE DIRECTIONS--The Next Steps, received wide circulation and,

in addition to publication by NBS, was published by ACM's Special

Interest Group on Management of Data and Special Interest Group on

Business Data Processing, the British Computer Society in Europe,

and excerpted by IEEE and Auerbach.

The purpose of the latest Workshop was to bring together leading

users, managers, designers, implementors, and researchers in

database systems and conversion technology in order to provide

useful information for managers on the possible assistance database management systems may give during a conversion resulting from an externally imposed system change.

We gratefully acknowledge the assistance of all those who made the

Workshop's results possible.


Director, Center for Programming Science and Technology
Institute for Computer Sciences and Technology


A MANAGEMENT OVERVIEW

To a manager, conversion answers the question, "How do I

preserve my investment in existing data and programs in the face of inevitable changes?" Selection of conversion as a solution depends directly on issues of cost, feasibility, and risk. Since change is inevitable, prudent managers must consider preparations that ease inevitable conversions. How does a

manager choose a course of action?

When Mayford Roark, Executive Director of Systems for the Ford Motor Company and Keynoter of the Workshop, sought an analogue for examining DBMS and the conversion problem, he aptly selected the idea of "mapping a jungle." Reporting on his experience, he noted that 90% of his computers were changed within three to five years and that major software changes from the vendor occurred somewhere between five and ten years after acquisition.

These forced changes, coupled with the organization's changing requirements, led Roark to a basic point: "evolutionary change is the natural state of computer systems." In short, the ADP manager's continual task is to "manage change."

Roark's managers experienced the classic benefits of DBMS: quicker response to changing requirements, easier new application development, and new capabilities not possible in the earlier systems. But Roark summarized his conversion experience within a DBMS environment in this way:

. hardware changes--having a DBMS was a moderate to major burden.

. software changes--dependent on the circumstances, having a DBMS ranged from negligible impact to a major burden.

. evolutionary application change--having a DBMS was a moderate boon. It proved effective, but very expensive.

Considering even the conversion to a DBMS, Roark scores this process a moderate burden because of the risks and costs associated with DBMS.


He emphasized this point by describing the major DBMS need as "easy-to-use, easy-to-apply, and inexpensive approaches for upgrading" two decades of computer files to a data base environment.

Given this charge, how did the workshop respond?

Consideration of the conversion to a DBMS led to several specific caveats intended to control the risk inherent in such a step.

Though conversion to a DBMS requires careful preplanning and may not be appropriate for every application, managers should consider data base technology an inevitable development thrusting itself on future data processing installations. A manager will have no choice but to face this decision eventually.

The first DBMS application can make or break the success of the conversion. The new system's users must have a receptive disposition which results only from careful preparation, preplanning, and the application of basic management skills. The initial application plays an important tutorial role for everyone in the organization, including such subtle lessons as:

--whether management is truly committed to the new system by supporting it with adequate resources, planning for the continuous support of the system, and applying the necessary managerial cross-department discipline.

--whether the installation technical staff truly appreciates user needs, can adjust to user changes, and has the necessary skills and backing to carry off the task.

--whether any of the DBMS conversion proponents have accurate estimates of the costs, the proper tools to use, and a feasible conversion plan expressed in terms that satisfy all risk sharers.


While the staff must know the new technology, they must not conclude that the new technology relieves them of the old project management controls that all new systems require. Tight planning, management control, cost monitoring, contingency approaches, user review, and step-wise justifications must be used.

No "final conversion" exists--planning for the next one begins now! Prepare your system to permit evolutionary change to enhanced technology--including improvements to DBMS.

How will future technology help managers? Hardware development, particularly the proliferation of mini- and micro-computers, networks, and large mass storage, will increase the need for generalized conversion tools. On the other hand, dropping hardware costs will make conversion increasingly more acceptable. Additionally, special DBMS machines which promote logical level interfacing will simplify the conversion process. Major advances in improved data independence through software design will also simplify but never eliminate the conversion problem. Of special concern to managers: user demand for several data models will continue.

In the next five years, managers can expect to

see more operational generalized conversion tools but certainly not full automation of the process. Significantly, standards design and acceptance by vendors will play a major role in the success of generalized conversion tools. Commercially available tools for data base conversion seem likely in ten years, but the conversion of application programs is not likely to have a generalized solution in the next five years.

Standards address several manager needs in the conversion process. A standard DBMS would considerably ease future conversions involving a DBMS. A standard data dictionary/directory would facilitate all conversions. This latter point emphasizes that the data dictionary/directory can stand apart from data base systems and, therefore, can assist the conversion to a first DBMS.


A standard data interchange format would ease considerably the loading and unloading of data and thus facilitate the development of generalized conversion tools. Manufacturer acceptance of the standard format would permit their development of convertors from the standard form to their system, a boon to managers either forced to or desirous of considering a different system.

Similarly, standardization of the terminology used in data base technology, convergence of existing DBMS to using common functions, and use by DBMS of a micro-language or standard set of atomic functions would assist managers in dealing with conversion from DBMS to DBMS.

In summary: in the next five to ten years managers must depend on existing good management practices rather than wait for automated conversion tools. Standards, as a management-exerted discipline, will facilitate conversions, but users can expect reluctant acceptance of standards.


DATA BASE DIRECTIONS

The Conversion Problem

John L. Berg, Editor

ABSTRACT

What information can help a manager assess the impact a conversion will have on a data base system, and of what aid will a data base system be during a conversion? At a workshop on the data base conversion problem held in November 1977 under the sponsorship of the National Bureau of Standards and the Association for Computing Machinery, approximately seventy-five participants provided the decision makers with useful data.

Patterned after the earlier Data Base Directions workshop, this workshop, Data Base Directions--The Conversion Problem, explores data base conversion from four perspectives: management, previous experience, standards, and system technology. Each perspective was covered by a workshop panel that produced a report included here.

The management panel gave specific direction on such topics as planning for data base conversions, impacts on the EDP organization and applications, and minimizing the impact of present and future conversions. The conversion experience panel drew upon ten conversion experiences to compile their report and prepared specific checklists of "do's and don'ts" for managers. The standards panel provided comments on standards needed to support or facilitate conversions, and the system technology panel reports comprehensively on the systems and tools needed--with strong recommendations on future research.

Key words: Conversion; Data Base; Data Description; Data Dictionary; Data Directory; DBMS; Languages; Data Manipulation; Query.


1. INTRODUCTION

Daniel B. Magraw

GENERAL CHAIRMAN

Biographical Sketch

Daniel B. Magraw is Assistant Commissioner, Department of Administration, State of Minnesota. For nearly ten years he has been responsible for all aspects of the State of Minnesota information systems activities. His more than thirty years' experience in systems divides almost equally between the private and public sectors.

A frequent contributor to professional activities, he was one of the founders and is a past president of the National Association for State Information Systems. He taught courses in systems for 22 years in the University of Minnesota Extension Division, has been a frequent speaker on many matters relating to information systems, and has been deeply involved with both Federal and state data security and privacy legislation.

He was keynote speaker at the 1975 Data BaseDirections Conference.

1.1 THE FIRST DATA BASE DIRECTIONS WORKSHOP

In late October, 1975, a Workshop entitled "Data Base Directions: The Next Steps" was held in Fort Lauderdale, Florida. Resulting from a proposal brought to Seymour Jeffery at the National Bureau of Standards by Richard Canning and Jack Minker, the workshop was sponsored jointly by the Association for Computing Machinery and NBS. The product of the intensive two and a half day effort was a series of panel reports which, as subsequently edited, were issued under the title of the workshop as NBS Special Publication 451.


As early as December, 1975, suggestions were made to ACM and NBS concerning the desirability of one or more future conferences on the same general topic. These suggestions were based on the belief that data base systems would grow increasingly in importance and pervasiveness, and were supported by perceptions, even prior to issuance of NBS SP 451, that the workshop had more than met expectations.

The report was issued in 1976; it was generally thought to be a valuable contribution to several audiences, including top management, EDP management, data base managers, and the industry. It has had an unusually wide circulation for reports of this nature.

1.2 PLANNING FOR SECOND CONFERENCE

In February, 1977, ACM and NBS decided in favor of a

second data base workshop and invited me to serve as General Chairman. An initial planning group was established consisting of Dick Canning and Jack Minker representing ACM, John Berg representing NBS, and the chairman. The planning group entitled the Conference "Data Base Directions II--The Conversion Problem."

Four working panel subjects were selected. Panel chairmen were recruited and became members of the planning group. Subject matter coverage for each panel was specified by the planning group. Each panel chairman then selected members of his working panel. In addition, the planning group specified that the format and procedure for the workshop should follow closely that of the first workshop. As in the 1975 workshop, attendance was by invitation only. Work was done by each panel prior to arriving at the workshop.

1.3 DATA BASE DIRECTIONS II

The workshop was held November 1-3, 1977, in Fort Lauderdale, Florida. There were approximately 75 in attendance. Mayford Roark, Ford Motor Company, gave an excellent keynote address in plenary session, reproduced herein. Final instructions were given as to objectives of the workshop, and all panels were in full operation by 10:30 a.m., November 1. From then until the closing plenary session, each of the four panels met separately with the ultimate purpose of developing a consensus among its members on which to base the panel report.


Each working panel chairman was expected to guide the group discussion and to assure preparation of a good rough draft report prior to the closing plenary session. Though each panel chairman organized his panel in a form best for his subject, each followed a general pattern. A recorder was selected from among each panel's members to maintain "minutes" in visual form on flip chart sheets. These were displayed so that the principal discussion and consensus points were visible.

One of the panels had done extensive drafting of report segments prior to the beginning of the workshop. Each of the other three accomplished the objective of rough draft preparation by developing detailed outlines. Portions of the outlines were assigned to individual panel members or to two-person teams for draft preparation. When time permitted, drafts were reviewed by others on the same panel prior to their incorporation into the final draft.

Following the completion of the workshop, the drafts were put into "final" form and circulated to panel members for final comment prior to submission to the proceedings editor.

Communication among the four panels was maintained mainly through the panel chairmen and members of the planning group, who circulated among the panels. At the closing plenary session, each working panel chairman presented a twenty-minute summary of his panel's findings and responded to questions or comments from the floor.

1.4 CONCLUSION

The mix of participants with their differing perspectives, experiences, and technical expertise provides a base of knowledge in data base systems that may be unparalleled. The output from that group should be of material value, especially to those decision-makers across the country carrying responsibility for data base futures. It is hoped that this publication can have an even greater impact than did Data Base Directions I because it is based on two more years of extensive experience and because it attempted to be more direct and pointed in its advice to the decision-makers. The real value of Data Base Directions II, however, will be directly proportional to the assistance that this document provides to the several classes of users.


2. EVOLUTION IN COMPUTER SYSTEMS

Mayford L. Roark

KEYNOTER

Biographical Sketch

Mayford L. Roark is Executive Director of Systems for the Ford Motor Company in Dearborn, Michigan. He has been in charge of the corporate systems function at Ford since 1965, as Assistant Controller and later as Director of the Systems Office, before assuming his present position in 1973. He joined the Ford Division of Ford Motor Company in 1952 as Senior Financial Analyst, and managed various financial departments at Ford from 1955-1965.

Mr. Roark was a Budget Examiner at the U.S. Bureau of the Budget from 1947 to 1952. Previously he was with the U.S. Weather Bureau and the Colorado Department of Revenue.

He studied at the University of Colorado, receiving the B.A. "Magna Cum Laude" degree (Economics) and the M.S. (Public Administration). He is a member of Phi Beta Kappa.

2.1 QUESTIONS

When I was asked to address this group, with its theme of "The Conversion Problem," I felt some puzzlement about the thrust of the meeting. Was there a concern that data base systems might be something of a special burden for the organization faced with a conversion problem? Or, rather, was there the hope that the data base system might be something like "Seven League Boots," for making otherwise tough conversions into happy and speedy journeys? Or, was there a lingering horror that the data base systems themselves might turn out to be conversion nightmares as improved DBMS resources become available?


As the working outlines began to arrive through the mail, it became clear that all of these questions were concerns. As Jim Fry's statement put it, "The basic problem we are addressing is the need to migrate data and applications due to the introduction of new, or change in the current requirements, hardware, and/or software systems." In short, it appears that your intent is to map a jungle.

I won't try to preempt the work of the individual panels by offering any detailed map of my own. Rather, as a frequent traveler through this jungle, my traveler's notes may be of some help to your individual survey parties. My comments will take the form of generalizations and impressions. As you survey a topic in depth, you may well conclude that some of these were inaccurate. This has certainly been the usual result in geographic history, as detailed surveys refine the rough findings of early travelers.

As an initial generalization, let me categorize conversion problems into four families of change:

Hardware Changes,

Software Changes,

Evolutionary Application Development,

Migration to a New DBMS.

I should like to discuss each of these families of conversion problems, and to relate my own impressions as to their frequency of occurrence and their significance with respect to data base management systems. In an effort to be as specific as possible, I will try to rate the impact of a DBMS in each case against a "BB scale"--that's shorthand for "Boon or Burden." If a DBMS, in my view, is a major boon or help for a given class of conversion problem, I shall score it as plus 2. If it is a moderate boon, that will be plus 1. If DBMS is more likely to be a burden than a boon, I shall score it as minus 1 or minus 2. Now let me proceed to a discussion of these families of conversion problems.
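Roark's BB scale is concrete enough to restate as a small lookup. The sketch below is purely illustrative and not part of the proceedings: the function and category names are my own, and it simply encodes the five-point scale plus a few of the scores Roark quotes in the sections that follow.

```python
# Illustrative sketch (not from the proceedings): Roark's "BB scale"
# rates the impact of having a DBMS during each family of change,
# from -2 (major burden) to +2 (major boon).

BB_LABELS = {
    +2: "major boon",
    +1: "moderate boon",
     0: "not much of a factor",
    -1: "moderate burden",
    -2: "major burden",
}

def describe(family: str, score: int) -> str:
    """Render one BB-scale rating as prose."""
    if score not in BB_LABELS:
        raise ValueError("BB scale runs from -2 to +2")
    return f"{family}: DBMS was a {BB_LABELS[score]} ({score:+d})"

# A few of the scores Roark assigns in the sections that follow:
ratings = [
    ("hardware change (new supplier)", -1),
    ("hardware change (compatible family)", 0),
    ("evolutionary application change", +1),
]

for family, score in ratings:
    print(describe(family, score))
```

The mapping keeps the keynote's qualitative labels attached to the numeric scores, so a rating can be reported either way.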


2.2 HARDWARE CHANGES

One of my chores at Ford is to undertake, with the aid of others, a "prior review" of every computer hardware acquisition involving a change of processor or an added rental of more than $10,000 a year. I have been doing this for a dozen years now, during which time I have signed off on about 3,000 computer projects. If Ford is typical of the universe, I would guess that 90 percent of all data processing computers are likely to be replaced within 3 to 5 years from their acquisition. This would not hold for minicomputers used in dedicated, industrial control applications; these may remain in operation without significant change for a decade or more. Data processing computers do change fairly rapidly, however. For most of us, 3 to 5 years is long enough to saturate the capacity of whatever computer we have and to move on to something more powerful. We are also faced with a dazzling succession of new products every few years, always with substantial improvements in cost effectiveness. So, for most of us, I think the 3 to 5 year cycle is likely to continue for awhile.

The extent of conversion trauma from a hardware upgrade depends on the nature of the change. If the change involves shifting to another supplier, an event that occurred in something like 5 percent of our hardware changes, the conversion effort can be an extended affair requiring a major effort over a period of a year or more. In such cases, a DBMS is likely to be something of a burden. In all probability, the new equipment will require a shift to a different DBMS. In any event, the logic requiring conversion will be somewhat more complex and difficult to handle if a DBMS is present.

One of our divisions recently completed a conversion from Honeywell to Burroughs, a change made necessary because of reorganization unrelated to the system function. Some of the application programs to be converted had been developed in Honeywell's IDS environment; they were converted to Burroughs DMS II. IDS is a network system; DMS II is a hierarchical structure, so one might guess that we would have had problems. In fact, I am assured, both by our own people and by the Burroughs conversion team, that this transition went fairly smoothly. On the basis of that once-in-a-lifetime experience, I would have to score the DBMS in this case as minus 1. Perhaps it would have been minus 2 if the application programs involved had been larger and more complex.


Hardware conversion can also be a serious trauma when migrating to new products of an incumbent supplier. Suppliers do present us with new families of hardware from time to time that require structural changes in our application programs and supporting software. Hopefully, this kind of change will be less frequent in the future than it has been in the past. Even so, I suspect we are likely to be faced with something of this kind every 10 years or so. Already, one hears strange rumblings about the kinds of changes likely to unfold 2 or 3 years hence in a new product line called "The New Grad." So far, DBMS has not figured importantly in any of our conversions within the hardware products of a given supplier. On the basis of guesswork and a skeptical disposition, I would have to assume that the presence of a DBMS for a major hardware family transition would be similar to that where a change in supplier is involved, possibly minus 1 or minus 2 on the BB scoring scale.

The remaining hardware conversions, which represent the vast majority of all hardware upgrades, involve moving up within a compatible family of products offered by a single supplier. In these cases, DBMS would not be much of a factor, so we might score its presence as zero on the BB scale.

2.3 SOFTWARE CHANGES

Despite IBM's assurance that MVS is here to stay, our experience would suggest that we can look for "major" software upgrades or enhancements from any supplier every 5 or 10 years. At Ford, we are somewhat preoccupied at the moment with the MVS problem at our large data center in Dearborn. The completion of this conversion will require a major effort extending over a two-year period. There are three substantial groups of data base systems that will be affected by the conversion; these involve, respectively, IMS, TOTAL, and SYSTEM 2000. I have checked with each of the systems groups involved to see how they assess the impact of DBMS at this point, when we are about midway in the conversion effort. The managers working with IMS agree that its presence has added substantially to the difficulty and complexity of the conversion task--something close to twice as much effort as would have been involved with non-DBMS applications. The IMS conversion involves more than a new operating system, however. It requires the transition to a new DBMS called IMS-VS. One manager sees the new package as offering many attractive new features. Another manager sees no incremental benefits at all for his particular applications; but he has no choice, his present IMS software will not be supported under MVS.


The managers using TOTAL and SYSTEM 2000 see no special conversion problem at all in going to MVS.

This sampling of experience adds up to a very mixed bag. As the warnings say about some drugs, a major software conversion "may be hazardous to your health"--but again, it may not. I think we have to score the presence of a DBMS in such a situation with a range from 0 to minus 2 on the BB scale.

We do have numerous other software changes, of course, on an almost continuous basis--new releases to old operating systems, new software packages to handle new functions, and the like. There is nothing in our experience to indicate that DBMS is much of a factor one way or another in these minor changes.

2.4 EVOLUTIONARY APPLICATION DEVELOPMENT

Now we move into the territory of what the systems function is all about, and what DBMS also is all about.

Before getting into the particulars about this family of change, we ought to be clear on one central point. "Evolutionary change" is the natural state of computer systems, just as it is for biological systems. Perhaps I should not push this analogy too far because there is one dramatic difference--evolutionary change works more rapidly in computer systems, where even 5 years can produce dramatic changes in structure, outputs, and even objectives.

During the past 5 years, the workload at our major computer centers at Ford has been growing at close to 20 percent annually. I might explain here that we attempt to measure workload in BIPS (Billions of Instructions Processed) for purposes of capacity planning. So, when I say that growth has been close to 20 percent a year, I mean that the BIPS have been growing at close to 20 percent annually.
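To make the compounding concrete, here is a quick sketch of how a steady 20 percent annual growth rate plays out; the 100-BIPS starting figure is purely illustrative, not an actual Ford number:

```python
# Compound growth of workload measured in BIPS (Billions of
# Instructions Processed). The 100-BIPS base is a made-up
# illustration for capacity-planning arithmetic, not a Ford figure.
def project_workload(base_bips: float, annual_growth: float, years: int) -> list[float]:
    """Return projected annual workload, year 0 included."""
    return [base_bips * (1 + annual_growth) ** y for y in range(years + 1)]

projection = project_workload(100.0, 0.20, 5)
# At 20 percent a year the workload roughly doubles every four years
# (1.2 to the 4th power is about 2.07).
```

The practical point for capacity planners is the doubling time: at this rate, a center must plan to absorb twice its current workload about every four years.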

As best as I can judge, about half of this growth results from new applications and about half from new adaptations of existing applications.

The new adaptations, which are often described by the rather condescending term "maintenance," include a lot of things. One is simply the effect of volume. If we sell more cars, other things equal, we will process more BIPS. Another is changing requirements. In our industry, every model year is, in a sense, a new game. The products may change, terminologies may change, and data requirements may change. One of the biggest sources of changing requirements in the automotive industry is government regulation. The work of the regulators never ceases, nor does the growth in requirements for additional data elements in the burgeoning computer records that we must maintain as evidence of compliance with all sorts of directives from Washington.

Some of these changing requirements are massive things like OSHA regulations or like corporate fuel economy standards. Others seem almost trivial until they are examined in the context of required changes in computer files. The industry is currently being asked to adopt as an international standard a 16-character vehicle identification number. Our existing identification numbers, with 11 characters, turn up in more than 50 separate computer files in North America alone. The cost of converting these files and their related application programs to the new international standard is estimated at $3 million.
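The mechanics of one such change are simple to state, even if repeating them across 50 files and their programs is not. Here is a hedged sketch of widening a fixed-width identification field from 11 to 16 characters; the record layout and blank-padding rule are hypothetical, not Ford's actual formats:

```python
# Hypothetical fixed-width record: an 11-character vehicle ID
# followed by a 20-character description. The conversion widens the
# ID field to 16 characters, left-justified and blank-padded. Real
# layouts and padding conventions would differ file by file.
OLD_ID_LEN, NEW_ID_LEN, DESC_LEN = 11, 16, 20

def widen_id_field(record: str) -> str:
    """Rewrite one record with the identification field widened."""
    vehicle_id = record[:OLD_ID_LEN]
    rest = record[OLD_ID_LEN:]
    return vehicle_id.ljust(NEW_ID_LEN) + rest

old_record = "FA123456789" + "LEFT FRONT DOOR".ljust(DESC_LEN)
new_record = widen_id_field(old_record)
```

The conversion itself is a one-pass rewrite; the $3 million lies in finding every file that carries the field and every program that assumes the old record length.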

In all fairness, the government is not the only source of changing requirements; our engineers and product planners are pretty good changers themselves. Our service parts files include roughly twice as many parts today as 10 years ago, when our basic inventory control system was developed. Our organization people contribute more than a fair share of changing requirements. Every time there is a corporate reorganization, we find it necessary to go through a restructuring of the hundreds, or even thousands, of computer files and application programs that are affected by the realignment.

And even systems people are contributors to the evolutionary process, only we usually call the changes "efficiency improvements." Almost every systems activity has its own more or less continuous effort aimed at cleaning up programs that run inefficiently, that are hard to maintain, or that otherwise need overhaul.

To make another sweeping generalization, I would judge that each of our systems activities annually rewrites somewhere between 5 percent and 25 percent of its accumulated code.

What a fantastic opportunity for data base management systems! I would score the DBMS impact on this area of evolutionary application development as plus 2 and proceed to the next topic except for one problem. The DBMS benefits can be extremely difficult to realize in practice, and the price of the cure is sometimes worse than the pain of the ailment.


Many of our files most subject to change are huge affairs. Some of these files run into the billions of bytes of data. For systems of this size, processing costs may be counted in 6 or 7 figures annually. An added overhead burden in a range of 30 percent to 40 percent can amount to several hundreds of thousands of dollars in added annual cost. One of our activities last year made an analysis of overhead associated with one of its large data base systems and found that 70 percent of the instructions being executed resulted from DBMS overhead. This activity is in the process of restructuring its DBMS with the objective of reducing processing requirements by a full 40 percent.
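The arithmetic behind that concern is easy to verify. A small illustration, taking a hypothetical $1,000,000 annual processing cost from the middle of that "6 or 7 figures" range:

```python
# For a system whose processing costs run to seven figures, a
# 30-40 percent DBMS overhead burden is itself a six-figure item.
# The $1,000,000 base is a hypothetical example, not a Ford figure.
annual_processing_cost = 1_000_000
overhead_low = annual_processing_cost * 0.30   # $300,000 per year
overhead_high = annual_processing_cost * 0.40  # $400,000 per year
# Either way, the overhead alone is "several hundreds of thousands
# of dollars in added annual cost," exactly as the text warns.
```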

Several years ago, another of our activities launched a large-scale data base system in which nearly half of the file space was required for pointers. Processing requirements in such a case can far exceed expectations based on any prior experience. This sort of overhead can mean a loss in processing productivity, something that has to be weighed against any benefits in programming productivity.

The best score I can give DBMS, therefore, for its impact on "evolutionary application development" is plus 1. It sometimes works well, but it can also be awfully expensive.

Somewhere in the deliberations of this or related groups, there ought to be some consideration of what can be done to simplify DBMS technology. If this is not possible, perhaps there could be some guides or standards to protect the systems designer with a modest problem from unwittingly stumbling into a huge solution out of all proportion to his need. Sometime in the future, I would like to go through an exercise like this again and conclude, without any reservation, that DBMS is an unqualified boon for the evolving system regardless of its size and complexity. We are still in the very early stages of the DBMS art.

2.5 MIGRATION TO A NEW DBMS

I have saved this family of conversion problems until last. In a sense, it overlaps all three of the categories I discussed earlier. Migration to a new DBMS may be prompted by a change of hardware; it may be forced by a change of software; or it may be undertaken with a view to realizing the benefits we discussed under "evolutionary applications development." Still, the migration to a new DBMS is a major event that deserves consideration in its own right, whether the migration is from a non-DBMS environment or is the rare change from one DBMS to another.


There is a logical inconsistency in assigning any rating at all to this family of conversion problems. It is like asking, "What impact does DBMS have upon itself?" Yet, at the risk of sounding completely illogical, I am going to assign a BB rating of minus 1 to this category of conversion problems on the ground that a DBMS is a high-risk undertaking entirely apart from whether it is related to changes in hardware, software, or applications.

Moving to a new DBMS is not unlike the process of getting married; it takes a lot of desire, commitment, sacrifice, and investment. It involves moving to a wholly new lifestyle. Once there, the return to the old lifestyle may be difficult or impossible without wrenching adjustment problems.

I have several times had the experience of encountering a spokesman for some other organization who tells me, "We've made the policy decision that all future development work in our organization will be based on DBMS." This makes me shudder. It is a little like saying, "We've decided that everyone should get married." I must add, however, that in every instance cross-examination has established that, policy or no policy, the organization in question still does a lot of its computer business outside the DBMS environment. I expect this will be true of nearly all of us for many, many years to come, or at least until the DBMS technology can be simplified to the point where it no longer requires total desire, total commitment, total preparation, and heavy investment.

Not too long ago, we compiled a catalog of all the computer systems applications in use at our North American divisions and affiliates. The listing came to something like 50,000 application programs. Some of these resulted from major projects requiring scores of man-years to develop. Others were relatively simple. The full range of applications represents a wide spectrum of data needs and complexity. DBMS technology today does not really address this whole range of developmental requirements. Perhaps it never should. In any event, migration to a new DBMS must be taken as one of life's climactic events by most systems people. We all need more guidance about when to undertake such a migration and how to stay out of trouble when we do.

These, then, are the major problems to be found in this jungle of systems conversions. To summarize, my BB scorings have suggested that a DBMS can constitute a moderate-to-heavy burden for major hardware conversions and major software conversions. The presence or availability of DBMS, however, can be a moderate-to-strong boon for evolutionary application development. Finally, the migration to a DBMS can be a long and tedious journey, especially where existing systems of great size and complexity are involved.

DBMS, in short, still offers attractive visions of a world in which systems might respond quickly to changing requirements, in which new creative applications might be easily spawned from existing data files, and where data bases, in the words of Dan Magraw at the earlier conference of two years ago, can "move the DBM's into the area of decision making."

These visions ought not to be taken lightly. In preparing for this meeting, I checked back with our managers who have been in the DBMS mode for 5 years or more. What benefits did they really get? Their consensus might be summarized as follows:

1. They, indeed, have been able to respond more quickly to changing requirements, although not always as easily as they once hoped.

2. New applications development has been easier. The managers see a productivity improvement factor in a range of 10 percent to 20 percent.

3. Most important, the managers believe they have capabilities that previously did not exist at all.

Let me expand on these capabilities for a moment. Those 50,000 computer programs I mentioned are a long-time accumulation in response to numerous problems and opportunities that were perceived by our divisions and staffs over a period of two decades.

Our typical division, which might correspond to a moderate-size company, has its own accumulation, usually in a range of 1,500 to 3,000 programs. If the division is not yet in a DBMS environment, each of those programs comes with its own set of files. In the last two years, we have been greatly influenced by the concepts of "top-down design" and "structured programming." For systems of recent vintage, these approaches may have provided a logical structure that would give some accessibility to files. For older systems, with what our people call "spaghetti programs," accessibility is something else--fragmented and scattered files, with all the access keys hidden in "spaghetti code."

As functional entities for the purposes they were first created, these old programs may serve well. Even where we want to launch a new application that will need to draw on the data in these files, the problem is not overwhelming.


We can, and do, write programs to get at the needed data files, wherever they exist.

But suppose we do not want a new system. All we want is an in-depth analysis of a difficult problem on which 5 or 10 different files have something useful to say. This is the sort of problem that gives computer people a bad reputation as being slow-moving and unresponsive. It can be extremely difficult to extract data from 5 or 10 different sources, all with different maintenance cycles and data control procedures, and produce anything but a mess.

We do a lot of computer-based analysis at Ford because we have found that the data resources in our computer files can open up all sorts of insights and understanding that would otherwise be lost. I wish we could do even more, but this form of analysis can be very time consuming and frustrating in the non-DBMS environment. We have been working on one such exercise involving non-DBMS file sources for more than three months, all because of the relative inaccessibility of data. Last week we decided to skip a promising analysis altogether because it would have taken more than a month to extract and organize the needed data from a variety of source files.

So, when our managers talk about new capabilities from DBMS, they are talking about one of the computer's most important potentials--the power to carry analysis to entirely new levels of understanding.

The benefits cited by our managers are impressive testimonials. Why, then, has the data processing world been so slow in converting to DBMS? I have seen no studies that would provide an accurate measure of the proportion of data processing oriented to DBMS. In the absence of any sure data, I am going to offer the opinion that the proportion is no greater than 10 percent to 20 percent.

I have never expected that all computer systems would, or should, move to DBMS. In the early Seventies, however, as we first began to realize the potential of this technology, most of us, even the conservatives I believe, would have expected that something approaching half of all computer files would acquire a DBMS format by the end of 1977. Even at Ford, where I believe the impact of DBMS has probably been greater than average, the progress has seemed slower than I would have expected. This painfully slow progress points up perhaps the greatest conversion problem of all--how can we move to a DBMS environment with existing systems?


Most of our DBMS user divisions made their first moves to data base at least five years ago. A couple of these divisions went through a conversion trauma, eventually recovered, and never undertook a subsequent DBMS project. Once was enough, they concluded.

The other divisions have continued to extend their data bases in areas of new systems development. But, this leaves a very large accumulation of computer files more or less untouched by the new technology. One manager, possibly our most enthusiastic data base advocate, believes that about 40 percent of his divisional data is now in DBMS format after some five years of development. He guesses that this figure may reach 70 to 80 percent in another five years.

We have still another group of divisions with heavy maintenance workloads who have elected not to try the DBMS approach at all.

The problem is not unlike that of our Detroit skyline. We recently completed a magnificent new Renaissance Center along the riverfront, with some of the most beautiful hotel and office structures to be found anywhere. This is exciting, but there is still a long way to go to bring the rest of the city up to Renaissance Center standards. The accumulation of history is still a huge obstacle to those who want the best of things right now. Omar Khayyam, who summed up so many things in language that a sophomore can understand, expressed this frustration perfectly:

" ... could thou and I with fate conspire
To grasp this Sorry Scheme of things entire,
Would we not shatter it to bits--and then
Re-mould it nearer the Heart's desire!"

All of us, however, have to find ways of working with what we have inherited. We might all wish we could somehow get rid of the old mess and start all over again. Unfortunately, that old mess represents an investment in the hundreds of millions of dollars for my company and in many tens of billions of dollars for all of us collectively.


One of our divisions several years ago tackled this rebuilding problem in what seemed to me an innovative kind of way. This division had identified 15 overlapping files that had evolved over the years, with inventories and other data related to parts. As might be expected, there was much redundancy, and it was difficult to reconcile one file to another. A complete overhaul of the application programs did not seem feasible, but the division hit on the idea of a DBMS master file to serve as a sort of front end to these application programs. There was a saving of more than $100,000 annually in data preparation and data control. This limited effort brought the division into the DBMS environment and laid the basis for solid evolutionary development, including subsequent application revisions to exploit more fully the potential of the data base environment.
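The front-end idea can be sketched roughly: merge the overlapping per-application files into one master keyed by part number, so each attribute has a single authoritative home. The field names and sample data below are invented for illustration; the division's actual design is not described in the text:

```python
# Sketch: consolidate overlapping per-application parts files into
# one master file keyed on part number. All field names and sample
# data here are hypothetical.
def build_master(*source_files: list[dict]) -> dict[str, dict]:
    """Merge source files into one record per part number."""
    master: dict[str, dict] = {}
    for source in source_files:
        for record in source:
            entry = master.setdefault(record["part_no"], {})
            # Earlier sources win for duplicated fields, so each
            # attribute ends up with exactly one home.
            for field, value in record.items():
                entry.setdefault(field, value)
    return master

inventory = [{"part_no": "A100", "on_hand": 42}]
purchasing = [{"part_no": "A100", "supplier": "ACME"},
              {"part_no": "B200", "supplier": "AJAX"}]
master = build_master(inventory, purchasing)
# Applications now read one reconciled record per part instead of
# re-deriving it from several redundant files.
```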

If there is one thing I would particularly want to see come out of this conference, then, it would be some easy-to-use, easy-to-apply, and inexpensive approaches for upgrading this great accumulation of computer files we have all been working on for the last two decades or so. We have all had tantalizing but too-brief experiences with data bases as they can be. The question before us now is, what can we do to make those benefits available wherever we need them?

Where does an organization with an accumulation of 1,500 to 3,000 programs begin, without going out of business for a year or two? What tools can it use to map out its data resources? How can it go about restructuring these files without scrapping its investment in application programs? And, even if the files can be rebuilt, what about the necessary remodeling of the interface points within the old programs? I hear that some of you came up with answers to these questions. Perhaps you can point the way to this new data-rational world we all seek.

The realization of this promise is still a long way off. The real world of tomorrow will come when the DBMS can contribute to the evolutionary process of applications development and to the full analytical use of computer resources, without exacting an extortionate price; when the DBMS can aid in the evolutionary process of hardware and software change; and when the decision to go to a DBMS is no longer a high-risk affair requiring an all-out commitment through one of the most difficult conversion problems to be found in systems.

I have no doubt that something very much like this world of tomorrow will appear one of these days, in great part because groups like this one pressed on relentlessly in the definition of problems and the search for creative solutions.


3. ESTABLISHING MANAGEMENT OBJECTIVES

Richard L. Nolan

CHAIRMAN

Biographical Sketch

Richard L. Nolan is a researcher, author, and consultant in the management of information systems. As Chairman of the Nolan, Norton & Company, Inc., and a former Associate Professor at the Harvard Business School, Dr. Nolan has contributed to improving the management of data processing in complex organizations. His contributions include major publications in the areas of:

The four stages of EDP growth

Management accounting and control of data processing

Managing the data resource function

Computer data bases

Dr. Nolan's experience with the management of computer systems includes earlier associations with the Boeing Company, the Department of Defense, and numerous other large U.S. and European public corporations.

PARTICIPANTS

Marty Aronoff
Richard Canning
Larry Espe
Gordon Everest
Robert Gerritsen
Richard Godlove
Samuel Kahn
Gene Lockhart
John Lyon
Thomas Murray
Jack Newcomb
T. William Olle
Michael Samek
Steven Schindler
Richard Secrest
Edgar Sibley


3.1 OVERVIEW

"Assimilation of computer technology into organizations is a process that has unique characteristics which management does not have a substantial base of experience to draw upon for guidance. Perhaps the most important unique characteristic is the pace of penetration of the technology in the operations and conventional information systems." [Richard L. Nolan, "Thoughts About the Fifth Stage," Data Base, Fall 1975.]

The pace of the assimilation of computer technology into the data processing organization is represented by the S-shaped "Data Processing Learning Curve."

The Data Processing Learning Curve is approximated by the growth of the data processing budget and reflects the staged evolution of the data processing environment along four growth processes:

Growth process #1. The portfolio of computer applications. The programs and procedures which are used by the organization in its business activities. The Applications Portfolio represents the cumulative end product of the data processing organization.

Growth process #2. The data processing organization and technical capabilities. The organization structures and technical capabilities found within the data processing department which are required to develop and operate application systems. These include:

Data Processing Management Structure

Hardware and Software Resources

Systems Development and Operations Organizations

Growth process #3. Data processing planning and management control systems. The set of organization practices used to direct, coordinate and control those involved in the development and operation of application systems, including:

Data Processing Planning

Project Management

Top Management Steering Committees

Chargeout

Performance Measurement

[Figure 2-1. The Data Processing Learning Curve: assimilation of computer technology occurs in four stages (Stage I: Initiation; Stage II: Contagion; Stage III: Control; Stage IV: Integration) across the four growth processes--building the applications portfolio, building the DP organization, building DP management planning and control, and developing user awareness.]

Growth process #4. The user. The members of line and staff departments who must use the applications systems in order to perform their jobs.

The nature of Data Base Management Systems (DBMS) dictates that they interface with each of the four growth areas. First, the DBMS acts as the data manager for all types of application systems. Second, the DBMS introduces a new level of technology to be assimilated by the data processing organization, and it calls for the introduction of a new data processing organization structure: the Data Base Administrator. Third, the DBMS requires that application system planning be more comprehensive and that control through "chargeout" be restructured to reflect shared resource usage. Fourth, the DBMS may impact the user by providing new functional capabilities and mechanisms for data retrieval.

Because of the integral position occupied by the DBMS in the systems environment, the conversion to data base technology and its use should be carefully managed. In managing the conversion to data base technology, data processing managers should have a well articulated set of objectives regarding each of the four data processing growth processes. These objectives should form the foundation for goals against which data base conversion activities are measured.

The initial objective of the Establishing Management Objectives panel was to analyze the impact of the data base conversion effort on each of the four data processing growth processes. Based on this analysis, the panel then determined the management considerations associated with data base conversions. The management considerations identified by the panel can be summarized into four key concepts:

KEY CONCEPT #1: DBMS CONVERSIONS ARE A MATTER OF "WHEN?" NOT "WHETHER?"

Conversion from a non-data base to a data base environment is a part of the natural evolution of data processing within an organization. A data processing department which has matured and progressed to a Stage III environment is typically faced with high maintenance costs and an inability to respond to ad hoc inquiries and requests from user management for integrated reports. This situation, in effect, forces the data processing department to employ data base technology to restructure the applications portfolio. In other words, conversion to a DBMS is primarily a question of how soon the data base environment should begin to be constructed, not whether a data base environment should be implemented.

KEY CONCEPT #2: CHOOSE THE DBMS CONVERSION APPLICATION CAREFULLY

The initial application used for conversion to data base technology represents an important learning experience for the entire organization. As such, the initial application should:

- be a non-trivial application
- demonstrate the "power" of the DBMS facilities
- be simple to avoid overextension caused by attempting to do too much, too fast

KEY CONCEPT #3: TREAT THE INITIAL AND SUBSEQUENT DBMS CONVERSIONS SIMILAR TO OTHER SYSTEMS PROJECTS

Although a data base conversion introduces a new technology to the organization and requires the involvement of all areas in the systems environment, the risk exposure of this conversion effort can best be minimized by managing the conversion as any other large project would be managed. The same planning and justification procedures should be used. The same project management mechanisms should be exercised throughout the project life cycle. Because of the major impact caused by a data base conversion, special efforts should be made to coordinate conversion activities with steering committees, senior management, and user areas.

KEY CONCEPT #4: PLAN AND STRUCTURE FOR FUTURE DBMS CONVERSIONS NOW; DBMS CONVERSIONS WILL BE A WAY OF LIFE

Certainly, once a DBMS is successfully installed, the conversion of applications to that DBMS will continue. However, the mature organization should also plan on converting to another DBMS at some point in time. The second DBMS may be just an enhanced version of the first DBMS, or it may be a totally new software package. In either case, it is certain that a mature data processing organization will want to take advantage of new DBMS facilities and efficiencies; therefore, DBMS conversions will become a way of life.

To prepare for these continued conversions, the data processing organization can take several steps to minimize their impact:

minimize the application system processing logic, program code and data base design dependencies on the features of a particular DBMS


institutionalize the Data Administration function

fully document all business system functions on anintegrated dictionary
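The first of these steps is essentially a layering discipline: route all data access through an interface so that application logic never touches DBMS-specific features directly. A minimal sketch with invented names follows; the panel prescribes the principle, not this particular design:

```python
# Sketch of a data-access layer that keeps application logic
# ignorant of the DBMS underneath. Swapping DBMS products then
# means rewriting one adapter, not every application program.
# All class and method names here are hypothetical.
from abc import ABC, abstractmethod

class PartsStore(ABC):
    """Abstract interface the applications program against."""
    @abstractmethod
    def get_on_hand(self, part_no: str) -> int: ...

class InMemoryStore(PartsStore):
    """Stand-in adapter; a real one would wrap the DBMS calls."""
    def __init__(self, data: dict[str, int]):
        self._data = data
    def get_on_hand(self, part_no: str) -> int:
        return self._data.get(part_no, 0)

def reorder_needed(store: PartsStore, part_no: str, minimum: int) -> bool:
    # Application logic depends only on the abstract interface.
    return store.get_on_hand(part_no) < minimum

store = InMemoryStore({"A100": 5})
```

A second adapter implementing `PartsStore` over the replacement DBMS can be dropped in without touching `reorder_needed` or any other application routine.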

The overall DBMS conversion philosophy developed by the Establishing Management Objectives panel can be summarized as follows:

Appreciate the technology, but recognize that DBMS conversion is a management problem.

The panel approached the topic of data base conversion in a chronological manner. As such, the following sections are organized to reflect management considerations during the life cycle of data base conversion efforts:

. CONVERSION TO A DATA BASE ENVIRONMENT

. MINIMIZING THE IMPACT OF FUTURE CONVERSIONS

. CONVERSION FROM ONE DBMS TO ANOTHER DBMS

3.2 CONVERSION TO A DATA BASE ENVIRONMENT

A certain level of maturity is necessary before conversion to a data base environment is feasible. In general, data base technology is not appropriate for data processing departments in Stage I or early Stage II environments. With these exceptions, conversion to a DBMS should be initiated as soon as possible. However, the Stage III environment is most compatible with the initial conversion.

The following sections discuss the impact of conversion to a DBMS on the four data processing growth processes, namely:

Applications Portfolio

Data Processing Organization

User Awareness

Data Processing Planning and Management Control Systems


3.2.1 Impact On the Application Portfolio

Conversion to a data base environment will generally necessitate a substantial restructuring of the organization's application portfolio to take advantage of the enhanced capabilities of the DBMS. The several potential approaches that permit converting the application portfolio range from an evolutionary approach, in which new or replacement application systems are developed using data base technology, to a revolutionary approach, in which new development is suspended until existing application systems are converted to the new environment. Regardless of the particular approach selected, an ordering indicating a priority for conversion must be developed for the application portfolio. Within this ordering, the entry-level application is of critical importance! The above topics are discussed in greater detail in the following sections.

Approaches To Application Conversion. Two basic approaches exist for conversion of the application portfolio to a data base environment: revolutionary and evolutionary. Actually, these two approaches represent opposite ends of a continuum of approaches. No single approach is universally best. In fact, more than one approach may be operable within a given conversion; i.e., certain application systems may be converted on a revolutionary basis while other systems are converted in an evolutionary manner.

In the revolutionary approach to conversion, sometimes called resystemization, one rewrites and restructures existing application systems as necessary to operate under the new DBMS. Generally, one should avoid exclusive use of this approach for the following reasons:

Risks overextension caused by attempting too much, too fast.

Delays in one sub-project may impact others.

Insufficient resources may be available for developing new systems during the conversion.

At the opposite extreme of the revolutionary approach is the evolutionary approach in which all new systems are developed under the new environment. Existing systems are not converted but rather are replaced at the end of their normal life cycle. This approach reduces the risk of overextension and the impact of delays in sub-projects. However, there are disadvantages to an evolutionary approach:


Complex interfaces with existing, conventional systems are generally entailed.

Local inefficiencies and redundancy typically result.

Current organizational deficiencies and constraints may be perpetuated.

Just as the conversion of the entire applications portfolio may be approached in an evolutionary or revolutionary manner, so may the conversion of a single, existing application system. In other words, the entire application system may be converted to the new environment at one time, or the conversion may take place in phases. The latter approach, in which the reporting and update functions are converted gradually, using bridges, has the advantage of early availability of both cross-functional data and the new features of the DBMS. Furthermore, greater flexibility in scheduling the conversion is provided. However, the gradual approach has the disadvantage of redundant development and data storage, and requires increased management to provide and control the conversion bridges.

One temporary measure that may be employed to avoid or postpone conversion is to extract data from existing master files in order to build a transient, integrated data base. This data base is not maintained but instead is recreated on a cyclic basis. The data base is used for cross-functional reporting and analysis. This approach provides early availability of cross-functional data and lends itself to a specialized interrogation language. At the same time, there are certain disadvantages to this approach:

Availability of data is achieved at the expense of redundancy and reloading.

Problems of timeliness and consistency may be created.

Basic application limitations are perpetuated.
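The cyclic rebuild of such a transient, integrated data base can be sketched as follows. This is an illustrative sketch only: the master files, the key field, and the record layouts are hypothetical, not taken from the report.

```python
import csv
import io

def rebuild_snapshot(master_files, key):
    """Recreate the transient, integrated data base from the current
    master files.  The snapshot is never updated in place; it is
    simply discarded and rebuilt on a cyclic schedule."""
    snapshot = {}
    for f in master_files:
        for row in csv.DictReader(f):
            # Merge each master record into the integrated record
            # that shares its key value.
            snapshot.setdefault(row[key], {}).update(row)
    return snapshot

# Hypothetical master files (payroll and personnel), keyed on emp_id.
payroll = io.StringIO("emp_id,salary\n7,52000\n9,48000\n")
personnel = io.StringIO("emp_id,dept\n7,ACCTG\n9,MFG\n")

db = rebuild_snapshot([payroll, personnel], "emp_id")
# db["7"] now combines salary and dept for cross-functional reporting.
```

Because the snapshot is rebuilt rather than maintained, timeliness and consistency depend entirely on the rebuild cycle, which is exactly the disadvantage noted above.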

Maintenance Moratoriums. There are never sufficient resources, nor is it appropriate, to permit continued maintenance and enhancement of application systems during the conversion to the data base environment. Moreover, conversion requires a relatively stationary target. Thus, a moratorium on maintenance (or, more accurately, on enhancement) may be declared during conversion.


The declaration of a maintenance moratorium must be the result of agreement among user, senior, and data processing management. Senior and user management support is necessary for the conversion. However, if senior management is the primary motivator behind the conversion, there will be some degree of user resistance to a maintenance moratorium. On the other hand, if the conversion is driven by user management, senior management will tolerate a moratorium on maintenance only so long as it does not interfere with normal business functions. In either case, user and data processing management must jointly determine the scope and duration of the moratorium and agree to the circumstances under which it may be modified or cancelled.

A common device for invoking moratoriums is a steering or priorities committee. Composed of data processing and user management, the steering committee is responsible for approving projects and establishing priorities. The steering committee does not manage, nor does it relieve management of its business responsibilities. Rather, it provides a forum for discussion and has power derived from its membership and sponsorship.

Analysis of Opportunities. Certain application systems present better opportunities for conversion than others. The following types of applications represent good opportunities for conversion:

An application system using many different master files and/or many internal sorts, indicating the need to represent complex data structures and to support multiple paths between data.

An application with a requirement for on-line inquiry and/or update of interrelated data. A DBMS would still be applicable, although not required, if the data were not interrelated.

An application system with chronically heavy maintenance backlogs, suggesting redundant data and/or inflexibility with respect to its data structures.

An application system requiring a broader view of data (either more detail or greater cross-functional breadth).

An application which crosses functional or organizational boundaries (e.g., project control).


An application which cannot support basic business needs.

An application which provides data used by other systems.

Certain types of applications represent poor opportunities for conversion. For example:

A purchased application which is maintained by a third-party supplier.

An application which uses historical data and which is processed infrequently.

A recently installed application system which is effective in satisfying user needs.

An analysis of the characteristics of the existing applications based on the above considerations will yield a preliminary ordering for conversion of the application portfolio. As the conversion is planned in more depth, the preliminary ordering will be revised and refined to reflect such factors as precedence relationships regarding conversion, the level of effort required, and the availability of resources.
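The opportunity analysis above can be reduced to a simple weighted scoring pass over the application portfolio. The weights, attribute names, and sample applications below are invented for illustration; the factors themselves are the ones listed in this section.

```python
# Illustrative weights for the opportunity factors discussed above.
WEIGHTS = {
    "master_files": 2,        # many master files / internal sorts
    "online_need": 3,         # on-line inquiry or update of related data
    "maintenance_backlog": 2, # chronically heavy maintenance
    "cross_functional": 3,    # crosses organizational boundaries
    "feeds_other_systems": 1, # provides data used by other systems
}

def opportunity_score(app):
    """Score one application; higher means a better conversion candidate."""
    return sum(w for attr, w in WEIGHTS.items() if app.get(attr))

def preliminary_ordering(portfolio):
    """Order the portfolio best candidate first."""
    return sorted(portfolio, key=opportunity_score, reverse=True)

apps = [
    {"name": "payroll", "master_files": True, "maintenance_backlog": True},
    {"name": "project_control", "cross_functional": True, "online_need": True},
    {"name": "archive_report"},  # historical, infrequent: poor candidate
]
ranked = [a["name"] for a in preliminary_ordering(apps)]
```

As the section notes, such an ordering is only preliminary; precedence relationships, effort, and resource availability would then refine it.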

Selecting the Entry-level Applications. In converting to a data base environment, a key decision is selecting the entry-level application. In an ideal world, the initial application would be selected as the vehicle for making mistakes and learning how to convert and how to manage the conversion. It would have a low profile and not present any risk to the business. However, the realities of the world will force initial conversion of a system which has visibility, contains some element of risk, and which must be completed quickly. The factors listed below should be considered in identifying the best opportunity for developing technical competence while simultaneously reducing risk and visibility:

The application should be representative and non-trivial.

It should be a good DBMS application (though not necessarily the best).

It represents a relatively low risk to the business.


It provides sufficient opportunity for learning.

It is either an old system or technically obsolete.

It provides eventual visibility as a vehicle for management controls.

It is "owned" by a vocal, important, but neglected (by data processing) segment of the business.

3.2.2 Impact On the EDP Organization. This section discusses the following topics relating to the impact of conversion on the data processing organization:

Organizational considerations.

Technical aspects: tools and methodologies.

Data processing personnel skill requirements.

Organizational Considerations. Converting to a data base environment generally entails reorganization of the data processing function in order to provide the technical and administrative means for managing data as a resource. A key organizational consideration is the need to establish a Data Base Administration (DBA) function within data processing. Conversion will also impact the applications development and computer operations functions within data processing.

Data base administration. The Data Base Administration (DBA) function is responsible for defining, controlling, and administering the data resources of an organization. The many responsibilities of the DBA function are not discussed in detail here, since they are covered extensively in the literature. However, some of the major responsibilities include the following:

Data base definition/redefinition. DBA must have primary responsibility for defining the logical and physical structure of the data base, not merely consulting responsibility.

Data base integrity. DBA is responsible for protecting the physical existence of the data base and for preventing unauthorized or accidental access to the data base.

Performance monitoring. DBA must monitor usage of the data base and collect statistics to determine the efficiency and effectiveness of the data base in satisfying the needs of the user community.


Conflict mediation. DBA must mediate the conflicting needs and preferences of diverse user groups that arise because of data sharing.

Many alternatives exist for locating the DBA function within the overall corporate structure. Three such alternatives include the following:

Within the data processing organization. In order to avoid an application orientation or an emphasis on computer efficiency, DBA should, in general, not report to Applications Development or Computer Operations, respectively. Rather, DBA should report to the highest full-time data processing executive.

Corporate level. When located at the highest corporate level, DBA can take a broad view of data as a corporate resource. Furthermore, DBA is in a position to resolve conflicts between user areas. When DBA resides at this level, some of the more technical aspects of the DBA function are typically performed within the data processing organization.

Matrix organization. This structure is patterned after the aerospace industry, where a given project draws upon all functional areas. In this case, the DBA staff would report functionally to DBA but would also report directly to a project manager. This organizational strategy has the advantages of recognizing the integration required for a data base, putting DBA at an equal level with other functional areas, and increasing communication during application development.

Within the DBA function, the two basic organizational strategies are functional specialization and application area specialization.

Functional Specialization. This strategy organizes DBA according to functions performed, such as data base design, performance monitoring, data dictionary, and so on. This approach has the disadvantage of ensuring that no one person is knowledgeable about all aspects of DBA support for a particular application system.

Application Area Specialization. In this approach, one person within DBA is responsible for performing all DBA functions for a particular application area, including both application development and operation. This approach has the disadvantage of developing expertise within the functional areas of DBA more slowly. Furthermore, unless controlled, activities within DBA may become fragmented. However, this approach results in an interesting and challenging job and facilitates attracting and keeping capable personnel.

Applications development. Conversion to a data base environment will affect the applications development function within the data processing organization in several ways. The most fundamental impact upon applications development will be the change from an applications orientation to a data orientation. Conversion to a data base environment should also broaden the scope of the application developers. Specifically, the developers need to understand the basic business processes and to develop application systems that cross organizational boundaries.

The application development methodology will have to be modified by delimiting the relative responsibilities of DBA and applications development. Moreover, the basic approach to application development may be revolutionized as a result of conversion to a DBMS. Specifically, instead of a rigorous approach to application development, the DBMS may permit an iterative, or convergence, approach. With this approach, user requirements are not defined in detail before developing the application system. Rather, user requirements are defined at a more general level, and a system is quickly built using the DBMS. When presented with the system outputs, the user specifies any required changes, which are then incorporated into the system. This process is repeated until the application system satisfies user needs. Note that this approach to application development requires a DBMS in which data base definition, creation, and redefinition and report writing are quickly and easily accomplished.

Computer operations. Conversion to a data base environment will impact the computer operations function in two ways. First, many of the responsibilities of the computer operations function will be transferred to the newly established DBA function. Second, the characteristics of the application systems may change. Specifically, the DBMS may facilitate the development and operation of on-line applications as opposed to the more traditional batch systems. Consequently, the computer operations function may have to reorganize to operate within this more dynamic environment.

Technical Aspects -- Tools and Methodologies. Because the subject of DBMS selection has been covered adequately in the literature, it was not addressed by this panel. However, the following are some of the tools and methodologies typically required in making effective use of the DBMS after its installation:

Data dictionary/directory. A tool for organizing, documenting, inventorying, and controlling data. It provides for a more comprehensive definition of data than is possible in the DDL facility of most commercial DBMS's. As such, it is essential for management of data as a resource.

Data base design and validation tools. Used to facilitate the design process and to validate the resultant design prior to programming. Included in this category are such tools as hashing algorithm analyzers and data base simulation techniques.

Performance monitoring tools. Useful in analyzing and tuning the physical data base structure. These tools provide statistics on data base usage and operation.

Application development tools. Used to facilitate the development of application systems, including such tools as terminal simulators which operate in batch mode and test data base generators.

Data base storage structure validation utilities. Used to verify that a stored data base conforms to its definition or to assess the extent of damage to a damaged data base. An example is a "chain walker" utility.

Query/report writer facility. Enables users to access the data base and extract data without having to write a procedural program in a conventional programming language.

Data base design methodology. Needed to standardize the approach to data base design and to provide guidance in using the data base design, modeling, and monitoring tools.

Application development methodology. Specifies the standardized approach to developing application systems; i.e., the activities to be performed during the development process and the corresponding roles and responsibilities of each of the various project participants. Of particular importance is the need to define the points in the development process at which the DBA and applications development functions must interface and the relative responsibilities of each with respect to application development.


Documentation methodologies. Needed by DBA to document data definitions uniformly and to document data base design decisions.
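The data dictionary/directory described above can be illustrated with a minimal in-memory sketch. The element names, the attributes recorded, and the rule that redefinition must go through DBA are assumptions for illustration; the point is only that a dictionary captures more about a data element (meaning, owner, usage) than a DDL can.

```python
# A minimal in-memory data dictionary: each entry records more about
# a data element than a DBMS's DDL facility can express.
dictionary = {}

def define(name, datatype, meaning, owner, used_by):
    """Register a data element; redefinition is a controlled DBA act."""
    if name in dictionary:
        raise ValueError(f"{name} already defined; redefine through DBA")
    dictionary[name] = {
        "datatype": datatype,
        "meaning": meaning,
        "owner": owner,
        "used_by": list(used_by),
    }

def where_used(name):
    """Impact analysis: which application systems touch this element?"""
    return dictionary[name]["used_by"]

# Hypothetical element shared by two application systems.
define("EMP-ID", "CHAR(6)", "Unique employee identifier",
       owner="Personnel", used_by=["payroll", "pension"])
```

Even this toy version supports the two uses the section emphasizes: a comprehensive definition of each element, and an inventory of where data is used.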

Data Processing Personnel Skill Requirements. The impact of conversion to a data base environment on skill requirements will be considered in this section.

Types of skills. The following types of skills are needed in a data base environment:

Data Base Administration. DBA should be staffed with individuals who are strong technically, interface well with people, and collectively are knowledgeable about the DBMS itself, the tools necessary to support it, the application development process, and the corporation and its data.

Logical data base design. Within DBA there is a need for individuals possessing the ability to recognize and catalog data elements, to group related data elements, to identify relationships between groups, and to use the data description language.

Physical data base design. Within DBA there is a need for individuals knowledgeable with respect to organization techniques, data compression, trade-offs in data base design, simulation, and modeling techniques.

DML programming. DBA should include individuals with knowledge of the DML and its associated host language, data base navigation, and the currency concept.

Acquisition and training. Obviously, the required skills may be developed internally or acquired externally. Hiring the required personnel has the advantage of bringing experience and new ideas into the data processing organization. However, individuals knowledgeable with respect to DBMS's are scarce and hence expensive. Moreover, individuals brought in from the outside typically have little, if any, knowledge of the business.

Developing skills internally has the advantage of building DBMS skills on top of knowledge of the business. Furthermore, control can be exercised over what is learned and when. Finally, it is generally less expensive and disruptive than hiring.


When skills are developed internally, there are several possible approaches to training:

In-house. In this approach, staff personnel possessing the necessary skills teach these skills to others by means of courses or joint projects. This approach may fit well with initial application development and has no cash cost. Moreover, the mere act of having to teach their skills to others enhances the knowledge and understanding of the teachers themselves. There are several disadvantages to this approach: it requires the time of the most capable personnel when they may be more effectively used elsewhere; it cannot be used where the required skills do not exist internally; and, as a closed system, it excludes differing points of view.

Vendor. This approach utilizes the courses offered by DBMS and support software vendors. Vendor courses may be a relatively inexpensive approach, particularly when courses are bundled as part of the purchase/lease price. Furthermore, the internal staff are likely to benefit from the expertise of the vendor. However, the courses may be only a thinly-disguised sales pitch.

Other approaches. Additional approaches to training include:

- independent educational organizations

- colleges or universities

- videotape/cassette courses

Turnover. Conversion to a data base environment may result in employee turnover. The new DBMS may be perceived by the staff as threatening and, hence, may be resisted. This resistance to change may be overcome somewhat by involving the staff in the series of decisions leading to the acquisition of a DBMS. If required skills are obtained through hiring, the existing employees are likely to resent the high salaries paid to the new employees. Finally, as the skills of the staff increase, so does their market value, and it becomes increasingly expensive to retain the staff. These three factors -- resistance to change, resentment of new hires, and increased employee market value -- tend to increase turnover following conversion to a DBMS environment.


On the other hand, certain factors tend to decrease turnover. Specifically, conversion to a data base environment involves new opportunities for individual growth and excitement, such as new technology, new hardware and software, and major development efforts. Properly exploited, these factors can increase job satisfaction and correspondingly decrease turnover.

3.2.3 Impact On Planning and Control Systems. With the exception of the chargeout mechanism, conversion to a data base environment will not affect the basic mechanisms for planning and control. However, recognize that conversion is itself a process to be managed. This entails applying justification procedures for conversion, planning the conversion, establishing review and approval checkpoints, and monitoring progress.

Planning the Conversion. Planning for the conversion requires the involvement of senior, user, and data processing management:

Attempts to convert to a data base environment without senior management support run a high risk of failure. If senior management has not formally authorized DBMS studies or incorporated DBMS planning into corporate plans, the probability of successful conversion is remote.

Conversion will have a significant impact on user departments in the form of disruption of normal data processing services, restructuring of application systems, and a change in orientation on the part of users from ownership to sharing of data. Consequently, user involvement in planning the conversion is critical.

Conversion to a DBMS generally affects the structure, system development methodology, personnel skill requirements, and hardware/software configuration of the data processing function. The lead time necessary to develop the appropriate infrastructure for operating in a data base environment must be appreciated and planned for accordingly.

Given senior management support for the conversion, one strategy for obtaining the required involvement in the planning process is to establish a steering committee for the data base, as mentioned in the earlier section on maintenance moratoriums. This steering committee contains representatives from both user departments and data processing and is responsible for controlling the evolution of the data base. As such, the Data Base Steering Committee is subordinate to the data processing Strategic Steering Committee, which is concerned with the evolution of the entire data processing function within the enterprise.

Given the appropriate participation, a necessary first step in converting from a non-data base to a data base environment is the development of an architectural plan for the data base. This plan describes the intended structure of the target data base. Conceptually, a data base represents a model or image of the organization which it serves. In order for the data base to represent an accurate image of the organization, it is necessary for the data base structure to reflect the fundamental business processes performed in the organization. Consequently, the designers of the data base must first understand the key decisions and activities required to manage and administer the resources and operations of the enterprise. This typically entails a cross-functional study of the enterprise in order to identify the business processes and information needs of the various user departments.

The architectural plan permits planning and scheduling the migration of application programs, manual procedures, and people to a data base environment. This implementation plan must incorporate review and approval checkpoints that enable management to control and monitor the conversion process.

Controlling the Conversion. The actual conversion to a data base environment is effected by a project team composed of representatives from user departments, applications development, and data base administration. At formally established checkpoints during the conversion, the data base steering committee reviews the progress of the project. Items reviewed and analyzed include the following:

Projected benefits vs. actual benefits

Data quality (i.e., completeness, timeliness, and availability)

Projected operating and development costs vs. actual costs

Actual costs of collecting, maintaining, and storing data vs. benefits realized


Project performance (i.e., performance of the project team against the conversion schedule and budget)

Based on the review, the data base steering committee takes the appropriate approval action (e.g., go/no go) with respect to the conversion activity.

Chargeback Considerations. The costs of operating in a data base (i.e., shared data) environment are extremely difficult to charge back to individual users in an equitable manner. At best, complex job accounting systems can only approximate actual resource usage. Moreover, the chargeback algorithm must not be dysfunctional with respect to its impact on the various user departments. Conversion to a data base environment frequently requires that a user department supply data which it does not itself use. The chargeback algorithm must reward, not penalize, such behavior on the part of the user department.

Some considerations in developing an appropriate chargeback algorithm include the following:

Consider capitalizing the costs of conversion instead of treating such costs as current expense, in order to avoid inhibiting user departments from undergoing the conversion.

User departments typically have little control over the costs of conversion. Consequently, consider treating such costs as unallocated overhead, since allocation will have little effect on the decisions or efficiency of the user departments.

Because ongoing costs of collecting, maintaining, and storing data are difficult to associate with individual users, consider developing percentage allocation factors for these costs based on periodic reviews of data base usage. Alternatively, consider treating these costs as overhead.

Resource usage for retrieval and processing purposes is easier to approximate, and such costs should be charged directly to the users.

Consider incorporating a reverse charging mechanism into the chargeback algorithm in order to compensate users who supply data which they do not use.
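A chargeback pass with such a reverse-charging credit can be sketched as follows. The rates, usage figures, and department names are invented for illustration; only the principle (charge retrieval directly, credit suppliers of shared data) comes from the considerations above.

```python
# Illustrative rates; a real scheme would come from periodic DBA review.
RETRIEVAL_RATE = 0.02   # dollars per record retrieved (direct charge)
SUPPLY_CREDIT = 0.01    # credit per record supplied for others' use

def monthly_charges(usage):
    """usage: dept -> {"retrieved": n, "supplied": m}.
    Retrieval is charged directly; supplying shared data earns a
    credit, so data providers are rewarded rather than penalized."""
    bills = {}
    for dept, u in usage.items():
        bills[dept] = round(
            u.get("retrieved", 0) * RETRIEVAL_RATE
            - u.get("supplied", 0) * SUPPLY_CREDIT, 2)
    return bills

bills = monthly_charges({
    "marketing": {"retrieved": 5000},                  # heavy reader
    "personnel": {"retrieved": 200, "supplied": 3000}, # mostly a supplier
})
```

Under these assumed rates, the supplying department ends the month with a net credit, which is exactly the incentive the reverse-charging consideration calls for.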

3.2.4 Impact of Conversion On User Awareness. The most fundamental impact of conversion to a data base environment is the required change in orientation on the part of users. No longer are files and applications "owned" by a particular user department. Rather, data must be viewed as a corporate resource to be shared by all user departments. The requirement for sharing constrains the freedom of a user to change the definition of the data arbitrarily and unilaterally.

Sharing of data will impact users in a second way. Following conversion to a data base environment, users may be required to supply data which they themselves do not use. As already discussed, the chargeback algorithm must be structured to reward such behavior. Furthermore, suppliers of data in general must be infused with a sense of responsibility (not ownership) for the data in order to maintain data quality.

Conversion to a data base environment may impact the user community in other ways:

Planning the conversion. User participation in developing both the architectural plan and the implementation plan is necessary in order to obtain user commitment and to ensure that the resultant data base satisfies user needs.

Disruption of operations. Normal data processing services are likely to be severely disrupted as a result of such factors as limited availability of personnel and maintenance moratoriums. Furthermore, the conversion may disrupt and strain user operations as new and old applications are operated in parallel.

Resolution of inconsistencies. Creation of the data base typically entails merging of application-oriented files. During this process, inconsistencies in both data definitions and data values are identified. These inconsistencies must then be resolved by the relevant user departments.

Structure of user department. The structure of a user department may no longer be effective following conversion; e.g., the user department may be designed around a particular application system. Restructuring application systems during conversion may precipitate user reorganization.


New organizational roles. Conversion may cause new organizational roles to evolve in user departments. For example, in order to provide coordination between Data Base Administration and the user departments, a "user data administrator" may evolve. The user data administrator serves as the focal point for participation and involvement on the part of the user department both during and subsequent to the conversion.

Systems analysis. By providing such tools as a high-level query language and/or a generalized report writer, the DBMS enables non-technical users to access the data base directly; i.e., users are less dependent upon programmers to satisfy simple requests for information. This increased availability of data may result in the migration of the systems analysis function from data processing to user departments.

Personnel skill requirements. Conversion may impact the skill requirements of user personnel. For example, conversion to a data base environment may also result in operating certain application systems on-line, requiring that the user department acquire or develop terminal operations skills.

3.3 MINIMIZING THE IMPACT OF FUTURE CONVERSIONS

The initial conversion to a data base environment is not likely to be the only data base-related conversion that an enterprise will undergo. Rather, conversions of one form or another are likely to be a way of life. However, there are certain measures that the data processing organization can take to minimize the impact of future conversions, including:

Institutionalization of the Data Base Administration function.

Insulation of programs from a particular DBMS.

DBMS independent data base design.


3.3.1 Institutionalization of the DBA Function. A well-established DBA function will minimize the impact of future data base conversions. Specifically, the DBA function should take action as follows:


Maintain data definitions and relationships in an up-to-date data dictionary.

Document the structure and contents of all data bases independently of the data description language of the DBMS.

Develop methodologies and standards for data base design which are independent of any particular DBMS.

3.3.2 DBMS Independent Data Base Design. The recent work of the ANSI/X3/SPARC Study Group on DBMS introduced the idea of a conceptual data model. The topic has also been pursued extensively in research papers. In practical terms, it amounts to design and documentation of the data base in a form independent of any particular DBMS. In translating the conceptual model to the data description language of the DBMS selected for implementation, the design decisions which are predicated on the characteristics of the DBMS are more clearly distinguishable from the natural structure of the data. Should the DBMS be changed subsequent to implementation, it is possible to focus more clearly on the structural conversions required of the data base and the applications accessing the data base. As a point of interest, use of the conceptual data model is appropriate whether a DBMS or conventional files are to be used.
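The separation this section describes, a DBMS-independent model translated to a vendor DDL in a distinct step, can be sketched as follows. The tiny model, the type names, and the SQL-like target syntax are all assumptions for illustration.

```python
# The conceptual model records only the natural structure of the data,
# independent of any DBMS.
conceptual_model = {
    "EMPLOYEE": {
        "attributes": {"EMP_ID": "identifier", "NAME": "text"},
        "related_to": ["DEPARTMENT"],
    },
    "DEPARTMENT": {
        "attributes": {"DEPT_ID": "identifier", "BUDGET": "money"},
        "related_to": [],
    },
}

# DBMS-specific decisions live only in the translator; changing the
# DBMS later means rewriting this mapping, not the conceptual model.
TYPE_MAP = {"identifier": "CHAR(8)", "text": "VARCHAR(30)",
            "money": "DECIMAL(12,2)"}

def to_ddl(model):
    """Translate the conceptual model into DDL for one target DBMS."""
    stmts = []
    for entity, spec in model.items():
        cols = ", ".join(f"{attr} {TYPE_MAP[t]}"
                         for attr, t in spec["attributes"].items())
        stmts.append(f"CREATE TABLE {entity} ({cols})")
    return stmts

ddl = to_ddl(conceptual_model)
```

Because all DBMS-predicated decisions are confined to the translation step, a later change of DBMS requires a new translator, while the documented natural structure of the data is untouched.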

3.3.3 Insulate Programs From the DBMS. Two sets of circumstances may motivate an organization to attempt to insulate its application programs from any one DBMS. On the one hand, an organization may be unwilling to commit completely to the use of a particular DBMS. Rather, it may desire to keep its options open with respect to converting to a different DBMS at a later date. Alternatively, a large multi-division corporation may desire to develop common application systems for the divisions, yet it may find that the data processing organizations within autonomous divisions have installed different DBMS's.

It is possible to build an interface between application programs and the DBMS in order to isolate the programs from the DBMS. That is, the programs do not interact directly with the DBMS. Rather, standard program requests for DBMS services are translated into the required DML statements either at compilation time or at execution time. Thus, application programs are insulated from changes in the DBMS as long as an interface module can be developed to translate program requests into the DML statements of the new DBMS. Similarly, a single application will execute under any number of DBMS's as long as the appropriate interface modules exist. Furthermore, the multi-division corporation retains the flexibility of standardizing on a single DBMS at a future date. The negative aspects of this approach include reduced system efficiency and the possibility of ending up with a pseudo-DBMS whose capabilities represent the lowest common denominator of the various DBMS's for which interfaces are built or planned. Nevertheless, a number of corporations worldwide have adopted or are adopting this approach.
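The interface-module idea can be sketched as a simple adapter layer. Both "DBMS" back ends and their DML syntaxes below are stand-ins invented for illustration; the structure, not the syntax, is the point.

```python
# Application code issues standard requests; a per-DBMS adapter
# translates them into that system's DML.
class Dbms1Adapter:
    def translate(self, request):
        op, record = request
        return f"GET {record}" if op == "read" else f"STORE {record}"

class Dbms2Adapter:
    def translate(self, request):
        op, record = request
        return f"FIND {record}" if op == "read" else f"INSERT {record}"

class DataInterface:
    """The only component that knows which DBMS is installed."""
    def __init__(self, adapter):
        self.adapter = adapter

    def submit(self, op, record):
        return self.adapter.translate((op, record))

# The application request is unchanged when the installed DBMS changes;
# only the adapter handed to the interface differs.
app_request = ("read", "EMPLOYEE")
dml_old = DataInterface(Dbms1Adapter()).submit(*app_request)
dml_new = DataInterface(Dbms2Adapter()).submit(*app_request)
```

The lowest-common-denominator risk the section mentions shows up here directly: the interface can offer only operations that every adapter can translate.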

3.4 CONVERSION FROM ONE DBMS TO ANOTHER DBMS

As a data processing organization goes through the experiential learning necessary to assimilate data base technology, the functions and features of the data base management system package currently installed will tend to be more highly utilized. Users will have positive experiences with the facilities offered by the DBMS and will subsequently place greater burdens on those facilities. Also, the technical capabilities of the DBMS will be increasingly utilized by the data processing staff in order to meet user requirements.

In short, the tendency to use the full functions of the DBMS over time will place a strain on the capabilities of the DBMS. This is manifested by either decreasing systems processing efficiency or increasing effort necessary to develop systems which meet user needs. These increased costs are recognized by both users and data processing personnel, who then initiate a search for increased DBMS capabilities and thus begin a data base conversion effort.

This second type of data base conversion can be characterized by either a complete change in data base management system packages or an upgrade in the version of the DBMS currently installed. This section discusses the impact of the conversion from one data base system environment to another on each of the four growth processes previously discussed. The section is organized as follows:

Reasons to go through the conversion.

Economic considerations of the conversion effort.

Conversion activities and their impacts.

Developing a strategy for the conversion.

3.4.1 Reasons for Conversion. As has been previously discussed, the most prevalent reason to undertake a conversion from one DBMS to another (a DBMS-1 to DBMS-2 conversion) is to install a better DBMS. A better DBMS is usually defined as having:

Improved functions (more complete).

Better performance.

Improved query capability.

Development of richer data structures.

More efficient usage of the computer resource through decreased cycles and/or space.

Improved or added communication functions.

Availability of transaction processing.

Distributed processing capability.

Another major reason to undertake a DBMS-1 to DBMS-2 conversion is to standardize DBMS usage within the company. Many large corporations are finding that the DBMS selections made several years ago to meet specific application needs have resulted in the installation of several DBMS packages within the data processing organization. The impact of multi-DBMS usage in a single data processing environment is major. For example:

Application programs are constrained to the design and processing characteristics unique to each DBMS.

Data files are structured to be accessed by a single DBMS.

Design and programming personnel develop the skills necessary to implement systems associated with a single DBMS.

The multi-DBMS environment results in a substantial investment in data processing personnel technical skills and reduces the potential for integrating applications that operate on different DBMS's.

For these reasons many companies are now developing standards for data base management system usage. Those standards usually require application systems to be developed under a single DBMS. Exceptions may exist where the application to be developed is stand-alone in nature with a low potential for integration with other systems.

The last major reason for DBMS-1 to DBMS-2 conversion is that such a conversion is dictated by a hardware change.

Many of the commercially available DBMS's are offered by large mainframe vendors. As such, a move from one hardware vendor to another will necessitate a change in DBMS usage. This can become quite a complex effort in that the source code and data base storage structures of all programs will require changes. If there is a history of hardware conversions in the company, the wise data processing manager should select a DBMS that is not hardware dependent.

3.4.2 Economic Considerations. A key concept that was introduced in the previous section is that the data base conversion effort should be analyzed and managed like any other high-risk systems project. The same concept applies to a DBMS-1 to DBMS-2 conversion effort. As a result, the DBMS-1 to DBMS-2 conversion should be justified on the same basis as any other systems development effort is justified in the company. An economic justification should be made on the basis of costs and benefits associated with the data base conversion. The economic justification is particularly important if the major reason for the data base conversion is either a better DBMS or standardization of DBMS usage. For a hardware change, the costs and benefits associated with the DBMS conversion should be included in the justification for the hardware change.

The economic justification for a DBMS-1 to DBMS-2 conversion should be based on a succinct articulation of the COSTS and BENEFITS directly associated with the conversion effort. Costs should be identified on an incremental basis and be classified into three categories:

One-time conversion costs.

Incremental costs for each planned application to be converted to the new DBMS.

On-going DBMS support costs.

Benefits should likewise be identified on an incremental basis within the same time frames as the associated costs. Benefits are divided into two categories:

Discernible/definable cost savings in development, maintenance, and operations.


Intangible cost savings.
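As a rough illustration of how these incremental costs and benefits might be netted against one another, the sketch below (in modern terms) follows the category structure above; all figures and the function itself are hypothetical assumptions, not drawn from the report.

```python
# Hypothetical sketch: netting incremental conversion costs against benefits
# over a planning horizon. Only the category structure follows the text above;
# every figure here is invented for illustration.

def conversion_net_benefit(one_time, per_app, apps, annual_support,
                           annual_savings, intangible, years):
    """Net benefit of a DBMS-1 to DBMS-2 conversion over `years`."""
    total_costs = one_time + per_app * apps + annual_support * years
    total_benefits = (annual_savings + intangible) * years
    return total_benefits - total_costs

# Example: 5-year horizon, 12 applications to convert.
net = conversion_net_benefit(
    one_time=250_000,       # one-time conversion costs
    per_app=15_000,         # incremental cost per converted application
    apps=12,
    annual_support=40_000,  # on-going DBMS support costs
    annual_savings=120_000, # discernible savings (development/maintenance/ops)
    intangible=30_000,      # estimated value of intangible savings
    years=5,
)
print(f"Projected net benefit: ${net:,}")
```

A negative result under such a tabulation would argue against justifying the conversion on economic grounds alone.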

A more complete description of the types of economic considerations to be addressed is contained in DATA BASE DIRECTIONS: THE NEXT STEPS, National Bureau of Standards Special Publication 451.

Above all, the justification for a data base conversion should be developed and communicated to management in the same manner that any other project is justified.

3.4.3 Conversion Activities and Their Impact. The impact of a DBMS-1 to DBMS-2 conversion effort can be felt on each of the four growth processes previously discussed. Many of the types of impacts are the same as those previously identified in a conversion to data base technology. Users and data processing management should recognize that many of the same experiential learning processes occur in subsequent data base conversions as they do in the initial effort.

Impact On Application Portfolio. The impact of subsequent data base conversions on application portfolios occurs in three areas:

Application programs.

Data bases.

Catalogued modules.

Application programs. Because application programs are buffered from actual data storage structures by the DBMS, the unique characteristics of each DBMS will have a major impact on application programs in the following areas:

DBMS "call" structures.

Program's view of data and mappings (model, structure, content).

Application program logic.

Data communications.

Data bases. Physical data storage structures and logical data relationships are implemented via unique DBMS utilities and are patterned after distinct DBMS requirements. As such, data bases developed under one DBMS are not readily accessible by other DBMS packages. Specifically, data bases are impacted by the vagaries of data base management systems in the following ways:

Data base definitions in both the DBMS and in the Data Dictionary.

Data content and storage format.

Use of data base design and simulation aids.

Conversion aids.

Catalogued modules. Though processed just as any other program, catalogued modules differ from application programs in their function and method of development. The specific types of catalogued modules which are impacted by a change in DBMS are:

Catalogued queries.

Catalogued report definitions.

Catalogued transaction definitions.

Impact On the Data Processing Organization. As was previously discussed in the section on conversion to the data base environment, the major impact on the data processing organization structure is the implementation of the Data Base Administration organization. Because this DBA structure has already been integrated into the data processing environment during the initial data base experience, the conversion from one DBMS to another will not have a major impact on it. Only procedural fine-tuning will be required as the functions of the DBMS change. However, it should be recognized that a substantial learning curve will likely exist as the new DBMS technology is assimilated by DBA personnel. Other organizational structure changes in the data processing environment as a result of the DBMS-1 to DBMS-2 conversion will be minimal.

The major organizational impact throughout both data processing and user areas is likely to be in the technical and functional education required before, during, and after the conversion effort. Data processing and user personnel in all areas of systems development and operation will have to be trained on the new aspects of the DBMS. Training programs for all people should be identified and initiated in advance of the conversion implementation.

Another major area of impact on the data processing organization from a data base conversion is the modifications in documentation necessary to accommodate the new DBMS environment. Changes in documentation will occur in the following areas:


DBMS functional and technical support (reference) documentation.

Functional and technical descriptions of any application systems converted onto the new DBMS.

Physical and logical descriptions of any data bases converted.

User-oriented descriptions of application systems processing characteristics.

System development methodology documentation that references particular aspects of data base or application development.

Impact On Data Processing Management Control Systems.

As previously discussed, Data Processing Management Control Systems comprise those sets of procedures regularly used to control both systems development and operations functions. The conversion from one DBMS to another is not going to modify the conceptual framework used to control the data processing environment. However, specific changes will affect the mechanics of control:

Data processing budgeting and user chargeback. The chargeback algorithm used to charge users for data processing services is likely to change because of modifications in:

DBMS overhead (cycles).

DBMS space requirements.

methods of implementing logical relationships.

ownership of data items.

differences in efforts required to design and implement application systems.

differences in methods used to structure ad-hoc queries and periodic reports.

methods of charging end-user cost centers for the one-time costs of conversion. The time-frame of allocating these charges can also be important (one lump sum vs. periodic payments).
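The sensitivity of a chargeback algorithm to such modifications can be sketched as follows; the rate structure, overhead factors, and usage figures are all hypothetical assumptions for illustration, not from the report.

```python
# Hypothetical chargeback sketch: a user's monthly charge as a function of
# cycles consumed and space occupied. The point is that a new DBMS with a
# different overhead factor changes the bill even when the application
# workload itself is unchanged.

def monthly_charge(cpu_seconds, pages_stored, rate_cpu, rate_page,
                   dbms_overhead=1.0):
    """Charge = metered usage, with CPU inflated by the DBMS overhead factor."""
    return (cpu_seconds * rate_cpu * dbms_overhead) + (pages_stored * rate_page)

workload = dict(cpu_seconds=3_600, pages_stored=50_000)

# The same workload billed under two DBMS's with different overhead.
old = monthly_charge(**workload, rate_cpu=0.05, rate_page=0.002, dbms_overhead=1.30)
new = monthly_charge(**workload, rate_cpu=0.05, rate_page=0.002, dbms_overhead=1.10)
print(f"old DBMS: ${old:.2f}  new DBMS: ${new:.2f}")
```

In practice the rates themselves, not just the overhead, would differ between the two systems.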

Systems development methodology.


There will be changes in the time frame and types of effort required in systems development.

Conceptual approach to developing systems may change due to total effort or time frame required to generate sample reports on test data bases.

Design procedures in the methodology are not likely to change if DBMS facilities are similar; only jargon should change in documentation.

Data processing performance measurement.

Systems development and operations standards by which data processing personnel are regularly measured should change due to new functions and especially to a new learning curve.

Computer operations performance monitors and standards will change due to new processing technology.

Integrity control.

Adequate data base backup should be carefully analyzed and managed during the conversion process.

Operational restart/recovery procedures will change due to new DBMS functions or utilities.

Processing of data exceptions may differ.

Security control.

Differences in methods of data access security should be recognized.

Data manipulation restrictions may vary from DBMS-1 to DBMS-2.

Privacy control.

Where appropriate, special care should be taken that all privacy disclosures are logged during a data base conversion per recent government regulations.

Impact On User Areas. A keynote of data base conversions is that the conversion should be as transparent as possible to user areas. This axiom holds that the processing impact on user areas should be held to a minimum and that the necessary technical capabilities to support the conversion should reside in the data processing area.


The impact of the data base conversion effort should be readily apparent to users regarding:

Functional improvements (e.g., new query language).

Increased data content (e.g., "while we are changing, let's add ...").

Increase in sharing of data will highlight data inconsistencies, validations, and format errors.

Possible planned disruption of services during the conversion period.

Data ownership changes.

User mental images or expectations may change.

Archival data capabilities may change (e.g., meeting the needs of IRS, EEO, etc.).

3.4.4 Developing a Conversion Strategy. The well-managed data processing installation should carefully articulate a data base conversion strategy and plan before initiating any conversion effort. Specifically, the data processing management personnel should do the following:

Determine specific conversion objectives.

Analyze pros and cons and develop a memo of rationale.

Develop a conversion strategy.

Develop a detailed plan for procedures, data, and programs.

The following is a list of possible strategies which may be undertaken:

Unbundle conversion of procedures, programs, and data (conversion may affect all).

Build a bridge from old programs to new data structures.

Convert non-vital programs first.

Migrate data and program conversion as the maintenance threshold is approached.


Consider bringing in outsiders to convert procedures

Run parallel for vital systems:

- frustrated users may be a problem.

- you will probably be turning up old bugs.

- parallel may be a security blanket only.

Avoid parallel running where possible by careful planning and a phased cutover or "fast" cutover with a fall-back contingency plan, but note the possibility of high-risk exposure.

Map how all users use the shared resources.

Split input stream between old and new applications and plan validity checks carefully.

Use DBMS facilities such as dumping/loading and mapping, if available.

Convert data all at once, then convert programs as needed.

Restructure data if necessary before any program conversion.

Benchmark first.
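The "bridge" strategy above can be sketched as a thin adapter layer: unconverted programs keep issuing their original calls, and the bridge translates them against the new data structures. Everything in this sketch, including the call name, key format, and record layout, is a hypothetical illustration rather than any actual DBMS interface.

```python
# Hypothetical bridge sketch: old programs call get_record(key) with a flat
# 8-character key; the new structure is keyed by (region, account) tuples.
# The bridge preserves the old call signature so programs can be migrated
# to the new structures one at a time.

# New data structure, keyed by (region, account).
new_store = {
    ("EAST", "1001"): {"name": "Acme", "balance": 250},
    ("WEST", "1002"): {"name": "Apex", "balance": 975},
}

def get_record(old_key):
    """Old-style call: a single key of the form 'RRRRAAAA' (region + account)."""
    region, account = old_key[:4].strip(), old_key[4:].strip()
    return new_store.get((region, account))

# An unconverted program keeps using its old key format.
rec = get_record("EAST1001")
print(rec["name"])
```

The cost of such a bridge is the reduced efficiency noted earlier for interface layers, which is why it is usually a transitional measure rather than a destination.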

Once the appropriate conversion strategy is determined, a detailed conversion plan should be developed. The following is a list of considerations to reference when developing a conversion plan:

Develop a work plan as in any other systems development effort.

Append original feasibility documents and review in the light of the latest detailed specifications for conversion.

Re-evaluate "Go/No Go" at pre-defined checkpoints in the conversion process.

Subject the conversion process to standard project management control.


Develop a contingency plan.

Use the planning document as an education tool for project personnel.

Evaluate and select conversion aids.

Flesh out impacts (see section on Conversion Activities and their Impact above) into manageable tasks. Schedule these tasks and establish a reasonable work breakdown structure.

Go through the planning document with an implementation hat on.

Control the project on at least a weekly basis.

Plan for heavy user involvement in data translation to resolve inconsistencies and consolidate validation checks.

Involve external and internal audit staffs.

3.5 SUMMARY

In summary, the panel on Establishing Management Objectives analyzed the impact of a DBMS conversion in terms of the four growth processes along which the D.P. function evolves, namely:

the portfolio of computer applications

the D.P. organization and its technical capabilities

the D.P. planning and management control systems

the user

Two types of DBMS conversions were addressed: (1) the initial conversion to a data base environment, and (2) the conversion from one DBMS to a second DBMS.

Four key concepts summarize the findings of the panel:

Conversion to a data base environment is primarily a question of how soon the data base environment should begin to be constructed, not whether a data base environment should be implemented.

Because the initial data base application represents an important learning experience for the entire organization, great care must be exercised in selecting the entry-level data base application.

The exposure to risk associated with a DBMS conversion is minimized by managing the conversion as any other large systems project.

An organization will likely be involved in multiple DBMS-related conversions, from the initial conversion of conventional applications to data base technology to the conversion of applications from one DBMS to a second DBMS. Accordingly, the D.P. organization should plan and structure for future DBMS conversions now in order to minimize their impact.

In closing, appreciate the technology involved, but recognize that a data base conversion is a management problem.



4. ACTUAL CONVERSION EXPERIENCES

James H. Burrows

CHAIRMAN

Biographical Sketch

James H. Burrows is Director of the Institute for Computer Sciences and Technology within the National Bureau of Standards. At the time of the workshop he served as Associate Director of Data Automation, USAF, and was responsible through the Director for all data automation matters within the Air Force. He directed, controlled, and managed the Air Force's overall data automation program. Prior to that position, he was Technical Director for the Command and Information Systems Division and Department Head for the Computer Applications Department at MITRE.

He is a graduate of the United States Military Academy and has degrees from MIT and the University of Chicago. He is a member of ACM, the Institute of Management Sciences, AAAS, and the American Society of Public Administration.

Participants

Edward Arvel
Michael Carter
Joseph Collica
Elizabeth Courte
Ahron David
Ruth Dyke
Halaine Maccabee
Steven Merritt
Alfred Sorkowitz


4.1 INTRODUCTION

The members of this panel reviewed their experiences under various conversion scenarios to identify the critical factors which led to success or led to delays and failure. The panel felt that the resultant list would benefit managers facing data base conversion projects. While it would have been noteworthy if some set of tools, methodologies, etc. had emerged from the experience of the group which would make any future conversions easy and riskless, the panel found none. In the panel's opinion, a successful conversion takes tedious preplanning and careful execution; and in the current state of practice no known panacea exists.

This panel will not try to tell you how to justify data base conversion but will give its best advice on what to consider when approaching the conversion task. The panel began its considerations with the assumption that the justification had been previously determined. The panel agreed that its experience proved conversion justified. In fact, for some panel members conversion was unavoidable. Others felt they had demonstrated the value of conversion to DBMS but with some reservations. In either case, their experience may help you.

TEN CONVERSION EXPERIENCES

Scenarios:

    Manual to DBMS    2
    File to DBMS      6  (1 involved machine change)
    DBMS1 to DBMS2    2  (1 involved machine change)

Hardware:

    IBM - Univac - Honeywell

DBMS Packages:

    IMS - TOTAL - DMS 1100 - S2K - IDS I & II

Table 1: Conversion Experiences


The panel consisted of practitioners, people who fought in the trenches and who made or followed decisions to use DBMS techniques in actual situations--and who, in most cases, had to suffer the barrage of consequences.

The group reviewed only recent experiences, although several participants have been using the concepts and techniques since the early 60's. Table 1 presents a summary of the types of conversion scenarios reviewed, the machines involved, and the DBMS variety.

Specific details and guidelines that resulted from the review of these experiences during the workshop are detailed below in the section called, "ANNEX: CONVERSION EXPERIENCES."

4.2 PERSPECTIVES

During the course of its discussion, it became clear that the panel and the people with whom the panel dealt saw DBMS technology, its promises, and its threats differently. One of the simplest views sees a batch system with an on-line access method to allow queries and, possibly, on-line data-entry, e.g., a permanent operation using a leased national timesharing network for access during the day with overnight update on owned facilities. One might use this operational mode as an educational step in the process of selling and orienting the user and technical personnel to the power and first-level details of DBMS or as an intermediate step in transitioning from one DBMS to another. Another view saw the DBMS as a "super access" method, promoting fast response and reducing the manpower costs of responding to new requirements. A third view saw data base systems as the best way to permit ad hoc on-line access, and a fourth saw them providing multi-file processing with minimal expenditure of technical talent.

However, all of the above were considered subsidiary to a view that DBMS controls the growth of information in support of the corporate business activities--with less pain and suffering for all participants.

Many also see DBMS as a new sequence of buzz words and acronyms: schema, root segment, subschema, DDL, DML; a new mystique; a new barrier between the ADP community and the real world. As an industry, we may yet learn that grist for the Ph.D. mill and true payoff to the business world are not necessarily the same. But, the boss knows it and suspects these new "academically sponsored" techniques. Who else invents language sounding like that above?


And as usual, the new technology threatens the current technologists.

Many of those selling DBMS emphasize the "total" commitment to the integration possible using a DBMS. This convinces management that DBMS, more than a new risk, means "total" risk. Who would make a decision to so "expose" the company? Many managers hear about the attributes of DBMS that make it seem a new technology that looks good for everything. But managers have seen other new technologies with similar histories. They share with DBMS the following traits:

More variants seem to appear daily

No one seems to be in charge of standards, or even have a clue on how to start

What is available on the market is of uneven quality--ease of use, reliability, power, and vendor support seem to be random variables

No accepted guidelines on "how to use" and "what to avoid" in the new technology seem to exist. This disconcerts managers.

The technology conflicts with other forces. In DBMS the basic principle is this: the corporation should consider its data as a central, corporate resource, not a private resource of each local subunit. At the same time, it is clear to the large national and international corporations that local authority is the only way to inspire local responsibility.

Confusion over the fundamental purpose of the new technology. It is not clear whether the advocates of DBMS are technologists trying their best to meet requested/perceived needs or management specialists who see an opportunity to enforce central control. In any case, DBMS technology claims advantages for the user whether the corporation pursues centralized, decentralized, or distributed responsibility, authority, or data processing.


4.3 FINDINGS

The panel found management's attitude about data processing shifting to concentrate on the data aspects. Data is not a private resource for local exploitation. Therefore, management must commit to both plan for and control the use of the data.

The first commitment, to plan for the use of data across the corporate body, is clearly a new job that is beyond the purview of any single department and also beyond the authority usually vested in the data processing department. Anyone trying to do central planning faces a struggle because each private domain will resist change.

Not only a new agent, but also a new function is needed for central planning: Data Base Administration. Performed at two levels, this function may not be under a common manager. The first level is development/design oriented. This group designs the data base, makes available the appropriate tools and utilities, analyses the usage of the data base(s), and restructures the data base. This group also establishes the procedures for the second level, the operation-oriented Data Base Administrators.

These operation-oriented Data Base Administrators must deal with the physical aspects of the data base. Such old concepts as allocation of storage, protection, recovery, dumps, verification, etc., are performed in quite different ways under a DBMS.

Typical figures for the size of the DBA group may fall between 4 and 30. Only in very small installations is it a one-man job.

The panel unanimously warned against an overambitious project size for the initial data base application. This usually results in errors in cost and time estimates. The anxious user withdraws to a "wait and see" mode and announces changes in his needs as the project progresses. The too-large project will probably be unable to cope with such changes, even if the original goals are successfully delivered. All of this leads to alienation of the users and managers and a distinct hesitation to continue with the new application technology.

The panel noted the availability of packages at many levels. However, knowledge of how to best choose and use a data base system is in short supply in any organization about to enter the data base world. Help from both consultants and suppliers is available and should be used.


4.3.1 Industrial/Governmental Practices.

The deliberations on pre-planning and planning identified a dichotomy between private industry and the Federal Government in both the processes controlling and the results accomplished.

Industrial/Commercial (I/C) practice, when making decisions on hardware and support software (DBMS, telecommunications, etc. packages), requires a study of the alternatives followed by negotiations with acceptable suppliers. The suppliers would be asked for their help in determining how to use their system. I/C firms may insist that the DBMS and other packages come from an independent supplier of software.

This practice allows for a cooperatively developed proposal to be put before management; one in which each party to the proposal knows his role. More than one firm may be requested to make such a proposal. Outside consultants who are familiar with the various aspects of such a proposal may be used "to keep everybody honest." Such an evaluation may take nine months to a year, but significant preliminary design on how to use the to-be-supplied hardware/software system would have been done.

In the Federal Government practice, whose major procurement dictum is maintenance of competition, the selection of hardware is predicated on a functional specification. For systems to be implemented in-house, this implies a hardware functional specification to permit various subsystems of the hardware to come from different vendors. Specific details of support software cannot be specified if such specification gives special advantages to only one vendor or, alternately, eliminates too many competitors.

At best, this leads to a watered-down software specification that may not be strong enough to support the intent of the internal Federal Government user. At worst, this leads to a version of the specification so strong that no one can supply it as an off-the-shelf software system. Testing such a system will take place long after selection. Systems lacking extensive field use are notorious for being unstable and poorly matched to the job at hand. Examples of such systems are the WWMCCS and the ill-fated ALS. It also means that non-vendor developed support software is seldom bid by the equipment manufacturer.


In terms of time, this system of procurement produces a one- or two-year procurement process which requires a benchmark testing period oriented to some standard language. Such benchmarks cannot take special advantage of vendor-specific operating systems or DBMS's.

After the hardware is selected, the internal software group must do a preliminary detailed design, using the specific hardware/software selected, in order to determine whether the original requirements can be met. At best, this leads to an additional one- to two-year delay over industrial practice. At worst, it leads to a squabbling, contentious confrontation between two groups with even a possibility of law suits, threats, and ultimate abandonment of the original objectives.

Buying high technology from an adversary point of view rather than a mutually cooperating point of view leads inevitably to extensive delays and increased costs--if not to outright failure.

Note also that during the lag in the Federal Government procurement cycle, several things occur. The problem changes in scope. As noted in the keynote address, the industrial capacity required (stated not in dollars but in computation) grows 10-20 percent per year. The user gets discouraged and cancels; or worse, is given two more years to extract new promises from the internal ADP manager. In addition, while waiting for hardware/software selection, the internal ADP team must: 1) bide its time; 2) gamble on the winner and proceed concurrent with procurement; 3) disband; or 4) assist the user in making larger plans. The first is wasteful, the second is risky, the third introduces delay and confusion after selection, and the fourth leads to extensive over-commitment of both the to-be-acquired hardware and the internal manpower.

Thus, the panel found significant differences between I/C and Federal practice. While the differences may have much larger implications, they also increase hardware costs for the Federal Government when buying a DBMS.

A way to avoid some of the difficulties described is for the federal agency to create a truly functional specification in the user's terms and to ask for a competitive turn-key total system. This will eliminate third party sources, etc., since the total system is to be procured and supplied by a single source. However, upgrading the system to handle additional growth requirements would require, at best, an interim upgrade and, at worst, a totally new procurement process. This would be anathema to a concern with a profit motive. Simply stated, cost-conscious organizations would not consider the Federal Government's methods to be good business practice.

4.3.2 The First DBMS Installation. When installing a DBMS for the first time, management, user, and service supplier must each "get its feet wet" in a quick and successful project. This argues for phasing implementations into small packages deliverable in, at the most, four to six months. This allows each group to develop confidence, give feedback, adjust its plans and expectations to better match the tasks at hand, and to feel, in general, comfortable with the changes.

Installing the first DBMS will result in changes to lines of authority and responsibility, some real and some apparent. Do not let the new data base administrators be stymied by the old war-lords. Fiefdoms exist and must be continued, but the data base administrator is responsible to the organization as a whole and needs the necessary authority and management support. Of course, this new operating mode will require new responsibilities and corporate approval before old negotiated responsibilities can be discarded.

One way to get assistance in promoting acceptance of the data base administrator is to keep the communication lines open. This enables all to observe the changing environment, to get frequent statements of management support for the data base administrator, and to reaffirm the resolve to continue and complete the desired conversion.

It may come as a surprise to old hands in the business that conversion to a data base environment from either a current file system or a manual system will each require a fault-tolerant or "forgiving" mode for data base generation and update. Even from automated files, there may well be illegal values, voids, etc. in the file. This could cause a significant number of errors to be found by the edits when building the new data base, especially if internal DBMS pointers are value dependent.

Your users will need help. Do not "over-automate" on the initial change from manual files. Use the current data in its current forms to meet today's needs. Get it into the data base and clean it up as the data is needed for new applications. Cleanup can be a significant and unplanned-for activity. Accept the dirty data and give some service of value. Do not take the position that it's their fault that the data base cannot be generated. Help them.
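A "forgiving" load of dirty data might look like the following sketch: every record is accepted, and suspect ones are tagged for later cleanup instead of aborting the build. The record format and edit rule here are invented for illustration.

```python
# Hypothetical sketch of a "forgiving" data base load: records that fail an
# edit are loaded anyway and queued for cleanup, rather than rejecting the run.

raw_records = [
    {"emp_id": "1001", "hire_year": "1975"},
    {"emp_id": "1002", "hire_year": ""},       # void value
    {"emp_id": "1003", "hire_year": "19X7"},   # illegal value
]

loaded, cleanup_queue = [], []

for rec in raw_records:
    problems = []
    if not rec["hire_year"].isdigit():
        problems.append("hire_year not numeric")
    loaded.append(rec)                          # accept the record regardless
    if problems:
        cleanup_queue.append((rec["emp_id"], problems))

print(f"loaded {len(loaded)} records, {len(cleanup_queue)} queued for cleanup")
```

The cleanup queue then drives the deferred, as-needed correction effort described above.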


For those converting already automated systems, keep a parallel backup for each full system conversion until the new system is solid. This may be three to six months.

Try scanning a file for range of values on an item: illegal or nonsense values are frequently present.
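Such a scan can be as simple as profiling the minimum, maximum, and distinct values of each item, as in this sketch; the field and its legal range are hypothetical.

```python
# Hypothetical sketch: profile an item's range of values before conversion.
# Nonsense values show up immediately in the min/max summary; a legal range
# (invented here) flags the suspect values explicitly.

ages = [34, 29, 41, 999, 38, -1, 45]   # 999 and -1 are nonsense values

summary = {"min": min(ages), "max": max(ages), "distinct": len(set(ages))}
suspect = [v for v in ages if not (0 <= v <= 120)]

print(f"range {summary['min']}..{summary['max']}, suspect values: {suspect}")
```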

4.3.3 Desired Standards.

The panel regretted the lack of common terminology in the data base management arena; indeed it appears that every new development of any size brings with it an opportunity to create a plethora of new names and verbs to distinguish minor variations. The underlying fundamental concepts should be given standard terminology, and variants clearly explained and justified.

Conversion tasks would become easier if each data management system had the same functions available, or possibly a basic subset of some superset. Common functions with common definitions would create the same result for the user, even though the implementations varied. This may be both unachievable and undesirable, given the several differing fundamental file forms. However, it may be possible within classes of data base systems.

A third useful area for standardization is a micro-language (atomic verbs) in which the functions (commands) of a data management system can be described so that the detailed specific actions of each command are obvious. This would make definitions clearer and more precise. It would facilitate comparing DBMS's, and it would facilitate re-implementation of a DBMS or emulating one DBMS with another, an action which might be needed to facilitate changing hardware and DBMS.
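The idea of such a micro-language can be sketched as follows. The atomic verb names and the command definitions here are invented for illustration; they are not a proposed standard:

```python
# Hypothetical sketch: describing DBMS commands as sequences of
# "atomic verbs" so commands of different systems can be compared.
# Verb names and command expansions are invented for illustration.

# A CODASYL-like FIND only positions on a record; GET delivers it.
# An IMS-like GU both selects (via its SSA) and delivers in one command.
COMMANDS = {
    "FIND": ["LOCATE"],
    "GET":  ["FETCH", "DELIVER"],
    "GU":   ["LOCATE", "FILTER", "FETCH", "DELIVER"],
}

def expand(seq):
    """Expand a sequence of commands into its atomic verbs."""
    return [verb for cmd in seq for verb in COMMANDS[cmd]]

def equivalent(seq_a, seq_b):
    """Two command sequences are equivalent if they expand to the
    same sequence of atomic verbs."""
    return expand(seq_a) == expand(seq_b)

# FIND followed by GET is not quite GU: GU also filters via its SSA.
print(equivalent(["FIND", "GET"], ["GU"]))   # False
```

With command semantics spelled out this way, the "detailed specific actions" of each command become mechanically comparable rather than a matter of vendor prose.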

4.3.4 Desired Technology.

The panel saw the need for tools to assist conversion from one DBMS to another. Unfortunately, such tools might have to be dependent on a particular situation, but converting one DBMS file to another should be possible without going back to transaction mode or card image mode, the most prevalent way of generating a data base.

Another technology development needed is the creation of a model-independent data description that could be automatically mapped into any Data Description Language.
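A minimal sketch of the idea: a neutral record description is mapped mechanically into one target DDL. The neutral format and the generated CODASYL-flavored syntax are both invented here for illustration:

```python
# Hypothetical sketch of a model-independent data description mapped
# into a target Data Description Language. The neutral description
# format and the generated DDL syntax are invented for illustration.

neutral = {
    "record": "EMPLOYEE",
    "items": [("EMP-NO", "CHAR", 6), ("NAME", "CHAR", 30)],
}

def to_codasyl_like_ddl(desc):
    """Generate a CODASYL-flavored record description from the
    neutral form; a second generator could target another DDL."""
    lines = [f"RECORD NAME IS {desc['record']}."]
    for name, kind, size in desc["items"]:
        lines.append(f"  {name} TYPE IS {kind} {size}.")
    return "\n".join(lines)

print(to_codasyl_like_ddl(neutral))
```

The point is that the neutral description, not any one DDL, becomes the document of record; each target system gets its own generator.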


4.4 TOOLS TO AID IN THE CONVERSION PROCESS

4.4.1 Introduction


The rapidly changing field of data automation ensures that opportunities will always be available for transition of application software. This transition may involve evolving from a non-DBMS environment to an exclusively DBMS arena, transitioning from one hardware environment to another through a common DBMS, transitioning from one DBMS to another on either existing or new hardware, or entering the distributed processing area. This section will attempt to explore the various aspects of these transitions and the role which DBMS and other tools may play in achieving the transition goal. Extensive use will be made of the effects of transfer from one DBMS to another. The specific effects which will be discussed include transfer of people, transfer of data, transfer of capabilities, and transfer of procedures.

4.4.2 Changing From Non-DBMS To DBMS.

Perhaps the most traumatic of conversions involves the initial move of an organization into the DBMS world. For the traditional application programmer, the DBMS looms not only as an unknown world but also as a threat to job security. The first task then is to provide the threatened programmers with the proper training, not only on the specific DBMS to be used, but also on the advantages and disadvantages of DBMS as a whole. Of great utility in the transfer of people is on-site vendor-provided training. Groundwork for the move, however, should initially be laid through commercially available courses dealing with data base systems independent of a vendor implementation. Initial training should begin either before or during the DBMS procurement cycle so that the present personnel will see the DBMS for its utility and not as a threat. Furthermore, thorough conceptual training will ensure greater utility from the DBMS vendor's on-site training. Care must be taken to ensure that the vendor's course includes a practical exercise to demonstrate the advantages of DBMS, specifically in the areas of interfaces, DDL, and DML. The wise planner will quickly recognize that the most important resource for training people is time. If possible, time should be set aside following vendor training for application programmers to experiment with the procured system before requiring production programming. This exercise may involve design and programming for data conversion needs.


The transfer of data from independent files to DBMS control can require extensive resources. The process is usually a download by stand-alone application programs, then upload through DBMS-dependent application programs. Although the concept sounds easy, the volume of data as well as variety of storage formats, devices, and locations can, and do, drive the cost of transition to very high levels. The solution to this problem lies mostly in the continued efforts of managers to keep current their documentation on files and their uses. This documentation may take the form of written narratives and file format charts, but perhaps the most effective tool is the data element dictionary/directory (DD/D). This automated system can provide managers with current information as to the type, size, location and usage of data. Although used most extensively in coordination with DBMS, the DD/D can ensure that managers and those responsible for the inclusion of data in a data base are provided a complete view of the agency data as a whole as well as usage of that data by application programs. In addition to these tools, current research by Jim Fry of the University of Michigan and Arie Shoshani of SDC [See the Conversion Technology panel report.] in data translation and conversion promise future tools for describing source data files and providing translation to some target format. Emphasis should also be placed on determining a common interchange form for the conversion of data between hardware types.
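The download-then-upload pattern can be sketched in miniature. The record layout, the CSV interchange form, and the `store` callback below are hypothetical stand-ins for a real DBMS loading interface:

```python
# Sketch of the download/upload pattern: a stand-alone program dumps
# records to a neutral interchange form, and a (simulated)
# DBMS-dependent program reads that form and stores each record.
# The record layout and the `store` call are invented for illustration.
import csv
import io

def download(records, out):
    """Stand-alone step: dump file records to a neutral CSV form."""
    writer = csv.writer(out)
    writer.writerow(["acct", "name"])      # self-describing header row
    writer.writerows(records)

def upload(inp, store):
    """DBMS-dependent step: read the interchange form and store
    each record; `store` stands in for a real DBMS call."""
    for row in csv.DictReader(inp):
        store(row)

loaded = []
buffer = io.StringIO()
download([("1023", "ACME CORP"), ("1024", "BETA SUPPLY")], buffer)
buffer.seek(0)
upload(buffer, loaded.append)
print(loaded)
```

Because the interchange form is self-describing, the download side needs no knowledge of the target DBMS and the upload side needs no knowledge of the source file layout, which is precisely what keeps the two steps independent.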

Moving from a non-DBMS to a DBMS environment involves, in most cases, a matching of like functions in many independent application programs to shared functions in the DBMS. Consequently, this movement involves removing code from the application programs and replacing it with code to interface with the DBMS. The bulk of application-dependent code should remain the same. Capabilities such as backup/recovery and physical data structuring will be moved as a whole to the DBMS. In the DBMS environment, no capabilities are lost to the application programs--only gained.

No tools are currently available to aid in this transfer of capabilities, but all DBMS share a common repertoire of functions to include facilities for data creation, update, and deletion. These capabilities, however, vary depending upon the underlying conceptual model of the DBMS. Standardization of DBMS capabilities and syntax could allow for a more complete predesign effort through knowledge of system features.


As mentioned earlier, conversion forces management to take a much tighter control by establishment of a centralized function known as Data Base Administration (DBA). Within the DBA function, procedures are further established to deal with data integrity, accessibility, confidentiality, and miscellaneous control. The primary tool of the DBA is "clout" or having a position in the organization with the authority and responsibility to ensure the compliance with established procedures. The DD/D is also used by the DBA to aid in the design, implementation, and maintenance of the data base.

4.4.3 Changing From One DBMS To Another. Perhaps the least explored area of conversion and needed tools involves the migration of applications from one DBMS to another. This conversion process is second in impact on users only to initial entry into a DBMS environment. The training of personnel on the new DBMS is simplified somewhat by a previous exposure to the "DBMS way of doing things." The application programmer, however, will be required to view the data with which he works in a new manner, especially if the move is from one conceptual model to another; e.g., from a network to a hierarchically oriented DBMS or vice-versa.

Training will be the primary tool in conversion, including both vendor and follow-on in-house training sessions and exercises. For conversion between certain DBMS (CODASYL-like systems, for example), automated tools may be used for direct syntax transformation in both DDL and application programs. Conversion, however, between systems which do not share a similar syntax or mode of interface, e.g., TOTAL to IMS, IMS to DMS-1100, will require extensive training to learn the new Data Manipulation Language (DML), DDL, and interfaces.

In migrating data to the new environment, no new problems will be encountered which have not previously been discussed. The download programs should, however, already be developed for such purposes as data base archiving (dumping).

The transfer of capabilities provides the greatest impact on the conversion between dissimilar DBMS. DDL presents the first difficulty in conversion. In order to implement the new data base, some description of the new data base structure must be developed in the target DDL. Automated translation tools can be developed if the source DDL is extensive. Manual translations, however, seem more appropriate due to various changes in the way in which DDL may be interpreted; e.g., DATA SET in DMS-II is not the same as DATA SET in IMS.


A DML, as mentioned earlier, offers the most advantages to automated translation when the interface syntax is fixed (CODASYL's DML). When the interface is purely through dynamically built character strings (IMS, TOTAL), automated conversion can only replace CALL statements of one format with CALL statements of another. The DML functions, although common (GET, PUT, CREATE, DELETE, UPDATE), also vary in interpretation. An example: FIND in a CODASYL-like DBMS only locates a record as stated in the record selection expression, and a subsequent GET must be issued to load the data into the issuing program's work area. A GU (get unique) in IMS not only accesses a record but also requires the Segment Selection Argument (SSA) to find the appropriate record. Therefore, the capability of a single DML command may not be replaceable by another single command of the new DBMS. Conversely, several commands which span both DDL and DML (record selection expression/set selection) in a CODASYL-like DBMS may be replaceable by one command (GU w/SSA). Other commands of the source DML may not be duplicated in the DML of the target DBMS. Apart from the DDL and DML capabilities, more basic differences may exist.
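The many-to-one command mapping described above can be sketched as a small peephole translator. The call representation on both sides is invented; a real translator would also have to rewrite the selection expression into SSA syntax:

```python
# Hypothetical sketch of the command-mapping problem: a peephole
# translator that collapses a CODASYL-like FIND/GET pair into a
# single IMS-like GU call, and flags everything else for manual
# conversion. The call syntax on both sides is invented.

def translate(calls):
    """Map (verb, argument) source DML calls to target calls."""
    out, i = [], 0
    while i < len(calls):
        verb, arg = calls[i]
        if verb == "FIND" and i + 1 < len(calls) and calls[i + 1][0] == "GET":
            out.append(("GU", arg))   # selection expression becomes the SSA
            i += 2
        else:
            out.append(("MANUAL", (verb, arg)))
            i += 1
    return out

source = [("FIND", "EMP WHERE EMP-NO = '1023'"), ("GET", None),
          ("MODIFY", "EMP")]
print(translate(source))
```

Even this toy version shows the asymmetry the text describes: paired source commands collapse into one target command, while commands with no target counterpart can only be flagged for a human.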

Shared functions which DBMS supply are often quite different. Where one DBMS may supply item-level locks, another may make only record or data base locks available. Access controls, recovery, and auditing are just a few of the support capabilities which vary between DBMS, even those which implement the CODASYL specifications. The change in capabilities, therefore, is indeed the most evident conversion problem in migrating applications between dissimilar DBMS.

The transfer of procedures will usually be transparent to the application programmer but will impact heavily upon the DBA. Because of the differences in capabilities and architecture of DBMS, the DBA will be forced to adapt fastest to the changing environment. The DBA must immediately recognize differences in the operation of the new DBMS and take steps to avoid turmoil. Procedures must be changed as little as possible in external appearance. The application programmer must interface with the DBA in the same manner as before. The DBA, however, must determine and apply new procedures to ensure maximum system efficiency based on the information passed across this interface. The DBA's primary tool in this environment will then be training and any available consulting services.


4.4.4 Changing Hardware Environment. The most frequent motivation for conversion is caused by an impending upgrade in hardware from one manufacturer's equipment to another or between non-software-compatible lines of the same vendor. Within this environment, change can extend from simply migrating applications from a DBMS on the current machine to the same DBMS on the target hardware (TOTAL on IBM to TOTAL on CDC), through changing DBMS between machines (IMS on IBM to DMS on UNIVAC 1100), and, finally, to initial implementation of a DBMS environment on the new computer. Two of these situations (DBMS1 to DBMS2, non-DBMS to DBMS) have been previously discussed. The additional impact of changing hardware, however, adds a new dimension to the training problem. Migration of data also is a bit more difficult in that some form must be established in which to download data for subsequent upload on the target machine. Transfer of both capabilities and procedures is apparently neither impacted nor complicated by this transition.

The additional transition of DBMS on the current hardware to the same DBMS on the target hardware begins to show the utility of standardizing both interfaces (end-user facilities) and capabilities. Whether the DBMS in question is CODASYL-like or not, costs are significantly reduced when syntax and capabilities remain constant across hardware types. Examples of systems which provide these transition opportunities include TOTAL, SYSTEM 2000 (non-CODASYL) and IDMS (CODASYL-like). Applications which utilize these systems can concentrate on more operating system dependent transitions (JCL1 to JCL2) because of the relatively small changes required in DBMS-oriented syntax. Because of the vendor-enforced (or de facto) DBMS standardization across machines, automated tools can greatly aid transition and may be as simple as filtering source programs and recompilation.

4.4.5 Centralized Non-DBMS--Distributed DBMS. With cheaper system communication costs and better cost/performance ratios of the minicomputer (some supporting virtual memories in the millions of characters and most, if not all, offering large-scale peripheral disk storage), the use of data base software can be relocated from centralized large-host environments to mini-based decentralized arrangements of data processing facilities. Conversion from the traditional centralized mode to distributed data base processing is fraught with both technical and management perils. The mini-based DBMS generally does not offer the diversity of facilities available with data base systems centrally resident in larger hosts and, in fact, may constrain the design of the application systems planned for dispersal. Replicating the application software, data base structure (DDL), and data base software may require a severe reorientation of existing ADP management philosophy. Management and control of multi-site operations, ADP standards, application development and data base administration will be made more difficult. Conversion of data previously centralized requires that some intermediate data form be created for downloading appropriate subsets of the old file(s) and subsequent uploading on the distributed hardware. Standardized interfaces (DDL, DML and end-user facilities) and common DBMS capabilities will reduce the management problems encountered during the design, development, conversion and operation of distributed data base implementations (even if data base management software and mini-computer hardware differ from site to site).

4.4.6 Centralized DBMS--Distributed DBMS. The problems of conversion from a centralized mode of DBMS operation to distributed data base processing are somewhat eased by the presence of experienced and knowledgeable personnel at the beginning of the conversion, but are still of concern.

Aside from the difficulties associated with the migration of existing application code (dependent upon the existing business functions to be dispersed, degree of transferability of the programming languages presently in use, limits on size of load modules, etc.), further problems surface if dissimilar data base logical structures (network, hierarchical, relational, etc.) are encountered.

Conversion to a distributed data base structure that is similar to the existing centralized structure should be an easier task. DDL differences may exist, however, between the existing and target data base systems as well as the obvious job control language differences. The target (distributed) data base software may not offer the facilities used (or planned for use) in the existing centralized applications, and support of user business processes may thus be reduced. The problems noted in the conversion of centralized non-data base applications to a decentralized data base mode remain unchanged for the migration from centralized to decentralized data base modes.

4.5 GUIDELINES FOR YOUR FUTURE CONVERSIONS

The panel concluded that the single most important guideline to offer a group about to embark on a system conversion was to use their best management techniques. It is a technically difficult job, like most large software system developments, and one must apply all well-known software development methods.


4.5.1 General Guidelines. The discussion of the various conversion scenarios revealed that several management practices had been utilized in all of the successful conversions which, though common across all data processing environments, were of particular relevance in the data base environment, where problems and errors in establishing policy proliferate across all applications utilizing the DBMS.

4.5.2 Important Considerations.

Analyze all possible file organizations. Do not assume that the most efficient way is going to be a data base. Some applications can perform better using sequential files such as tapes, or other access methods such as Index Sequential.

Analyze the final stages of the conversion. How will you convert to production? Will it be (or can it be) converted module by module, transaction by transaction, or will it be necessary to "push the button" and convert to production all at once?

Develop a representative model of the company's business for design purposes. This model should provide the basis for the design of the data base which will be in production.

Determine the degree of security and integrity required for the operation of the data base. Who will have access to certain information? How will unauthorized use be prevented? How will you recover or reconstruct the data base in the event of a software or hardware failure?

Determine the space requirement for the production data base. Determine if all the data on the file is necessary and is being used. Check for redundant data. (It is not always bad to have redundant data, especially if it will improve retrieval time.)

Build a small version of the data base and test for deficiencies in the design or in the modules. The testing must be thorough and cover all facets of the company's business. Users' involvement in this stage should be heavy but should not dictate how the data base is to be designed or what access methods must be used.


Determine whether a Data Base Administrator is needed for your organization. The DBA will be responsible for the security and integrity of the data base (including recovery and backup) and act as an agent between the User's group and the data processing group.

Determine the type of support you will need from the vendor. Will classes be given on data base technology? Will the vendor be readily available when a problem arises? Contact various users in your area. Attend a local users group meeting and find out the types of problems and solutions that the group has come up with; especially talk to users in the same type of business. Valuable information can be obtained and you will not re-invent the wheel.

Determine how eager upper management is to undertake the task of converting. If they support you, the task will be easier and more efficient.

Look into the future. Determine whether the present design of the data base is the most efficient one in case new applications are introduced. Do not be afraid to redesign if it proves to be more efficient. Provide for a purge of unneeded data. Reorganize the data bases on a regular basis to reclaim unused space or to compact the data. One very large data base may not be feasible.

Determine whether to use standard vendor-supplied software or develop your own.

Once a decision has been made to convert to a DBMS, check and re-check the contract to be signed. Have a contract attorney study the agreement.

4.5.3 Tight Control. In successful conversion efforts, special emphasis must be placed on establishing procedures and policies and insuring that they are followed. Examples of these procedures and policies include internal standards, data base administration, and explicit (precisely defined) goals.

4.5.4 Precise Planning/Pre-planning. The probability of a conversion succeeding is directly proportional to the degree of preparation. Preplanning efforts involve:

establishing current baselines and identifying perceived inadequacies


defining requirements, or specifying what is desired of the conversion

enumerating as many alternatives as practical and determining their appropriateness in the target environment

establishing feasibility in terms of costs, people, and time

describing milestones and benefits

receiving and documenting a management commitment toa specific alternative

Planning efforts must begin immediately upon management commitment and include:

organizing for the project

defining support software requirements

evaluating available systems (packages)

describing selection criteria

selecting and procuring software

building a well-defined implementation plan including staffing, training, detailed design, and implementation strategy

seeking final plan coordination and approval

4.5.5 Important Actions. The panel recommended severalactions:

Do Establish Review Points. When describing a plan, care must be taken to ensure flexibility in order to support changing/overlooked requirements. Points must be scheduled at which time the plan is reviewed and updated as required. Decisions must also be made as to the appropriateness of proceeding with development.

Do Involve End Users. The data processing department must be extremely careful not to ignore user input to the development process. No one knows the functional requirements which must be met better than the end user.


Do Keep Scope Reasonable. The scope of the project must be dictated by practicality. Attempt only as much as is attainable within well-defined time frames and technology. Once the scope is defined, stay within its bounds.

Do Not Stifle Prototyping. Modeling is perhaps the best method available for trying out ideas. Encourage validation of concepts through prototyping.

Do Shift Responsibility to Where the Expertise Exists. Always ensure that responsibility for achieving goals is placed where the ability to both understand and accomplish them exists.

Do Phase the Implementation. Within an overall context, ensure that total system conversion is planned and executed in attainable increments.

Be Wary of Entanglements. System conversion planners must always review commitments with the goal of avoiding crippling dependence upon manufacturers, proprietary packages, and maintenance agreements.

Recognize Costs/Benefits of New Technology. Quantify both the advantages and disadvantages of being the first to use new technology.

Do get adequate management commitment. Make sure management knows beforehand what resources and time periods are required, what has been available, and what progress, if any, has been made. Do not let them walk away from it. They are key players.

Do initially select an important but tractable portion of the ultimate system and have some results in four to six months. This forces an early and realistic review. One also hopes it demonstrates the new capabilities, emphasizes the ability of the implementors to handle the new technology, and provides better insights into both the technical and operational (people-oriented) problems that the organization will face.

Do introduce your technical staff to the new technology. First, when you are studying whether to go data base, but also immediately before and during the initial implementation. Sometimes six months to two years pass from initial study to start of implementation. Further, training for the initial decision process is usually "book learning" and quite stale by implementation time.

Do orient and educate your users. Next to management, this group is the key to success. They must feel comfortable with what you say. That means they should have a role in deciding what they get and when. In addition, they must prepare themselves for the new modes of operation.

Do keep the user group on your management review team. They can keep you out of trouble. They can change their priorities and needs to fit your capabilities to deliver. They can help you resist pressures for early and extra delivery since they, too, want and need a quality product. By keeping them in the loop, you risk less chance of surprising them and evoking resistance on delivery.

Do set up a central "data base control group." Useful in any big system or series of related systems, it becomes essential in a data base oriented organization. Such a group may have several levels with differing managers, but these groups must be coordinated. There is a policy level, an implementation level, and an operational level. The implementation group does most of the administrative and design work involved in bringing into being and providing for the efficient structure of the data base(s). The operational level is responsible for the operational integrity of the data base and must take the actions for restart/recovery in addition to any periodic initialization/dumping or consolidation of the actual data base.

Do implement a data dictionary/directory. As a key item common to all systems, it is needed to make the data base a corporate rather than a private entity. Because the data is to be shared across groups who are not responsible for the data, it is essential that the "meaning" of each data item be documented. A data item, besides having a name, a form, and values chosen from some well documented set, has several meanings. One is the English description; another is an operational description of how the value is determined; another is how the data is commonly used and by whom. All of these must be known to avoid confusion and to avoid having data misused.
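The several "meanings" a dictionary entry must carry can be sketched as a record structure. The field names and the sample item below are invented for illustration, not taken from any particular DD/D product:

```python
# Hypothetical sketch of a data dictionary/directory entry carrying
# the meanings listed above: name, form, legal values, an English
# description, how the value is determined, and who uses it.
# The field names and sample item are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DictionaryEntry:
    name: str                   # data item name
    form: str                   # type and size
    legal_values: str           # the documented value set
    description: str            # English meaning
    derivation: str             # how the value is determined
    used_by: list = field(default_factory=list)  # programs/groups using it

entry = DictionaryEntry(
    name="EMP-STATUS",
    form="CHAR(1)",
    legal_values="A (active), L (leave), T (terminated)",
    description="Current employment status of the employee.",
    derivation="Set by the personnel department on each status change.",
    used_by=["PAYROLL", "BENEFITS"],
)
print(entry.name, entry.used_by)
```

Keeping all of these facets in one entry is what turns the dictionary into a directory as well: it records not just what an item means, but who depends on it.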


Do ask for and accept help from the vendor of the data base system. If you are also changing hardware at the same time, put pressure and some of the risk on that vendor. Good advice must be available during your learning period. Consultants are useful, not only initially, but also during design and implementation time.

4.6 REPRISE

The most significant finding: the ingredients for success are management commitment and discipline, skilled people, clear roles, and lots of cooperative conversation between the parties involved. Not a new discovery but still significant--and more essential than ever at this stage in DBMS evolution.

4.7 ANNEX: CONVERSION EXPERIENCES

4.7.1 Conversion: File To DBMS. The three user experiences which follow illustrate the scenario of converting from a file system environment to a data base environment without a change in hardware resources.

Conversion Experience - A.

Goals. Studies were conducted to analyze the problems of the file environment. Problems identified by the functional user and data processing departments included prohibitive maintenance and enhancement costs, slow implementation of additional user requirements, heavy data and processing redundancies, and lack of data integrity.

Summary of actions. The problems described in the previous section led to the data base decision. Following this decision, a consulting firm prepared a data base conversion plan to execute those tasks in the plan leading up to but not including the conversion of data or application systems. A directory of steps and tasks in the data base system plan that the consulting firm prepared follows.

Step 1: Evaluate the Present Information System
  Task 1.1 Information Flow Analysis
  Task 1.2 Current ADP Analysis
  Task 1.3 Current Reports Analysis
  Task 1.4 Status of Present Information System for Data Base
  Task 1.5 Draft of Data Base Administration Responsibilities


Step 2: Define Data Base Requirements
  Task 2.1 Service Analysis Development by Report
  Task 2.2 Service Analysis Development by Function
  Task 2.3 Data Dictionary Development

Step 3: Develop Initial Data Base Architecture
  Task 3.1 Distribute Data Dictionary Elements
  Task 3.2 Distribution Optimization
  Task 3.3 Architecture Description

Step 4: DBMS Package Evaluation and Recommendation
  Task 4.1 Architecture Mapping
  Task 4.2 Support Features
  Task 4.3 Secondary Features
  Task 4.4 Recommendation

Step 5: Application Redesign
  Task 5.1 Design and Program Documentation
  Task 5.2 Structured Programming Analysis
  Task 5.3 Implementation Controls

Step 6: Configuration Analysis and Evaluation
  Task 6.1 Current Load Analysis
  Task 6.2 Future Load Analysis
  Task 6.3 Simulation Modeling
  Task 6.4 Simulation Analysis

Step 7: Implementation and Conversion Plan
  Task 7.1 DBMS Installation Planning
  Task 7.2 Data Conversion Planning
  Task 7.3 Application Conversion Planning

Step 8: Personnel Development and Training
  Task 8.1 Assess Available Talent Levels
  Task 8.2 Develop a Formal Training Plan

During these tasks the consulting firm provided project direction and DBMS expertise. Company personnel assigned to the project received practical training while preparing for planned conversion of application software and data files.

Some points should be noted about the tasks prepared for the data base system plan:

The evaluation of the results from tasks 1.1 through 1.4 was used to support the data base decision.

The data base administration function was established following a draft of responsibilities prepared in task 1.5.


The initial data base architecture described in task 3.3 was package independent.

The DBMS package recommended in task 4.4 was influenced by constraints imposed on the evaluation process. The influences included the need to interface with teleprocessing and the ability to install DBMS with an initial or pilot application system conversion in just four months.

The application conversion plans in task 7.3 identified functional redesign requirements resulting from the service analysis conducted in task 2.2 to correct deficiencies in existing software and to benefit from the facilities offered by the DBMS package.

At this time the recommended DBMS is installed and the conversion and implementation of the pilot application system and its data files to data base is completed. The conversion effort was successful but not without its problems. The functional user organizations feel the benefits of DBMS in the areas of performance, data integrity and responsiveness on the part of the data processing department to change requests. However, further implementations to DBMS are suspended due to a redefinition of priorities outside the data base system plan.

Conclusions. Factors contributing to the success of that conversion or to the problems that occurred were:

Management demonstrated commitment by the funds made available for consultant services, by the investment in preplanning and planning tasks, and by the establishment of a DBA function. This commitment was instrumental in the responsiveness and cooperation that was sorely needed from data processing personnel and users alike when the data base project was undertaken.

The establishment of a DBA function in the data processing organization invariably creates a political struggle between the DBA, the Manager of Systems and Programming, and the Manager of Operations. The use of consultant services minimized the political problem, at least during the preplanning and planning phases. However, the impact or trauma is felt most during the pilot application system conversion process. The lack of adequate management support to the DBA function, as demonstrated by its placement within the data processing organization and the limited personnel resources, only aggravated the political problem. As such, the establishment of the DBA function was less than effective. The influence of such political struggles on the suspension of further data base conversion activities is difficult to determine.

A major problem in converting from a non-data base environment to data base is that during the period when data base expertise is critically needed, you are least able to provide it. This problem was eliminated with the expertise provided by consultants.

The placement of the DBA within the data processing organization and the limitation of its responsibilities contributed to the lack of support at the right level of management for continuation of the data base project. This was very much in evidence when staff meetings were conducted to prioritize data processing activities.

The benefits realized by the conversion and implementation of the pilot application system resulted from the redesign activities. However, the cost/benefit analysis was instrumental in identifying a pilot system that had low risks, early benefits, and visibility. These three factors alone may very well save the future of the data base project in the context of further implementations of data base. Once a data base conversion effort is suspended, it normally takes support from the user community to reactivate it.

Problems encountered during the conversion process were due, for the most part, to inadequate training of people in the usage of the DBMS and in the interpretation of its diagnostics. The amount of training planned was reduced in an effort to compensate for an insufficient number of people made available to convert the pilot system.

Conversion Experience - B.

Goals. The experience involved choosing and using a DBMS to support processing of solar hardware system data. The solar hardware data resided in file format, yet structural relationships existing between and among data records could not be supported in any reasonable manner in the file-oriented environment. In addition, many of the user requirements were not adequately defined, and there was considerable concern about whether the known requirements could ever be met.

Summary of actions. A group was gathered to define the users' information processing requirements and to determine whether a DBMS solution could be provided within a reasonable time at reasonable costs. Initially, all combinations of hardware and software solutions were considered because the data processing center within the organization did not have a DBMS. As time progressed, the users' requirements became clearer and time-sharing service solutions -- the desired mechanism for achieving a hardware change -- were dismissed because of control and cost factors. The conclusion was to provide a DBMS for the existing hardware.

Although unanswered questions remained regarding the feasibility of this approach, many desirable aspects were, on the other hand, apparent. The DBMS selected was available in a bundled form from the hardware vendor -- at no additional cost -- and the software could be delivered in a timely manner. In addition, the selected DBMS supported the required structural relationships with its CODASYL orientation.

The best understanding of what output requirements the DBMS should provide was used to design the data base and implement application programs. All of the requirements were met in a reasonable time within reasonable costs.

Conclusions. Critical points in the success of this DBMS conversion experience were:

DBMS expertise was available from the beginning -- a most important time in any DBMS environment.

The users participated in defining the information processing requirements.

Expertise was available in matching information processing requirements to available DBMS capabilities.

The vendor provided adequate training and sufficient time to gain experience after the training was completed. This provided the necessary insights into the actual capabilities of the chosen DBMS.

Portions of the output requirements were sufficiently well documented, and were of reasonable magnitude, to demonstrate successful operation of the DBMS.


Conversion Experience - C.

Goals.

To convert a large number of antiquated files to state-of-the-art technology.

To provide for a more efficient way of retrieving and updating data.

To eliminate redundancy through a centralized data bank.

To eliminate erroneous information through data integrity.

To provide better management of information.

To provide for more system security.

Summary of actions. Only one alternative was available to convert existing files to a DBMS, and at the time only one DBMS could meet the requirements set by upper management. A hierarchically structured data model DBMS, used exclusively with its own higher-level language interface, was chosen.

The first conversion had to execute the load procedures eight times until all known errors were eliminated. Each load took over 100 hours to run with no checkpoints. Fortunately, no crashes occurred during any of the runs. The load process, in itself, was not a major problem; however, some erroneous data present in the old file was input to the data bases. The users were encouraged to be involved in checking and re-checking the new system. At times, the users requested changes which, to them, seemed minute, but which required large programming and system changes. Some users wanted more information in the data base (such as a segment's last date of change) while others were concerned with space and efficiency. We provided a backup system to which we could fall back in case the conversion failed. Meetings with users continued for six months until repeated testing satisfied us that we had reached a satisfactory level of accuracy.
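A 100-hour load with no checkpoints makes any mid-run failure catastrophic. The sketch below shows the checkpoint/restart idea in modern terms; the checkpoint file name, batch size, and in-memory "store" are hypothetical stand-ins for the actual load utility, not anything described in the report.

```python
import json
import os

CHECKPOINT = "load.ckpt"  # hypothetical checkpoint file


def load_records(records, store, batch_size=1000):
    """Load records into `store`, saving a checkpoint after each batch
    so an interrupted run can resume instead of restarting from zero."""
    start = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            start = json.load(f)["next_index"]  # resume point from a prior run
    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        store.extend(batch)                     # stand-in for the DBMS load call
        with open(CHECKPOINT, "w") as f:        # record progress durably
            json.dump({"next_index": i + len(batch)}, f)
    os.remove(CHECKPOINT)                       # a completed load needs no restart point
```

The cost is one small write per batch; the benefit is that a crash at hour 90 loses at most one batch of work rather than the whole run.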

Another problem resulted from personnel assignments. While one group was loading the data base, another was testing. The group that loaded the data base did not have as good an understanding of content as the testing group. Consequently, erroneous data was loaded. If both groups had worked more closely, most of these problems would have been resolved.

Conclusions. The following points are characteristic of successful conversions in the described scenario:

1. Management commitment ensures sufficient resources and end-user support of the conversion effort.

2. DBMS expertise must be available from the beginning of the conversion effort.

3. Adequate preplanning and planning activities are instrumental in the conversion to data base.

4. The end users must participate in defining information processing requirements.

5. The DBA support must be a strong and continuing commitment.

6. In the DBMS package evaluation, expertise must be available to match the information processing requirements to the capabilities of available DBMS packages.

7. Adequate and timely training must be provided, and sufficient hands-on experience must be gained, prior to any conversion of application software.

8. A conservative staging plan, one which selects an application system for initial implementation that offers low risks, high benefits, and visibility, must be developed to insure continuing resource support.

9. The DBMS conversion must be thoroughly and adequately tested prior to the production cycle. A fall-back procedure should be established in the event that the conversion is not successful.

4.7.2 Conversion: Manual Environment To DBMS. This scenario represents conversions in which data is maintained and used manually in the source (old) environment and it is desired to convert the data and functions directly to a DBMS without first introducing a non-DBMS file system.

Conversion Experience - A.

Goals. The goal of this conversion was to mechanize functions performed manually by separate operating departments using the same input documents. The input documents triggered each of these departments to perform its own functions using its individual, manually-maintained data records. The flow of the input documents went from one department to another. The flow was sequential in nature since one department had to complete its function before the next could begin. Each department independently maintained its own data records. The records in each department contained some common data components.

This total process had never been mechanized before because of the complexity of the operations involved and the lack of data structures to represent the interrelations of the data and functions. DBMS technology made mechanized processing and a common data storage place feasible.

The expectations of mechanization included:

1. Better performance of end-user functions through more accurate records and the increased capability of machine over human operation.

2. More efficient performance of functions through elimination of duplicate record keeping and mechanization.

Summary of actions. The project selected a network DBMS including full communications and transaction management capability. Interactive terminals were designated for end-user locations, and an interface was developed with a front-end system controlling the flow of the inputs to the system and the distribution of the outputs to the appropriate user departments and to other mechanized systems.

The conversion from a manual environment to the full DBMS has been completed successfully at one site, and the DBMS has been operational for several years. During this time, more sites have been converted to the system. Interfaces between this system and other DBMS systems are being developed. Since these interfaces are not yet operational, conclusive findings cannot be stated.

Conclusions. The task of interfacing two DBMS should not be underestimated, especially if the systems use different software or hardware. For example, ample time should be allocated in developing the interface simply to work out the kinks of reading data created by another DBMS in a language with a different bit structure and character set.

When an interface is first developed between two systems, time must also be allotted to resolving differences in how one system identifies or names what are thought to be common data elements. Both systems may process the same widgets, but one may be concerned with inventory and the other with maintenance or repair. Both systems can be involved with some of the same widgets, but the way they identify or represent them may be different. These differences may not be apparent, or may seem trivial, until an actual mechanized interface is attempted.

Merely because there could be a mechanized link between two systems does not mean there should be such a link. It may be far more economical to use a manual interface, especially for low-volume interactions or in cases where the viewpoints of the two systems differ greatly.

Environments with multiple DBMS are very frustrating to users if these DBMS each use different input devices with different sign-on procedures and modes of interaction.

Coordination and planning between the DBMS in development are needed to prevent proliferation of different terminal equipment at the same user locations.

Conversion Experience - B.

Goals. The goal of the project was to automate a manual document retrieval system.

Summary of actions. In this project, management greatly "simplified" the decision process by directing conversion to a custom-built document retrieval system which would be converted to our local hardware (i.e., converted from one hardware manufacturer to another). Apparently, the reason for this decision was that a copy of the software could be obtained free. A software conversion (COBOL-to-COBOL) had to be done before the local document retrieval application could be automated with the software. Then, the software had to be augmented (new code) to add needed functions, including: (a) more extensive editing of input data; (b) more user-oriented processing of retrieval requests; and (c) redesign of outputs.


After the software conversion started, analysis made it apparent that the capture of the manual data in machine-readable form would require about twelve staff-years. Also, on-going entry of new documents and servicing users would have required a permanent staff of 4-5 people. Attempts to convince management of these needs were unsuccessful.

The events that followed were:


A successful software conversion followed by augmentation of the converted software.

Design of procedures and entry forms, and identification of resources needed to capture the manual files.

Shelving of the project because resources were not available either to contract the data capture or to staff it in-house.

Conclusions. While not a conversion to a true DBMS, this experience illustrates, by their absence, several points necessary for DBMS conversions.

Importance of planning and analysis before any decisions are "cast in concrete."

In this case, the software was decided upon before analysis. And the software chosen was totally inappropriate for these reasons: (1) It was not running on the same brand of hardware as the local system, so it had to be converted. (2) It was custom-built and no support was available from the originator. (3) Software documentation was fragmentary and obsolete -- we got source code, compiler listings, test data, and very little else that was useful. (4) The software did not provide adequate functions, so it had to be augmented with new code.

Importance of obtaining a commitment of resources sufficient for the entire project before starting any part of it. In this case, the decision-makers had no comprehension of what would be involved. Neither the one-time data capture nor the on-going support staff needed could be had when the time came to actually implement the retrieval system and service customers with it. Therefore, extensive conversion and analysis were wasted.

Recommendations.

The manual-to-DBMS conversion situation has two characteristics which make it unique among the data base conversion situations:

1. Data not already machine-readable


2. Users not accustomed to automated systems

Consequently, those contemplating such a conversion should take the following actions:

Develop implementation strategy. Purpose: Control cost and meet schedule.

Limit scope. It is necessary to limit the scope of the conversion to something manageable and do-able. Limit objectives to high-volume, high-usage functions. Do not try to automate low-usage and/or exception cases. Stick to your initial limited scope. Do not be tempted to agree to ad hoc extensions of scope. Prioritize and save good ideas for future enhancements.

Plan on phased implementation. Get something useful up soon. (Your limited scope -- above -- should have selected a useful small application.) Getting something into actual use soon will:

a. Let the customer see benefits early and provide experience on the system
b. Give the project team some feedback early
c. Buy the project team some credibility with users for later, larger projects

Plan on re-implementing. The first limited-scope application(s) will very likely benefit from later re-implementation.

Select initial installation/site carefully. Give yourself the best chance of success. If the system is planned for several sites, do the easiest one first.

Understand magnitude of data capture and data cleaning effort. Everyone, including top management, must understand that capturing the data and correcting it will be a "large" effort. This effort is a significant part of overall project costs; it may be the largest.

It should be understood and budgeted for at the beginning. Sampling the manual data for accuracy and completeness is useful in estimating the resources needed for data capture and correction.
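The sampling idea can be made concrete: inspect a random sample of the manual records, measure the error rate, and project the correction workload over the whole file. The record shape and the error test below are hypothetical illustrations, not anything prescribed by the report.

```python
import random


def estimate_error_rate(records, sample_size, is_erroneous, seed=0):
    """Estimate the fraction of records needing correction by inspecting
    a random sample rather than examining every manual record."""
    rng = random.Random(seed)               # fixed seed: repeatable estimate
    sample = rng.sample(records, min(sample_size, len(records)))
    errors = sum(1 for r in sample if is_erroneous(r))
    return errors / len(sample)


# Hypothetical usage: flag records with a missing name field.
records = [{"id": i, "name": "" if i % 10 == 0 else "ok"} for i in range(5000)]
rate = estimate_error_rate(records, 500, lambda r: not r["name"])
projected_corrections = int(rate * len(records))  # workload over the full file
```

A sample of a few hundred records is usually enough to size the effort to within a few percentage points, which is all a budget estimate needs.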

Resolve political problems of ownership of data.


Source. Where the data is kept by several organizational divisions, determine which one is the best source for the application you are converting.

Discrepancies. Designate someone to be responsible for resolving discrepancies:

Missing data elements
Incorrect data elements
Obsolete data elements

These, of course, are familiar functions of the DBA. Such problems are, however, magnified in a situation where there has been no automation before.

Involve users early. Early end-user involvement can:

Get some support for the project team

Give end users time to identify:

Changes to their work flow
Changes to staff requirements
Training requirements

Inconsistency-tolerant systems. In going from a non-automated to an automated environment, higher incidences of errors or data discrepancies are likely to be found. This is especially true in initial installations because the new application software or system software may process the data incorrectly. The system should be designed to be defensive in all aspects -- from program design to backup and recovery procedures. The system should be able to handle data discrepancies and inconsistencies gracefully. It should provide meaningful outputs in the face of data conditions it cannot process. It should provide easy-to-use tools to correct data problems that are found.
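One way to read "handle data discrepancies gracefully" is to quarantine records that fail validation, with a reason attached, rather than abort the whole run. A minimal sketch under that assumption; the record shape and validation rules are hypothetical:

```python
def process_defensively(records, validate, apply):
    """Process records, quarantining those that fail validation instead of
    aborting the run. Quarantined records carry their problem list so a
    correction tool (or a person) can fix them later."""
    processed, quarantined = [], []
    for rec in records:
        problems = validate(rec)
        if problems:
            quarantined.append((rec, problems))  # kept for later correction
        else:
            processed.append(apply(rec))
    return processed, quarantined


# Hypothetical record shape: (part_id, quantity)
def validate(rec):
    issues = []
    if rec[1] is None:
        issues.append("missing quantity")
    elif rec[1] < 0:
        issues.append("negative quantity")
    return issues


good, bad = process_defensively(
    [(1, 5), (2, None), (3, -2)],
    validate,
    lambda r: r[1] * 2,  # stand-in for the real per-record processing
)
```

The run completes with meaningful output for the clean records, and the quarantine list doubles as the work queue for the correction tools the text calls for.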

4.7.3 Conversion: Batch File System To a DBMS

Conversion Experience - A (a Federal Government Agency).

Goals. The goals of this agency were:

to replace outmoded hardware

to implement centralized data management under a data base administrator


to implement both new applications and redesigned systems in an integrated data base, on-line disk, and telecommunications environment, under a data base management system, in order:

to permit simple query capabilities

to treat data such that:
a. input multiple-use data only once
b. have it accessible, locally and from distant cities, by terminal
c. allow unstructured queries to be processed through the use of a query language without the necessity for programming

Summary of actions. As a Federal agency, the options were largely dictated by Federal procurement regulations. If there were no regulatory constraints, the alternatives would have been:

To determine the features needed by the data base management system, evaluate existing DBMSs, and buy a computer on which the most suitable one for their needs would run.

This option was not allowed by the procurement regulations, because it would have resulted in a sole-source procurement.

To include in the request for proposal the required DBMS features. This option was not allowed because, in their case, the requirements would have unduly restricted competition.

To procure the hardware under open competition, and procure the software separately.

However, regulations do exist, and the agency had to select the final option, which the procurement regulations had forced upon them.

Conclusions. The results, in terms of their goal of creating an integrated data base under a data base management system, were disastrous. The computer system that was procured was one on which a suitable DBMS did not exist. After two and a half years of effort to procure, through established DBMS software vendors, a system which could be made to function on this computer, either through conversion of an existing DBMS or the use of research and development software, the agency has not yet found a solution. Even after a contract is eventually awarded, they anticipate a 12- to 18-month wait for delivery of the product.

Therefore, it is safe to say that their goal of moving onto new hardware and a DBMS has been so delayed by Federal procurement policy that it will take 3-1/2 to 4 years for them to recover.

Conversion Experience - B (a Federal Government Agency).

Goals. To convert three stand-alone systems with the following features:

Inputs. Batch systems. Inputs manually transcribed from messages onto cards.

Outputs. Outputs to other subsystems in the form of card images on tape that was hand-carried, as well as printed reports.

The features of the new system were to be as follows:

Inputs. On-line system. All error corrections processed and corrected interactively on a CRT terminal.

Outputs. Printed reports.
Ad hoc queries available through a DBMS query language.
Outputs to other systems automatically generated and forwarded via a communications network.

Recommendations.

1. Conversion is usually a one-for-one affair; redesign occurs when the requirements are changed. Conversion is often used as an excuse for partial or complete redesign. A common mistake is to assume that for the price of a conversion effort, one can also redesign. Experience has shown that this is not the case: unrealistic estimates of needed time and resources result.

2. The need for expertise in the new DBMS is greater at the beginning of a project than at any other time. Ironically, this is the time before your staff is retrained and when your expertise level is the lowest. Therefore, outside consultants must be brought in. These outside experts will have a great impact on your data base design.


3. The DBMS experts will, at an early stage, have to design the data base structure. The system design will then build upon this data base structure. The process of data base design and system design will be iterative, with a series of changes until both are in sync.

4. The extreme pressure to get off old machines and become operational on the new machines prevents sufficient time for thorough analysis and planning. And, as discussed above, if you are redesigning and not converting, the time pressure becomes greater.

5. Government procurement regulations do not recognize that a system is as dependent on the software needed as it is on the hardware.

6. The move from separate batch systems to an on-line integrated system, coupled with the need to learn how to use new hardware, requires careful planning and extensive training in both new concepts and new techniques.

7. Moving into the centrally controlled DBA environment requires lead time in the systems development cycle for data dictionary development and implementation, and for data standardization to be done carefully.

8. Time required to accommodate recommendations five and six is seldom available when a move to new hardware is underway. The result is hasty and costly straight conversions of obsolete systems; therefore, the potential benefits of the new hardware are delayed or lost.

4.7.4 Conversion: DBMS-1 To DBMS-2

Conversion Experience - A. This conversion illustrates the experience of moving from one DBMS to another, with a change in vendors.

Goals.

To provide for the ability to update concurrently and retrieve information from the data base.

To provide for a quick response time for inquiry purposes.


To provide for the capability to have on-line inquiry even if the DBMS is in abort status or in recovery stages.

To provide an efficient way to recover and back up the data base.

To save money in the long run.

To provide tools for security and integrity of the DBMS.

Summary of actions. At the outset, several alternatives were considered:

Continue with the present DBMS. This alternative implied maintaining duplicate data bases -- one for inquiry purposes and one for updating purposes. The inquiry data base was at least one day and at most a week behind the update data base. In order to make the inquiry data base as current as possible, an image copy was made once each week of the updated base and copied to the inquiry data base.

Update the DBMS using the same vendor.

Convert to another DBMS using a different vendor.

The new vendor agreed to convert the bulk of the system quickly and with the least amount of logic changes to existing modules. This vendor would meet most if not all of the goals.

Upper management decided to convert to a DBMS using a different vendor.

In order to meet the restriction of making no logic changes, the new vendor developed an Emulator to translate dynamically (at execution time) calls in COBOL programs using the old vendor format, without requiring manual re-coding. Only format changes were required, and these were handled automatically via a standard vendor software package. The Emulator itself was developed by some of the best technical people available at the vendor site. It sounded and looked great -- minimal programming changes and transparent to users. In fact, the programs looked as if they were written in the old vendor DBMS. The Emulator did, however, have certain drawbacks. The overhead incident to using the Emulator was extremely high since the calls were converted at execution time rather than at the source level.
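The Emulator's approach -- intercepting old-format calls at execution time and dispatching them to the new system -- can be sketched in modern terms. Everything here (the verb names, the `NewDBMS` class) is a hypothetical illustration, not the actual vendor software; the per-call lookup and dispatch is exactly where the run-time overhead the text describes comes from.

```python
class NewDBMS:
    """Stand-in for the new vendor's native interface (hypothetical API)."""
    def __init__(self):
        self.store = {}

    def fetch(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


class Emulator:
    """Translate old-style calls (verb strings, as an old CALL interface
    might pass them) into native calls at execution time. Every call pays
    a translation cost that a source-level conversion would pay only once."""
    VERB_MAP = {"GET": "fetch", "STORE": "put"}  # old verb -> native method

    def __init__(self, target):
        self.target = target

    def call(self, old_verb, *args):
        native = self.VERB_MAP.get(old_verb)
        if native is None:
            raise ValueError(f"unsupported old-DBMS verb: {old_verb}")
        return getattr(self.target, native)(*args)  # per-call dispatch overhead


db = Emulator(NewDBMS())
db.call("STORE", "PART-001", {"qty": 12})  # old program text runs unchanged
result = db.call("GET", "PART-001")
```

Converting the calls once at the source level removes the dispatch from every execution, which is the trade-off the report's task force weighed when it moved some programs off the Emulator.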


Using the initial version of the Emulator would have resulted in failure to maintain the necessary production schedule. A task force was brought together to assess the matter and determine exactly how and where the system could be improved. The task force included site staff (three) and vendor staff (two persons).

The task force recommended changing some Emulated programs to execute under standard software and "looked into the Emulator internals" in an attempt to improve some of its functions. The changes yielded significant improvements; after the changes were effected, the regular production schedule was met.

Another significant problem arose when the Emulator did not perform properly, e.g., a call to the data base was mishandled. On the rare occasions when a malfunction of this sort occurred, the following steps were taken to solve the problem:

Disable the erroneous call

Inform the vendor of the problem, providing complete documentation on the problem

Wait for the solution

Re-test the call

Re-load the new version of the Emulator to the system library.

The prospect of losing technical support for the Emulator, in the event of personnel turnover among the vendor staff who designed and developed the system, was another potential problem. This situation did materialize, but the vendor was able to continue technical support with other individuals.

Conversion of the data bases themselves went extremely well. Standard utilities were used to unload the data bases. The tapes were moved physically to the new vendor's facilities and were converted via a standard vendor utility. The sub-files were loaded employing user-written programs.

Since the on-line system had been installed on the new system well before the other applications systems, there was a need to ensure that the data base on the new vendor system was as current as the old. With smaller data bases, we unloaded daily from the old vendor and loaded daily on the new. With the exception of our largest data base, the other data bases were unloaded and loaded weekly or monthly. To keep the largest sub-file current on the new system, we captured the data base changes on the old system daily and applied them to the new data base. These procedures lasted for approximately one year until our other applications systems were completely converted.
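The procedure for the largest sub-file resembles what is now called change data capture: derive the day's changes on the old system and apply only those to the replica, instead of unloading and reloading everything. A minimal sketch, with dictionaries standing in for the two data bases (the key/value shape is a hypothetical simplification):

```python
def capture_changes(old_snapshot, new_snapshot):
    """Diff two daily snapshots of the source system into a change list
    of (operation, key, value) tuples."""
    changes = []
    for key, value in new_snapshot.items():
        if key not in old_snapshot:
            changes.append(("insert", key, value))
        elif old_snapshot[key] != value:
            changes.append(("update", key, value))
    for key in old_snapshot:
        if key not in new_snapshot:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target, changes):
    """Apply captured changes to the replica on the target system."""
    for op, key, value in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = value


yesterday = {"A": 1, "B": 2}
today = {"A": 1, "B": 3, "C": 4}
replica = dict(yesterday)                 # target starts in sync with yesterday
apply_changes(replica, capture_changes(yesterday, today))
```

The daily change list is far smaller than a full unload of the largest data base, which is why this approach could run every day while full reloads were only feasible weekly or monthly.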

Conclusions. To summarize our findings:

Differences between the terminology associated with each system were found to be the initial barrier.

Incremental transition was recommended as a means of graduated "phase-in" to the new DBMS.

An intermediate software system was procured as the method of transitioning application code. DDL and DML functions appeared the same across the transition. Run-time interpretation of code through calls to interpretive software was used to provide a mapping from original DBMS calls to native-mode DBMS calls. Preliminary conversion of DDL-type facilities was provided at compile time, with mapping structures being derived for use at interpretation time.

Recommendations.

1. Experience in interpretation and run-time mapping has shown that system overhead can be prohibitive. Problems encountered in execution require resolution by the developer of the interpretive software and can result in excessive response times. It is thereby seen to be more practical to utilize standard vendor software where possible. Conversion costs may be higher in the initial investment but will eventually be equaled and even surpassed by the overhead costs of the interpretive method.

2. The development of the interpretive software was handled by the vendor, with valuable insight from users. User involvement in the implementation of the interpretive software provided a more timely product with greater response to requirements.

3. The interpretive software, by nature, required processing support over and above that required for actual application processing. Direct transition could possibly have utilized to a greater degree the processing efficiency of the target DBMS. Again, this leads to the realization that, should efficiency be a major consideration, direct transition may be more desirable.


4. In utilizing direct transition methods, vendors should be encouraged to develop standard methodologies for conversion processes in data description, data manipulation, and data base transfer.

5. Incremental transition insures that timely conversion can be established for applications. Prototyping efforts are provided through validation of earlier increments of the entire transition process.

6. Fall-back positions must be established. The transitioned software must be completely validated before actual acceptance. The ability must always exist to turn down transitioned software due to errors without degrading the overall effectiveness of the data processing operations.


5. STANDARDS

Milt Bryce

CHAIRMAN

Biographical Sketch

Milt Bryce is President of M. Bryce & Associates, Inc., Cincinnati, Ohio, a company specializing in providing management consulting services in areas of systems design, data management, and project management and control. Mr. Bryce has been involved with the information systems field since 1951, working both for major corporations as an MIS Director and with Univac in product planning and systems programming. He has traveled extensively around the world conducting seminars in systems design and development and data management. He has been involved with standards development for the last fifteen years and has pioneered since the early sixties in structured information systems design and data management.

Participants

Robert Bemer
Don Branch
Jean Bryce
Elizabeth Fong
Al Gaboriault
Anthony Klug

Recorder

Henry C. Lefkovits

C. H. Rutledge
Philip Shaw
Jay P. Thomas
Ewart Willey
Jerry Winkler

5.1 INTRODUCTION

5.1.1 Objectives. The objective of the Standards Working Panel was to recommend standards that a manager should consider when converting data from present sources (manual, semi-automated, or automated) to a computer Data Base Management System (DBMS, as it is commonly known). The Panel considered administrative guidelines as well as technical standards, since both are essential to an effective conversion process.

5.1.2 What Is a Standard? A "standard" is a consensus or common practice, sometimes established by authority, derived from a desire to reduce arbitrary variety for economic reasons. A standard is one method of insuring compatibility by using accepted conventions.

Several types of standards exist. For instance, a de facto standard is a practice accepted through common usage, although it has not been subjected to official standardization. On the other hand, standards may also be approved by an authorized official for use within a particular organization. For example, Federal Information Processing Standards (FIPS) are those that have been approved by the Federal standards process. American National Standards are those standards that have been approved for voluntary use by the American National Standards Institute (ANSI), a body made up of participants from government, industry, and academe. International standards are those that have been approved by the International Organization for Standardization (ISO) for voluntary use by member nations and international organizations.

In the case of FIPS, ANSI, and ISO, a formal mechanism has been established to propose, review, approve, and validate standards. While ANSI and ISO standards are voluntary for all participants, the FIPS standards are mandatory within the Federal Government. For mandatory standards to be effective, an enforcement and validation mechanism is necessary.

5.1.3 Background. As yet, no formal DBMS standards exist. Many groups concerned about standardization are actively working in this area. The committee felt that a review of the various standards activities would be worthwhile for the reader. The Committee included in its review the Data Base Directions I report and the activities being carried on by voluntary groups, specifically the ANSI/X3/SPARC and CODASYL groups.

Data Base Directions I. Overall, the recommendations of the standards section of Data Base Directions I have two prominent characteristics: they are still valid, and they have not been implemented. Specifically, the Committee at that time recommended the undertaking of more international activity in data base standardization. Despite considerable interest in data base, standards are only now beginning to occur at the national or international level. Nothing comparable to the International Telegraph and Telephone Consultative Committee (CCITT) has been formed. Little progress has occurred within the International Organization for Standardization (ISO). This lack of progress exists in spite of various admonitions from governmental operating bodies, such as Congressman Jack Brooks and the House Operations Committee in the United States, European Economic Community committees, etc. In effect, no firm standards efforts have been undertaken. [Editor's note: subsequent to the preparation of this report but prior to publication, ANSI initiated several DBMS activities.]

Voluntary standards bodies such as ECMA and the American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC et al.) have been slow to agree upon firm recommendations for standards or interim action. The technical societies have not been able to recommend agreed-upon and coherent roadmaps to the eventual realization of data base standards. The art and/or science of decision-making using data base data does not seem to have advanced. Data collection procedures may have improved since the days of manual actions with paper, but the science of decision-making based on such data does not seem to have made significant progress. For example: Data Base Directions I recommended the more accurate usage of terms. It suggested that "CDBS" (Computer Data Base System) be used instead of "DBMS," since organizations have always used data bases. The new term underscores the important new factors to be considered by the reader. However, the suggestion has not gained acceptance. On the positive side of the data base standards issue, substantial additions have been made to the literature, and we anticipate eventual, improved understanding of these issues.

Many practical data base conversions have been undertaken successfully, both from conventional systems to a computer data base system and from one computer data base system to another. A body of evidence indicating the economic superiority of the data base approach is accruing. The reader will find such evidence in other sections of this report.


American National Standards Institute (ANSI). The Standards Planning and Requirements Committee (SPARC) of ANSI/X3 (Computers and Information Processing) established in 1972 a study group which was chartered to investigate the potential for standardization in the area of DBMS.

During early deliberations, the study group concluded that the proper subjects for DBMS standardization were interfaces within a DBMS. It was then necessary to define a framework for DBMS. In 1975, the study group produced an interim report [3] which described such a framework. A revised report has also been published recently [4] which represents the study group's most important ideas relevant to standardization.


The framework represents three kinds of objects: processing functions, person-roles, and interfaces between these. The objects were divided into three basic levels: the internal, the conceptual, and the external. The internal level is oriented towards the most efficient use of the computing facility. The conceptual level contains a logical representation of the persons, places, things, and their relationships of interest to the organization using the DBMS. The external level contains the different views of the data base which various applications need to do their information processing. The three levels are associated through explicit mappings. Mappings, through their ability to isolate changes to one level, facilitate data base conversion and form a basis for orderly standardization of DBMS. In addition, the ANSI/SPARC framework defines three administrator roles, one for each of the three levels in the framework. These distinct roles are also potential aids in standardization in that responsibilities are clearly delineated.
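The isolation property of these mappings can be illustrated with a small sketch. Everything here (the record layout, the view functions, the storage encoding) is invented for illustration; the point is only that each level is reached through an explicit mapping, so a change at one level is isolated from the others.

```python
# Illustrative sketch of the ANSI/SPARC three-level idea; all names and
# layouts here are invented, not taken from the report.

# Conceptual level: a logical representation of the things of interest
# to the organization.
conceptual = [
    {"emp_id": 1, "name": "Smith", "dept": "Sales", "salary": 21000},
    {"emp_id": 2, "name": "Jones", "dept": "Audit", "salary": 24000},
]

# External level: each application's view is an explicit mapping over
# the conceptual level (one view hides salary, the other hides dept).
def payroll_view(records):
    return [{"name": r["name"], "salary": r["salary"]} for r in records]

def directory_view(records):
    return [{"name": r["name"], "dept": r["dept"]} for r in records]

# Internal level: a storage mapping oriented to the machine (here, a
# trivial fixed-width character encoding).
def store(records):
    return [f'{r["emp_id"]:04d}{r["name"]:<10}{r["dept"]:<8}{r["salary"]:06d}'
            for r in records]

# Replacing store() with a different physical encoding changes nothing
# at the external level -- the isolation that eases conversion.
assert directory_view(conceptual)[0] == {"name": "Smith", "dept": "Sales"}
```

A real DBMS would express these mappings declaratively rather than as ad hoc functions, but the division of responsibility is the same.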

Although the study group itself did not recommend to its parent, ANSI/X3/SPARC, action for or against standardization of any DBMS component, SPARC recommended to ANSI/X3 that standardization efforts begin on a CODASYL Data Definition Language (DDL) and on CODASYL additions to FORTRAN and COBOL for data manipulation (DML). These efforts have been initiated.

CODASYL. Through the work of several technical committees coordinated by an executive committee, CODASYL has continued to develop specifications for a data definition language (DDL) and data manipulation languages (DML) for incorporation into the existing COBOL and FORTRAN languages. CODASYL has addressed a number of technical problems, including concurrency and device independence, and has endeavored, through task groups of its COBOL and DDL committees, to resolve inconsistencies between the language specifications produced by the two committees. CODASYL has authorized the publication of three journals of development (DDL JOD, COBOL JOD, FORTRAN JOD) containing the respective language specifications to facilitate the work of ANSI in defining standards for the three languages.


5.2 POTENTIAL BENEFITS THROUGH STANDARDIZATION

Standardization can provide many benefits to vendors and users involved in conversion tasks. In this sense, "users" means the applications development people who will actually perform the conversion. The benefits can take many forms. Among the benefits the committee felt could be realized were:

Improved Portability. Defining in a standard way the logical relationships of data would in particular insure that diverse hardware and/or software implementations could successfully address the same logical data. This could minimize hardware dependencies and allow the users to progress to both different and improved technology with a minimum expenditure of both time and money. Portability would provide protection against loss of the usually sizeable investment required by the user to establish a data base.

Improved Education. Standards could provide a basis upon which educational institutions/corporations can structure their data processing training programs to satisfy the needs of their personnel. In effect, standards could reduce the time and expense of retraining data processing staffs.

Enhanced Communication. Standard terms and definitions would enhance communication among users and/or vendors. As it now stands, these terms are being haphazardly developed and becoming widely used with little common understanding.

Product Specification. Standards would provide product specifications that could be met by all vendors. End users would therefore have more precise product specifications and could exercise greater impact on what products would be manufactured to satisfy their needs.

Easier to Specify User Requirements. The standardization machinery would enable the user community to be involved in the evolution of the data base technology. In this respect, it should be self-evident that the user would have a far greater interest than the vendor in ensuring that the standards applicable to data base reflect the need to minimize the problem of conversion.


5.3 SOFTWARE COMPONENTS IN CONVERSION PROCESS

This section examines the software components of the data base environment which the committee felt could be standardized. The rationale for standardization is based on four different conversion scenarios and the components which, if standardized, the committee felt would facilitate conversion. In all cases, the technical feasibility of the standard has not been considered. The term "data base environment" is used to describe an environment in which the many parts making up an organization's data base would be available for shared use and coordination. This approach is less confusing than using the term "DBMS environment," since many installations are merely using the DBMS software as a substitute for traditional access methods. In this situation, the environment has not changed; it is merely a change of access method.

The term DBMS will be used to describe the software used to manipulate the computer-resident portion of the data base. The conversion scenarios considered were:

Moving from a non-data base environment to a data base environment using a standard DBMS.

Moving from a standard DBMS to the same standard DBMS within a different hardware environment.

Moving from a Type A standard DBMS to a different Type B standard DBMS.

Moving from a standard DBMS to a non-standard DBMS environment.

5.3.1 Scenario 1. Moving from the non-data base environment to the standard data base environment. When considering the processes involved in data base conversion, several points can be identified which can be facilitated by the availability of standard tools or methods.

Data Flow Analysis. To determine the organization's data requirements, a data flow analysis should be undertaken. To facilitate this analysis, a standard approach is needed. It should include a complete description of the data and where it is used, including all input and output documents and both manual and automated processes.

Dictionary/Directory System (D/DS). Although it may be premature to consider a standard for a D/DS, this is an important tool for definition analysis and data flow analysis and for facilitating access to this information. A more detailed discussion of the possible capabilities is presented elsewhere in the report.

Data Management Function. An organizational group concerned with overall responsibility for the management of data, both computerized and non-computerized. This function should have a specific functional description.

Data Base Administrator. A position holding primary responsibility for the overall accuracy, timeliness, and availability of the corporate data through direct control over the data dictionary/directory system. This position, located within the data management function, should have clearly understood responsibilities.

Loading the Computer Data Base. To facilitate the conversion effort, specific conventions are needed for placing data into data bases initially.

Conversion to the Data Base Environment. This can be accomplished by either (1) moving current applications to the new environment with no major changes in process or (2) creating new applications. Both approaches use a Data Manipulation Language (DML) to access the data base from a user-oriented view of the data base.

5.3.2 Scenario 2. A DBMS-to-DBMS conversion where each DBMS is the same standard and only the hardware changes. In this case, the steps in Scenario 1 were presumably implemented already. However, additional problems occur:

Data Translation. Moving the data in the source computer data base to the target computer data base. This requires a translation of the data from the physical structure of the source computer to the physical structure of the target computer. A needed standard facility would be a method of unloading the physical data to a standard interchange format, in display format representation, and then loading that data into the physical structure of the target computer.
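The unload/load facility described above can be sketched as follows. The field names, widths, and record layout are invented, and numeric fields come back as character strings; the sketch only illustrates the idea of a display-format (all-character) interchange representation that is independent of either machine's physical structure.

```python
# Illustrative sketch of a display-format interchange file: unload the
# source records to fixed-width character lines, then load them into the
# target. All field names and widths here are invented for the example.

def unload(records, fields):
    """Write each record as a line of fixed-width character fields --
    a display-format representation independent of either machine."""
    lines = []
    for rec in records:
        lines.append("".join(str(rec[name]).ljust(width)
                             for name, width in fields))
    return "\n".join(lines)

def load(text, fields):
    """Parse the interchange file back into records; the target system
    would then apply its own physical encoding (blocking, indexes)."""
    records = []
    for line in text.splitlines():
        rec, pos = {}, 0
        for name, width in fields:
            rec[name] = line[pos:pos + width].rstrip()
            pos += width
        records.append(rec)
    return records

fields = [("part_no", 8), ("descr", 12), ("qty", 6)]
source = [{"part_no": "A-10", "descr": "bracket", "qty": 25}]
# Note: the quantity comes back as the character string "25" -- the
# interchange form is all-character by design.
assert load(unload(source, fields), fields) == [
    {"part_no": "A-10", "descr": "bracket", "qty": "25"}]
```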

Program Translation. Changes to the description of data, as well as corresponding changes to the application programs, are necessary. Standard data description and data manipulation functions are needed in addition to those stated in Scenario 1.


5.3.3 Scenario 3. Conversion between standard versions of different DBMS. This assumes both standard DBMS incorporated the points in Scenario 1. No additional points beyond those described above are needed, but significant differences in the details of the data models may occur.

5.3.4 Scenario 4. Standard DBMS to non-standard DBMS conversion. Similar to Scenario 3, because there would be no reason to make this step unless no standard DBMS satisfies the information requirements.

5.3.5 Miscellaneous Standards Necessary. Additional computer components considered and mentioned as desirable, but for which standards may be premature or which may not affect the conversion effort, are:

Standard end-user facilities; i.e., a means whereby a non-computer professional can communicate with the data base.

Network interfaces.

Restart functions.

Security functions.

Communication facilities.

Command Languages (operating systems).

5.3.6 Non-software Components Necessary. Many non-software components are involved in any conversion process. Those considered by the committee include:

Administrative/Management Procedures. The committee identified no standards which could be applied to these procedures during conversion. However, the development of guidelines and checklists based on earlier experience and lessons learned would be very useful.

Verification of Compliance With Standards. The committee believed there is a need for a standard approach to validating compliance with DBMS standards. An important component of such a validation would be a set of common procedures for measuring compliance.


5.4 RECOMMENDATIONS

Two areas should be considered immediately for standardization if progress toward successful conversions is to be made. These two areas are:

Development of a standard DBMS.

Development of a Generalized Dictionary/Directory System.

5.4.1 The Development of a Standard DBMS. A DBMS standard would benefit government and industry in their conversion efforts, and the Committee encourages progress toward a DBMS standard. The work of the CODASYL group, with some possible modifications, appears to offer a first step toward the complete specification of a DBMS, although some committee members felt several successful commercial DBMS may become de facto standards. The committee also believed that consideration should be given to the idea that the standard should not inhibit any future technological developments in hardware and software.

5.4.2 Generalized Dictionary/Directory System. The use of a Dictionary/Directory facility is highly desirable as a tool in any conversion process, whether from manual or conventional files to a data base environment or from one DBMS to another DBMS. The advantages of the use of a dictionary facility will be gained primarily in the following two areas:

It will allow the proper assessment and definition of the organization's data to be undertaken prior to the start of a project.

It provides an essential management mechanism to control the actual development of the new system.

Since, in this sense, the dictionary facility chronologically precedes the DBMS targeted in the conversion, it appears likely that the direct support of the dictionary facility itself should not depend on the target DBMS, or, for that matter, on any particular DBMS. In any case, the dictionary facility must be able to support descriptive facilities for the data in a logical mode.

Another useful capability is that of being able to produce processable data descriptions in the target DBMS environment. This does not, however, imply that this target DBMS must be used in the implementation of the dictionary facility, but rather that a suitable (possibly manual) interface to this DBMS be made available. It would be highly desirable, if not required, that all DBMS had the same logical interface. Then the dictionary could interface with multiple DBMS at the same time.

In discussing the architectural placement of the dictionary, this facility should not exist as an adjunct to any DBMS, but as a separate facility that interfaces to data in conventional files as well as to data being managed by one or more DBMS. A distinction needs to be made in the manner in which the dictionary facility is invoked:

In the case of a free-standing dictionary, this facility is invoked under user control. As such, it is a user option if, and when, the dictionary is to be invoked in the execution of any process.

In the case of an integrated dictionary, which means the data dictionary is directly available to the DBMS, this facility might be invoked automatically by the system itself in the execution of any process, thus assuring synchronization of the process with the information contained in the dictionary. The feasibility of this approach depends on the overhead burden it may generate.

Since the use of some of the currently available dictionary facilities is believed to be of considerable value in the attainment of these and other goals, it is visualized that a more general facility, called a Generalized Dictionary/Directory System (GD/DS), would be more effective. In brief, a GD/DS is an information system in and of itself whose subject matter is all the information about the enterprise on the following classes of entities:

Data Components.

Processing Components.

People and Organizational Components.

Events.

Attributes to be included in the data dictionary are those about the entities themselves, the relationships that exist among the entities, as well as the context in which these relationships exist. For example, among the attributes would be the relationship between systems and both automated and manual procedures. The generalized data dictionary/directory should contain not only the logical attributes and relationships between the entities described, but also attributes associated with the physical location of the entities.
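The entity classes and relationship attributes just described can be made concrete with a small sketch. This is an illustrative toy, not a design from the report; the class names, method names, and sample entries are all invented.

```python
# A toy sketch of a Generalized Dictionary/Directory System: an
# information system whose subject matter is information about the
# enterprise itself. The four entity classes follow the list above;
# the API and sample data are invented for illustration.

class GDDS:
    def __init__(self):
        self.entities = {}       # name -> {"class": ..., "attrs": {...}}
        self.relationships = []  # (from, kind, to, context)

    def add_entity(self, name, cls, **attrs):
        # Logical and physical attributes both live on the entity.
        assert cls in {"data", "processing", "people/org", "event"}
        self.entities[name] = {"class": cls, "attrs": attrs}

    def relate(self, a, kind, b, context=None):
        # Relationships carry the context in which they hold.
        self.relationships.append((a, kind, b, context))

    def uses(self, component):
        """Which entities use a given data component?"""
        return [a for a, kind, b, _ in self.relationships
                if kind == "uses" and b == component]

d = GDDS()
d.add_entity("PAYROLL-RUN", "processing", schedule="monthly")
d.add_entity("EMPLOYEE-FILE", "data", location="disk pack 3")  # physical attr
d.relate("PAYROLL-RUN", "uses", "EMPLOYEE-FILE", context="batch update")
assert d.uses("EMPLOYEE-FILE") == ["PAYROLL-RUN"]
```

Queries like `uses()` are what make the dictionary a management mechanism during conversion: they answer which processes are affected when a data component moves.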

5.5 CONCLUSION

In summary, the committee strongly believes that many areas for standardization exist within the conversion process and the data base environment in general. The committee notes with frustration the lack of overall progress since the first Data Base Directions workshop.

The committee noted that in actual use DBMS are often misapplied. It is possible, however, that this misuse relates to the historical problems in the environments into which DBMS are introduced. For example, most organizations seem to obtain a DBMS as their initial introduction to the data base environment and then find a need for a Dictionary/Directory, which is then introduced as a secondary action. Only when both of these steps fail to really solve the problem does the organization look at its total organization and address the fundamental work necessary. This fundamental work includes documenting data, systems, processes, etc., to insure their more effective operation in the new DBMS environment.

The committee observed that the installation process actually should be reversed, with the data dictionary introduced first. All of the data within the environment, and the systems that use them, should be carefully documented and defined (including all the developmental shortcuts that had been taken in the past) and then entered into a Generalized Dictionary/Directory. Once this is accomplished, the appropriate data bases to support the various information systems in the organization can be built.

5.6 REFERENCES

The following references are included as part of the Standards Panel report as a convenience to the readers. Some of the material included was not used directly by the committee but did influence the thinking of some of the participants.

1. Sibley, Edgar H., "Standardization and Database Systems," IFSM TR No. 23, University of Maryland, College Park, MD 20742, July 1977.


2. Berg, John L. (ed.), "Data Base Directions - The Next Step," Proceedings of the Workshop of NBS and ACM held at Ft. Lauderdale, FL, October 29-31, 1975, National Bureau of Standards Special Publication 451.

3. ANSI/X3/SPARC Study Group on Data Base Management Systems, "Interim Report, 75-02-08," published as Vol. 7, No. 2 (1975) of FDT, the Bulletin of ACM-SIGMOD.

4. Tsichritzis, D. and A. Klug (eds.), "The ANSI/X3/SPARC DBMS Framework: Report of the Study Group on Data Base Management Systems," AFIPS Press, 1977.

5. CODASYL Data Description Language Committee, "DataDescription Language," Journal of Development, Jan. 1978.

6. Leong-Hong, B. and Marron, B., "A Technical Profile of Seven Data Element Dictionary/Directory Systems," NBS Special Publication 500-3, February 1977.

7. Lefkovits, Henry C., "Data Dictionary Systems," Q.E.D. Information Sciences, 1977.

8. CODASYL Data Description Language Committee, "DDL Journal of Development," June 1973; NBS Handbook 113, January 1974, page 155.

9. "Data Dictionary/Directory Evaluation Criteria," M. Bryce & Associates, Inc., Cincinnati, Ohio.

10. "Installing a Data Dictionary," EDP Analyzer, Vol. 16, No. 1, January 1978.


6. CONVERSION TECHNOLOGY--AN ASSESSMENT

James P. Fry

CHAIRMAN

Biographical Sketch

James P. Fry is the Director of the Database Systems Research Group in the Graduate School of Business Administration at The University of Michigan, where he directs research in the areas of database design, distributed systems, database restructuring, and systems conversion. Previously he held positions at the MITRE Corporation, The Boeing Company, and E. I. du Pont de Nemours.

Nationally listed in Who's Who in Computer Research and Education and having published over thirty-five articles, Mr. Fry is a well-known speaker at AFIPS and IFIPS conferences and an invited lecturer at others. Professionally he has participated in the CODASYL organization (on the Systems Committee and Database Program Disk Group), in the Association for Computing Machinery as National Chairman of the Special Interest Group on Management of Data, and in the SHARE/GUIDE User Organization.

Participants

Edward Birss, Peter Dressen, Nancy Goguen, Vincent Lum, Robert Marion, Shamkant Navathe, Steven Schindler, Arie Shoshani, Stanley Su, Donald Swartwout, Robert Taylor, Beatrice Yormark, Michael Kaplan, Eugene Lowenthal



6.1 INTRODUCTION

6.1.1 The Scope of the Conversion Problem. Two factors make conversion necessary: changes in users' functional requirements and changes in their performance requirements. These changes may necessitate the acquisition of new software and hardware which, in turn, may require changes to the existing data and programs. For example, the acquisition of a data base management system to replace a file management system requires the integration of the original files into a data base system and modification of the programs to interact with it. The replacement of a current DBMS with a new data base system to provide additional functionality may require a different way of logically structuring the information (its data model) and a different kind of language interface. The establishment of a communication network between differing systems to implement data sharing will require dynamic (i.e., in real time) conversion of data between different nodes, in the sense that the same item will repeatedly undergo conversion as it is needed; alternatively, it requires conversion of programs when they travel to different nodes to access data there. Even when the acquisition of new software or hardware is not warranted, changes in users' functional and performance requirements can require data and program conversion.

What makes conversion difficult is the proliferation of data models, of levels and styles of DBMS interfaces, of internal data representations, and of hardware architectures. This panel will examine the technology that has been developed to perform conversions, analyze the areas which require new or improved techniques, and consider strategies for minimizing the need to convert as well as for streamlining the unavoidable conversions. Over the past six years, research and development has primarily centered on the problem of converting in non-dynamic environments. The first part of the section, "CONVERSION TECHNOLOGY," surveys tools and technology currently available for the conversion of data organization. More recently, data base program conversion -- probably the most difficult part of the conversion problem -- has received attention. The middle part of this section considers the directions being taken by current data base conversion research. The final part analyzes the status of the entire technology. The third and final section explores the factors affecting conversion, the approaches for reducing the need to convert data and applications programs, and the impact of new software and hardware technologies on conversion. Section 4 summarizes the trends in conversion needs and tools.


6.1.2 Components of the Conversion Process. When conversion is necessary, users will extract data from their source environment and restructure them to the form required for the target environment. While the extraction and restructuring may themselves be complex, these processes are frequently complicated further by the undisciplined nature of the source data. For example, data may exist in duplicate or contain numerous errors and multiple inconsistencies. The whole process of extracting it from its source, "cleaning the data," restructuring it to a desired form, and loading it into the targeted environment is generally referred to as data conversion or translation.

After a data conversion, particularly one involving extensive restructuring, the application programs which process the original data may not run correctly against the new data. A small amount of restructuring may require only a simple modification, while extensive restructuring may require extensive rewriting or redesign. The process of modifying the programs to process the restructured data is referred to as application program conversion or translation. (This report will not consider the problem of program conversions not related to a change in data structure.) Let us discuss data conversion and application program conversion a bit further.

Figure 6-1. A Data Conversion Model


Concepts of Data Conversion

In brief and conceptual terms, the data translation or conversion process can be represented diagrammatically as in Figure 6-1. As shown in this diagram, a data translation system generally requires three components: a reader, a restructurer, and a writer. While the capability of each component depends on the individual design of a data translation system, the reader, in general, accesses data from its source environment to prepare it for further processing. The accomplishment of this process unquestionably requires a description of the data and, thus, a data description language. The writer is the functional inverse of the reader and puts the transformed data into the target environment. It too requires a description of the data structure and shares with the reader the need for a data description language. The restructurer functions quite differently from the other two components. This component, in general, extracts data from its source or internal forms and restructures it to a desired format or structure. This process usually requires a translation description language.

In a data conversion system with a limited intended application, the three components may not be distinguishable, nor the need for the two languages clear. For example, if one wishes to create a conversion system merely to translate EBCDIC characters into ASCII, one can create a simple system with one component and a simple data description language embedded in it. However, if development of a broadly applicable data translation system is the goal, clearly one must have a reader and writer capable of accessing and putting out data in all kinds of environments and a powerful restructurer capable of all manner of manipulating data and of creating some data as well. Such a generalized system implies the need for versatile data translation description languages.
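The three-component decomposition, and the degenerate single-component case mentioned in the text, can be sketched as follows. The function names are invented, and 'cp037' (a common US EBCDIC code page) is assumed as the source encoding.

```python
# The reader/restructurer/writer decomposition, sketched as three
# composable functions. The EBCDIC-to-ASCII case collapses into a
# reader that decodes and a writer that encodes, with an identity
# restructurer. 'cp037' (US EBCDIC) is assumed as the source code page.

def reader(raw: bytes) -> str:
    # Access the data in its source environment's encoding.
    return raw.decode("cp037")          # EBCDIC in

def restructurer(text: str) -> str:
    # A general system would apply a translation description here;
    # for a pure character-set conversion it is the identity.
    return text

def writer(text: str) -> bytes:
    # Put the transformed data into the target environment's form.
    return text.encode("ascii")         # ASCII out

# 0xC8 0x85 0x93 0x93 0x96 is "Hello" in the cp037 code page.
assert writer(restructurer(reader(b"\xc8\x85\x93\x93\x96"))) == b"Hello"
```

The broadly applicable system the text describes differs only in degree: the reader and writer become description-driven, and the restructurer becomes programmable via a translation description language.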

Concepts of Data Base Program Conversion

Figure 6-2 represents a general approach to data base program conversion or translation. To convert a program one must determine the functions of the program and its semantics. Programmers making assumptions about the state of the data may not, and currently need not, state these assumptions explicitly in their programs. Therefore, one usually must provide more information about the semantics of the program than that provided in the program text and its documentation. The program conversion process also needs information about the data structure the program originally ran against, the new structure it must run against, and how the two relate. These data descriptions could be the same as those used to drive the data translation process. The program, the description of its semantics, and the description of the data conversion are inputs to the program conversion process. It uses them to determine the data originally accessed and how to accomplish the same access in the new structure, and it produces a new program to do this. Currently, a combination of manual translation, emulation, and bridge programs accomplishes this conversion process, and automatic or semi-automatic program conversion technology is only in the early stages of research.

Figure 6-2. Program Translation
(The source program, together with a program semantics description and a data structure description, is input to the program conversion process, which produces the target program.)

6.2 CONVERSION TECHNOLOGY

This section turns from the general concepts of data and program conversion to the technology by which a conversion is achieved. Since most of the conversion results have been achieved in data conversion, we focus on this aspect and limit our discussion of program conversion to those cases which are the result of a data conversion.


6.2.1 Data Conversion Technology. Currently, as the common approach to data conversion, one develops customized programs for each transfer of data from one environment to another. This approach is inherently expensive: the programs are developed for use only once and their development costs cannot be amortized. Further, poor reliability results from the greater likelihood of making errors as data is passed from program to program. For a large data conversion effort, tracing data back through several passes to the source of an error in the data at a particular point can become unmanageable. As an alternative one may search for a broader approach to data conversion with a generalized system. We shall now describe such systems in more detail.

Problem Discussion. Data exists in many varied and complex forms. Figure 6-3 (next page), an expanded form of the diagram in Figure 6-1, indicates some of the transformations that need to take place in a data conversion. This diagram illustrates the capabilities needed by the read and write processes in a data conversion system.

Unloading of data from its source permits reducing the complex physical structure of the source data base to a very simple physical structure, namely, a sequential data stream. The source data base contains not only the information that interests the user, but also a large amount of control information specific to a particular system. This control information is used by the system for such data management functions as overflow chaining, index maintenance, and record blocking. For example, many systems mark a record to be deleted rather than actually delete it, and the unload transformation can remove such system-specific information. Another factor that causes complexity in the unloading process is the frequent use of pointers in a source system. Pointers are used for two basic purposes: (1) to represent relationships that exist between record instances, and (2) to implement alternative access paths to the data. During the unload transformation, the second class of pointers may be discarded without loss of information. The first class of pointers, however, contains information that the transformation must preserve.
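The unload transformation just described can be sketched as follows. The record layout and field names are invented; the sketch shows only the two behaviors the text calls out: dropping marked-deleted records and access-path pointers, while preserving relationship pointers.

```python
# Illustrative sketch of the unload transformation: flatten the source
# data base into a sequential stream, discarding system-specific control
# information (deletion marks, access-path pointers) while preserving
# pointers that represent genuine relationships. Layout is invented.

source_records = [
    {"id": 10, "data": "order A", "deleted": False,
     "next_in_index": 30,          # access-path pointer: discardable
     "owner": None},               # relationship pointer: must keep
    {"id": 20, "data": "line 1", "deleted": False,
     "next_in_index": 10, "owner": 10},
    {"id": 30, "data": "order B", "deleted": True,   # marked, not removed
     "next_in_index": None, "owner": None},
]

def unload(records):
    stream = []
    for r in records:
        if r["deleted"]:              # many systems only mark deletions;
            continue                  # the unload step removes them
        stream.append({"id": r["id"], "data": r["data"],
                       "owner": r["owner"]})  # relationship info only
    return stream

assert [r["id"] for r in unload(source_records)] == [10, 20]
```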

The reformat process creates a common data form of the source data for the restructuring step in the conversion process. A simple sequential file, not under DBMS control, may enter the conversion process at this point. Depending on the design of a system, this step may involve editing (i.e., encoding or recoding of items), limited data extraction, correction, and the like.


Since this step seeks to create a data structure of the source data without system-dependent information, one can consider the mapping between the input and the output of the reformat process to be generally one-to-one. While this step looks simple functionally, its actual application and implementation can be quite complex. For example, an application program may use the high-order bits of a zoned decimal number for its own purposes, knowing that these bits are not used by the system. Such specifications of nonstandard item encodings present a difficult problem in data conversion.
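The zoned-decimal example can be sketched as follows. This assumes the common EBCDIC zoned-decimal layout (the digit in the low nibble of each byte, the sign in the zone nibble of the final byte, 0xD meaning negative); a converter that masks the zone nibbles tolerates an application's private use of the high-order bits, though it cannot recover what those bits meant:

```python
def decode_zoned_decimal(raw: bytes) -> int:
    """Decode an EBCDIC zoned-decimal field, ignoring any
    application-specific use of the high-order (zone) bits
    of the non-terminal bytes."""
    value = 0
    for b in raw:
        d = b & 0x0F                  # keep only the digit nibble
        if d > 9:
            raise ValueError("invalid digit nibble")
        value = value * 10 + d
    sign_zone = (raw[-1] & 0xF0) >> 4  # zone of last byte carries the sign
    return -value if sign_zone == 0xD else value
```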

The load process is the counterpart of the unload process and needs no further clarification. Note, however, that the use of a common data form provides additional benefits, such as easing the portability problem.

The restructuring process undoubtedly represents the most complex process of a generalized data conversion system. The languages for this mapping process can differ widely (for example, some are procedural and others nonprocedural), and the models used to represent the data in the conversion system are also quite divergent (for example, some use network structures; others use hierarchical structures). More will be said on this topic later in this section.

Let us now turn briefly to the issue of implementation. Generally, there are two techniques: an interpretive approach and a generative approach. In the interpretive approach, the action of the system is driven by the descriptions written in the system's languages via a general interpreter implemented for the particular system. In the generative approach, the data and mapping descriptions are fed into compiler(s) which generate a set of customized programs executable on a certain machine. Later in this section we discuss the merits of each of these approaches.
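The contrast can be sketched in miniature. The mapping description and field names are invented, and a real generative system would emit code for a target machine rather than source text in the same language:

```python
# Toy field-mapping description shared by both approaches
# (target field name -> source field name; names are illustrative).
MAPPING = {"emp_name": "name", "emp_salary": "salary"}

def interpret(record, mapping):
    """Interpretive approach: a general interpreter walks the
    description again for every record it converts."""
    return {tgt: record[src] for tgt, src in mapping.items()}

def generate(mapping):
    """Generative approach: compile the description once into a
    customized converter (here emitted as Python source text)."""
    lines = ["def convert(record):", "    return {"]
    for tgt, src in mapping.items():
        lines.append(f"        {tgt!r}: record[{src!r}],")
    lines.append("    }")
    return "\n".join(lines)

# The generated program no longer consults the description at run time.
_ns = {}
exec(generate(MAPPING), _ns)
convert = _ns["convert"]
```

The trade-off mirrors the one discussed later in the section: the interpreter pays a per-record cost to stay general, while the generated program is fast but fixed to one mapping.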

Turning our attention to the tools that have been developed for data conversion, we shall first discuss currently available tools and then the research and development work in progress.

Available Conversion Tools. Currently available tools have limited capabilities. Because it is impossible in this short report to provide an exhaustive survey of all the vendor-developed conversion tools, we will highlight the spectrum of capabilities available to the user by providing examples from specific vendor shops.


The repertoire of vendor conversion tools begins at the character encoding level of data conversion with the provision of hardware/firmware options and continues through the software aids for conversion and restructuring of data bases.

Depending on a diversity of conditions, the need to develop software tools varies from vendor to vendor. Probably the most prevalent file conversion tool is a COBOL foreign file processing aid. This type of facility allows the direct reading or writing of a particular class of files, such as EBCDIC tapes or Honeywell Series 200/2000 files, within COBOL. Although a relatively widespread facility, its capabilities are nevertheless limited. For example, some do not handle unlabeled tapes, while others cannot process mixed-mode data types. Aside from the work of Behymer and Bakkom [DT11], which was aimed toward achieving a general conversion bridge with a particular vendor, to our knowledge there are no vendor-supported generalized file translation tools.

In contrast to the above file translation tools, tools have been developed that have their main applications in a data base environment. One example of a data base conversion aid is the I-D-S/II migration aid provided by Honeywell. Because of the large volumes of data involved and the fact that the user cannot afford to shut down his whole data processing shop, a co-existence approach was adopted. The first step is to reformat the I-D-S/I data base into the I-D-S/II format, making the necessary data type conversions and pointer mechanism adaptations. This step allows the data base to be processed in the I-D-S/II mode, but not optimally. Additional steps in this migration include the generation of the additional I-D-S/II pointer fields (I-D-S/II requires "Prior & Header" chain pointers, which are allocated in step 1 but not filled in) and the restructuring of the I-D-S/II (Coexistence) data base to exploit the more sophisticated capabilities of I-D-S/II.

Some data base restructuring tools specific to a particular DBMS have been developed by DBMS users. One example of this type of tool is REORG [R11], a system developed at Bell Laboratories for reorganization of UNIVAC DMS-1100 data bases. REORG provides capabilities for logical and physical reorganization of a data base using a set of commands independent of DMS-1100 data management commands. A similar capability has been developed at the Allstate Insurance Company.


In addition to the above, there are also software companies and vendors who will do a customized conversion task on a contractual basis.

Data Conversion Prototypes and Models. Over the past seven years, a great deal of research on the conversion problem has been performed, with the results summarized in Figure 6-4. The University of Michigan, the University of Pennsylvania, IBM, SDC, and Bell Laboratories initiated projects, as did a task group of the CODASYL Systems Committee. In many cases, interaction and cross-fertilization between these groups led to some consensus on appropriate architectures for data conversion. The individual achievements of these groups are discussed below:

The CODASYL Stored-Data Description and Translation Task Group. In 1970, the CODASYL Systems Committee formed a task group (originally called the Stored Structure Description Language Task Group) to study the problem of data translation. The group presented its initial investigation of the area at the 1970 SIGMOD (then SIGFIDET) annual Workshop in Houston [SL1]. In 1972, the group was reformulated as the Stored-Data Description and Translation Task Group and presented a general approach to the development of a detailed model for describing data at all levels of implementation [SL4,DT2]. The most recent work of the group specifies the data conversion model and presents an example language for describing and translating a wide class of logical and physical structures [SL8]. The stored-data definition language allows data to be described at and distributed to the access path, encoding, and device levels.

The University of Michigan. The nonprocedural approach to stored-data definition set forth by Taylor and Sibley [SL3,6] provided one of the major foundations for the development at the University of Michigan (see Figure 6-4) of data translators. In concert with Taylor's language, Fry et al. [DT1] initiated a model and design for a generalized translation.

The translation model was tested in a prototype implementation of the Michigan Data Translator in 1972 [UT2,4], and the results of the next implementation, Version I, were reported by Merten and Fry [DT4].

In 1974, the work of the Data Translation Project of the University of Michigan focused on the data base restructuring problem. Navathe and Fry investigated the hierarchical restructuring problem by developing several levels of abstractions, ranging from basic restructuring types to low-level operations [R6].


Figure 6-4
Historic Context of Data Conversion Efforts


Figure 6-4 (continued)
Historic Context of Data Conversion Efforts


Later, Navathe proposed a methodology to accomplish these operations using a relational normal form for the internal representation [DT12]. Version II of the Michigan Data Translator was designed to perform hierarchical restructuring transformations, but the project did not implement it. Instead, the research was directed to the complex problem of restructuring network type data bases. To address this problem, Deppe developed a dynamic data model--the Relational Interface model--which simultaneously allowed a relational and network view of the data base [UR3]. This model formed the basis of the Version IIA design and implementation of generalized restructuring capabilities [UT8,9,10]. Another component necessary for the development of a restructurer was the formulation of a language in which to express the source-to-target data transformations. This language, termed Translation Definition Language (TDL), evolved through each translator version, beginning with a source-to-target data item "equate list" in the Version I Translator and culminating in the network restructuring specifications of Version IIA. While the initial version of the TDL was quite simplistic, the current version, the Access Path Specification Language [DT16,R9], provides powerful capabilities for transforming network data bases.

The University of Pennsylvania. Concurrent with the work at the University of Michigan, Smith at the University of Pennsylvania (see Figure 6-4) also took a data description approach and developed a stored-data definition language (SDDL) for defining storage of data on secondary storage devices, and a translation description language (TDL) [SL2,DT2]. Three levels of data base structures, the logical, storage, and physical, are described using the SDDL. In order to describe the source-to-target data mappings, a first-order calculus language was used. Following from this work, Ramirez [DT3,6] implemented a language-driven "generative" translator which created PL/1 programs to perform the conversion. One of the first reports on the utilization of generalized translation tools was provided by Winters and Dickey [TA1]. Using the translator developed by Ramirez, they installed it on their system and applied it to converting IBM 7080 files.

IBM Research, San Jose. In 1973, another major data translation research endeavor was initiated at the IBM Research Laboratory in San Jose, California. Researchers in this project--initially Housel, Lum, and Shu, later joined by Ghosh and Taylor--adopted the general model as specified in Figure 6-1 but made several innovations. First, in the belief that programmers know well the structure of the data in a buffer being passed from a DBMS to the application program, the group concentrated its effort on designing a data description language appropriate for describing data at this stage. Second, regardless of the data model underlying any DBMS, the data structure at the time it appears in the buffer of an application program will be hierarchical. The general architecture, methodology, and languages reflecting these beliefs are reported in Lum et al. [DT14].

In addition, the group in San Jose felt that, while it is desirable to have a file with homogeneous record types, it is a fact of life that much of today's data is still in COBOL files in which multiple record types frequently exist within the same file. As a result, the group concentrated on designing a data description language which can describe not only hierarchical records (of which a relational structure is a special case) but also most of the commonly used sequential file structures. This language, DEFINE, is described by Housel et al. [SL7].

The philosophy of restructuring hierarchies is further reflected in the development of the translation definition language CONVERT, as reported by Shu et al. [R2]. This language, algebraic in structure, consists of a dozen operators, each of which restructures one or more hierarchical files into another file. The language possesses the capability of selecting records and record components, combining data from different files, built-in functions (e.g., SUM and COUNT), and the ability to create fields and vary selection on the basis of a record's content (a CASE statement).
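The flavor of such an algebraic operator might be sketched as follows. This is not actual CONVERT syntax; the record layout, function signature, and output field naming are illustrative assumptions, combining content-based selection with a built-in aggregate:

```python
def restructure(records, predicate, child_key, agg_field):
    """Restructure one hierarchical file into another: keep records
    satisfying `predicate`, and replace the repeating group
    `child_key` with its SUM over `agg_field`."""
    out = []
    for rec in records:
        if not predicate(rec):
            continue                  # content-based selection (CASE-like)
        new_rec = {k: v for k, v in rec.items() if k != child_key}
        new_rec[f"sum_{agg_field}"] = sum(
            child[agg_field] for child in rec.get(child_key, [])
        )
        out.append(new_rec)
    return out
```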

A symmetric process occurs at the output end of the translation system. Sequential files are created to match the needs of the target loading facility. The specification of this structure is again made in DEFINE.

A prototype implementation, originally called EXPRESS but renamed XPRS, is reported in [DT15].

System Development Corporation. Another restructuring project, reported by Shoshani [R3,4], was performed at the System Development Corporation in 1974-1975. In order to avoid the complexities of storage structure specification (i.e., indexes, pointer chains, inverted tables, and the like), they chose to use existing facilities of the systems involved. In particular, they advocated the use of query and load (generate) facilities of data base management systems. However, when such facilities do not exist, reformatters from the source (e.g., indexed sequential file) to a standard form and from the standard form to the output file had to be used. Given that data bases can be reformatted to and from a standard form, they concentrated on the problem of logical restructuring of hierarchical data bases in this form.

The language used in the above project for specifying the restructuring functions (called CDTL--Common Data Translation Language) was designed to be conceptually simple. For the most part, it provides functions for specifying a mapping from a single field (or combination of fields) of the source to a single field of the target. For example, while a DIRECT would specify a one-to-one mapping of source items to target items, a REPEAT would specify the repetition of a source item for all instances of a lower level in the target hierarchy. In both cases, only the source and target fields need to be mentioned as parameters. In addition, there are more global operations, such as the INVERSION operator, which causes parent/dependent record relationships to be reversed. The system also supported extensive field restructuring operators, where individual field values could be manipulated according to prescribed language specifications. Since most of these operators are local, there is the possibility that they could be used in combinations that do not make sense globally. Therefore, a further component of the system was built to perform "semantic analysis," which checks for possible inconsistencies before proceeding to generate the target data base.
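The DIRECT and REPEAT behaviors described above might be sketched as follows. Only the operator semantics come from the text; the record representation and function signatures are invented:

```python
def direct(source_recs, src_field, tgt_field):
    """DIRECT: a one-to-one mapping of a source item to a target item;
    only the source and target fields are named as parameters."""
    return [{tgt_field: rec[src_field]} for rec in source_recs]

def repeat(parent, src_field, children, tgt_field):
    """REPEAT: repeat a parent-level source item for all instances
    of a lower level in the target hierarchy."""
    return [dict(child, **{tgt_field: parent[src_field]})
            for child in children]
```

As the text notes, such operators are local; nothing in either function prevents a combination that makes no sense globally, which is why a separate "semantic analysis" pass was needed.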

Bell Laboratories. The Bell Labs data translation system ADAPT (DAta Parsing and Transformation system), currently under development, is a generalized translation system driven by two high-level languages [DT17]. The first language is used to describe the physical and logical format and structure of the data and to provide various tests and computations while parsing the source data and generating the target data. The second language is used to describe the transformations which are to be applied to the source data to produce the target data. Extensive validation criteria can be specified to apply to the source and target data.

Two processing paths are available within the ADAPT system: a file translation path and a data base translation path (see Figure 6-3). A separate path for file translation responds to real-world considerations: many types of conversions do not require the capabilities and associated high overhead involved in using a data base translation path.

Related Work. Additional research effort examines the development and acceptance of a standard interchange form. An interchange form would increase the sharing of data bases and provide a basis for development of generalized data translators. The Energy Research and Development Administration (ERDA) has been supporting the Interlaboratory Working Group for Data Exchange (IWGDE) in an effort to develop a proposed data interchange form. The proposed interchange form [GG2] has been used by several ERDA laboratories for transporting data between the laboratories. Additional work on development of interchange forms has been pursued by the Data Base Systems Research Group at the University of Michigan [UT14].

Navathe [R10] has recently reported a technique for analyzing the logical and physical structure of data bases with a view to facilitating the restructuring specification. Data relationships are divided into identifying and nonidentifying types in order to draw an explicit schema diagram. The physical implementation of the relationships in the schema diagram is represented by means of a schema realization diagram. These diagrammatic representations of the source and target data bases could prove to be very useful to a restructuring user.

6.2.2 Application Program Conversion. So far, we have concentrated on the data aspects of the conversion problem; it is necessary to deal as well with the problems of converting the application programs which operate on the data bases. Program conversion, in general, may be motivated by many different circumstances, such as hardware migration, new processing requirements, or a decision to adopt a new programming language. Considerable effort has been devoted to special tools, such as those that assist migration among different vendors' COBOL compilers, and to general purpose "decompilers" that translate assembly language programs to equivalent software in a high-level language. While progress has been made in developing special purpose tools for limited program conversion situations, little progress has been made toward a solution to the general problem of program conversion. With this in mind, this section focuses on the modifications to application programs that arise as a consequence of data restructuring/conversion.

Problem Statement. There are three types of data base changes which can affect application programs:

alterations to the data base physical structure, for example, the format and encoding of data, or the arrangement of items within records;

changes to the data base logical structure, either

a. the deletion or addition of access paths to accommodate new performance requirements, or

b. changes to the semantics of data, for example, modification of defined relationships between record types or the addition or deletion of items within records;

migration to a new DBMS, perhaps encompassing a data model and/or data manipulation language different from the one currently in use.

The actual impact of these data base changes on application programs is a function of the amount of data independence provided by the data base management system. Data independence and its relationship to the conversion problem are discussed elsewhere [GG3]. We assume here incomplete data independence, and therefore that some degree of program conversion is required in response to data base schema changes. In fact, whereas most commercial data base management systems provide application programs with insulation from a variety of modifications to the physical data base, protection from logical changes--particularly at the semantic level--is minimal. Examples of semantic changes likely to have a profound effect on application programs include:

Changes in relationships between record types, such as changing a one-to-many association to a many-to-many association, or vice versa.

Deletion or addition of data items, record types, or record relationships.

Changing derivable information ("virtual items") to explicit information ("actual items"), or vice versa.

Changes in integrity, authorization, or deletion rules.

Various properties of data base application programs greatly complicate the conversion problem. For instance, many data base management systems do not require that the record types of interest (or possibly even the data base of interest) be declared at compile time in the program; rather, these names can be supplied at run time. Consequently, at compile time, incomplete information exists about what data the program acts on. Other troublesome problems occur when programs implicitly use characteristics of the data which have not been explicitly declared (e.g., a COBOL program executes a paragraph exactly ten times because the programmer knows that a certain repeating group only occurs ten times in each record instance). Complexity is introduced whenever a data manipulation language is intricately embedded in a host language such as COBOL. The interdependence between the semantics of the data base accesses and the surrounding software greatly complicates the program analysis stage of conversion. Because of these considerations, substantial research has been devoted to alternatives to the literal translation of programs. In particular, some currently operational tools utilize source program emulation or source data emulation at run time to handle the problem of incomplete specification of semantics and yet still yield the effects of program conversion.

Current Approaches. In this section, we discuss two main techniques currently employed in the industry. These techniques are commonly used but unfortunately not documented in the form of publications.

DML Statement Substitution

The DML substitution conversion technique, which can be considered an emulation approach, preserves the semantics of the original code by intercepting individual DML statement calls at execution time and substituting new DML statement calls which are correct for the new logical structure of the data base. Two IBM software examples which provide this type of conversion methodology are (1) the ISAM compatibility interface within VSAM (this allows programs using ISAM calls to operate on a VSAM data base), and (2) the BOMP/DBOMP emulation interface to IMS. This program conversion approach becomes extremely complicated when the program operates on a complex data base structure. Such a situation may require the conversion software to evaluate each DML operation against the source structure to determine status values (e.g., currency) in order to perform the equivalent DML operation on the new data base. Generalization of this approach requires the development of emulation code for the following tasks: maintaining the run-time descriptions and tables for both the original and new data base organizations, intercepting all original DML calls, and utilizing old-new data base access path mapping descriptions (human input) and rules to determine dynamically what set of DML operations on the new data base are equivalent to each specific operation on the source data base.
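A minimal sketch of the interception idea follows. The call name get_next, the dictionary-of-lists "data base," and the currency handling are all invented stand-ins for a real DBMS interface; the point is only that the interceptor maintains status values itself so the old program's calls keep their meaning:

```python
class DMLInterceptor:
    """Intercepts old-style DML calls at execution time and emulates
    them against the new data base organization, maintaining status
    values (here, currency) on behalf of the old program."""

    def __init__(self, new_db):
        self.new_db = new_db          # new organization: record type -> records
        self.currency = None          # emulated "current record" position

    def get_next(self, record_type):
        # Old call: GET NEXT within a record type, emulated as a
        # sequential scan over the new organization.
        recs = self.new_db[record_type]
        idx = 0 if self.currency is None else self.currency + 1
        if idx >= len(recs):
            return None               # end of set, as the old DML signalled
        self.currency = idx
        return recs[idx]
```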

Although a conceptually straightforward approach, it has several drawbacks, which can be categorized as degraded efficiency and restrictiveness. Efficiency is degraded primarily because each source DML statement must be mapped into a target emulation program, which uses the new DBMS to achieve the same results. The increased overhead in program size and/or processing requirements can be significant.


The drawback of restrictiveness comes about because the emulation approach, by modeling the old methods, inhibits the utilization of the increased capabilities of the new DBMS and/or data structure. Additionally, dependence upon the old program semantics limits the set of permissible new data structures to those that support all of the semantics of the source program, if the source program is to continue to execute in the same manner. Note that the rules can be quite complex, even for the limited question of which data structure changes preserve semantic equivalence. Therefore, in some instances, just the limited task of determining whether a change in data structure (given no change in the data model) will support a set of source programs can itself be extensive.

Bridge Program. The second method in use today is sometimes referred to as the Bridge Program Method. In this technique, the source application program's access requirements are supported by reconstructing from the target data base that portion of the source data base which is needed. Data reconstruction is done by means of "bridge programs." The source program is then allowed to operate upon this reconstructed portion of the source data base to effect the same results that would occur if the source data base were not modified. Of course, a reverse mapping is required to reflect an update, and each simulated source data base segment must be prepared before it is needed by the application program.

This approach suffers from the same types of disadvantages inherent in the emulation approach. Efficiency problems for complex/extensive data bases and for programs performing extensive data accessing can make this method prohibitively expensive for practical utilization. This technique is generally found as a "specific software package" developed at a computer installation rather than as a standard vendor-supplied package.

Current Research. Differing from the emulation and bridge program approaches, current research aims toward developing more generalized tools to automatically or semi-automatically modify or rewrite application programs. The drawbacks of the existing approaches described above can be avoided by rewriting the application programs to take advantage of the new structure and semantics of a converted data base, and by using a general system to do the conversion rather than ad hoc emulation packages and bridge programs.


Research on application program conversion is still in its infancy. Consequently, very few published papers on this subject exist. This section describes a handful of works in the order of their dates of publication. Mehl and Wang [PT6] presented a method to intercept and interpret DL/1 statements to account for some order transformations of hierarchical structures in the context of the IMS system. Algorithms involving command substitution rules for various structural changes have been derived to allow the correct execution of the old application programs. This approach works only for a limited number of order transformations of segments in a logical IMS data base. Since it is basically an emulation approach, it has the drawbacks discussed in the previous section.

A paper by Su [PT12] gives a general model of application program conversion as related to data base changes resulting from a data base transformation. An attempt was made to identify the tasks required for the automatic or semi-automatic conversion of application programs due to data base changes. The paper stresses two main points: (1) the need for extensive analysis of an application program, including the analysis of program logic, data variable relations, program-subprogram structure, execution profile, etc.; and (2) the use of data base translation operators to determine what program transformations are required to account for the effects of these operators. The idea of the use of a common language to describe the operations of source queries and the data translation statements is also proposed.

An approach to the transformation of DBTG-like programs in response to a data base restructuring was proposed by Schindler [PT10]. The approach is based on the concept of code templates, which are predefined sequences of host-language DML statements (roughly analogous to assembly language macros). Application programs can be written as nested code templates. The code templates are devised so that each one corresponds to an operator in the relational algebra. An application program is then mapped into a relational expression, transformations are performed on the expression to accommodate the data base restructuring, and a new program is generated by mapping the transformed expression back into code templates. This approach suggests that a level of logical data independence may be achieved through current programming technology.
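The template idea might be sketched as follows. The template texts, operator names, and parameters are invented, and a real system would emit complete host-language DML sequences; the sketch shows only the regeneration step, where a relational operator is expanded back into code:

```python
# Hypothetical templates: each relational-algebra operator maps to a
# predefined host-language/DML fragment with parameter slots.
TEMPLATES = {
    "select": "MOVE {pred} TO FILTER.\nPERFORM SCAN-{rel}.",
    "project": "MOVE {fields} TO OUT-REC.\nPERFORM WRITE-OUT.",
}

def expand(op, **params):
    """Regenerate host-language code from one relational operator.
    After restructuring, the transformed relational expression is
    simply re-expanded against the new schema's templates."""
    return TEMPLATES[op].format(**params)
```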

The work by Su and Reynolds [PT15] studied the problem of high-level sublanguage query conversion using the relational model, with SEQUEL [Z5] as the sublanguage, DEFINE [SL7] as the data description language, and CONVERT [R2] as the translation language. Algorithms for rewriting the source query were derived and hand simulated. In this study, query transformation is dictated by the data translation operators which have been applied to the source data base. The purpose of this work was to study the effects of the CONVERT operators on high-level queries. Only restricted types of SEQUEL queries were considered. This work demonstrates that a general program conversion system should separate the data model and schema dependent factors from the data model and schema independent factors, and that an abstract representation of program semantics and of the semantics of data translation operators needs to be sought so that data conversions at the logical level (especially the type which changes the data base semantics) and the DBMS level can be attempted.

Two independent works, carried out at about the same time by Su and Liu [PT13] and Housel [PT14], take a more general approach to the application program conversion problem. The former work is based on the idea that the same data semantics (a conceptual model) can be modelled externally by various existing data models (relational, hierarchical, and network) using different schemas. Application programs are mapped into an abstract representation which expresses program semantics in terms of the primitive operations (called access patterns) that can be performed on data entities and associations. Transformation rules are then applied to the abstract representation based on the types of changes introduced by the data translation operators. The transformed representation is then mapped into another intermediate representation (called access path graphs) which is dictated by the external model and specific schema used for the target data base. This representation is then modified by an optimization component and used for the generation of target programs. This work stresses that the semantics of both the source and target data bases be made explicit to the conversion system and be used as a basis for application program analysis and transformation. The conversion methodology described handles program conversion to account for data conversion at the logical level as well as the DBMS level.

Housel extends the work on application migration undertaken at the IBM San Jose Laboratory. This work uses a common language for specifying the abstract representation of source programs as well as for specifying the data translation operations. The language is a subset of CONVERT with some of Codd's relational operators [GG4]. The operators of the language are designed to have simple semantics and convenient algebraic properties to facilitate program transformation. They are designed to handle data manipulation in a general hierarchical structure called a "form" as well as in relational tables. In this system, program transformation is dictated by the data mapping operations applied to the source data base. The proposed model assumes that the inverse of these data mapping operators exists; i.e., the source data base can be reconstructed from the target data base by applying some inverse operators to the target data base. More precisely, it is assumed that M'(T) = S, where S is the source data base, T is the target data base, and M' is the inverse of the mapping function. Thus, program conversion is done by substituting the inverse M'(T) into the specification language statements (the abstract representation of the source program) for each reference to the source data base. This process is followed by a simplification procedure to simplify the resulting statements (the target abstract representation of the program). The author points out that the assumption of the existence of M'(T) restricts the scope of the conversion problem handled by the proposed approach.

2

S
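The substitution idea can be illustrated with a small sketch. All names, the particular mapping, and the use of Python are hypothetical conveniences for this report; Housel's system operated on CONVERT-style specifications, not on executable code of this kind.

```python
# Sketch of program conversion by inverse substitution, under the assumption
# that the data mapping M has an inverse M'. Here M splits each source record
# (id, name, dept) into two target tables; M' joins them back on id.

def mapping_m(source):
    """M: map the source data base S to the target T (a pair of tables)."""
    names = [(rid, name) for rid, name, _ in source]
    depts = [(rid, dept) for rid, _, dept in source]
    return names, depts

def inverse_m(target):
    """M': reconstruct S from T, so that M'(M(S)) = S."""
    names, depts = target
    dept_of = dict(depts)
    return [(rid, name, dept_of[rid]) for rid, name in names]

def abstract_query(data_base):
    """Abstract representation of a source program: names of dept-10 staff."""
    return [name for _, name, dept in data_base if dept == 10]

source = [(1, "ADAMS", 10), (2, "BAKER", 20), (3, "CLARK", 10)]
target = mapping_m(source)

# Conversion: substitute M'(T) for every reference to the source data base.
converted_result = abstract_query(inverse_m(target))
```

Because M' exists, the converted program (the query applied to M'(T)) necessarily yields the same result as the source program applied to S; in practice a simplification step would then rewrite the substituted expression into direct operations on T.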

Presently, the Data Base Program Conversion Task Group (DPCTG) of the CODASYL Systems Committee is investigating the application program conversion problem. The group is looking into various aspects of the problem, including decompilation of COBOL application programs, semantic changes of data bases and their effects on application programs, program conversion techniques and methodologies, etc.

To date, the work on application program conversion is still very much at the research stage, and more progress has to be made before actual implementation can start. The problems of automatic application program conversion are multitudinous and extremely complex. Current research indicates that program conversion is possible for some types of data conversion, but the complexity of program conversion depends on how drastically the data has been modified. Further research needs to be undertaken to determine what can be done automatically, what can be done semi-automatically, and what cannot be done at all. A fully automatic tool is hard to achieve; building semi-automatic systems, or systems which provide aids for manual conversion, would be a more realistic goal.

Current Research Directions. The current research has uncovered several problems which need to be investigated further before the implementation of a generalized conversion tool can be attempted. The following issues are believed to be important for future research:

Semantic Description of Data Base and Application Programs. Based on the work by Su and Liu [PT13] and the study of the DPCTG group, it is quite clear that a program conversion system would need more information about the semantic properties of the source and target data bases than the information provided by the schemas of existing DBMSs. Semantic information about the data bases is an important source for determining the semantics of application programs, which is the real bottleneck of the application program conversion problem. Future research needs to be conducted to 1) model and describe the semantics of application programs, 2) study the meaningful semantic changes to data bases and their effects on application programs, and 3) derive transformation rules for program conversion which account for those changes. Several existing works on data base semantics [M11, SM1,2,3, DL29] may provide a good basis for future work on this subject.

Equivalency of Source and Target Programs. Data conversion may alter the semantic contents of the source data base. A converted application program may or may not perform on the target data the identical operation that the source program performs on the source data. For example, it may not retrieve, delete, or update the same data as the source program, because some records may have been deleted and data relations may have been changed in data conversion. It is not at all clear how we can prove, in general, that a target program generated by a conversion system still preserves the original intent of the source program. Naturally, if the source data can be reconstructed from the target data without losing the original data relations and occurrences, we can establish the equivalence relation between the source and target programs based on the same effect they have on the source and target data.

Decompilation. Program conversion via decompilation is a technique whereby a data base application program is first transformed into an operationally equivalent higher order language or an abstract representation and then returned to a usable language level in a converted form. The transformation to a higher order language level is a decompilation process, and the process of returning the program to a language level appropriate to conventional compilers is a compilation process. The underlying concept is that decompilation to a higher order language can produce a functionally equivalent program that does not contain the DBMS, data model, and data dependencies that inhibit the conversion process. That is, the decompiled program has the same "intent" while being unrelated to the changed DBMS environmental conditions. The changed environmental conditions should be easily incorporated into the program during the process of compiling it back into a form appropriate to the new system.
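A toy illustration of the decompile/compile cycle follows. Both statement forms are invented for the example (neither is the syntax of any real product), and Python is used only so the sketch is self-contained.

```python
# A source DML statement is first lifted to a DBMS-neutral abstract
# operation (decompilation), then lowered into the target system's DML
# (compilation). Both DML syntaxes here are hypothetical.

def decompile(source_stmt):
    """Lift a source DML statement to an abstract operation."""
    # e.g. "FIND EMP USING DEPT = 10" -> ("select", "EMP", "DEPT", "10")
    words = source_stmt.split()
    return ("select", words[1], words[3], words[5])

def compile_for_target(operation):
    """Lower the abstract operation into the target system's DML."""
    _, record, field, value = operation
    return f"GET {record} WHERE {field} EQ {value}"

abstract = decompile("FIND EMP USING DEPT = 10")
target_stmt = compile_for_target(abstract)
```

The abstract form carries only the intent ("select EMP records where DEPT is 10"); everything DBMS-specific is confined to the decompiler and the compiler, which is precisely the separation the technique seeks.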


Some researchers think that this would be the preferred method to effect DML/host language program conversion. It should avoid many of the efficiency/restriction drawbacks inherent in current automated methods, while being more cost effective and less error prone than current manual methods (e.g., program rewrite).

One likely disadvantage of this method is that, in order to use it to convert existing data base application programs, the programs may first have to be manually altered to place DML-related code in a structured format. This disadvantage is to be expected because of the ambiguity inherent in the organization of DML/host language programs. However, the development of structured programming templates designed for DML-related code should provide a means for creating programs that are convertible by the decompilation method. Structured templates might also provide the needed insight toward the development/selection of an appropriate high level language into which programs can be compiled. Some initial concepts of data base program templates have been proposed by the University of Michigan [PT10].

Conversion Aids. A system which provides assistance to conversion analysts would seem to be a practical tool and a feasible task. Given information about the data changes and the semantics of the data, a system can be built to analyze application programs to 1) identify and isolate program segments which are affected by the data changes, 2) detect inefficient code in the programs, 3) produce a program execution profile [GG1] which gives an estimate of the computation time required at different parts of the program, and 4) detect, in some cases, program code which depends on the programmer's assumptions about data values, ordering of records, record or file size, etc. The data obtained in 1, together with some on-line editing and debugging aids, would speed up the manual conversion process. The data obtained in 2 and 3 would be useful for producing more efficient target programs, and the data obtained in 4 would help the conversion analyst to eliminate the "implicit semantics" in programs which makes the program conversion task (manual or automatic) extremely difficult. More complete cross-referencing than that usually produced by today's compilers can assist conversion analysts in identifying the ramifications of changes to programs. An example of such a product is the Data Correlation and Documentation system produced by PSI-TRAN Corporation. One technique sometimes used during a conversion process that has been initiated by a data base structure change is to alter the names of the affected data base items in the DDL only and use the errors generated by the compiler to locate those program segments needing changes. A more complete cross-referencing system would be a much better tool, if it were available.
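A minimal sketch of aid (1) above follows: scanning application program text for references to data items changed during conversion, so that the affected segments can be located directly rather than via compiler errors from a DDL rename. The program lines and item names are illustrative only.

```python
# Scan program text for any reference to a changed data item, reporting the
# line numbers so a conversion analyst can jump to the affected segments.

import re

def affected_segments(program_lines, changed_items):
    """Return (line_number, line) pairs referencing any changed data item."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, changed_items)) + r")\b")
    return [(n, line)
            for n, line in enumerate(program_lines, start=1)
            if pattern.search(line)]

program = [
    "MOVE EMP-NAME TO PRINT-LINE.",
    "ADD 1 TO LINE-COUNT.",
    "IF EMP-DEPT = 10 PERFORM PRINT-PARA.",
]
hits = affected_segments(program, ["EMP-NAME", "EMP-DEPT"])
```

A real aid would of course work from the compiler's symbol table rather than raw text, but even this lexical form reproduces the DDL-rename trick without a compile step.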


Optimization of Target Program. As a result of data conversion, multiple access paths to the same data may occur, because redundant data may be introduced or new access paths may be added in the course of data conversion. In this situation, a conversion system will have the choice of selecting a path when generating the target program. The execution-time efficiency of the program may depend on the selection of an optimized access path during program conversion. Also, to achieve generality, some proposed program conversion techniques [PT13,14] convert small segments of programs, or the equivalent of DML statements, separately. It is then necessary to do a global optimization or simplification to improve the converted program. Techniques for program optimization related to program conversion need to be investigated.
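The path-selection step can be sketched very simply. The candidate paths and cost figures below are invented for illustration; a real conversion system would estimate costs from file sizes and index characteristics.

```python
# When data conversion has introduced redundant copies or new access paths
# to the same data, the conversion system can pick the candidate with the
# lowest estimated cost when generating the target program.

def pick_access_path(candidates):
    """candidates: (path_name, estimated_record_accesses); pick the cheapest."""
    return min(candidates, key=lambda c: c[1])

candidates = [
    ("sequential scan of master file", 50000),
    ("index on DEPT field", 1200),
    ("redundant inverted copy", 900),
]
best_path = pick_access_path(candidates)
```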

6.2.3 Prototype Conversion Systems Analysis. This section analyzes the state of the art of generalized data conversion systems. It summarizes what has been learned from the various prototypes. The prototypes have yielded encouraging results, but some weak points have also emerged. A section below lists some questions that remain to be answered and comments on additional features that will be necessary to enhance usability. The following section on Analysis of Architectures analyzes some implementation issues which can affect the cases where a generalized conversion system can be applied.

Where Do We Stand. The prototype systems described in Section 5.7.2 have been used in a few conversions. While some of these tests were made on "toy files," a few of the tests involved data volumes from which realistic performance estimates can be extrapolated. This section will summarize the major tests that were done with each of the prototypes.

The Penn Translator. The translator developed by Ramirez at the University of Pennsylvania [DT3,6] processes single sequential files to produce single sequential target files. Facilities exist for redefining the structure of source file records and for reformatting and converting accordingly. Conversion of the file can be done selectively using user-defined selection criteria. Block size, record size, and character code set can be changed, and some useful data manipulation can be included.

The translator was used in several test runs on an IBM/370 Model 165. The expansion ratio from DDL to generated PL/1 code was 1:4, so coding time was reduced.


A further test of the Penn Translator was conducted by Winters and Dickey [TA1]. An experiment was conducted comparing a conventional conversion effort against the Penn Translator (slightly modified). Two source files stored on IBM 1301 disks under a system written for the IBM 7080 in the AUTOCODER language were converted to two target files suitable for loading into IMS/VS. Much of the data was modified from one internal coding scheme to another. The conversion required changing multiple source files to multiple target files.

The conventional conversion took seven months, versus five months for the generalized approach, a productivity improvement of roughly thirty percent. Time for adapting the translator, learning the DDL, and adapting to a new operating system is included in the five-month figure. Without these, an estimate of three months was made for the conversion using the generalized approach.

The SDC Translator. The translator described in [R3,4] was implemented during 1975-1976. The translator could handle single, hierarchical files from any of three local systems: TDMS, a hierarchical system which fully inverts files; DS/2, a system which partially inverts files; and ORBIT, a bibliographic system which maintains keys and abstracts. Data bases were converted from TDMS to ORBIT, from TDMS to DS/2 and vice versa, and from sequential files to ORBIT. TDMS files were unloaded using an unload utility. Target data bases were loaded by target system load utilities.

The total effort for design and implementation was about three man-years. The system was implemented in assembly language on an IBM/370 Model 168 and occupied about 40 K-bytes, not including buffer space, which could be varied. The largest file tested was on the order of 5 million characters, and the total conversion time was about 1 minute of CPU time per 2.5 megabytes of data.

The work was discontinued in 1976.

The Honeywell Translator. The prototype file translator developed at Honeywell by Bakkom and Behymer [DT11] performed file conversions (one file to one file) among IBM, Honeywell 6000, Honeywell 2000, and Honeywell 8200 sequential and indexed sequential files. Data types of fields could be changed, as well as field justification and alignment. New fields could be added to a record, and fields could be permuted within a record. The file record format (fixed, variable, blocked, etc.) could be changed, and a compare utility was available for checking the consistency of files with different field organizations and encodings.


Tests of up to 10,000 records were run. Performance of 15 milliseconds per record was typical (Honeywell Series 6000 Model 6080 computer). The prototype has been used in a conversion/benchmark environment but has not been offered commercially.

The Michigan Translator. Version IIB, Release 1.1 of the Michigan Translator was completed for the Defense Communications Agency in October 1977 [DT16]. It offers complete conversion/restructuring facilities for users of Honeywell sequential, ISP, or I-D-S/I files. Up to five source data bases of any type may be merged, restructured, or otherwise reorganized into as many as five target data bases, all within a single translation. Data base description is accomplished by minor extensions to existing I-D-S DDL statements. Restructuring specification is easily indicated via a high level language. Tests performed to date included a conversion of a 150,000-record I-D-S/I data base with a total elapsed time of 24 hours (500 milliseconds per record). A given translation can be broken off at any point to permit efficient utilization of limited resources and also to protect against system failures. The user is provided with the capability of monitoring translation progress in real time.

XPRS. Test cases with the XPRS system have focused on functionally duplicating earlier real conversions done by conventional methods. Several cases have been programmed. Each case involved at least two input files. Generally, there was a requirement to select some instances from one file, match them with instances in another file, eliminate some redundant or unwanted data, and build up a new hierarchical structure in the output. In several cases there was a need for conditional actions based on flags within the data. In all cases, the XPRS languages were found to be functionally adequate to replicate the conversion. A productivity gain of at least fifty percent in total analysis, coding, and debugging time was achieved. Test runs were conducted on several thousand records. Performance was deemed adequate in that XPRS can restructure data at least as fast as it can be delivered from direct access storage. No detailed performance comparisons were made between XPRS-generated programs and custom written programs.

Questions Remaining To Be Answered. Given that several prototype data translation systems are operational in a laboratory environment, there is little question concerning the technical feasibility of building generalized systems. The remaining questions pertain to the use of a generalized system in "real world" data conversions involving a wide variety of data structures, very large data volumes, and significant numbers of people. Three major questions to be resolved are:

1. Are the generalized systems functionally complete enough to be used in real conversions, and if not, what will it take to make them functionally complete?

2. Can the people involved in data conversions use the languages? What additional features are necessary to enhance usability?

3. Overall, what is the productivity gain available with the generalized approach?

Within the next year, prototype systems will be exercised on a variety of real-world problems in data translation, and concrete answers to these questions should be available. The systems being further tested for cost-effectiveness are the Michigan Data Translator, the IBM XPRS system, and the Bell Laboratories ADAPT system.

To date, preliminary results have been promising. A significant sample size on which to base an analysis of productivity gain should be available at the end of the year of testing.

A number of factors must be taken into account in measuring the cost-effectiveness of the generalized data translator versus the conventional conversion approach. These factors include:

ease of learning and using the higher level languages which drive the generalized translators;

availability of functional capability to accomplish real-world data conversion applications within the generalized translators;

overall machine efficiency;

correctness of results from the conversion;

ability to respond in a timely fashion to changes in conversion requirements (conversion program "maintenance");

debugging costs;


ability to provide a "bridge back" of converted data to old applications;

ability to provide verification of the correctness of the data conversion;

capabilities for detection and control of data errors.

The languages used to drive generalized data translators are high-level and non-procedural; they provide a "user-friendly" interface to the translators. Since the languages are high-level, programs written in them have a better chance of being correct. Experience to date with DEFINE and CONVERT, the languages which drive XPRS, has shown that users can learn these languages within a week; it has also shown that some practice is necessary before users start thinking about their conversion problem in non-procedural rather than procedural terms.

In early test cases, the languages which drive generalized data translators have been found to be functionally adequate for many common cases. For those cases lacking a feature, a "user hook" facility is often provided. However, forcing a user to revert to a programming language "hook" defeats the purpose of the high level approach, and interfacing the hook to the system requires at least some knowledge of system interfaces. Thus, the high level languages must cover the vast majority of cases in order to succeed; otherwise, users will perceive little difference from conventional approaches.

Facilities for detecting and controlling data errors in the generalized systems are very important, and most of the prototypes do not yet do a complete job in this area. However, the generalized packages offer an opportunity for generalized, high level methods of dealing with data errors during conversion, and it could well be that once these error packages are developed, they will contribute to even larger productivity gains than have been experienced to date.

The high-level language approach to driving generalized translators should provide the ability to respond to changes in conversion requirements with relative ease. Since large conversions often take one or more years, it is not unusual for the target data base design to change or for new requirements to be placed on the conversion system. In other words, in a large conversion effort, the programs are not as "one shot" as is commonly believed. In large conversions, the savings in conversion program maintenance could be significant.


Generalized systems can also be used to map target data back to the old source data form, assuming the original conversion was information-preserving. This capability provides a means for verifying the correctness of the data conversion. In addition, this capability can be used as a "bridge back" to allow users to continue to run programs which have not yet been converted against the data in the old format. Using a generalized system in this way allows phased conversion of programs without impacting user needs during the conversion period.
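The "bridge back" idea can be sketched as follows. The record formats and field layout are hypothetical, and Python stands in for whatever the generalized system would actually generate.

```python
# An unconverted program that still expects the old record format is fed
# target data mapped back on the fly, so it need not be converted yet.

def bridge_back(target_records, to_old_format):
    """Present new-format records to old programs in the old format."""
    for record in target_records:
        yield to_old_format(record)

# New format stores (id, first, last); old programs expect (id, "LAST, FIRST").
new_data = [(1, "ADA", "LOVELACE"), (2, "GRACE", "HOPPER")]
old_view = list(bridge_back(new_data, lambda r: (r[0], f"{r[2]}, {r[1]}")))
```

The same back-mapping, run over the whole target data base and compared with the original source, also serves as a verification of the conversion when the mapping is information-preserving.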

In an environment where a generalized translator is used regularly as a tool for conversion, costs associated with the debugging phase should decrease, because common, already-debugged functional capabilities will be reused. In the conventional approach, by contrast, it is unusual for common conversion modules to be developed, so each new conversion system requires debugging.

Usability. The usability of generalized data translation systems must also be evaluated. Experience to date indicates that the languages are easy to learn and use. However, it would be wrong to think that these prototypes are mature software products or that they can be used in all conversions. This section discusses some of the unanswered questions with respect to the usability of the current data conversion systems.

One question concerns the level of the users of the generalized languages. Current prototypes have been used by application specialists and/or members of a data base support group. The systems have not yet been used by programmers, and the question remains whether programmers (as opposed to more senior application specialists and analysts) will be able to use the systems productively. There is no negative data on this point; the systems simply have not been used widely enough.

At present, all the systems require a user to describe explicitly the source data to be accessed by the read step, using a special data description language. These data description languages are generally easy to learn and use; they resemble statements in the COBOL Data Division. However, writing the description is a manual process which can be tedious, because a person may have to describe a file with hundreds of fields. Ideally, a data conversion system should be able to make use of an existing data description, such as those in a data dictionary or a system COBOL macro library. As evidenced by the Michigan Data Translator [DT16], it is reasonable to expect that such an interface will be available as data conversion systems evolve. Note, however, that a data dictionary or COBOL macro library link may not necessarily solve the problem. Data in current systems is not always fully enough defined to be converted. This is especially true with non-data base files, in which the data definition is often embedded in the record structures of the programs, and a full definition depends on a knowledge of the procedural program logic. Even with existing data bases, some fields and associations may not be fully defined within the system data base description. Thus, the user can expect a certain amount of manual effort in developing data definitions. If existing documentation is incomplete, this can be a time consuming task, though it probably must be done regardless of whether a generalized package is used.
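A minimal sketch of such a description-driven read step follows. The layout table is a hypothetical stand-in for a COBOL-Data-Division-style record description: each entry gives a field name, byte offset, length, and type converter.

```python
# The read step decodes each fixed-format source record by walking the
# description table, so the same code serves any file once its layout is
# described. Field names and the sample record are invented.

LAYOUT = [
    ("EMP-ID",   0,  5, int),
    ("EMP-NAME", 5, 10, str),
    ("DEPT",    15,  2, int),
]

def read_record(raw):
    """Decode one fixed-format record according to the layout table."""
    return {name: conv(raw[offset:offset + length].strip())
            for name, offset, length, conv in LAYOUT}

record = read_record("00042ADAMS     10")
```

The tedium the text describes is exactly the manual construction of tables like LAYOUT for files with hundreds of fields, which is why a link to a data dictionary or macro library is attractive.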

Another area where a user may have to expend effort is the unload step of the data conversion process. The data description languages used to drive the read step have a limited ability to deal with data at a level close to the hardware (e.g., pointers, storage allocation bit maps, etc.). Generally, one assumes that a system utility program can be used to unload the source data and remove the more complex internal structures. Another alternative is to run the read step on top of an existing access method or data base management system, with the accessing software removing the more complex, machine dependent structures. While these are acceptable alternatives in a great many environments, including most COBOL environments, some cases may exist where neither approach will work. For example, a load/unload utility may not exist, or a file with embedded pointers which was accessed directly by an assembly language program might not be under the control of an access method. In these cases, the user is faced with complexity during the unload step. The complexity associated with accessing the data would appear to be a factor for either the conventional methods or the generalized approach. However, in cases such as those above, some special purpose software may have to be developed. Note that some research [SL8] has examined the difficulty of extending data description languages to deal directly with these more complex cases and has concluded that providing the data description language with capabilities to deal with more complex data structures greatly complicates the implementation and has an adverse effect on usability. Thus, special purpose unload programs will continue to be required to deal with some files.

Analysis of Architectures. This section discusses some of the different approaches that have been taken in implementing the prototype data conversion systems. The objective is to analyze some of the performance and usability issues raised by the prototypes.


Two approaches have been used in the prototypes: a generative approach in the Penn Translator and XPRS, and an interpretive approach in the Michigan Data Translator. In the generative approach, a description of the input files, output files, and restructuring operations is fed to a program generator. From these descriptions, special purpose programs are generated to accomplish the described conversion. In both the Penn Translator and XPRS, PL/1 is the target language for the generator. The generated PL/1 programs are then compiled and run. In the interpretive approach, tables are built from the data and/or restructuring description. These tables are then interpreted to carry out the data conversion.
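A miniature contrast of the two architectures can be sketched as follows. The field table stands in for the data description; the real prototypes generated PL/1, but Python is used here only so the sketch is self-contained.

```python
# Interpretive: walk the description table for every record at run time.
# Generative: emit a specialized converter once, compile it, and reuse it.
# The field layout and sample record are invented for illustration.

FIELDS = [("ID", int), ("QTY", int), ("NAME", str)]

def convert_interpretive(record):
    """Consult the description table while converting each record."""
    return tuple(conv(value) for (_, conv), value in zip(FIELDS, record))

def generate_converter():
    """Generate the source of a specialized converter, then compile it."""
    body = ", ".join(f"{conv.__name__}(record[{i}])"
                     for i, (_, conv) in enumerate(FIELDS))
    source_text = f"def convert(record):\n    return ({body})"
    namespace = {"int": int, "str": str}
    exec(source_text, namespace)
    return namespace["convert"]

raw = ("7", "42", "BOLT")
compiled_convert = generate_converter()
```

Both routes produce the same converted record; the difference is that the interpretive version pays the cost of consulting the table on every record, while the generated version fixes the conversion strategy once, before the data is processed.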

In data conversion systems, as in other software, an implementation based on interpretation can be expected to run considerably more slowly than one based on generation and compilation. Initial experience with prototype data translators has shown that there is much repetitive work, strategies for which can be decided at program compilation/generation time. There is also a good deal of low level data handling, such as item type conversions. Thus, implementations based largely on an interpretive approach run more slowly, and the ability to vary bindings at run time does not appear to be necessary. Interpretation was chosen in the prototypes for ease of implementation; in the future, it can be expected that a compilation-based approach, or a mixture of compilation with interpretation, will be the dominant implementation architecture. However, for medium scale data bases, the machine requirements of the interpretive data conversion prototypes are not unreasonable, and overall productivity gains are still possible.

Performance measurements with conversion systems based on the generative approach indicate that generalized systems can be quite competitive with customized programs. In one case, the program generated by the data conversion system ran slightly faster than a "customized" program which had been written to do the same job. However, this example could well be the exception, and it would be naive to expect this in general. The reason generalized packages can be competitive is that they often have internal algorithms which can plan access strategies to minimize I/O transfers and/or multiple passes over the source data. "Customized" conversion programs written in a conventional programming language often are not carefully optimized, since the expectation is that the programs will be discarded when the conversion is done.


A second architectural difference involves whether or not an underlying DBMS is used. In both the Penn Translator and XPRS, the generated PL/1 program, when executing, accesses sequential files, performs the restructuring, and writes sequential files. The Michigan Data Translator, on the other hand, functions as an application program running on a network structured data base management system. Thus, the interpreter makes calls to the underlying DBMS to retrieve data during restructuring and puts the restructured data into the new data base.

The two approaches offer different tradeoffs. For example, the Michigan Data Translator can make use of the existing extraction capabilities of a DBMS and can perform partial translations easily. In addition, since it operates directly within the network data model, a user does not have to think of "unloading" data to a file model and then reloading it; rather, the user describes a network to network restructuring much more directly.

On the other hand, when converting non-data base data to a data base, the use of an underlying DBMS as part of a data translator implies a second order data conversion problem: the non-data base data must first be converted into the DBMS of the data conversion system, which may or may not be difficult. It can be difficult, for example, when the data model of the data being converted differs significantly from the data model of the DBMS upon which the conversion system is based. The use of an underlying DBMS may also require more on-line storage, whereas the file oriented conversion systems can be made to run tape-to-tape. This can be important in very large data base conversions.

In the future, one can expect that data conversion systems will offer a variety of interfaces to accommodate various kinds of conversion situations. For example, it is possible to interface the "file-oriented" conversion systems to run as application programs on top of existing data base management systems. It is also possible to develop "reader programs" to load non-data base data into conversion systems based on a DBMS. In addition, more automated interfaces to data dictionary packages can be expected, in order to improve usability and obviate the need for multiple data definitions.

One possible performance problem with generalized conversion systems lies in the unload phase. For reasons of usability, generalized conversion systems usually rely on an unload utility program to access the source data, thus isolating the conversion package from highly system specific data. A potential problem with this approach is that the unload package may not make good use of existing access paths, or may tend to access the source data in a fashion which assumes that the data has recently been reorganized (with respect to overflow areas, etc.). In cases where the data is badly disorganized, a customized unload program which accesses the data at a lower level might run considerably faster, and for very large data bases it might be the only feasible way to unload the data. It is not clear how common this case is, and one can usually make the argument that the "special" unload software could be interfaced to the generalized package. However, from a practical standpoint, the unloading phase on a very large, badly disorganized data base is a performance unknown, and more sophisticated unload utilities may have to be developed as part of the generalized packages.

Summary. Detailed performance and productivity figures for major conversions should be available in about one year. Expectations are that the machine efficiency of the generalized packages based on a generation/compilation approach will be acceptable (no worse than a factor of 2) when compared with conventional conversion programs. Additional enhancements to improve usability can be expected, especially in the areas of data error detection and control and of interfaces to data dictionary software. If the savings in conversion program analysis and coding times (often fifty percent or more) are confirmed, then the generalized conversion systems will be ready for extensive use.

6.3 OTHER FACTORS AFFECTING CONVERSION

In this section we look at the conversion problem from two aspects. First, we address the question--What can we do today to lessen the impact of a future conversion? Second, we look to the future to see what effects future technology and standards will have on the conversion process.

6.3.1 Lessening the Conversion Effort. In order to identify guidelines for both reducing the need for conversion and for simplifying conversions which are required, one must consider the entire application software development cycle, because poor application design, poor logical data base design, and inadequate use of or inappropriate selection of a DBMS could each lead to an environment which may prematurely require an application upgrade or redesign. This redesign could, in many cases, require a major data base conversion effort.

The set of guidelines specified below is not intended as a panacea. Instead, it is meant to make designers aware of strategies which make intelligent use of current technology. It is doubtful that all conversions could be avoided if a project adhered strictly to these proposed guidelines. However, adherence to the principles set forth by these guidelines could certainly reduce the probability of conversion and, more importantly, simplify the conversions that are required.

With respect to application design and implementation, the more the application is shielded from system software and hardware implementation details, the easier it becomes for a conversion to take place. For example, a good sequential access method hides the difference between tapes, disks, and drums from the application programs which use the access method.

The logical data base design should be specified with a clear understanding of the information environment. A good logical data base design reduces the need to restructure because it actually models the environment it is meant to serve. Introduction of data dependencies in the data structure should, if possible, be kept to a minimum. An analysis of the tradeoffs between system performance and likelihood of conversion should definitely be made.

Selecting the wrong or non-optimal data base management system, given the application requirements, is also a key problem which can lead to unnecessary and large conversion efforts. The prospective user of a DBMS should, for example, carefully evaluate the data independence characteristics of a proposed DBMS.

The underlying principle of the guidelines which follow is that decisions can be made at the system design and implementation stages which are crucial to the stability of the applications.

Application Design Guidelines.

Requirements Analysis. Many of the decisions made during the requirements analysis stage of system development affect the long-term effectiveness of the application system (data base design as well as application programs). Questions such as what functions does the application require, who will the data base serve and how will they use the data base, what are the possible future uses of the data, and what are the performance constraints of the application are answered at this stage. It is essential that the designer understand the information environment as much as possible at the outset in order to lessen the probability that frequent conversions will be necessary.

Requirements analysis should focus on information needs and should minimize constraints imposed by the physical environment, since such constraints can distort the designer's view of the application system's true objectives. The influence of the physical environment should be considered secondarily, in order that the designer be fully aware of the resulting compromises to the logical requirements. This is not intended to imply that consideration of the physical environment is unimportant. Indeed, if the physical environment is ignored, the effect could be development of a set of requirements that are impossible to meet within existing physical and cost constraints.

Program Design Guidelines. Three underlying principles motivate this discussion of application program design. They are:

design for maintainability

design for the application

data independence

Keeping sight of all of these during the design of the application program will lessen conversion effects by rendering the application as free as possible from physical considerations.

Designing for maintainability implies that the application should be written in a high-level language with a syntax that permits good program structure. Structured programming techniques such as top-down program design and implementation should be used throughout. The system should be modular, with relatively small, functionally oriented programs. The programs should all be well commented and organized for readability. Design reviews and program walkthroughs also help to expose errors in the overall design and "holes" in the application logic at an early stage. It has been well documented that these steps help ease making program modifications.

One error which is often made in designing programs in a DBMS environment is to let the capabilities of the DBMS drive the design rather than the application. This design error can yield programs which are unnecessarily dependent upon the features of a specific DBMS. For example, in System 2000 one can use a tree to represent a many-to-many relationship instead of using the LINK feature. The parent/child dichotomy that results is an efficient but arbitrary contrivance that cannot easily be undone later on. The key principle here is to concentrate on what results are desired rather than on the implementation details of achieving these results. Simplicity and generalization of the design will provide a very high level of interface to the application programmer which will, in turn, minimize the total amount of software and provide the greatest degree of portability, maintainability, device independence, and data independence.

Of extreme importance in program design is the notion of data independence; i.e., insulating the application program from the way the data is physically stored.

Layered Design

In the area of application design, the motivating factor for mitigating the effects of a conversion is to insulate logical operations from physical operations. One of the concepts applied to achieve this is layered design. That is, designing the application as a series of layers, each of which communicates with the system at a different level of abstraction. One can visualize this as an "onion," with hardware as its core and layers of successively more sophisticated software at the outer layers. The user interacts with the outermost skin of the onion, at the highest level of abstraction.

If application programs are written at the outermost layers of the onion, then these programs are smaller, easier to understand and, therefore, easier to modify or convert than programs written at lower layers. For example, introduction of a new mainframe will require conversion of the software which references the particulars of the mainframe. However, since the layers are constructed so that physical machine and device independence is realized above some level, only the software below that level is subject to modification. To the extent that application programs stay at the outermost layers (i.e., above the critical layer), reduced conversion effects can be achieved.
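The onion can be made concrete with a small sketch, expressed here in modern Python for brevity. All class and method names are invented for this illustration; the point is only that each layer talks exclusively to the layer beneath it, so the core can be swapped without touching the application at the outermost skin.

```python
# A minimal sketch of layered design: each layer communicates only with
# the layer directly beneath it. Names are illustrative, not from any
# particular system.

class BlockDevice:
    """Innermost layer: raw, device-specific storage (the onion's core)."""
    def __init__(self):
        self._blocks = {}
    def read_block(self, n):
        return self._blocks.get(n, b"")
    def write_block(self, n, data):
        self._blocks[n] = data

class SequentialAccessMethod:
    """Middle layer: hides whether the device is tape, disk, or drum."""
    def __init__(self, device):
        self._device = device
        self._count = 0
    def append(self, record):
        self._device.write_block(self._count, record)
        self._count += 1
    def scan(self):
        for n in range(self._count):
            yield self._device.read_block(n)

class EmployeeFile:
    """Outermost layer: the application sees only logical records."""
    def __init__(self, access_method):
        self._am = access_method
    def add(self, name, dept):
        self._am.append(f"{name}|{dept}".encode())
    def names(self):
        return [rec.decode().split("|")[0] for rec in self._am.scan()]

# Replacing BlockDevice with a new device layer touches nothing above it.
f = EmployeeFile(SequentialAccessMethod(BlockDevice()))
f.add("SMITH", "D42")
f.add("JONES", "D17")
print(f.names())  # ['SMITH', 'JONES']
```

Only `BlockDevice` would need rewriting for a new machine; `EmployeeFile`, sitting above the critical layer, is untouched.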

We can thus summarize the goal of program design as follows:

to provide the highest possible application interface to the program

to maximize program independence from the characteristics of the mainframe, peripherals, and data base organization


to maximize portability of the application program through the use of high-level languages

to maintain a clean program/data interface

Programming Techniques

The previous sections of this chapter have focused on the design decisions which should be made to alleviate the conversion problem. However, regardless of how noble these goals are, poor implementation decisions can go a long way towards diminishing the returns of a good design. Equally important to intelligent design is a set of programming techniques and standards which prohibit programmers from introducing dependencies in code. For example, a "clever" programmer may introduce a word size dependency in a program by using right and left shifts to effect multiplication and division. Of course, there are no hard and fast safeguards against using tricky coding techniques; an effort must be made to make the programmer conscious of the consequences of this kind of coding. In particular, a programmer should not be allowed to jump across layers of the onion, such as using an access method to read or write data bases directly.
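The word-size hazard can be illustrated with a hypothetical sketch. Here a "clever" routine uses shifts to pack two fields into one word, silently assuming each field fits in 16 bits; the field names and widths are invented for this example.

```python
# "Clever" packing of two fields into one word via shifts. The hidden
# assumption: each field fits in FIELD_BITS bits.
FIELD_BITS = 16

def pack(dept_no, emp_no):
    # dept_no in the high half, emp_no in the low half of the word.
    return (dept_no << FIELD_BITS) | emp_no

def unpack(word):
    return word >> FIELD_BITS, word & ((1 << FIELD_BITS) - 1)

# Works while the assumption holds...
assert unpack(pack(7, 1234)) == (7, 1234)

# ...but silently corrupts both fields the day emp_no outgrows 16 bits,
# exactly the kind of failure a word-size change would expose:
assert unpack(pack(2, 70000)) == (3, 4464)   # not (2, 70000)!

# The portable alternative keeps logical fields separate and leaves
# representation to the storage layer.
record = {"dept_no": 2, "emp_no": 70000}
assert (record["dept_no"], record["emp_no"]) == (2, 70000)
```

The corruption is invisible at the point of the bug; it surfaces only when the data moves to an environment where the assumption no longer holds.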

Data Base Design. Perhaps the most costly mistake a designer can make is an error in the data base design, because it has a direct effect on the information that is derivable and the application programs that are created. Incorrect or unanticipated requirements can lead either to information-deficient data bases or to an overly complex and general design. An inadequate logical design has the potential for complex user interfaces or extremely long access times. A poor physical design can lead to high maintenance and performance costs. Unfortunately, data base design is still an art at the present time. Two surveys report the results in the area to date: Novak and Fry [DL26] survey the current logical data base design methodology, and Chen and Yao [DL34] review data base design in general. The work of Bubenko [DL31] in the development of the CADIS system and the abstraction and generalization techniques of Smith and Smith [DL29,30] show promise.

An accurate logical design can still be unnecessarily data dependent. Dependencies are inadvertently or deliberately introduced in the interest of improving system performance. In essence, "purity" is compromised to gain processing efficiencies. Since optimization is a worthwhile goal, insisting on absolute purity may be unreasonable. However, the data base designer should at least be aware of contrivances and, therefore, be in a position to evaluate the relative effects a design decision may have. Designers should become sensitive to their decisions by asking: "How will the data model be affected by a future change in performance requirements? Have I done a reasonable job in insulating applications from data structure elements that are motivated strictly by performance considerations?"

Some examples of induced data dependencies in logical data base design which may impact upon conversion are:

The use of owner-coupled sets in DBTG to implement performance-oriented index structures or orderings on records.

Storing physical pointers (or data base keys) in an information field of a record.

Combining segment types (in DL/1) to reduce the amount of I/O required to traverse a data base.
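The second of these dependencies can be sketched in a few lines. All names and values below are invented; the contrast is between an order record that stores a physical position into the customer storage area and one that stores only a logical key.

```python
# Data-dependent design: the order stores a physical position (here a
# list index stands in for a data base key or pointer).
customers = [("C02", "APEX"), ("C01", "ACME")]
order_physical = {"order_no": 9001, "customer_ptr": 0}  # points at C02 today
assert customers[order_physical["customer_ptr"]][0] == "C02"

# A physical reorganization -- the kind a conversion routinely performs --
# silently invalidates every stored pointer:
customers.sort()
assert customers[order_physical["customer_ptr"]][0] == "C01"  # wrong customer!

# Data-independent design: the order stores the logical key and resolves
# it at access time, surviving any physical reorganization.
order_logical = {"order_no": 9001, "customer_key": "C02"}

def find_customer(key):
    return next(c for c in customers if c[0] == key)

assert find_customer(order_logical["customer_key"])[1] == "APEX"
```

The logical-key version pays a lookup cost at run time; the pointer version trades that cost for a dependency that every future restructuring must repair.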

DBMS Utilization and Selection. Selection of a DBMS can have a major impact on conversion requirements. Of importance in evaluating a DBMS is to consider products exhibiting the highest level user interface.

A high level DBMS is characterized by both a powerful set of functions and a high degree of data independence from the point of view of the application. With respect to functions, that is, the DML, the distinction between "high level" and "low level" has traditionally centered on whether the DBMS provides user operations on sets of records (select, retrieve, update, or summarize all the records or tuples which satisfy some conditions) or whether one is restricted to record-at-a-time processing ("navigation"). The DBMS with the "high-level" set operation approach is significantly more desirable than the navigational record-by-record approach.
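The two DML styles can be contrasted with a toy example (the record layout and values are invented for illustration). Both compute the same total, but the navigational version embeds the access path in the program, while the set-oriented version states only the qualifying condition.

```python
# Hypothetical records; in a real DBMS these would live in the data base.
employees = [
    {"name": "SMITH", "dept": "D42", "salary": 18000},
    {"name": "JONES", "dept": "D42", "salary": 21000},
    {"name": "BROWN", "dept": "D17", "salary": 19500},
]

# Record-at-a-time ("navigation"): the program itself walks the records
# one by one, so it is wedded to the ordering and access path.
total_nav = 0
for rec in employees:            # the explicit GET-NEXT loop
    if rec["dept"] == "D42":
        total_nav += rec["salary"]

# Set-oriented: one operation over all qualifying records; the program
# states *what* it wants, not *how* to reach it.
total_set = sum(r["salary"] for r in employees if r["dept"] == "D42")

assert total_nav == total_set == 39000
```

If the storage order or access path changes during a conversion, the set-oriented statement is unaffected; the navigational loop must be re-examined.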

Prospective DBMS users should evaluate the data independence characteristics of a proposed product. Systems are preferred which support an "external schema" or "subschema" feature which permits the record image in the application program (the user work area) to differ significantly from the data base format. However, the subschema concept is only one aspect of data independence. In general, it is necessary to determine in what ways and to what extent the application interface is insulated from performance or internal format options. For instance, will programs have to be modified if:


a decision is made to add or delete an index?

the amount of space allocated to an item is increased or decreased?

chains are replaced by pointer arrays?

Other conversion-related questions about DBMS products include the following:

Are there adequate performance and formatting alternatives? Are there too many (i.e., unproductive or incomprehensible) tuning options? Are there adequate performance measurement techniques and tools to guide the exercise of these choices?

Does the system automatically convert a populated data base when a new format option is selected?

Aside from tuning, does the DBMS gracefully accommodate at least simple external changes such as adding or deleting a record or item type?

Are there other useful high level facilities associated with or based on the DBMS, such as a report writer, query processor, data dictionary, transaction monitor, accounting system, payroll system, etc.?

Is there a utility for translating the data base into an "interchange form;" i.e., a machine-independent, serial stream of characters?

Is the vendor committed to maintaining the product across new operating system and hardware releases/upgrades? Conversely, is the vendor prepared to support the product on older releases of the operating system, so the user will not be forced to upgrade?

What hardware environments are currently supported, and what is the vendor's policy regarding conversion to another manufacturer's mainframe?

What programming language interfaces are available? Can the same DBMS features be used if there is a migration, say, from COBOL to PL/1?


How intelligent is the system's technique for organizing data on the media? Specifically, will performance deteriorate at an inordinate rate as updating proceeds? How often will reorganization (cleanup) be required? Does the DBMS have a built-in reorganization utility? How does the user determine the optimal time to reorganize?

Are the language facilities and data modeling facilities of the DBMS adequate for the anticipated long term requirements of the enterprise? What is the risk of having to convert to a new DBMS?

Likewise, are the performance characteristics and internal storage structure limitations adequate to meet the long term requirements (response times, data base sizes) of the enterprise?

Are there facilities to assist the user in converting data from a non-DBMS environment or from another DBMS? For instance, can a data base be loaded from one or more user-defined files?

6.3.2 Future Technologies/Standards Impact.

In this section we discuss trends in computer hardware technologies, DBMS software directions, and standards development, and consider their impact on data and program conversion. We intend to make the reader aware of what to expect in terms of conversion problems rather than give a complete assessment of future technologies. Therefore, we discuss only technologies and standards that will impact conversion problems.

The first three parts discuss the areas of hardware, software, and standards and their impact on conversion in some detail. The last part summarizes the major points of our assessment without going into detailed reasoning.

Hardware and Architectural Technologies. The cost and performance of processor logic and memory continue to improve at a fast rate. As a result, overhead costs are more acceptable, especially when such costs save people's time and work, and provide user oriented functions that do not require a computer expert. In particular, one can now think about using generalized conversion tools not only when conversion is required as a result of hardware or software changes, but also as a result of a changing application that requires a new, more efficient data base organization. What could have been a prohibitive cost for a data base conversion in the past may not be a major factor in the future.


At the same time, the cost/performance improvement contributes to the proliferation of data bases and therefore accentuates the need for generalized conversion tools. The more cost effective the process of accessing and maintaining data becomes, the more data is collected on computers. Improvements in hardware (as well as software) technologies create more need for data and program conversion. In addition, the emergence of new technologies, such as communication networks, adds another level of sophistication to the way that data can be organized and used. Distributed data bases, where multiple data bases (or subsets of data bases) may reside on different machines, require tools for the integration and correlation of data. Invariably, data will need to move from system to system dynamically, possibly moving between different hardware/software systems. In this environment, generalized tools for dynamic conversion will become a necessity.

In recent years, two promising approaches to data management hardware technologies have been pursued. One is the specialized data management machine and the other is the backend data management machine. As will be explained next, both approaches can help simplify the conversion problem.

The specialized data management hardware is based on the idea of using some kind of an associative memory device, a device that can perform a parallel access to the data based on its content. Such a device eliminates the necessity for organizing the internal structure of a data base using indexes, hash tables, pointer structures, etc., which are primarily used for fast access. As a result, the data can be essentially stored in its external logical form, and the data management system can use a high level language based on the logical data structure only. The conversion process is simplified since data is readily available in its logical organization. Referring to the terminology used in previous sections, the functions of unloading and loading of the data base can be greatly simplified. Also, no restructuring will be required because of a change in data base use, since the physical data base organization can be to a large degree independent of its intended use. In addition, the program conversion problem is simplified as a result of the program interfacing to the DBMS using a high level logical language.
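The effect of content-based access can be mimicked in software. In the following sketch the parallelism of an associative memory is only simulated by a scan, and all record names are invented, but the essential property holds: selection is by content, so no index or pointer structure ever needs to be built, maintained, or converted.

```python
# A toy model of content-addressable selection. Conceptually, every
# memory cell evaluates the predicate against its own content in
# parallel; here that is simulated by a sequential scan.
store = [
    {"part": "P1", "color": "RED",  "weight": 12},
    {"part": "P2", "color": "BLUE", "weight": 17},
    {"part": "P3", "color": "RED",  "weight": 14},
]

def associative_select(predicate):
    # No index, hash table, or pointer chain is consulted: qualification
    # depends only on record content, i.e., the logical data.
    return [rec for rec in store if predicate(rec)]

red_parts = associative_select(lambda r: r["color"] == "RED")
assert [r["part"] for r in red_parts] == ["P1", "P3"]
```

Because the store holds records in their external logical form, "unloading" it is trivial; there is no internal access structure to strip away.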

Similar benefits can be achieved if backend machines are used. A backend machine is a special processor dedicated to managing storage and data bases on behalf of a host computer. The primary motive for the backend machine is to off-load the data management function from the host to a specialized machine that can execute this function at much lower cost. From a conversion standpoint, the separation of data management functions from the host promotes the need for a high level logical interface that provides the advantages discussed above. Another advantage is that it is possible to migrate from one host machine to another without affecting the data bases and their management, alleviating the need for data conversion if the same backend machine is used with the new host.

Mass storage devices, such as video disks, make storing very large data bases, on the order of 10^10 characters, cost effective. Converting data bases of this size compounds the cost considerations merely by the processing of this large amount of data. As a result, such data bases will tend to stay in the same environment for longer periods of time. The use of specialized data management machines or dedicated backend machines in conjunction with these mass storage devices can help postpone the need for data base conversion.

Finally, we should mention the growing use of minicomputers in supporting data management functions. DBMSs now exist on many minicomputers, with more forthcoming. The proliferation of minicomputers which support data bases can only increase the need for generalized conversion tools.

Software Development Trends. Much of the work over the last years in the data management area has concentrated on techniques that clearly separate the logical structure of the data base from its physical organization. This concept, called "data independence," was introduced to emphasize that users need not be exposed to the details of the physical organization of the data base, but only to its logical relationships. This led to the development of data access and manipulation languages that depend on the logical data model only. The effect of this trend is similar to that of using the specialized data management machines and backend machines discussed previously; namely, the simplification of the unload and load functions, since the interface to the DBMS is provided at the logical level only, and the simplification of program conversion for similar reasons.

At the user end of the spectrum, it seems reasonable to assume that the diversity of data models (network, relational, hierarchical, and other views that may be developed in the future) will be required for many more decades. This is especially true since there are problem areas that seem to map more naturally into a certain model. Furthermore, it is often the case that users do not agree on the same model for a given problem area. Obviously, this state of affairs only accentuates the need for generalized conversion tools that can restructure data bases from one model to another. Even with the development of large scale associative memories, data structures will likely provide economic rationales for their contrived use. Another possibility is the use of a common underlying data model that can accommodate any of the user views. However, this approach will still require some type of a dynamic conversion process between the common view and each of the possible user views.

Standards Development. There is much work and controversy in developing standards for DBMS. Standards that are oriented to determine the nature of the DBMS are hard to bring about, even in a highly controlled environment, because of previous investment in application software and data base development, and because of disagreement. For example, there is still much controversy over whether the network model proposed by the CODASYL committee is a proper one. It seems reasonable to assume that there will always be non-standard DBMSs. Further, even if such a standard can be adopted, different DBMS implementations will still exist, resulting in different physical data bases for the same logical data base. In addition, one can safely assume that restructuring because of application needs will still be necessary, and that changes in the standard itself may require conversion.

A standard that is more likely to be accepted is one that affects only the way of interfacing to a DBMS. In particular, from a conversion standpoint, a standard interchange data form (SIDF) will be most useful. A SIDF is a format not unlike a load format for DBMSs. Any advanced DBMS has a load utility that requires a sequential data stream in a pre-specified format. If a standard for this format can be agreed upon, and if all DBMSs can load and unload from and to this format, then the need for reformatting (as described earlier) is eliminated. The conversion process can be reduced to essentially restructuring only, given that unload and load are part of the DBMS function. A preliminary proposal for such a standard was developed by the ERDA Interlaboratory Working Group on Data Exchange (IWGDE) [GG2]. However, it is only designed to accommodate hierarchical structures. Consideration is now being given to the extension of the standard to accommodate more general structures (i.e., networks and relations). We believe that there are no technical barriers to the development of a SIDF, and that putting such a standard to use would alleviate a major part of the data conversion process.
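The flavor of such an interchange form can be sketched as follows. The tags and record layout below are invented for this illustration and are not those of the IWGDE proposal; the point is only that a self-describing, machine-independent serial character stream lets any system that can read the format reload data unloaded by any other.

```python
# A minimal, invented interchange form: a self-describing serial
# character stream. A header line names the fields; each subsequent
# line carries one record's values.

def unload(records, fields):
    """Flatten records to a serial character stream with a header."""
    lines = ["*FIELDS " + ",".join(fields)]
    for rec in records:
        lines.append("*REC " + ",".join(str(rec[f]) for f in fields))
    return "\n".join(lines)

def load(stream):
    """Rebuild records from the interchange stream."""
    lines = stream.split("\n")
    fields = lines[0].split(" ", 1)[1].split(",")
    return [dict(zip(fields, ln.split(" ", 1)[1].split(",")))
            for ln in lines[1:]]

source = [{"name": "SMITH", "dept": "D42"},
          {"name": "JONES", "dept": "D17"}]
stream = unload(source, ["name", "dept"])
assert load(stream) == source   # lossless round trip
```

With load and unload to such a form built into every DBMS, a conversion between two systems reduces to restructuring the stream, with no system-specific reformatting step.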

Summary. The rationale for the points summarized below appears in the previous parts of this section. We will only state here our assessment of the impact on conversion problems.


Hardware development will increase the need for generalized conversion tools (in particular, the proliferation of minicomputers, computer networks, and mass storage devices).

The reduction in hardware costs will make conversion costs more acceptable.

Special hardware DBMS machines will simplify the conversion process (in particular, the load and unload functions, and program conversion) because they promote interfacing at the logical level.

Software advances will not eliminate the need for conversion but can simplify the conversion process in a way similar to DBMS machines.

Multiplicity of logical models is likely to persist, thus adding to the need for conversion tools between models.

Standards will not eliminate the conversion problem. Even with a standard, the implementations would be different, and non-standard DBMSs will likely exist.

Standards can greatly simplify the conversion problem. In particular, a standard interchange data form will simplify the load and unload functions and eliminate reformatting.

What can we expect in the next five years and beyond in the data base conversion area? The state of the art has advanced enough to give hope for generalized tools. Within the next five years we can expect more generalized conversion systems to become operational, but some additional work would be required for moving them from one environment to another. We can expect to have a standard form developed and agreed upon. It will probably take longer before manufacturers will see the benefit of adopting a standard form and provide load and unload facilities using it. However, we can expect them to provide some conversion tools to convert data bases from other systems to their own. It will probably take as much as ten years before the commercial availability of a generalized converter and the adherence of manufacturers to a standard interchange form. Another area of concern is the application program conversion resulting from a data base conversion. We cannot expect that a generalized solution for this problem will be achieved within the next five years. However, this problem will be simplified to a large extent as hardware and software development trends promote interfacing at the logical level.


BIBLIOGRAPHY

(DL) LOGICAL DATA BASE DESIGN

DL26 NOVAK, D., and FRY, J., "The State of the Art of Logical Data Base Design," Proceedings of the Fifth Texas Conference on Computing Systems, IEEE, Long Beach, 1976.

DL29 SMITH, J. M., and SMITH, D. C. P., "Data Base Abstractions: Aggregation and Generalization," ACM Transactions on Database Systems 2, 2 (1977): 105-133.

DL30 SMITH, J. M., and SMITH, D. C. P., "Data Base Abstractions: Aggregation," Communications of the ACM 20, 6 (1977): 405-413.

DL31 BUBENKO, J. A., "IAM: An Inferential Abstract Modeling Approach to Design of Conceptual Schema," Proceedings of the ACM-SIGMOD International Conference on Management of Data, ACM, N.Y., 1977, pp. 62-74.

DL34 CHEN, P. P., and YAO, S. B., "Design and Performance Tools for Data Base Systems," Proceedings of the Third International Conference on Very Large Data Bases, ACM, N.Y., 1977.

(DT) DATA TRANSLATION

DT1 FRY, J. P., FRANK, R. L., and HERSHEY, E. A., III, "A Developmental Model for Translation," Proc. 1972 ACM SIGFIDET Workshop on Data Description, Access and Control, A. L. Dean (ed.), ACM, N.Y.

DT2 SMITH, D. C. P., "A Method for Data Translation Using the Stored Data Definition and Translation Task Group Languages," Proc. of the 1972 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

DT3 RAMIREZ, J. A., "Automatic Generation of Data Conversion Programs Using a Data Description Language (DDL)," Ph.D. dissertation, University of Pennsylvania, 1973.

DT4 MERTEN, A. G., and FRY, J. P., "A Data Description Approach to File Translation," Proc. 1974 ACM SIGMOD Workshop on Data Description, Access and Control, ACM, N.Y., pp. 191-205.


DT5 HOUSEL, B., LUM, V., and SHU, N., "Architecture to an Interactive Migration System (AIMS)," Proc. 1974 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 157-169.

DT6 RAMIREZ, J. A., RIN, N. A., and PRYWES, N. S., "Automatic Conversion of Data Conversion Programs Using a Data Description Language," Proc. 1974 ACM SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 207-225.

DT7 FRANK, R. L., and YAMAGUCHI, K., "A Model for a Generalized Data Access Method," Proc. of the 1974 National Computer Conference, AFIPS Press, Montvale, N.J., pp. 437-444.

DT8 TAYLOR, R. W., "Generalized Data Structures for Data Translation," Proc. Third Texas Conference on Computing Systems, Austin, Texas, 1974.

DT9 UNIVAC, UNIVAC 1100 Series Data File Converter, Programmer Reference UP-8070, Sperry Rand Corporation, March 1974.

DT10 YAMAGUCHI, K., "An Approach to Data Compatibility: A Generalized Access Method," Ph.D. dissertation, The University of Michigan, 1975.

DT11 BAKKOM, D. E., and BEHYMER, J. A., "Implementation of a Prototype Generalized File Translator," Proc. 1975 ACM SIGMOD International Conf. on Management of Data, W. F. King (ed.), ACM, N.Y., pp. 99-110.

DT12 NAVATHE, S. B., and MERTEN, A. G., "Investigations into the Application of the Relational Model of Data to Data Translation," Proc. 1975 ACM SIGMOD International Conf. on Management of Data, W. F. King (ed.), ACM, N.Y., pp. 123-138.

DT13 BIRSS, E. W., and FRY, J. P., "Generalized Software for Translating Data," Proc. of the 1976 National Computer Conference, Vol. 45, AFIPS Press, Montvale, N.J., pp. 889-899.

DT14 LUM, V. Y., SHU, N. C., and HOUSEL, B. C., "A General Methodology for Data Conversion and Restructuring," IBM Journal of Research and Development, Vol. 20, No. 5, 1976, pp. 483-497.

DT15 HOUSEL, B. C., et al., "EXPRESS: A Data Extraction, Processing, and Restructuring System," ACM Transactions on Database Systems 2, 2 (1977): 134-174.

DT16 SWARTWOUT, D. E., DEPPE, M. E., and FRY, J. P., "Operational Software for Restructuring Network Data Bases," Proc. of the 1977 National Computer Conference, Vol. 46, AFIPS Press, Montvale, N.J., pp. 499-508.

DT17 GOGUEN, N. H., and KAPLEN, M. M., "An Approach to Generalized Data Translation: The ADAPT System," Bell Telephone Laboratories Internal Report, October 5, 1977.

(GG) GENERAL

GG1 INGALLS, D., "The Execution Time Profile as a Programming Tool," in Design and Optimization of Compilers, R. Rustin (ed.), Prentice-Hall, 1972, pp. 108-128.

GG2 ERDA Interlaboratory Working Group for Data Exchange (IWGDE), Annual Report for Fiscal Year 1976, NTIS LBL-5329.

GG3 DATE, C. J., An Introduction to Database Systems, Addison-Wesley, 1975.

GG4 CODD, E. F., "Relational Completeness of Data Base Sublanguages," in Data Base Systems, Courant Computer Science Symposia Series, Vol. 6, Prentice-Hall, 1972.

(M) MODELS-THEORY

M11 CHEN, P. P. S., "The Entity-Relationship Model - Toward a Unified View of Data," ACM Transactions on Database Systems 1, 1 (1976): 9-36.

(PT) PROGRAM TRANSLATION

PT1 SHARE AD-HOC COMMITTEE ON UNIVERSAL LANGUAGES, "The Problem of Programming Communication with Changing Machines: A Proposed Solution," Comm. ACM, Aug. 1958, pp. 12-18.

PT2 SHARE AD-HOC COMMITTEE ON UNIVERSAL LANGUAGES, "The Problem of Programming Communication with Changing Machines: A Proposed Solution, Part 2," Comm. ACM, Sept. 1958, pp. 9-16.


PT3 SIBLEY, E. H., and MERTEN, A. G., "Transferability and Translation of Programs and Data," Information Systems, COINS IV, Plenum Press, N.Y., 1972, pp. 291-301.

PT4 YAMAGUCHI, K., and MERTEN, A. G., "Methodology for Transferring Programs and Data," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 141-156.

PT5 HOUSEL, B. C., LUM, V. Y., and SHU, N., "Architecture to an Interactive Migration System (AIMS)," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y., pp. 157-170.

PT6 MEHL, J. W., and WANG, C. P., "A Study of Order Transformation of Hierarchical Structures in IMS Data Bases," Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, ACM, N.Y.

PT7 HOUSEL, B. C., and HALSTEAD, M. H., "A Methodology for Machine Language Decompilation," Proc. of the 1974 ACM Annual Conference, ACM, N.Y., pp. 254-260.

PT8 HONEYWELL INFORMATION SYSTEMS, "Functional Specification Task 609 Data Base Interface Package," Defense Communications Agency Contract DCA100-73-C-0055.

PT9 KINTZER, E., "Translating Data Base Procedures," Proc. 1975 ACM National Conference, ACM, N.Y., pp. 359-362.

PT10 SCHINDLER, S., "An Approach to Data Base Application Restructuring," Working Paper 76 ST 2.3, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Mich., 1976.

PT11 DALE, A. G., and DALE, N. B., "Schema and Occurrence Structure Transformation in Hierarchical Systems," Proc. 1976 International Conference on Management of Data, pp. 157-168.

PT12 SU, STANLEY Y. W., "Application Program Conversion Due to Data Base Changes," Proc. of the 2nd International Conference on VLDB, Brussels, Sept. 8-10, 1976, pp. 143-157.

PT13 SU, S. Y. W., and LIU, B. J., "A Methodology of Application Program Analysis and Conversion Based on Data Base Semantics," Proceedings of the International Conference on Management of Data, 1977, pp. 75-87.

PT14 HOUSEL, B. C., "A Unified Approach to Program and Data Conversion," Proceedings of the Third International Conference on Very Large Data Bases, ACM, N.Y., 1977.

PT15 SU, S. Y. W., and REYNOLDS, M. J., "Conversion of High-Level Sublanguage Queries to Account for Data Base Changes," Proc. of NCC, 1978, pp. 857-875.

(R) RESTRUCTURING

R1 FRY, J. P., and JERIS, D., "Towards a Formulation of Data Reorganization," Proc. 1974 ACM/SIGMOD Workshop on Data Description, Access and Control, ed. by R. Rustin, ACM, N.Y., pp. 83-101.

R2 SHU, N. C., HOUSEL, B. C., and LUM, V. Y., "CONVERT: A High-Level Translation Definition Language for Data Conversion," Comm. ACM 18,10, 1975, pp. 557-567.

R3 SHOSHANI, A., "A Logical-Level Approach to Data Base Conversion," Proc. 1975 ACM/SIGMOD International Conf. on Management of Data, ACM, N.Y., pp. 112-122.

R4 SHOSHANI, A., and BRANDON, K., "On the Implementation of a Logical Data Base Converter," Proc. International Conference on Very Large Data Bases, ACM, N.Y., 1975, pp. 529-531.

R5 HOUSEL, B. C., and SHU, N. C., "A High-Level Data Manipulation Language for Hierarchical Data Structures," Proc. of the 1976 Conference on Data Abstraction, Definition and Structure, Salt Lake City, Utah, pp. 155-169.

R6 NAVATHE, S. B., and FRY, J. P., "Restructuring for Large Data Bases: Three Levels of Abstraction," ACM Transactions on Data Base Systems, 1,2, ACM, N.Y., 1976, pp. 138-158.

R7 NAVATHE, S. B., "A Methodology for Generalized Data Base Restructuring," Ph.D. dissertation, The University of Michigan, 1976.

R8 GERRITSEN, ROB, and MORGAN, HOWARD, "Dynamic Restructuring of Data Bases with Generation Data Structures," Proc. of the 1976 ACM Conference, ACM, N.Y.


R9 SWARTWOUT, D., "An Access Path Specification Language for Restructuring Network Data Bases," Proc. of the 1977 SIGMOD Conference, ACM, N.Y.

R10 NAVATHE, S. B., "Schema Analysis for Data Base Restructuring," Proc. 3rd International Conference on Very Large Data Bases, ACM, N.Y., 1977. To appear in TODS.

R11 EDELMAN, J. A., JONES, E. E., LIAW, Y. S., NAZIF, Z. A., and SCHEIDT, D. L., "REORG - A Data Base Reorganizer," Bell Laboratories Internal Technical Report, April, 1976.

(SL) STORED-DATA DEFINITION

SL1 STORAGE STRUCTURE DEFINITION LANGUAGE TASK GROUP (SSDLTG) OF CODASYL SYSTEMS COMMITTEE, "Introduction to Storage Structure Definition" (by J. P. Fry); "Informal Definitions for the Development of a Storage Structure Definition Language" (by W. C. McGee); "A Procedural Approach to File Translation" (by J. W. Young, Jr.); "Preliminary Discussion of a General Data to Storage Structure Mapping Language" (by E. H. Sibley and R. W. Taylor), Proc. 1970 ACM-SIGFIDET Workshop on Data Description, Access and Control, ed. by E. F. Codd, Houston, Tex., Nov. 1970, pp. 368-380.

SL2 SMITH, D. C. P., "An Approach to Data Description and Conversion," Ph.D. dissertation, Moore School Report 72-20, University of Pennsylvania, Philadelphia, Pa., 1972.

SL3 TAYLOR, R. W., "Generalized Data Base Management System Data Structures and Their Mapping to Physical Storage," Ph.D. dissertation, The University of Michigan, Ann Arbor, Mich., 1971.

SL4 FRY, J. P., SMITH, D. C. P., and TAYLOR, R. W., "An Approach to Stored-Data Definition and Translation," Proc. 1972 ACM-SIGFIDET Workshop on Data Description, Access and Control, ed. by A. L. Dean, Denver, Colo., Nov. 1972, pp. 13-55.

SL5 BACHMAN, C. W., "The Evolution of Storage Structures," Comm. ACM 15,7 (July 1972), pp. 628-634.

SL6 SIBLEY, E. H., and TAYLOR, R. W., "A Data Definition and Mapping Language," Comm. ACM 16,12 (Dec. 1973), pp. 750-759.


SL7 HOUSEL, B., SMITH, D., SHU, N., and LUM, V., "DEFINE: A Non-Procedural Data Description Language for Defining Information Easily," Proc. of 1975 ACM Pacific Conference, San Francisco, CA, April 1975, pp. 62-71.

SL8 THE STORED-DATA DEFINITION AND TRANSLATION TASK GROUP, "Stored-Data Description and Data Translation: A Model and Language," Information Systems 2,3 (1977):95-148.

(SM) DATA SEMANTICS

SM1 SCHMID, H. A., and SWENSON, J. R., "On the Semantics of the Relational Model," Proc. ACM-SIGMOD 1975 Conference, May 1975, pp. 211-233.

SM2 ROUSSOPOULOS, N., and MYLOPOULOS, J., "Using Semantic Networks for Data Base Management," Proc. Very Large Data Base Conference, Framingham, Mass., Sept. 1975, pp. 144-172.

SM3 SU, STANLEY Y. W., and LO, D. H., "A Multi-level Semantic Data Model," CAASM Project, Technical Report No. 9, Electrical Engineering Dept., University of Florida, June 1976, pp. 1-29.

(TA) TRANSLATION APPLICATIONS

TA1 WINTERS, E. W., and DICKEY, A. F., "A Business Application of Data Translation," Proceedings of the 1976 SIGMOD International Conference on Management of Data, ed. by J. B. Rothnie, Washington, D.C., June 1976, pp. 189-196.

(UR) UM RESTRUCTURING

UR1 LEWIS, K., DRIVER, B., and DEPPE, M., "A Translation Definition Language for the Version II Translator," Working Paper 809, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UR2 LEWIS, K., and FRY, J., "A Comparison of Three Translation Definition Languages," Working Paper DT 5.1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UR3 DEPPE, M. E., "A Relational Interface Model for Data Base Restructuring," Technical Report 76 DT 3, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UR4 DEPPE, M. E., LEWIS, K. H., and SWARTWOUT, D. E., "Restructuring Network Data Bases: An Overview," Data Translation Project, Technical Report 76 DT 5, The University of Michigan, Ann Arbor, Michigan, 1976.

UR5 DEPPE, M. E., and LEWIS, K. H., "Data Translation Definition Language Reference Manual for Version IIA Release 1," Data Translation Project, Working Paper 76 DT 5.2, The University of Michigan, Ann Arbor, Michigan, 1976.

UR6 SWARTWOUT, D. E., MARINE, A. M., and BAKKOM, D. E., "Partial Restructuring Approach to Data Translation," Data Translation Project, Working Paper 76 DT 8.1, The University of Michigan, Ann Arbor, Michigan, 1976.

UR7 SWARTWOUT, D. E., WOLFE, G. J., and BURPEE, C. E., "Translation Definition Language Reference Manual for Version IIA Translator, Release 3," Data Translation Project, Working Paper 77 DT 5.3, The University of Michigan, Ann Arbor, Michigan, 1977.

(US) UM STORED-DATA DEFINITION

US1 DATA TRANSLATION PROJECT, "Stored-Data Definition Language Reference Manual," The University of Michigan, Ann Arbor, Michigan, 1972.

US2 DATA TRANSLATION PROJECT, "Revised Stored-Data Definition Language Reference Manual," The University of Michigan, Ann Arbor, Michigan, 1974.

US3 DATA TRANSLATION PROJECT, "University of Michigan Stored-Data Definition Language Reference Manual for Version II Translator," The University of Michigan, Ann Arbor, Michigan, 1975.

US4 BIRSS, E. W., and FRY, J. P., "A Comparison of Two Languages for Describing Stored Data," Data Translation Project, Technical Report 76 DT 1, The University of Michigan, Ann Arbor, Michigan, 1976.

(UT) UM TRANSLATION


UT1 "Functional Design Requirements for a Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1972.

UT2 "Design Specifications of a Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1972.

UT3 "Program Logic Manual for the University of Michigan Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT4 "Users Manuals for the University of Michigan Prototype Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT5 "Functional Design Requirements of the Version I Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1973.

UT6 "Program Logic Manual for the University of Michigan Version I Data Translator," Working Paper 306, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1974.

UT7 "Design Specifications: Version II Data Translator," Working Paper 307, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UT8 BIRSS, E., DEPPE, M., and FRY, J., "Research and Data Reorganization Capabilities for the Version IIA Data Translator," Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1975.

UT9 BIRSS, E., et al., "Program Logic Manual for the Version IIA Data Translator," Working Paper 76 DT 3.1, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UT10 BODWIN, J., et al., "Data Translator Version IIA Release 1 User Manual," Working Paper 76 DT 3.2, Data Translation Project, The University of Michigan, Ann Arbor, Michigan, 1976.

UT11 BODWIN, J., et al., "Data Translator Version IIA Release 2 User Manual," Working Paper 76 DT 3.4, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1976.

UT12 KINTZER, E., et al., "Michigan Data Translator Version IIB Release 1 User Manual," Technical Paper 77 DT 8, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT13 BURPEE, C. E., et al., "Michigan Translator Program Logic Manual Version IIB Release 1," Working Paper 77 DT 3.7, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT14 BAKKOM, D., et al., "Specifications for a Generalized Reader and Interchange Form," Working Paper 77 DT 6.2, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT15 DeSMITH, D., and HUTCHINS, L., "Michigan Data Translator Design Specifications Version IIB," Working Paper 77 DT 3.8, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT16 KINTZER, E., et al., "Michigan Data Translator Version IIB Release 1.1 User Manual," Technical Paper 77 DT 8.1, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Michigan, 1977.

UT17 BAKKOM, D. E., and SCHINDLER, S. J., "Operational Capabilities for Data Base Conversion and Restructuring," Technical Report 77 DT 6, Data Base Systems Research Group, The University of Michigan, Ann Arbor, Mich., 1977.

(Z) RELATIONAL SYSTEM

Z5 CHAMBERLIN, D. D., and BOYCE, R. F., "SEQUEL: A Structured English Query Language," Proceedings of the ACM-SIGMOD Workshop on Data Description, Access and Control, ACM, N.Y., 1974.


7. PARTICIPANTS

The following is a list of attendees, participants, and contributors to the workshop.

Edward Arvel
Conversion Experiences

Data Sciences Group
890 National Press Building
Washington, D.C. 20045

Marty Aronoff
Management Objectives

National Bureau of Standards
Tech B258
Washington, D.C. 20234

Robert Bemer
Standards

Honeywell Information Systems
P.O. Box 6000
Phoenix, AZ 85005

John Berg
Proceedings Editor

National Bureau of Standards
Tech A259
Washington, D.C. 20234

Edward Birss
Conversion Technology

Hewlett Packard
General Systems Division
5303 Stevens Creek Blvd.
Bldg. 498-3
Santa Clara, CA 95050

Don Branch
Standards

Advisory Bureau for Computing
Room 828, Lord Elgin Plaza
66 Slater Street
Ottawa, Ontario K1A 0T5
CANADA

Jean Bryce
Standards

M. Bryce & Associates, Inc.
1248 Springfield Pike
Cincinnati, OH 45215


Milt Bryce
Chairman, Standards

M. Bryce & Associates, Inc.
1248 Springfield Pike
Cincinnati, OH 45215

Jim Burrows
Chairman, Conversion Experiences

Director, Institute for Computer Sciences and Technology
National Bureau of Standards
Administration Bldg., Room A200
Washington, D.C. 20234

Richard G. Canning
Management Objectives

Canning Publications, Inc.
925 Anza Avenue
Vista, CA 92083

Lt. Michael Carter
Conversion Experiences

Air Force Data Systems Design Ctr.
AFDSDC/SDDA, Building 857
Gunter AFB, AL 36114

Joseph Collica
Conversion Experiences

National Bureau of Standards
Tech. A254
Washington, D.C. 20234

Elizabeth Courte
Conversion Experiences

Bell Laboratories
3B210 Six Corporate Plaza
Piscataway, NJ 08854

Ahron Davidi
Conversion Experiences

Blue Cross of Massachusetts
100 Summer Street, 12th Floor
Boston, Massachusetts 02106

Peter Dressen
Conversion Technology

Honeywell Information Systems
P.O. Box 6000
Phoenix, Arizona 85005

Ruth F. Dyke
Conversion Experiences

U.S. Civil Service Commission
1900 E Street, N.W., Room 6410
Washington, D.C. 20415

Larry Espe
Management Objectives

Nolan, Norton and Company
One Forbes Road
Lexington, Massachusetts 02173


Gordon Everest
Management Objectives

University of Minnesota
271 19th Avenue South
Minneapolis, MN 55455

Elizabeth Fong
Standards

National Bureau of Standards
Tech B212
Washington, D.C. 20234

James P. Fry
Chairman, Conversion Technology

276 Business Administration
University of Michigan
Ann Arbor, MI 48109

Al Gaboriault
Standards

Sperry Univac
P.O. Box 500
Mail Station C1NW-12
Blue Bell, PA 19424

Mr. Rob Gerritsen
Management Objectives

Wharton School
University of Pennsylvania
Philadelphia, Pennsylvania 19174

Richard Godlove
Management Objectives

Monsanto Company
800 North Lindbergh Boulevard
St. Louis, Missouri 63166

Nancy Goguen
Conversion Technology

Bell Laboratories
6 Corporate Place
Piscataway, NJ 08854

Seymour Jeffery
Host

Director, Center for Programming Science and Technology
National Bureau of Standards
Tech A247
Washington, D.C. 20234

Samuel C. Kahn
Management Objectives

Information Systems Planning Dep.
E. I. duPont de Nemours Co.
Wilmington, Delaware 19899

Mike Kaplan
Conversion Technology

Bell Laboratories
8 Corporate Place
Piscataway, NJ 08854


Anthony Klug
Standards

Computer Sciences Department
University of Wisconsin
Madison, Wisconsin 53706

Henry Lefkovits
Standards

H. C. Lefkovits & Associates, Inc.
P.O. Box 297
Harvard, MA 01451

H. Eugene Lockhart
Management Objectives

Nolan, Norton & Company
One Forbes Road
Lexington, Massachusetts 02173

Thomas Lowe
Host

Chief
Operations Engineering Division
Center for Programming Science and Technology
National Bureau of Standards
Tech A265
Washington, D.C. 20234

Gene Lowenthal
Conversion Technology

MRI Systems Corporation
P.O. Box 9968
Austin, Texas 78766

Vincent Lum
Conversion Technology

IBM Research Corp.
K55/282
5600 Cottle Road
San Jose, CA 95193

Mr. John Lyon
Management Objectives

Colonial Penn Group Data Corporation
5 Penn Center Plaza
Philadelphia, PA 19103

Halaine Maccabee
Conversion Experiences

Northeastern Illinois Plan. Comm.
400 West Madison St.
Chicago, Illinois 60606

William Madison
Contributor

Universal Systems, Inc.
2341 Jefferson Davis Highway
Arlington, Virginia 22202


Daniel B. Magraw
General Chairman

State of Minnesota
Department of Administration
208 State Administration Building
St. Paul, MN 55155

Robert Marion
Conversion Technology

Defense Communications Agency
11440 Isaac Newton Square, North
Reston, VA 22090

Steven Merritt
Conversion Experiences

GAO, FGMS-ADP
411 G Street, N.W., Room 6011
Washington, D.C. 20548

Jack Minker
ACM Liaison

Chairman
Department of Computer Science
University of Maryland
College Park, Maryland 20742

Thomas E. Murray
Management Objectives

Del Monte Corp.
P.O. Box 3575
San Francisco, CA 94119

Shamkant Navathe
Conversion Technology

New York University
Grad. School of Business
600 Tisch Hall
40 West 4th Street
New York, New York 10003

Mr. Jack Newcomb
Management Objectives

Department of Finance & Admin.
326 Andrew Jackson State Off. Bldg.
Nashville, TN 37219

Richard Nolan
Chairman, Management Objectives

Nolan, Norton and Company
One Forbes Road
Lexington, Massachusetts 02173

T. William Olle
Management Objectives

Consultant
27 Blackwood Close
West Byfleet
Surrey, KT14 6PP
ENGLAND

Mayford Roark
Keynoter

Ford Motor Company
The American Road
Dearborn, MI 48121


C. H. Rutledge
Standards

Marathon Oil Company
539 South Main Street
Findlay, OH 45840

Mr. Michael J. Samek
Management Objectives

Celanese Corporation
1211 Avenue of the Americas
New York, NY 10036

Steve Schindler
Management Objectives and Conversion Technology

University of Michigan
276 Business Administration
Ann Arbor, MI 48109

Mr. Richard D. Secrest
Management Objectives

Standard Oil Company (Indiana)
P.O. Box 591
Tulsa, OK 74102

Philip Shew
Standards

IBM Corporation
555 Bailey Avenue
San Jose, CA 95150

Arie Shoshani
Conversion Technology

Lawrence Berkeley Laboratory
University of California
Berkeley, CA 94720

Edgar Sibley
Management Objectives

Dept. of Info. Systems Mgmt.
University of Maryland
College Park, MD 20742

Prof. Diane Smith
Conversion Technology

University of Utah
Merrill Engineering Bldg.
Salt Lake City, UT 84112

Alfred Sorkowitz
Conversion Experiences

Department of Housing & Urban Development
451 7th Street, S.W., Room 4152
Washington, D.C. 20410

Prof. Stanley Su
Conversion Technology

Electrical Engineering
University of Florida
Gainesville, FL 32611


Donald Swartwout
Conversion Technology

276 Business Administration
University of Michigan
Ann Arbor, MI 48109

Robert W. Taylor
Conversion Technology

IBM Corporation
IBM Research, K55/282
5600 Cottle Road
San Jose, CA 95193

Jay Thomas
Standards

Allied Van Lines
2501 West Roosevelt Road
Broadview, IL 60153

Ewart Willey
Standards

Prudential Assurance Co.
142 Holborn Bars
London EC1N 2NH
ENGLAND

Major Jerry Winkler
Standards

U.S.A.F. - AFDSC/GKD
Room 3A153, Pentagon
Washington, D.C. 20030

Beatrice Yormark
Conversion Technology

Interactive Systems Corporation
1526 Cloverfield Blvd.
Santa Monica, CA 90404

A Contributor is one who could not attend but either submitted a paper or added to the Workshop in a manner that deserves our acknowledgement and thanks.


*U.S. GOVERNMENT PRINTING OFFICE: 1980 328-117/6522

NBS-114A (REV. 9-78)

U.S. DEPT. OF COMM.
BIBLIOGRAPHIC DATA SHEET

1. PUBLICATION OR REPORT NO.

NBS SP 500-64

4. TITLE AND SUBTITLE

DATA BASE DIRECTIONS - THE CONVERSION PROBLEM
Proceedings of a Workshop held in Ft. Lauderdale, FL, Nov. 1-3, 1977

5. Publication Date

September 1980

7. AUTHOR(S)

John L. Berg, Editor

9. PERFORMING ORGANIZATION NAME AND ADDRESS

National Bureau of Standards
Department of Commerce
Washington, DC 20234

12. SPONSORING ORGANIZATION NAME AND COMPLETE ADDRESS

National Bureau of Standards
Department of Commerce
Washington, DC 20234

Association for Computing Machinery
1133 Avenue of the Americas
New York, NY 10036

13. Type of Report & Period Covered

Final

16. ABSTRACT

What information can help a manager assess the impact a conversion will have on a data base system, and of what aid will a data base system be during a conversion? At a workshop on the data base conversion problem held in November 1977 under the sponsorship of the National Bureau of Standards and the Association for Computing Machinery, approximately seventy-five participants provided the decision makers with useful data.

Patterned after the earlier Data Base Directions Workshop, this workshop, Data Base Directions - The Conversion Problem, explores data base conversion from four perspectives: management, previous experience, standards, and system technology. Each perspective was covered by a workshop panel that produced a report included here. The management panel gave specific direction on such topics as planning for data base conversions, impacts on the EDP organization and applications, and minimizing the impact of present and future conversions. The conversion experience panel drew upon ten conversion experiences to compile their report and prepared specific checklists of "do's and don'ts" for managers. The standards panel provided comments on standards needed to support or facilitate conversions, and the system technology panel reports comprehensively on the systems and tools needed, with strong recommendations on future research.

17. KEY WORDS

Conversion; data base; data-description; data-dictionary; data-directory; data-manipulation; DBMS; languages; query

18. AVAILABILITY

Unlimited. Order from Sup. of Doc., U.S. Government Printing Office, Washington, DC 20402

19. SECURITY CLASS (THIS REPORT)

UNCLASSIFIED

20. SECURITY CLASS (THIS PAGE)

UNCLASSIFIED

21. NO. OF PRINTED PAGES

178

22. Price

$5.50


NBS TECHNICAL PUBLICATIONS

PERIODICALS

JOURNAL OF RESEARCH—The Journal of Research of the

National Bureau of Standards reports NBS research and develop-

ment in those disciplines of the physical and engineering sciences in

which the Bureau is active. These include physics, chemistry,

engineering, mathematics, and computer sciences. Papers cover a

broad range of subjects, with major emphasis on measurement

methodology and the basic technology underlying standardization.

Also included from time to time are survey articles on topics

closely related to the Bureau's technical and scientific programs.

As a special service to subscribers each issue contains complete

citations to all recent Bureau publications in both NBS and non-

NBS media. Issued six times a year. Annual subscription; domestic

$13; foreign $16.25. Single copy, $3 domestic; $3.75 foreign.

NOTE: The Journal was formerly published in two sections; Sec-

tion A "Physics and Chemistry" and Section B "Mathematical

Sciences."

DIMENSIONS/NBS—This monthly magazine is published to in-

form scientists, engineers, business and industry leaders, teachers,

students, and consumers of the latest advances in science and

technology, with primary emphasis on work at NBS. The magazine

highlights and reviews such issues as energy research, fire protec-

tion, building technology, metric conversion, pollution abatement,

health and safety, and consumer product performance. In addi-

tion, it reports the results of Bureau programs in measurement

standards and techniques, properties of matter and materials,

engineering standards and services, instrumentation, and automatic data processing. Annual subscription; domestic $11; foreign $13.75.

NONPERIODICALS

Monographs—Major contributions to the technical literature on various subjects related to the Bureau's scientific and technical activities.

Handbooks—Recommended codes of engineering and industrial

practice (including safety codes) developed in cooperation with in-

terested industries, professional organizations, and regulatory

bodies.

Special Publications—Include proceedings of conferences sponsored by NBS, NBS annual reports, and other special publications appropriate to this grouping such as wall charts, pocket cards, and bibliographies.

Applied Mathematics Series—Mathematical tables, manuals, and studies of special interest to physicists, engineers, chemists, biologists, mathematicians, computer programmers, and others engaged in scientific and technical work.

National Standard Reference Data Series—Provides quantitative data on the physical and chemical properties of materials, compiled from the world's literature and critically evaluated. Developed under a worldwide program coordinated by NBS under the authority of the National Standard Data Act (Public Law 90-396).

NOTE: The principal publication outlet for the foregoing data is the Journal of Physical and Chemical Reference Data (JPCRD) published quarterly for NBS by the American Chemical Society (ACS) and the American Institute of Physics (AIP). Subscriptions, reprints, and supplements available from ACS, 1155 Sixteenth St., NW, Washington, DC 20036.

Building Science Series— Disseminates technical information

developed at the Bureau on building materials, components,

systems, and whole structures. The series presents research results,

test methods, and performance criteria related to the structural and

environmental functions and the durability and safety charac-

teristics of building elements and systems.

Technical Notes—Studies or reports which are complete in them-

selves but restrictive in their treatment of a subject. Analogous to

monographs but not so comprehensive in scope or definitive in

treatment of the subject area. Often serve as a vehicle for final

reports of work performed at NBS under the sponsorship of other

government agencies.

Voluntary Product Standards— Developed under procedures

published by the Department of Commerce in Part 10, Title 15, of

the Code of Federal Regulations. The standards establish

nationally recognized requirements for products, and provide all

concerned interests with a basis for common understanding of the

characteristics of the products. NBS administers this program as a

supplement to the activities of the private sector standardizing

organizations.

Consumer Information Series— Practical information, based on

NBS research and experience, covering areas of interest to the con-

sumer. Easily understandable language and illustrations provide

useful background knowledge for shopping in today's tech-

nological marketplace.

Order the above NBS publications from: Superintendent of Docu-

ments, Government Printing Office, Washington, DC 20402.

Order the following NBS publications—FIPS and NBSIR's—from the National Technical Information Services, Springfield, VA 22161.

Federal Information Processing Standards Publications (FIPS

PUB)— Publications in this series collectively constitute the

Federal Information Processing Standards Register. The Register

serves as the official source of information in the Federal Govern-

ment regarding standards issued by NBS pursuant to the Federal

Property and Administrative Services Act of 1949 as amended.

Public Law 89-306 (79 Stat. 1127), and as implemented by Executive Order 11717 (38 FR 12315, dated May 11, 1973) and Part 6 of Title 15 CFR (Code of Federal Regulations).

NBS Interagency Reports (NBSIR)—A special series of interim or

final reports on work performed by NBS for outside sponsors

(both government and non-government). In general, initial dis-

tribution is handled by the sponsor; public distribution is by the

National Technical Information Services, Springfield, VA 22161,

in paper copy or microfiche form.

BIBLIOGRAPHIC SUBSCRIPTION SERVICES

The following current-awareness and literature-survey bibliographies

are issued periodically by the Bureau:

Cryogenic Data Center Current Awareness Service. A literature sur-

vey issued biweekly. Annual subscription; domestic $35; foreign

$45.

Liquefied Natural Gas. A literature survey issued quarterly. Annual subscription; $30.

Superconducting Devices and Materials. A literature survey issued

quarterly. Annual subscription; $45. Please send subscription or-

ders and remittances for the preceding bibliographic services to the

National Bureau of Standards, Cryogenic Data Center (736)

Boulder, CO 80303.

U.S. DEPARTMENT OF COMMERCE
National Bureau of Standards
Washington, D.C. 20234

OFFICIAL BUSINESS

Penalty for Private Use, $300

POSTAGE AND FEES PAID
U.S. DEPARTMENT OF COMMERCE

COM-215

SPECIAL FOURTH-CLASS RATE BOOK