Understanding Software Systems Using Reverse Engineering Technology

Understanding 'Software System s

Using Reverse Engineering Technology

Perspectives from the Rigi Project t

Hausi A. Muller

Scott R. Tilley

Kenny Wong

Department of Computer Science, University of Victori aP.O. Box 3055, Victoria, BC, Canada V8W 3P6

Tel: (604) 721-7294, Fax: (604) 721-7292E-mail : {hausi, stilley, kenw}@csr .uvic .ca

Abstract

Software engineering research has focuse dmainly on software construction and has ne-glected software maintenance and evolution .Proposed is a shift in research from synthesi sto analysis . Reverse engineering is introduce das a possible solution to program understandingand software analysis . Presented is reverse en-gineering technology developed as part of theRigi project . The Rigi approach involves th eidentification of software artifacts in the subjectsystem and the aggregation of these artifacts t oform more abstract architectural models . Re-ported are some analyses on the source code ofSQL/DS, performed by the authors while visit-ing the Program Understanding project at theIBM Centre for Advanced Studies in Toronto .

Keywords : Legacy software, software evolu-tion, program understanding, reverse engineer-ing .

1 Introduction

Suppose we could turn back time to 1968—t othe first software engineering conference heldin Garmisch, Germany. This NATO confer-

t This work was supported in part by the BritishColumbia Advanced Systems Institute, IBM Canad aLtd., the IBM Centre for Advanced Studies, the IRISFederal Centres of Excellence, the Natural Sciences andEngineering Research Council of Canada, the Scienc eCouncil of British Columbia, and the University o fVictoria.

ence, which was held in response to the per-ceived software crisis, introduced the term soft-ware engineering and significantly influence dresearch and practice in the years to follow .What advice would we give to those softwar epioneers, given the software engineering back -ground and experience we have today?

These software pioneers did not anticipatethat their software, constructed in the 1960 ' sand early 1970 's, would still be used and mod-ified twenty-five years later . Today, these sys-tems are often referred to as legacy or heritagesystems . They include telephone switching sys-tems, banking systems, health information sys-tems, avionics systems, and many compute rvendor products . In switching systems, ne wfunctionality is added periodically to reflect thelatest market needs . Banks have to updatetheir systems regularly to implement new orchanged business rules and tax laws . Health in-formation systems must adapt to rapidly chang-ing technology and increased demands . Com-puter vendors are often committed to support-ing their products (for example, database man-agement systems) indefinitely, regardless of age .

Such legacy systems cannot be replacedwithout re-living their entire history . Theyembody substantial corporate knowledge suc has requirements, design decisions, and busines srules that have evolved over many years an dare difficult to obtain elsewhere . This knowl-edge constitutes significant corporate assets to-talling billions of dollars . As a result, long-termsoftware maintenance and evolution are as im-portant as software construction—especially i f

217

we consider the economic impact of these sys-tems .

2 Analysis and synthesis

Our main advice to the software pioneers o f1968 would be to carefully balance softwareanalysis and software construction efforts inboth research and education .

Over the past three decades, software engi-neering research has focused mainly on softwareconstruction and has neglected software main-tenance and evolution . For example, numeroussuccessful tools and methodologies have beendeveloped for the early phases of the softwarelife cycle, including requirement specifications ,design methodologies, programming languages ,and programming environments . As such, therehas arisen a dramatic imbalance in software en-gineering education, for both academia and in-dustry, of favoring original program, algorithm ,and data structure construction .

Computer science and computer engineer-ing programs prepare future software engineer swith a background that encourages fresh cre-ation or synthesis . Concepts such as architec-ture, consistency and completeness, efficiency ,robustness, and abstraction are usually taugh twith a bias toward synthesis, even though theseconcepts are equally applicable and relevant t oanalysis . The study of real-world software sys-tems is often overlooked . Instructors rarely pro-vide assignments that model the normal modeof operation in industry : analyzing, under-standing, and building upon existing systems .Contrast this situation with electrical or civi lengineering education, where the study of ex-isting systems and architectures constitutes amajor component of the curriculum .

Knowledge of architectural concepts in largesoftware systems is key to understanding legac ysoftware and to designing new software . Theseconcepts include: subsystem structures ; layere dstructures ; aggregation, generalization, special-ization, and inheritance hierarchies ; resource-flow graphs ; component and dependency classi-fication ; event handling strategies; pipes and fil-ters; user interface separation ; and distribute dand client-server architectures . The importanceof architecture is now recognized ; courses on the

foundations of software architecture [1] have re-cently emerged at several universities .

The bias toward synthesis has also resulte din a lack of tools for the software maintainer .To correct the imbalance, a shift in researchfrom synthesis to analysis is needed . Thiswould allow the software maintenance and evo-lution community to catch up . Once the reper-toire of tools and methodologies for analysis be-comes as refined as that for synthesis, softwar emaintenance and evolution will become mor etractable .

In 1990 the Computer Science TechnologyBoard of the U .S . proposed a research agendafor software engineering [2] . Their report con-cluded that progress in developing complexsoftware systems was hampered by differingperspectives and experiences of the researchcommunity in academia and of software en-gineering practitioners in industry . The re-port recommended nurturing a collaborationbetween academia and industry, and legitimiz-ing academic exploration of complex softwar esystems directly in government and industry .Software engineering researchers should testand validate their ideas on large, real-worldsoftware systems . An excellent example of pro-viding such arrangements in Canada is the IB MCentre for Advanced Studies (CAS) in Toronto ,where there is a Program Understanding (PU )project .

3 Program understandin gvia reverse engineerin g

Programmers have become part histo-rian, part detective, and part clairvoy-ant .

— T.A. Corbi, IBM

One of the most promising approaches to theproblem of software evolution is program un-derstanding technology. It has been estimate dthat fifty to ninety percent of evolution wor kis devoted to program comprehension or under-standing [3] . Programmers use programmingknowledge, domain knowledge, and comprehen-sion strategies when trying to understand a pro-

218

https://www.researchgate.net/publication/220069916_An_Essay_on_Software_Reuse?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

gram. For example, one might extract syntac-tic knowledge from the source code and rely o nprogramming knowledge to form semantic ab-stractions . Brooks 's early work on the theory ofdomain bridging [4, 5, 6] describes the program -ming process as one of constructing mappingsfrom a problem domain to an implementatio ndomain, possibly through multiple levels . Pro-gram understanding then involves reconstruct-ing part or all of these mappings . Moreover ,the programming process is a cognitive one in-volving the assembly of programming plans —implementation techniques that realize goals i nanother domain . Thus, program understand-ing also tries to pattern match between a set o fknown plans (or mental models) and the sourcecode of the subject software .

For large legacy systems, the manual match-ing of such plans is difficult . One way of aug-menting the program understanding process isthrough reverse engineering . Although thereare many forms of reverse engineering, the com-mon goal is to extract information from exist-ing software systems . This knowledge can the nbe used to improve subsequent development ,ease maintenance and re-engineering, and aidproject management [7] .

3.1 Reverse engineering process

The process of reverse engineering a subjec tsystem involves the identification of the sys-tem's current components and their dependen-cies followed by the extraction of system ab-stractions and design information . During thi sprocess the source code is not altered, althoughadditional information about it is generated . Incontrast, the process of re-engineering typicallyconsists of a reverse engineering phase followe dby a forward engineering or re-implementationphase that alters the subject system .

There is a large number of commercial re -verse and re-engineering tools available ; [8] list sover one hundred such packages. Most com-mercial systems focus on source-code analysi sand simple code restructuring, and use the mostcommon form of reverse engineering : informa-tion abstraction via program analysis .

3.2 Reverse engineeringapproaches

Reverse engineering consists of many divers eapproaches, including : formal transformation s[9], meaning-preserving restructuring [10], pat -tern recognition [11], function abstraction [12] ,information abstraction [13, 14], maverick iden-tification [15], graph queries [16], and reuse-oriented methods [17] .

Three specific approaches are briefly de -scribed below . Each approach is supported byone (or more) research groups in the PU projec tat CAS. One goal of this project is to exploit allthree to produce a more comprehensive reverseengineering toolset .

3.2 .1 Defect filtering

The work by Buss and Henshaw at IBM CA Sexplores design recovery and knowledge re -engineering of large legacy software, with anemphasis on software written for the SQL/DS 1product . SQL/DS is typical of many legacy sys-tems: old, highly modified, popular, success-ful, and large . Higher quality standards andincreased productivity goals motivated this ex-ploration .

Their work in the PU project is mainly con-cerned with defect filtering in SQL/DS [18] andother IBM products . They use the commer-cial Software Refinery product (REFINE) [19 ]to parse the source code into a form suitabl efor analysis . This work applies the experienceof domain experts and the results of causalanalysis to create REFINE "rules" to find cer-tain families of defects in the subject software .These defects include programming languag eviolations (overloaded keywords, poor data typ-ing), implementation domain errors (data cou-pling, addressability), and application domainerrors (coding standards, business rules) .

3.2.2 Pattern matching

Four research groups affiliated with the IB MCAS PU project focus on pattern-matching ap-proaches at various levels : textual (characters) ,lexical (token stream), syntactic (parse tree) ,

1 SQL/DS is a trademark of the International Busis-ness Machines Corporation.

219

https://www.researchgate.net/publication/3246714_Recognizing_a_program's_design_A_graph-parsing_approach?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/3246722_Using_function_abstraction_to_understand_program_behavior?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/3187223_The_C_Information_Abstraction_System?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/234791029_Discovering_visualizing_and_controlling_software_structure?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/221553954_Visualizing_and_Querying_Software_Structures?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/223007975_Towards_a_theory_of_the_cognitive_processes_in_computer_programming?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/200085840_Using_a_behavioral_theory_of_program_comprehension_in_software_engineering?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/221501068_Experiences_in_program_understanding?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/223510979_Towards_a_theory_of_computer_program_comprehension_Int_J_Man_Mach_Stud?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/3248509_TMM_Software_Maintenance_by_Transformation?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/237033641_Software_Reengineering?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/261021460_The_concept_assignment_problem_in_program_understanding?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0


https://www.researchgate.net/publication/220143940_Object-Oriented_Design_Archaeology_with_CIA?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

semantic (meaning), and structural (architec-ture) .

Johnson and Gentleman study redundancyat the textual level, with some work at the lex-ical level . A number of uses are relevant t othe SQL/DS product : looking for code reusedby cut-and-paste, building a simplified modelfor macro processing based on actual use, an dproviding overviews of information content inabsolute or relative (version or variant) terms .

Paul and Prakash use programming languag econstructs as a type of plan in the SCRUPL Esystem [20] . Instead of looking for low-level tex-tual patterns or very high-level semantic con-structs, SCRUPLE looks for code cliches. Thecliche pattern is user-defined . This approachis a natural progression from simple textualscanning techniques ; instead of using charac-ter strings as the search criteria, programming-language constructs are used .

The work by Kontogiannis concerns seman-tic or behavioral pattern-matching [21] . Atransformational approach is used to simplifysyntactic programming structures and expres-sions, such as while loops, by translating themto simpler canonical forms . A canonical for mis a semantic abstraction that improves under -standing while providing a unified representa-tion for similar constructs . These canonicalforms can reduce the number of plans that nee dto be stored .

Finally, structural patterns are investigatedas part of the Rigi2 [22] project .

3.2.3 Structural redocumentatio n

Software structure refers to a collection of arti-facts that software engineers use to form men-tal models when designing, documenting, im-plementing, or analyzing software systems . Ar-tifacts include software components such asprocedures, modules, subsystems, and inter-faces ; dependencies among components suchas supplier-client, composition, and control-flow relations; and attributes such as compo-nent type, interface size, and interconnectionstrength . The structure of a system is the orga-nization and interaction of these artifacts [23] .

2Pigi is named after a mountain in centralSwitzerland .

For a large software system, the reconstruc-tion of the structural aspects of its architectureis beneficial . This process may be termed struc-

tural redocumentation . It involves the identifi-cation of the software artifacts in the subjec tsystem and the organization of these artifact sinto more abstract models to reduce complexity .As a result, the overall structure of the subjec tsystem can be derived and some of its architec-tural design information can be recaptured .

This process is supported by Rigi, a flexibl eenvironment under development at the Univer-sity of Victoria for discovering and analyzin gthe structure of large software systems . It pro-vides the following desirable components of areverse engineering environment :

• a variety of parsers to support the com-mon programming languages of legacysoftware ; '

• a repository to store the information ex-tracted from the source code [24] ; and

• an interactive graph editor to manipulat eprogram representations .

In the Rigi approach, the first phase o fthe structural redocumentation process is au-tomatic and language-dependent . It involve sparsing the source code of the subject sys-tem and storing the extracted artifacts in arepository . The second phase involves humaninteraction and features language-independen tsubsystem composition methods that generat ebottom-up, layered hierarchies [25, 26] . Sub-system composition is the iterative process ofaggregating building blocks such as data types ,procedures, and subsystems into compositesubsystems. The process is guided by parti-tioning the resource-flow graphs of the sourc ecode via equivalence relations that embody soft -ware engineering principles concerning moduleinteractions such as low coupling and strong co-hesion [27, 28] . The Rigi project also devise dsoftware quality measures, based on exact inter-faces and established software engineering prin -ciples, to evaluate the generated subsystem hi-erarchies [29, 30, 31] .

3 Such software is typically written in procedural orimperative programming languages such as C, COBOL ,Fortran, and PL/I .

220

https://www.researchgate.net/publication/242620956_Measuring_the_quality_of_subsystem_structures?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/221501293_SCRUPLE_a_reengineer's_tool_for_source_code_search?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/3506101_Composing_subsystem_structures_using_K2-partite_graphs?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/221500969_Toward_program_representation_and_program_understanding_using_process_algebras?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/255665575_A_Mechanism_for_Specifying_the_Structure_of_Large_Layered_Systems?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/2961141_A_Guided_Tour_of_Program_Design_Methodologies?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/227837427_A_reverse-engineering_approach_to_subsystem_structure_identification?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

4 The Rigi projectThe main goal of the Rigi project is to inves-tigate frameworks and environments for pro -gram understanding, reverse engineering, an dsoftware analysis (all in-the-large) . The mos trecent results of the project include a reverseengineering environment consisting of a parsingsubsystem, a repository, and a graph editor [32] ;a reverse engineering methodology based o nsubsystem composition [25] ; a documentationstrategy using views [33, 34, 35] ; a structuredhypertext layer [36] ; and an extension mecha-nism via a scripting language [37] . These result shave been applied to several industrial softwar esystems to validate and evaluate the Rigi ap-proach to program understanding [38, 39, 40] .Early experience has shown that we can pro-duce views that are compatible with the menta lmodels used by the maintainers of the subjectsoftware . Over the past year, we analyzed th esource code of the SQL/DS system as part ofthe IBM CAS PU collaborative project .

4.1 Scalability, flexibility, an dextensibility

To compete with commercial tools and to in -spire the current state-of-practice, we need t otrain our program understanding tools an dmethodologies on large software systems . Tech-niques that work on toy projects typically d onot scale up . Our current scale objective is t oanalyze systems consisting of up to five millio nlines of code .

Because program understanding has so man ydifferent facets and applications, it is wise t omake our approach as flexible as possible for usein many different domains . Most reverse engi-neering tools provide a fixed set of extraction ,selection, filtering, organization, documenta-tion, and representation techniques . We pro -vide a scripting language that allows users t ocustomize, combine, and automate these activ-ities in novel ways .

4 .2 Involving the user

Much work on program understanding stillmakes heavy use of human cognitive abilities .There is a tradeoff between what can be au -

tomated and what should (or must) be left t ohumans . The best solution seems to lie in acombination of the two. Rigi depends heavilyon the experience and domain knowledge of th esoftware engineer using it ; the user makes allthe important decisions . Nevertheless, the pro-cess is one of synergy as the user also learns an ddiscovers interesting relationships by exploringsoftware systems with the Rigi environment .

4.3 Summarizing softwarestructure

Subsystem composition is the methodologyused in Rigi for generating layered hierarchiesof subsystems, thereby reducing the cognitivecomplexity of understanding large software sys-tems. The Rigi environment supports a parti-tioning of the resource-flow graph based on es-tablished software engineering principles . How-ever, because the user is in charge, the partitio ncan easily be based on other criteria, such a sbusiness rules, tax laws, message paths, or othe rsemantic or domain information . Moreover, al-ternate decompositions may co-exist under thesoftware structure representation supported b yRigi .

4 .4 Documenting with view s

Software engineers rely heavily on internal doc-umentation to help understand programs . Un-fortunately, this documentation is typically out -of-date and software engineers end up refer-ring to the source code . The Rigi environ-ment eases the task of redocumenting the sub-ject software by presenting the results using in-teractive views . A view is a bundle of visualand textual frames that contain, for example ,call graphs, overviews, projections, exact inter -faces, and annotations. A view is a dynamicsnapshot that reflects the current reverse engi-neering state . As such, a view remains up-to-date . Views can accurately capture co-existingarchitectural decompositions, providing man ydifferent perspectives for later inspection .

5 Analyzing SQL/D SIn 1992/93, the first author spent a ten-mont hsabbatical at IBM CAS . The other two authors

221

https://www.researchgate.net/publication/3593628_Domain-retargetable_reverse_engineering?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

https://www.researchgate.net/publication/238727797_Applying_software_re-engineering_techniques_to_health_information_systems?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0


https://www.researchgate.net/publication/3722789_Using_virtual_subsystems_in_project_management?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0

joined him for four months and together we an-alyzed the source code of SQL/DS using th eRigi environment .

5.1 Scaling up

SQL/DS contains about three million lines o fPL/AS source code, excluding comments an dblank lines . PL/AS is an internal IBM pro-gramming language—a mix of PL/I and assem-bly language . When we started, the Rigi systemdid not include a PL/AS parser . Moreover, wehad never analyzed a system over 120 thousandlines of code . Thus, this analysis presented areal challenge and an excellent test of whethe rour methodology and environment would scaleup to the million line range .

Many commercial reverse engineering tool sstore entire parse trees in their repositories . Fora multi-million line program, this can requireseveral hundred megabytes of storage . Whilethis level of detail may be necessary for task ssuch as data and control-flow analyses or cod eoptimization and generation, it is not necessar yfor understanding the architecture . For pro-gram understanding, it is important to buil dabstractions that emphasize important themesand suppress irrelevant details ; deciding whatto include and what to ignore is still an art .

The Rigi parsing subsystem can extract a va-riety of software artifacts at various levels of de -tail . Thus, we can reduce the repository size fo ra multi-million line program significantly, mak-ing a major difference when retrieving data in-teractively . For example, the Rigi database fo rSQL/DS is under two megabytes .

Displaying graphs of twenty to fifty thou -sand vertices and their edges for a multi-millionline program is another problem of scale . Forsmaller graphs, it is feasible to update a win-dow after every event or command . This strat-egy fails for very large graphs on current displa ytechnology. We needed to tune the user inter -face, redesigning it to allow the user to batchsequences of operations and to specify when t oupdate a window .

5.2 Identifying a target audienc e

After tuning the Rigi environment to handl emulti-million line programs, we had to identify

a target audience for our program understand-ing experiments . Both the CAS PU groupsand the SQL/DS development and manage-ment teams were extremely helpful .

On our first try, we summarized the entirecall graph of SQL/DS without considering anydomain knowledge . The result was not encour-aging because the developers did not recogniz ethe structures we generated, making it difficul tto give constructive feedback . On our secondtry, we considered the naming conventions use dby the developers . This time, they readily rec-ognized our subsystem decomposition .

We then focused on one large subsystem ofSQL/DS, the relational data system (RDS) ,which contains about one million lines of sourc ecode . With help from a domain expert, RD Swas further decomposed into four main subsys-tems: (1) run-time access generator ; (2) opti-mizer pass one and two; (3) optimizer path se-lection; and (4) executive, interpreter, and au-thorization . Four distinct development team swere in charge of these subsystems .

5.3 Developer feedback

For the individual subsystems, we proceeded toanalyze their call graphs, summarizing them ina set of views depicting different architectura lperspectives . We then presented these views t othe development teams with a series of carefullydesigned one-hour demonstrations .

A demonstration consisted of four phases :(1) highlighting the main features of the Rig iuser interface ; (2) exhibiting the structura lviews of the subsystems pertinent to the partic-ular development group ; (3) allowing the audi-ence to interact with their software structuresusing views as starting points ; and (4) allow-ing individual developers to create new viewson the fly to reflect and record specific domai nknowledge .

While our prepared views did not uncove rthe exact mental model of each developer, th eaudience readily recognized the presented struc-tures . There were two main reasons for this .First, these developers knew their subsystem sintimately. Second, and more importantly, th eviews represented the right level of abstrac-tion. Most satisfying for us was when the in-dividual developers used their specific domai n

222

knowledge to design additional views to reflec ttheir mental model more closely . This was usu-ally done by emphasizing important compo-nents and filtering irrelevant information . In-variably, after the demonstrations, the develop-ers came back to try out and document addi-tional domain knowledge and perspectives .

6 Summary

There will always be old software that need sto be understood . It is critical for the soft -ware industry to deal effectively with the prob-lems of software evolution and the understand-ing of legacy software systems. Since the pri-mary focus of the industry is changing fromcompletely new software construction to soft-ware maintenance and evolution, software en-gineering research and education must makesome major adjustments . In particular, moreresources should be devoted to software analy-sis in balance with software construction .

With the focus changing to software evolu-tion, program understanding tools and method-ologies that effectively aid software engineers i nunderstanding large and complex software sys-tems can have a significant impact . This is crit-ical for keeping up with the varied demands ofthe information industry .

The Rigi environment focuses on the archi-tectural aspects of the subject software underanalysis . The environment provides many way sto identify, explore, summarize, evaluate, andrepresent software structures . More specifically ,it supports a reverse engineering methodologyfor identifying, building, and documenting lay-ered subsystem hierarchies. Critical to the us-ability of the Rigi system is the ability to storeand retrieve views—snapshots of reverse engi-neering states . The views are used to transferinformation about the abstractions to the soft -ware engineers .

While the Rigi system is now primarily a re-verse engineering environment, it was originall yconceived as a design tool . In fact, the syste mcan be used for both reverse and forward engi-neering, helping to complete the cycle of soft-ware evolution .

6.1 Future work

We are currently designing and developing amore ambitious reverse engineering environ-ment based on seven years of experience gainedwith the Rigi project . This new environment i ssupported by an NSERC CRD (Collaborativ eResearch and Development) grant and involvesthree universities (McGill University, the Uni-versity of Toronto, and the University of Vic-toria) with IBM as the industrial partner . Col-laboration is the main theme as the universi-ties walk a fine line between pure research an ddeveloping an integrated system for addressin greal-world program understanding problems i nindustry .

McGill University will extend the structura lpattern matching capabilities of the Rigi sys-tem to support syntactic, semantic, functional ,and behavioral search patterns . The Universit yof Toronto will deliver a more flexible and pow -erful repository for storing software artifacts ,pattern matching rules, and software engineer-ing knowledge. The University of Victoria wil lmake the Rigi system more flexible, scalable ,extensible, by developing a scripting languagefor users to design their own high-level opera-tions by composing and tailoring existing oper-ations. IBM will provide an industrial perspec-tive on architectural issues of the new environ-ment, characterize real-world re-engineering ap -plications, and contribute legacy code for test-ing the new environment . The results of theproject will be disseminated in papers, demon-strations, and tutorials .

We see re-engineering as the next logical ste pafter reverse engineering . We are seeking t oapply the results of the NSERC CRD work t oother large legacy systems .

About the authorsDr. Hausi A. Muller is an Associate Pro-fessor of Computer Science at the Univer-sity of Victoria, where he has been sinc e1986 . From 1979 to 1982 he worked as asoftware engineer for Brown Boveri & Cie i nBaden, Switzerland (now called ASEA BrownBoveri) . In 1992/93 he was on sabbati-cal at the IBM Centre for Advanced Stud-ies in Toronto working with the program un -

223

derstanding group . His research interests in-clude software engineering, software analysis ,program understanding, reverse engineering ,re-engineering, programming-in-the-large, soft -ware metrics, and computational geometry . HisInternet address is hausi@csr .uvic .ca.

Scott R. Tilley is currently on leave fromIBM Canada Ltd ., pursuing a Ph .D. in theDepartment of Computer Science at the Uni-versity of Victoria . His field of researchis software engineering in general, and pro -gram understanding, software maintenance ,and reverse engineering in particular . Hecan be reached at the University of Victo-ria, or at the IBM PRGS Toronto Laboratory ,844 Don Mills Rd., 22/121/844/TOR, Nort hYork, ON, Canada M3C 1V7 . His e-mail ad-dresses are stilley@csr .uvic . ca at UVic, andstilley@vnet .ibm .com or TOROLAB6(TILLEY )

(VNet) at IBM .Kenny Wong is a Ph .D. student in the De-

partment of Computer Science at the Universityof Victoria . His research interests include pro -gram understanding, user interfaces, and soft -ware design . He is a member of the ACM ,USENIX, and the Planetary Society. His In-ternet address is kenw@sanjuan . uvic . ca .

References

[ 1 ] D. E . Perry and A . L . Wolf . Foundations fo rthe study of software architecture . ACM SIG -SOFT Software Engineering Notes, 17(4) :40-52, October 1992 .

[2] CSTB. Scaling up : A research agenda forsoftware engineering . Communications of th eACM, 33(9) :281-293, March 1990 .

[3] T. A. Standish . An essay on software reuse .IEEE Transactions on Software Engineering,SE-10(5):494-497, September 1984 .

[4] R. Brooks. Towards a theory of the cogni-tive processes in computer programming . In-ternational Journal of Man-Machine Studies ,9:737-751, 1977 .

[5] R. Brooks. Using a behavioral theory of pro -gram comprehension in software engineering .In ICSE '3: Proceedings of the 3rd Interna-tional Conference on Software Engineering,(Atlanta, Georgia ; May 10-12, 1978), page s196-201, May 1978 .

[6] R. Brooks. Towards a theory of the compre-hension of computer programs . International

Journal of Man-Machine Studies, 18:543-554 ,1983 .

[7] R. Arnold . Software Reengineering. IEEEComputer Society Press, 1993 .

[8] C. Sittenauer, M. Olsem, and D. Murdock .Re-engineering tools report . Technical Repor tSTSC-Rev B, Software Technology Suppor tCenter ; Hill Air Force Base, July 1992 .

[9 ] G. Arango, I . Baxter, P . Freeman, and C . Pid-geon. TMM: Software maintenance by trans -formation . IEEE Software, pages 27-39, May1986 .

[10] W. G. Griswold . Program Restructuring a san Aid to Software Maintenance . PhD thesis ,University of Washington, 1991 .

[11] C. Rich and L. M . Wills . Recoginizing aprogram 's design: A graph-parsing approach .IEEE Software, 7(1):82-89, January 1990 .

[12] P. A. Hausler, M . G. Pleszkoch, R . C. Linger ,and A. R. Hevner . Using function abstractionto understand program behavior . IEEE Soft-ware, 7(1) :55-63, January 1990 .

[13] Y. Chen, M. Nishimoto, and C . Ramamoor-thy. The C information abstraction system .IEEE Transactions on Software Engineering,16(3):325-334, March 1990 .

[14] J . E. Grass . Object-oriented design archaeol-ogy with CIA++. Computing Systems, 5(1) :5-67, Winter 1992 .

[15] R. Schwanke, R . Altucher, and M. Platoff. Dis-covering, visualizing, and controlling softwar estructure . ACM SIGSOFT Software Engineer-ing Notes, 14(3) :147-150, May 1989 . Proceed-ings of the Fifth International Workshop o nSoftware Specification and Design .

[16] M. Consens, A . Mendelzon, and A . Ryman.Visualizing and querying software structures .In ICSE '14 : Proceedings of the 14th Inter-national Conference on Software Engineer-ing, (Melbourne, Australia; May 11-15, 1992) ,pages 138-156, May 1992 .

[17] T. J. Biggerstaff, B . G. Mitbander, andD . Webster . The concept assignment prob-lem in program undertsanding. In WCRE '93 :Proceedings of the 1993 Working Conferenc eon Reverse Engineering, (Baltimore, Mary-land; May 21-23, 1993), pages 27-43. IEEEComputer Society Press (Order Number 3780-02), November 1992 .

[18] E. Buss and J . Henshaw . Experiences in pro-gram understanding . Technical Report TR-74.105, IBM Canada Ltd . Centre for AdvancedStudies, July 1992 .

224






























































[19] G. Kotik and L . Markosian . Program transfor-mation : The key to automating software main-tenance and re-engineering. Technical report ,Reasoning Systems, Inc ., 1991 .

[20] S. Paul . SCRUPLE: A reengineer's tool fo rsource code search . In CASCON'92 : Proceed-ings of the 1992 CAS Conference, (Toronto ,Ontario ; November 9-12, 1992), pages 329 —345 . IBM Canada Ltd ., November, 1992 .

[21] K. Kontogiannis . Toward program represen-tation and program understanding using pro-cess algebras . In CASCON'92 : Proceedingsof the 1992 CAS Conference, (Toronto, On-tario ; November 9-12, 1992), pages 299—317 .IBM Canada Ltd ., November 1992 .

[22] H. A. Muller . Rigi — A Model for Software Sys -tern Construction, Integration, and Evolutio nbased on Module Interface Specifications . PhDthesis, Rice University, August 1986 .

[23] H . L. Ossher . A mechanism for specifyin gthe structure of large, layered systems . InB. D. Shriver and P. Wegner, editors, Re-search Directions in Object-Oriented Program-ming, pages 219—252. MIT Press, 1987 .

[24] N . Kiesel, A. Schurr, and B. Westfechtel .GRAS: A graph-oriented database systemfor (software) engineering applications . InCASE '93: The Sixth International Confer-ence on Computer-Aided Software Engineer-ing, (Institute of Systems Science, Nationa lUniversity of Singapore, Singapore ; July 19 -23, 1993), pages 272—286, July 1993 . IEEEComputer Society Press (Order Number 3480-02) .

[25] H. Muller and J . Uhl . Composing subsyste mstructures using (k,2)-partite graphs . In Pro-ceedings of the Conference on Software Main-tenance 1990, (San Diego, California ; Novem-ber 26-29, 1990), pages 12—19, November 1990 .IEEE Computer Society Press (Order Number2091) .

[26] H. A. Muller, M . A. Orgun, S . R. Tilley, andJ . S . Uhl . A reverse engineering approach t osubsystem structure identification . Journal ofSoftware Maintenance: Research and Practice ,1993 . In press .

[27] G. D. Bergland . A guided tour of programdesign methodologies . Computer, 14(10) :18 —37, October 1981 .

[28] G. Myers . Reliable software through compositedesign . Petrocelli/Charter, 1975 .

[29] H. Muller . Verifying software quality criteri ausing an interactive graph editor. In Proceed-ings of the Eighth Annual Pacific Northwes tSoftware Quality Conference, (Portland, Ore-gon ; October 29-31, 1990), pages 228—241, Oc-tober 1990 . ACM Order Number 613920 .

[30] H. A. Muller and B. D . Corrie . Measuringthe quality of subsystem structures . Techni-cal Report DCS-193-IR, University of Victo-ria, November 1991 .

[31] M . A. Orgun, H . A. Muller, and S. R. Tilley .Discovering and evaluating subsystem struc-tures . Technical Report DCS-194-IR, Univer-sity of Victoria, April 1992 .

[32] H. Muller, S . Tilley, M . Orgun, B . Corrie, andN. Madhavji . A reverse engineering environ-ment based on spatial and visual software in-terconnection models . In SIGSOFT '92: Pro-ceedings of the Fifth ACM SIGSOFT Sympo-sium on Software Development Environments ,(Tyson's Corner, Virginia; December 9-11 ,1992), pages 88—98, December 1992 . In ACMSoftware Engineering Notes, 17(5) .

[33] S. R. Tilley. Documenting-in-the-large vs .documenting-in-the-small . In the proceedingsof CASCON '93, (Toronto, Ontario ; October25-28, 1993), June 1993 . To appear .

[34] K. Wong. Managing views in a program un-derstanding tool. In the proceedings of CAS -CON '93, (Toronto, Ontario ; October 25-28 ,1993), June 1993 . To appear .

[35] S . R. Tilley, H . A. Muller, and M . A. Orgun .Documenting software systems with views . InProceedings of SIGDOC '92 : The 10th Inter-national Conference on Systems Documenta-tion, (Ottawa, Ontario ; October 13-16, 1992) ,pages 211—219, October 1992 . ACM OrderNumber 613920 .

[36] S. R. Tilley, M. J . Whitney, H. A. Muller ,and M.-A . D . Storey. Personalized informatio nstructures . To appear in the Proceedings ofSIGDOC ' 93: The 11th Annual Internationa lConference on Systems Documentation, (Wa-terloo, Ontario ; October 5-8, 1993), October1993 .

[37] S. R. Tilley, H . A. Muller, M. J . Whitney,and K. Wong. Domain-retargetable revers eengineering . To appear in the Proceedings o fCSM '93: The 1993 International Conferenceon Software Maintenance, (Montreal, Quebec ;September 27-30, 1993), April 1993 .

[38] H. A. Muller, J . R . Mohr, and J . G. McDaniel .Applying software re-engineering techniques t o

225























https://www.researchgate.net/publication/238743655_Program_transformation_the_key_to_automating_software_maintenance_and_re-engineering_ieee?el=1_x_8&enrichId=rgreq-54f605afe8c8676d1e7d2a869e9b33ae-XXX&enrichSource=Y292ZXJQYWdlOzIyMTUwMDk3MDtBUzo5NzAzNzc2MDU5ODAyNEAxNDAwMTQ3MDEzNzE0























health information systems . In T. Timmersand B. Blums, editors, Software Engineerin gin Medical Informatics, pages 91-110 . Elsevie rNorth Holland, 1991 .

[39] S. R. Tilley. Management decision supportthrough reverse engineering technology . InProceedings of CASCON '92, (Toronto, On-tario ; November 9-11, 1992), pages 319-328 ,November, 1992 .

[40] S . R. Tilley and H . A. Muller . Using vir-tual subsystems in project management . InCASE '93: The Sixth International Confer-ence on Computer-Aided Software Engineer-ing, (Institute of Systems Science, NationalUniversity of Singapore, Singapore; July 19-23, 1993), pages 144-153, July 1993 . IEEEComputer Society Press (Order Number 3480-02) .

226










Understanding Software Systems Using Reverse Engineering Technology

Documents

Transcript of Understanding Software Systems Using Reverse Engineering Technology