Abstract:
Generic language technology and compiler construction techniques are a prerequisite to build analysis and conversion tools that are needed for the re-engineering of large software systems. We argue that generic language technology is a crucial means to do fundamental re-engineering. Furthermore, we address the issue that the application of compiler construction techniques in re-engineering generates new research questions in the field of compiler construction.
1 Introduction
In 1977, Mathew Hecht wrote in his book [Hec77] on flow analysis of computer programs ``Flow analysis can be used to derive information of use to human beings about a computer program", in fact he was referring to what we nowadays call program understanding or reverse engineering. He further motivated the use of flow analysis by stating that ``some automatic program restructuring may be possible" and that ``perhaps remodularization could be accomodated", techniques that are relevant to restructure and remodularize legacy systems. So, it comes hardly as a surprise that we will argue here that classical compiler construction techniques are extremely useful to aid in re-engineering.
Re-engineering is becoming more and more important. There is a constant need for updating and renovating business-critical software systems for many and diverse reasons: business requirements change, technological infrastructure is modernized, governments change laws, or the third millennium approaches, to mention a few. So, in the area of software engineering the subjects of program understanding and system renovation become more and more important. The interest in such subjects originates from the difficulties that one encounters when attempting to maintain large, old, software systems. It is not hard to understand that it is very difficult--if not impossible--to renovate such legacy systems.
The purpose of this paper is to show that a substantial part of the technology used in re-engineering often originates from these fields. We want to make researchers in the field of compiler construction and generic language technology aware of the application of their techniques in the field of re-engineering. Furthermore, we will identify topics for further research that are particularly relevant for re-engineering.
In [BKV96b] generic language technology is used as a core technology for re-engineering. For more information on the subject of re-engineering we refer to the annotated bibliographies [Arn93] and [BKV96a].
2 Reverse Engineering and System Renovation Terminology
The term reverse engineering finds its origins in hardware technology and denotes the process of obtaining the specification of complex hardware systems. Now the meaning of this notion has shifted to software. As far as we know there is not (yet) a standard definition of what reverse engineering is but in [CC90] we can read:
``Reverse engineering is the process of analyzing a subject system to identify the system's components and their inter-relationships, and to create representations of the system in another form at higher levels of abstraction.''
According to [CC90] the following six terms characterize system renovation:
Forward engineering.
Reverse engineering.
Redocumentation.
Design recovery.
Restructuring.
Re-engineering (or renovation).
Forward engineering moves from a high-level abstraction and design to a low-level implementation. Reverse engineering can be seen as the inverse process. It can be characterized as analysing a software system in order to, firstly, identify the system components and their interactions, and to, secondly, make representations of the system on a different, possible higher, level of abstraction. This can be seen as a form of decompilation. It may be necessary to move even from assembler (or from the executables) level to a higher level.
Reverse engineering restricts itself to investigating a system. Adaptation of a system is beyond reverse engineering but within the scope of system renovation. Redocumentation focuses on making a semantically equivalent description at the same level of abstraction. It is in fact a simple form of reverse engineering. Tools for redocumentation include, among others, pretty printers, diagram generators, and cross-reference listing generators. In design recovery domain knowledge and external information is used to make an equivalent description of a system at a higher level of abstraction. So, more information than the source code of the system is used. The notion restructuring amounts to transforming a system from one representation to another one at the same level of abstraction. An essential aspect of restructuring is that the semantic behaviour of the original system and the new one should remain the same; no modifications of the functionality is involved. The purpose of re-engineering or renovation is to study the system, by making a specification at a higher abstraction level, adding new functionality to this specification and develop a completely new system on the basis of the original one by using forward engineering techniques.
3 Specific Languages and Re-engineering
In this section we will give the reader an impression on the relation between specific programming languages and re-engineering.
Since many business-critial systems that need re-engineering are written in COBOL there are quite a number of papers available that discuss methods and techniques that focus on COBOL. For instance, in [GBW95] the re-engineering of 50,000 lines of COBOL to Ada is described. The goal was to do it as automatically as possible using compiler construction techniques. An example of the use of program slicing to aid in re-engineering COBOL can be found in [NEK93] and [CFV93]. In [NM93] Software Refinery, a tool originally developed primarily to generate programming environments, is used to build a modularization tool for COBOL. In [Zuy93] aspects of re-engineering and their relation to language technology are discussed. We mention the compilation of COBOL code to equational specifications, their restructuring and simplification, and regeneration of COBOL code from them. Moreover COBOL is compiled to an intermediate language supporting both all the features of COBOL as well as those of JCL. Various tools support these compilations. In [BFK 94] and [BFKO93] denotational semantics is advocated as a formal foundation for understanding COBOL programs. These ideas are implemented in a tool for the reverse engineering of COBOL-74 programs.
Not only COBOL is subject to reverse engineering. We mention [OC93] in which a tool combining static and dynamic information for analyzing C programs is described. In [CCC93] a method is described to produce design level documents by static analysis of Ada programs using data flow analysis. Finally, in [Byr91] we can find a method to convert Fortran programs into Ada code. This is done via analysis of the Fortran code followed by a reimplemention of the extracted design in Ada. Needless to say that in the above cases generic language technology and compiler construction techniques play an important role.
4 Compiler Construction Techniques and Re-engineering
Many re-engineering tools use compiler construction techniques. When constructing a compiler these techniques are used to go from a high-level language to a low-level implementation. When re-engineering a legacy system those techniques are used to move from a low-level implementation to a more abstract level. In compiler const