TITLE Symbolic Interpretation of Legacy Assembly Language


Symbolic Interpretation of Legacy Assembly Language

By

Pulak Kumar Chowdhury, BSc. Engg.

A Thesis

Submitted to the School of Graduate Studies in Partial Fulfilment of the Requirements for the Degree of

Master of Applied Science

Department of Computing and Software

McMaster University

© Copyright by Pulak Kumar Chowdhury, August 18, 2005


Abstract

Many industries have legacy software systems that are important to them but difficult to maintain because the systems are poorly understood, usually as a result of inadequate or inconsistent documentation. Although the cost of redesigning such a system may be large, some organizations still plan to reverse engineer the software specification documents from the code to ease the burden of such an endeavour. This thesis provides an incremental and modular approach to creating a process and tools that extract the semantics of legacy assembly code.

Our techniques combine static analysis and symbolic interpretation to reverse engineer the semantics of legacy software. We examine the case of IBM-1800 programs in detail. From the abstract model of the operational semantics of the IBM-1800, we simultaneously obtain an emulator and a symbolic analysis process. Augmented with control flow information, the symbolic analysis provides complete semantics for the code sequences of interest. We can also generate Data Flow Graphs to depict the flow of data in those code segments. The whole process of extracting semantic information from the assembler code is automated, requiring only a little human intervention at the initial step.

We use Haskell as our implementation language, and its features help us to create modular and well-structured software. The literate programming documentation style used in this thesis increases the readability and consistency of the implementation's documentation.

The process and the associated tools created in this thesis are used in a large reverse engineering project whose goal is to extract requirements specifications from legacy assembly code. This project is funded jointly by Ontario Power Generation (OPG) and CITO (Communications and Information Technology Ontario).


Acknowledgements

This thesis would not have been possible without the support of many people. Many thanks to my supervisor, Jacques Carette, who guided me through the whole research and read and corrected my numerous revisions. Thanks also to my committee members, Wolfram Kahl and Alan Wassyng, who always offered guidance and support. Thanks to my group members, whose valuable suggestions helped me a lot. A special thank you goes to my fellow student Olivier Dragon for proofreading and correcting important parts of my thesis. And finally, thanks to my parents and numerous friends who endured this long process with me, always offering support and love.


Contents

Abstract
Acknowledgements
Contents
List of Figures

1 Introduction
1.1 Overview
1.2 Thesis Organization

2 Problem Definition
2.1 Background
2.1.1 Legacy Systems
2.1.2 Ontario Power Generation
2.2 Reverse Engineering Project
2.2.1 Overview
2.2.2 Tool Hierarchy
2.3 Semantic Analysis
2.3.1 IBM-1800 Assembly Language

3 Tools and Techniques
3.1 Graphs
3.1.1 Control Flow Graph
3.1.2 Data Flow Graph


List of Figures

2.1 Tool Suite Architecture of the Reverse Engineering Project
3.1 Control Flow Graph
3.2 Data Flow Graph
4.1 The Steps of Symbolic Interpretation Process
8.1 Pictorial Representation of Code Categories
8.2 Shape of GSC
8.3 Shape of Looping Codes
8.4 Control Flow Graph of the Segment 0x35C4-0x35DF
8.5 Control Flow Graph of the Segment 0x35C9-0x35D3
9.1 DFG Generation Process
9.2 Data Flow Graph of the Segment 0x35B6-0x35BD (Before Garbage Collection)
9.3 Data Flow Graph of the Segment 0x35B6-0x35BD (After Garbage Collection)
9.4 Data Flow Graph of the Segment 0x35C4-0x35DF
11.1 Finding Preconditions


Chapter 1

Introduction

1.1 Overview

Business organizations spend a large part of their effort and budget maintaining existing software, enhancing it with new features, and adapting it to newer environments. Studies show that the maintenance of existing software can often cost more than 60 percent of the total development effort. Maintenance in the software life cycle is inevitable for reasons such as the removal of errors, new requirements for the software, or the introduction of new platforms. Maintenance can be defined as the set of activities that occur after the software has been deployed [CG03]. Developing new software from scratch whenever new requirements arise is grossly impractical, as companies make large investments in developing existing software, in creating infrastructure and organizational practices around it, and in training users. Existing software applications are thus assets to these organizations and need to be well maintained until they are finally abandoned.

Nevertheless, since these systems were developed decades ago, they are usually written in older languages and follow older software engineering methodologies. Legacy software is therefore difficult to modify and maintain. Still, the need for change is obvious, as these legacy systems consume too much of the maintenance budget and effort. Moreover, these systems are becoming less efficient than systems developed with the more sophisticated technology available today. Most software engineering approaches focus mainly on forward engineering, that is, on the software


Chapter 2

Problem Definition

In this chapter, we give an overview of the problem addressed in this thesis. First, we provide the background of the problem in the context of legacy systems, together with a specific instance of legacy systems at Ontario Power Generation (OPG). Next, we include a brief introduction to the reverse engineering project (of which this thesis is a part) and the hierarchical structure of the project. Later, we present a brief description of the subject matter of this thesis.

2.1 Background

For the last 20 years, computer technologies have been booming like never before. New technologies are introduced very frequently. Often, a software system developed with one technology becomes inefficient within a short span of time because newer, more efficient technologies are introduced. Constant technological advance often weakens the business value of systems that have been developed over the years through huge investments. It is also important to note that hardware technologies advance much faster than software. For this reason, many software systems cannot benefit from newer hardware, as they are implemented to take full advantage of the hardware architecture they were written for. Although more cost-effective technologies are available, it is estimated that most IT systems are running on legacy platforms. Maintaining and upgrading those systems are some of the most difficult challenges today. It is worthwhile to change those systems into newer

