Decompiler

A decompiler is a computer program that performs the reverse operation of a compiler. That is, it translates a file containing information at a relatively low level of abstraction (usually designed to be computer readable rather than human readable) into a form with a higher level of abstraction (usually designed to be human readable).

Introduction

The term "decompiler" is most commonly applied to a program that translates executable programs (the output of a compiler) into source code in a relatively high level language. When compiled, that source code should produce an executable whose behavior is the same as that of the original executable program. By comparison, a disassembler translates an executable program into assembly language, and an assembler could be used to assemble it back into an executable program.


Decompilation is the act of using a decompiler, although the term can also refer to the decompiled output. It can be used for the recovery of lost source code, and is also useful in some cases for computer security, interoperability and error correction.[1] The success of decompilation depends on the amount of information present in the code being decompiled and the sophistication of the analysis performed on it. The bytecode formats used by many virtual machines (such as the Java Virtual Machine) often include extensive metadata and high-level features that make decompilation quite feasible. Machine language typically has much less metadata, and is therefore much harder to decompile.


Some compilers and post-compilation tools produce obfuscated code (that is, they attempt to produce output that is very difficult to decompile). This is done to make it more difficult to reverse engineer the executable.


Phases

Decompilers can be thought of as composed of a series of phases each of which contributes specific aspects of the overall decompilation process.


Loader

The first decompilation phase is the loader, which parses the input machine code program's binary file format. The loader should be able to discover basic facts about the input program, such as the architecture (Pentium, PowerPC, etc), and the entry point. In many cases, it should be able to find the equivalent of the main function of a C program, which is the start of the user written code. This excludes the runtime initialisation code, which should not be decompiled if possible.
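As a rough illustration of the "basic facts" a loader discovers, the following sketch reads the architecture and entry point from the header of an ELF file. ELF is only one of many binary file formats and is chosen here as an assumption; the function name `load_elf_header` and the small `MACHINES` table are hypothetical. The sketch does not attempt to locate the main function.

```python
import struct

# Illustrative subset of ELF e_machine codes; a real loader knows many more.
MACHINES = {3: "x86", 20: "PowerPC", 62: "x86-64"}

def load_elf_header(data):
    """Return the word size, architecture and entry point of an ELF image."""
    if len(data) < 24 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])        # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])    # EI_DATA: little or big endian
    if bits is None or endian is None:
        raise ValueError("unsupported ELF class or byte order")
    # e_type and e_machine are 16-bit fields at offsets 16 and 18.
    _e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry follows the 4-byte e_version field, so it starts at offset 24.
    entry_fmt = endian + ("I" if bits == 32 else "Q")
    if len(data) < 24 + struct.calcsize(entry_fmt):
        raise ValueError("truncated ELF header")
    (entry,) = struct.unpack_from(entry_fmt, data, 24)
    return {
        "bits": bits,
        "machine": MACHINES.get(e_machine, "unknown (%d)" % e_machine),
        "entry": entry,
    }
```

In practice the function would be given the first bytes of the file, for example `load_elf_header(open(path, "rb").read(64))`, where `path` names the executable to inspect.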


Disassembly

The next logical phase is the disassembly of machine code instructions into a machine independent intermediate representation (IR). For example, the Pentium machine instruction

 mov eax, [ebx+0x04] 

might be translated to the IR

 eax := m[ebx+4]; 
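The translation above can be sketched in a few lines. This is a simplification under stated assumptions: a real decompiler decodes binary instruction encodings rather than assembly text, and the helper names `operand_to_ir` and `mov_to_ir` are hypothetical. Only the `mov` instruction and operands of the form `[reg+offset]` are handled.

```python
import re

# Matches memory operands such as [ebx+0x04] or [ebx-8].
_MEM = re.compile(r"\[(\w+)\s*([+-])\s*(0x[0-9a-fA-F]+|\d+)\]")

def operand_to_ir(op):
    """Rewrite [reg+off] as m[reg+off] with a decimal offset; keep other operands."""
    op = op.strip()
    match = _MEM.fullmatch(op)
    if match:
        base, sign, off = match.groups()
        value = int(off, 16) if off.lower().startswith("0x") else int(off)
        return "m[%s%s%d]" % (base, sign, value)
    if op.startswith("[") and op.endswith("]"):
        return "m[%s]" % op[1:-1].strip()
    return op

def mov_to_ir(instr):
    """Translate a textual 'mov dst, src' into the IR assignment 'dst := src;'."""
    mnemonic, _, rest = instr.strip().partition(" ")
    if mnemonic.lower() != "mov" or "," not in rest:
        raise ValueError("only two-operand mov is handled in this sketch")
    dst, src = rest.split(",", 1)
    return "%s := %s;" % (operand_to_ir(dst), operand_to_ir(src))
```

Calling `mov_to_ir("mov eax, [ebx+0x04]")` yields the IR statement shown above.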

Idioms

Idiomatic machine code sequences are sequences of code whose combined semantics is not immediately apparent from the instructions' individual semantics. Either as part of the disassembly phase, or as part of later analyses, these idiomatic sequences need to be translated into known equivalent IR. For example, the x86 assembly code:

 cdq             ; edx is set to the sign-extension of eax
 xor eax, edx
 sub eax, edx

could be translated to

 eax := abs(eax); 

Some idiomatic sequences are machine independent; some involve only one instruction. For example, xor eax, eax clears the eax register (sets it to zero). This can be implemented with a machine independent simplification rule, such as a xor a = 0.
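A machine independent simplification rule of this kind might be sketched as a rewrite over expression trees. The tuple representation `(operator, left, right)` and the function name `simplify` are assumptions made for illustration; only the rule a xor a = 0 from the text is implemented, and further rules would be added in the same way.

```python
def simplify(expr):
    """Apply simplification rules bottom-up to an expression tree.

    Expressions are tuples (operator, operand, ...) with register names
    or integer constants at the leaves.
    """
    if not isinstance(expr, tuple):
        return expr                      # leaf: register name or constant
    op = expr[0]
    args = [simplify(arg) for arg in expr[1:]]
    if op == "xor" and len(args) == 2 and args[0] == args[1]:
        return 0                         # a xor a = 0
    return (op,) + tuple(args)
```

For example, `simplify(("xor", "eax", "eax"))` gives `0`, so the IR for `xor eax, eax` becomes the assignment eax := 0.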


In general, it is best to delay detection of idiomatic sequences, if possible, to later stages that are less affected by instruction ordering. For example, the instruction scheduling phase of a compiler may insert other instructions into an idiomatic sequence, or change the ordering of instructions in the sequence. A pattern matching process in the disassembly phase would probably not recognize the altered pattern. Later phases group instruction expressions into more complex expressions, and modify them into a canonical (standardized) form, making it more likely that even the altered idiom will match a higher level pattern later in the decompilation.


Program analysis

Various program analyses can be applied to the IR. In particular, expression propagation combines the semantics of several instructions into more complex expressions. For example,

 mov eax,[ebx+0x04]
 add eax,[ebx+0x08]
 sub [ebx+0x0C],eax

could result in the following IR after expression propagation:

 m[ebx+12] := m[ebx+12] - (m[ebx+4] + m[ebx+8]); 

The resulting expression is more like a high level language statement, and it no longer uses the machine register eax. Later analyses may eliminate the ebx register.
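Expression propagation over a straight-line block can be sketched as forward substitution followed by removal of dead register assignments. The tuple-based IR, the function names, and the `live_out` parameter (the registers assumed to be needed after the block) are all assumptions for illustration. The sketch is deliberately conservative: it treats every memory store as possibly overwriting any memory location.

```python
def substitute(expr, defs):
    """Replace register leaves that have a known defining expression."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, defs) for a in expr[1:])
    return defs.get(expr, expr)

def leaves(expr):
    """Yield the register names and constants at the leaves of a tree."""
    if isinstance(expr, tuple):
        for arg in expr[1:]:             # expr[0] is the operator
            yield from leaves(arg)
    else:
        yield expr

def reads_memory(expr):
    return isinstance(expr, tuple) and (
        expr[0] == "m" or any(reads_memory(a) for a in expr[1:]))

def propagate(stmts):
    """Forward-substitute register definitions into later statements."""
    defs, out = {}, []
    for dst, expr in stmts:
        expr = substitute(expr, defs)
        if isinstance(dst, tuple):       # store: dst is ("m", address)
            dst = substitute(dst, defs)
            # Definitions that read memory may be stale after the store.
            defs = {r: e for r, e in defs.items() if not reads_memory(e)}
        else:                            # register assignment
            defs = {r: e for r, e in defs.items()
                    if r != dst and dst not in leaves(e)}
            if dst not in leaves(expr):
                defs[dst] = expr
        out.append((dst, expr))
    return out

def drop_dead(stmts, live_out):
    """Remove assignments to registers that are never used afterwards."""
    live, kept = set(live_out), []
    for dst, expr in reversed(stmts):
        used = list(leaves(expr))
        if isinstance(dst, tuple):
            used += list(leaves(dst))    # registers in the store address
        else:
            if dst not in live:
                continue                 # value is never read: drop it
            live.discard(dst)
        live.update(x for x in used if isinstance(x, str))
        kept.append((dst, expr))
    kept.reverse()
    return kept
```

Applied to the three instructions above, with only ebx assumed live afterwards, `drop_dead(propagate(stmts), {"ebx"})` leaves a single store whose right-hand side is the combined expression.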


Type analysis

A good machine code decompiler will perform type analysis. Here, the way registers or memory locations are used results in constraints on the possible type of the location. For example, an and instruction implies that the operand is an integer; programs do not use such an operation on floating point values (except in special library code) or on pointers. An add instruction results in three constraints, since the operands may be both integer, or one integer and one pointer (with integer and pointer results respectively; the third constraint comes from the ordering of the two operands when the types are different).
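The three constraints produced by an add instruction can be written down as a small table and filtered against what is already known about each location. This is a minimal sketch, not a full type inference algorithm; the names `ADD_RULES` and `add_typings` and the two-element type vocabulary ("int", "ptr") are assumptions for illustration.

```python
# Possible typings of `result = a + b`, as (type of a, type of b, type of result).
ADD_RULES = [
    ("int", "int", "int"),
    ("ptr", "int", "ptr"),
    ("int", "ptr", "ptr"),
]

def add_typings(a_types, b_types, result_types):
    """Return the add typings still compatible with the candidate type sets."""
    return [
        rule for rule in ADD_RULES
        if rule[0] in a_types and rule[1] in b_types and rule[2] in result_types
    ]
```

If other instructions have already narrowed the candidates, for instance because the result is later dereferenced and the second operand is known to be an integer, then `add_typings({"int", "ptr"}, {"int"}, {"ptr"})` leaves only the typing in which the first operand is a pointer.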


Various high level expressions can be recognized which trigger recognition of structures or arrays. However, it is difficult to distinguish many of the possibilities, because of the freedom that machine code or even some high level languages such as C allow with casts and pointer arithmetic.


The example from the previous section could result in the following high level code:

 struct T1* ebx;
 struct T1 {
     int v0004;
     int v0008;
     int v000C;
 };
 ebx->v000C -= ebx->v0004 + ebx->v0008;

Structuring

The penultimate decompilation phase involves structuring of the IR into higher level constructs such as while loops and if/then/else conditional statements. For example, the machine code

     xor eax, eax
 l0002:
     or  ebx, ebx
     jge l0003
     add eax,[ebx]
     mov ebx,[ebx+0x4]
     jmp l0002
 l0003:
     mov [0x10040000],eax

could be translated into:

 eax = 0;
 while (ebx < 0) {
     eax += ebx->v0000;
     ebx = ebx->v0004;
 }
 v10040000 = eax;

Unstructured code is more difficult to translate into structured code than already structured code. Solutions include replicating some code, or adding boolean variables. See chapter 6 of [2].
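One common starting point for structuring is to find loops in the control flow graph by looking for back edges. The sketch below is a simplified illustration and not the algorithm of any particular decompiler; the block names B0 to B3, the dictionary-based graph, and the function names are assumptions. It classifies a loop as pre-tested ("while") when the loop header itself can leave the loop.

```python
def find_back_edges(cfg, entry):
    """Return edges (tail, header) whose target is still on the DFS stack."""
    back_edges, on_stack, visited = [], set(), set()

    def visit(node):
        visited.add(node)
        on_stack.add(node)
        for succ in cfg[node]:
            if succ in on_stack:
                back_edges.append((node, succ))
            elif succ not in visited:
                visit(succ)
        on_stack.discard(node)

    visit(entry)
    return back_edges

def natural_loop(cfg, tail, header):
    """Collect the header plus all blocks that reach tail without passing it."""
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)
    body, work = {header, tail}, [tail]
    while work:
        node = work.pop()
        if node == header:
            continue                     # do not walk out of the loop
        for pred in preds[node]:
            if pred not in body:
                body.add(pred)
                work.append(pred)
    return body

def classify_loop(cfg, tail, header):
    """Guess the high level loop form from where the loop can be exited."""
    body = natural_loop(cfg, tail, header)
    header_exits = [s for s in cfg[header] if s not in body]
    tail_exits = [s for s in cfg[tail] if s not in body]
    if header_exits and header != tail:
        return "while"                   # condition tested before the body
    if tail_exits:
        return "do-while"                # condition tested after the body
    return "endless"

# Control flow graph of the machine code example, one entry per basic block:
# B0 = xor, B1 = l0002 test and jge, B2 = loop body and jmp, B3 = l0003.
EXAMPLE_CFG = {"B0": ["B1"], "B1": ["B3", "B2"], "B2": ["B1"], "B3": []}
```

For the example graph, `find_back_edges(EXAMPLE_CFG, "B0")` reports the single back edge from B2 to B1, and `classify_loop(EXAMPLE_CFG, "B2", "B1")` returns `"while"`, matching the translation given earlier.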


Code generation

The final phase is the generation of the high level code in the back end of the decompiler. Just as a compiler may have several back ends for generating machine code for different architectures, a decompiler may have several back ends for generating high level code in different high level languages.
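The idea of interchangeable back ends can be sketched with one generator that walks the structured IR and a table of templates per target language. Everything here is hypothetical: the statement forms `("assign", ...)` and `("while", ...)`, the two template tables, and the small sample program are invented for illustration, and a real back end must also handle types, declarations, and operator differences between languages.

```python
# Per-language templates (hypothetical; only assignments and while loops).
C_BACKEND = {
    "assign": "{dst} = {src};",
    "while_open": "while {cond} {{",
    "while_close": "}",
}
PASCAL_BACKEND = {
    "assign": "{dst} := {src};",
    "while_open": "while {cond} do begin",
    "while_close": "end;",
}

def expr_str(expr):
    """Render a binary expression tree with explicit parentheses."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return "(%s %s %s)" % (expr_str(left), op, expr_str(right))
    return str(expr)

def generate(stmts, backend, depth=0):
    """Emit source lines for structured IR using the given back end."""
    pad = "    " * depth
    lines = []
    for stmt in stmts:
        if stmt[0] == "assign":
            _, dst, src = stmt
            lines.append(pad + backend["assign"].format(dst=dst, src=expr_str(src)))
        elif stmt[0] == "while":
            _, cond, body = stmt
            lines.append(pad + backend["while_open"].format(cond=expr_str(cond)))
            lines.extend(generate(body, backend, depth + 1))
            lines.append(pad + backend["while_close"])
        else:
            raise ValueError("unknown statement kind: %r" % (stmt[0],))
    return lines

# A small invented program used only to exercise the generator.
SAMPLE = [
    ("assign", "eax", 0),
    ("while", ("<", "ebx", 0), [
        ("assign", "eax", ("+", "eax", "ebx")),
        ("assign", "ebx", ("+", "ebx", 1)),
    ]),
]
```

Printing `"\n".join(generate(SAMPLE, C_BACKEND))` and then the same call with `PASCAL_BACKEND` shows the same IR rendered in two syntaxes, which is the sense in which the decompiler's back end is replaceable.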


Just before code generation, it may be desirable to allow interactive editing of the IR, perhaps using some form of graphical user interface. This would allow the user to enter comments, and non-generic variable and function names. However, these are almost as easily entered in a post-decompilation edit. The user may want to change structural aspects, such as converting a while loop to a for loop. These are less readily modified with a simple text editor, although source code refactoring tools may assist with this process. The user may need to enter information that could not be identified during the type analysis phase, e.g. modifying a memory expression to an array or structure expression. Finally, incorrect IR may need to be corrected, or changes made to make the output code more readable.


Legality

The majority of computer programs are covered by copyright laws. Although the precise scope of what is covered by copyright differs from region to region, copyright law generally provides the author (the programmer(s) or employer) with a collection of exclusive rights to the program. These rights include the right to make copies, including copies made into the computer's RAM. Since the decompilation process involves making multiple such copies, it is generally prohibited without the authorization of the copyright holder. However, because decompilation is often a necessary step in achieving software interoperability, copyright laws in both the United States and Europe permit decompilation to a limited extent.


In the United States, the copyright fair use defense has been successfully invoked in decompilation cases. For example, in Sega v. Accolade, the court held that Accolade could lawfully engage in decompilation in order to circumvent the software locking mechanism used by Sega's game consoles.[3]


In Europe, the 1991 Software Directive explicitly provides for a right to decompile in order to achieve interoperability. The result of a heated debate between, on the one side, software protectionists, and, on the other, academics as well as independent software developers, Article 6 permits decompilation only if a number of conditions are met:

  • First, the decompiler must have a license to use the program to be decompiled.
  • Second, decompilation must be necessary to achieve interoperability with the target program or other programs. Interoperability information must therefore not already be readily available, such as through manuals or API documentation. The necessity must be proven by the decompiler. The purpose of this important limitation is primarily to provide an incentive for developers to document and disclose their products' interoperability information. See [4].
  • Third, the decompilation process must, if possible, be confined to the parts of the target program relevant to interoperability. Since one of the purposes of decompilation is to gain an understanding of the program structure, this third limitation may be difficult to meet. Again, the burden of proof is on the decompiler.

In addition, Article 6 prescribes that the information obtained through decompilation may not be used for other purposes and that it may not be given to others.


Overall, the decompilation right provided by Article 6 is interesting, as it codifies what is claimed to be common practice in the software industry. Few European lawsuits are known to have emerged from the decompilation right. This could be interpreted in one of two ways: 1) the decompilation right is not used frequently and may therefore have been unnecessary, or 2) the decompilation right functions well and provides sufficient legal certainty not to give rise to legal disputes. In a recent report regarding implementation of the Software Directive by the European member states, the European Commission seems to support the second interpretation.


References

  1. "Why Decompilation"
  2. C. Cifuentes. Reverse Compilation Techniques. PhD thesis, Queensland University of Technology, 1994. (available as compressed postscript)
  3. The Legality of Decompilation
  4. B. Czarnota and R.J. Hart. Legal protection of computer programs in Europe: a guide to the EC directive. London: Butterworths, 1991.


External links


General information

  • The DeCompilation Wiki discusses various aspects of decompilation: history, research, decompilers for machine code, Java, Visual Basic, and so on.
  • Legality of Decompilation, part of the above Wiki, discusses legal aspects of decompilation.
  • A detailed article on various aspects of decompilation, including how to decompile an executable by hand.


Decompilers

Java

  • jdec - an open source Java decompiler, currently hosted mainly on SourceForge. Besides decompiling and disassembling, it provides detailed information about a Java class file, supports jar decompilation, and comes with a Swing UI.
  • Jad - the fast JAva Decompiler - Jad is a 100% pure C++ program and claims to be several times faster than decompilers written in Java. Since version 1.5.6 it is no longer free for commercial use, but is still free for non-commercial use. Several GUIs exist for Jad, e.g. Jadclipse, a plugin for Eclipse.


.NET

  • Dis# - .NET decompiler which allows you to edit local variables and other names in the decompiled code and keep the changes in a project file.
  • jsc - .NET decompiler which allows you to write in C#, but produce JavaScript, PHP or Java instead.

Machine code


The Wikipedia article included on this page is licensed under the GFDL.