FACTOID # 155: Australia has more than 28 times the land area of New Zealand, but its coastline is not even twice as long.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Compiler theory
A diagram of the operation of an ideal compiler.

A compiler is a computer program that translates a computer program written in one computer language (called the source language) into an equivalent program written in another computer language (called the output, object, or target language). Download high resolution version (582x842, 93 KB) Wikipedia does not have an article with this exact name. ... Download high resolution version (582x842, 93 KB) Wikipedia does not have an article with this exact name. ... A computer program (often simply called a program) is an example of computer software that prescribes the actions (computations) that are to be carried out by a computer. ... A computer language is a language used by, or in association with, computers. ...

Contents

Introduction and history

Most compilers translate source code written in a high level language to object code or machine language that may be directly executed by a computer or a virtual machine. However, translation from a low level language to a high level one is also possible; this is normally known as a decompiler if it is reconstructing a high level language program which (could have) generated the low level language program. Compilers also exist which translate from one high level language to another (cross compilers), or sometimes to an intermediate language that still needs further processing; these are sometimes known as cascaders. Source code (commonly just source or code) is any series of statements written in some human-readable computer programming language. ... A high-level programming language is a programming language that is more user-friendly, to some extent platform-independent, and abstract from low-level computer processor operations such as memory accesses. ... In computer science, object file or object code is an intermediate representation of code generated by a compiler after it processes a source code file. ... A system of codes directly understandable by a computers CPU is termed this CPUs native or machine language. ... In general terms, a virtual machine in computer science is software that creates an environment between the computer platform and the end user in which the end user can operate software. ... A decompiler is a computer program that translates executable programs (the output from a compiler) into a high level language ( source code). ... In computing a cascader is a type of compiler in which the target language is not directly machine executable, but needs further processing, normally by a second compiler. ...


Typical compilers output so-called objects that basically contain machine code augmented by information about the name and location of entry points and external calls (to functions not contained in the object). A set of object files, which need not have all come from a single compiler provided that the compilers used share a common output format, may then be linked together to create the final executable which can be run directly by a user. In computer science, object file or object code is an intermediate representation of code generated by a compiler after it processes a source code file. ... A system of codes directly understandable by a computers CPU is termed this CPUs native or machine language. ... A linker or link editor is a program that takes one or more objects generated by compilers and assembles them into a single executable program. ...


Several experimental compilers were developed in the 1950s, but the FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. [1] (http://www.interesting-people.org/archives/interesting-people/199706/msg00011.html) Events and trends Technology United States tests the first fusion bomb. ... Fortran (also FORTRAN) is a statically typed, compiled, programming language originally developed in the 1950s and still heavily used for scientific computing and numerical computation half a century later. ... John Backus (born December 3, 1924) is an American computer scientist, notable as the inventor of the first high-level programming language (FORTRAN), the Backus-Naur form (BNF, the almost universally used notation to define formal language syntax), and the concept of Function-level programming. ... International Business Machines Corporation (IBM, or colloquially, Big Blue) (NYSE: IBM) (incorporated June 15, 1911, in operation since 1888) is headquartered in Armonk, New York, USA. The company manufactures and sells computer hardware, software, and services. ... 1957 was a common year starting on Tuesday (link will take you to calendar). ... COBOL is a third-generation programming language. ... 1960 was a leap year starting on Friday (link will take you to calendar). ...


The idea of compilation quickly caught on, and most of the principles of compiler design were developed during the 1960s. Centuries: 19th century - 20th century - 21st century Decades: 1900s 1910s 1920s 1930s 1940s 1950s - 1960s - 1970s 1980s 1990s 2000s 2010s Years: 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 Events and trends The 1960s was a turbulent decade of change around the world. ...


A compiler is itself a computer program written in some implementation language. Early compilers were written in assembly language. The first self-hosting compiler -- capable of compiling its own source code in a high-level language -- was created for Lisp by Hart and Levin at MIT in 1962. [2] (http://www.ai.mit.edu/research/publications/browse/0000browse.shtml) The use of high-level languages for writing compilers gained added impetus in the early 1970s when Pascal and C compilers were written in their own languages. Building a self-hosting compiler is a bootstrapping problem -- the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter. This article is about a computing term. ... Lisp is a family of functional programming languages with a long history. ... MIT redirects here. ... 1962 was a common year starting on Monday (link will take you to calendar). ... Events and trends Although in the United States and in many other Western societies the 1970s are often seen as a period of transition between the turbulent 1960s and the more conservative 1980s and 1990s, many of the trends that are associated widely with the Sixties, from the Sexual Revolution... Bootstrapping alludes to a German legend about a Baron Münchhausen, who was able to lift himself out of a swamp by pulling himself up by his own hair. ...


During the 1980s and 1990s a large number of free compilers and compiler development tools were developed for all kinds of languages, both as part of the GNU project and other open-source initiatives. Some of them are considered to be of high quality and their free source code makes a nice read for anyone interested in modern compiler concepts. Events and trends The 1980s marked an abrupt shift towards more conservative lifestyles after the momentous cultural revolutions which took place in the 1960s and 1970s and the definition of the AIDS virus in 1981. ... Events and trends Technology Explosive growth of the Internet; decrease in the cost of computers and other technology Reduction in size and cost of mobile phones leads to a massive surge in their popularity Year 2000 problem (commonly known as Y2K) Microsoft Windows operating system becomes virtually ubiquitous on IBM... A compiler-compiler or parser generator is a utility for generating the source code of a parser, interpreter or compiler from an annotated language description in the form of a grammar (usually in BNF) plus code that is associated with each of the rules of the grammar that should be... The GNU Compiler Collection (usually shortened to GCC) is a set of programming language compilers produced by the GNU Project. ... Open source refers to a basis case where sources of information, code, pictures, maps, authors, anything likewise, and everything related are all publicly viewable and openly modifiable. ...


Types of compilers

A compiler may produce code intended to run on the same type of computer and operating system ("platform") as the compiler itself runs on. This is sometimes called a native-code compiler. Alternatively, it might produce code designed to run on a different platform. This is known as a cross compiler. Cross compilers are very useful when bringing up a new hardware platform for the first time (see bootstrapping). A "source to source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements). The word platform is used in several different contexts including various topics: In rail transport, a railway platform is an area at a train station to alight from/embark on trains or trams. ... A cross compiler is a compiler capable of creating executable code for another platform than the one on which the cross compiler is run. ... Bootstrapping alludes to a German legend about a Baron Münchhausen, who was able to lift himself out of a swamp by pulling himself up by his own hair. ... The OpenMP Application Program Interface (API) supports multi-platform shared memory multiprocessing programming for in C/C++ and Fortran on many architectures, including Unix platforms and Windows platforms. ...

  • One-pass compiler, like early compilers for Pascal
    • The compilation is done in one pass, hence it is very fast.
  • Threaded code compiler (or interpreter), like most implementations of FORTH
    • This kind of compiler can be thought of as a database lookup program. It just replaces given strings in the source with given binary code. The level of this binary code can vary; in fact, some FORTH compilers can compile programs that don't even need an operating system.
  • Incremental compiler, like many Lisp systems
    • Individual functions can be compiled in a run-time environment that also includes interpreted functions. Incremental compilation dates back to 1962 and the first Lisp compiler, and is still used in Common Lisp systems.
  • Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog implementations
    • This Prolog machine is also known as the Warren abstract machine (or WAM). Byte-code compilers for Java, Python (and many more) are also a subtype of this.
  • Just-in-time compiler, used by Smalltalk and Java systems
    • Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.
  • A retargetable compiler is a compiler that can relatively easily be modified to generate code for different CPU architectures. The object code produced by these is frequently of lesser quality than that produced by a compiler developed specifically for a processor. Retargetable compilers are often also cross compilers. GCC is an example of a retargetable compiler.
  • A parallelizing compiler converts a serial input program into a form suitable for efficient execution on a parallel computer architecture.

Pascal is one of the landmark computer programming languages on which generations of students cut their teeth and variants of which are still widely used today. ... For multi-threaded programming, see Thread (computer science) In computer science, the term threaded code refers to an implementation technique for programming languages that produces very compact code at the expense of some execution speed. ... Forth is a programming language and programming environment. ... A compiler is a computer program that translates a computer program written in one computer language (called the source language) into an equivalent program written in another computer language (called the output or the target language). ... Common Lisp, commonly abbreviated CL (not to be confused with Combinatory logic which is also abbreviated CL), is a dialect of Lisp, standardised by ANSI X3. ... See also Just in time for the business technique In computing, just-in-time compilation (JIT), also known as dynamic translation, is a technique for improving the performance of interpreted programs. ... Byte-code is a sort of intermediate code that is more abstract than machine code. ... A retargetable compiler is a compiler that can relatively easily be modified to generate code for different CPU architectures. ... The central processing unit (CPU) is the part of a computer that interprets and carries out the instructions contained in the software. ... The GNU Compiler Collection (usually shortened to GCC) is a set of programming language compilers produced by the GNU Project. ... Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain faster results. ...

Compiled vs. interpreted languages

Many people divide higher-level programming languages into compiled languages and interpreted languages. However, there is rarely anything about a language that requires it to be compiled or interpreted. Compilers and interpreters are implementations of languages, not languages themselves. The categorization usually reflects the most popular or widespread implementations of a language -- for instance, BASIC is thought of as an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters. There are exceptions, however; some language specifications assume the use of a compiler (as with C), or spell out that implementations must include a compilation facility (as with Common Lisp); on the other hand, some languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder (one such example is the capability of executing arbitrary source code contained in a run-time supplied string). A compiled language is a vague term referring to a language whose implementations are typically compilers (translators which generate machine code from source code), and not interpreters (step-by-step executors of source code, where no translation takes place). ... In computer programming, an interpreted language is a programming language whose programs may be executed from source form, by an interpreter. ...


Compiler design

In the past, compilers were divided into many passes1 to save space. A pass in this context is a run of the compiler through the source code of the program to be compiled, resulting in the building up of the internal data of the compiler (such as the evolving symbol table and other assisting data). When each pass is finished, the compiler can free the internal data space needed during that pass. This 'multipass' method of compiling was the common compiler technology at the time, but was also due to the small main memories of host computers relative to the source code and data. A diagram of the operation of an ideal compiler. ...


Many modern compilers share a common 'two stage' design. The front end translates the source language into an intermediate representation. The second stage is the back end, which works with the internal representation to produce code in the output language. The front end and back end may operate as separate passes, or the front end may call the back end as a subroutine, passing it the intermediate representation. In their most general meanings, the terms front end and back end refer to the initial and the end stages of a process flow. ... In their most general meanings, the terms front end and back end refer to the initial and the end stages of a process flow. ... In mathematics, a group representation is a way of viewing a group in some more concrete way. ... In computer science, a subroutine (function, procedure, or subprogram) is a sequence of code which performs a specific task, as part of a larger program, and is grouped as one, or more, statement blocks; such code is sometimes collected into software libraries. ...


This approach mitigates complexity separating the concerns of the front end, which typically revolve around language semantics, error checking, and the like, from the concerns of the back end, which concentrates on producing output that is both efficient and correct. It also has the advantage of allowing the use of a single back end for multiple source languages, and similarly allows the use of different back ends for different targets.


Often, optimizers and error checkers can be shared by both front ends and back ends if they are designed to operate on the intermediate language that a front-end passes to a back end. This can let many compilers (combinations of front and back ends) reuse the large amounts of work that often go into code analyzers and optimizers.


Certain languages, due to the design of the language and certain rules placed on the declaration of variables and other objects used, and the predeclaration of executable procedures prior to reference or use, are capable of being compiled in a single pass. The Pascal programming language is well known for this capability, and in fact many Pascal compilers are themselves written in the Pascal language because of the rigid specification of the language and the capability to use a single pass to compile Pascal language programs. Pascal is one of the landmark computer programming languages on which generations of students cut their teeth and variants of which are still widely used today. ...


Compiler front end

The compiler front end consists of multiple phases itself, each informed by formal language theory: In mathematics, logic and computer science, a formal language is a set of finite-length words (i. ...

  1. Lexical analysis - breaking the source code text into small pieces ('tokens' or 'terminals'), each representing a single atomic unit of the language, for instance a keyword, identifier or symbol names. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning.
  2. Syntax analysis - Identifying syntactic structures of source code. It only focuses on the structure. In other words, it identifies the order of tokens and understand hierarchical structures in code. This phase is also called parsing.
  3. Semantic analysis is to recognize the meaning of program code and start to prepare for output. In that phase, type checking is done and most of compiler errors show up.
  4. Intermediate language generation - an equivalent to the original program is created in an intermediate language.

Lexical analysis is the process of taking an input string of characters (such as the source code of a computer program) and producing a sequence of symbols called lexical tokens, or just tokens, which may be handled more easily by a parser. ... In computer science, a keyword is an identifier which indicates a specific command. ... Identifiers (IDs) are used in computer science, data processing, and general telecommunications; the concept is analogous to that of a name. Computer Science In computer science, an identifier is a string of bits (or characters) which name an entity, such as a program, device, or system; in order that other... A symbol, in its basic sense, is a representational token for a concept or quantity; i. ... A regular language is a formal language (i. ... In the theory of computation, a finite state machine (FSM) or finite state automaton (FSA) is an abstract machine that has only a finite, constant amount of memory. ... A regular expression (abbreviated as regexp, regex or regxp) is a string that describes or matches a set of strings, according to certain syntax rules. ... Syntax analysis is a process on compilers that recognizes the structure of programming languages. ... In computer science, semantic analysis is a pass by a compiler that adds semantical information to the parse tree and performs certain checks based on this information. ... In computer science, the code generation is a compilation stage that outputs machine code in the target language. ...

Compiler back end

While there are applications where only the compiler front end is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the front end to the back end, which produces a functional equivalent program in the output language. This is done in multiple steps:

  1. Compiler Analysis - This is the process to gather program information from the intermediate representation of the input source files. Typical analysis are variable define-use and use-define chain, data dependence analysis, alias analysis etc. Accurate analysis is the base for any compiler optimizations. The call graph and control flow graph are usually also built during the analysis phase.
  2. Optimization - the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation or even auto parallelization.
  3. Code generation - the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also Sethi-Ullman algorithm).

Use-define chains model the relation among definitions of variables and their uses in a sequence of assignments within computer programming. ... A control flow graph (CFG) is an abstract data structure used in compilers. ... Compiler optimization techniques are optimization techniques that have been programmed into a compiler. ... In-line expansion or inlining for short is a compiler optimization which expands a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. ... Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ... In computer science constant propagation (cprop) is an optimization performed by compilers. ... In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. ... In computer science, code generation is the process by which a compiler converts a syntactically_correct program into a series of instructions that could be executed by a machine. ... A system of codes directly understandable by a computers CPU is termed this CPUs native or machine language. ... Historically, a register was a sign or chalkboard onto which people would write cash transactions for later bookkeeping, often with chalk. ... The terms storage ( U.K.) or memory ( U.S.) refer to the parts of a digital computer that retain physical state ( data) for some interval of time, possibly even after electrical power to the computer is turned off. ... A system of codes directly understandable by a computers CPU is termed this CPUs native or machine language. ... In computer programming, addressing modes are primarily of interest to compiler writers and to those (few nowadays) who use assembly language. ... When generating code for arithmetic expressions, the compiler has to decide which is the best way to translate the expression in terms of number of instructions used as well as number of registers needed to evaluate a certain subtree (especially if free registers are scarce). ...

Notes

  1. A pass has also been known as a parse in some textbooks. The idea is that the source code is parsed by gradual, iterative refinement to produce the completely translated object code at the end of the process. There is, however, some dispute over the general use of parse for all those phases (passes), since some of them, e.g. object code generation, are arguably not regarded to be parsing as such.

References

  • Understanding and Writing Compilers: A Do It Yourself Guide (ISBN 0333217322) by Richard Bornat is an unusually helpful book, being one of the few that adequately explains the recursive generation of machine instructions from a parse-tree. Having learnt his subject in the early days of mainframes and minicomputers, the author has many useful insights that more recent books often fail to convey.

The book cover showing a knight and dragon Compilers: Principles, Techniques and Tools is a famous computer science textbook by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman about compiler construction. ... Dr. Alfred V. Aho is a computer scientist. ... Ravi Sethi is a computer scientist retired from Bell Labs and now president of Avaya Labs. ... Jeffrey D. Ullman (born November 22, 1942) is a renowned computer scientist. ...

See also

This is a list of important publications in computer science, organized by field. ... An alternate rewrite has been has been proposed. ... This article is about a computing term. ... An interpreter is a computer program that executes other programs. ... Abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially execution of a computer program which gains information about its semantics (e. ... A linker or link editor is a program that takes one or more objects generated by compilers and assembles them into a single executable program. ... In grammar and linguistics, parsing is the process by which a person makes sense of a sentence, usually by breaking it down into words or phrases. ... A compiler parses input from a programming language to assembly language or an internal representation by matching the incoming symbols to Backus-Naur form production rules. ... This article needs cleanup. ... In computer science, semantic analysis is a pass by a compiler that adds semantical information to the parse tree and performs certain checks based on this information. ... When constructing a language translation tool, such as a compiler, an attribute grammar is the formal expression of the syntax-derived semantic checks associated with a grammar. ... A semantics encoding is a translation between formal languages. ... In computer science, an error avalanche is the mechanism by which a small problem can become compounded until it eventually becomes a very large one. ... A decompiler is a computer program that translates executable programs (the output from a compiler) into a high level language ( source code). ... See also Just in time for the business technique In computing, just-in-time compilation (JIT), also known as dynamic translation, is a technique for improving the performance of interpreted programs. ... Loop nest optimization (LNO) is a compiler optimization that makes possible large reductions in the cache bandwidth necessary for some pervasive algorithms. ... A preprocessor is a program that takes text and performs lexical conversions on it. ... Parallel Compilers Parallel compilers are used to speed up the process of compilation on Multi-processor machines. ...

External links

  • What is "compile"? (http://codepedia.com/compile) from the developer's encyclopedia
  • Building and Testing gcc/glibc cross toolchains (http://www.kegel.com/crosstool/)
  • Citations from CiteSeer (http://citeseer.org/cs?q=compiler)
  • The comp.compilers newsgroup and RSS feed (http://compilers.iecc.com/index.html)
  • Let's Build a Compiler by Jack Crenshaw (1988 to 1995) (http://compilers.iecc.com/crenshaw/) "a non-technical introduction to compiler construction"
  • Simple compiler source (http://www.gtoal.com/software/CompilersOneOhOne) from the "Compilers 101 (http://groups.yahoo.com/group/compilers101/)" group. One page, easy to follow.
  • Parallel Compilers (http://www.tutorial-reports.com/computer-science/parallel-compiler/)

  Results from FactBites:
 
Compiler - Wikipedia, the free encyclopedia (3021 words)
Building a self-hosting compiler is a bootstrapping problem -- the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter.
Compiler construction and compiler optimization are taught at universities as part of the computer science curriculum.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers are generally faster than multi-pass compilers.
CS 447 / CS 545 - Compiler Theory: (277 words)
However, the ideas and techniques of compiler theory apply to many problems in software engineering and in other areas.
For example, the string matching techniques for building the lexical analyzer of a compiler are also used by text editors or pattern matching programs.
The techniques from graph theory for performing code optimization, such as coloring, are used to organize schedules.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.