FACTOID # 64: Sri Lanka has lowest divorce rate in the world - and the highest rate of female suicide.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Compiler optimization

Compiler optimization is the process of tuning the output of a compiler to minimize some attribute (or maximize the efficiency) of an executable program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied, and the growth of portable computers has created a market for minimizing the power consumed by a program. This article is about the computing term. ... A computer program is a collection of instructions that describe a task, or set of tasks, to be carried out by a computer. ... To meet Wikipedias quality standards, this article or section may require cleanup. ... A Portable computer is a computer that is designed to be moved from one place to another (in other words, it is a computer that is portable). ...


It has been shown that some code optimization problems are NP-complete. In practice factors such as programmer willingness to wait for the compiler to complete its task place upper limits on the optimizations that a compiler implementor might provide (optimization is a very CPU and memory intensive process). In the past computer memory limitations were also a major factor in limiting which optimizations could be performed. In complexity theory, the NP-complete problems are the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. The reason is that if you could find a way to solve an NP-complete problem quickly, then you could use... A programmer or software developer is someone who programs computers, that is, one who writes computer software. ... CPU can stand for: in computing: Central processing unit in journalism: Commonwealth Press Union in law enforcement: Crime prevention unit in software: Critical patch update, a type of software patch distributed by Oracle Corporation in Macleans College is often known as Ash Lim. ...


Compiler vendors often advertise their products as being optimizing compilers and the ability of a compiler to optimize code can affect its sales and its reputation among programmers.

Contents

Types of optimizations

Techniques in optimization can be broken up among various scopes which affect anything from a single statement to an entire program. Generally locally scoped techniques are easier to implement than global ones but result in lesser gains. Some examples of scopes include:

  • Peephole optimizations: Usually performed late in the compilation process after machine code has been generated. This form of optimization examines a few adjacent instructions (like looking through a peephole at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by two might be more efficiently executed by shifting the value left or by adding the value to itself (this example is also an instance of strength reduction).
  • Local or intraprocedural optimizations: These only consider information local to a function definition. This reduces the amount of analysis that needs to be performed (saving time and reducing storage requirements) but means that worst case assumptions have to be made when function calls occur or global variables are accessed (because little information about them is available).
  • Interprocedural or whole-program optimization: These analyze all of a program's source code. The greater quantity of information extracted means that optimizations can be more effective compared to when they only have access to local information (i.e., within a single function). This kind of optimization can also allow new techniques to be performed. For instance function inlining, where a call to a function is replaced by a copy of the function body.
  • Loop optimizations: These act on the statements which make up a loop, such as a for loop (eg, loop-invariant code motion). Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.

In addition to scoped optimizations there are two further general categories of optimization: In compiler theory, peephole optimization is a kind of optimization performed over a very small set of instructions in a segment of generated code. ... Machine code or machine language is a system of instructions and data directly understandable by a computers central processing unit. ... Strength reduction is a software optimisation where a function of some systematically changing variable is calculated more efficiently by using previous values of the function. ... In-line expansion or inlining for short is a compiler optimization which expands a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. ... Most execution time of a scientific program is spent on loops. ... Loop-invariant code in an imperative programming language consists of statements which could be moved to before the loop (if the loop always terminates), or after the loop, without affecting the semantics of the program. ...

  • Programming language-independent vs. language-dependent: Most high-level languages share common programming constructs and abstractions — decision (if, switch, case), looping (for, while, repeat.. until, do.. while), encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However certain language features make some kinds of optimizations possible and/or difficult. For instance, the existence of pointers in C and C++ makes certain optimizations of array accesses difficult. Conversely, in some languages functions are not permitted to have "side effects". Therefore, if repeated calls to the same function with the same arguments are made, the compiler can immediately infer that results need only be computed once and the result referred to repeatedly.
  • Machine independent vs. machine dependent: Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler. But many of the most effective optimizations are those that best exploit special features of the target platform.

The following is an instance of a local machine dependent optimization. To set a register to 0, the obvious way is to use the constant 0 in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register". (The catch being that XOR may introduce a data dependency on the previous value of the register, causing a pipeline stall. However, processors often have XOR of a register with itself as a special case that doesn't cause stalls.) A programming language is an artificial language that can be used to control the behavior of a machine, particularly a computer. ... C is a general-purpose, procedural, imperative computer programming language developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system. ... C++ (pronounced see plus plus, IPA: ) is a general-purpose, high-level programming language with low-level facilities. ... Exclusive disjunction (usual symbol xor) is a logical operator that results in true if one of the operands (not both) is true. ... Reduced Instruction Set Computer (RISC), is a microprocessor CPU design philosophy that favors a smaller and simpler set of instructions that all take about the same amount of time to execute. ... A microprocessor is a programmable digital electronic component that incorporates the functions of a central processing unit (CPU) on a single semiconducting integrated circuit (IC). ... Intel Corporation (NASDAQ: INTC, SEHK: 4335), founded in 1968 as Integrated Electronics Corporation, is an American multinational corporation that is best known for designing and manufacturing microprocessors and specialized integrated circuits. ... x86 or 80x86 is the generic name of a microprocessor architecture first developed and manufactured by Intel. ...


Factors affecting optimization

The machine itself

Many of the choices about which optimizations can and should be done depend on the characteristics of the target machine. It is sometimes possible to parameterize some of these machine dependent factors, so that a single piece of compiler code can be used to optimize different machines just by altering the machine description parameters. GCC is a compiler which exemplifies this approach. The GNU Compiler Collection (usually shortened to GCC) is a set of programming language compilers produced by the GNU Project. ...

The architecture of the target CPU
  • Number of CPU registers: To a certain extent, the more registers, the easier it is to optimize for performance. Local variables can be allocated in the registers and not on the stack. Temporary/intermediate results can be left in registers without writing to and reading back from memory.
  • RISC vs. CISC: CISC instruction sets often have variable instruction lengths, often have a larger number of possible instructions that can be used, and each instruction could take differing amounts of time. RISC instruction sets attempt to limit the variability in each of these: instruction sets are usually constant length, with few exceptions, there are usually fewer combinations of registers and memory operations, and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs among the various instructions and choose the best instruction sequence (see instruction selection).
  • Pipelines: A pipeline is essentially an ALU broken up into an assembly line. It allows use of parts of the ALU for different instructions by breaking up the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register store stage, while another could be in the register fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline but not yet completed. Pipeline conflicts can lead to pipeline stalls: where the CPU wastes cycles waiting for a conflict to resolve.
Compilers can schedule, or reorder, instructions so that pipeline stalls occur less frequently.
  • Number of functional units: Some CPUs have several ALUs and FPUs. This allows them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions), and which functional unit can execute which instruction. They also have issues similar to pipeline conflicts.
Here again, instructions have to be scheduled so that the various functional units are fully fed with instructions to execute.
The architecture of the machine
  • Cache size (256kB-4MB) and type (direct mapped, 2-/4-/8-/16-way associative, fully associative): Techniques like inline expansion and loop unrolling may increase the size of the generated code and reduce code locality. The program may slow down drastically if an oft-run piece of code (like inner loops in various algorithms) suddenly cannot fit in the cache. Also, caches which are not fully associative have higher chances of cache collisions even in an unfilled cache.
  • Cache/Memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.

Die of an Intel 80486DX2 microprocessor (actual size: 12×6. ... In computer architecture, a processor register is a small amount of very fast computer memory used to speed the execution of computer programs by providing quick access to commonly used values—typically, the values being in the midst of a calculation at a given point in time. ... In computer science, a local variable is a variable that is given local scope. ... In computer science, a call stack is a special stack which stores information about the active subroutines of a computer program. ... Reduced Instruction Set Computer (RISC), is a microprocessor CPU design philosophy that favors a smaller and simpler set of instructions that all take about the same amount of time to execute. ... A Complex Instruction Set Computer (CISC) is an instruction set architecture (ISA) in which each instruction can indicate several low-level operations, such as a load from memory, an arithmetic operation, and a memory store, all in a single instruction. ... Instruction selection is a compiler optimization that transforms an intermediate representation of a program into the final compiled code, either in binary or assembly format. ... The arithmetic logic unit/arithmetic-logic unit (ALU) of a computers CPU is a part of the execution unit, a core component of all CPUs. ... In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. ... Simple superscalar pipeline. ... A floating point unit (FPU) is a part of a computer system specially designed to carry out operations on floating point numbers. ... Diagram of a CPU memory cache A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. ... In-line expansion or inlining for short is a compiler optimization which expands a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. ... Wikipedia does not yet have an article with this exact name. ...

Intended use of the generated code

Debugging
When a programmer is still writing an application, they will recompile and test often, and so compilation must be fast. This is one reason most optimizations are avoided while debugging. Also, debugging code is usually stepped through in a symbolic debugger, and optimizing transformations, particularly those that reorder code, can make it difficult to identify the output code with the line numbers in the source code from which the code was generated. This can confuse the debugging tools and the programmer using them.
General purpose use
Prepackaged software is very often expected to be executed on a variety of machines and CPUs that may share the same instruction set, but have different timing, cache or memory characteristics. So, the code may not be tuned to any particular CPU, or may be tuned to work well on the most popular CPU and work reasonably on other CPUs.
Special purpose use
If the software is compiled to be used on one or a few very similar machines, with known characteristics, then the compiler can heavily tune the generated code to those machines alone. Important special cases include code designed for parallel and vector processors, for which we have parallelizing compilers.

A debugger is a computer program that is used to test and debug other programs. ... Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. ... A vector processor, or array processor, is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. ...

Optimization techniques

Common themes

To a large extent, compiler optimization techniques have the following themes, which sometime conflict.

Optimize the common case
The common case may have unique properties that allow a fast path at the expense of a slow path. If the fast path is taken most often, the result is better over-all performance.
Avoid redundancy
Reuse results that are already computed and store them for use later, instead of recomputing them.
Less code
Remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory is usually faster.
Straight line code, fewer jumps
Less complicated code. Jumps interfere with the prefetching of instructions, thus slowing down code.
Locality
Code and data that are accessed closely together in time should be placed close together in memory to increase spatial locality of reference.
Exploit the memory hierarchy
Accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before on disks.
Parallelize
Reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level.
More precise information is better
The more precise the information the compiler has, the better it can employ all of these optimization techniques.

It has been suggested that this article or section be merged with Memory locality. ... The hierarchical arrangement of storage in current computer architectures is called the memory hierarchy. ...

Optimization techniques

Loop optimizations

Main article: Loop optimization

Some optimization techniques primarily designed to operate on loops include: Most execution time of a scientific program is spent on loops. ...

Induction variable analysis
Roughly, if a variable in a loop is a simple function of the index variable, such as j:= 4*i+1, it can be updated appropriately each time the loop variable is changed. This is a strength reduction, and also may allow the index variable's definitions to become dead code. This information is also useful for bounds-checking elimination and dependence analysis, among other things.
Loop fission or loop distribution
Loop fission attempts to break a loop into multiple loops over the same index range but each taking only a part of the loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
Loop fusion or loop combining
Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.
Loop inversion
This technique changes a standard while loop into a do/while (also known as repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the if guard can be skipped.
Loop interchange
These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout.
Loop-invariant code motion
If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.
Loop nest optimization
Some pervasive algorithms such as matrix multiplication have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by performing the operation over small blocks and by using a loop interchange.
Loop reversal
Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations.
Loop unrolling
Unrolling duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which hurt performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
Loop splitting
Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
Loop unswitching
Unswitching moves a conditional from inside a loop to outside the loop by duplicating the loop's body inside each of the if and else clauses of the conditional.

These optimizations remove unnecessary expressions from within loops and are vital to good generating highly optimized loops. ... Strength reduction is a software optimisation where a function of some systematically changing variable is calculated more efficiently by using previous values of the function. ... In computer programming, dead code typically consists of blocks of programming instructions or entire routines that will never be accessed because all calls to them have been removed, or code that cannot be reached because it is guarded by a control structure that must always transfer control somewhere else. ... Definition Bounds-checking elimination is a compiler optimization useful in programming languages or runtimes that enforce limits on array indicies. ... In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. ... Loop fission is a compiler optimization technique attempting to break a loop into multiple loops over the same index range but each taking only a part of the loops body. ... It has been suggested that this article or section be merged with Memory locality. ... Loop fusion is a compiler optimization, a loop transformation, which replaces multiple loops with a single one. ... Loop inversion is a compiler optimization, a loop transformation, which replaces a while loop by an if block containing a do. ... In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. ... In computer science, a function is said to produce a side effect if it modifies some state other than its return value. ... In compiler theory, loop interchange is the process of exchanging the order of two iteration variables. ... It has been suggested that this article or section be merged with Memory locality. ... Loop-invariant code in an imperative programming language consists of statements which could be moved to before the loop (if the loop always terminates), or after the loop, without affecting the semantics of the program. ... Loop inversion is a compiler optimization, a loop transformation, which replaces a while loop by an if block containing a do. ... Loop nest optimization (LNO) is a compiler optimization that makes possible large reductions in the cache bandwidth necessary for some pervasive algorithms. ... In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. ... Wikipedia does not yet have an article with this exact name. ... Loop splitting (or loop peeling) is a compiler optimization technique. ... In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. ... Loop unswitching is a compiler optimization. ...

Data-flow optimizations

Data flow optimizations, based on Data-flow analysis, primarily depend on how certain properties of data are propagated by control edges in the control flow graph. Some of these include: It has been suggested that this article or section be merged with Dataflow architecture. ... Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. ... A control flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. ...

Common subexpression elimination
In the expression "(a+b)-(a+b)/4", "common subexpression" refers to the duplicated "(a+b)". Compilers implementing this technique realize that "(a+b)" won't change, and as such, only calculate its value once.
Constant folding and propagation
replacing expressions consisting of constants (e.g. "3 + 5") with their final value ("8") at compile time, rather than doing the calculation in run-time. Used in most modern languages.
Induction variable recognition and elimination
Alias classification and pointer analysis
in the presence of pointers, it is difficult to make any optimisations at all, since potentially any variable can have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.

In compiler theory, common subexpression elimination (CSE) is the practice of finding repeated redundant expression evaluations, and replacing them with a single computation assigned to a temporary variable. ... In compiler theory, constant folding and constant propagation are related optimization techniques used by many modern compilers. ... In computer science constant propagation (cprop) is an optimization performed by compilers. ... In computing, aliasing is a term that generally means that one variable or some reference, when changed, has an indirect (usually unexpected) effect on some other data. ... It has been suggested that Software pointer be merged into this article or section. ...

SSA-based optimizations

These optimizations are intended to be done after transforming the program into a special form called static single assignment (see SSA form), in which every variable is assigned in only one place. Although some function without SSA, they are most effective with SSA. Many optimizations listed in other sections also benefit with no special changes, such as register allocation. In compilers, static single assignment form, more often abbreviated SSA form or just SSA, is an intermediate representation in which every variable is assigned exactly once. ... In compilers, static single assignment form, more often abbreviated SSA form or just SSA, is an intermediate representation in which every variable is assigned exactly once. ...

global value numbering
GVN eliminates redundancy by constructing a value graph of the program, and then determining which values are computed by equivalent expressions. GVN is able to identify some redundancy that common subexpression elimination cannot, and vice versa.
sparse conditional constant propagation
Effectively equivalent to iteratively performing constant propagation, constant folding, and dead code elimination until there is no change, but is much more efficient. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating portions of the control flow graph that this makes unreachable.

Global value numbering is a method of compiler optimization and is one of the applications of SSA (compilers). ... In compiler theory, common subexpression elimination (CSE) is the practice of finding repeated redundant expression evaluations, and replacing them with a single computation assigned to a temporary variable. ... Sparse conditional constant propagation is an optimization frequently utilized in compilers after conversion to static single assignment form (SSA). ... In computer science constant propagation (cprop) is an optimization performed by compilers. ... In compiler theory, constant folding and constant propagation are related optimization techniques used by many modern compilers. ... Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ... A control flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. ...

Code generator optimizations

register allocation
The most frequently used variables should be kept in processor registers for fastest access. To find which variables to put in registers an interference-graph is created. Each variable is a vertex and when two variables are used at the same time (have an intersecting liverange) they have an edge between them. This graph is colored using for example Chaitin's algorithm using the same number of colors as there are registers. If the coloring fails one variable is "spilled" to memory and the coloring is retried.
instruction selection
Most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to do a good job overall of choosing which instructions to implement which operators in the low-level intermediate representation with. For example, on many processors in the 68000 family, complex addressing modes can be used in statements like "lea 25(a1,d5*4), a0", allowing a single instruction to perform a significant amount of arithmetic with less storage.
instruction scheduling
Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.
rematerialization
Rematerialization recalculates a value instead of loading it from memory, preventing a memory access. This is performed in tandem with register allocation to avoid spills.
reordering computations
Based on integer linear programming, restructuring compilers enhance data locality and expose more parallelism by reordering computations.

In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. ... Chaitins algorithm is a graph coloring algorithm designed specifically for register allocation. ... Instruction selection is a compiler optimization that transforms an intermediate representation of a program into the final compiled code, either in binary or assembly format. ... A Complex Instruction Set Computer (CISC) is an instruction set architecture (ISA) in which each instruction can indicate several low-level operations, such as a load from memory, an arithmetic operation, and a memory store, all in a single instruction. ... Addressing modes, a concept from computer science, are an aspect of the instruction set architecture in most central processing unit (CPU) designs. ... In computing, an intermediate representation is a data structure that is constructed from input data to a program, and from which part or all of the output data of the program is constructed in turn. ... The Motorola 680x0, 0x0, m68k, or 68k family of CISC microprocessor CPU chips were 32_bit from the start, and were the primary competition for the Intel x86 family of chips. ... In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. ... This article is about the compiler optimization. ... In mathematics, linear programming (LP) problems are optimization problems in which the objective function and the constraints are all linear. ...

Functional language optimizations

Although many of these also apply to non-functional languages, they either originate in, are most easily implemented in, or are particularly critical in functional languages such as Lisp and ML. Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions. ... Lisp is a family of computer programming languages with a long history and a distinctive fully-parenthesized syntax. ... ML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM. Historically, ML stands for metalanguage as it was conceived to develop proof tactics in the LCF theorem prover (the language of...

Removing recursion
Recursion is often expensive, as a function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail recursive algorithms can be converted to iteration, which does not have call overhead and uses a constant amount of stack space, through a process called tail recursion elimination.
Data structure fusion
Because of the high level nature by which data structures are specified in functional languages such as Haskell, it is possible to combine several recursive functions which produce and consume some temporary data structure so that the data is passed directly without wasting time constructing the data structure.

A visual form of recursion known as the Droste effect. ... In computer science, tail recursion is a special case of recursion that can be easily transformed into an iteration. ... The word iteration is sometimes used in everyday English with a meaning virtually identical to repetition. ... A binary tree, a simple type of branching linked data structure. ...

Other optimizations

Please help separate and categorize these further and create detailed pages for them, especially the more complex ones, or link to one where one exists.

Bounds-checking elimination
Many modern languages, for example Java, enforce bounds-checking of all array accesses. This is a severe performance bottleneck on certain applications such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds-checking in many situations where it can determine that the index must fall within valid bounds, for example if it is a simple loop variable.
Branch offset optimization (machine independent)
Choose the shortest branch displacement that reaches target
Code-block reordering
Code-block reordering alters the order of the basic blocks in a program in order to reduce conditional branches and improve locality of reference.
Dead code elimination
Removes instructions that will not affect the behaviour of the program, for example definitions which have no uses, called dead code. This reduces code size and eliminates unnecessary computation.
Factoring out of invariants
If an expression is carried out both when a condition is met and otherwise, it can be written just once outside of the conditional statement. Similarly, if certain types of expressions (e.g. the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter if they're executed many times or just once. Also known as total redundancy elimination. A more powerful optimization is Partial redundancy elimination (PRE).
Inline expansion
When some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls, as well as providing great opportunity for many different parameter-specific optimizations, but comes at the cost of space; the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures.
Jump threading
In this pass, conditional jumps in the code that branch to identical or inverse tests are detected, and can be "threaded" through a second conditional test.
Strength reduction
A general term encompassing optimizations that replace complex or difficult or expensive operations with simpler ones. Induction variable analysis can accomplish this with variables inside a loop which depend on the loop variable. Several peephole optimizations also fall into this category, such as replacing division by a constant with multiplication by its reciprocal, converting multiplies into a series of bit-shifts and adds, and replacing large instructions with equivalent smaller ones that load more quickly.
Reduction of cache collisions
(e.g. by disrupting alignment within a page)
Stack height reduction
Rearrange expression tree to minimize resources needed for expression evaluation.
Test reordering
If we have two tests that are the condition for something, we can first deal with the simpler tests (e.g. comparing a variable to something) and only then with the complex tests (e.g. those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.

Definition Bounds-checking elimination is a compiler optimization useful in programming languages or runtimes that enforce limits on array indicies. ... Java is an object-oriented applications programming language developed by Sun Microsystems in the early 1990s. ... In computing, a basic block is a straight-line piece of code without any jumps or jump targets in the middle; jump targets, if any, start a block, and jumps end a block. ... It has been suggested that this article or section be merged with Memory locality. ... Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ... In computer programming, dead code typically consists of blocks of programming instructions or entire routines that will never be accessed because all calls to them have been removed, or code that cannot be reached because it is guarded by a control structure that must always transfer control somewhere else. ... Partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. ... In-line expansion or inlining for short is a compiler optimization which expands a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. ... Look up Procedure in Wiktionary, the free dictionary. ... In computing, jump threading is a compiler optimization. ... Strength reduction is a software optimisation where a function of some systematically changing variable is calculated more efficiently by using previous values of the function. ... These optimizations remove unnecessary expressions from within loops and are vital to good generating highly optimized loops. ... In compiler theory, peephole optimization is a kind of optimization performed over a very small set of instructions in a segment of generated code. ... In computer programming, lazy evaluation is a technique that attempts to delay computation of expressions until the results of the computation are known to be needed. ... In computer programming, lazy evaluation is a concept that attempts to minimize the work the computer has to do. ...

Interprocedural optimizations

Inter Procedural optimization works on the entire program, across procedure and file boundaries. It works tightly with intraprocedural counterparts, carried out with the cooperation of a local part and global part. Typical interprocedural optimizations are: procedure inlining, interprocedural dead code elimination, interprocedural constant propagation, procedure reordering. As usual, interprocedural analysis is needed before actual optimizations, such as interprocedural alias analysis, array access analysis, construction of callgraph etc. In-line expansion or inlining for short is a compiler optimization which expands a function call site into the actual implementation of the function which is called, rather than each call transferring control to a common piece of code. ... Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ... In computer science constant propagation (cprop) is an optimization performed by compilers. ... Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. ... In computer science, Array access analysis is a compiler analysis used to decide the read and write access patterns to elements or portions of arrays. ...


The existence of interprocedural analysis and optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems. For a long time the open source GCC was criticized for a lack of powerful interprocedural analysis and optimizations. But now things are changing. Another good open source compiler with full analysis and optimization infrastructure is Open64, which is used by many organizations for research or even commercial purposes. Silicon Graphics, Inc. ... Intel Corporation (NASDAQ: INTC, SEHK: 4335), founded in 1968 as Integrated Electronics Corporation, is an American multinational corporation that is best known for designing and manufacturing microprocessors and specialized integrated circuits. ... Microsoft Corporation, (NASDAQ: MSFT, HKSE: 4338) is a multinational computer technology corporation with global annual revenue of US$44. ... Sun Microsystems, Inc. ... The GNU Compiler Collection (usually shortened to GCC) is a set of programming language compilers produced by the GNU Project. ... Open64 is an open source, state-of-art, optimizing compiler for Intel Itanium platform. ...


Due to the extra time and space requirement from compiler analysis and optimizations, most compilers choose to turn them off by default. Users must use compilation options to explicitly tell the compiler to enable some of them.


Problems with optimization

Early in the history of compilers, compiler optimizations were not as good as hand-written ones. As compiler technologies have improved, good compilers can often generate better code than human programmers — and good post pass optimizers can improve highly hand-optimized code even further. For RISC CPU architectures, and even more so for VLIW hardware, compiler optimization is the key for obtaining efficient code, because RISC instruction sets are so compact that it is hard for a human to manually schedule or combine small instructions to get efficient results. Indeed, these architectures were designed to rely on compiler writers for adequate performance. Reduced Instruction Set Computer (RISC), is a microprocessor CPU design philosophy that favors a smaller and simpler set of instructions that all take about the same amount of time to execute. ... A very long instruction word or VLIW CPU architectures implement a form of instruction level parallelism. ...


However, optimizing compilers are by no means perfect. There is no way that a compiler can guarantee that, for all program source code, the fastest (or smallest) possible equivalent compiled program is output; such a compiler is fundamentally impossible because it would solve the halting problem. Additionally, there are a number of other more practical issues with optimizing compiler technology: In computability theory the halting problem is a decision problem which can be informally stated as follows: Given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input. ...

  • Usually, an optimizing compiler only performs low-level, localized changes to small sets of operations. In other words, high-level inefficiency in the source program (such as an inefficient algorithm) remains unchanged.
  • Modern third-party compilers usually have to support several objectives. In so doing, these compilers are a 'jack of all trades' yet master of none.
  • A compiler typically only deals with a small part of an entire program at a time, at most a module at a time and usually only a procedure; the result is that it is unable to consider at least some important contextual information.
  • The overhead of compiler optimization. Any extra work takes time, whole-program optimization (interprocedural optimization) is very costly.
  • The interaction of compiler optimization phases: what combination of optimization phases are optimal, in what order and how many times?

Work to improve optimization technology continues. One approach is the use of so-called "post pass" optimizers. These tools take the executable output by an "optimizing" compiler and optimize it even further. As opposed to compilers which optimize intermediate representations of programs, post pass optimizers work on the assembly language level. Post pass compilers are also limited, however, by the fact that much of the information found in the original source code has been lost. An assembly language is a low-level language used in the writing of computer programs. ...


As processor performance continues to improve at a rapid pace while memory bandwidth improves more slowly, optimizations that reduce memory bandwidth (even at the cost of making the processor execute "extra" instructions) will become more useful. Examples of this mentioned above include loop nest optimization and rematerialization. Loop nest optimization (LNO) is a compiler optimization that makes possible large reductions in the cache bandwidth necessary for some pervasive algorithms. ... This article is about the compiler optimization. ...


List of compiler optimizations

In compiler theory, constant folding and constant propagation are related optimization techniques used by many modern compilers. ... In compiler theory copy propagation is the process of replacing the occurrences of targets of direct assignments with their values. ... In computer science constant propagation (cprop) is an optimization performed by compilers. ... In compiler theory, common subexpression elimination (CSE) is the practice of finding repeated redundant expression evaluations, and replacing them with a single computation assigned to a temporary variable. ... Partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. ... Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ... Strength reduction is a software optimisation where a function of some systematically changing variable is calculated more efficiently by using previous values of the function. ... In computer science, a loop invariant is an invariant used to prove properties of loops. ... Most execution time of a scientific program is spent on loops. ... In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. ... This article or section does not adequately cite its references or sources. ... In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. ... In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. ...

List of compiler analyses

Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. ... In computer science, Array access analysis is a compiler analysis used to decide the read and write access patterns to elements or portions of arrays. ... In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. ... In computer science control flow (or alternatively, flow of control) refers to the order in which the individual statements, instructions or function calls of an imperative or functional program are executed or evaluated. ... A simple way to perform dataflow analysis (DFA) of programs is to set up dataflow equations for each node of the control flow graph (CFG) and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i. ... In computer science, use-define chains model the relation between definitions of variables and their uses in a sequence of assignments. ... In computer science, the live variable analysis is performed by compilers to calculate for each program point the variables that may be potentially read afterwards before their next write update. ...

See also

In computing, type introspection is a capability of some object-oriented programming languages to determine the type of an object at runtime. ...

External links


  Results from FactBites:
 
Creating High Performance Embedded Applications Through Compiler Optimizations (2675 words)
After the perfect compiler optimization has been applied or the most efficient algorithm has been implemented, it is important to verify the results.
However, good compiler vendors put enough time and energy into ensuring that their compiler is the best that they want to be able to share their expertise.
Some compilers have processor specific optimizations that provide hints to the compiler about what processor the application is going to be run on.
Compiler Optimization (1000 words)
There are also some optimizations that you might not choose to use for your specific application.
Optimizing for space can actually be faster than optimizing for speed because programs optimized for speed are almost always larger, and therefore more likely to cause additional paging than programs optimized for space.
option instructs the compiler not to track unwindable objects as it normally would in case an exception is thrown and objects must be unwound on the stack.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.