|
In compiler design, static single assignment form (often abbreviated as SSA form or SSA) is an intermediate representation (IR) in which every variable is assigned exactly once. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element. A diagram of the operation of a typical multi-language, multi-target compiler. ...
In computing, an intermediate representation is a data structure that is constructed from input data to a program, and from which part or all of the output data of the program is constructed in turn. ...
In computer science, use-define chains model the relation between definitions of variables and their uses in a sequence of assignments. ...
SSA was developed by Ron Cytron, Jeanne Ferrante, Barry Rosen, Mark Wegman, and Ken Zadeck, researchers at IBM in the 1980s. International Business Machines Corporation (IBM, or colloquially, Big Blue) (NYSE: IBM) (incorporated June 15, 1911, in operation since 1888) is headquartered in Armonk, New York, USA. The company manufactures and sells computer hardware, software, and services. ...
The 1980s was the decade spanning from 1980 to 1989, also called The Eighties. The decade saw social, economic and general upheaval as wealth, production and western culture migrated to new industrializing economies. ...
In functional language compilers, such as those for Scheme, ML and Haskell, continuation passing style (CPS) is generally used where one might expect to find SSA in a compiler for Fortran or C. SSA and CPS are formally equivalent, so optimizations and transformations formulated in terms of one immediately apply to the other. Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions. ...
Scheme is a multi-paradigm programming language. ...
ML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM. Historically, ML stands for metalanguage as it was conceived to develop proof tactics in the LCF theorem prover (the language of...
Haskell is a standardized purely functional programming language with non-strict semantics, named after the logician Haskell Curry. ...
Continuation passing style (CPS) is a term used within functional programming to describe a style of programming wherein functions never return. ...
Fortran (previously FORTRAN[1]) is a general-purpose[2], procedural,[3] imperative programming language that is especially suited to numeric computation and scientific computing. ...
C is a general-purpose, block structured, procedural, imperative computer programming language developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system. ...
Benefits The primary usefulness of SSA comes from how it simultaneously simplifies and improves the results of a variety of compiler optimizations, by simplifying the properties of variables. For example, consider this piece of code: Compiler optimization is the process of tuning the output of a compiler to minimize some attribute (or maximize the efficiency) of an executable program. ...
y := 1 y := 2 x := y As humans, we can see that the first assignment is not necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this. But if the program is in SSA form, both of these are immediate: In compiler theory, a reaching definition for a given instruction is another instruction, the target variable of which may reach the given instruction without an intervening assignment. ...
y1 := 1 y2 := 2 x1 := y2 Compiler optimization algorithms which are either enabled or strongly enhanced by the use of SSA include: Compiler optimization is the process of tuning the output of a compiler to minimize some attribute (or maximize the efficiency) of an executable program. ...
In computer science constant propagation (cprop) is an optimization performed by compilers. ...
Sparse conditional constant propagation is an optimization frequently utilized in compilers after conversion to static single assignment form (SSA). ...
Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ...
Global value numbering is a method of compiler optimization and is one of the applications of SSA (compilers). ...
Partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. ...
Strength reduction is a software optimisation where a function of some systematically changing variable is calculated more efficiently by using previous values of the function. ...
In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. ...
Converting to SSA Converting ordinary code into SSA form is primarily a simple matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable reaching that point. For example, consider the following control flow graph: In compiler theory, a reaching definition for a given instruction is another instruction, the target variable of which may reach the given instruction without an intervening assignment. ...
A control flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. ...
An example control flow graph, made by Derrick Coetzee in Illustrator, for use on the static single assignment form page. ...
Notice that we could change the name on the left side of "x x - 3", and change the following uses of x to use that new name, and the program would still do the same thing. We exploit this in SSA by creating two new variables, x1 and x2, each of which is assigned only once. We likewise give distinguishing subscripts to all the other variables, and we get this:
An example CFG with SSA partially done, demonstrating the need for phi functions in SSA. Created by Derrick Coetzee in Illustrator. ...
We've figured out which definition each use is referring to, except for one thing: the uses of y in the bottom block could be referring to either y1 or y2, depending on which way the control flow came from. So how do we know which one to use? The answer is that we add a special statement, called a Φ (Phi) function, to the beginning of the last block. This statement will generate a new definition of y, y3, by "choosing" either y1 or y2, depending on which arrow control arrived from:
Final part of SSA example 1 on static single assignment form page. ...
Now, the uses of y in the last block can simply use y3, and they'll obtain the correct value either way. You might ask at this point, do we need to add a Φ function for x too? The answer is no; only one version of x, namely x2 is reaching this place, so there's no problem. A more general question along the same lines is, given an arbitrary control flow graph, how can I tell where to insert Φ functions, and for what variables? This is a difficult question, but one that has an efficient solution that can be computed using a concept called dominance frontiers. Note: the Φ functions are not actually implemented; instead, they're just markers for the compiler to place the value of all the variables, which are grouped together by the Φ function, in the same location in memory (or same register).
Computing minimal SSA using dominance frontiers First, we need the concept of a dominator: we say that a node A strictly dominates a different node B in the control flow graph if it's impossible to reach B without passing through A first. This is useful, because if we ever reach B we know that any code in A has run. We say that A dominates B if either A strictly dominates B or A = B. Now we can define the dominance frontier: a node B is in the dominance frontier of a node A if A does not strictly dominate B, but does dominate some immediate predecessor of B (possibly A itself if A is the immediate predecessor of B). From A's point of view, these are the nodes at which other control paths, which don't go through A, make their earliest appearance.
Dominance frontiers capture the precise places at which we need Φ functions: if the node A defines a certain variable, then that definition and that definition alone (or redefinitions) will reach every node A dominates. Only when we leave these nodes and enter the dominance frontier must we account for other flows bringing in other definitions of the same variable. Moreover, no other Φ functions are needed in the control flow graph to deal with A's definitions, and we can do with no less.
One algorithm for computing the dominance frontier set is: for each node b if the number of predecessors of b ≥ 2 for each p in predecessors of b runner := p while runner ≠ idom(b) add b to runner’s dominance frontier set runner := idom(runner) Note: in the code above, a predecessor of node n is any node from which control is transferred to node n, and idom(n) is the immediate dominator of node n. There is an efficient algorithm for finding dominance frontiers of each node. This algorithm was originally described in the paper "Efficiently computing static single assignment form and the control dependence graph", by R. Cytron, J. Ferrante, B. Rosen, M. Wegman and F. Zadeck, ACM Trans. on Programming Languages and Systems 13(4) 1991 pp.451–490. Also useful is chapter 19 of the book "Modern compiler implementation in Java" by Andrew Appel (Cambridge University Press, 2002). See the paper for more details. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy of Rice University describe an algorithm in their paper titled A Simple, Fast Dominance Algorithm. The algorithm uses well engineered data structures to improve performance. Lovett Hall William Marsh Rice University (commonly called Rice University and opened in 1912 as The William Marsh Rice Institute for the Advancement of Letters, Science and Art) is a private, comprehensive research university located in Houston, Texas, United States, near the Museum District and adjacent to the Texas Medical...
Variations that reduce the number of Φ functions "Minimal" SSA inserts the minimal number of Φ functions required to ensure that each name is assigned a value exactly once and that each reference (use) of a name in the original program can still refer to a unique name. (The latter requirement is needed to ensure that the compiler can write down a name for each operand in each operation.) However, some of these Φ functions could be dead. For this reason, minimal SSA does not necessarily produce the fewest number of Φ functions that are needed by a specific procedure. For some types of analysis, these Φ functions are superfluous and can cause the analysis to run less efficiently. Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ...
Pruned SSA Pruned SSA form is based on a simple observation: Φ functions are only needed for variables that are "live" after the Φ function. (Here, "live" means that the value is used along some path that begins at the Φ function in question.) If a variable is not live, the result of the Φ function cannot be used and the assignment by the Φ function is dead. Construction of pruned SSA form uses live variable information in the Φ function insertion phase to decide whether a given Φ function is needed. If the original variable name isn't live at the Φ function insertion point, the Φ function isn't inserted. In computer science, the live variable analysis is performed by compilers to calculate for each program point the variables that may be potentially read afterwards before their next write update. ...
Another possibility is to treat pruning as a dead code elimination problem. Then, a Φ function is live only if any use in the input program will be rewritten to it, or if it will be used as an argument in another Φ function. When entering SSA form, each use is rewritten to the nearest definition that dominates it. A Φ function will then be considered live as long as it is the nearest definition that dominates at least one use, or at least one argument of a live Φ. Dead code elimination is a technique used in computer science to reduce program size by removing code which can never be executed. ...
Semi-pruned SSA Semi-pruned SSA form [1] is an attempt to reduce the number of Φ functions without incurring the relatively high cost of computing live variable information. It is based on the following observation: if a variable is never live upon entry into a basic block, it never needs a Φ function. During SSA construction, Φ functions for any "block-local" variables are omitted. Computing the set of block-local variables is a simpler and faster procedure than full live variable analysis, making semi-pruned SSA form more efficient to compute than pruned SSA form. On the other hand, pruned SSA form will contain fewer unnecessary Φ functions.
Converting out of SSA form As SSA form is no longer useful for direct execution, it is frequently used "on top of" another IR with which it remains in direct correspondence. This can be accomplished by "constructing" SSA as a set of functions which map between parts of the existing IR (basic blocks, instructions, operands, etc.) and its SSA counterpart. When the SSA form is no longer needed, these mapping functions may be discarded, leaving only the now-optimized IR. Performing optimizations on SSA form usually leads to entangled SSA-Webs, meaning there are phi instructions whose operands do not all have the same root operand. In such cases color-out algorithms are used to come out of SSA. Naive algorithms introduce a copy along each predecessor path which caused a source of different root symbol to be put in phi than the destination of phi. There are multiple algorithms for coming out of SSA with fewer copies, most use interference graphs or some approximation of it to do copy coalescing.
Extensions Extensions to SSA form can be divided into two categories. Renaming scheme extensions alter the renaming criterion. Recall that SSA form renames each variable when it is assigned a value. Alternative schemes include static single use form (which renames each variable at each statement when it is used) and static single information form (which renames each variable when it is assigned a value, and in each conditional context in which that variable is used). Feature-specific extensions retain the single assignment property for variables, but incorporate new semantics to model additional features. Some feature-specific extensions model high-level programming language features like arrays, objects and aliased pointers. Other feature-specific extensions model low-level architectural features like speculation and predication.
Compilers using SSA form SSA form is a relatively recent development in the compiler community. As such, many older compilers only use SSA form for some part of the compilation or optimization process, but most do not rely on it. Examples of compilers that rely heavily on SSA form include: - The ETH Oberon-2 compiler was one of the first public projects to incorporate "GSA", a variant of SSA.
- The LLVM Compiler Infrastructure uses SSA form for all scalar register values (everything except memory) in its primary code representation. SSA form is only eliminated once register allocation occurs, late in the compile process (often at link time).
- The open source SGI compiler ORC uses SSA form in its global scalar optimizer, though the code is brought into SSA form before and taken out of SSA form afterwards. ORC uses extensions to SSA form to represent memory in SSA form as well as scalar values.
- As of version 4 (released in April 2005) GCC, the GNU Compiler Collection, makes extensive use of SSA. The frontends generate GENERIC code which is then converted into SSA form by the "gimplifier" and optimized by the "middle-end". The backend eventually translates the optimized intermediate code into RTL, executes some more low-level optimizations and finally turns RTL into assembly language.
- IBM's open source adaptive Java virtual machine, Jikes RVM, uses extended Array SSA, an extension of SSA that allows analysis of scalars, arrays, and object fields in a unified framework. Extended Array SSA analysis is only enabled at the maximum optimization level, which is applied to the most frequently executed portions of code.
- In 2002, researchers modified IBM's JikesRVM (named Jalapeño at the time) to run both standard Java byte-code and a typesafe SSA (SafeTSA) byte-code class files, and demonstrated significant performance benefits to using the SSA byte-code.
- Mono uses SSA in its JIT compiler called Mini.
- jackcc is an open-source compiler for the academic instruction set Jackal 3.0. It uses a simple 3-operand code with SSA for its intermediate representation. As an interesting variant, it replaces Φ functions with a so-called SAME instruction, which instructs the register allocator to place the two live ranges into the same physical register.
- Although not a compiler, the Boomerang decompiler uses SSA form in its internal representation. SSA is used to simplify expression propagation, identifying parameters and returns, preservation analysis, and more.
- libFirm a completely graph based SSA intermediate representation for compilers. libFirm uses SSA form for all scalar register values until code generation by use of a SSA-aware register allocator.
Oberon-2 is a refinement to Oberon programming language, which adds the FOR loop and type bound procedures, which is to Oberon, what classes are to other object oriented programming languages. ...
Low Level Virtual Machine (LLVM) is a compiler infrastructure designed for compile-time, link-time, run-time, and idle-time optimization of programs from arbitrary programming languages. ...
The GNU Compiler Collection (usually shortened to GCC) is a set of programming language compilers produced by the GNU Project. ...
In their most general meanings, the terms front end and back end refer to the initial and the end stages of a process flow. ...
Look up Generic in Wiktionary, the free dictionary. ...
In their most general meanings, the terms front end and back end refer to the initial and the end stages of a process flow. ...
In computer science register transfer language (RTL) is a term used to describe a kind of intermediate representation (IR) that is very close to assembly language, such as that which is used in a compiler. ...
An assembly language is a low-level language for programming computers. ...
For other uses, see IBM (disambiguation) and Big Blue. ...
A Java Virtual Machine (JVM) is a set of computer software programs and data structures which implements a specific virtual machine model. ...
Jikes is an open source Java compiler. ...
Byte-code is a sort of intermediate code that is more abstract than machine code. ...
SafeTSA is a static single assignment form (SSA) intermediate representation capable of representing all of the type safety of the Java programming language and the standard Java Virtual Machine (JVM) byte-code. ...
Sun Microsystems, Inc. ...
Java HotSpot Virtual Machine is Sun Microsystems implementation of the Java Virtual Machine. ...
A decompiler is the name given to a computer program that performs the reverse operation to that of a compiler. ...
References - ^ Practical Improvements to the Construction and Destruction of Static Single Assignment Form (1998), Preston Briggs, Keith D. Cooper, Timothy J. Harvey, L. Taylor Simpson.
- Appel, Andrew W. (1999). Modern Compiler Implementation in ML. Cambridge University Press. ISBN 0-521-58274-1. Also available in Java (ISBN 0-521-82060-X 2002) and C (ISBN 0-521-60765-5, 199 8) versions.
- Cooper, Keith D.; & Torczon, Linda. (2003). Engineering a Compiler. Morgan Kaufmann. ISBN 1-55860-698-X.
- Muchnick, Steven S. (1997). Advanced Compiler Design and Implementation. Morgan Kaufmann. ISBN 1-55860-320-4.
- Richard A. Kelsey (March 1995). "A Correspondence between Continuation Passing Style and Static Single Assignment Form". ACM SIGPLAN Notices 30 (3): 13-22.
- Andrew W. Appel (April 1998). "SSA is Functional Programming". ACM SIGPLAN Notices 33 (4): 17-20.
Java language redirects here. ...
C is a general-purpose, block structured, procedural, imperative computer programming language developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system. ...
See also Compiler optimization is the process of tuning the output of a compiler to minimize some attribute (or maximize the efficiency) of an executable program. ...
External links |