|
In computer science, the term threaded code refers to an implementation technique for programming languages that produces very compact code. Threading is a form of code consisting entirely of subroutine calls, written without the subroutine call instruction, and processed by an interpreter (such as the Forth virtual machine), which jumps to each successive subroutine in turn. Wikibooks Wikiversity has more about this subject: School of Computer Science Open Directory Project: Computer Science Collection of Computer Science Bibliographies Belief that title science in computer science is inappropriate Categories: Computer science ...
A programming language or computer language is a standardized communication technique for expressing instructions to a computer. ...
For multi-threaded programming, see Thread (computer science) In computer science, the term threaded code refers to an implementation technique for programming languages that produces very compact code at the expense of some execution speed. ...
Threaded code is used in the Forth and early versions of the B programming languages, as well as many implementations of FORTRAN, BASIC, COBOL and other languages for small minicomputers. Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
B was the name of a programming language developed at Bell Labs. ...
Fortran (also FORTRAN) is a statically typed, compiled imperative computer programming language originally developed in the 1950s and still heavily used for scientific computing and numerical computation half a century later. ...
The word basic may refer to one of several articles in Wikipedia: Basic English BASIC programming language Basic (chemistry), the opposite to acidic, reacting with acids to form salts. ...
COBOL is a third-generation programming language. ...
The benefits of extremely compact code, in some cases, come at the expense of slower execution speed. However, in other cases there is a synergistic effect -- sometimes threaded code is smaller and faster than non-threaded code. In systems with virtual memory (where memory is simulated with a mechanical disk drive), threaded code may be hundreds of times faster than a less-compact design that does not fit in the available physical memory, because disk drives tend to be roughly a thousand times slower than random-access memory (RAM). Synergy or synergism, most often refers to the phenomenon of two or more discrete influences or agents acting in common to create an effect which is greater than the sum of the effects each is able to create independently. ...
Virtual memory is intended to help the programmer by taking care of some memory housekeeping duties. ...
History leading to threaded code In early computers, memory was very expensive. So programmers spent a lot of time trying to find ways to squeeze their programs so they would fit in the memory available. Also, the instruction sets were so primitive that even simple operations like printing a character or dividing one number by another number required quite a few instructions. An instruction set, or instruction set architecture (ISA), describes the aspects of a computer architecture visible to a programmer, including the native datatypes, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O (if any). ...
Instead of writing out every step of such operations in every part of the program where it was needed (possibly using a macro), programmers saved memory by writing out every step of such operations exactly once and placing it in a subroutine. A macro in computer science is an abstraction, whereby a certain textual pattern is replaced according to a defined set of rules. ...
In computer science, a subroutine (function, procedure, or subprogram) is a sequence of code which performs a specific task, as part of a larger program, and is grouped as one or more statement blocks; such code is sometimes collected into software libraries. ...
This process (refactoring) is still commonly used today by all programmers and many wiki editors, although today we do it for different reasons. Refactoring is the process of rewriting written material to improve its readability or structure, with the explicit purpose of keeping its meaning or behavior. ...
The top-level application usually consists of nothing but subroutine calls. And many of those subroutines, in turn, also consist of nothing but subroutine calls. Some early computers such as the RCA 1802 required several instructions to call a subroutine. In the top-level application and in many subroutines, that sequence is repeated over and over again, only the subroutine address changing from one call to the next. Using expensive memory to store the same thing over and over again seems wasteful -- is there any way to store this information exactly once? The RCA (CDP)1802 (aka RCA COSMAC*, COSMAC 1802) is an 8-bit CMOS microprocessor (µP) introduced by RCA in early 1976, and presently being manufactured by Harris Semiconductor. ...
Threaded code To save even more space, programmers squeezed those lists of subroutine calls into simple lists of subroutine addresses (leaving out the "call" instruction on each one), and used a small interpreter (later called a virtual machine) to call each subroutine in turn. This is called DTC (Direct Threaded Code). In general terms, a virtual machine in computer science is software that creates an environment between the computer platform and the end user in which the end user can operate software. ...
Charles Moore invented an even more compact notation in 1970 for his Forth virtual machine: ITC (indirect threaded code). Originally, Moore invented this because it was easy and fast on NOVA minicomputers, which have an indirection bit in every address. He said (in published remarks, Byte Magazine's Forth Issue) that he found it so convenient that he propagated it into all later Forth designs. Charles H. Moore (also known as Chuck Moore) is the inventor of the Forth programming language. ...
Data General SuperNova The Data General Nova was a popular 16-bit minicomputer built by the US company Data General starting in 1968. ...
Some Forth compilers compile Forth programs into direct-threaded code. Other Forth compilers compile those same Forth programs into indirect-threaded code. The programs act the same either way.
Later developments Not content to stop there, programmers have developed other related techniques to make programs even more compact, although they are slower than threaded code. The idea of program compression has led to the idea of Kolmogorov complexity. In computer science, algorithmic information theory is a field of study which attempts to define the complexity (aka descriptive complexity, Kolmogorov complexity, Kolmogorov-Chaitin complexity, or algorithmic entropy) of a string as the length of the shortest binary program which outputs that string. ...
Threading models Practically all executable code uses one or another of these methods for calling subroutines (each method is called a "threading model"). - Indirect threading: The list of addresses in code point to an address at the start of a data area. The address at the start of the data area points at machine code to use the data. This "extra" pointer in indirect threading serves as an "executable data type" to define custom interpreters for data. The address at the start of each data area points to the code shared by all data areas of that type. More compact, yet slightly slower than direct-threaded code. Faster than byte code. No type interpretation is usually needed. Older Forth systems produced indirect-threaded code.
- Direct threading: The addresses in the code are actually the address of machine language. This is a compromise between speed and space. The indirect data pointer is lost, at some loss in the language's flexibility, and this may need to be corrected by a type tag in the data areas, with an auxiliary table. Some Forth systems produce direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).
- So-called "subroutine threaded code", a series of native machine language "call" instructions. This is not threaded code in the original sense, since the instructions are directly executed, without any interpretation. Early compilers for ALGOL, Fortran, Cobol and some Forth systems often produced subroutine-threaded code. The code in many of these systems operated on a first-in-first-out stack of operands, which had well-developed compiler theory. Many people believe that this is the fastest threading model, since all modern processors have special hardware support for such subroutine "call" instructions. But according to measurements by Anton Ertl, "in contrast to popular myths, subroutine threading is usually slower than direct threading."
- Token Threaded Code uses lists of 8 or 12-bit indexes to a table of pointers. Token threaded code is notably compact, without much special effort by a programmer. It is usually half to three-fourths the size of other threaded-codes, which are themselves a quarter to an eighth the size of compiled code. The table's pointers can either be indirect or direct. Some Forth compilers produce token threaded code. Some programmers consider the "p-code" generated by some Pascal compilers to be token threaded code. Some programmers consider the byte codes used by .NET, Java, Basic and some C compilers to be token-threading.
- Huffman Threaded Code consists of lists of Huffman codes. A Huffman code is a variable length bit string used to identify a unique item. A Huffman-threaded interpreter locates subroutines using an index table or tree of pointers that can be navigated by the Huffman code. Huffman threaded code is one of the most compact representations known for a computer program. Basically the index and codes are organized by measuring the frequency that each subroutine occurs in the code. Frequent calls are given the shortest codes. Operations with approximately equal frequencies are given codes with nearly equal bit-lengths. Most Huffman-threaded systems have been implemented as direct-threaded Forth systems, and used to pack large amounts of slow-running code into small, cheap microcontrollers. Most published uses have been in toys, calculators or watches.
Less often used are Byte-code is a sort of intermediate code that is more abstract than machine code. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
The position of Algol Algol (β Per / Beta Persei) is a bright star in the constellation Perseus. ...
Fortran (also FORTRAN) is a statically typed, compiled imperative computer programming language originally developed in the 1950s and still heavily used for scientific computing and numerical computation half a century later. ...
COBOL is a third-generation programming language. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
A p-code is similar to a byte-code but a p-code works at a higher level. ...
Byte-code is a sort of intermediate code that is more abstract than machine code. ...
Microsoft . ...
Java is a reflective, object-oriented programming language developed initially by James Gosling and colleagues at Sun Microsystems. ...
BASIC is a family of high-level programming languages. ...
The C Programming Language, Brian Kernighan and Dennis Ritchie, the original edition that served for many years as an informal specification of the language The C programming language is a standardized imperative computer programming language developed in the early 1970s by Ken Thompson and Dennis Ritchie for use on the...
In computer science, Huffman coding is an entropy encoding algorithm used for data compression that finds the optimal system of encoding strings based on the relative frequency of each character. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
A microcontroller is a computer-on-a-chip optimised to control electronic devices. ...
- String threading: In which operations are identified by strings, usually looked-up by a hash table. This was used in Charles Moore's earliest Forth implementations and in the University of Illinois's experimental hardware-interpreted computer language. It is also used in Bashforth.
- return threading
- call threading
The University of Illinois is the set of three public universities in Illinois. ...
Bashforth is a free Forth interpreter, written entirely in the bash scripting language. ...
Common amenities Separating the data and return stacks in a machine eliminates a great deal of stack management code, substantially reducing the size of the threaded code. The dual-stack principle was originated three times independently: for Burroughs computers, Forth and Postscript), and is used in some Java virtual machines. Forth is a programming language and programming environment. ...
PostScript (PS) is a page description language used primarily in the electronic and desktop publishing areas. ...
Three registers are often present in a threaded virtual machine. Another one exists for passing data between subroutines ('words'). These are: In computer architecture, a processor register is a small amount of very fast computer memory used to speed the execution of computer programs by providing quick access to commonly used values—typically, the values being in the midst of a calculation at a given point in time. ...
In computer science, a subroutine (function, procedure, or subprogram) is a sequence of code which performs a specific task, as part of a larger program, and is grouped as one or more statement blocks; such code is sometimes collected into software libraries. ...
Often, threaded virtual machines such as implementations of the programming language Forth have a simple virtual machine at heart, consisting of three primitives. Those are: The program counter (also called the instruction pointer in some computers) is a register in a computer processor which indicates where the computer is in its instruction sequence. ...
A stack is a data structure that works on the principle of Last In First Out (LIFO). ...
A parameter is a measurement or value on which something else depends. ...
A stack is a data structure that works on the principle of Last In First Out (LIFO). ...
A parameter is a measurement or value on which something else depends. ...
In general terms, a virtual machine in computer science is software that creates an environment between the computer platform and the end user in which the end user can operate software. ...
A programming language or computer language is a standardized communication technique for expressing instructions to a computer. ...
Forth is a procedural, data-structured, reflective, programming language and programming environment. ...
In general terms, a virtual machine in computer science is software that creates an environment between the computer platform and the end user in which the end user can operate software. ...
- nest, also called docol
- unnest, or semi_s (;s)
- next
In an indirect-threaded virtual machine, the one given here, the operations are: next: (ip)+ -> w ; jmp (w)+ nest: ip -> -(rp) ; w -> ip ; next unnest: (rp)+ -> ip ; next This is perhaps the simplest and fastest interpreter or virtual machine.
External links - The Development of the C Language by Dennis M. Ritchie also puts B in the context of BCPL and C.
- Anton Ertl's explanatory page "What is Threaded Code?" describes different threading techniques and provides further references. "Charles Moore invented (indirect) threaded Code in 1970"
|