|
A floating-point number is a digital representation of a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. In particular, it represents an integer or fixed-point number (the significand or, informally, the mantissa) multiplied by a base (usually 2 in computers) raised to some integer power (the exponent). When the base is 2, it is the binary analogue of scientific notation (in base 10).
A floating-point calculation is an arithmetic operation on floating-point numbers. This often involves some approximation or rounding because the result of an operation may not be exactly representable: floating-point numbers have limited precision and can therefore represent only a finite set of values. If a result is not exactly one of those values, a choice of which value to use has to be made, and the result is then inexact.
A floating-point number a can be represented by two numbers m and e, such that a = m × b^e. In any such system we pick a base b (called the base of numeration, also the radix) and a precision p (how many digits to store). m (which is called the significand or, informally, the mantissa) is a p-digit number of the form ±d.ddd...ddd (each digit being an integer between 0 and b−1 inclusive). If the leading digit of m is non-zero, the number is said to be normalized. Some descriptions use a separate sign bit (s, which represents −1 or +1) and require m to be positive. e is called the exponent.
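The decomposition a = m × b^e can be tried out directly for b = 2. The following is a minimal sketch, assuming a Python interpreter whose float type is a binary floating-point format; the helper name normalized_parts is hypothetical and is not part of any library.

```python
import math

def normalized_parts(a):
    """Return (m, e) with a == m * 2**e and 1 <= |m| < 2 (hypothetical helper)."""
    if a == 0.0:
        return 0.0, 0              # zero has no normalized form
    m, e = math.frexp(a)           # a == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1          # rescale so the leading digit is non-zero

m, e = normalized_parts(6.0)       # 6 = 1.5 * 2**2
print(m, e)
print(math.ldexp(m, e))            # rebuilds the original value from m and e
```

Scaling by a power of the base only changes e, never the digits of m, which is why the same significand can describe numbers of very different magnitude.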
This scheme allows a large range of magnitudes to be represented within a given size of field, which is not possible in a fixed-point notation.
As an example, a floating-point format with four decimal digits (b = 10, p = 4) and an exponent range of ±4 could be used to represent 43210, 4.321, or 0.0004321, but would not have enough precision to represent 432.123 and 43212.3 (which would have to be rounded to 432.1 and 43210). In practice, the number of digits is usually larger than four.
In addition, floating-point representations often include the special values +∞, −∞ (positive and negative infinity), and NaN ("Not a Number"). Infinities are used when results are too large to be represented, and NaNs indicate an invalid operation or undefined result.
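The four-digit decimal format described above can be imitated with Python's decimal module. This is only a sketch: the context name toy and the helper to_toy are hypothetical, and traps=[] is an assumption made here so that overflow and invalid operations return special values instead of raising exceptions.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Toy format from the text: b = 10, p = 4, exponent range of +/-4.
# traps=[] makes overflow and invalid operations return special values.
toy = Context(prec=4, rounding=ROUND_HALF_EVEN, Emin=-4, Emax=4, traps=[])

def to_toy(text):
    """Round a decimal string to the nearest value of the toy format."""
    return toy.create_decimal(text)

# These three values fit into four significant digits.
exact = [to_toy(s) for s in ("43210", "4.321", "0.0004321")]

# These two do not, so digits are lost.
rounded_a = to_toy("432.123")
rounded_b = to_toy("43212.3")

# A product far beyond the exponent range becomes an infinity,
# and the square root of a negative number becomes NaN.
too_large = toy.multiply(Decimal("9999"), Decimal("9999"))
invalid = toy.sqrt(Decimal("-1"))

print(exact, rounded_a, rounded_b, too_large, invalid)
```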
Usage in computing
While in the examples above the numbers are represented in the decimal system (that is, the base of numeration is b = 10), computers usually use the binary system, which means that b = 2. In computers, floating-point numbers are sized by the number of bits used to store them. This size is usually 32 bits or 64 bits, often called "single-precision" and "double-precision". A few machines offer larger sizes; Intel FPUs such as the Intel 8087 (and its descendants integrated into the x86 architecture) offer 80-bit floating-point numbers for intermediate results, and several systems offer 128-bit floating-point, generally implemented in software.
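The two common sizes can be inspected with Python's struct module, whose "f" and "d" format codes pack values as 32-bit and 64-bit IEEE 754 numbers. The sketch below assumes that Python's own float is the 64-bit format; the helper to_single is hypothetical.

```python
import struct

# Standard sizes, in bytes, of the two common formats.
single_size = struct.calcsize("<f")
double_size = struct.calcsize("<d")

def to_single(x):
    """Round a Python float to single precision and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(single_size * 8, double_size * 8)   # sizes in bits
print(to_single(0.5) == 0.5)              # 0.5 fits exactly in both formats
print(to_single(0.1) == 0.1)              # 0.1 is rounded differently in each
```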
Problems with floating-point
Floating-point numbers usually behave very similarly to the real numbers they are used to approximate. However, this can easily lead programmers into over-confidently ignoring the need for numerical analysis. There are many cases where floating-point numbers do not model real numbers well, even in simple cases such as representing the decimal fraction 0.1, which cannot be exactly represented in any binary floating-point format. For this reason, financial software tends not to use a binary floating-point number representation. See: http://www2.hursley.ibm.com/decimal/
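The point about 0.1 can be checked by converting a binary float to an exact fraction. This is a small sketch, assuming Python's float is a binary floating-point format (IEEE 754 double precision on typical platforms).

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)          # the exact value the binary float holds
wanted = Fraction(1, 10)        # the decimal fraction that was intended

# The stored value is a fraction whose denominator is a power of two,
# so it can only approximate 1/10.
d = stored.denominator
print(stored == wanted)
print(d & (d - 1) == 0)         # True when d is a power of two

# The approximation error shows up in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)

# A decimal representation, as used in financial software, is exact here.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))
```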
Errors in floating-point computation can include:
- Rounding
  - Non-representable numbers: for example, the literal 0.1 cannot be represented exactly by a binary floating-point number
  - Rounding of arithmetic operations: for example, 2/3 might yield 0.6666667
- Absorption: 1×10^15 + 1 = 1×10^15 (in a format whose significand holds fewer than 16 decimal digits, such as single precision)
- Cancellation: subtraction between nearly equal operands
- Overflow, which usually yields an infinity
- Underflow (often defined as an inexact tiny result outside the range of the normal numbers for a format), which yields zero, a subnormal number, or the smallest normal number
- Invalid operations (such as an attempt to calculate the square root of a negative number). Invalid operations yield a result of NaN (not a number).
- Rounding errors: unlike in fixed-point arithmetic, applying dither in a floating-point environment is nearly impossible. See the external references for more information about the difficulty of applying dither and about rounding-error problems in floating-point systems
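Several of the error categories listed above can be reproduced in a few lines. The sketch below assumes IEEE 754 double precision with the default round-to-nearest mode, which is what Python's float uses on typical platforms; the variable names are illustrative only.

```python
import math
import sys

# Absorption: the small addend is lost entirely.
absorbed = (1e16 + 1.0 == 1e16)

# Cancellation: subtracting nearly equal operands exposes earlier rounding.
difference = (1.0 + 1e-15) - 1.0
relative_error = abs(difference - 1e-15) / 1e-15

# Overflow: the result is too large and becomes an infinity.
overflowed = 1e308 * 10.0

# Underflow: a tiny result becomes a subnormal number or zero.
subnormal = sys.float_info.min / 1024.0
flushed = 5e-324 / 2.0

# Invalid operation: infinity minus infinity has no meaningful value.
invalid = math.inf - math.inf

print(absorbed, relative_error, overflowed, subnormal, flushed, invalid)
```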
Floating-point representation is more likely to be appropriate when proportional accuracy over a range of scales is needed. When fixed accuracy is required, fixed point is usually a better choice.
Properties of floating point arithmetic
Arithmetic using the floating-point number system has two important properties that differ from those of arithmetic using real numbers. Floating-point arithmetic is not associative. This means that in general, for floating-point numbers x, y, and z:
(x + y) + z ≠ x + (y + z)
(x · y) · z ≠ x · (y · z)
Floating-point arithmetic is also not distributive. This means that in general:
x · (y + z) ≠ (x · y) + (x · z)
In short, the order in which operations are carried out can change the output of a floating point calculation. This is important in numerical analysis since two mathematically equivalent formulas may not produce the same numerical output, and one may be substantially more accurate than the other. For example, with most floating-point implementations, (1e100 - 1e100) + 1.0 will give the result 1.0, whereas (1e100 + 1.0) - 1e100 gives 0.0.
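The effect of evaluation order is easy to observe. The sketch below assumes IEEE 754 double precision, as used by Python's float on typical platforms; the variable names are illustrative only.

```python
# The example from the text: same operands, different grouping.
left_first = (1e100 - 1e100) + 1.0
right_first = (1e100 + 1.0) - 1e100

# Associativity also fails for everyday magnitudes.
sum_a = (0.1 + 0.2) + 0.3
sum_b = 0.1 + (0.2 + 0.3)

# Distributivity: multiplying the sum differs from summing the products.
product_of_sum = 100.0 * (0.1 + 0.2)
sum_of_products = 100.0 * 0.1 + 100.0 * 0.2

print(left_first, right_first)
print(sum_a == sum_b)
print(product_of_sum == sum_of_products)
```

In the first pair, the 1.0 is absorbed when it is added to 1e100 before the subtraction, so the grouping decides whether it survives.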
IEEE standard
The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754. This standard is followed by almost all modern machines. Notable exceptions include IBM mainframes, which have both hexadecimal and IEEE 754 data types, and Cray vector machines, where the T90 series had an IEEE version, but the SV1 still uses Cray floating-point format.
As of 2000, the IEEE 754 standard is under revision. See: IEEE 754r
Examples
- The value of pi, π = 3.1415926... in decimal, is equivalent to 11.001001000011111... in binary. When represented in a computer that allocates 17 bits for the significand, it becomes 0.11001001000011111 × 2^2. Hence the floating-point representation would start with bits 01100100100001111 and end with bits 10 (which represent the exponent 2 in the binary system). The first zero indicates a positive number; the ending 10 in binary equals 2 in decimal.
- The value −0.375 in decimal is −0.011 in binary, or −0.11 × 2^−1. In two's complement notation, −1 is represented as 11111111 (assuming 8 bits are used in the exponent). In floating-point notation, the number would start with a 1 for the sign bit, followed by 110000... and then followed by 11111111 at the end, or 1110...011111111 (where ... are zeros).
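The layout used in these two examples (sign bit first, then the significand, then a two's complement exponent at the end) can be mimicked in a few lines. This is a sketch under stated assumptions: the field widths of 16 significand bits and 8 exponent bits are chosen here for illustration, the significand is truncated rather than rounded, and the function name toy_encode is hypothetical.

```python
import math

SIG_BITS = 16   # assumed width of the stored significand field
EXP_BITS = 8    # assumed width of the two's complement exponent field

def toy_encode(x):
    """Encode x as sign | significand 0.1ddd... | exponent (illustrative layout)."""
    if x == 0.0:
        raise ValueError("zero has no normalized form in this layout")
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))              # abs(x) == m * 2**e, 0.5 <= m < 1
    if not -(2 ** (EXP_BITS - 1)) <= e < 2 ** (EXP_BITS - 1):
        raise ValueError("exponent does not fit in the exponent field")
    significand = int(m * 2 ** SIG_BITS)   # truncate to SIG_BITS bits
    exponent = e & (2 ** EXP_BITS - 1)     # two's complement encoding
    return (sign
            + format(significand, "0{}b".format(SIG_BITS))
            + format(exponent, "0{}b".format(EXP_BITS)))

print(toy_encode(math.pi))
print(toy_encode(-0.375))
```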
Hidden bit
When using binary (b = 2), one bit, called the hidden bit or the implied bit, can be omitted if all numbers are required to be normalized. The leading digit (most significant bit) of the significand of a normalized binary floating-point number is always non-zero; in particular, it is always 1. This means that this bit does not need to be stored explicitly, since for a normalized number it can be understood to be 1. The IEEE 754 standard exploits this fact. Requiring all numbers to be normalized means that 0 cannot be represented; typically some special representation of zero is chosen. In the IEEE standard this special code also encompasses denormal numbers, which allow for gradual underflow. The normalized numbers are also known as the normal numbers.
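The hidden bit can be seen by splitting an IEEE 754 double-precision number into its stored fields and rebuilding the value. The sketch below relies on the double-precision layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 stored fraction bits); the function names ieee754_fields and rebuild are hypothetical.

```python
import math
import struct

def ieee754_fields(x):
    """Return (sign, biased_exponent, fraction) of a double-precision value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction

def rebuild(sign, biased_exponent, fraction):
    """Rebuild a finite value from its fields, restoring the hidden bit."""
    if biased_exponent == 0x7FF:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if biased_exponent == 0:
        # Zero and denormal numbers: the special code, with no hidden bit.
        significand, exponent = fraction, -1074
    else:
        # Normal numbers: put the implied leading 1 back in front.
        significand, exponent = (1 << 52) | fraction, biased_exponent - 1023 - 52
    value = math.ldexp(significand, exponent)
    return -value if sign else value

print(ieee754_fields(1.0))      # fraction is all zeros although the significand is 1
print(ieee754_fields(-0.375))
print(rebuild(*ieee754_fields(-0.375)))
```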
Note
Although the examples in this article use a consistent system of floating-point notation, the notation is different from the IEEE standard. For example, in IEEE 754, the exponent is between the sign bit and the significand, not at the end of the number. Also, the IEEE exponent uses a biased integer instead of a two's complement number. The reader should note that the examples serve the purpose of illustrating how floating-point numbers could be represented, but the actual bits shown in the article are different from those in an IEEE 754-compliant representation. The placement of the bits in the IEEE standard enables two floating-point numbers to be compared bitwise (sans sign bit) to yield a result without interpreting the actual values. The arbitrary system used in this article cannot do the same.
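The bitwise comparison property can be illustrated by sorting non-negative values by their raw bit patterns. This is a sketch restricted to non-negative, non-NaN double-precision values, since the sign bit needs separate handling; the helper name bit_pattern is hypothetical.

```python
import struct

def bit_pattern(x):
    """Return the 64 bits of a double-precision value as an unsigned integer."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits

values = [1e100, 0.375, float("inf"), 5e-324, 1.0, 0.0, 3.25, 1e-10]

by_value = sorted(values)                  # ordinary numeric comparison
by_bits = sorted(values, key=bit_pattern)  # integer comparison of the raw bits

print(by_value == by_bits)
```

Because the biased exponent sits above the significand in the bit pattern, a larger exponent always yields a larger integer, which is what makes the integer ordering agree with the numeric one.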
References
- An edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March 1991 issue of Computing Surveys.
- David Bindel’s Annotated Bibliography on computer support for scientific computation.
- Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89684-2. Section 4.2: Floating Point Arithmetic, pp.214–264.
- Kahan, William and Darcy, Joseph (2001). How Java’s floating-point hurts everyone everywhere. Retrieved Sep. 5, 2003 from http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf.
- Introduction to Floating point calculations and IEEE 754 standard by Jamil Khatib
- Survey of Floating-Point Formats This page gives a very brief summary of floating-point formats that have been used over the years.
|