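When implemented in a microprocessor, this is typically faster than a multiply operation followed by a separate add. Because the product is not rounded before the addition, it also allows the low-order part of a product to be computed exactly. For example, in the following, H is the rounded product and L is its exact rounding error, so that H + L = A × B (barring underflow):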
H = FMA(A, B, 0.0)
L = FMA(A, B, −H)
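The H/L split above can be sketched in Python, whose floats are IEEE 754 doubles. This is a minimal illustration, not a reference implementation. It assumes `math.fma` (available only in Python 3.13 and later). On older versions it falls back to a hypothetical helper, `_fma_exact`, which emulates the single rounding by computing a*b + c exactly with rationals and rounding once. `two_product` is likewise an illustrative name.

```python
import math
from fractions import Fraction


def _fma_exact(a, b, c):
    # Hypothetical software stand-in for a fused multiply-add: the exact
    # rational value a*b + c is rounded once to the nearest double.
    # Finite inputs only.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# math.fma (Python 3.13+) performs the same single-rounding operation natively.
fma = getattr(math, "fma", _fma_exact)


def two_product(a, b):
    """Return (h, l) where h is a*b rounded and l is its exact rounding
    error, so h + l == a*b exactly (assuming no overflow or underflow)."""
    h = fma(a, b, 0.0)  # H = FMA(A, B, 0.0): the ordinary rounded product
    l = fma(a, b, -h)   # L = FMA(A, B, -H): a*b - h is formed before rounding
    return h, l
```

For instance, with a = b = 1 + 2**-30 the exact product is 1 + 2**-29 + 2**-60. The 2**-60 term is below the precision of a double near 1, so h comes out as 1 + 2**-29 and l recovers the missing 2**-60.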
This instruction is implemented on the PowerPC and Itanium processor families. With a fast FMA, a dedicated hardware divide or square-root unit is not strictly necessary, because both operations can be implemented in software as FMA-based iterations. Itanium, for example, has no floating-point divide instruction.
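A rough sketch of FMA-based division follows. It refines an initial reciprocal estimate with Newton-Raphson steps, where each step is two FMAs, and then corrects the quotient using an FMA-computed residual. Several parts are assumptions rather than any processor's actual sequence. The linear initial guess is the textbook 48/17 - 32/17·m estimate, whereas Itanium uses a hardware reciprocal-approximation instruction instead. The iteration count and the name `fma_divide` are illustrative. The result is typically within about an ulp of a / b, but this sketch does not guarantee correct rounding; production sequences add further steps and error analysis for that. As before, `math.fma` needs Python 3.13+, and `_fma_exact` is a hypothetical software stand-in.

```python
import math
from fractions import Fraction


def _fma_exact(a, b, c):
    # Hypothetical stand-in for a hardware FMA: exact a*b + c, rounded once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


fma = getattr(math, "fma", _fma_exact)  # math.fma exists in Python 3.13+


def fma_divide(a, b, iterations=4):
    """Approximate a / b for finite, nonzero b. The algorithm itself uses only
    multiplies, FMAs and exponent scaling; it has no divide step."""
    m, e = math.frexp(b)          # b == m * 2**e with 0.5 <= |m| < 1
    sign = math.copysign(1.0, m)
    m = abs(m)
    # Linear initial guess y ~ 1/m. The constants are 48/17 and 32/17,
    # precomputed, giving a relative error of about 1/17 on [0.5, 1).
    y = fma(-1.8823529411764706, m, 2.823529411764706)
    for _ in range(iterations):
        err = fma(-m, y, 1.0)     # err = 1 - m*y with a single rounding
        y = fma(y, err, y)        # Newton-Raphson step: y <- y + y*err
    y = sign * math.ldexp(y, -e)  # 1/b = sign * (1/m) * 2**-e
    q = a * y                     # first quotient estimate
    r = fma(-b, q, a)             # residual a - b*q
    return fma(r, y, q)           # corrected quotient q + r*(1/b)
```

Each Newton-Raphson step roughly squares the relative error, so about four steps take the 1/17 starting error below double precision.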
A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products, such as dot products, matrix multiplication, and polynomial evaluation with Horner's rule.
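Two common examples are dot products and polynomial evaluation by Horner's rule. In both, each update s <- s + x*y can be a single FMA, so each term incurs one rounding instead of two. The sketch below is illustrative only. `fma_dot` and `horner` are hypothetical names, `math.fma` requires Python 3.13+, and `_fma_exact` is a hypothetical software stand-in for older versions.

```python
import math
from fractions import Fraction


def _fma_exact(a, b, c):
    # Hypothetical stand-in for a hardware FMA: exact a*b + c, rounded once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


fma = getattr(math, "fma", _fma_exact)  # math.fma exists in Python 3.13+


def fma_dot(xs, ys):
    """Dot product in which each accumulation step s <- s + x*y is one FMA."""
    s = 0.0
    for x, y in zip(xs, ys):
        s = fma(x, y, s)
    return s


def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed from the highest
    degree down, using one FMA per coefficient (Horner's rule)."""
    p = 0.0
    for c in coeffs:
        p = fma(p, x, c)
    return p
```

With x = 1 + 2**-30, `fma_dot([-1.0, x], [1 + 2**-29, x])` returns 2**-60. Summing separately rounded products gives 0.0 instead, because rounding x*x to 1 + 2**-29 discards exactly that 2**-60 before the cancellation.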
No, I am referring to expanding the list of operations (add, sub, mul, div), and I was doing this in the context of the FMA discussion, which I tied to the multiprecision feature discussion.
What if a vendor decides to add a 48-bit significand type to its hardware, and then an application that uses it becomes popular?
I don't really see the implied contradiction. Surely you don't mean that, because bugs exist in the world, there is no benefit to standardizing arithmetic or to simplifying the programming task.