Disclaimer: If you are not terribly interested in numerics and mathematical processes, this is most likely not for you.
I am currently a bit stuck in the development of a private project I have been working on for a long time, and I think I need some input because I cannot decide on the best approach. Unfortunately, I must explain the problem a bit first.
EDIT: This is NOT about rounding implementation. That is already done, both for floats/doubles and for arbitrary precision. You can assume I have enough knowledge of numerics, rounding modes and floating-point problems; the question is how to design rounding operators in a stack-based language.
I am interested in the reproducibility problem in numerical computations. Because of the immense speed of modern hardware (billions of operations per second) and certain problems with the languages used for number crunching (C and FORTRAN), people cannot check what the machine is actually doing versus what it is supposedly doing: deterioration of language standards (indifference to qNaNs and sNaNs, licence to "optimize" as if a+(b+c) == (a+b)+c held), x87 vs SSE floating point, silent FMA contraction, etc. etc. This view is shared by William Kahan, the designer of the x87 floating-point architecture.
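To make the reassociation point concrete, here is a minimal sketch (Python here, but the effect is identical in C or FORTRAN) of why a compiler that silently rewrites a+(b+c) as (a+b)+c changes results:

```python
# IEEE 754 double addition is not associative: 1.0 is below the spacing
# of representable doubles near 1e16, so it is absorbed or preserved
# depending on evaluation order.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- c is absorbed by b before a can cancel b
```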
So I have been working on a stack-based language, similar to FORTH, which allows reproducible results. It has two kinds of types: aggregates (Vector, Matrix, Tensor) and numbers (complex, reals). You push two operands onto the stack, execute an ADD, and you get the resulting sum. If the operands were two floats, you get a float. If they were two doubles, you get a double. If they were two arbitrary-precision numbers, you get an arbitrary-precision number. The same holds for vectors and matrices.
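To illustrate the stack discipline, here is a minimal Python sketch; the push/ADD names mirror the description above and are otherwise hypothetical:

```python
from decimal import Decimal

stack = []

def push(x):
    stack.append(x)

def ADD():
    # Pop two operands; the result type is determined by the operand type.
    b, a = stack.pop(), stack.pop()
    if type(a) is not type(b):
        raise TypeError("mixed-type ADD is rejected for reproducibility")
    push(a + b)  # float+float -> float, Decimal+Decimal -> Decimal, ...

push(Decimal("2.5")); push(Decimal("0.1")); ADD()
print(stack.pop())  # Decimal('2.6')
```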
My nagging problem is rounding. If we exclude exact rationals, division and every higher operation (algebraic and transcendental) require rounding. There are two rounding operations, both already implemented: place rounding, which fixes the decimal position at which to round (rounding down: 2.345 at position 0 => 2; 2.345 at position -1 => 2.3), and significant-digit rounding, which fixes the length of the result (rounding down: 2.345 to 1 digit => 2; 2.345 to 2 digits => 2.3).
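For reference, the two operators behave like this sketch (Python's decimal module stands in for the arbitrary-precision type; the function names are mine):

```python
from decimal import Decimal, ROUND_DOWN

def round_place(x: Decimal, pos: int) -> Decimal:
    # Place rounding: round at the fixed decimal position 10^pos.
    return x.quantize(Decimal(1).scaleb(pos), rounding=ROUND_DOWN)

def round_digits(x: Decimal, n: int) -> Decimal:
    # Significant-digit rounding: keep n significant digits.
    pos = x.adjusted() - n + 1  # position of the n-th significant digit
    return x.quantize(Decimal(1).scaleb(pos), rounding=ROUND_DOWN)

x = Decimal("2.345")
print(round_place(x, 0), round_place(x, -1))   # 2  2.3
print(round_digits(x, 1), round_digits(x, 2))  # 2  2.3
```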
The approaches:
One ring to rule them all: Only one rounding setting for all available types. Problem: Some operations are exact and should be executed as such; arbitrary-precision numbers offer exact addition and exact multiplication. If I define e.g. ADD_EXACT, I introduce a keyword which is not implemented by most datatypes and will therefore degrade to a normal ADD, and the danger is that I (or others) forget to use ADD_EXACT when it is necessary. Another problem is that it makes sense to use different rounding operations for different operators: place rounding for addition and significant-digit rounding for multiplication. I fear swamping programs with unnecessary switches just to set the rounding mode.
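A sketch of what the single-setting approach looks like in practice (Python again; all names hypothetical), including the ADD_EXACT trap:

```python
from decimal import Decimal, ROUND_DOWN

# The one ring: a single global rounding setting consulted by every operator.
ROUNDING = {"param": -1, "mode": ROUND_DOWN}

def apply_rounding(x: Decimal) -> Decimal:
    return x.quantize(Decimal(1).scaleb(ROUNDING["param"]),
                      rounding=ROUNDING["mode"])

def ADD(a, b):
    return apply_rounding(a + b)  # rounds even when a + b would be exact

def ADD_EXACT(a, b):
    return a + b  # only arbitrary precision honours this; easy to forget
```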
Perhaps some more rings...: Several independently settable rounding contexts, so that e.g. addition and multiplication can be configured separately. Problem: many more possibilities (setting and retrieving the rounding mode, rounding operation and rounding parameter for each context). How many contexts do I offer? Only two, risking that e.g. FMA cannot be supported, or many more, risking overwhelming programs with far too much rounding bureaucracy?
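The multi-context variant, for contrast (same hypothetical names); note how much state-management vocabulary it already requires:

```python
from decimal import ROUND_DOWN, ROUND_HALF_EVEN

# One rounding context per operator class; the set of contexts is fixed
# up front, so an operation without its own context (e.g. FMA) cannot be
# configured at all.
contexts = {
    "ADD": {"kind": "place",  "param": -1, "mode": ROUND_DOWN},
    "MUL": {"kind": "digits", "param": 2,  "mode": ROUND_HALF_EVEN},
}

def set_rounding(op, kind, param, mode):
    contexts[op] = {"kind": kind, "param": param, "mode": mode}

def get_rounding(op):
    return contexts[op]
```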
Rounding is a crucial operation, and once implemented it cannot be changed easily. This must be solved as well as possible.
What I need is a simple and elegant model, and I fear that I have worked on this so long that I simply cannot see it anymore and/or need a plan to find the right solution. Help is much appreciated, and I will update this post to clarify aspects I may have forgotten.
EDIT: Performance: Is not the design goal. It does not help you to get an incredibly fast wrong answer. It does not help you if two programmers on different machines get different (or even similar) results when no one knows what the right answer is (or whether both are wrong). The goal of the program is exactly that: find out when IEEE 754R precision will fail, and find indicators and methods to validate numeric results. If you already know that double precision will get you to the solution, by all means use C and FORTRAN.