
Disclaimer: If you are not terribly interested in numerics and mathematical processes, this is most likely not for you.

I am currently a bit stuck in the development of a private project I have followed for a long time, and I think I need some input because I cannot decide how best to proceed. Unfortunately, I must explain the problem a bit.

EDIT: This is NOT about implementing rounding. That is already done, both for floats/doubles and for arbitrary precision. You can assume I have sufficient knowledge of numerics, rounding modes and floating-point problems; the problem is how to design rounding operators in a stack-based program.

I am interested in the reproducibility problem in numerical computations. Due to the immense power available (billions of operations per second) and certain problems caused by the languages used for number crunching (C and FORTRAN), people cannot check what the machine is actually doing, in contrast to what the machine is supposedly doing (deterioration of language standards, such as indifference to qNaNs and sNaNs, allowing a+(b+c) == (a+b)+c to be "optimized", x87 vs. SSE floating point, silent FMA support, etc.). This view is shared by William Kahan, the designer of the x87 floating-point architecture.

So I worked on a stack-based language, like FORTH, which allows reproducible results. You have types: aggregates (Vector, Matrix, Tensor) and numbers (complex, reals). You put two instances on the stack, execute an ADD, and you get the resulting sum. If the operands were two floats, you get a float. If they were two doubles, you get a double. If they were two arbitrary-precision numbers, you get an arbitrary-precision number. The same goes for vectors and matrices.
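The type-preserving ADD described above could be sketched as follows (a minimal illustration in Python; the `Stack` class and its method names are hypothetical, not part of the actual project):

```python
from fractions import Fraction

class Stack:
    """Minimal sketch of a FORTH-like operand stack."""
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def add(self):
        # Pop two operands; the result keeps the operands' type,
        # so float + float -> float, Fraction + Fraction -> Fraction, etc.
        b = self.items.pop()
        a = self.items.pop()
        if type(a) is not type(b):
            raise TypeError("operands must share a type for reproducibility")
        self.items.append(a + b)

s = Stack()
s.push(Fraction(1, 3))
s.push(Fraction(1, 6))
s.add()
print(s.items[-1])  # 1/2 -- exact rational addition, type preserved
```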

My aching problem is rounding. If we exclude exact rationals, division and every higher operation (algebraic and transcendental operations) require rounding. Two rounding operations are already implemented: place rounding, which sets the exact position at which to round (rounding down: 2.345 at place 0 => 2; at place -1 => 2.3), and significant-digit rounding, which sets the length of the result (rounding down: 2.345 with 1 digit => 2; with 2 digits => 2.3).
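Both rounding operations can be sketched with Python's `decimal` module (`round_place` and `round_sig` are illustrative names; the actual implementation lives in the stack language):

```python
from decimal import Decimal, ROUND_DOWN

def round_place(x: Decimal, p: int) -> Decimal:
    """Place rounding: round down at fixed position 10**p."""
    return x.quantize(Decimal(1).scaleb(p), rounding=ROUND_DOWN)

def round_sig(x: Decimal, n: int) -> Decimal:
    """Significant-digit rounding: keep n leading digits, rounding down."""
    if x == 0:
        return x
    # x.adjusted() is the exponent of the most significant digit
    return x.quantize(Decimal(1).scaleb(x.adjusted() - (n - 1)),
                      rounding=ROUND_DOWN)

x = Decimal("2.345")
print(round_place(x, 0))   # 2
print(round_place(x, -1))  # 2.3
print(round_sig(x, 1))     # 2
print(round_sig(x, 2))     # 2.3
```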

The approaches:

  • One ring to rule them all: only one rounding setting for all available types. Problem: some operations are exact and should be executed as such; arbitrary-precision numbers offer exact addition and exact multiplication. If I define e.g. ADD_EXACT, I introduce a keyword that most datatypes do not implement, so there it degenerates to a normal ADD, and the danger is that I (or others) forget to use ADD_EXACT when necessary. Another problem is that it makes sense to use different rounding modes for different operations: place rounding for addition and significant-digit rounding for multiplication. I fear swamping the program with unnecessary switches to set the rounding mode.

  • Perhaps some more rings....: several settable rounding modes. This allows addition and multiplication to be configured independently. Problem: many more possibilities (setting and retrieving the rounding mode, rounding operation and rounding parameter). How many do I offer? Only two, risking that e.g. FMA cannot be supported, or many more, risking that the program is overwhelmed with far too much rounding bureaucracy?

Rounding is a crucial operation and, once implemented, it cannot be changed easily. This must be solved as well as possible.

What I need is a simple and elegant model. I fear that I have worked on this so long that I simply do not see it and/or need a plan for finding the right solution. Help is much appreciated, and I will update this post to clarify any aspects I may have forgotten.

EDIT: Performance is not the design goal. It does not help you to get an incredibly fast wrong answer. It does not help you if two programmers on different machines get different (or even similar) results when no one knows what the right answer is (or whether both are wrong). The goal of the program is exactly that: find out when IEEE 754R precision will fail, and find indicators and methods to validate numeric results. If you know that double precision will get you to the solution, by all means use C or FORTRAN.

  • Alright, wait a minute. First of all, floating-point calculations are not designed to be exact, but only accurate to within a certain precision, and the IEEE standards already say that. Secondly, for the most part, a NaN is a NaN. Most implementations are indifferent to NaN flavor for the same reasons that chemists are indifferent to quarks; it doesn't matter. Third, optimizations such as SSE and numeric coprocessors are implementations. Implementations are not guaranteed to produce exactly the same results, because they don't have to; they just have to meet the spec for precision. – Commented May 9, 2015 at 18:22
  • See also Rounding, where all of the different rounding types are described. In short, I don't think you can adequately simplify this problem unless you take a stand, like "Use Banker's Rounding Exclusively." Some platforms allow you to specify the type of rounding you want; others use one type of rounding throughout. – Commented May 9, 2015 at 18:36

2 Answers


A couple of notes:

  • I don't understand your ADD_EXACT problem. If the operation is exact, don't round. Alternatively, if there has to be a rounding step (this includes most FPU operations), it must not introduce error.
  • FMA should probably be a completely separate operation from both addition and multiplication, since its rounding behavior is fundamentally different: It rounds only once. The two rounding steps of separate-multiply-add would have to be prescient to match FMA rounding, regardless of the exact rounding strategy.
  • The terms you use to talk about rounding worry me. It sounds like you're unaware of, or departing from, binary floating point and how rounding can actually vary in IEEE-754 and in general (round to nearest even, round toward zero, round toward positive infinity, etc.).
  • You seem inexplicably focused on the aesthetic but largely inconsequential concern of how to name the digit to which one rounds. No mention of the actual rounding strategy (see above).
  • Design-wise, there should be a default rounding mode, with the option to explicitly use others. Whether to make this choice on an operation-by-operation basis, with a lexically scoped with_rounding mode { ... code ... } block, or dynamically scoped (i.e. setting a global or thread-local variable that controls all rounding everywhere) is a very hard question. All options have severe drawbacks and I don't feel qualified to have an opinion on this.
  • You pitch your design as a potential alternative to FORTRAN, C, etc. for number crunching, but making performance competitive with (occasionally inaccurate) FPUs built in silicon is hard to impossible, and you don't seem worried about it. You will have to deal with binary, fixed-width floating point (i.e., IEEE-754) and the idiosyncrasies of its hardware implementations. (The essentially dynamic typing won't help either.)
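The FMA point above can be made concrete by simulating both strategies with Python's `decimal` module at 3 significant digits (the values are chosen purely to make the two results diverge; this is a sketch, not the answer's prescription):

```python
from decimal import Decimal, localcontext

a, b, c = Decimal("1.01"), Decimal("1.01"), Decimal("-1.01")

with localcontext() as ctx:
    ctx.prec = 3
    separate = a * b + c   # rounds twice: 1.0201 -> 1.02, then 1.02 - 1.01

with localcontext() as ctx:
    ctx.prec = 28
    exact = a * b + c      # enough precision: no rounding yet (0.0101)
with localcontext() as ctx:
    ctx.prec = 3
    fused = +exact         # unary + rounds once, like an FMA would

print(separate)  # 0.01
print(fused)     # 0.0101
```

The two rounding steps of separate multiply-add lose the trailing digit that the single rounding of the fused form keeps.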
  • Ok, some clarifications -- ADD_EXACT: IEEE 754(R) automatically introduces a rounding error most of the time; for arbitrary-precision numbers this is never a concern. Terms about rounding: I am aware of that, and it is already supported. Design-wise: yeah, that is the problem. That it is very hard gives me the self-assurance that I am not too dumb, but is not much help. :) Performance: see above. – Commented May 9, 2015 at 19:04
  • @ThorstenS. About ADD_EXACT: Yes, there often is error, but it's unavoidable without leaving the format behind entirely. However, if the result can be represented exactly, it is. What's the problem again? I guess I don't see what you want ADD_EXACT to achieve. Promote to an arbitrary precision type? (If so, why does that have to be part of the addition operation?)
    – user7043
    Commented May 9, 2015 at 19:13
  • Ok, I'll try to explain: look at the formula for variance. Quote: "The naive implementation should not be used in practice because of floating-point errors." The problem: if you have exact addition and multiplication, the naive approach is not only no longer bad, it is the best approach. There are many other algorithms where floating point is disadvised, but where arbitrary precision makes the argument invalid. – Commented May 9, 2015 at 19:23
  • @ThorstenS. So... offer arbitrary precision types, allow converting floats to those types and back, and implement those algorithms in terms of those. In other words, why do you need to associate the exact-ness to the operation?
    – user7043
    Commented May 9, 2015 at 19:49
  • Because sometimes it is actually wise to limit the precision of addition and multiplication. Let's say I want a relatively precise logarithm of a number. This can be calculated by an iterative algorithm using only base operations. But if you do not round intermediate results at the n-th digit, your algorithm becomes unbearably slow. You need to set the precision for the multiplication, because then the process can ignore partial products and speed up. Sometimes an exact ADD and MUL is very nice. Losing 1-3 orders of magnitude for operations is acceptable, but 7 or more orders... no. – Commented May 9, 2015 at 20:07
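The variance example from this thread can be made concrete. A sketch (illustrative data; `naive_var` is a hypothetical helper): with IEEE-754 doubles the naive formula loses the answer to cancellation, while the very same formula over exact rationals is correct:

```python
from fractions import Fraction

def naive_var(xs):
    """Naive population variance: E[x^2] - E[x]^2."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

# Values near 1e8; the true variance of the offsets {4, 7, 13, 16} is 22.5
data = [100000004.0, 100000007.0, 100000013.0, 100000016.0]
print(naive_var(data))                         # 22.0 in doubles (wrong)
print(naive_var([Fraction(x) for x in data]))  # 45/2, i.e. 22.5 (exact)
```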

Due to the inexplicable power of rubber-duck debugging, I now have a solution I find better than all the other approaches I thought of.

I will implement a rounding stack. When a subroutine is entered, the caller's last entry is duplicated and pushed as the first element on the stack (it is also the entry used if the stack is completely cleared). The stack is strictly local; access to the rounding values of calling routines is not possible. When the subroutine is left, the stack is cleared; no survivors.

The stack has the following operations: ROUND_DUP (to save the original state if necessary), ROUND_PUSH, ROUND_POP, ROUND_CYCLE (see below) and accessor functions for the current rounding.

ROUND_CYCLE cycles through the rounding modes. If we have 3 rounding modes on the stack, this means rounding mode 1, 2, 3, 1, 2, 3, ... If we have only one rounding mode on the stack, the instruction does nothing.
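A minimal model of the rounding stack and ROUND_CYCLE semantics, assuming the rotation behavior described above (Python only simulates the design; the mode names are placeholders):

```python
class RoundingStack:
    """Sketch of the per-subroutine rounding stack."""
    def __init__(self, inherited_mode):
        # on subroutine entry, the caller's current mode is duplicated in
        self.modes = [inherited_mode]

    def push(self, mode):      # ROUND_PUSH
        self.modes.append(mode)

    def pop(self):             # ROUND_POP
        if len(self.modes) > 1:
            self.modes.pop()

    def dup(self):             # ROUND_DUP
        self.modes.append(self.modes[-1])

    def cycle(self):           # ROUND_CYCLE: rotate; no-op with one entry
        if len(self.modes) > 1:
            self.modes.insert(0, self.modes.pop())

    @property
    def current(self):         # accessor for the current rounding
        return self.modes[-1]

rs = RoundingStack("SIG_DIGITS_20")
rs.push("EXACT")
print(rs.current)  # EXACT
rs.cycle()
print(rs.current)  # SIG_DIGITS_20
rs.cycle()
print(rs.current)  # EXACT again; with one mode, cycle() would do nothing
```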

Added to these are test functions: EXACT_ADD_SUPPORTED and EXACT_MUL_SUPPORTED. (It is not advisable to test only IS_ARBITRARY_PRECISION, because the system may have a long accumulator for addition (even in hardware) and still not support exact multiplication.)

So I can write general functions for both arbitrary precision and fixed precision.

    EXACT_ADD_SUPPORTED IF ROUND_PUSH EXACT ENDIF
    DO
        ....
        ROUND_CYCLE ADD ROUND_CYCLE
    LOOP

If the type is arbitrary precision, the two rounding values are switched around the ADD and I get exact addition. If it is fixed precision, only one rounding value remains on the stack and is used; ROUND_CYCLE does nothing. So I can set the necessary roundings at the beginning of the program, making them easy to find and understand. Given that CYCLE only switches, I must think in advance about how the rounding modes must be set during execution, avoiding superfluous and unstructured approaches. And I have only one current rounding mode, avoiding overhead.

If you have improvements or criticism, feel free to share them.
