4
\$\begingroup\$

I have this data structure to which the user can add number or remove them, both in constant time. Also, there is a \$O(1)\$ operation for querying the standard deviation of the current contents of the structure:

StandardDeviationNumberSet.java

package net.coderodde.stat; import java.util.HashMap; import java.util.Map; public class StandardDeviationNumberSet { private double sum; private double squareSum; private int numberOfElements; private final Map<Double, Integer> countMap = new HashMap<>(); public void add(Number number) { double numberValue = number.doubleValue(); Integer count = countMap.get(numberValue); if (count != null) { countMap.put(numberValue, count + 1); } else { countMap.put(numberValue, 1); } sum += numberValue; squareSum += numberValue * numberValue; numberOfElements++; } public double remove(Number number) { double numberValue = number.doubleValue(); Integer count = countMap.get(numberValue); if (count != null) { if (count > 1) { countMap.put(numberValue, count - 1); } else { countMap.remove(numberValue); } sum -= numberValue; squareSum -= numberValue * numberValue; numberOfElements--; return numberValue; } else { return Double.NaN; } } public double getStandardDeviation() { checkSetHasAtleastTwoElements(); double step1 = squareSum - sum * sum / numberOfElements; double step2 = step1 / (numberOfElements - 1); return Math.sqrt(step2); } private void checkSetHasAtleastTwoElements() { if (numberOfElements < 2) { throw new IllegalStateException( "Computing the standard deviation requires at least two " + "elements."); } } public static void main(String[] args) { StandardDeviationNumberSet set = new StandardDeviationNumberSet(); set.add(1); set.add(1); set.add(3); System.out.println(set.getStandardDeviation()); set.remove(1); System.out.println(set.getStandardDeviation()); } } 

As always, any critique is much appreciated.

\$\endgroup\$

    2 Answers 2

    4
    \$\begingroup\$

    Overall I don't see any major problems with the code. There are some things I would change though:

    Write unit tests.

    Whenever I deal with mathematical classes, I think that it is obligatory to provide unit tests to verify correct computation and computational precision.

    Add JavaDoc

    Please add a JavaDoc to clarify whether it computes sample or population statistics.

    Add comments

    The use numberValue instead of number in the look ups to countMap has a significant but subtle difference that is easy to miss if you're not an expert. I would add comments to clarify this (you are correctly avoiding Integer(3) and Double(3) getting different counts).

    Use map.getOrDefault

    Since Java 8, which we have to assume is wide spread by this point, you can replace:

     Integer count = countMap.get(numberValue); if (count != null) { countMap.put(numberValue, count + 1); } else { countMap.put(numberValue, 1); } 

    with

     countMap.put(numberValue, countMap.getOrDefault(numberValue, 0) + 1); 

    Consider handling NaN and Inf inputs

    Currently your code will silently break if any of the inputs contain NaN or Infinity. I would consider testing for these inputs and throwing if they are encountered.

    Return type of remove is weird

    As far as I can tell remove will return the input value or NaN if the input value was not removed. To me this feels very strange, I would much rather it'd return true if the statistics changed and false otherwise.

    Change API to use double instead of Number

    Currently your methods like public void add(Number number) first unbox the number into a double then rebox it into a Double. So calling add with a Double causes unnecessary boxing and unboxing.

    Further more calling add with any of the primitive types (int,long,float etc) causes first a boxing to Number, then an unboxing to double then a re-boxing to Double (to use in countMap.get). This is a lot of unnecessary boxing and unboxing and will also generate a compiler warning on many systems.

    So I would change:

    public void add(Number number) { double numberValue = number.doubleValue(); Integer count = countMap.get(numberValue); if (count != null) { countMap.put(numberValue, count + 1); } else { countMap.put(numberValue, 1); } sum += numberValue; squareSum += numberValue * numberValue; numberOfElements++; } 

    to:

    public void add(double number) { // JIT will use CSE to consolidate the auto boxings of number. countMap.put(number, countMap.getOrDefault(number, 0) + 1); sum += number; squareSum += number* number; numberOfElements++; } public void add(Number number){ add(number.doubleValue()); } 

    and similarily for remove.

    I would prefer to use the primitive double in the API because more often than not the user will do some computations to end up with the input value and will thus have a primitive double that they want to pass in. Other primitive types will automatically be type promoted to double. In the odd case that the user actually has a Number of some sort, we add an overload to handle it.

    Add methods for querying the count and mean as well

    You have all the data to get the mean value and sample count as well and these are commonly used in conjunction with the standard deviation so to me it makes sense to provide these methods as well as they are trivial.

    Edit/Addendum

    I would consider leaving the validation of removed values to the user. By having the countMap you add \$O\left(n\right)\$ memory requirement even if the user doesn't need the remove ability. And in many cases where you do need a remove (like a sliding window) the removal is guaranteed to be valid by the definition of the window.

    \$\endgroup\$
      3
      \$\begingroup\$

      I only have a very few nitpicks:

      1. It's not technically \$O(1)\$, it's amortized constant time (\$O(1)\$), as you distribute cost of the calculation over the additions / removals. Not really an issue, but maybe you should make that distinction in your documentation.

      2. Use countMap.put(number, ...) instead of countMap.put(numberValue, ...), as it preserves the caller's object identities that way. This would entail changing the type parameterization of countMap to be Map<Number, Integer>, though.

        Considering comments (especially by @EmilyL), you should leave the implementation as-is, but you should put in a comment that you are only interested in the numeric value, not the actual Number object.

      3. Consider marking your formal parameters as final, as you do not mutate them.

      4. Make remove()'s return type void, as the user has no need to get the same number which was passed as an argument returned. Rather, throw an exception, preferably java.util.NoSuchElementException if the relevant number could not be found in countMap.

      5. An alternative to (4) is to make remove()'s return type boolean, and return true on successful removal of number from the set, or false if the removal was not successful.

      6. Instead of throwing an IllegalStateException when asked to compute the standard deviation when the set has less than 2 elements, return Double.NaN for such sets, as in the 0- & 1-element sample sizes, the standard deviation is mathematically undefined. That does not warrant an exception, as there is a value which indicates exactly undefined: NaN. Here I assume that the standard deviation is calculated for a sample, not a population, because in that case the standard deviation for 0- & 1-element sets would be 0 (although computation of population statistics is not common in practical usage).

      \$\endgroup\$
      4
      • \$\begingroup\$Using countMap.put(number instead of numberValue is incorrect from a mathematical point of view because assertEquals(new Integer(3), new Double(3.0)) will fail. And thus integer 3 and double 3.0 would appear as different counts although they are the same value. OPs code is correct. Although it should be commented because it is far from obvious why it is done that way.\$\endgroup\$
        – Emily L.
        CommentedMar 27, 2017 at 8:00
      • \$\begingroup\$@Emily L. HashMap grows as well when the load factor is exceeded. It's still constant time amortized.\$\endgroup\$
        – coderodde
        CommentedMar 27, 2017 at 10:31
      • \$\begingroup\$@coderodde yeah, my bad. Forgot to consider the put operation.\$\endgroup\$
        – Emily L.
        CommentedMar 27, 2017 at 10:49
      • \$\begingroup\$@EmilyL. Right you are. I clarified. However, keep your comment around as otherwise my answer loses context.\$\endgroup\$CommentedMar 27, 2017 at 11:20

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.