1
\$\begingroup\$

I am working on writing a discrete random variable generator function in excel vba. I've pulled this off a number of ways but here is the one I like the most.

Imagine in excel we have the following in cells A1:B3

red 5 blue 7 yellow 3 

The following code is modeled as follows. Imagine drawing a line going from 0 to 15 (where 15 is 5 + 7 + 3). The line is red from [0,5) and blue from [5,12) and yellow from [12,15). And you pick a uniformly distributed random number, x, from 0 to 15. And wherever on the line x falls, that is the color that wins.

Here is the function:

Function discrete(r1 As Range, r2 As Range) 'shrink column ranges to contiguous cells Set r1 = Range(r1(1), r1(1).End(xlDown)) Set r2 = Range(r2(1), r2(1).End(xlDown)) 'convert ranges to arrays Dim arr1, arr2 As Variant arr1 = r1.Value arr2 = r2.Value Dim n As Long n = Application.Sum(arr2) 'sum total of 2nd array Dim x As Double x = n * Rnd() 'generate a random number 0 <= x < n Dim min, max As Double min = 0 max = 0 Dim i As Long For i = LBound(arr1, 1) To UBound(arr1, 1) 'where on the line x falls max = max + arr2(i, 1) If x >= min And x < max Then discrete = arr1(i, 1) End If min = min + arr2(i, 1) Next i End Function 

I'm looking for some critique of my code and approach.

  1. Assuming I'm not picking from 1000 values (column a), is it reasonable that I convert the ranges to arrays? Is this a time saving step? Or should I just work with the ranges? What I mean is, does this step pay for itself computationally given that I may always be working with only a few rows of data? Or might it be worth it solely on that basis that I may indeed supply this function with large ranges?

  2. Even if you supply a full column, ie. discrete(range("a:a"), range("b:b")), the range inside the function is reset to only those cells that are contiguous starting from the first cells of the supplied ranges. Is that a reasonable approach?

  3. The algorithm itself relies on a for loop and 2 simple variables (min, max). I this a lightweight enough approach? Or are there some generally accepted practices I'm not aware of?

\$\endgroup\$

    1 Answer 1

    4
    \$\begingroup\$

    Option Explicit

    Option explicit. Always.

    Strong typing

    VBA facilitates strong typing. This should be used because it stops VBA guessing at types which improves performance and reduces the chance of coding errors. Especially as some types/objects have default values that are not well documented!

    This is a function, so what does it return? At the moment, it returns a Variant which might be OK, but your code does not reflect this requirement.

    Public Function discrete(r1 As Range, r2 As Range) As String 

    If you want it to be Variant, then be explicit and state that!

    Also, another gotcha is the way variables are declared. The following lines are not doing what you think they are doing:

    Dim arr1, arr2 As Variant Dim min, max As Double 

    In these lines, both the first variables are not declared and thus they default to Variant. Again, if you want a variable to be a Variant, then be explicit (like you tried to be in the first line above). What you really wanted was:

    Dim arr1 As Variant, arr2 As Variant Dim min As Double, max As Double 

    By the way, kudos for declaring variables near where you are using them.

    Proper qualification

    Always put proper qualification on your objects. Excel uses defaults for things that are not properly qualified - the active sheets and cells may not be what you want! Especially in a UDF and you could have multiple workbooks open!

    Range(r2(1), r2(1).End(xlDown)) 

    should be

    wb.ws.Range(r2(1), r2(1).End(xlDown)) ' wb and ws is something you define here - more notes below. 

    Default value for discrete

    You only set the value for discrete in a conditional (If) statement in a loop. What if it never fires? What would you like the user to see?

    A simple option is to set a default value at the beginning - the most obvious is an error message. For this to work, the UDF should return a variant:

    Public Function discrete(r1 As Range, r2 As Range) As Variamt discrete = CVErr(xlErrValue) '<-- pick an error value to suit yourself 

    Now that we have set that up - what are some other ways this can fail?

    • user picks ranges that re not one column wide?
    • user picks two ranges that are a different size?

    Now you can test for this early and then exit the function - the default value will be an error and the user will have an indicator of this.

    Altering the parameters

    This is bad:

    Function discrete(r1 As Range, r2 As Range) […] Set r1 = Range(r1(1), r1(1).End(xlDown)) 

    A big no-no. You should always consider the parameters on a UDF as immutable. If you want to reset the ranges, then use a temporary variable within the Function scope (e.g. Dim tempRange1 as Range).

    However, you are double-guessing the user. What if they did not want to go to the end of the page? What if they had other stuff under the table?

    By doing this manipulation, you are increasing the chances that the value arrays will be mismatched. arr1(found,1) may no longer have an equivalent arr2(found,1)!

    Working with arrays

    Yes, always practice working with arrays where you can. Even replace the Application.Sum call. You can never tell when your user is going to set up a 1000-value discrete array!

    Some suggestions

    1. Use only a single range in the UDF call (Public Function discrete(twoColTable As Range) As Variant. This reduces the opportunity for user error. You array will be two columns arr(found,1) will correspond with arr(found,2)
    2. Check your user input immediately for size and values to make sure you are getting what you think you should. If column count does not equal 2, or your second column is not ascending numbers then exit with an error.
    3. Traverse the array from beginning until you find a value that is greater then your n. The value you want will be the one before you hit that value (i.e. go back one array index). Alternatively, traverse the array from the end until you find a value that is not greater than your n. This way, you do not need max or min.
    \$\endgroup\$
    4
    • \$\begingroup\$You've brought up a lot of interesting points. In fact all your major points give me something to think about and improve not just for my code above but in general.\$\endgroup\$CommentedNov 16, 2019 at 19:03
    • \$\begingroup\$Puzzling indeed\$\endgroup\$CommentedNov 18, 2019 at 17:22
    • \$\begingroup\$@AJD OMG, sorry. I just realized this I inadvertently down voted this answer. Lesson learned, don't hand your toddler your phone with SO open in the background. Nice answer BTW :) +1\$\endgroup\$CommentedNov 26, 2019 at 2:33
    • \$\begingroup\$@RyanWildry :-)\$\endgroup\$
      – AJD
      CommentedNov 26, 2019 at 18:45

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.