Thank you all for all your great answers. I ended up with the following solution, that I would like to share.
Before I go into any more detail about the whys and hows, here's the tl;dr: my shiny new script :-)
#!/usr/bin/env bash # # Generates a random integer in a given range # computes the ceiling of log2 # i.e., for parameter x returns the lowest integer l such that 2**l >= x log2() { local x=$1 n=1 l=0 while (( x>n && n>0 )) do let n*=2 l++ done echo $l } # uses $RANDOM to generate an n-bit random bitstring uniformly at random # (if we assume $RANDOM is uniformly distributed) # takes the length n of the bitstring as parameter, n can be up to 60 bits get_n_rand_bits() { local n=$1 rnd=$RANDOM rnd_bitlen=15 while (( rnd_bitlen < n )) do rnd=$(( rnd<<15|$RANDOM )) let rnd_bitlen+=15 done echo $(( rnd>>(rnd_bitlen-n) )) } # alternative implementation of get_n_rand_bits: # uses /dev/urandom to generate an n-bit random bitstring uniformly at random # (if we assume /dev/urandom is uniformly distributed) # takes the length n of the bitstring as parameter, n can be up to 56 bits get_n_rand_bits_alt() { local n=$1 local nb_bytes=$(( (n+7)/8 )) local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ") echo $(( rnd>>(nb_bytes*8-n) )) } # for parameter max, generates an integer in the range {0..max} uniformly at random # max can be an arbitrary integer, needs not be a power of 2 rand() { local rnd max=$1 # get number of bits needed to represent $max local bitlen=$(log2 $((max+1))) while # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM rnd=$(get_n_rand_bits $bitlen) (( rnd > max )) do : done echo $rnd } # MAIN SCRIPT # check number of parameters if (( $# != 1 && $# != 2 )) then cat <<EOF 1>&2 Usage: $(basename $0) [min] max Returns an integer distributed uniformly at random in the range {min..max} min defaults to 0 (max - min) can be up to 2**60-1 EOF exit 1 fi # If we have one parameter, set min to 0 and max to $1 # If we have two parameters, set min to $1 and max to $2 max=0 while (( $# > 0 )) do min=$max max=$1 shift done # ensure that min <= max if (( min > max )) then echo "$(basename $0): error: min is greater than max" 1>&2 exit 1 fi # need absolute value of diff since min (and also max) may be negative diff=$((max-min)) && diff=${diff#-} echo $(( $(rand $diff) + min ))
Save that to ~/bin/rand
and you have at your availability a sweet random function in bash that can sample an integer in a given arbitrary range. The range may contain negative and positive integers and can be up to 260-1 in length:
$ rand Usage: rand [min] max Returns an integer distributed uniformly at random in the range {min..max} min defaults to 0 (max - min) can be up to 2**60-1 $ rand 1 10 9 $ rand -43543 -124 -15757 $ rand -3 3 1 $ for i in {0..9}; do rand $((2**60-1)); done 777148045699177620 456074454250332606 95080022501817128 993412753202315192 527158971491831964 336543936737015986 1034537273675883580 127413814010621078 758532158881427336 924637728863691573
All ideas by the other answerers were great. The answers by terdon, J.F. Sebastian, and jimmij used external tools to do the task in a simple and efficient manner. However, I preferred a true bash solution for maximum portability, and maybe a little bit, simply out of love for bash ;)
Ramesh's and l0b0's answers used /dev/urandom
or /dev/random
in combination with od
. That's good, however, their approaches had the disadvantage of only being able to sample random integers in the range 0 to 28n-1 for some n, since this method samples bytes, i.e., bitstrings of length 8. These are quite big jumps with increasing n.
Finally, Falco's answer describes the general idea how this could be done for arbitrary ranges (not only powers of two). Basically, for a given range {0..max}
, we can determine what the next power of two is, i.e., exactly how many bits are required to represent max
as a bitstring. Then we can sample just that many bits and see whether this bistring, as an integer, is greater than max
. If so, repeat. Since we sample just as many bits as are required to represent max
, each iteration has a probability greater or equal than 50% of succeeding (50% in the worst case, 100% in the best case). So this is very efficient.
My script is basically a concrete implementation of Falco's answer, written in pure bash and highly efficient since it uses bash's built-in bitwise operations to sample bitstrings of the desired length. It additionally honors an idea by Eliah Kagan that suggests to use the built-in $RANDOM
variable by concatening bitstrings resulting from repeated invocations of $RANDOM
. I actually implemented both the possibilities to use /dev/urandom
and $RANDOM
. By default the above script uses $RANDOM
. (And ok, if using /dev/urandom
we need od and tr, but these are backed by POSIX.)
So how does it work?
Before I get into this, two observations:
It turns out bash can't handle integers larger than 263-1. See for yourself:
$ echo $((2**63-1)) 9223372036854775807 $ echo $((2**63)) -9223372036854775808
It would appear that bash internally uses signed 64-bit integers to store integers. So, at 263 it "wraps around" and we get a negative integer. So we can't hope to get any range larger than 263-1 with whatever random function we use. Bash simply can't handle it.
Whenever we want to sample a value in an arbitrary range between min
and max
with possibly min != 0
, we can simply sample a value between 0
and max-min
instead and then add min
to the final result. This works even if min
and possibly also max
are negative, but we need to be careful to sample a value between 0
and the absolute value ofmax-min
. So then, we can focus on how to sample a random value between 0
and an arbitrary positive integer max
. The rest is easy.
Step 1: Determine how many bits are needed to represent an integer (the logarithm)
So for a given value max
, we want to know just how many bits are needed to represent it as a bitstring. This is so that later we can randomly sample only just as many bits as are needed, which makes the script so efficient.
Let's see. Since with n
bits, we can represent up to the value 2n-1, then the number n
of bits needed to represent an arbitrary value x
is ceiling(log2(x+1)). So, we need a function to compute the ceiling of a logarithm to the base 2. It is rather self-explanatory:
log2() { local x=$1 n=1 l=0 while (( x>n && n>0 )) do let n*=2 l++ done echo $l }
We need the condition n>0
so if it grows too great, wraps around and becomes negative, the loop is guaranteed to terminate.
Step 2: Sample a random a bitstring of length n
The most portable ideas are to either use /dev/urandom
(or even /dev/random
if there is a strong reason) or bash's built-in $RANDOM
variable. Let's look at how to do it with $RANDOM
first.
Option A: Using $RANDOM
This uses the idea mentioned by Eliah Kagan. Basically, since $RANDOM
samples a 15-bit integer, we can use $((RANDOM<<15|RANDOM))
to sample a 30-bit integer. That means, shift a first invocation of $RANDOM
by 15 bits to the left, and apply a bitwise or with a second invocation of $RANDOM
, effectively concatening two independently sampled bitstrings (or at least as independent as bash's built-in $RANDOM
goes).
We can repeat this to obtain a 45-bit or 60-bit integer. After that bash can't handle it anymore, but this means we can easily sample a random value between 0 and 260-1. So, to sample an n-bit integer, we repeat the procedure until our random bitstring, whose length grows in 15-bit steps, has a length greater or equal than n. Finally, we cut off the bits that are too much by appropriately bitwise shifting to the right, and we end up with a n-bit random integer.
get_n_rand_bits() { local n=$1 rnd=$RANDOM rnd_bitlen=15 while (( rnd_bitlen < n )) do rnd=$(( rnd<<15|$RANDOM )) let rnd_bitlen+=15 done echo $(( rnd>>(rnd_bitlen-n) )) }
Option B: Using /dev/urandom
Alternatively, we can use od
and /dev/urandom
to sample an n-bit integer. od
will read bytes, i.e., bitstrings of length 8. Similarly as in the previous method, we sample just so many bytes that the equivalent number of sampled bits is greater or equal than n, and cut off the bits that are too much.
The lowest number of bytes needed to get at least n bits is the lowest multiple of 8 that is greater or equal than n, i.e., floor((n+7)/8).
This only works up to 56-bit integers. Sampling one more byte would get us an 64-bit integer, i.e., a value up to 264-1, which bash can't handle.
get_n_rand_bits_alt() { local n=$1 local nb_bytes=$(( (n+7)/8 )) local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ") echo $(( rnd>>(nb_bytes*8-n) )) }
Putting the pieces together: Get random integers in arbitrary ranges
We can sample n
-bit bitstrings now, but we want to sample integers in a range from 0
to max
, uniformly at random, where max
may be arbitrary, not necessarily a power of two. (We can't use modulo as that creates a bias.)
The whole point why we tried so hard to sample just as many bits as are needed to represent the value max
, is that we can now safely (and efficiently) use a loop to repeatedly sample an n
-bit bitstring until we sample a value that is lower or equal to max
. In the worst case (max
is a power of two), each iteration terminates with a probability of 50%, and in the best case (max
is a power of two minus one), the first iteration terminates with certainty.
rand() { local rnd max=$1 # get number of bits needed to represent $max local bitlen=$(log2 $((max+1))) while # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM rnd=$(get_n_rand_bits $bitlen) (( rnd > max )) do : done echo $rnd }
Wrapping things up
Finally, we want to sample integers between min
and max
, where min
and max
can be arbitrary, even negative. As previously mentioned, this is now trivial.
Let's put it all in a bash script. Do some argument parsing stuff... We want two arguments min
and max
, or only one argument max
, where min
defaults to 0
.
# check number of parameters if (( $# != 1 && $# != 2 )) then cat <<EOF 1>&2 Usage: $(basename $0) [min] max Returns an integer distributed uniformly at random in the range {min..max} min defaults to 0 (max - min) can be up to 2**60-1 EOF exit 1 fi # If we have one parameter, set min to 0 and max to $1 # If we have two parameters, set min to $1 and max to $2 max=0 while (( $# > 0 )) do min=$max max=$1 shift done # ensure that min <= max if (( min > max )) then echo "$(basename $0): error: min is greater than max" 1>&2 exit 1 fi
...and, finally, to sample uniformly at random a value between min
and max
, we sample a random integer between 0
and the absolute value of max-min
, and add min
to the final result. :-)
diff=$((max-min)) && diff=${diff#-} echo $(( $(rand $diff) + min ))
Inspired by this, I might try to use dieharder to test and benchmark this PRNG, and put my findings in here. :-)
rand=$(command)
do ifcommand
returns an iteger that fulfills your requirements?dd if=/dev/urandom 2>/dev/null
and piping that throughod -t d
(avoids the detour through base64), but it's not clear to me how the conversion happens and whether it is indeed unbiased. If you can expand your idea to an efficient, working script and explain why there's no bias, it would make for a great answer. :)python
orperl
or your favorite language, but this is not available everywhere. I'd prefer something more portable. Well,awk
's random function would be fine, I guess. But the more portable, the better :)perl -e 'print int(rand(2**32-1))');
. That is pretty darn portable and will be very fast. Awk won't cut it since most implementations start from the same seed. So you get the same random number on subsequent runs. It only changes within the same run.