sort_by{ rand }
is a common idiom to shuffle an array in Ruby. It's short, looks good and is fairly efficient
(despite being ) since Array#sort_by is done at C speed.
Unfortunately, it is biased: if the same rand() value is returned for two different elements,
their relative order will be preserved.
%w[a b c d].sort_by{0}# => ["a", "b", "c", "d"]i=0%w[a b c d].sort_by{10-(i+=1)/2}# => ["d", "b", "c", "a"]
This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.
How often will that happen? The Mersenne Twister PRNG used by Ruby has
been proven to have a period of , so it
would seem at first sight that the answer is "not before the sun turns
into a nova". But Kernel#rand does not return the full state of the
PRNG: you only get 53 bits of pseudo-randomness out of a call to rand(). This
is becoming more interesting.
The birthday paradox
The birthday paradox states that given 23 people in a room, the odds are over
50% that two of them will have the same birthday.
It is not actually a paradox (there's no self-contradiction), it
merely defies common intuition.
It turns out that the problem with sort_by{ rand } is another instance of the
birthday paradox. In this case, we have a room with N people (N being the size
of the array to shuffle) and there are "days in a year".
Let's try to reproduce the 23-50% result, and apply the method to the other one.
There are
ways to assign a birthday to m people without any repetition. Therefore,
the probability that at least two people share a birthday is
In Ruby terms, we first have to define factorial:
class Integerdef fact0caseselfwhen0..11elseself*(self-1).fact0endendend
This is fairly slow and will typically fail for or so,
depending on the stack size. Stirling's approximation proves helpful here: we'll
use
class Integerdef factMath.sqrt(2*Math::PI*self)*Math.exp(-self)*Math.exp(self*Math.log(self))endend
It compares rather well to the recursive factorial:
Something's wrong, and the culprit is not hard to locate:
A(365,20)# => NaN
BigDecimal to the rescue
Ruby's standard library includes an extension for arbitrary precision
floats, BigDecimal, whose very name suggests that we can think of it as
Bignum's cousin. BigDecimal objects have to be initialized with a String, but
that's not much of a problem
require'bigdecimal'require'bigdecimal/math'BigDecimal.limit(100)class IntegerBigMath=Class.new{extend::BigMath}def fact(prec=10)# !> method redefined; discarding old factBigDecimal.new((Math.sqrt(2*Math::PI*self)*Math.exp(-self)).to_s)*BigMath.exp(BigDecimal.new(self.to_s)*BigMath.log(BigDecimal.new(self.to_s),prec),prec)endend