
I have been looking at the Fisher-Yates shuffle suggestions on Stack Overflow after ruling out arr.sort(v => Math.random() - 0.5) due to the latter's lack of randomness.
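
A quick, illustrative way to see that lack of randomness for yourself (a sketch, not part of my function): shuffle a tiny array many times and tally which element lands first. A fair shuffle should give each element roughly a third of the trials.

const counts = [0, 0, 0];
for (let trial = 0; trial < 100000; trial++) {
  const shuffled = [0, 1, 2].sort(() => Math.random() - 0.5);
  counts[shuffled[0]]++;
}
console.log(counts); // typically noticeably uneven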

The Fisher-Yates functions look OK, but they use while loops and the like. Alternatively, the JavaScript array functions (map, reduce, etc.) have the obvious problem of iterating forward, so you keep needing to access the length of the array to implement the shuffle (which obviously drains a bit of time).

So I wrote my own Fisher-Yates shuffle using reduce.

I wanted it to

  1. Alter the original array, as sort does.

  2. Return the original array, so I could put it at the start of a chain of array functions, e.g. let myarr = [1, 2, 3, 4, 5]; let newARR = fyRandArr(myarr).filter(v => v !== 3);

  3. Skip the last iteration of the reduce, because that element is already randomly in position.

My function works and does everything above.

I am a hobby programmer, so I am not sure whether this is considered well written. I have no one to ask. How can it be improved?

const fyRandArr = arr => arr.reduce((A, v, i, a) => {
  let r;
  return i === A.lM1
    ? a
    : (r = i + Math.floor(Math.random() * (A.l - i)),
       [a[i], a[r]] = [a[r], a[i]],
       A);
}, {l: arr.length, lM1: arr.length - 1});

Obviously, if you want a pure function, i.e. one producing a new array in the shuffle, you can just extract the reduce bit and not bother with fyRandArr.

The ones I did not like:

This one used while and splice

The accepted answer I liked, but it still used a while loop

Conclusion:

So from the comments, it appears that:

  1. the comma operator is disliked
  2. accumulators of a different type than the array are frowned upon
  3. persistently passing the length in an object accumulator is more 'expensive' than reading the length at each iteration.

With all that, it appears the following is better:

const fyRandArr = arr => arr.reduce((a, v, i) => {
  let r, l = a.length;
  [a[i], a[r = (i + Math.random() * (l - i)) | 0]] = [a[r], a[i]];
  return a;
}, arr);

or

const fyRandArr = arr => {
  arr.map((v, i, a) => {
    let r, l = a.length;
    [a[i], a[r = (i + Math.random() * (l - i)) | 0]] = [a[r], a[i]];
    return a[i];
  });
  return arr;
};
Comments:

  • "The Fisher Yates functions look OK" - which ones did you look at? It might help to see the code that you started from (or at least links to it). – Bergi, Aug 24, 2021 at 18:24
  • Am I missing something, or does the accumulator never change? – Bergi, Aug 24, 2021 at 18:25
  • "I have been told in the past that accessing arr.length in a for loop is bad" - well, that's a thing of the past; modern engines optimise such code just fine. But even then, you should just use a local constant const l = arr.length; in your function, not an accumulator. The cost of accessing .length on an array is nothing compared to creating an object and passing it to lots of function calls. – Bergi, Aug 24, 2021 at 18:37
  • Mostly because it isn't self-contained - it needs some extra declarations for the variables x and y. And once you add those, you might as well just write const x = v1.x-v2.x, y = v1.y-v2.y, l = Math.sqrt(x*x+y*y); anyway. Doing assignments as side effects inside other statements just makes for unreadable code in general. – Bergi, Aug 24, 2021 at 18:52
  • Yes, but you were asking for well-written code, not for code with the minimum number of bytes :-) – Bergi, Aug 24, 2021 at 19:00

1 Answer


This code is overly terse, clever and (in the bottom two examples) broken. The versions with loops are far more readable, correct/verifiable, efficient and intuitive to understand.

Don't cache array.length

I recommend reading Eric Lippert's performance rant. One key point is that if there's no performance problem that's affecting users, don't sacrifice code quality by default.

Another key point is never to assume there's a performance problem until you've established it empirically through accurate benchmarking and profiling.

You might be surprised how often something looks one way on paper from a performance or time complexity standpoint, but compiler optimizations, CPU optimization, cache performance and/or garbage collection winds up giving radically different results on a real machine and workload. In fact, the most popular post on Stack Overflow is about this exact phenomenon.

Caching array.length in a variable is a classic premature optimization that can lead to bugs and makes code harder to understand in exchange for only a (generally) negligible performance improvement.
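
If you do suspect the lookup matters for a particular workload, measure rather than assume. Here's a rough console.time sketch (illustrative only, and not from the original post; modern JITs often hoist the length read, so expect near-identical timings):

const data = Array.from({ length: 1e6 }, (_, i) => i);

console.time("uncached length");
let sum1 = 0;
for (let i = 0; i < data.length; i++) sum1 += data[i];
console.timeEnd("uncached length");

console.time("cached length");
let sum2 = 0;
for (let i = 0, len = data.length; i < len; i++) sum2 += data[i];
console.timeEnd("cached length");

console.log(sum1 === sum2); // true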

.length isn't the same as strlen() in C, which walks over each character of the string from start to end. .length is an O(1) property lookup, the same as any other object property access you'd never think to cache, like the accesses in your reduce accumulator object.

The overhead of making a function call to your .reduce callback likely costs many times more than the .length access, but I haven't profiled, so this isn't definitive or the only reason to avoid .reduce. Compilers can do amazing things; trust them and use the right tool (in this case, .reduce is the wrong tool for semantic reasons, as I'll explain later).

Similarly, creating unnecessary temporary arrays, as in the destructuring swap [a[r], a[i]], tends to have high overhead due to heap allocations and garbage collection (although a transpiler will convert this to a classical three-step assignment with a temporary variable, giving you the best of both worlds -- this is only something to bear in mind if the code won't be transpiled for compatibility).

Furthermore, your .reduce accumulator object introduces constant memory overhead that wasn't in the original for loop algorithm. Caching .length while also allocating objects and making extra function calls inside a loop is penny wise and pound foolish.

Anyway, after all the pains taken, A.l (and the confusingly-named A.lM1 -- I'd just do array.length - 1 instead of caching a simple math operation) still does an object access to get the length, so there's no performance improvement, even with the extra hoop to jump through. You might as well just use a.length directly.

From an engineering perspective, detaching .length from the array it's called on means you've introduced more state. More state leads to more bugs -- that's another piece of data, another abstraction the programmer needs to carry in their mental working set to understand the program. More state leads to cognitive overload, making code harder to grasp at a glance.

It's all too easy to forget you've cached the length, mutate an array and wind up with a stale length. If multiple arrays are in scope, it's all too easy to forget which array's length is associated with which cached variable, especially with a name like l. Always using array.length guarantees the most up-to-date length for array, is completely readable and robust to refactoring and extension.
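
A minimal sketch of that stale-cache trap (hypothetical data, for illustration):

const items = ["a", "b", "c"];
const len = items.length; // cached: 3

items.push("d");          // the array grows...
for (let i = 0; i < len; i++) {
  console.log(items[i]);  // ...but the stale cache silently skips "d"
}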

Only cache .length if it accomplishes a particular programmatic goal, or you've profiled extensively and found that caching it gives you enough of a speedup to outweigh the software engineering cost of detaching the property and regressing to lower-level C, where native arrays don't have built-in lengths. It's highly unlikely that .length is your app's bottleneck preventing it from delivering value, so you basically never need to do this.

Sure, you can argue this is no problem in a 3-line function, but there's no reason to write 3-line functions any differently from the rest of the codebase. Might as well get in the habit of making all code as comprehensible as possible, regardless of size. Small functions can still hide bugs, as is the case in the bottom two code snippets.

Don't return a value for in-place functions

JS offers at least two library functions that operate in-place but return their argument for chaining purposes: .sort() and .reverse(). I use .sort() frequently, so I generally remember to call .slice() in the chain before it, but I don't know how many times I've forgotten to do the same for .reverse() and wound up with a bug. Usually, I'll catch it right away, but why take the risk of a silent and potentially subtle logical error like this making it into an app?
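
A minimal sketch of the .reverse() trap just described:

const scores = [1, 2, 3];
const top = scores.reverse()[0];           // 3, but scores is now [3, 2, 1]

const safe = [1, 2, 3];
const topSafe = safe.slice().reverse()[0]; // 3; safe stays [1, 2, 3]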

If a function is in-place, returning undefined/null as Python does ("None") is the safest approach -- it's less elegant, but it avoids more problems than chainability solves. Python has a separate function, sorted, that's the immutable, value-returning version.

Besides, standard library functions like .sort() and .reverse() have well-known and ubiquitously-documented specifications, so their quirks and gotchas are common knowledge. Someone reading your code won't have the luxury of knowing the contract for your function, so I always err on the side of least surprise.

If a programmer really needs a chainable version, they can opt-in to the impure code by writing a wrapper with a clear name. As an example, Ruby uses the bang method syntax .shuffle! to indicate a dangerous in-place operation even though it returns a value. You could follow this pattern by naming the wrapper shuffleInPlace() or similar, or provide a version that makes a copy before sorting, shuffleCopy(). shuffleInPlace(foo).map(...) seems less likely to hide a bug than shuffle(foo).map(...). Seeing this code, however, raises the question of why you'd need to chain a shuffle anyway.

Since your function isn't attached to the prototype of Array (a good thing), it can't be truly chained without an abstraction like Ramda's chain.

Avoid inline assignments

The code

[a[i], a[r = (i + Math.random() * (l - i)) | 0]] = [a[r], a[i]] 

is very difficult to read and is buggy. A minimal reproduction of the cause of the bug is:

[[][console.log("hi")]] = [console.log("bye")];

You'll see that "bye" is logged, then "hi". This shows that the right-hand side is evaluated first, then the left, meaning that in the swap above, the a[r] on the right-hand side is read while r is still undefined, so it's always a[undefined], which evaluates to undefined.

Instead of trying to push the boundaries of order of evaluation sensibility, spread this into multiple lines, use unabbreviated variable names and prefer Math.floor to bitwise | 0. I sometimes use ~~(x) to floor -- it's not the best habit, but what I like over | 0 is that it looks like a function call, so I find it visually easier to follow.
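
For reference, here's how the three floor-ish options differ (well-known semantics; nothing specific to this code). | 0 and ~~ truncate toward zero through a 32-bit integer conversion, so they diverge from Math.floor on negative and very large values:

Math.floor(3.7);            // 3
3.7 | 0;                    // 3
~~3.7;                      // 3

Math.floor(-3.7);           // -4
-3.7 | 0;                   // -3 (truncation, not floor)
~~-3.7;                     // -3

Math.floor(2 ** 31 + 0.5);  // 2147483648
(2 ** 31 + 0.5) | 0;        // -2147483648 (32-bit wraparound)

Shuffle indices are small and non-negative, so | 0 happens to be safe here, but Math.floor states the intent directly.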

Creating wrappers on Math.random() that offer higher-level interfaces into random integers and picking random elements from arrays is a great way to clean up the code.

const randomInt = n => Math.floor(n * Math.random());

// ...

const j = randomInt(a.length);
[a[j], a[i]] = [a[i], a[j]];

is much cleaner and easier to reason about.

Only use .reduce for reduction and .map for mapping

Array#reduce is very easy to abuse in JS. It's an inherently confusing function that should be used only when the accumulator structure is simple and doesn't have state other than the result itself. For example, summing an array or building a frequency counter for elements in an array are good uses for reduce. These really are reduction operations, but shuffling an array isn't. Almost always, map, filter, forEach or flatMap is more appropriate than reduce.

In this case, .map is also inappropriate because the callback causes a mutation side-effect. This breaches one of the key ideals of functional programming, immutability. .map's callback should never do anything but statelessly return the new value for the i-th element -- it should definitely not mutate other elements in the array or external state elsewhere, and ideally shouldn't read external data other than a[i] itself.
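
For contrast, minimal sketches of uses that do fit each function's contract:

// Genuine reductions: many values collapse into one result.
const sum = [1, 2, 3, 4].reduce((acc, n) => acc + n, 0); // 10

const freq = ["a", "b", "a"].reduce((counts, ch) => {
  counts[ch] = (counts[ch] || 0) + 1;
  return counts;
}, {}); // { a: 2, b: 1 }

// A genuine mapping: each element transformed independently,
// with no mutation of the source array or outside state.
const doubled = [1, 2, 3].map(n => n * 2); // [2, 4, 6]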

I hate to keep bringing up Python, but they got it right in relegating reduce to a second-class citizen. Python and JS are similar in that they're both imperative languages with functional influence. Some programmers prefer to emphasize the functional influence to a greater or lesser extent. In my view, trying too hard to shoehorn functional programming into imperative languages usually lands flat on its face and just makes a mess, except for rare occasions when someone really knows what they're doing.

The best application of functional paradigms in JS, for me, is using the correct high-level iteration abstractions where appropriate, along with immutability for program safety. Without immutability, the high-level abstractions tend to collapse.

Avoid the comma operator for multiple expressions

Continuing the themes throughout, there's basically never any reason to use the comma operator to separate expressions, including declarations, in JS. It only leads to pain: expensive eye doctor bills and bugs. Just write every statement on its own line. Save minification to the bundler.

Following up with this and a few other points, keep lines to no more than 80 characters wide, preferably 60-70. This policy forces you to write more readably and spread things into multiple, vertical statements rather than long, stream-of-consciousness comma-delimited expressions that mutate state used by other expressions in the chain, wreaking havoc.
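
To make the contrast concrete, here's the vector-length example from the comments written both ways (v1 and v2 are assumed to be {x, y} point objects):

// Comma-delimited, with assignments buried mid-expression:
let x, y;
const l = (x = v1.x - v2.x, y = v1.y - v2.y, Math.sqrt(x * x + y * y));

// One statement per line -- same work, no hidden state changes:
const dx = v1.x - v2.x;
const dy = v1.y - v2.y;
const len = Math.sqrt(dx * dx + dy * dy);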

Avoid forward declarations

Continuing with commas, the idiom:

let foo, bar;

// ... later on ...

foo = 42;
bar = 43;

is an antipattern. Just do:

const foo = 42;
const bar = 43;

at the point in the code where the variables are needed.

If you always declare everything with const, scope it as tightly as possible to where it's used, and initialize on the spot, dangerous programming habits like assignments in expressions become impossible.

If you basically never use let (okay, it seems fine in for loop counters and other specific use cases, but we already agree we don't like counter-based for loops and will probably barely use them), code quality is forced to improve.

Use linters

A good linter will give you many errors about code like this. Go ahead and drop it into https://eslint.org/demo, then turn on all rules. Some are fairly spurious or innocuous, but others instantly tell you most of the same things humans will tell you on this site manually.

Conclusion

Use the code from Wikipedia that uses a for loop. This code is pretty much straight from the algorithm master Donald Knuth's bible, TAOCP.

Algorithms are tricky, and my philosophy is to basically copy them verbatim from the smart people that came up with them (usually via Wikipedia, which can have errors but usually doesn't; cross-reference to be sure), with as few "clever improvements" as possible. It's so easy to do it wrong and miss some minor detail.

If you're using a linter that prohibits loops, add a comment to silence linting for this one occasion where a traditional loop makes sense: // eslint-disable-next-line no-loops/no-loops. I'm against counter loops too, but this is an appropriate time to use one (you basically never need while though).

My version

const randInt = n => Math.floor(Math.random() * n);

const shuffle = a => {
  for (let i = a.length - 1; i > 0; i--) {
    const j = randInt(i + 1);
    [a[i], a[j]] = [a[j], a[i]];
  }
};
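
Usage -- note the deliberate absence of a return value:

const deck = [1, 2, 3, 4, 5];
shuffle(deck);     // shuffles in place, returns undefined
console.log(deck); // e.g. [3, 1, 5, 2, 4]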

Transpiled for compatibility and performance (no destructuring):

function randInt(n) {
  return Math.floor(Math.random() * n);
}

function shuffle(a) {
  for (var i = a.length - 1; i > 0; i--) {
    var j = randInt(i + 1);
    var x = a[i];
    a[i] = a[j];
    a[j] = x;
  }
}

If you absolutely must chain, write a wrapper with a clear name like:

const shuffleInPlace = a => {
  shuffle(a);
  return a;
};

// or

const shuffleCopy = a => {
  a = a.slice();
  shuffle(a);
  return a;
};
Comments:

  • This is useful, not least because it is contrary to stuff I have read and been misguided by recently. – Rewind, Aug 25, 2021 at 19:54
  • Two questions: 1. Why don't you ultimately use arrow functions on randInt and shuffle? 2. I have read that using map, reduce, etc. is more readable than using for and while loops. Who is right? – Rewind, Aug 25, 2021 at 20:04
  • Can you show what you've read recently? I'm betting these weren't shuffles. I'm not saying don't use map and reduce -- I use map constantly, and reduce probably slightly more often, or about as often, as traditional counter loops (not very often). This is a rare occasion where these functions are inappropriate. Swapping elements is not a clear-cut mapping or a reduction operation, so it's best to stick to the classical loop. I do use arrow functions, but if you transpile the code or write directly for browser compatibility, you'd use function, so I show the transpiled code as well. – ggorlen, Aug 25, 2021 at 20:05
