This code is overly terse, clever and (in the bottom two examples) broken. The versions with loops are far more readable, correct/verifiable, efficient and intuitive to understand.
## Don't cache `array.length`
I recommend reading Eric Lippert's performance rant. One key point is that if there's no performance problem that's affecting users, don't sacrifice code quality by default.
Another key point is never to assume there's a performance problem until you've established it empirically through accurate benchmarking and profiling.
You might be surprised how often something looks one way on paper from a performance or time-complexity standpoint, but compiler optimizations, CPU behavior, cache performance and/or garbage collection wind up giving radically different results on a real machine and workload. In fact, the most popular post on Stack Overflow is about this exact phenomenon.
Caching `array.length` in a variable is a classic premature optimization that can lead to bugs and makes code harder to understand, in exchange for only a (generally) negligible performance improvement.
`.length` isn't the same as `strlen()` in C, which walks over each character of the string from start to end. `.length` is an O(1) property lookup, the same as any other object property access you'd never think to cache, like the accesses on your `reduce` accumulator object.
The overhead of the function call to your `.reduce` callback is likely many times larger than the `.length` access, but I haven't profiled, so this isn't definitive or the only reason to avoid `.reduce`. Compilers can do amazing things; trust them and use the right tool (in this case, `.reduce` is the wrong tool for semantic reasons, as I'll explain later).
Similarly, creating unnecessary objects, as in the destructuring swap `[a[r], a[i]]`, tends to have high overhead due to heap allocations and garbage collection (although a transpiler will convert this to a classical three-step assignment with a temporary variable, giving you the best of both worlds -- this is only something to bear in mind if the code won't be transpiled for compatibility).
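For concreteness, here's a sketch of the two forms side by side (hand-written, not any particular transpiler's exact output):

```js
// Destructuring swap: concise, but may allocate a temporary array.
function swapDestructuring(a, i, j) {
  [a[i], a[j]] = [a[j], a[i]];
}

// The classical three-step swap with a temporary variable,
// roughly what a transpiler falls back to.
function swapClassic(a, i, j) {
  var tmp = a[i];
  a[i] = a[j];
  a[j] = tmp;
}
```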
Furthermore, your `.reduce` accumulator object introduces extra memory overhead that wasn't in the original `for`-loop algorithm. Caching `.length` while also allocating objects and making extra function calls inside a loop is penny wise and pound foolish.
Anyway, after all the pains taken, `A.l` (and the confusingly named `A.lM1` -- I'd just write `array.length - 1` instead of caching a simple arithmetic operation) still performs an object property access to get the length, so there's no performance improvement, even with the extra hoop to jump through. You might as well use `a.length` directly.
From an engineering perspective, detaching `.length` from the array it belongs to means you've introduced more state. More state leads to more bugs -- that's another piece of data, another abstraction the programmer needs to carry in their mental working set to understand the program. More state leads to cognitive overload, making code harder to grasp at a glance.
It's all too easy to forget you've cached the length, mutate the array, and wind up with a stale length. If multiple arrays are in scope, it's just as easy to forget which array's length lives in which cached variable, especially with a name like `l`. Always using `array.length` guarantees the most up-to-date length for `array`, is completely readable, and is robust to refactoring and extension.
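A contrived sketch of the kind of stale-length bug this invites (hypothetical code, not from the original post):

```js
const items = [1, 2, 3];
const len = items.length; // cached "for performance"

items.push(4); // the array is mutated later...

for (let i = 0; i < len; i++) {
  console.log(items[i]); // ...and the loop silently skips the new element
}
```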
Only cache `.length` if it accomplishes a specific programmatic goal, or if you've profiled extensively and found that caching it gives you enough of a speedup to outweigh the software engineering cost of detaching the property and regressing back toward lower-level C, where native arrays don't carry their own lengths. It's highly unlikely that `.length` is the bottleneck preventing your app from delivering value, so you'll basically never need to do this.
Sure, you can argue this is no problem in a 3-line function, but there's no reason to write 3-line functions any differently from the rest of the codebase. Might as well get in the habit of making all code as comprehensible as possible, regardless of size. Small functions can still hide bugs, as the bottom two code snippets show.
## Don't return a value from in-place functions
JS offers at least two library functions that operate in place but return their argument for chaining purposes: `.sort()` and `.reverse()`. I use `.sort()` frequently, so I generally remember to call `.slice()` in the chain before it, but I've lost count of how many times I've forgotten to do the same for `.reverse()` and wound up with a bug. Usually I catch it right away, but why take the risk of a silent and potentially subtle logic error like this making it into an app?
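A quick illustration of the trap (hypothetical snippet):

```js
const scores = [3, 1, 2];

const reversedCopy = scores.slice().reverse(); // safe: copy, then reverse
const reversed = scores.reverse();             // also reverses `scores` itself!

console.log(scores); // [2, 1, 3] -- the "original" was silently mutated
```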
If a function operates in place, returning undefined/null as Python does (`None`) is the safest approach -- it's less elegant, but it avoids more problems than chainability solves. Python also has a separate function, `sorted`, that serves as the immutable, value-returning version.
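You can provide the same split in JS (hypothetical helper names, shown with a numeric sort for concreteness):

```js
// In-place version: mutates its argument and deliberately returns nothing.
const sortNumbersInPlace = arr => {
  arr.sort((a, b) => a - b);
};

// Value-returning version: leaves the input untouched.
const sortedNumbers = arr => arr.slice().sort((a, b) => a - b);
```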
Besides, standard library functions like `.sort()` and `.reverse()` have well-known, ubiquitously documented specifications, so their quirks and gotchas are common knowledge. Someone reading your code won't have the luxury of knowing your function's contract, so I always err on the side of least surprise.
If a programmer really needs a chainable version, they can opt in to the impure code by writing a wrapper with a clear name. As an example, Ruby uses the bang method syntax `.shuffle!` to flag a dangerous in-place operation even though it returns a value. You could follow this pattern by naming the wrapper `shuffleInPlace()` or similar, or provide a version that makes a copy before shuffling, `shuffleCopy()`. `shuffleInPlace(foo).map(...)` seems less likely to hide a bug than `shuffle(foo).map(...)`. Seeing this code, however, raises the question of why you'd need to chain a shuffle anyway.
Since your function isn't attached to `Array.prototype` (a good thing), it can't be truly chained without an abstraction like Ramda's `chain`.
## Avoid inline assignments
The code

```js
[a[i], a[r = (i + Math.random() * (l - i)) | 0]] = [a[r], a[i]]
```

is very difficult to read, and it's buggy. A minimal reproduction of the cause of the bug is:

```js
[[][console.log("hi")]] = [console.log("bye")];
```
You'll see that `"bye"` is logged, then `"hi"`. This shows that the right-hand side is evaluated first, then the left, meaning `a[r]` on the right-hand side is always `a[undefined]`, which evaluates to `undefined`.
Instead of trying to push the boundaries of evaluation-order comprehensibility, spread this across multiple lines, use unabbreviated variable names, and prefer `Math.floor` to the bitwise `| 0`. I sometimes use `~~(x)` to floor -- it's not the best habit, but what I like about it over `| 0` is that it looks like a function call, so I find it visually easier to follow.
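One caveat worth knowing (illustrative snippet): `| 0` and `~~` truncate toward zero and are limited to 32-bit integers, while `Math.floor` always rounds down:

```js
console.log(Math.floor(3.7));  // 3
console.log(3.7 | 0);          // 3
console.log(~~3.7);            // 3

console.log(Math.floor(-3.7)); // -4 (rounds down)
console.log(-3.7 | 0);         // -3 (truncates toward zero)

console.log(Math.floor(2 ** 31 + 0.5)); // 2147483648
console.log((2 ** 31 + 0.5) | 0);       // -2147483648 (32-bit wraparound)
```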
Creating wrappers around `Math.random()` that offer higher-level interfaces for random integers and for picking random elements from arrays is a great way to clean up the code.

```js
const randomInt = n => Math.floor(n * Math.random());
// ...
const j = randomInt(a.length);
[a[j], a[i]] = [a[i], a[j]];
```

is much cleaner and easier to reason about.
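In the same spirit, an element picker (hypothetical name `randomElement`) keeps call sites readable:

```js
const randomInt = n => Math.floor(n * Math.random());
const randomElement = arr => arr[randomInt(arr.length)];

// Usage:
const fruits = ["apple", "banana", "cherry"];
console.log(randomElement(fruits)); // e.g. "banana"
```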
## Only use `.reduce` for reduction and `.map` for mapping
`Array#reduce` is very easy to abuse in JS. It's an inherently confusing function that should be used only when the accumulator is simple and carries no state other than the result itself. For example, summing an array or building a frequency counter of its elements are good uses of `reduce` -- those really are reduction operations. Shuffling an array isn't. Almost always, `map`, `filter`, `forEach` or `flatMap` is more appropriate than `reduce`.
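To make the distinction concrete (generic examples, not taken from the original code):

```js
const nums = [3, 1, 3, 2, 3];

// Summing: the accumulator *is* the result.
const sum = nums.reduce((total, n) => total + n, 0); // 12

// Frequency counter: again, the accumulator is exactly the answer.
const counts = nums.reduce((acc, n) => {
  acc[n] = (acc[n] || 0) + 1;
  return acc;
}, {});
// { "1": 1, "2": 1, "3": 3 }
```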
In this case, `.map` is also inappropriate because the callback causes a mutating side effect. This breaches one of the key ideals of functional programming: immutability. `.map`'s callback should never do anything but statelessly return the new value for the `i`-th element -- it should definitely not mutate other elements of the array or external state elsewhere, and ideally shouldn't read external data other than `a[i]` itself.
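For contrast (illustrative only):

```js
const prices = [1, 2, 3];

// Good: a pure mapping -- each output depends only on its own input.
const doubled = prices.map(p => p * 2); // [2, 4, 6]

// Bad: `.map` used purely for its side effect -- the returned array is
// discarded and external state is mutated from inside the callback.
let total = 0;
prices.map(p => { total += p; });
```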
I hate to keep bringing up Python, but it got this right by relegating `reduce` to second-class status. Python and JS are similar in that they're both imperative languages with functional influences. Some programmers prefer to emphasize that functional influence to a greater or lesser extent. In my view, trying too hard to shoehorn functional programming into imperative languages usually falls flat on its face and just makes a mess, except on the rare occasions when someone really knows what they're doing.

The best application of functional paradigms in JS, for me, is using the correct high-level iteration abstractions where appropriate, together with immutability for program safety. Without immutability, the high-level abstractions tend to collapse.
## Avoid the comma operator for multiple expressions
Continuing the themes throughout, there's basically never a reason to use the comma operator to separate expressions, including declarations, in JS. It only leads to pain: expensive eye-doctor bills and bugs. Just write every statement on its own line and leave minification to the bundler.

Following up on this and a few other points, keep lines to no more than 80 characters, preferably 60-70. This policy forces you to write more readably and to spread logic across multiple vertical statements rather than long, stream-of-consciousness, comma-delimited expressions that mutate state used by other expressions in the chain, wreaking havoc.
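A before/after sketch of the kind of line this rules out (hypothetical code):

```js
const point1 = { x: 3, y: 4 };
const point2 = { x: 0, y: 0 };

// Comma-separated declarations crammed onto one line -- hard to scan:
let dx = point1.x - point2.x, dy = point1.y - point2.y, dist = Math.sqrt(dx * dx + dy * dy);

// One declaration per line -- same work, far easier to read:
const deltaX = point1.x - point2.x;
const deltaY = point1.y - point2.y;
const distance = Math.sqrt(deltaX * deltaX + deltaY * deltaY); // 5
```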
## Avoid forward-declarations
Continuing with commas, the idiom:

```js
let foo, bar;

// ... later on ...

foo = 42;
bar = 43;
```

is an antipattern. Just do:

```js
const foo = 42;
const bar = 43;
```

at the point in the code where the variables are needed.
If you always declare everything with `const`, scope it as tightly as possible to where the variable is used, and initialize on the spot, dangerous programming habits like assignments inside expressions become impossible.
If you basically never use `let` (okay, it seems fine for `for`-loop counters and a few other specific use cases, but we already agree we don't like counter-based `for` loops and will barely use them), code quality is forced to improve.
## Use linters
A good linter will flag many problems in code like this. Go ahead and drop it into https://eslint.org/demo, then turn on all the rules. Some are fairly spurious or innocuous, but others instantly tell you most of the same things humans on this site will tell you manually.
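For example, a configuration along these lines would catch several of the issues discussed above (the rule names come from ESLint's core rule set, but treat the exact selection and severities as a suggestion):

```js
// .eslintrc.js -- a sketch, not a complete or prescriptive configuration.
module.exports = {
  rules: {
    "no-sequences": "error",        // flags the comma operator
    "no-cond-assign": "error",      // flags assignments inside conditions
    "no-return-assign": "error",    // flags assignments inside returned expressions
    "no-bitwise": "error",          // flags `| 0`-style flooring
    "one-var": ["error", "never"],  // one declaration per statement
    "prefer-const": "error",        // prefer const over let where possible
    "max-len": ["warn", { code: 80 }], // keep lines readable
  },
};
```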
## Conclusion
Use the code from Wikipedia that uses a `for` loop. It comes pretty much straight from the algorithm master Donald Knuth's bible, TAOCP.

Algorithms are tricky, and my philosophy is to copy them essentially verbatim from the smart people who came up with them (usually via Wikipedia, which can have errors but usually doesn't; cross-reference to be sure), with as few "clever improvements" as possible. It's all too easy to get it wrong by missing some minor detail.
If you're using a linter that prohibits loops, add a comment to silence it for this one occasion where a traditional loop makes sense: `// eslint-disable-next-line no-loops/no-loops`. I'm against counter loops too, but this is an appropriate time to use one (you basically never need `while`, though).
## My version
```js
const randInt = n => Math.floor(Math.random() * n);

const shuffle = a => {
  for (let i = a.length - 1; i > 0; i--) {
    const j = randInt(i + 1);
    [a[i], a[j]] = [a[j], a[i]];
  }
};
```
Transpiled for compatibility and performance (no destructuring):
```js
function randInt(n) {
  return Math.floor(Math.random() * n);
}

function shuffle(a) {
  for (var i = a.length - 1; i > 0; i--) {
    var j = randInt(i + 1);
    var x = a[i];
    a[i] = a[j];
    a[j] = x;
  }
}
```
If you absolutely must chain, write a wrapper with a clear name like:
```js
const shuffleInPlace = a => {
  shuffle(a);
  return a;
};

// or

const shuffleCopy = a => {
  const copy = a.slice();
  shuffle(copy);
  return copy;
};
```