4
\$\begingroup\$

I am working on a project in which an array can easily grow beyond 50M in length. It's an array holding only boolean (0/1) values. Using Uint8Array is just fine and it's very performant compared normal arrays. Still I just wanted to try implementing a BitArray by extending DataView object. BitArray enormously reduces the memory footprint compared to normal Arrays (1/32) and Uint8Array (1/8). So that's a certain gain but as for speed, despite I did everything to my knowledge BitArray is still slightly slower compared to both unless the size hits the 33,554,433 limit (> 2²⁵). As I have read from this document, at this point the normal Array reaches to ~268MB memory limit and it's internal structure gets switched to NumberDictionary[16] yielding a dramatic slow down in v8. Note that when the length is 33,554,433 the BitArray uses only 4MB of memory.

One other point to note is, for my application i need the total number of 1s in the BitArray (population count) which is the popcount property in below code. Thanks to the blazingly fast Hamming Weight algorithmBitArray is like 80 times faster compared to counting the existing items in array though I am not testing it here.

Any ideas to boost the BitArray up a little are most welcome.

class BitArray extends DataView{ constructor(n,ab){ var abs = n >> 3; // ArrayBuffer Size super(ab instanceof ArrayBuffer ? ab : n & 31 ? new ArrayBuffer(abs + 4 - (abs & 3)) : new ArrayBuffer(abs)); } get length(){ return this.buffer.byteLength*8; } get popcount(){ var m1 = 0x55555555, m2 = 0x33333333, m4 = 0x0f0f0f0f, h01 = 0x01010101, pc = 0, x; for (var i = 0, len = this.buffer.byteLength >> 2; i < len; i++){ x = this.getUint32(i << 2); x -= (x >> 1) & m1; //put count of each 2 bits into those 2 bits x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8 bits pc += (x * h01) >> 56; } return pc; } // n >> 3 is Math.floor(n/8) // n & 7 is n % 8 at(n){ return this.getUint8(n >> 3) & (1 << (n & 7)) ? 1 : 0; } set(n){ this.setUint8(n >> 3, this.getUint8(n >> 3) | (1 << (n & 7))); } reset(n){ this.setUint8(n >> 3, this.getUint8(n >> 3) & ~(1 << (n & 7))); } slice(a = 0, b = this.length){ return new BitArray(b-a,this.buffer.slice(a >> 3, b >> 3)); } toggle(n){ this.setUint8(n >> 3, this.getUint8(n >> 3) ^ (1 << (n & 7))); } toString(){ return new Uint8Array(this.buffer).reduce((p,c) => p + Array.prototype.reduce.call(c.toString(2).padStart(8,"0"),(f,s) => s+f), "") .slice(0,this.length); } } // Test code starts from here var len = 1e6, // array length tst = 10, // test count arr = Array(len), bar = new BitArray(len), uia = new Uint8Array(len), r1,r2,r3,t = 0; console.log(`There are ${bar.popcount} 1s in the BitArray`); for (var i = 0; i < len; i++){ (Math.random() > 0.5) && ( t++ , bar.set(i) ); } console.log(`${t} .set() ops are made and now there are ${bar.popcount} 1s in the BitArray`); console.time("Array"); for (var k = 0; k < tst; k++){ for (var i = 0; i < len; i++) arr[i] = Math.random() > 0.5 ? 1 : 0; for (var i = 0; i < len; i++) t = arr[1]; r1 = arr.slice(); } console.timeEnd("Array"); console.time("BitArray"); for (var k = 0; k < tst; k++){ for (var i = 0; i < len; i++) Math.random() > 0.5 ? bar.set(i) : bar.reset(i) for (var i = 0; i < len; i++) t = bar.at(i); r2 = bar.slice(); } console.timeEnd("BitArray"); console.time("Uint8Array"); for (var k = 0; k < tst; k++){ for (var i = 0; i < len; i++) uia[i] = Math.random() > 0.5 ? 1 : 0; for (var i = 0; i < len; i++) t = uia[i]; r3 = uia.slice(); } console.timeEnd("Uint8Array");

\$\endgroup\$
1
  • 1
    \$\begingroup\$Would be handy to include a few unit tests. I was thinking of playing around with the code, but first I would have to add tests to make sure I actually help and don't break anything at the same time.\$\endgroup\$
    – K.H.
    CommentedOct 10, 2022 at 15:34

1 Answer 1

2
\$\begingroup\$

abs + 4 - (abs & 3)

Could be simplified: (abs + 3) & -4

They are not equivalent expressions, but they are equivalent when abs is not a multiple of 4. Actually the simpler one is more convenient in this case: with this expression, you can skip the ternary, because it won't improperly round up multiples of 4 so that's no longer a special case.

popcount trickery

You're using a good trick already, but there is still a possibility for more. One interesting property of the trick you used is that when it has done a couple of steps, it starts to waste some of its potential, by which I mean for example that the third step, (x + (x >> 4)) & m4, adds adjacent nibbles, but the values in those nibbles range from 0 to 4 (inclusive), not the range 0 to 15. If the loop was unrolled a factor of two, then instead of having two whole copies of the popcount trick, only the first half needs to be duplicated: then the intermediate results could be added (producing nibbles with values in the range 0 to 8, inclusive), and the second half of the trick (the third and fourth steps) needs only to be performed once.

There are more tricks, for example given 3 uint32s (a, b, c), we only need 2 32-bit popcounts to count all the bits:

b0 = a ^ b ^ c b1 = (a & (b | c)) | (b & c) count = popcount(b0) + 2 * popcount(b1) 

The bit-manipulation steps are effectively a bit-level 3-input addition, with the 2-bit result spread across two variables: at each position, b0 holds bit 0 of the sum at that position, b1 holds bit 1 of the sum. This can be generalized to combining 7 inputs into 3, 15 into 4, etc. Even better versions of this idea can be found in Faster Population Counts Using AVX2 Instructions (of course you won't be using AVX2 in JavaScript, but some of the same techniques can be applied to reduce the number of actual popcounts).

\$\endgroup\$
3
  • \$\begingroup\$Thank you for the answer. One point is, regardless the demanded BitArray size it should be a multiple of 32 as I make sequential 32 bit readings for the popcount function. The excess bits are zero anyways. In other words if the demanded size is 100 bits then I have to arange 128 bits (16 bytes) at least. Here abs is just the base that I have to calculate the extra number bytes needed to make the total number of bits a multiple of 32. Otherwise your information and the linked paper is very interesting. I will study them hoping to improve the popcount functionality.\$\endgroup\$
    – Redu
    CommentedOct 16, 2022 at 18:15
  • \$\begingroup\$@Redu I'm not sure what you mean, you're rounding up the count and so does the alternative I presented. Same thing (in this context), just simpler. You can write it as ((n + 31) & -32) >> 3 also and skip abs.\$\endgroup\$CommentedOct 16, 2022 at 19:21
  • \$\begingroup\$Now seeing this, obviously my implementation seems so naive. Honestly it took me a while to grasp ((n + 31) & -32) >> 3 but this seems to be exactly how this job is supposed to be implemented or perhaps ((n + 31) & ~31) >> 3. Rockstar.\$\endgroup\$
    – Redu
    CommentedOct 16, 2022 at 22:38

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.