Packing bits harder
In previous posts ChatGPT recommended the hashmap approach to me.
But it turned out to be slow.
The issue boiled down to stb_ds (https://nothings.org/stb_ds/),
which I use, since it is public domain and good enough.
It uses open hashing, but it isn't a template.
And it also maintains a separate search index.
Such an index makes it easier to use,
i.e. no need for nil keys and no gaps between elements.
But it is not optimal.
So I quickly put together my own hashmap.
And it turned out to be 2/3 faster.
Apparently one can also pick a hash function dynamically,
depending on the bit patterns of the keys.
We can change the hash function on every growth event.
And as long as the keys fit in 32 bits, we don't need 64 bits.
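A minimal sketch of what picking a hash function on growth could look like, assuming 32-bit keys and a power-of-two table size.
The names hash_params, pick_hash and hash_index are made up for illustration, and the selection rule (index directly when all key differences fit in a narrow bit window, otherwise fall back to multiplicative hashing) is just one possible heuristic, not the one my hashmap necessarily uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash descriptor, re-chosen on every growth event. */
typedef struct {
    uint32_t mul;    /* 1 = use the key bits directly */
    unsigned shift;  /* right shift applied after the multiply */
} hash_params;

/* Choose parameters for a table of 2^log2_cap slots from the stored keys. */
hash_params pick_hash(const uint32_t *keys, size_t n, unsigned log2_cap)
{
    assert(log2_cap >= 1 && log2_cap <= 31);

    uint32_t varying = 0;  /* bits in which at least two keys differ */
    for (size_t i = 1; i < n; i++)
        varying |= keys[i] ^ keys[0];

    if (varying != 0) {
        unsigned low = 0, high = 31;
        while (!((varying >> low) & 1)) low++;
        while (!((varying >> high) & 1)) high--;
        /* All differences sit in a window narrow enough to index directly. */
        if (high - low + 1 <= log2_cap)
            return (hash_params){ 1, low };
    }
    /* Fallback: multiplicative hashing, keeping the top log2_cap bits. */
    return (hash_params){ UINT32_C(2654435769), 32 - log2_cap };
}

/* Slot index of a key under the chosen parameters. */
uint32_t hash_index(hash_params p, uint32_t key, unsigned log2_cap)
{
    uint32_t mask = (UINT32_C(1) << log2_cap) - 1;
    return ((uint32_t)(key * p.mul) >> p.shift) & mask;
}
```

With keys 0x1000, 0x1100, 0x1200, 0x1300 and a 4-slot table, this picks the direct mapping with shift 8, so the four keys land in four different slots.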
In addition to ranges, one can use an actual bitmap.
Then the speed becomes passable.
A hashmap appears to be mandatory for handling overly sparse cases.
An alternative to the hashmap would be a page table plus the offset of the first page.
But it can grow too large for the 64-bit address space.
A hashmap doesn't have that vulnerability.
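To make the page table alternative concrete, here is a rough sketch, assuming 256 keys per page.
The page_table struct and both function names are hypothetical.
The point is that lookup is a subtraction and an index, but the table must have a slot for every page between the smallest and the largest key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 8  /* 256 keys per page, an assumed page size */

typedef struct {
    uint64_t first_page;  /* page number stored in slot 0 */
    uint64_t npages;      /* number of slots in pages[] */
    void   **pages;       /* NULL for pages that hold no keys */
} page_table;

/* Slots needed to cover every page between the smallest and largest key. */
uint64_t page_table_span(uint64_t min_key, uint64_t max_key)
{
    assert(min_key <= max_key);
    return (max_key >> PAGE_BITS) - (min_key >> PAGE_BITS) + 1;
}

/* Direct lookup: one subtraction and one index, no hashing. */
void *page_table_find(const page_table *t, uint64_t key)
{
    uint64_t page = key >> PAGE_BITS;
    if (page < t->first_page || page - t->first_page >= t->npages)
        return NULL;
    return t->pages[page - t->first_page];
}
```

Two keys that are 2^40 apart already need about 2^32 slots, which is 32 GiB of 8-byte pointers, while a hashmap would store just two entries.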
There is another issue.
Bitmaps fail at moderately sparse data.
Consider 256 bit pages, where bits 0 and 255 are set.
In that case we have to allocate all the bits between them.
Just to store these bits!
People solve that by storing the first 32 set bits as bytes, one bit position per byte.
And more than 32 bits as a bitmap.
Binary search is used to find a byte among these 32.
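The byte list could be sketched like this, assuming a 256-bit page whose set bit positions are kept sorted.
The small_page type and the small_* functions are hypothetical names, and the conversion to a real bitmap once the list is full is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SMALL_MAX 32  /* threshold taken from the text above */

/* Up to 32 set bit positions (0..255), sorted ascending. */
typedef struct {
    uint8_t count;
    uint8_t pos[SMALL_MAX];
} small_page;

/* Binary search: first index whose position is >= bit. */
static unsigned small_lower_bound(const small_page *p, uint8_t bit)
{
    assert(p->count <= SMALL_MAX);
    unsigned lo = 0, hi = p->count;
    while (lo < hi) {
        unsigned mid = (lo + hi) / 2;
        if (p->pos[mid] < bit) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

bool small_test(const small_page *p, uint8_t bit)
{
    unsigned i = small_lower_bound(p, bit);
    return i < p->count && p->pos[i] == bit;
}

/* Returns false when the list is full and the page must become a bitmap. */
bool small_set(small_page *p, uint8_t bit)
{
    unsigned i = small_lower_bound(p, bit);
    if (i < p->count && p->pos[i] == bit) return true;  /* already set */
    if (p->count == SMALL_MAX) return false;
    memmove(&p->pos[i + 1], &p->pos[i], p->count - i);  /* keep sorted */
    p->pos[i] = bit;
    p->count++;
    return true;
}
```

A page with only bits 0 and 255 set then uses two position bytes plus the count, instead of touching all 32 bytes of a bitmap.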
The 3rd issue is degenerate pages, where just a single range of bits is set.
These we can store as a range or as a special smart bitmap.
For very small bitmaps we can use an immediate value
instead of allocating memory.
There is no general silver bullet algorithm.
Implementing bitmaps is about doing case analysis.
And then handling each case separately.
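As an illustration of that case analysis, here is a sketch of a 256-bit page handle with three representations: an immediate value, a range and a full bitmap.
The layout and the names (page, page_kind, page_test) are assumptions made for the example, and a real implementation would likely have more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { PAGE_IMMEDIATE, PAGE_RANGE, PAGE_BITMAP } page_kind;

/* Hypothetical page handle; only PAGE_BITMAP points at allocated memory. */
typedef struct {
    page_kind kind;
    union {
        uint64_t imm;                      /* bits 0..63 stored inline */
        struct { uint8_t lo, hi; } range;  /* bits lo..hi inclusive are set */
        const uint64_t *words;             /* 4 words = 256 bits */
    } u;
} page;

/* Membership test: one branch per representation. */
bool page_test(const page *p, uint8_t bit)
{
    switch (p->kind) {
    case PAGE_IMMEDIATE:
        return bit < 64 && ((p->u.imm >> bit) & 1);
    case PAGE_RANGE:
        return p->u.range.lo <= bit && bit <= p->u.range.hi;
    case PAGE_BITMAP:
        return (p->u.words[bit >> 6] >> (bit & 63)) & 1;
    }
    assert(!"unknown page kind");
    return false;
}
```

Every other operation (set, clear, union, iteration) needs the same kind of switch, plus rules for when a page changes its representation.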
Also, while researching a faster way to search for a byte index
among 32 bytes, I found that one can do SIMD without SSE,
on a 64-bit CPU:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
uint64_t clear_byte(uint64_t value, uint8_t target) {
    // Step 1: Broadcast the target byte to all byte positions
    uint64_t broadcast = target * 0x0101010101010101ULL;
    // Step 2: XOR with the original value; matching bytes become 0x00
    uint64_t xored = value ^ broadcast;
    // Step 3: Set 0x80 in every byte of xored that is zero.
    // The subtract-based test from has_byte is not safe here: its borrow
    // can also flag a 0x01 byte sitting just above a zero byte.
    uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t mask = ~(((xored & low7) + low7) | xored | low7);
    // Step 4: Expand each 0x80 flag into a full 0xFF byte
    mask = (mask >> 7) * 0xFF;
    // Step 5: Clear the matching bytes
    return value & ~mask;
}

bool has_byte(uint64_t value, uint8_t target) {
    // Step 1: Broadcast the target byte to all byte positions
    uint64_t broadcast = target * 0x0101010101010101ULL;
    // Step 2: XOR the broadcasted value with the original value
    uint64_t xored = value ^ broadcast;
    // Step 3: Check if any byte is zero
    uint64_t result = (xored - 0x0101010101010101ULL)
                    & ~xored
                    & 0x8080808080808080ULL;
    return result != 0;
}

int main(void) {
    uint64_t value = 0x1234567890abcdefULL;
    for (int i = 0; i < 256; i++)
        if (has_byte(value, (uint8_t)i)) printf("has byte 0x%02x\n", i);
    // uint64_t is not always unsigned long long, so cast to match %llx
    printf("cleared 0x90: %llx\n", (unsigned long long)clear_byte(value, 0x90));
    return 0;
}
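
The same trick can be stretched to the original problem of finding a byte index among 32 bytes.
This is only a sketch: find_byte32, zero_bytes and load_le64 are names made up here, it scans linearly over four 64-bit words instead of doing a binary search, and it uses the exact zero-byte test so that each flag marks a real match.

```c
#include <stdint.h>

#define ONES UINT64_C(0x0101010101010101)
#define LOW7 UINT64_C(0x7F7F7F7F7F7F7F7F)

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* 0x80 in every byte of x that is zero, 0x00 in every other byte. */
static uint64_t zero_bytes(uint64_t x)
{
    return ~(((x & LOW7) + LOW7) | x | LOW7);
}

/* Index of the first byte equal to target among 32 bytes, or -1. */
int find_byte32(const uint8_t bytes[32], uint8_t target)
{
    for (int w = 0; w < 4; w++) {
        /* Matching bytes become zero after the XOR with the broadcast. */
        uint64_t m = zero_bytes(load_le64(bytes + 8 * w) ^ (target * ONES));
        if (m) {
            int i = 0;
            while (!(m & 0x80)) { m >>= 8; i++; }  /* lowest flagged byte */
            return 8 * w + i;
        }
    }
    return -1;
}
```

On compilers that provide a count-trailing-zeros intrinsic, the inner while loop could be replaced by it, and on a little-endian machine load_le64 could be a plain memcpy.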