OK. Switching to RLE blitting hasn't improved performance that much in the general case: just a 20% speedup (70% when drawing fully opaque sprites, without any blending or effects), while code complexity has greatly increased. Still, most of my sprites are opaque. Also, for the blitter, GCC-compiled code is 30% faster than Clang's.
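For reference, here is a minimal sketch of what the opaque fast path of an RLE blitter can look like. The row format is an assumption made up for this example, not necessarily the one my blitter uses: each row is a stream of 32-bit words holding [skip, count] pairs, each followed by count pixels, and a (0, 0) pair ends the row. The names rle_blit_row and rle_blit are hypothetical, and there is no clipping or blending.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed (hypothetical) row format: repeated [skip][count] words, each
   followed by `count` opaque pixels; the pair (0, 0) terminates the row. */
const uint32_t *rle_blit_row(uint32_t *dst, const uint32_t *src)
{
    for (;;) {
        uint32_t skip = *src++;
        uint32_t count = *src++;
        if (skip == 0 && count == 0)
            return src;                 /* end of row: return start of next row */
        dst += skip;                    /* transparent run: leave destination as is */
        memcpy(dst, src, count * sizeof *dst);  /* opaque run: plain copy */
        dst += count;
        src += count;
    }
}

/* Blit `rows` RLE rows into a destination with `dst_stride` pixels per line.
   No clipping: the caller must guarantee the sprite fits. */
void rle_blit(uint32_t *dst, size_t dst_stride, const uint32_t *src, size_t rows)
{
    for (size_t y = 0; y < rows; y++)
        src = rle_blit_row(dst + y * dst_stride, src);
}
```

The speedup for opaque sprites comes from the fact that transparent runs cost one pointer bump and opaque runs become a single memcpy, with no per-pixel test.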
One thing I noticed while measuring performance (for both RLE and non-RLE code) was that at times my code completed twice as fast, which should be impossible: the test used entirely static data (a single sprite blitted a million times in a loop), the only variable being branch prediction, the machine was otherwise idle at 0% CPU load, and the measured code doesn't make any syscalls. What does that even mean? Branch misprediction does affect performance, but not by a factor of two in the long run, because the predictor would be retrained within the first thousand iterations or so.
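One way to make that kind of two-speed behavior visible is to time the same workload several times and compare the fastest and slowest run, instead of looking at a single total. The sketch below assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC; bench_min_max and demo_workload are hypothetical names, and demo_workload is only a stand-in for the real blit loop. It does not explain the cause of the slowdown, it only shows how large the spread is.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run fn(ctx) `runs` times and record the fastest and slowest run. */
void bench_min_max(bench_fn fn, void *ctx, int runs,
                   uint64_t *min_ns, uint64_t *max_ns)
{
    *min_ns = UINT64_MAX;
    *max_ns = 0;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn(ctx);
        uint64_t dt = now_ns() - t0;
        if (dt < *min_ns) *min_ns = dt;
        if (dt > *max_ns) *max_ns = dt;
    }
}

/* Hypothetical stand-in workload: bumps a counter 100000 times.
   The volatile access keeps the compiler from deleting the loop. */
void demo_workload(void *ctx)
{
    volatile uint64_t *counter = ctx;
    for (int i = 0; i < 100000; i++)
        *counter += 1;
}
```

If the maximum is about twice the minimum across otherwise identical runs, the difference is coming from outside the measured code, since the work per run is the same.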
Broken scheduling, or OSX intentionally slowing down the code? Or maybe the Intel CPU itself does that? My MacBook is relatively old, so if it had any time bomb, it would have been activated by now. Or maybe it is the infamous Meltdown fix making my code two times slower? How does one disable the Meltdown patch? For Linux there is https://make-linux-fast-again.com/, but what about OSX? I don't care about security - it is overrated.
Another interesting finding: with GCC, __builtin_expect makes the code faster, but with Clang the same __builtin_expect makes the same code slower. As if Clang intentionally uses that info to cause mispredictions. WTF?
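For anyone who hasn't used it, this is roughly how __builtin_expect gets applied in a pixel loop. It is a sketch, not my actual blitter: the LIKELY/UNLIKELY macros and count_opaque are hypothetical names, and the pixel layout (alpha in the top 8 bits of a 32-bit word) is an assumption. The builtin is only a hint about which way the branch usually goes; it does not change the result, and whether it helps has to be measured per compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC and Clang builtin: tells the optimizer the expected truth value. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Count fully opaque pixels, hinting that the opaque case is the common one.
   Assumes alpha lives in bits 24..31 of each pixel. */
size_t count_opaque(const uint32_t *px, size_t n)
{
    size_t opaque = 0;
    for (size_t i = 0; i < n; i++) {
        if (LIKELY((px[i] >> 24) == 0xFF))
            opaque++;
    }
    return opaque;
}
```

Removing the LIKELY wrapper gives the same output, so it is easy to build both variants with each compiler and compare timings.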
There is also an article: https://www.theverge.com/2018/11/12/18077166/apple-macbook-air-mac-mini-t2-chip-security-repair-replacement-tool
Current Mood: annoyed