nancygold's Journal
 

Wednesday, May 22nd, 2024

    12:58a
    How does ECS differ from multithreading?
    Rewriting your program into ECS opens it up to parallel processing, but in an unusual manner.

    Consider the task of counting the occurrences of the word "needle". In ECS it will look like this:
    In: "abc def ghi needle jkl mno"
    
    cls reader ptr end word count!!0
    
    $read_word =
      if $ptr >< End:
        $ptr = No
        pass
      ...
    
    place_readers_on_input In
    
    Total 0
    while ecs_has reader.ptr:
      foreach reader.ptr: $read_word
      foreach reader.word: if $word >< needle: $count+
      fold_each reader.count:
        Total += $count
        $count = 0
    
    say "The word `needle` occurred [Total] times"
    


    We have a reader class, which is used to instantiate reader entities placed along the input.
    Then we have two system phases:
    The first one reads words;
    The second one checks whether any of the read words is "needle".

    Each of these phases can be spread among multiple threads.
    But the phases themselves must be sequential.
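    To make that concrete, here is a minimal C sketch of the same counter, using a struct-of-arrays layout for the reader components. The names, the word-boundary snapping and the fixed buffer sizes are my own assumptions rather than a translation of the pseudocode above; the point is that each phase is a plain loop over all readers, so its iterations could be handed to separate threads, while the phases themselves still run one after another.

    #include <stdio.h>
    #include <string.h>

    #define MAX_READERS 4

    /* Struct-of-arrays: one array per component of the "reader" class. */
    static const char *ptr[MAX_READERS];      /* current position in the input */
    static const char *end[MAX_READERS];      /* end of this reader's slice    */
    static char        word[MAX_READERS][32]; /* last word this reader read    */
    static int         count[MAX_READERS];    /* per-reader needle counter     */
    static int         nreaders;

    /* Place readers on roughly equal slices, snapped to word boundaries. */
    static void place_readers_on_input(const char *in) {
        size_t len = strlen(in), chunk = len / MAX_READERS + 1, a = 0;
        for (nreaders = 0; nreaders < MAX_READERS && a < len; nreaders++) {
            size_t b = a + chunk;
            while (b < len && in[b] != ' ') b++;
            if (b > len) b = len;
            ptr[nreaders] = in + a;
            end[nreaders] = in + b;
            a = b;
        }
    }

    int main(void) {
        const char *in = "abc def ghi needle jkl mno";
        place_readers_on_input(in);

        int total = 0, alive = nreaders;
        while (alive > 0) {
            /* Phase 1: every reader extracts its next word (parallelizable). */
            for (int i = 0; i < nreaders; i++) {
                if (!ptr[i]) continue;
                while (ptr[i] < end[i] && *ptr[i] == ' ') ptr[i]++;
                size_t n = 0;
                while (ptr[i] < end[i] && *ptr[i] != ' ' && n < sizeof word[i] - 1)
                    word[i][n++] = *ptr[i]++;
                word[i][n] = 0;
                if (n == 0) { ptr[i] = NULL; alive--; }  /* this reader is done */
            }
            /* Phase 2: every live reader checks its word (parallelizable). */
            for (int i = 0; i < nreaders; i++)
                if (ptr[i] && strcmp(word[i], "needle") == 0) count[i]++;
            /* Fold: gather per-reader counters into the total (sequential). */
            for (int i = 0; i < nreaders; i++) { total += count[i]; count[i] = 0; }
        }
        printf("The word `needle` occurred %d times\n", total);
        return 0;
    }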

    In normal multithreading, we would instead have several reader threads, each going out of sync with the others. And it can be expensive to keep them in sync, to debug them, and to switch CPUs between them, since a thread carries a lot of state. So ECS favors many threads running in parallel and processing humongous amounts of data, which can be loaded quickly from secondary storage. Think GPU cores.

    In the same way, you apply SIMD horizontally as opposed to vertically. During blitting, instead of loading colorDst_rgba and colorSrc_rgba (two pixels) into two registers and then blending them, you load colorsDst_rrrrrrrr and colorsSrc_rrrrrrrr (the red channel of 8 pixels) into two registers, and then do the same for gggggggg. In other words, you can make use of SIMD registers of very large sizes, not just 128-bit ones.
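    A rough C sketch of that horizontal layout, assuming the buffers are already stored per channel (planar); the function and variable names are mine, not from any real blitter. Written as a plain per-channel loop, it maps onto whatever register width the target offers, since every lane does the same independent work:

    #include <stdint.h>
    #include <stddef.h>

    /* Horizontal (planar) alpha blend: one channel of many pixels at a time.
       Because every iteration does the same independent work on a single
       channel, a vectorizing compiler (or hand-written intrinsics) can fill
       registers with 8, 16 or 32 pixels' worth of reds at once, instead of
       packing the r,g,b,a of one pixel into one register. */
    static void blend_plane(uint8_t *dst, const uint8_t *src,
                            const uint8_t *alpha, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            unsigned a = alpha[i];
            /* dst = (src*a + dst*(255-a)) / 255; real blitters usually
               replace the division with a shift-based approximation */
            dst[i] = (uint8_t)((src[i] * a + dst[i] * (255u - a)) / 255u);
        }
    }

    /* Usage: blend_plane(dst_r, src_r, a, n); then the same call again for
       the g and b planes. */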

    Current Mood: amused
    1:31p
    ECS and a "Single" Entity
    What if we have a single entity, like say a compiler processing a single unit of code?

    The problem arises already when we try to use several tokenizers on the same text file: for small files it has no effect, and for large files on, say, sequential-access storage, like an optical or magnetic disk, the seek time makes reading from different parts of the file at the same time too expensive.

    Well, we can split that single entity into a conveyor of entities running in a single phase, each processing a single part of the code, while still being in lock-step with one another.

    I.e. entity A reads tokens, entity B parses them into SEXPs, while entity C compiles the SEXPs to bytecode.

    But each entity could have a different frequency of per-phase updates. I.e. entity A could read several tokens at once to keep the buffer filled.

    No complex or expensive synchronization primitives are required. We can have A_ready, B_ready, C_ready ints, set by the entities, and the phase repeats once they are all 1. And even if all three entities run on the same CPU core, we still get the benefits of both incremental processing and memory locality, compared to passing each token to the parser immediately.
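    A minimal single-core sketch of that conveyor in C. The names, queue sizes and the toy stage bodies are my own assumptions; only the ready-flag scheme mirrors the description above. Each phase updates A, B and C exactly once, and the loop stops when all three report ready; on multiple cores the three steps of a phase could run concurrently, with the flags checked only at the phase boundary.

    #include <stdio.h>

    #define QCAP 8
    static char tokq[QCAP][16]; static int tok_r, tok_w;   /* A -> B */
    static char sexq[QCAP][24]; static int sex_r, sex_w;   /* B -> C */

    static const char *input = "abc def ghi";
    static const char *cur;
    static int A_ready, B_ready, C_ready;

    static void A_step(void) {                 /* read one token per tick */
        while (*cur == ' ') cur++;
        if (!*cur) { A_ready = 1; return; }
        if (tok_w - tok_r == QCAP) return;     /* downstream is behind, stall */
        size_t n = 0;
        while (*cur && *cur != ' ' && n < 15) tokq[tok_w % QCAP][n++] = *cur++;
        tokq[tok_w % QCAP][n] = 0;
        tok_w++;
    }

    static void B_step(void) {                 /* wrap one token into a "SEXP" */
        if (tok_r == tok_w) { B_ready = A_ready; return; }
        if (sex_w - sex_r == QCAP) return;
        snprintf(sexq[sex_w % QCAP], sizeof sexq[0], "(%s)", tokq[tok_r % QCAP]);
        tok_r++; sex_w++;
    }

    static void C_step(void) {                 /* "compile" one SEXP */
        if (sex_r == sex_w) { C_ready = B_ready; return; }
        printf("emit %s\n", sexq[sex_r % QCAP]);
        sex_r++;
    }

    int main(void) {
        cur = input;
        do {                                   /* one phase: update A, B, C once */
            A_ready = B_ready = C_ready = 0;
            A_step(); B_step(); C_step();
        } while (!(A_ready && B_ready && C_ready));
        return 0;
    }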

    Obviously we can combine this vertical approach with the horizontal one (several conveyors processing several input streams). But the difference is that we see a single entity as multiple entities.

    The approach does fail with badly arranged data. For example, the C99 tokenizer (lexer) needs feedback from the parser to properly read the tokens. That is actually believed to be one of the three main flaws of C99, the other two being aliased memory access and undefined behavior (as opposed to machine-defined behavior).
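    The classic case of that feedback is the typedef ambiguity (the so-called "lexer hack"); a minimal illustration:

    /* Whether the token sequence "A * b ;" is a declaration or an expression
       depends on whether A currently names a typedef -- information that lives
       in the parser's symbol table, not in the character stream.  So the lexer
       must be told whether to emit A as a typedef-name or as an identifier. */

    typedef int A;

    int main(void) {
        A * b;      /* with the typedef above: declares b as "pointer to A"  */
        b = 0;      /* had A been a plain variable, "A * b" would have been  */
        (void)b;    /* parsed as a multiplication expression instead         */
        return 0;
    }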

    Current Mood: contemplative
    2:47p
    First video game to use levels of detail
    While you can always decompile the code, you don't have access to the reasoning behind it or the incremental steps people made.
    But here is a rare case of a leaked prototype of the first video game to use LODs
    https://archive.org/details/LEGOIsland-source-June1996

    It also includes the design documents, which explain the different kinds of LODs and how one can cheat by using low-res renders when the camera or an object moves fast, allowing quick response to user inputs while simulating the blur of the fast-moving object on a circa-1995 software renderer. It is 2024 and LODs are as relevant as ever, especially if you're doing large planetary-scale simulations. After all, we need a fully simulated planet Earth first, before implementing your fav zombie apocalypse scenario. But back in 1995 they had just a small island.

    https://www.youtube.com/watch?v=1K5pa5pVgUU
    An early technical demo of LEGO® Island featuring a LOD system that ultimately went unused.

    Current Mood: amused
    4:41p
    Why I'm Reading the Old Source Code Part 2...
    Let's check the Agent Entity from a modern general RTS engine, proposed to be used for simulations with large entity counts:
    https://github.com/spring/spring/blob/develop/rts/Sim/Units/Unit.h

    Wait! I have a few questions!..
    1. Yes. The engine treats agents, projectiles and static objects using separate code paths, duplicating logic between them. Obviously UI elements and events are separate too.
    2. Yes. The agent entity includes caiMemBuffer[700] and a myriad of other buffers even when their respective systems are not running (i.e. when the agent is idle, awaiting goals).
    3. No. I don't know why the number is "700".
    4. No. That is not a conspiracy by hardware manufacturers to sell CPUs with more cores and memory, because the code isn't even threadsafe. Just your usual modern programming practices.

    So yeah, I have more joy reading the code by https://www.mobygames.com/person/110/john-miles/

    Current Mood: contemplative
    6:45p
    Algorithms of the Past
    Going through the old code, one will notice the use of so-called "bitplanes".
    For example in
    https://archive.org/download/Grand_Monster_Slam_Source/MONSTER.S

    they store the Nth bit of each pixel as part of a stream of the Nth bits of all pixels.
    And the modern-style pixels were called "chunky" back then.

    You may wonder why one would ever do that?!

    There were several good reasons:
    1. By storing an image as a set of bitplanes, we can discard the lower bits to save on memory. Or, for example, we can keep only the lowest bits for entities far away from the camera, making them dim and less detailed, while at the same time making them cheaper to scale. That is without creating a mipmapped version. Basically a free frequency filter. That also allowed games to fit inside an arbitrary memory size and to work with a variety of graphics cards (one could use just a single bitplane on a monochrome card or when the CPU was slow).
    2. Erasing entire screen requires going only over the alpha plane.
    3. No need to store the alpha values for non-transparent images.
    4. Color data can have a lower resolution than the alpha data, meaning we can preserve the outline while compressing the colors.
    5. We can compress some of the bitplanes using a primitive and very efficient algorithm, like RLE. Chunky pixels would require arithmetic coding to be compressed that well. Generally, if you want to compress some data, I suggest first breaking it into bitplanes, since it gives free arithmetic coding.

    All that made perfect sense on memory- and computation-constrained systems. And it makes even more sense on modern systems. For example, if you store alpha separately, you can edit it separately: the erase brush doesn't have to go through the color pixels. And if you increment the lower bitplanes, you don't have to touch the higher ones unless the increment overflows. And in fact modern systems are still planar, since normal maps and specular data are stored separately from the diffuse colors. The idea of bitplanes vs chunky pixels is very similar to the idea of ECS vs chunky OOP.
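    A small C sketch of the chunky/planar split described above, for 8-bit pixels split into 8 bitplanes; the names and the packing order are my own choices. Dropping the lower planes (point 1) or compressing each plane separately (point 5) then becomes a per-plane operation:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Split "chunky" 8-bit pixels into 8 bitplanes: planes[n] receives the
       Nth bit of every pixel, packed 8 pixels per byte (npix is assumed to
       be a multiple of 8 to keep the sketch short). */
    static void chunky_to_planar(const uint8_t *pix, size_t npix,
                                 uint8_t *planes[8])
    {
        for (int n = 0; n < 8; n++)
            memset(planes[n], 0, npix / 8);
        for (size_t i = 0; i < npix; i++)
            for (int n = 0; n < 8; n++)
                planes[n][i / 8] |= ((pix[i] >> n) & 1u) << (i % 8);
    }

    /* The reverse direction, optionally discarding the lowest `drop` planes --
       the memory-saving trick from point 1. */
    static void planar_to_chunky(uint8_t *pix, size_t npix,
                                 uint8_t *planes[8], int drop)
    {
        for (size_t i = 0; i < npix; i++) {
            uint8_t v = 0;
            for (int n = drop; n < 8; n++)
                v |= (uint8_t)(((planes[n][i / 8] >> (i % 8)) & 1u) << n);
            pix[i] = v;
        }
    }

    int main(void) {
        uint8_t pix[8] = { 0, 31, 64, 100, 128, 200, 254, 255 };
        uint8_t store[8][1], *planes[8];
        for (int n = 0; n < 8; n++) planes[n] = store[n];

        chunky_to_planar(pix, 8, planes);
        planar_to_chunky(pix, 8, planes, 2);   /* drop the 2 lowest planes */
        for (int i = 0; i < 8; i++) printf("%d ", pix[i]);
        printf("\n");
        return 0;
    }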

    Current Mood: contemplative
    8:36p
    C++ at its finest!
    This one is just too sweet. ObjectGenerator.cpp is the one file you have to see there:
    https://archive.org/download/conquest-frontier-wars-fever-pitch-studios-2001-source-code.-7z

    Now I have self-radicalized enough.

    It takes like a minute to be truly horrified by what they do.
    The brain refuses to register that immediately.
    First you just blindly stare at it.
    Then it goes through the denial, anger, bargaining, depression and acceptance stages.



    Current Mood: amused
    Current Music: Clan of Xymox - A Day

