nancygold's Journal
 
[Most Recent Entries] [Calendar View] [Friends View]

Wednesday, June 19th, 2024

    Time Event
    3:43p
    Mathematics Considered Harmful
    The introduction course into mathematics starts with the so called first order
    logic. It is similar to the CPU logic, but deals with natural language
    statements and doesn't have any temporal components, introduced by CPU cycles.
    
    These statements are joined by connectives.
    For conveniences we will use `+`, `*`, `~` for `or`, `and` and `not`.
    But there are additional connectives defined using these.
    For example, the unidirectional implication:
    
      A + ~B          # A or not B
    
    The macro form for that is `A -> B` (A implies B)
    Math macros are like ARM assembler macros, not the Lisp macros,
    although they do have the so called axiom schemas, which
    are defined using ebonics like prose.
    The entirety of math has that ghetto rap battle feel to it.
    
    Lets use it to produce the formula for bidirectional implication:
    
      A -> B
      B -> A
      --------------------       # Expand these into components
      A + ~B
      B + ~A
      --------------------
      (A + ~B) * (B + ~A)        # Combine both expressions into one
      --------------------
      A*B + ~B*B + A*~A + ~B*~A  # Perform multiplication
      --------------------
      A*B + ~B*~A                # Eliminate the cancelled items
      --------------------
      A*B + ~(A+B)               # Move `~` out
    
    The macro form for that is `A <-> B`
    
    
    Lets check it by assuming A=1
      
      A <-> B           # Base theorem in a macro form
      ---------------
      A*B + ~(A+B)      # Expand the sugar
      ---------------
      1*B + ~(1+B)      # A=1
      ---------------
      B + ~(1)          # Constant elimination (* advances, + shortcircuits)
      ---------------
      B + 0             # ~1 = 0 (by definition)
      ---------------
      B                 # B=1 too!
    
    
    These `->` and `<->` are a bit similar to the `if/else` and for loops in C/C++,
    but, again, they have no temporal component, and therefore just simple boring,
    connectives.
    
    Note also that in computing we start with a single `&` operation, instead of
    `*`, `+`, and `~`, defined in that above notation as
    
      ~(A * B)
      -----------
      A & B
    
    That is called NAND (not and). And can be used to implement `*`, `+` and ~`,
    even in the 1st order logic.
    
    And that my friends is basically what the math is all about. In other words, low
    level bit twiddling, but with a layer for pretentious snobbism.
    
    Of course mathematicians also introduce the so called Set Theory, also called
    second order logic. Which is a pseudo-philosphibcal paperwork, using bombastic
    star-trek names like `the axiom of infinity` or `universal quantification`.
    Mathematicians use these to produce a lot of nonsense statements, completely
    detached from reality. Pure existentialist schzobabble.
    
    So if you want to be a decent human being, stay away from math as you would stay
    away from the OOP crowd and the Uncle Bob's religion. Math is even more useless
    than an infinite set of unit tests.
    
    Cheers!
    


    Current Mood: contemplative
    Current Music: Pilotpriest - Xanadu
    6:39p
    The Borland Overlay Manager
    PREVOUS PART
    
    Before proceeding to overlays, we need to undestand how DOS MZ executables work.
    So to run an executable, DOS first reads the following header:
    
      typedef struct {
        uint16_t magic;    /*00 Magic number MZ_MAGIC */
        uint16_t cblp;     /*02 Bytes of last page */
                           /*   If it is 4, it should be treated as 0,
                                since pre-1.10 versions of the MS linker
                                set it that way. */
        uint16_t cp;       /*04 Pages in file */
        uint16_t crlc;     /*06 Number of relocation entries */
        uint16_t cparhdr;  /*08 Size of header in paragraphs */
        uint16_t minalloc; /*0A Minimum extra paragraphs needed */
        uint16_t maxalloc; /*0C Maximum extra paragraphs needed */
                           /*   if 0 DOS loads as high as possible, else above PSP*/
                           /*   The maximum allocation is set to FFFFh by default.*/
        uint16_t ss;       /*0E Initial (relative) SS value */
        uint16_t sp;       /*10 Initial SP value */
        uint16_t csum;     /*12 Checksum */
        uint16_t ip;       /*14 Initial IP value */
        uint16_t cs;       /*16 Initial (relative) CS value */
        uint16_t lfarlc;   /*18 File address of relocation table */
        uint16_t ovno;     /*1A Microsoft Overlay number
                                MS LINK can create single EXE containing multiple
                                overlays (up to 63) which are simply numbered in
                                sequence.
                                The 0 one is loaded by DOS
                                Each overlay within the file is essentially an
                                EXE file with it's own MZ header.*/
      } PACKED mz_hdr_t;
    
    DOS checks the `magic` to be "MZ" and calculates the size of
    the initial part of the executable to load:
    
      load_sz = (mz_hdr_t.cp-1)*512 + mz_hdr_t.cblp
    
    Checks if it has load_sz of conventional memory to load the exe.
    And then DOS loads the first load_sz bytes of exe, either above the PSP,
    or, if maxalloc is 0, at the top of conventional memory.
    The load_sz includes the header too.
    Everything after these load_sz bytes is ignored, so in fact you can store
    there your program's static or even dynamic data.
    
    Lets assume DOS loaded the executable at the start_ofs.
    
    Once that initial part of .EXE is in memory, DOS goes to the mz_hdr_t.lfarlc.
    There it goes through the array of
    
      typedef struct { /* segment:offset pair address */
        uint16_t ofs; //offset from the start of segment
        uint16_t seg; //segment part of the address
      } PACKED far_t;
    
    These pointers are the so called relocation table (also called `fixup` table).
    They point at the uint16_t segments, inside the compiled code and data.
    DOS adds to each such relative segment the value of the address of the segment
    containing `start_ofs + mz_hdr_t.cparhdr*16`. That address corresponds to the
    Ghidra's 0x10000, which is just above the imagined interrupts table,
    bios data area and DOS. Yes, DOS was below 64K, but it also kept the resident
    part of COMMAND.COM in the highers conventional memory area or in high memory.
    
    After that DOS does relocates the SS and CS segments, from the header,
    sets DS,ES to the start of PSP, and jumps to the mz_hdr_t.ip, which corresponds
    to the C0.ASM's STARTX procedure. STARTX initializes the runtime, like the file
    IO system, argc/argv, environment, C++ constructors and the overlay manager.
    DOS discards the original mz_hdr_t and relocation table after processing.
    
    Note also, that the segments, both in the relocation table far_t pointers,
    and at the locations pointed by these pointers roughly correspond to the
    OBJ files used to produce the executable. But the compiler doesn't respect the
    16-bit segment alignment, when joining functions from multiple object files,
    so these segments may precede their code or data. In case of STRONG.EXE that
    happens with the main() routine, whose segment in relocation table points
    a few bytes before it. The only way to determine actual segment starts is
    by doing disassembly analysis, like Ghidra does, creating the list of
    memory blocks in the Memory Map. 
    
    Unfortunately Ghidra misses the PSP segment (256 bytes before the seg 0 in the
    relocation table), as well as the SS segment. We have to create these manually.
    Given the 0x10000 default Ghidra's load address, the PSP should reside at
    0x0FF00. Everything below 0x0FF00 we can use to load our own stuff in Ghidra.
    For example, I use it to create fake structures overriding the data segment
    analysis and decompilation.
    
    Ok. Lets proceed with decompiling the STARTX() procedure at 0x10000.
    Note: if you follow the C0.ASM, it has a lot of IFDEFs, among them is
    
      __BOSS__
    
    That stands for `Borland Operating System Services`, a i286 protected mode
    DOS extender. Obviously never defined, since it was slower than overlays.
    Same way _DSSTACK_ isn't defined, while LDATA is defined.
    Note also that compiler adds '_' to all C symbols it compiles,
    while the ASM symbols start with __ underscores or no underscore,
    to avoid conflicts with the C symbols.
    To make it easier to follow, I've edited that c0.asm, evaluating the IFDEFs
    to match the STRONG.EXE startup code, adding a few more annotations:
    https://github.com/NancyAurum/devroomm/blob/main/src/c0.asm
    
    I've also decompiled the STARTX code into c0.c, so the general working will be
    more clear. Note that the DOS_* and BIOS_* functions should ideally be macros,
    since they get invoked when the stack isn't initialized properly yet:
    https://github.com/NancyAurum/devroomm/blob/main/src/c0.c
    
    In a nutshell, the STARTX() does the following:
    * Saves the relocated address of the _DATA/DGROUP segment inside the CS,
      so that the MZ relocation table wont get overbloated.
      In large models DGROUP is just all global data.
    * Saves segment of psp->hiseg, basically the `(start_ofs + loaded_sz + 15)/16`
    * Saves _psp->envseg (pointer to the environment).
    * Saves DOS version
    * Save several interrupt handlers, becasue signal() can clobber them.
    * Installs default divide by zero handler.
    * Counts number of environment variables.
    * Checks if the user requested _stklen (stack length) is large enough (>256B)
      to run the C runtime library procedures. Checks if it is small enough
      to fit inside the available conventional memory.
    * Initializes malloc and sbrk variables, and frees unused memory.
    * Resets the uninitialized data area (BSS) to 0.
    * Save the motherboard's real-time clock value of the program's start.
      It is then used by C's standard library's clock() function
      The default RTC resolution is 32768 kHz. I.e. it is a millisecond clock.
    * Together with clock ticks, the BIOS_ReadRTC call returns the is_midnight flag.
      Which indicates whether the current time has just transitioned past midnight. 
      This flag is particularly used by system routines that need to perform
      specific actions at the start of a new day, such as updating the system date.
      So we are obliged to save it to the BIOS Data Area (BDA) at 40:70.
    * Calls StartExit(), which runs the initialization routines, like
      the overlay manager preparation, opens stdin/stdout, creates argv and envp,
      then runs the constructors for the global objects.
      The argv generation code is actually really involved, since DOS executables
      handle the '*' wildcards themselves, as opposed to shell like bash doing that.
      The argv code also does the '\' escapes.
    * Calls main().
    * Calls exit() with the value returned by main.
      exit() runs class destructors, flushes I/O buffers, closes streams and files.
      Then exit() exits to DOS with the value returned by main().
    
    Note that the default _stklen value 0x1000 is defined in CLIB/STKLEN.C.
    STRONG.EXE's _stklen is 0x1400, meaning that the developers modified it.
    So either the games uses highly recursive code or allocates large local buffers.
    But how could they modify at compiler time a global variable, like
    _nfile or _stklen, set inside one of the compiled C standard library's OBJs?
    
    Well, Borland C++ 3.0 allowed statements like
    
        extern unsigned _stklen = 54321U;
    
    which redefined the value of an external variable.
    Kind of an compile-time inter-obj configuration API.
    Obviously linker support required too.
    
    Nowadays we init stack through some code like:
    
        gcc -Wl,--stack,54321 -o myprogram myprogram.c
    
    becausethe stack initialization is now handled by the operating systems,
    during the executable file load.
    
    But what if you actually need to user-side configure library extern variables?
    Well, good luck writing your own compiler and linker, since elegant and useful
    features tend to be removed from the modern languages. So while some compilers
    do support it, it is non-standard and requires obscure tricks, like the GCC's
    
      unsigned int _stklen __attribute__((weak)) = 54321;
    
    
    It is also interesting to see how sbrk() is the C standard library procedure,
    while in the original Unix it was part of the kernel. BC++ sbrk works similarly
    to how Unix on PDP-11 operated. Like x86, PDP-11 had segmented memory, and
    the PDP-11 segments had sizes. The CPU did bounds checking for memory accesses,
    resulting in "segmentation fault" outside of the size allocated for program.
    The sbrk() changed the program's segment size, so malloc() had more workspace.
    Unforunately x86 never supported segments bounds checking, instead they came
    with the far less efficient page table based MMU-gliness, which can slow down
    some applications by a factor for 10.
    
    Anyway, we have to move on to actually fun things and decompile game's main().
    Well, if only everything was that easy... all functions called inside main()
    have `int 0x3f` as their first and only instruction, which the confused
    Ghidra sometimes incorrectly splits among several invalid instructions.
    Around it we see a lot of other `int 0x3f`, each followed by 2 of what appears
    to be an offset and additional byte, which is always 0. Yet the CD,3F (int 0x3f)
    at the start of the segment is somewhat different, followed by 0x0000,
    instead of an offset, and some additional data fields, ending in streak of 00h.
    Apparently a header of sorts, of size 32 bytes and a runtime working area.
    
    Normally DOS doesn't install any `int 0x3f` handler, so it must be some
    code inside StartExit. Normally we also never see many functions
    using interrupts, outside of the debugger inserted break points.
    So it must be the code compiler produced for the overlay manager.
    In other words, the main() is completely closed to us, since code it calls
    wasn't even loaded by Ghidra from the STRONG.EXE. It seems we have no option
    but to face this level's boss - Broland Overlay Manager.
    
    The initialization of the OvrMan can only happen inside the StartExit routine.
    Trying to decompile StartExit, we notice that Ghidra doesn't fully support
    the 16bit seg:offset pointers. That is actually a widely requested feature:
    https://github.com/NationalSecurityAgency/ghidra/discussions/2584
    
    Ghidra's developers say the support requires improving the entire framework:
      
      The definition is incomplete but is limited by GHIDRA's current capabilities.
    
    For now we can work around that by manually specifying the segments
    for the references to known addresses. Tedious, but better than nothing.
    In addition we will use the Data Type Manager to define far_t structure
    
      typedef {
        uint16_t off;
        uint16_t seg;
      } far_t;
    
    we will use that struct as a type for all far pointers we encounter.
    For example, strlen will be decompiled as
    
      size_t __cdecl16far strlen(far_t str)
    
    
    which is a bit clearer than the
    
       size_t __cdecl16far strlen(uint16_t str_ofs, uint16_t str_seg)
    
    In addition we will encounter statements like
    
      CONCAT12(x,y) // takes one byte from x and two byte from y and combine them.
      CONCAT11(0x3f,in_AL); //decompiled `mov AH,0x3F`: (0x3F<<8) | in_AL
    
    or 
    
      // return NULL
      result._0_2_ = 0;
      result._2_2_ = 0;
    
    
    That `._O_L_` is the Ghidras way of saying L bytes at offset O bytes inside
    the value. In that case LO and HI words of the result. That means Ghidra failed
    to decompile the int32_t operations properly.
    
    To make everything clear, lets translate the StartExit() to C99, since
    the term "decompile" would be inappropriate for the code which was never
    compiled.
    
      #define SE_NEAR 0
      #define SE_FAR  1
      #define SE_OVER 0xFF
      typedef struct { // Borland Startup Entry
        uint8_t calltype;  //SE_NEAR,SE_FAR,SE_OVER
        uint8_t priority;  //lower priority entires run first on startup
                           //and last on shutdown
        far_t proc;        //procedure
      } PACKED brl_se_t;
    
      void near StartExit(borland_SE* se_start, borland_SE* se_end) {
        int doing_init = se_start == InitStart;
        while (1) {
          borland_SE *cur_se = se_end;
          uint8_t cur_priority = doing_init ? 0xff : 0;
    
          for (borland_SE *se = se_start; se != se_end; se++) {
            if (se->calltype == SE_OVER) continue;
            bool better = doing_init ? (se->priority <= cur_priority)
                                     : (se->priority >= cur_priority);
            if (better) {
              cur_priority = se->priority;
              cur_se = se;
            }
          }
    
          // till we process all entries in order of priority
          if (cur_se == se_end) break;
    
          uint8_t calltype = cur_se->calltype;
          cur_se->calltype = SE_OVER; // mark as processed
    
          if (calltype == SE_NEAR) NEAR_FN(cur_se->proc)();
          else FAR_FN(cur_se->proc)();
        }
        return;
      }
    
    So the routine is used for both initialization and deintialization.
    And during initialization the lowest priority entries are first to run.
    
    STARTX() call it like that StartExit(InitStart, InitEnd).
    The InitStart and InitEnd are labels for a section dynamically formed by
    LINK.EXE from all the `_INIT_		SEGMENT WORD PUBLIC 'INITDATA'` segments.
    For example in SETARGV.ASM we have:
      
      _INIT_		SEGMENT WORD PUBLIC 'INITDATA'
          db      0                       ;near call
          db      16                      ;priority 16
          dw      offset __setargv
          dw      ?
      _INIT_		ENDS
    
    For C code Borland C++ 3.0 also supports the
    
      #pragma startup <function_name> <priority>
    
    For example,
    
      #pragma startup _c0crtinit 16
    
    Neat?
    
    We don't have source code for the Borland's Overlay Manager, but it is the same. 
    So if your FIDB are correctly made, going to InitStart[] you should see
    something like:
    
      brl_se_t InitStart[] = { //start of the _INIT_ segment
        //these are the startup entries present in STRONG.EXE
        {SE_FAR , 1,&_OvrPrepare},  //init overlay manager
        {SE_NEAR, 2,&_setupio},     //init I/O
        {SE_NEAR,16,&_c0crtinit},   //init textmode video
        {SE_NEAR,16,&_setargv},     //init argv and envp vectors
      };
    
    First in table _OvrPrepare the has the lowest priority, so runs first.
    The _OvrPrepare gets linked from the OVRUSER.OBJ in OVERLAY.LIB.
    There is no source code for the OVERLAY.LIB, but fortunately *.OBJ files
    hold the symbol names for the symbols they reference in other OBJs.
    The Ghidra does expose the _INIT_ section near end of its list of memory blocks.
    The _INIT_ section for OVERLAY.LIB has that _OvrPrepare entry.
    
    Running strings on OVERLAY.LIB, we see that it has additional compiler
    information, which Ghidra happily ignores. So dumping it could help us
    understanding the original overlay project's organization. Ghidra also doesn't
    allow saving the imported OBJ files, so we need a way to extract *.OBJ files
    from the OVERLAY.LIB.
    
    The OMF LIB files are the simple uncompressed archives. Despite the formats
    simplicity and historically wide use, there are no modern Windows software
    to work with it. So we will need the LIB.EXE from one of the old MSC or
    MASM distributions. I used the LIB.EXE v3.20.010 circa 1992.
    Obviously it needs a DOSBox to run it.
    
    LIB.EXE has a really old style IBMish command argument format, where arguments
    are separated by `,`. And there is no option to extract everything at once.
    First we have to list the files in archive by doing:
    
      lib OVERLAY.LIB,listfile;
    
    That produces a file called LISTFILE, which we have to filter for the OBJ names.
    
      cat LISTFILE | grep '^[^ ]' | grep -v '\.\.' | grep -o '^[A-Za-z_.]*' \
          | sort | sed 's/\(.*\)/lib OVERLAY.LIB *\1;/' | unix2dos >unpack.bat
    
    That produces the unpack.bat, which will invoke LIB.EXE for each OBJ file.
    
    Next step is dumping the database tables from the extracted OBJ files.
    Again, there is no established modern solution to do that, despite
    the historical significance of the OMF files.
    
    Luckily there is a basic python script, dumping the OMF info
    https://github.com/JohnVillalovos/omf_parser_py
    
    The script's author apparently cared only about the Novell NetWare history,
    so we have to replace the
      with open("numon.obj", "rb") as in_file:
    
    with
      with open(sys.argv[1], "rb") as in_file:
    
    Now we can run say `obj-decoder.py OVRMAN.OBJ`, listing all the entries ignored
    by Ghidra. In particular from `CMT_LANGUAGE_TRANS` we learn that the OBJ was
    complied with `Turbo Assembler  Version 1.01`.
    
    Unfortunately the the Python script doesn't properly dump the CMT_DEPENDENCY,
    which has the following format:
    
      typedef struct {
        uint8_t cmt_class; //0xE9 for CMT_DEPENDENCY
        uint16_t dos_date; //date that dependency file was modified
        uint16_t dos_time; //time that dependency file was modified
        uint8_t fnlen;
        char filename[fnlen];
      } omf_comment;
    
    
    For our archological study we want it in a nice human readable form.
    To do that we need to mod the __str__ method of the ComentRecord class:
    
        def __str__(self, **kwargs):
            extra = " comment_type: 0x{:02X},".format(self.comment_type)
            if self.comment_class == 0xE9 and len(self._payload)>6:
                p = self._payload
                fnlen = p[6]
                filename = p[7:7+fnlen]
                date = p[2] + p[3]*256
                time = p[4] + p[5]*256
                year = (time>>9)+1980
                month = (time>>5)&0xf
                day = time&0x1f;
                hour = (time>>11) & 0x1f;
                minute = (time>>5) & 0x3f;
                second = (time&0x1f) * 2
                extra += f" date: {year}:{month:02}:{day:02}"
                extra += f"  {hour:02}:{minute:02}:{second:02},"
                extra += f" file: {filename},"
                return super().__str__(extra=extra)
            ...
    
    We will also have to modify the parse_object_module and create_record,
    so it will skip the LIDATA records, instead of giving errors.
    
    Now we can list every dependency for every Borland C++ 3.0 OVERLAY.LIB OBJ file:
    
      for obj in obj/*.OBJ; do
        echo "${obj}:"
        python obj-decoder.py obj/OVRMAN.OBJ                  \
               | grep -o 'date: .* data'                      \
               | sed "s/', data//" | sed "s/, file: b'/   /"  \
               | sed 's/date: /  /' | sed 's/\\\\/\\/g'
        echo ""
      done
    
      obj/OVRBUFF.OBJ:
        1989:11:22  02:27:44   ..\YALBUFF.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
    
      obj/OVRDAT.OBJ:
        1990:05:21  02:37:42   ..\YOMDAT.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/OVRDATA.OBJ:
        1990:09:06  02:41:12   ..\YALTADAT.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/OVRDETEC.OBJ:
        1991:05:14  02:53:28   ..\YOMDETEC.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1990:08:30  02:40:60   ..\yominc\YOM.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/OVRHALT.OBJ:
        1990:09:06  02:41:12   ..\YHALT.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
    
      obj/OVRHP.OBJ:
        1991:04:12  02:52:24   ..\YOMHP.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
    
      obj/OVRMAN.OBJ:
        1991:03:18  02:51:36   ..\YALTA.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1990:08:30  02:40:60   ..\yominc\YOM.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/OVRSWAP.OBJ:
        1990:09:06  02:41:12   ..\YALTAMEM.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1990:08:30  02:40:60   ..\yominc\YOM.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/OVRUSER.OBJ:
        1990:09:07  02:41:14   ..\YALTUSER.ASM
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:10:04  02:26:08   ..\yominc\SEGMENT.AI
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1990:04:09  02:36:18   ..\yominc\YALTA.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
        1989:11:08  02:27:16   ..\yominc\OVERLAY.AI
    
      obj/SETJMP.OBJ:
        1990:09:10  02:41:20   ..\YOMJMP.ASM
        1989:07:07  02:23:14   ..\yominc\RULES.AI
        1991:02:19  02:50:38   ..\yominc\YVERSION.AI
    
    The first dependency line is the original .ASM file, the *.AI are the
    assembly include files, similar to the C/C++ *.h files.
    Note that the DOS file times are in local time, not the UTC/GMT time.
    Borland developers surely had an extended work day...
    
    Anyway, they had a build folder inside the overlay sources folder,
    where they ran TASM.EXE to produce the OBJ files.
    "om" there stands for the overlay manager, but what is "y"?
    Yet another overlay manager? Because Borland alone had two previous managers.
    The dependency YVERSION.AI appears up to 3 times per OBJ, apparently because
    it was also included by the other included files.
    I.e. YALTA.AI and YOM.AI also include YVERSION.AI, in addition to every ASM
    including it as first line (except YOMJMP.ASM, where it came after RULES.AI).
    
    Given that OVERLAY.LIB was compiled from pure *.ASM by late night programmers,
    we should expect weird code, with a few `push cs`/`pop ds`, `mov BP,0`,
    and the returns with a carry flag acrosz multiple procedures,
    with total absence of any good conventions, like the stack frames or standard
    library calls. So lets first gather a bit more information about our specimen.
    
    Googling the history of overlay managers for DOS, we discover that
    there were actually multiple overlay libraries, including several for Borland
    products. Most work through an overlay stack, pushing a popping overlays,
    as the calls progress, except the Borland C++ one, which is called VROOMM.
    The VROOMM stands for 'Virtual Run-time Object-Oriented Memory Manager'.
    As the name suggests, it was made to support object-oriented code, and therefore
    has more fine grained linking scheme.
    
    Still we check the other overlay implemenations. One of them is OVERMGR.TPU
    by Turbopower Software. The OVERMGR.TPU copy was preserved at a Russian site:
    https://pascal.sources.ru/comm/pibter41.htm
    But it doesn't share code with the OVERLAY.LIB we are dealing with.
    
    Another implementation comes with Borland Pascal as OVERLAY.OBJ, which, while
    much smaller, has API similar to the Bolrand C++ 3.0 one. Checking the dates...
    
    obj/OVEREMS.OBJ:
      1991:07:25  02:55:50   ovr\overems.ASM
      1992:08:13  03:08:26   \bp7\com\SE.ASM
    
    obj/OVERLAY.OBJ:
      1990:12:12  02:44:24   ovr\overlay.ASM
      1992:08:13  03:08:26   \bp7\com\SE.ASM
    
    we see that it was made after the Borland C++ one. So it could be a simplified
    version, in the same vein how Borland Pascal was always inferior to Borland C++.
    We also notice the `\bp7\com\SE.ASM` include, which surprisingly comes with
    the standard library in `\RTL\INC\SE.ASM`. Among other definitions it has:
    
      ; Overlay header record
      ovSignature     equ     (word ptr 0)
      ovSaveReturn    equ     (word ptr 2)
      ovFilePos       equ     (dword ptr 4)
      ovCodeSize      equ     (word ptr 8)
      ovFixupSize     equ     (word ptr 10)
      ovJumpCount     equ     (word ptr 12)
      ovLink          equ     (word ptr 14)
      ovSegment       equ     (word ptr 16)
      ovRetryCount    equ     (word ptr 18)
      ovNext          equ     (word ptr 20)
      ovEmsPage       equ     (word ptr 22)
      ovEmsOffset     equ     (word ptr 24)
      ovUserData      equ     (byte ptr 26)
      ovVectors       equ     (byte ptr 32)
      ovRecSize       equ     32
    
    Wait a second! That appears to match the structure we seen at the start of
    these `int 0x3f` segments!!! And the value at ovJumpCount correspons
    to the number of these 5 byte entries, main() calls to. We are up to something!
    That is apparently the so called trampoline/veneer stubs table.
    These are created by compilers to approach distant code, such as the DLL one,
    or well, in OS kernel syscall dispatchers. In our case, code is the dynamically
    linked overlay residing on disk. So triggering these ints leads to
    the actual function's code being loaded and the cotrol transfered to it.
    
    Still we need a clarification how exactly that happens, so we can properly load
    the STRONG.EXE into Ghidra. But now we are ready to challenge
    the OvrMan and proceed towards decompiling _OvrPrepare.
    
    _OvrPrepare begins with the
    
      MOV AX,offset _OVRDATA_ ;;in STRONG.EXE, _OVRDATA_=0x225d
      MOV DS,AX
    
    meaning it uses a personal data segment, not the usual DGROUP, the rest of
    standard library uses. It also resides in a different code segment: 1403. 
    
    To decompile the code, we have to manually introduce and override the segment
    refernces, because Ghidra is bad at 16bit x86 and assigns all references to DS,
    while also ignoring the instructions like `pop ES`. I.e. we have to do
    `Set Register Values` starting from the `MOV DI,0x6` in the code like following:
    
      assume ES = 0x1403
        MOV   DI, 0x6   ;; DI obviously points inside the CS segment
        push  CS
        pop   ES
        ...             ;; do comparison here
      assume ES = <previous value>
    
    Aftewards we can manually override the 0x6 reference to go inside the CS:0006.
    That way the decompiler catches it correctly. In any case, we have to be
    aware that the analyzer produces incorrect references for the segmented code.
    
    
    The OvrMan code is really handcrafted and does rather complex things, like
    parsing the actual x86 instructions and garbage collecting the heap.
    After some work, we decompile the initial batch of the code, responsible for
    the loading of the overlay from file and relocating them: 
    https://github.com/NancyAurum/devroomm/blob/main/src/ovrman.c
    
    OvrPrepare calls __InitModules(), allocates large buffer of memory for storing
    the currently loaded overlays. Then it calls __OVRINIT, to initialize
    the handler for the `int 3Fh` interrupts, which Bolrand C++ 3.0 emitted instead
    of calls to the overlayed functions. OvrMan is expected to resolve these calls
    at runtime, by the means of __OvrTrapHandler at the 1403:04f1, which begins
    by checking that the frame pointer is aligned, and then proceeds by checking
    that it is the overlay header or the trampouline table call...
    
    In general we learn, that after the MZ executable, aligned to the paragraph
    is located a set of `FB` sections, and that the `OV` section contains actual
    overlay. These `FB` sections are completely ignored by DOS. That is how
    self extracting archives are implemented: they just have zip archive
    after the MZ executable, which opens itself and seeks past the MZ data.
    
    That FB/OV section duplicates the __SEGTABLE__ processed by __InitModules.
    
    All in all, we now have enough information to relocation our overlays manually,
    so they could be loaded in Ghidra, opening the main() function to us.
    Fortunatelly each overlay corresponds roughly to a single .C file, while
    also exposing all its functions. Meaning that overlays actually help us by
    preserving more information about the original code. In fact, we already
    know that STRONG.EXE was built from around 60 source files.
    
    Ghidra nicely exposes overlays as CODE_XX header+jump_table blocks.
    But they tend to end in the middle of data. Where did these segments came from?
    Well, old DOS MZ had no proper proper segment/section table.
    Instead we have the seg:ofs pairs, which point to the segment references inside
    the exectuable for the DOS MZ loader to relocate.
    Ghidra collected all these segment values, both from the relocation table,
    and from the values being relocated. Then Ghidra sorted them by their order
    in memory and tried to guess their sizes by doing disassembly analysis,
    since always start on paragraph boundary, which may contain code from
    the previous segment. But disassembly works for code, not data analyzis.
    Therefore the abrupt ends in the middle of data.
    
    The last segment is usually the stack segment. For STRONG.EXE it is 128 bytes.
    But Ghidra misdetects it as a 16 bytes, despite the fact that its size
    could be deduced from the SP field in the MZ.
    
    After stack segment comes the overlay relocations table, which has the FBOV id.
    
    To summarize, after a large model BC++ exe is loaded, the memory layout is:
    * Table of far_t interrupt pointers.
    * BDA
    * DOS kernel (back in the day entire OS kernel fit in 64K)
    * PSP (also the initial value of DS and ES)
    * C runtime segment (0x10000 in Ghidra, STARTX, stdlib, etc...).
      The MZ header and relocation table are missing.
    * Overlay Library.
    * A few more non-overlayed OBJs (i.e. mouse and video code).
    * Segment holding main()
    * Overlay headers (bosh_t + trampouline table)
    * _DATA or DGROUP segment
    * BSS (zero-inited variables)
    * Stack segment
    * Overlay buffer (since it is allocated first)
    * User heap data.
    * Video memory stating 0xA0000
    
    And we can pretend that overlay buffer is really larger and load there
    all out overlays. Of course will prevent us from running the analyzed code in
    the builtin emulator and sampled data, but everything is made of trade offs.
    
    
    BTW, I was wrong believing that IDA Pro doesn't support Borland C++ overlays.
    It does,
    http://blog.rewolf.pl/blog/?p=1202
    
    but the IDA overlay loader is from 1994 and needs to be fixed to work with
    the modern IDA. It also doesn't fixup the function address references correctly.
    There was an IDA SDK floating around Russian torrent sites,
    and the SDK has the source code for the Borland's overlay loader.
    But I have already decided to use Ghidra, and you don' change horses on
    the river crossing.
    
    To be continued...


    Current Mood: contemplative

    << Previous Day 2024/06/19
    [Calendar]
    Next Day >>

About LJ.Rossia.org