nancygold's Journal
[Most Recent Entries]
[Calendar View]
[Friends View]
Wednesday, June 19th, 2024
Time |
Event |
3:43p |
Mathematics Considered Harmful The introduction course into mathematics starts with the so called first order
logic. It is similar to the CPU logic, but deals with natural language
statements and doesn't have any temporal components, introduced by CPU cycles.
These statements are joined by connectives.
For conveniences we will use `+`, `*`, `~` for `or`, `and` and `not`.
But there are additional connectives defined using these.
For example, the unidirectional implication:
A + ~B # A or not B
The macro form for that is `A -> B` (A implies B)
Math macros are like ARM assembler macros, not the Lisp macros,
although they do have the so called axiom schemas, which
are defined using ebonics like prose.
The entirety of math has that ghetto rap battle feel to it.
Lets use it to produce the formula for bidirectional implication:
A -> B
B -> A
-------------------- # Expand these into components
A + ~B
B + ~A
--------------------
(A + ~B) * (B + ~A) # Combine both expressions into one
--------------------
A*B + ~B*B + A*~A + ~B*~A # Perform multiplication
--------------------
A*B + ~B*~A # Eliminate the cancelled items
--------------------
A*B + ~(A+B) # Move `~` out
The macro form for that is `A <-> B`
Lets check it by assuming A=1
A <-> B # Base theorem in a macro form
---------------
A*B + ~(A+B) # Expand the sugar
---------------
1*B + ~(1+B) # A=1
---------------
B + ~(1) # Constant elimination (* advances, + shortcircuits)
---------------
B + 0 # ~1 = 0 (by definition)
---------------
B # B=1 too!
These `->` and `<->` are a bit similar to the `if/else` and for loops in C/C++,
but, again, they have no temporal component, and therefore just simple boring,
connectives.
Note also that in computing we start with a single `&` operation, instead of
`*`, `+`, and `~`, defined in that above notation as
~(A * B)
-----------
A & B
That is called NAND (not and). And can be used to implement `*`, `+` and ~`,
even in the 1st order logic.
And that my friends is basically what the math is all about. In other words, low
level bit twiddling, but with a layer for pretentious snobbism.
Of course mathematicians also introduce the so called Set Theory, also called
second order logic. Which is a pseudo-philosphibcal paperwork, using bombastic
star-trek names like `the axiom of infinity` or `universal quantification`.
Mathematicians use these to produce a lot of nonsense statements, completely
detached from reality. Pure existentialist schzobabble.
So if you want to be a decent human being, stay away from math as you would stay
away from the OOP crowd and the Uncle Bob's religion. Math is even more useless
than an infinite set of unit tests.
Cheers!
Current Mood: contemplativeCurrent Music: Pilotpriest - Xanadu | 6:39p |
The Borland Overlay Manager
PREVOUS PART
Before proceeding to overlays, we need to undestand how DOS MZ executables work.
So to run an executable, DOS first reads the following header:
typedef struct {
uint16_t magic; /*00 Magic number MZ_MAGIC */
uint16_t cblp; /*02 Bytes of last page */
/* If it is 4, it should be treated as 0,
since pre-1.10 versions of the MS linker
set it that way. */
uint16_t cp; /*04 Pages in file */
uint16_t crlc; /*06 Number of relocation entries */
uint16_t cparhdr; /*08 Size of header in paragraphs */
uint16_t minalloc; /*0A Minimum extra paragraphs needed */
uint16_t maxalloc; /*0C Maximum extra paragraphs needed */
/* if 0 DOS loads as high as possible, else above PSP*/
/* The maximum allocation is set to FFFFh by default.*/
uint16_t ss; /*0E Initial (relative) SS value */
uint16_t sp; /*10 Initial SP value */
uint16_t csum; /*12 Checksum */
uint16_t ip; /*14 Initial IP value */
uint16_t cs; /*16 Initial (relative) CS value */
uint16_t lfarlc; /*18 File address of relocation table */
uint16_t ovno; /*1A Microsoft Overlay number
MS LINK can create single EXE containing multiple
overlays (up to 63) which are simply numbered in
sequence.
The 0 one is loaded by DOS
Each overlay within the file is essentially an
EXE file with it's own MZ header.*/
} PACKED mz_hdr_t;
DOS checks the `magic` to be "MZ" and calculates the size of
the initial part of the executable to load:
load_sz = (mz_hdr_t.cp-1)*512 + mz_hdr_t.cblp
Checks if it has load_sz of conventional memory to load the exe.
And then DOS loads the first load_sz bytes of exe, either above the PSP,
or, if maxalloc is 0, at the top of conventional memory.
The load_sz includes the header too.
Everything after these load_sz bytes is ignored, so in fact you can store
there your program's static or even dynamic data.
Lets assume DOS loaded the executable at the start_ofs.
Once that initial part of .EXE is in memory, DOS goes to the mz_hdr_t.lfarlc.
There it goes through the array of
typedef struct { /* segment:offset pair address */
uint16_t ofs; //offset from the start of segment
uint16_t seg; //segment part of the address
} PACKED far_t;
These pointers are the so called relocation table (also called `fixup` table).
They point at the uint16_t segments, inside the compiled code and data.
DOS adds to each such relative segment the value of the address of the segment
containing `start_ofs + mz_hdr_t.cparhdr*16`. That address corresponds to the
Ghidra's 0x10000, which is just above the imagined interrupts table,
bios data area and DOS. Yes, DOS was below 64K, but it also kept the resident
part of COMMAND.COM in the highers conventional memory area or in high memory.
After that DOS does relocates the SS and CS segments, from the header,
sets DS,ES to the start of PSP, and jumps to the mz_hdr_t.ip, which corresponds
to the C0.ASM's STARTX procedure. STARTX initializes the runtime, like the file
IO system, argc/argv, environment, C++ constructors and the overlay manager.
DOS discards the original mz_hdr_t and relocation table after processing.
Note also, that the segments, both in the relocation table far_t pointers,
and at the locations pointed by these pointers roughly correspond to the
OBJ files used to produce the executable. But the compiler doesn't respect the
16-bit segment alignment, when joining functions from multiple object files,
so these segments may precede their code or data. In case of STRONG.EXE that
happens with the main() routine, whose segment in relocation table points
a few bytes before it. The only way to determine actual segment starts is
by doing disassembly analysis, like Ghidra does, creating the list of
memory blocks in the Memory Map.
Unfortunately Ghidra misses the PSP segment (256 bytes before the seg 0 in the
relocation table), as well as the SS segment. We have to create these manually.
Given the 0x10000 default Ghidra's load address, the PSP should reside at
0x0FF00. Everything below 0x0FF00 we can use to load our own stuff in Ghidra.
For example, I use it to create fake structures overriding the data segment
analysis and decompilation.
Ok. Lets proceed with decompiling the STARTX() procedure at 0x10000.
Note: if you follow the C0.ASM, it has a lot of IFDEFs, among them is
__BOSS__
That stands for `Borland Operating System Services`, a i286 protected mode
DOS extender. Obviously never defined, since it was slower than overlays.
Same way _DSSTACK_ isn't defined, while LDATA is defined.
Note also that compiler adds '_' to all C symbols it compiles,
while the ASM symbols start with __ underscores or no underscore,
to avoid conflicts with the C symbols.
To make it easier to follow, I've edited that c0.asm, evaluating the IFDEFs
to match the STRONG.EXE startup code, adding a few more annotations:
https://github.com/NancyAurum/devroomm/blob/main/src/c0.asm
I've also decompiled the STARTX code into c0.c, so the general working will be
more clear. Note that the DOS_* and BIOS_* functions should ideally be macros,
since they get invoked when the stack isn't initialized properly yet:
https://github.com/NancyAurum/devroomm/blob/main/src/c0.c
In a nutshell, the STARTX() does the following:
* Saves the relocated address of the _DATA/DGROUP segment inside the CS,
so that the MZ relocation table wont get overbloated.
In large models DGROUP is just all global data.
* Saves segment of psp->hiseg, basically the `(start_ofs + loaded_sz + 15)/16`
* Saves _psp->envseg (pointer to the environment).
* Saves DOS version
* Save several interrupt handlers, becasue signal() can clobber them.
* Installs default divide by zero handler.
* Counts number of environment variables.
* Checks if the user requested _stklen (stack length) is large enough (>256B)
to run the C runtime library procedures. Checks if it is small enough
to fit inside the available conventional memory.
* Initializes malloc and sbrk variables, and frees unused memory.
* Resets the uninitialized data area (BSS) to 0.
* Save the motherboard's real-time clock value of the program's start.
It is then used by C's standard library's clock() function
The default RTC resolution is 32768 kHz. I.e. it is a millisecond clock.
* Together with clock ticks, the BIOS_ReadRTC call returns the is_midnight flag.
Which indicates whether the current time has just transitioned past midnight.
This flag is particularly used by system routines that need to perform
specific actions at the start of a new day, such as updating the system date.
So we are obliged to save it to the BIOS Data Area (BDA) at 40:70.
* Calls StartExit(), which runs the initialization routines, like
the overlay manager preparation, opens stdin/stdout, creates argv and envp,
then runs the constructors for the global objects.
The argv generation code is actually really involved, since DOS executables
handle the '*' wildcards themselves, as opposed to shell like bash doing that.
The argv code also does the '\' escapes.
* Calls main().
* Calls exit() with the value returned by main.
exit() runs class destructors, flushes I/O buffers, closes streams and files.
Then exit() exits to DOS with the value returned by main().
Note that the default _stklen value 0x1000 is defined in CLIB/STKLEN.C.
STRONG.EXE's _stklen is 0x1400, meaning that the developers modified it.
So either the games uses highly recursive code or allocates large local buffers.
But how could they modify at compiler time a global variable, like
_nfile or _stklen, set inside one of the compiled C standard library's OBJs?
Well, Borland C++ 3.0 allowed statements like
extern unsigned _stklen = 54321U;
which redefined the value of an external variable.
Kind of an compile-time inter-obj configuration API.
Obviously linker support required too.
Nowadays we init stack through some code like:
gcc -Wl,--stack,54321 -o myprogram myprogram.c
becausethe stack initialization is now handled by the operating systems,
during the executable file load.
But what if you actually need to user-side configure library extern variables?
Well, good luck writing your own compiler and linker, since elegant and useful
features tend to be removed from the modern languages. So while some compilers
do support it, it is non-standard and requires obscure tricks, like the GCC's
unsigned int _stklen __attribute__((weak)) = 54321;
It is also interesting to see how sbrk() is the C standard library procedure,
while in the original Unix it was part of the kernel. BC++ sbrk works similarly
to how Unix on PDP-11 operated. Like x86, PDP-11 had segmented memory, and
the PDP-11 segments had sizes. The CPU did bounds checking for memory accesses,
resulting in "segmentation fault" outside of the size allocated for program.
The sbrk() changed the program's segment size, so malloc() had more workspace.
Unforunately x86 never supported segments bounds checking, instead they came
with the far less efficient page table based MMU-gliness, which can slow down
some applications by a factor for 10.
Anyway, we have to move on to actually fun things and decompile game's main().
Well, if only everything was that easy... all functions called inside main()
have `int 0x3f` as their first and only instruction, which the confused
Ghidra sometimes incorrectly splits among several invalid instructions.
Around it we see a lot of other `int 0x3f`, each followed by 2 of what appears
to be an offset and additional byte, which is always 0. Yet the CD,3F (int 0x3f)
at the start of the segment is somewhat different, followed by 0x0000,
instead of an offset, and some additional data fields, ending in streak of 00h.
Apparently a header of sorts, of size 32 bytes and a runtime working area.
Normally DOS doesn't install any `int 0x3f` handler, so it must be some
code inside StartExit. Normally we also never see many functions
using interrupts, outside of the debugger inserted break points.
So it must be the code compiler produced for the overlay manager.
In other words, the main() is completely closed to us, since code it calls
wasn't even loaded by Ghidra from the STRONG.EXE. It seems we have no option
but to face this level's boss - Broland Overlay Manager.
The initialization of the OvrMan can only happen inside the StartExit routine.
Trying to decompile StartExit, we notice that Ghidra doesn't fully support
the 16bit seg:offset pointers. That is actually a widely requested feature:
https://github.com/NationalSecurityAgency/ghidra/discussions/2584
Ghidra's developers say the support requires improving the entire framework:
The definition is incomplete but is limited by GHIDRA's current capabilities.
For now we can work around that by manually specifying the segments
for the references to known addresses. Tedious, but better than nothing.
In addition we will use the Data Type Manager to define far_t structure
typedef {
uint16_t off;
uint16_t seg;
} far_t;
we will use that struct as a type for all far pointers we encounter.
For example, strlen will be decompiled as
size_t __cdecl16far strlen(far_t str)
which is a bit clearer than the
size_t __cdecl16far strlen(uint16_t str_ofs, uint16_t str_seg)
In addition we will encounter statements like
CONCAT12(x,y) // takes one byte from x and two byte from y and combine them.
CONCAT11(0x3f,in_AL); //decompiled `mov AH,0x3F`: (0x3F<<8) | in_AL
or
// return NULL
result._0_2_ = 0;
result._2_2_ = 0;
That `._O_L_` is the Ghidras way of saying L bytes at offset O bytes inside
the value. In that case LO and HI words of the result. That means Ghidra failed
to decompile the int32_t operations properly.
To make everything clear, lets translate the StartExit() to C99, since
the term "decompile" would be inappropriate for the code which was never
compiled.
#define SE_NEAR 0
#define SE_FAR 1
#define SE_OVER 0xFF
typedef struct { // Borland Startup Entry
uint8_t calltype; //SE_NEAR,SE_FAR,SE_OVER
uint8_t priority; //lower priority entires run first on startup
//and last on shutdown
far_t proc; //procedure
} PACKED brl_se_t;
void near StartExit(borland_SE* se_start, borland_SE* se_end) {
int doing_init = se_start == InitStart;
while (1) {
borland_SE *cur_se = se_end;
uint8_t cur_priority = doing_init ? 0xff : 0;
for (borland_SE *se = se_start; se != se_end; se++) {
if (se->calltype == SE_OVER) continue;
bool better = doing_init ? (se->priority <= cur_priority)
: (se->priority >= cur_priority);
if (better) {
cur_priority = se->priority;
cur_se = se;
}
}
// till we process all entries in order of priority
if (cur_se == se_end) break;
uint8_t calltype = cur_se->calltype;
cur_se->calltype = SE_OVER; // mark as processed
if (calltype == SE_NEAR) NEAR_FN(cur_se->proc)();
else FAR_FN(cur_se->proc)();
}
return;
}
So the routine is used for both initialization and deintialization.
And during initialization the lowest priority entries are first to run.
STARTX() call it like that StartExit(InitStart, InitEnd).
The InitStart and InitEnd are labels for a section dynamically formed by
LINK.EXE from all the `_INIT_ SEGMENT WORD PUBLIC 'INITDATA'` segments.
For example in SETARGV.ASM we have:
_INIT_ SEGMENT WORD PUBLIC 'INITDATA'
db 0 ;near call
db 16 ;priority 16
dw offset __setargv
dw ?
_INIT_ ENDS
For C code Borland C++ 3.0 also supports the
#pragma startup <function_name> <priority>
For example,
#pragma startup _c0crtinit 16
Neat?
We don't have source code for the Borland's Overlay Manager, but it is the same.
So if your FIDB are correctly made, going to InitStart[] you should see
something like:
brl_se_t InitStart[] = { //start of the _INIT_ segment
//these are the startup entries present in STRONG.EXE
{SE_FAR , 1,&_OvrPrepare}, //init overlay manager
{SE_NEAR, 2,&_setupio}, //init I/O
{SE_NEAR,16,&_c0crtinit}, //init textmode video
{SE_NEAR,16,&_setargv}, //init argv and envp vectors
};
First in table _OvrPrepare the has the lowest priority, so runs first.
The _OvrPrepare gets linked from the OVRUSER.OBJ in OVERLAY.LIB.
There is no source code for the OVERLAY.LIB, but fortunately *.OBJ files
hold the symbol names for the symbols they reference in other OBJs.
The Ghidra does expose the _INIT_ section near end of its list of memory blocks.
The _INIT_ section for OVERLAY.LIB has that _OvrPrepare entry.
Running strings on OVERLAY.LIB, we see that it has additional compiler
information, which Ghidra happily ignores. So dumping it could help us
understanding the original overlay project's organization. Ghidra also doesn't
allow saving the imported OBJ files, so we need a way to extract *.OBJ files
from the OVERLAY.LIB.
The OMF LIB files are the simple uncompressed archives. Despite the formats
simplicity and historically wide use, there are no modern Windows software
to work with it. So we will need the LIB.EXE from one of the old MSC or
MASM distributions. I used the LIB.EXE v3.20.010 circa 1992.
Obviously it needs a DOSBox to run it.
LIB.EXE has a really old style IBMish command argument format, where arguments
are separated by `,`. And there is no option to extract everything at once.
First we have to list the files in archive by doing:
lib OVERLAY.LIB,listfile;
That produces a file called LISTFILE, which we have to filter for the OBJ names.
cat LISTFILE | grep '^[^ ]' | grep -v '\.\.' | grep -o '^[A-Za-z_.]*' \
| sort | sed 's/\(.*\)/lib OVERLAY.LIB *\1;/' | unix2dos >unpack.bat
That produces the unpack.bat, which will invoke LIB.EXE for each OBJ file.
Next step is dumping the database tables from the extracted OBJ files.
Again, there is no established modern solution to do that, despite
the historical significance of the OMF files.
Luckily there is a basic python script, dumping the OMF info
https://github.com/JohnVillalovos/omf_parser_py
The script's author apparently cared only about the Novell NetWare history,
so we have to replace the
with open("numon.obj", "rb") as in_file:
with
with open(sys.argv[1], "rb") as in_file:
Now we can run say `obj-decoder.py OVRMAN.OBJ`, listing all the entries ignored
by Ghidra. In particular from `CMT_LANGUAGE_TRANS` we learn that the OBJ was
complied with `Turbo Assembler Version 1.01`.
Unfortunately the the Python script doesn't properly dump the CMT_DEPENDENCY,
which has the following format:
typedef struct {
uint8_t cmt_class; //0xE9 for CMT_DEPENDENCY
uint16_t dos_date; //date that dependency file was modified
uint16_t dos_time; //time that dependency file was modified
uint8_t fnlen;
char filename[fnlen];
} omf_comment;
For our archological study we want it in a nice human readable form.
To do that we need to mod the __str__ method of the ComentRecord class:
def __str__(self, **kwargs):
extra = " comment_type: 0x{:02X},".format(self.comment_type)
if self.comment_class == 0xE9 and len(self._payload)>6:
p = self._payload
fnlen = p[6]
filename = p[7:7+fnlen]
date = p[2] + p[3]*256
time = p[4] + p[5]*256
year = (time>>9)+1980
month = (time>>5)&0xf
day = time&0x1f;
hour = (time>>11) & 0x1f;
minute = (time>>5) & 0x3f;
second = (time&0x1f) * 2
extra += f" date: {year}:{month:02}:{day:02}"
extra += f" {hour:02}:{minute:02}:{second:02},"
extra += f" file: {filename},"
return super().__str__(extra=extra)
...
We will also have to modify the parse_object_module and create_record,
so it will skip the LIDATA records, instead of giving errors.
Now we can list every dependency for every Borland C++ 3.0 OVERLAY.LIB OBJ file:
for obj in obj/*.OBJ; do
echo "${obj}:"
python obj-decoder.py obj/OVRMAN.OBJ \
| grep -o 'date: .* data' \
| sed "s/', data//" | sed "s/, file: b'/ /" \
| sed 's/date: / /' | sed 's/\\\\/\\/g'
echo ""
done
obj/OVRBUFF.OBJ:
1989:11:22 02:27:44 ..\YALBUFF.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
obj/OVRDAT.OBJ:
1990:05:21 02:37:42 ..\YOMDAT.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/OVRDATA.OBJ:
1990:09:06 02:41:12 ..\YALTADAT.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/OVRDETEC.OBJ:
1991:05:14 02:53:28 ..\YOMDETEC.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1990:08:30 02:40:60 ..\yominc\YOM.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/OVRHALT.OBJ:
1990:09:06 02:41:12 ..\YHALT.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
obj/OVRHP.OBJ:
1991:04:12 02:52:24 ..\YOMHP.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
obj/OVRMAN.OBJ:
1991:03:18 02:51:36 ..\YALTA.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1990:08:30 02:40:60 ..\yominc\YOM.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/OVRSWAP.OBJ:
1990:09:06 02:41:12 ..\YALTAMEM.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1990:08:30 02:40:60 ..\yominc\YOM.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/OVRUSER.OBJ:
1990:09:07 02:41:14 ..\YALTUSER.ASM
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:10:04 02:26:08 ..\yominc\SEGMENT.AI
1989:07:07 02:23:14 ..\yominc\RULES.AI
1990:04:09 02:36:18 ..\yominc\YALTA.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
1989:11:08 02:27:16 ..\yominc\OVERLAY.AI
obj/SETJMP.OBJ:
1990:09:10 02:41:20 ..\YOMJMP.ASM
1989:07:07 02:23:14 ..\yominc\RULES.AI
1991:02:19 02:50:38 ..\yominc\YVERSION.AI
The first dependency line is the original .ASM file, the *.AI are the
assembly include files, similar to the C/C++ *.h files.
Note that the DOS file times are in local time, not the UTC/GMT time.
Borland developers surely had an extended work day...
Anyway, they had a build folder inside the overlay sources folder,
where they ran TASM.EXE to produce the OBJ files.
"om" there stands for the overlay manager, but what is "y"?
Yet another overlay manager? Because Borland alone had two previous managers.
The dependency YVERSION.AI appears up to 3 times per OBJ, apparently because
it was also included by the other included files.
I.e. YALTA.AI and YOM.AI also include YVERSION.AI, in addition to every ASM
including it as first line (except YOMJMP.ASM, where it came after RULES.AI).
Given that OVERLAY.LIB was compiled from pure *.ASM by late night programmers,
we should expect weird code, with a few `push cs`/`pop ds`, `mov BP,0`,
and the returns with a carry flag acrosz multiple procedures,
with total absence of any good conventions, like the stack frames or standard
library calls. So lets first gather a bit more information about our specimen.
Googling the history of overlay managers for DOS, we discover that
there were actually multiple overlay libraries, including several for Borland
products. Most work through an overlay stack, pushing a popping overlays,
as the calls progress, except the Borland C++ one, which is called VROOMM.
The VROOMM stands for 'Virtual Run-time Object-Oriented Memory Manager'.
As the name suggests, it was made to support object-oriented code, and therefore
has more fine grained linking scheme.
Still we check the other overlay implemenations. One of them is OVERMGR.TPU
by Turbopower Software. The OVERMGR.TPU copy was preserved at a Russian site:
https://pascal.sources.ru/comm/pibter41.htm
But it doesn't share code with the OVERLAY.LIB we are dealing with.
Another implementation comes with Borland Pascal as OVERLAY.OBJ, which, while
much smaller, has API similar to the Bolrand C++ 3.0 one. Checking the dates...
obj/OVEREMS.OBJ:
1991:07:25 02:55:50 ovr\overems.ASM
1992:08:13 03:08:26 \bp7\com\SE.ASM
obj/OVERLAY.OBJ:
1990:12:12 02:44:24 ovr\overlay.ASM
1992:08:13 03:08:26 \bp7\com\SE.ASM
we see that it was made after the Borland C++ one. So it could be a simplified
version, in the same vein how Borland Pascal was always inferior to Borland C++.
We also notice the `\bp7\com\SE.ASM` include, which surprisingly comes with
the standard library in `\RTL\INC\SE.ASM`. Among other definitions it has:
; Overlay header record
ovSignature equ (word ptr 0)
ovSaveReturn equ (word ptr 2)
ovFilePos equ (dword ptr 4)
ovCodeSize equ (word ptr 8)
ovFixupSize equ (word ptr 10)
ovJumpCount equ (word ptr 12)
ovLink equ (word ptr 14)
ovSegment equ (word ptr 16)
ovRetryCount equ (word ptr 18)
ovNext equ (word ptr 20)
ovEmsPage equ (word ptr 22)
ovEmsOffset equ (word ptr 24)
ovUserData equ (byte ptr 26)
ovVectors equ (byte ptr 32)
ovRecSize equ 32
Wait a second! That appears to match the structure we seen at the start of
these `int 0x3f` segments!!! And the value at ovJumpCount correspons
to the number of these 5 byte entries, main() calls to. We are up to something!
That is apparently the so called trampoline/veneer stubs table.
These are created by compilers to approach distant code, such as the DLL one,
or well, in OS kernel syscall dispatchers. In our case, code is the dynamically
linked overlay residing on disk. So triggering these ints leads to
the actual function's code being loaded and the cotrol transfered to it.
Still we need a clarification how exactly that happens, so we can properly load
the STRONG.EXE into Ghidra. But now we are ready to challenge
the OvrMan and proceed towards decompiling _OvrPrepare.
_OvrPrepare begins with the
MOV AX,offset _OVRDATA_ ;;in STRONG.EXE, _OVRDATA_=0x225d
MOV DS,AX
meaning it uses a personal data segment, not the usual DGROUP, the rest of
standard library uses. It also resides in a different code segment: 1403.
To decompile the code, we have to manually introduce and override the segment
refernces, because Ghidra is bad at 16bit x86 and assigns all references to DS,
while also ignoring the instructions like `pop ES`. I.e. we have to do
`Set Register Values` starting from the `MOV DI,0x6` in the code like following:
assume ES = 0x1403
MOV DI, 0x6 ;; DI obviously points inside the CS segment
push CS
pop ES
... ;; do comparison here
assume ES = <previous value>
Aftewards we can manually override the 0x6 reference to go inside the CS:0006.
That way the decompiler catches it correctly. In any case, we have to be
aware that the analyzer produces incorrect references for the segmented code.
The OvrMan code is really handcrafted and does rather complex things, like
parsing the actual x86 instructions and garbage collecting the heap.
After some work, we decompile the initial batch of the code, responsible for
the loading of the overlay from file and relocating them:
https://github.com/NancyAurum/devroomm/blob/main/src/ovrman.c
OvrPrepare calls __InitModules(), allocates large buffer of memory for storing
the currently loaded overlays. Then it calls __OVRINIT, to initialize
the handler for the `int 3Fh` interrupts, which Bolrand C++ 3.0 emitted instead
of calls to the overlayed functions. OvrMan is expected to resolve these calls
at runtime, by the means of __OvrTrapHandler at the 1403:04f1, which begins
by checking that the frame pointer is aligned, and then proceeds by checking
that it is the overlay header or the trampouline table call...
In general we learn, that after the MZ executable, aligned to the paragraph
is located a set of `FB` sections, and that the `OV` section contains actual
overlay. These `FB` sections are completely ignored by DOS. That is how
self extracting archives are implemented: they just have zip archive
after the MZ executable, which opens itself and seeks past the MZ data.
That FB/OV section duplicates the __SEGTABLE__ processed by __InitModules.
All in all, we now have enough information to relocation our overlays manually,
so they could be loaded in Ghidra, opening the main() function to us.
Fortunatelly each overlay corresponds roughly to a single .C file, while
also exposing all its functions. Meaning that overlays actually help us by
preserving more information about the original code. In fact, we already
know that STRONG.EXE was built from around 60 source files.
Ghidra nicely exposes overlays as CODE_XX header+jump_table blocks.
But they tend to end in the middle of data. Where did these segments came from?
Well, old DOS MZ had no proper proper segment/section table.
Instead we have the seg:ofs pairs, which point to the segment references inside
the exectuable for the DOS MZ loader to relocate.
Ghidra collected all these segment values, both from the relocation table,
and from the values being relocated. Then Ghidra sorted them by their order
in memory and tried to guess their sizes by doing disassembly analysis,
since always start on paragraph boundary, which may contain code from
the previous segment. But disassembly works for code, not data analyzis.
Therefore the abrupt ends in the middle of data.
The last segment is usually the stack segment. For STRONG.EXE it is 128 bytes.
But Ghidra misdetects it as a 16 bytes, despite the fact that its size
could be deduced from the SP field in the MZ.
After stack segment comes the overlay relocations table, which has the FBOV id.
To summarize, after a large model BC++ exe is loaded, the memory layout is:
* Table of far_t interrupt pointers.
* BDA
* DOS kernel (back in the day entire OS kernel fit in 64K)
* PSP (also the initial value of DS and ES)
* C runtime segment (0x10000 in Ghidra, STARTX, stdlib, etc...).
The MZ header and relocation table are missing.
* Overlay Library.
* A few more non-overlayed OBJs (i.e. mouse and video code).
* Segment holding main()
* Overlay headers (bosh_t + trampouline table)
* _DATA or DGROUP segment
* BSS (zero-inited variables)
* Stack segment
* Overlay buffer (since it is allocated first)
* User heap data.
* Video memory stating 0xA0000
And we can pretend that overlay buffer is really larger and load there
all out overlays. Of course will prevent us from running the analyzed code in
the builtin emulator and sampled data, but everything is made of trade offs.
BTW, I was wrong believing that IDA Pro doesn't support Borland C++ overlays.
It does,
http://blog.rewolf.pl/blog/?p=1202
but the IDA overlay loader is from 1994 and needs to be fixed to work with
the modern IDA. It also doesn't fixup the function address references correctly.
There was an IDA SDK floating around Russian torrent sites,
and the SDK has the source code for the Borland's overlay loader.
But I have already decided to use Ghidra, and you don' change horses on
the river crossing.
To be continued... Current Mood: contemplative |
|