Mood: | amused |
Music: | Clan Of Xymox - Weak in My Knees |
Entry tags: | computing |
Diversion
Want to teach myself some Ghidra.
As a test specimen, I decided on https://www.mobygames.com/game/482/stronghold/
There are several reasons.
1. The game is 16 bit and I never did any segmented 16 bit x86,
while Ghidra has limited support for decompiling real 16bit x86.
2. The game likely has some interesting coding, since it has a 3d view
and processes multiple entities at once.
3. Stronghold is an early example of Dungeon Keeper / Majesty style games.
The only game predating it is Populous.
So there is some historical value in it.
4. The lead programmer is a trans girl Cathryn Mataga!
The disassembling of any code base is a methodical process, which begins
with the following steps:
0. Researching what versions are available and picking either the one
with more debug information or the one which will be easier to emulate
and decompile.
It also makes sense to research the other products that the programmers behind
the disassembled code made at around the same time, since chances are high that
they reused some custom code.
1. Informing oneself about the CPU and OS, which run the code being decompiled.
2. Deciding on the disassembler/decompiler to use.
3. Determining the compiler used.
4. Procuring a function signature database to quickly get over
all the standard runtime code.
5. Enumerating the OS syscalls.
6. Locating the actual main() function, as opposed to the start().
Stronghold comes in 2 flavors
________________________________________________
In addition to the English release, there are German, Spanish, French and
Japanese versions. The German, Spanish and French versions don't appear to be of
any importance for reversing the game. The Japanese one was released for two
somewhat different x86 based computers running MS-DOS: FM Towns and PC-98.
In Japanese it is called "ストロングホールド ~皇帝の要塞~".
But it can be found on abandonware sites by googling:
* "Stronghold: Koutei no Yousai"
* "Stronghold: Emperor no Yousai"
The Japanese executable differs drastically from the English version.
In fact it even has a different name (ST.EXE).
The Japanese version comes with a 3d animated intro STRONG.FLI,
has General MIDI music and unique error strings like:
"Error %d %d. (Give Sato & Miwa Both numbers.)"
These are missing in the STRONG.EXE.
Still the Japanese version has no additional symbol information,
so I decided to go with the English version, since it is the most common
and easier to emulate. The Japanese version can be used for reference
after enough progress has been made on the English version.
There were also two different DOS CD releases,
* German version by Softgold
* 1995 English version
Part of "Unlimited Fantasy", a 5-CD set from Slash Corp.
The English CD version STRONG.EXE is identical to the floppy one.
I couldn't obtain the German CD one.
All versions, including the Japanese, appear to be using the same
version of either Borland C++ or Turbo C++ compiler.
There are several other games by Stormfront Studios:
* Tony La Russa Baseball II could be reusing some code, like animation timing
yet the engine is not 3d and the custom file extensions are different.
The compiler used is the same Borland C++ and the MAIN.EXE has similar size.
It has no debug data present.
* Eagle Eye Mysteries isn't promising since it had a different programmer
and the engine EAKIDS.EXE made by Electronic Arts is completely different,
and it was compiled with MSC.
The EAKIDS.EXE has debug info though, that means the original source code
can be fully recovered.
The game also uses LBM graphics, just like STRONG.EXE, but that was just
a popular format back in the day.
The engine's driver, EEM.EXE, is compiled using Borland C++,
but it doesn't appear to share any code with STRONG.EXE
* Neverwinter Nights, Treasures of the Savage Frontier, etc...
These have the same programmer but use a Gold Box engine.
* Rebel Space had only a screenshot preserved:
https://www.vintagecomputing.com/wp-content/images/prodigy/screenshots/prodigy_rebel_space_large.png
Apparently that was an early EVE Online like game.
None of these specimens is of direct help to our endeavor.
We could use them to produce a function signature database by detecting
the C runtime library part, but instead we will try to obtain
the original compiler used to build the STRONG.EXE and use its library.
16bit x86 DOS computing environment
________________________________________________
Proper decompilation requires near expert knowledge of the system's quirks.
And 16bit x86 DOS is probably the baroquest and the hardest to grasp
system around. Still, studying this sour subject will help in understanding
modern x86.
To better grasp what we are getting into, let's start by looking at
the official Stronghold system requirements:
https://www.mobygames.com/game/482/stronghold/specs/
Minimum OS Class Required: PC/MS-DOS 5.0
Minimum CPU Class Required: Intel i386
Minimum RAM Required: 2 MB
Video Modes Supported: VGA
Notes:
505,000 Bytes of free base RAM
1,000,000 Bytes of free EMS/XMS
The i386 CPU requirement could mean one of two things:
1. The speed of a 16MHz i286 is not enough to run the program,
which is still 16bit.
2. The program is actually 32bit, in which case the program will be using
a 32-bit runtime (a DOS extender built on VCPI or DPMI), which is basically
a mini 32bit OS.
Since there are no references to VCPI or DPMI in the executable or manual,
we can conclude the program is 16bit and needs i386 for more speed.
Reviews at the time of release indeed said it needs at least 33MHz.
In 1993, 32bit had just begun gaining momentum and most programs were still 16bit.
But wait a second!!! What are these "base RAM" and "EMS/XMS"?
To answer this question we have to understand how the original i8086 CPU worked.
The main thing about 16bit x86 is its use of two registers to access memory.
One register is called a segment register, and the other holds an offset inside
that segment. The i8086 had 4 dedicated registers holding the segment addresses.
These are CS, DS, SS, ES. By convention CS points to code, DS to data,
SS to stack and ES points at whatever array we are working with at the moment.
We can imagine a segment being a pointer to a C-struct, which is 16byte aligned
and has arrays as members (i.e. functions in the CS segment or global vars in DS).
To get the linear address of the referenced memory, we do
linear = ((seg<<4) + off) & 0xFFFFF
That `& 0xFFFFF` (mod 0x100000) is here for a not so good reason.
The very first x86 CPU (i8086) had an architectural memory limit of 2**20 bytes
(1MB), due to the address bus connecting the CPU to memory having 20 pins.
When anything above 1MB got accessed, the 8086, instead of faulting or returning
zeroes, just took the address `mod 0x100000`, since that required fewer transistors.
As usual the quirk was declared to be a feature, instead of undefined behavior.
So people began depending on it.
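A minimal sketch of that address arithmetic in Python (the function name `linear_8086` is mine, used only for illustration):

```python
def linear_8086(seg, off):
    """Real-mode address on a 20-pin bus: shift, add, wrap at 1MB."""
    return ((seg << 4) + off) & 0xFFFFF

# Two different seg:off pairs can name the same byte.
print(hex(linear_8086(0x1234, 0x0010)))  # 0x12350
print(hex(linear_8086(0x1235, 0x0000)))  # 0x12350
# Going past the top wraps around to the bottom of memory.
print(hex(linear_8086(0xFFFF, 0x0010)))  # 0x0
```

The aliasing shown by the first two calls is why a decompiler cannot compare far pointers by their raw seg:off values.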
In fact, Intel never planned to support more than 1MB, since x86 was made
for the microcontroller market, so x86 becoming a standard for workstations due to
Microsoft's marketing came as a surprise to the engineers, because workstation
use demanded larger and larger amounts of memory. An additional complication
came from the fact that memory above 640KB (0xA0000-0xFFFFF) was reserved for
hardware use, like BIOSes and IO buffer areas for various devices.
That memory was divided into the upper memory blocks (UMBs), each 64KB in size.
The 640KB below the UMBs were called "conventional memory" or "base memory".
The entire 384KB area of these UMBs was called the upper memory area (UMA).
In particular the 64KB area at 0xA0000 was reserved for graphics output,
and that gave the classic DOS 320x200 limit to graphics resolution, since
320*200 one-byte pixels take 64,000 bytes.
Since a large portion of the UMBs was unused, people invaded it to use for their
own needs, like moving daemon services there. These were named
terminate-and-stay-resident (TSR) programs, and they operated by listening
to interrupts, which back in the day were a primitive version of
interprocess communication.
So the need came to extend the address space. The i286 widened the address bus to
24 lines, and to keep the wraparound-dependent software running, the PC/AT design
added a switch, called the A20 gate, which specified whether address line 20 is
active. The quirk is that it only controlled line 20, not any lines above
it. So when the A20 was disabled, the memory had a "striped" look, where
every odd megabyte mapped onto the megabyte preceding it. I.e.
[0x000000:0x0FFFFF] accessible
[0x100000:0x1FFFFF] maps to [0x000000:0x0FFFFF]
[0x200000:0x2FFFFF] accessible
[0x300000:0x3FFFFF] maps to [0x200000:0x2FFFFF]
etc...
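The striping above is just what happens when one address bit is forced to zero. A small sketch, assuming a 24-line bus as on the i286 (the names `A20_BIT` and `physical` are made up for this example):

```python
A20_BIT = 1 << 20

def physical(addr, a20_enabled):
    """Address as seen by the memory on a 24-line bus with an A20 gate."""
    addr &= 0xFFFFFF          # only 24 address lines exist
    if not a20_enabled:
        addr &= ~A20_BIT      # line 20 is forced low
    return addr

print(hex(physical(0x100000, False)))  # 0x0
print(hex(physical(0x2ABCDE, False)))  # 0x2abcde
print(hex(physical(0x3ABCDE, False)))  # 0x2abcde
print(hex(physical(0x3ABCDE, True)))   # 0x3abcde
```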
Sweet? Unfortunately that is the core of the i286 system running our STRONG.EXE,
which uses 2MB RAM so we have to understand it to get our project anywhere.
Now accessing any memory above 1MB with a 16bit segment model required
Intel to introduce an additional quirk: the segment descriptor tables.
One had to initialize translation tables and feed them to the CPU.
These tables specified the 24-bit addresses for 16bit segment registers,
extending the addressable memory to a whopping 16MB.
That memory above 1MB was called "Extended Memory" or "High Memory".
Additionally a chunk of 0x10000-16 bytes, called the high memory area (HMA),
was located at 0x100000. It was accessible by setting a segment
register to 0xFFFF and enabling A20.
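The odd HMA size falls out of the real-mode arithmetic once the 1MB wraparound is gone. A quick check in Python (a sketch; `linear_a20_on` is a name invented here):

```python
def linear_a20_on(seg, off):
    """Real-mode address with A20 enabled: no wrap at 1MB."""
    return (seg << 4) + off

hma_first = linear_a20_on(0xFFFF, 0x0010)  # first byte above 1MB
hma_last = linear_a20_on(0xFFFF, 0xFFFF)   # highest address a 16bit offset reaches
hma_size = hma_last - hma_first + 1

print(hex(hma_first), hex(hma_last), hma_size)  # 0x100000 0x10ffef 65520
```

The first 16 offsets of segment 0xFFFF still fall under 1MB, which is where the "-16" comes from.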
In protected mode, instead of addresses, the segment registers hold the 13bit
index of an 8-byte entry inside the descriptor table, plus a bit indicating
if the userspace or OS-space table is used (LDT/GDT), and a 2bit privilege level.
The i286 descriptor table was composed of the following entries:
typedef struct {
  uint16_t size;       // Basically size_in_bytes-1
                       // i.e. size of 0 means only the first byte is accessible
  uint8_t  address[3]; // linear 24bit base address, enough to cover 16MB
  uint8_t  type;       // access byte: present bit, privilege, code/data/system
  uint16_t reserved;   // must be zero on i286
} i286_segment_descriptor_t;
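For poking at such tables from a script, here is a hedged sketch of both decodings. It assumes the layout given in Intel's 80286 manual (16bit limit, 24bit base, access byte, reserved word, 8 bytes in total). The function names are mine.

```python
import struct

def decode_selector(sel):
    """Split a protected-mode selector into (index, uses_ldt, privilege)."""
    return sel >> 3, bool(sel & 0x4), sel & 0x3

def decode_descriptor(raw):
    """Unpack an 8-byte i286 descriptor into (base, limit, access)."""
    limit, base_lo, base_hi, access, _reserved = struct.unpack("<HHBBH", raw)
    return (base_hi << 16) | base_lo, limit, access

# A made-up data segment: 64KB starting right at the 1MB mark.
raw = struct.pack("<HHBBH", 0xFFFF, 0x0000, 0x10, 0x93, 0)
base, limit, access = decode_descriptor(raw)
print(hex(base), hex(limit), hex(access))  # 0x100000 0xffff 0x93
print(decode_selector(0x000F))             # (1, True, 3)
```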
To access the 16MB with these descriptors, developers used a special API, called
the eXtended Memory Specification (XMS), which was implemented by a driver, or
a BIOS int 15h service 87h, called "Move Block":
http://vitaly_filatov.tripod.com/ng/asm/asm_026.14.html
This service basically copied a chunk of memory between the area above 1MB and
the area under it.
It did some black magic, including entering the so-called "protected mode",
where the segment registers hold LDT/GDT indices, instead of raw addresses.
Inside the protected mode, the BIOS did the copying, and used a "SOFT RESET" to
return into the direct memory access mode (called real mode on x86).
That BIOS service call was available only on the pre-i386 systems,
while the i386 systems emulated it by means of the HIMEM.SYS driver,
which did the usual 32bit switching:
http://info.wsisiz.edu.pl/~bse26236/batutil/help/HIMEM_S.HTM
Note the
/INT15=xxxx
Allocates the amount of extended memory (in kilobytes) to be reserved
for the Interrupt 15h interface. Some older applications use the
Interrupt 15h interface to allocate extended memory rather than using
the XMS (eXtended-Memory Specification) method provided by HIMEM.
The entire process is described here:
https://medium.com/@wolfcod/a-journey-into-himem-sys-de2ece29c0c8
To switch back to real mode from protected mode on 80286 first, it's necessary
to store at address 0040:0067 the return address of your code in
protected mode and from protected mode you can reset the CPU, but before doing
this it's necessary to write on CMOS memory a code (typically 05 or 0A) to
signal to the POST bios code to switch back. Without this magic value,
the POST procedure will continue the execution of standard BIOS code with
a complete bootstrap of the operating system.
Before the segment descriptor table, the i8086 memory expansion boards supported
the so-called bank switching, following the Expanded Memory Specification (EMS).
EMS implied that a UMB at 0xD0000 or 0xE0000 was broken into four 16KB banks,
each of which could be switched to point into the memory above 1MB.
To support bank switching, a device driver, called the expanded memory manager,
or EMMXXXX0, was installed, as part of say HT12MM.SYS for the HT12 chipset.
The programs using EMS had to check for this EMMXXXX0 driver.
So the presence of the string "EMMXXXX0" anywhere in the specimen data indicates
it uses bank switching to access anything above 1MB.
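Checking a specimen for that marker takes a few lines of Python. This is only a sketch: `find_all` is my own helper, and the `sample` bytes are a fabricated stand-in for the real file contents (which one would get with `open("STRONG.EXE", "rb").read()`).

```python
def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

# Fabricated stand-in for the bytes of an executable.
sample = b"MZ\x90\x00" + b"\x00" * 12 + b"EMMXXXX0\x00" + b"other data"
print(find_all(sample, b"EMMXXXX0"))  # [16]
print(find_all(sample, b"DPMI"))      # []
```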
In later systems expanded memory was emulated in software through
the XMS driver, which did the bank switching through segment descriptor tables.
The emulators were QRAM or USE!UMBS.SYS, which used the "shadow ram" feature of
i286 motherboards to perform the mapping:
https://retrocmp.de/hardware/rampage-286/use!umbs.txt
https://retrocmp.de/hardware/rampage-286/quarterdeck-qram.pdf
Some purely software drivers were called "LIMulators". On i286 these LIMulators,
like EMM286.EXE, had limitations, like no mirroring/aliasing:
https://www.pcorner.com/list/UTILITY/EMM286.ZIP/EMM286.TXT/
On the 32bit systems the EMM386 emulated EMS through the HIMEM.SYS API,
which had no such issues due to the use of proper MMU.
And the use of XMS/EMS precluded the use of 32bit code, which could access
entire 4GB of memory at once.
The machines without 2MB could still run EMS software. The MEMSIM32 LIMulator
swapped the banks to/from HDD, instead of expanded memory.
The memory availability can be checked by the DOS MEM command.
All these technologies require maintaining a list of segments or bank tables.
To properly decompile STRONG.EXE we will have to recover these tables, buried
somewhere in the startup sequence of the exe file, since the MZ exe format
supports only the pre-i286 relocations. The C compiler vendors had to introduce
their custom formats to support the memory sizes above 1MB.
Given the officially stated EMS/XMS requirement, we have to deal with all that.
To summarize: typical 1993 i286 DOS systems had about 2MB of memory,
and accessing memory above 1MB is done through the EMS API.
Being so quirky and old, 16bit x86 has a rather limited support from the tools
used to statically analyze the code. For example, many contemporary utilities
will fail to open OMF object files and libraries, which were used by x86
tools before the Windows NT popularized COFF.
Picking a decompiler
________________________________________________
Decompiling a program requires several stages:
* Recovering the control flow graph of the machine code.
* Recovering functions and data references.
* Recovering data structures.
* Running the decompiler to produce C/C++ code of functions of interest.
* Using the feedback from the decompiler to recover more of the above.
* Refactoring the decompiled code, compiling it and plugging it in
function by function into a running program, either emulated or recompiled,
which makes inserting C/C++ code easier.
All of these stages require using a builtin address space manager, coupled with
a disassembler and a hex editor. So while there are separate decompilers,
like e2c, DCC and Boomerang, it is necessary to have everything united into
a holistic development environment.
In general, decompilation is as old as the computing itself:
* https://www.program-transformation.org/Transform/HistoryOfDecompilation1.html
* https://www.program-transformation.org/Transform/HistoryOfDecompilation2.html
But for 16 bit DOS, we have three IDE options, each with its own drawbacks
* IDA Pro:
* While very popular, it is also very expensive and requires obtaining
a pirated copy, which is usually outdated, incomplete and has viruses.
* Closed source and has limited ways to modify the tool to suit your needs.
Outside of decompiling the decompiler itself.
So if it fails to decompile your code, you have no way to fix it.
* Doesn't support decompiling 16 bit x86, outside of connecting to
an external decompiler, like DCC or Ghidra.
* Ghidra ( https://ghidra-sre.org/ ):
* Very popular and well documented.
* Open source (Apache 2.0 license), so can be modified to do anything you want.
Being a US government project made to analyze malware and vulnerabilities,
its ability to decompile DOS games is a fortunate byproduct.
The code consists of boilerplate rich Java, with all the crazy OOP patterns
you can imagine used for the simplest of tasks.
* It has become standard.
If you plan to work with others on a paid project, you had better know it.
* Has advanced features, like a machine learning plugin.
Should be possible to connect it with LLM to get quick insight on the code.
* Comes with enterprise integration and automation, like headless mode,
to process large loads of code without a human being to drive it.
* It supports decompiling 16bit x86!!!
* Reko ( https://github.com/uxmal/reko ):
* A C# challenger to Ghidra, with a bit cleaner code, without most of the
Java bullshit.
* Cleaner IDA-inspired UI and much better Windows support than Ghidra,
because of C#.
* Super lightweight (20MB distribution completely with IDE).
* Astonishingly good code analysis for 16bit DOS executables.
It was able to locate main() in STRONG.EXE without any help.
But it struggles with further stages of decompilation, such as
dataflow analysis, so I couldn't see the decompiled code for most functions.
Yet for 32bit code, like say binkw32.dll, it does a good job.
* Doesn't allow stack variables and arguments layout editing.
* Not a full featured IDE, and offers no way to rename data areas,
supply types, introduce imaginary segments, external data or code.
But Reko does offer some Python scripting support, which I haven't
researched in depth. Maybe you can implement some of these features?
I.e. recovering strings.
* Supports decompiling 16bit x86 and loading OMF files.
It even supports some packers.
Moreover, it claims that:
"16-bit real mode" is "The first architecture targeted by Reko"
https://github.com/uxmal/reko/wiki/Supported-binaries
I.e. the authors made it to analyze DOS executables.
* Radare2+RetDec+Iaito ( https://github.com/radareorg/iaito/ )
* Another full featured disassembly IDE, which consists of several
subprojects, each of which consists of separate utilities.
People love and fork it, so there will always be some active fork like Rizin
* Written in plain C, with a few Python scripts to glue everything together.
Think command line utilities, servers and different GUIs to drive them.
* Apparently the most flexible and professional toolkit, and it expects
the user to know exactly what they are doing.
Will take the most time to set up, but can be made to do anything you want.
* Support for 16bit DOS executables is left as an exercise to the reader,
but given the flexibility of the framework, you can easily insert
any other decompiler.
These are listed in order of historical appearance.
I decided on Ghidra, since it is mature, has all the features needed to reverse
the 16bit x86 code, and if some features are missing, I can implement them
with a plugin or an extension. Ghidra also produces the best C code out of them
all. Ghidra is also an opportunity to study the stupid modern Java practices,
since I never did any JVM programming. Ghidra also has the permissive Apache 2.0
license, so whatever you contribute to it will automatically have more value to
society than under Reko's GPL or Radare2's LGPL.
Beside IDA, Ghidra, Reko and Radare2, there are standalone and CLI decompilers.
These include e2c, DCC, Boomerang, RetDec, REC Studio, mips2c, m2c, ExeToC...
Some of them can be used with IDA. Others are useful in their context.
For example, mips2c/m2c is specially geared towards recovering
the exact original C source code, when it is required for historical accuracy.
Apparently none of the tools support decompiling EMS/XMS code properly,
which will be a big issue.
Deducing the compiler.
________________________________________________
Ghidra failed to determine the compiler used to compile strong.exe.
So I loaded the executable into IDA, and then checked Options -> Compiler.
IDA determined it as Borland C++ with a small model (near code, near data).
Both the near code and the near data guesses, as we find out later, were wrong.
Yet FLIRT signatures for "TCC/TCC++/BCC++ 16 bit DOS" worked and
about 1/8 of the functions got recognized.
Feeding STRONG.EXE to strings exposes:
"Borland C++ - Copyright 1991 Borland Intl"
Apparently IDA also used this string to determine the compiler.
The other ways to detect the compiler are
* Trying function signatures for the runtimes of different compilers against
the studied code.
* Code and data generation and layout idioms specific to that compiler.
* Decompiling a small routine and then compiling it with a suspected compiler.
To pinpoint the exact compiler used, we obtain every 1993 BCC/TCC version, and
check their c0*.obj against the entry point of the strong.exe.
Borland compilers are available at:
https://winworldpc.com/product/borland-c/20
https://winworldpc.com/product/turbo-c/1x
https://winworldpc.com/product/borland-turbo-c/1x
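Checking startup code from a c0*.obj against an entry point cannot be a plain byte compare, because the linker patches segment values and addresses. A sketch of a masked compare follows. The pattern bytes are illustrative encodings (mov dx,imm16 / mov ah,30h / int 21h), not bytes copied from any Borland file, and `masked_match` is a hypothetical helper:

```python
def masked_match(code, pattern):
    """Compare code against pattern; None in pattern matches any byte."""
    if len(code) < len(pattern):
        return False
    return all(p is None or c == p for c, p in zip(code, pattern))

# mov dx, <patched by linker> ; mov ah, 30h ; int 21h
pattern = [0xBA, None, None, 0xB4, 0x30, 0xCD, 0x21]

entry_a = bytes([0xBA, 0x34, 0x12, 0xB4, 0x30, 0xCD, 0x21, 0x90])
entry_b = bytes([0xBA, 0x34, 0x12, 0xB4, 0x4C, 0xCD, 0x21, 0x90])
print(masked_match(entry_a, pattern))  # True
print(masked_match(entry_b, pattern))  # False
```

This is the same idea that signature databases rely on, just without the hashing.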
First thing we check is what copyright strings these compilers leave.
Turbo C++ 1.01 spits into exe files
"Turbo C++ - Copyright 1990 Borland Intl."
While Turbo C++ 3.0, Borland C++ 3.0 and 3.1 leave
"Borland C++ - Copyright 1991 Borland Intl"
That excludes Turbo C++ 1.01.
Luckily Borland C++ 3.0 and 3.1 come with source code for the runtime.
Although the Turbo versions are missing it.
The startup code is located in c0.asm under LIB/STARTUP.
Reading c0.asm we see that every Borland C++ executable contains
the exact memory model used at around seg000:0262.
The strong.exe has model=0xC004
To interpret it we will need constants from STARTUP/RULES.ASI:
#define FCODE 0x8000 /*far code*/
#define FDATA 0x4000 /*far data*/
Lower bits determine the model type
#define TINY 0
#define SMALL 1
#define MEDIUM 2
#define COMPACT 3
#define LARGE 4
#define HUGE 5
Given that, we can conclude that the model is FCODE|FDATA|LARGE.
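The same decoding as a throwaway Python helper, using the constants quoted above. One assumption of mine: everything below the two flag bits is treated as the model index.

```python
FCODE = 0x8000  # far code
FDATA = 0x4000  # far data
MODELS = ["TINY", "SMALL", "MEDIUM", "COMPACT", "LARGE", "HUGE"]

def decode_model(word):
    """Render a memory model word as FLAG|FLAG|MODEL."""
    parts = []
    if word & FCODE:
        parts.append("FCODE")
    if word & FDATA:
        parts.append("FDATA")
    # Assumption: the bits below the two flags are the model index.
    parts.append(MODELS[word & ~(FCODE | FDATA) & 0xFFFF])
    return "|".join(parts)

print(decode_model(0xC004))  # FCODE|FDATA|LARGE
```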
Yet the asm code and the resulting obj files in Borland C++ 3.1 differ from
the strong.exe.
After SaveVectors call, strong.exe has LES, Borland C++ 3.1 always has
mov ax, _envseg@
mov es, ax
The only compilers doing "LES DI,_envseg@" are Borland Turbo C++ 1.01 and 3.0,
as well as Borland C++ 3.0.
Now we have deduced with 100% certainty that either Borland C++ 3.0
or Turbo C++ 3.0 was used to compile strong.exe.
And for Borland the default calling convention is __cdecl, so we set the
Edit -> Options -> Decompiler -> Prototype Evaluation
to be __cdecl16far.
Arguments to __cdecl16far functions start at SS:[BP+6], since below them sit
the saved BP (2 bytes) and the return address, which is a 4-byte far pointer
that gets popped by RETF (return from a far function).
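To double check the stack layout, one can count bytes by hand. The sketch below assumes the usual PUSH BP / MOV BP,SP prologue, so [BP+0] holds the saved BP and the return address sits right above it. The helper name is invented.

```python
def arg_offsets(arg_sizes, far_call=True):
    """BP-relative offsets of cdecl arguments, first argument first."""
    offset = 2 + (4 if far_call else 2)  # saved BP + return address
    result = []
    for size in arg_sizes:
        result.append(offset)
        offset += size
    return result

# int, far pointer, int
print(arg_offsets([2, 4, 2]))                  # [6, 8, 12]
print(arg_offsets([2, 4, 2], far_call=False))  # [4, 6, 10]
```

So a far function sees its first argument at BP+6, while a near one sees it at BP+4.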
While the Turbo C++ and Borland C++ 3.0 compiler executables are different,
the startup code and the functions in CL.LIB appear to have exactly
the same machine code with both compilers. The executable size difference could
be due to Turbo C++ having two features cut, since its compiler is missing
the following options, which were present in the earlier released Borland C++ 3.0:
-Hxxx Use pre-compiled headers
-Wxxx Create Windows application
Both were intended to support Windows code, which had huge headers.
Therefore determining the exact compiler is impossible.
The Borland C++ 3.0 comes with the source code for the C runtime
library, which we will be using for reference.
The C0.ASM there has the following comment
"Turbo C++ Run Time Library"
So Turbo C++ 3.0 is probably the same compiler as Borland C++ 3.0.
Just with the Windows support cut out.
Further work would be comparing both TCC and BCC against each other,
and checking if any of the differences are inside of the code generator.
And if there are any, then deducing which variant was most likely used
from the generator choices. But that would be going too far off our way,
just to establish some minor historical certainty.
Strong.exe was also compiled with __NOFLOAT__, since MINSTACK is 128.
Apparently 16 bit x86 already supported floating point numbers.
Borland's C/C++ had one unusual feature: it allowed mixing C and x86 asm
statements together, where the asm statements could reference C variables.
It did some magic behind the scenes to allow that to work seamlessly,
compared to say GCC. The best example of such code is VRAM.CAS.
So back in the day C was truly a portable assembler.
This feature can be useful for partially decompiling the code.
Note that the Borland C++ runtime also includes a few functions with the `pascal`
calling convention, which is basically cdecl but with arguments pushed in reverse
order and popped by the callee.
Why? No idea, I was never a fan of Pascal anyway.
Signatures Database
________________________________________________
Since the compiler is known, we need a way to determine all the statically
linked functions belonging to its standard library, since that significantly
reduces the amount of work and allows us to go directly to the main().
While IDA includes sig/pc/bc31rtd.sig, Ghidra doesn't support the *.sig files.
Instead Ghidra has its own *.fidb signature libraries.
The only included *.fidb are the signatures for vs20XX.
For Borland C++, there are
https://github.com/moralrecordings/ghidra-fidb-dos-win16
but these don't appear to work with my Ghidra version.
The *.java code there used to generate *.fidb doesn't work with
the latest Ghidra (in addition to the Python script requiring Linux),
since the headless script for some reason fails to import the OMF *.lib files.
Modifying it I was able to produce the *.fidb files, but they failed to work too,
just like the premade ones.
What actually works is just using the UI:
* File -> Open File System -> pick one of the *.lib files.
* Select all the imported *.obj files and drag them into the code browser,
which makes "Analysis -> Analyze All Open" possible.
* Finally, Tools -> Function ID -> Create new empty FidDb
and Tools -> Function ID -> Populate FidDb from programs
Among the recognized functions, we see a few related to the overlay feature.
The STRONG.EXE also includes strings "EMMXXXX0" and "Runtime overlay error".
That means the code was compiled with the -Yo flag, in addition to -ml.
This spells additional difficulties for us, since the same area of memory can
hold different code and data at different times (i.e. bank switching).
And the compiler generated these overlay accesses all over the code.
Basically it is a poor man's virtual memory paging.
The code handling the runtime part of overlays is called overlay manager.
Its compiled code resides in OVERLAY.LIB, which comes without any source code.
Beside EMS, OVERLAY.LIB includes functions to work with XMS.
But that is of no value to us, since the STRONG.EXE speaks with it through API.
More on overlays is on page 357 of Borland C++ 3.1 Programmer's Guide
Borland Open Architecture Handbook 1.0 includes info on debugging them:
http://annex.retroarchive.org/cdrom/psl-v3n8//PRGMMING/DOS/GEN_TUTR/BC4BOA.ZIP
The overlay function prototypes are located in DOS.H
extern unsigned _Cdecl _ovrbuffer;
int cdecl far _OvrInitEms( unsigned __emsHandle, unsigned __emsFirst,
unsigned __emsPages );
int cdecl far _OvrInitExt( unsigned long __extStart,
unsigned long __extLength );
The _ovrbuffer is initialized to the size_of_the_largest_overlay*2.
The idea apparently was to use two overlays, where code running in one overlay
could load another overlay and jump into it. The compiler generated such loads
behind the scenes. These overlays can reside either in extended memory or on
disk.
Some people already made a feature request on github for it:
https://github.com/NationalSecurityAgency/ghidra/issues/5543
While googling a decompilation of OVERLAY.LIB, I stumbled upon:
https://borland.public.cpp.borlandcpp.narkive.com/9hI7JTC4/platform-dos-overlay
>All the software I own is legitimate. I don't know any real programmer that
would knowingly purchase pirated/illegal software. We all know how much work
goes into getting something that works, is good looking and marketable.
Toxic American morality: deny everything, pretend to be overly pious,
all while daily shoplifting and watching gigabytes of CP.
The C library part of STRONG.EXE also has a string COMPAQ, but that comes
from CRTINIT.OBJ, which checks for COMPAQ to implement some hacks.
CRT stands for cathode-ray-tube, not C Run-Time. And its source is CRTINIT.CAS.
DOS syscalls
________________________________________________
DOS uses int 21h to invoke the OS code. The AH is set to the function index.
Ghidra doesn't support the DOS int 21h or syscalls recognition at all,
but it provides a general mechanism for documenting and decompiling syscalls.
That is done through the creation of separate "imaginary" address space
for mapping function indexes to actual functions.
To create one for DOS we can do
Window -> Memory Map -> Add new block
Name it say DOS21, make it overlay block at 0x0 of size 0x100
Then we just methodically add DOS syscall labels, turning each one into
a function by selecting the appropriate byte with mouse drag rect.
Then turn all int 21h into a CALL_OVERRIDE_UNCONDITIONAL onto the operand.
The swi(0x21) will remain cluttering the decompiled code,
but after it a proper function call will be present.
It is also possible to populate the DOS21:: segment from a txt file.
That is done with Data/ImportSymbolsScript.py
Which takes a text file with entries like
DOS_GetDTAAddress DOS21::2f f
DOS_GetDOSVersion DOS21::30 f
Where DOS21::2f is the function id and `f` means we are making a function map.
That is what I actually did.
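Such a text file is easier to generate than to type. A sketch, assuming the `name address f` line format shown above; the table holds just three entries, and the DOS_* names are my own labels rather than anything official:

```python
# AH value -> label. Only a few entries, for illustration.
DOS_FUNCS = {
    0x2F: "DOS_GetDTAAddress",
    0x30: "DOS_GetDOSVersion",
    0x4C: "DOS_Exit",
}

def symbol_lines(funcs, space="DOS21"):
    """Build one 'name space::offset f' line per function index."""
    return ["%s %s::%02x f" % (name, space, ah)
            for ah, name in sorted(funcs.items())]

for line in symbol_lines(DOS_FUNCS):
    print(line)
# DOS_GetDTAAddress DOS21::2f f
# DOS_GetDOSVersion DOS21::30 f
# DOS_Exit DOS21::4c f
```

Redirect the output into a file and feed that file to the import script.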
Locating the main()
________________________________________________
With standard library functions and syscalls being recognized and decompiled,
we are ready for going into the program's code. That involves locating
the main function. Of course we can cheat by using Reko for that, but doing
it manually for once has some educational value.
During the compilation the main() routine is referenced from the startup file.
Luckily we have access to the compiler's startup files and we know which exact
one to use (C0L.OBJ for the large model). So we can just open the C0L.OBJ,
which invokes the analyze dialogue. After that we go to the main symbol
and see where it is referenced from... but it is not referenced from anywhere!!!
The absence of known calls to main() is due to Ghidra's analyzer being very
conservative and ignoring the TEXT section (since no symbols export it
from the OBJ file). We have to initiate the disassembly manually.
The start of the TEXT section is basically our entry routine for the strong.exe.
Borland C++ entry is called start() (or startx() if you consider it also
including the cleanup code).
After the disassembly, we clearly see that the main() routine is referenced
directly at the end of the entry() code. Therefore we just open our strong.exe
disassembly, and go to the entry. That unknown FUN_ there will be the
strong.exe's main().
Note though that Borland's main also includes environment argument:
int main(int argc, char **argv, char **envp)
Strong.exe's main doesn't use envp, and the start() pops it for us,
because in the cdecl calling convention the caller is responsible for popping
the passed args from the stack, which simplifies the implementation of variable
argument functions, like `printf`.
Conclusion
________________________________________________
Moving from IDA to Ghidra feels like moving from 3ds max to Blender.
Everything is obscure and clunky. For example, in IDA creating new *.sig
was just a matter of running plb and sigmake, which unfortunately are part of
FLAIR, which most pirated IDA distributions miss.
But IDA costs several thousand USD and by buying it you support Russians.
So I doubt any person ever purchased IDA legally, outside of pirating
the versions their employers purchased.
Radare2 appears to be a viable alternative to Ghidra, but it is more difficult
to set up, and will be an overkill for a small simple project. After all,
nothing stops you from using Reko to locate your main()
Future work
________________________________________________
Next step would be preparing the game for complete decompilation.
That requires recovering the overlays table, which will require a Ghidra script.
Static analysis alone isn't enough to decompile larger programs, where
we slowly replace every function one by one with its decompiled version.
That requires comparative dynamic analysis, where we run original and decompiled
versions side by side, comparing control and data flow traces.
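The comparison itself can start out very simple. A sketch, assuming each run is logged as a list of (CS, IP) pairs; the trace format and the function name are my assumptions, not something an emulator provides out of the box:

```python
from itertools import zip_longest

def first_divergence(trace_a, trace_b):
    """Return (step, a, b) for the first differing entry, or None."""
    for step, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return step, a, b
    return None

original   = [(0x1000, 0x0000), (0x1000, 0x0003), (0x2000, 0x0010)]
decompiled = [(0x1000, 0x0000), (0x1000, 0x0003), (0x2000, 0x0020)]
print(first_divergence(original, decompiled))
# (2, (8192, 16), (8192, 32))
```

A shorter trace diverges at its end, where `zip_longest` pads the missing entry with None.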
In addition, we need a flexible way to hook recompiled code over the original.
Both goals can be achieved by modifying an existing emulator with breakpoints,
which can redirect execution towards our decompiled code, which can as well be
compiled by the host C compiler (or whatever language you like).
Given my previous experience modifying DOSBox for analysis, when I dumped
the sprites from Whizz game, hooking the decompiled code shouldn't be
too hard - just set breakpoint on execute and make DOSBox call decompiled code.
The hardest task is actually compiling the DOSBox on Windows, since it requires
a Unix userspace (to run ./configure scripts), while I only have the simplified
w64devkit. MinGW apparently provides it together with a mount command:
https://www.dosbox.com/wiki/Building_DOSBox_with_MinGW
And if one builds against Cygwin, SDL2 could then expect an actual X Window System
for output. Still DOSBox is an easy route compared to actually emulating DOS.
To be continued...