Ghidra Issues
________________________________________________
Last part ended with installing Ghidra and planning to recover overlays.
But the Borland overlay format is not well documented.
The easiest way to approach it is to look inside the Overlay Manager code.
The overlay manager gets initialized by the C0.ASM's StartExit routine,
which gets called just before the main(). StartExit seems like a nice place
to test the decompiler and get accustomed to its quirks.
The good thing is that the decompiler does work out of the box.
The bad thing is that most of the variables it generates can't be renamed
for some obscure reason. I.e. doing `RMB -> "Rename Variable"` doesn't work.
Most pointer variables can be renamed if we first do "Adjust Pointer Offset",
which brings in the "Create Relative Pointer" dialog.
There we set the pointer's type, typename and offset.
Now "Rename Variable" will suddenly work.
In a few other cases you can make it work by introducing
explicit write references onto the CF flag.
Yet in most cases Ghidra will just add your name under the function variables:
undefined1 HASH:5f05ba6... VariableName
while the decompiled name won't change.
Well, `undefined1` is the type, but what is `HASH:5f05ba6...`?
What is going on?!! How do we get custom names?!!!
Ghidra's decompiler has rather obscure conventions.
When Ghidra users and developers want to introduce some abstract entity,
they create an address space for entities of that type.
We did that already for the DOS 21h syscalls map.
The same happened with the temporary decompiler variables, which never existed in
the original machine code. They are placed into the `HASH:` address space
when the user tries to rename them.
Unfortunately after doing the decompilation, Ghidra doesn't use the names
the user supplied for these `HASH:` variables. And searching the Internet for
a solution doesn't seem to help. There was a fixed issue at Ghidra's GitHub:
https://github.com/NationalSecurityAgency/ghidra/issues/193
But the latest GitHub checkout of Ghidra still fails to use user-supplied names.
So the only option is modifying Ghidra's source code.
Compiling Ghidra
________________________________________________
The first step would be actually compiling Ghidra from sources.
Following the guide at https://daniao.ws/notes/quick-tips/build-ghidra
we install Visual Studio, git and gradle.
Then issue the following commands:
git clone https://github.com/NationalSecurityAgency/ghidra.git
cd ghidra
gradle --init-script gradle/support/fetchDependencies.gradle init
gradle buildGhidra -x ip -x createJavadocs -x createJsondocs -x zipJavadocs -x sleighCompile
Surprisingly the github checkout of Ghidra builds and runs on Windows.
Even the C++ decompiler part doesn't give errors.
The result ends up in build/dist/ghidra_11.2_DEV_20240602_win_x86_64.zip
Out of the box. Without issues.
Unless you follow the guide recommending the so called "docker container".
Apparently if Ghidra fails to run after build,
then "running `gradle clean`, might get it to work."
The DevGuide.md guide also says that the Linux build fails on non-English locales:
"There is a known issue in Gradle that can prevent it from discovering
native toolchains on Linux if a non-English system locale is being used.
As a workaround, set the following environment variable prior to running
your Gradle task: LC_MESSAGES=en_US.UTF-8"
Luckily we are blessed with Windows 11!
The second step is actually getting to the place where decompilation happens.
Ghidra is written in Java. There are far worse languages now, like Rust and Go.
Still Java involves organizing a program as thousands of *.java files,
each holding a single class. Navigating this mess involves special software.
The problem of searching for specific code in a large code base can be approached
from several directions:
1. Starting from main() and studying the project's general structure,
as well as getting accustomed to the frameworks used,
then leveraging that knowledge to navigate towards the most promising path.
2. Using debugger or just grep to search for the keywords of interest.
That applies to working with Ghidra's source, just as well as to working with
the decompiled code.
Since I have no time and my task encompasses only a tiny portion of Ghidra,
I used the grep approach, looking for the decompiler's window title text
"Decompile:"
which instantly got me to the
Features\Decompiler\src\main\java\ghidra\app\plugin\core\decompile\DecompilerProvider.java
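If there is no grep at hand on Windows, the same keyword hunt can be sketched in a few lines of Java. This is only an illustration of the approach: the `SourceGrep` class and its method are made up for this post, and it does a plain substring match over *.java files, nothing smarter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Ghidra: a poor man's `grep -rl` for *.java files.
final class SourceGrep {
    // Returns every *.java file under root whose text contains the needle, sorted by path.
    static List<Path> filesContaining(Path root, String needle) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".java"))
                .filter(p -> contains(p, needle))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static boolean contains(Path file, String needle) {
        try {
            // ISO-8859-1 maps every byte to a char, so odd encodings cannot make decoding fail.
            String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            return text.contains(needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SourceGrep.java <source root> <text to find>
        for (Path p : filesContaining(Path.of(args[0]), args[1])) {
            System.out.println(p);
        }
    }
}
```

Pointing it at the Ghidra checkout with "Decompile:" as the second argument should list the files holding that title text.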
There we must note that Java uses the keyword `super` to call parent methods:
super(arg0, arg1, .... argN); //invoke parent's constructor
super.methodName(); //invoke parent's method
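For those who, like me, last saw Java long ago, here is a tiny complete example of both uses of `super`. The class names are invented for the illustration and have nothing to do with Ghidra's real class hierarchy.

```java
// Hypothetical classes, named only for this illustration.
class BaseProvider {
    protected final String title;

    BaseProvider(String title) {
        this.title = title;
    }

    String describe() {
        return "Base[" + title + "]";
    }
}

class DecompileLikeProvider extends BaseProvider {
    DecompileLikeProvider(String functionName) {
        super("Decompile: " + functionName);   // invoke the parent's constructor
    }

    @Override
    String describe() {
        return "Child of " + super.describe();  // invoke the parent's method
    }

    public static void main(String[] args) {
        // prints: Child of Base[Decompile: StartExit]
        System.out.println(new DecompileLikeProvider("StartExit").describe());
    }
}
```

Note that the `super(...)` constructor call has to be the first statement of the child's constructor.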
The decompiler itself is under Ghidra\Features\Decompiler\src\decompile\cpp
Notice how the Ghidra IDE and the Ghidra's decompiler are two separate programs.
The IDE is written in Java, while the decompiler is pure C/C++.
Java code starts the decompiler as a daemon process and then feeds it XML data
for the function to be decompiled.
C/C++ decompiler source is easier to read due to the header files exposing
all the structures, separately from code.
Looking at varmap.h/NameRecommend, the original idea was apparently for
the C/C++ decompiler to query the Java process for user names and apply them,
but that never happens. And fortunately it is not our task to investigate why.
That out of the way, we broke into private DecompilerProvider::updateTitle(),
which in that file gets called from the public decompileDataChanged().
Now we have to check all the places decompileDataChanged gets called from,
hoping to get a better vantage point, showing us the way towards the code
communicating with the decompiler's C/C++ process.
Going through the myriads of Java files is a tedious and annoying exercise,
asking for a proper navigation framework, made specially for Java.
The best Java editor appears to be IntelliJ IDEA, it even comes with bytecode
decompiler, so you can go inside the compiled Java *.class files.
It is also the easiest, just works without all the open source / linux bullshit.
Just like IDA Pro, IntelliJ IDEA is expensive and made by Russians,
so it is okay to pirate it.
Unfortunately Ghidra's README.md states the following:
To develop the Ghidra tool itself, it is highly recommended to use Eclipse,
which the Ghidra development process has been highly customized for.
So to avoid any additional issues we have to stick with Ghidra practices,
and use Eclipse. Eclipse was also highly recommended to me by the members of
the so called open source community over the IntelliJ IDEA.
Eclipse while at first installing perfectly and starting up, after
being closed and restarted again, came up with an obscure error:
Incompatible JVM. Version 1.8.0_401 of the JVM is not suitable for
this product. Version: 17 or greater is required;
Apparently in 2024 Eclipse is unable to locate the JVM on its own.
The only solution is to add in front of -vmargs in eclipse.ini the lines:
-vm
C:\Program Files\Java\jdk-17\bin\javaw.exe
It is still a mystery why Eclipse was able to start the first time on its own,
but we are off to a rough start.
Next, since Ghidra uses Swing and we explore the code base starting from a UI call,
we can try compiling a SwingHello.java program, which will allow us to
quickly familiarize ourselves with how Java projects function.
So let's create a new Eclipse project and...
Importing a premade file into an Eclipse project is already a difficult task.
One has to manually copy the files into the src/ folder of the project,
using the file manager (not Eclipse), then left-click the src/ folder
inside the Eclipse's file browser and pick "Refresh".
That will produce a "(default package)" folder, holding the refreshed SwingHello.java.
Additionally Eclipse immediately tried to parse SwingHello.java, reporting that
The package java.awt is not accessible
The package javax.swing is not accessible
Syntax error on token "module", interface expected | module-info.java
Well, folks, last time I checked Java was in 2004, and apparently it hasn't gotten
any easier. Modern times require modern solutions.
So I asked ChatGPT. And ChatGPT said that I need javaSE-1.7,
which was the last JDK version to provide AWT and Swing by default.
Luckily the new JDK still includes Swing, which is buried deep inside
Project -> Properties -> Java Build Path -> Libraries
-> Add Library -> JRE System Library -> Execution Environment
That fixes `The package ... is not accessible` errors, but the
Syntax error on token "module", interface expected | module-info.java
hasn't gone anywhere.
ChatGPT said that is because "you are using a modular JDK (Java 9 and above)"
Apparently one could also explicitly specify the `java.desktop` module somehow,
since the `import javax.swing` directive doesn't auto enable it.
That module-info.java is something Eclipse has created for the project,
just to make you confused. So I just commented out the
module swing {
}
trash, and the SwingHello.java finally compiled and ran!
Yes! That is what it takes to do HelloWorld.java in 2024.
So yeah. Java universe definitely got more "special" since 2004.
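For reference, a minimal SwingHello.java could look something like the sketch below. This is a guess at the shape of such a file, not the exact one I used. The comment also shows what I understand to be the "proper" alternative to gutting module-info.java: javax.swing and java.awt live in the `java.desktop` module, so the descriptor would have to require it. I haven't tried that route here, and as far as I know a named module can't hold classes from the default package, so the class would need a package declaration as well.

```java
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingConstants;
import javax.swing.SwingUtilities;

// A guess at what a minimal SwingHello.java could contain.
//
// Untested alternative to commenting module-info.java out: request the module
// that holds Swing and AWT explicitly.
//
//     module swing {
//         requires java.desktop;
//     }
class SwingHello {
    // Kept separate from the window code so the text can be checked without a display.
    static String greeting(String who) {
        return "Hello, " + who + "!";
    }

    public static void main(String[] args) {
        // Swing components should be created and shown on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("SwingHello");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JLabel(greeting("Swing"), SwingConstants.CENTER));
            frame.setSize(320, 120);
            frame.setLocationRelativeTo(null);   // center the window on screen
            frame.setVisible(true);
        });
    }
}
```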
Unsurprisingly Eclipse has many other issues, like tabs instead of spaces,
inability to properly rebuild the project if some file changes, and
no way to copy a *.java file's filesystem path or go to a specific filepath,
coupled with a cryptic UI (i.e. Bookmarks are buried deep under several menus).
And Java still fails to overload `==` for strings, and has no alternative to
C's #define, so no way you can do
#include "common.h"
#define PS public static
#define Str String[]
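The `==` complaint deserves a concrete case, since it bites anyone coming back from C++ or scripting languages. On strings `==` compares object references, while `equals` compares the characters, which is why the patch later in this post compares hashes with `equals`. A small sketch, with a made-up class name:

```java
// Hypothetical demo class showing reference equality versus content equality.
class StringEquality {
    static boolean sameObject(String a, String b) {
        return a == b;          // true only if both refer to the very same object
    }

    static boolean sameText(String a, String b) {
        return a.equals(b);     // true if the characters match
    }

    public static void main(String[] args) {
        String a = "HASH";
        String b = new String("HASH");   // same characters, distinct object
        System.out.println(sameObject(a, b));   // prints false
        System.out.println(sameText(a, b));     // prints true
    }
}
```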
Yet the decision was to use Ghidra's IDE, so we have to deal with imperfections.
Otherwise we have more luck adapting Ghidra's decompiler (which is C++)
to work as a plugin with the pirated IDA Pro SDK.
Anyway, since Eclipse is ready, we can run:
gradle buildGhidra prepdev eclipse buildNatives
After that we import the Ghidra projects into Eclipse:
* Right click on the projects view
* *File* -> *Import...*
* *General* | *Existing Projects into Workspace*
* Select root directory to be your downloaded or cloned ghidra source repository
* Check *Search for nested projects*
* Click *Finish*
Now we left click the ___root folder and pick Run As, and pick Ghidra.
After that F11 or Ctrl-F11 could be used to run it quicker.
Navigating The Code
________________________________________________
Finally we are prepared to fiddle with the Ghidra's code.
So in DecompilerProvider.java we rename "Decompile: " to "Our Decompile: "
Press F11 and see what we expect to see.
Next is locating the decompiler's return code.
The point of interest in DecompilerProvider.java is:
private void doRefresh(boolean optionsChanged) {
...
controller.setOptions(decompilerOptions);
if (currentLocation != null) {
controller.refreshDisplay(program, currentLocation, null);
}
To find the refreshDisplay function, we must
* place the editing cursor on the name
* press Ctrl-G (Go to declaration)
* press enter.
Or through RMB -> Declarations -> Workspace.
There is also Ctrl-Shift-G to locate the uses.
A tad easier than using grep or ctags, which just grind through the OOP mess.
In addition to Ctrl-G, we will need bookmarks.
To set a bookmark one either RMBs left of the line number, or uses Edit->Bookmark.
Accessing the set bookmarks is harder. One has to open the Bookmarks view:
Window -> Show View -> Other -> General -> Bookmarks
Just like Ctrl-G, bookmarks are mandatory when working with Java projects,
which tend to have millions of SLOC of boilerplate in thousands of files.
Anyway, now we are inside
Ghidra/Features/Decompiler/src/main/java/ghidra/app/decompiler/component/DecompilerController.java
Which does:
clearCache();
decompilerMgr.decompile(program, location, null, debugFile, true);
Again, we place cursor on `decompile` and press Ctrl-G, getting to
new DecompileRunnable(program, location, debugFile, viewerPosition, this);
one more Ctrl-G and we are inside DecompileRunnable.java, seeing
public void monitoredRun(TaskMonitor monitor) {
...
decompilerManager.decompile(program, functionToDecompile, debugFile, monitor);
...
Another Ctrl-G leads us to Decompiler.java and its
return ifc.decompileFunction(function, timeout, monitor);
Next Ctrl-G got us to
Ghidra\Features\Decompiler\src\main\java\ghidra\app\decompiler\DecompInterface.java
We are finally at the place where Java code communicates with the C++'s
DecompileProcess. The DecompileCallback is used by the C++ code
to query information from the Ghidra's database.
But we are interested in
decompileFunction(Function func, int timeoutSecs, TaskMonitor monitor) {
...
new DecompileResults(...);
DecompileResults calls the `decodeStream(Decoder decoder)`, which
uses `ClangMarkup.buildClangTree(decoder, hfunc)` to parse the XML results
of the decompilation into the docroot variable.
The docroot variable has all the raw decompiled names as text tokens.
We can dump them to the stdout using Ghidra's
`Msg.info(this, "Hello, " + "World!");` to print.
But the `System.out.printf("Hello %s!%n", "World")` works okay too.
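To show what "dumping the tokens" amounts to, here is a toy model of such a token tree. The `Node`, `Token` and `Group` types are stand-ins I made up; they only mimic the flatten-into-a-list shape of Ghidra's ClangNode tree and are not Ghidra code.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types that only mimic the shape of a markup tree; none of this is Ghidra code.
interface Node {
    void flatten(List<Node> out);   // append every leaf token below this node, in order
}

final class Token implements Node {
    private final String text;

    Token(String text) { this.text = text; }

    String getText() { return text; }

    @Override
    public void flatten(List<Node> out) { out.add(this); }
}

final class Group implements Node {
    private final List<Node> children = new ArrayList<>();

    Group add(Node child) { children.add(child); return this; }

    @Override
    public void flatten(List<Node> out) {
        for (Node child : children) child.flatten(out);
    }
}

final class TokenDump {
    // Joins the text of all leaf tokens, one per line, in source order.
    static String dump(Node root) {
        List<Node> all = new ArrayList<>();
        root.flatten(all);
        StringBuilder sb = new StringBuilder();
        for (Node n : all) sb.append(((Token) n).getText()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Group()
            .add(new Group().add(new Token("uVar1")).add(new Token("=")))
            .add(new Token("0"))
            .add(new Token(";"));
        System.out.print(dump(root));   // one token per line: uVar1, =, 0, ;
    }
}
```

The cast in `dump` is safe here because only leaf tokens ever add themselves to the list.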
To rename these "unique" local variables we need to compare hashes.
So we do similar, searching for "Rename Local Variable" title,
and then Ctrl-G'ing from it towards the place where it is saved.
That gets us to the:
HighFunctionDBUtil.updateDBVariable(...)
which basically does:
tmpHigh = highSymbol.getHighVariable();
DynamicEntry entry = DynamicEntry.build(tmpHigh.getRepresentative());
storage = entry.getStorage();
var = createLocalVariable(function, dataType, storage, pcAddr, source);
The DynamicEntry.build basically remaps the variable's storage from
the decompiler's unique: space to the IDE's HASH: space, which we see in the listing.
Modifying The Code
________________________________________________
Now we have to dig into the parsed DecompileResults's docroot,
replacing the variable names using getText and setText.
public class HighSymbol {
    public void setName(String n) { name = n; }
    ...
public class DecompileResults {
    ...
    private void decodeStream(Decoder decoder) {
        ...
        decoder.closeElement(docel);
        renameUniqs();
    }
    public void renameUniqs() {
        //System.out.printf("Decompiled HighSymbols for %s:%n", function.getName());
        LocalSymbolMap lsm = getHighFunction().getLocalSymbolMap();
        java.util.HashMap<String,String> renameMap = new java.util.HashMap<String,String>();
        for (HighSymbol highSymbol : lsm.getNameToSymbolMap().values()) {
            VariableStorage storage = highSymbol.getStorage();
            if (!storage.isUniqueStorage()) continue; //skip non unique: variables
            //unique variables have to be converted to HASH: storage, so we can compare them
            //with the user named hashes.
            HighVariable tmpHigh = highSymbol.getHighVariable();
            if (!storage.isHashStorage() && tmpHigh != null && tmpHigh.requiresDynamicStorage()) {
                DynamicEntry entry = DynamicEntry.build(tmpHigh.getRepresentative());
                storage = entry.getStorage();
            }
            String hash = storage.getFirstVarnode().getAddress().toString(true);
            //System.out.printf(" * %s %s%n", highSymbol.getName(), hash);
            for (ghidra.program.model.listing.Variable var : getFunction().getLocalVariables()) {
                if (!var.isUniqueVariable()) continue;
                String hash2 = var.getFirstStorageVarnode().getAddress().toString(true);
                if (hash.equals(hash2)) {
                    //System.out.printf(" %s is %s%n", highSymbol.getName(), var.getName());
                    renameMap.put(highSymbol.getName(), var.getName());
                    highSymbol.setName(var.getName());
                }
            }
        }
        if (renameMap.size() == 0) return;
        List<ClangNode> alltoks = new ArrayList<>();
        docroot.flatten(alltoks);
        if (alltoks.isEmpty()) return;
        for (int i = 0; i < alltoks.size(); ++i) {
            ClangToken token = (ClangToken) alltoks.get(i);
            String tokenText = token.getText();
            //System.out.printf(" %d:'%s'%n", token.isVariableRef()?1:0, tokenText);
            String renamed = renameMap.get(tokenText);
            if (renamed != null) token.setText(renamed);
        }
    }
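Stripped of the Ghidra types, the patch is just two passes: match names by their HASH: address, then substitute token text. Below is a self-contained model of that idea. Everything in it is a stand-in: plain maps and string lists replace HighSymbol, Variable and ClangToken, and the hash values and the `ovrCount` name are made up. It also swaps the nested loop of the patch for a single map lookup, which should give the same result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained model of the two passes in renameUniqs(); not Ghidra code.
final class RenameSketch {
    // Pass 1: pair each decompiler-generated name with the user name sharing its hash.
    static Map<String, String> buildRenameMap(Map<String, String> decompilerNameToHash,
                                              Map<String, String> hashToUserName) {
        Map<String, String> renameMap = new HashMap<>();
        for (Map.Entry<String, String> e : decompilerNameToHash.entrySet()) {
            String userName = hashToUserName.get(e.getValue());
            if (userName != null) renameMap.put(e.getKey(), userName);
        }
        return renameMap;
    }

    // Pass 2: replace every token whose whole text equals a renamed variable.
    static List<String> applyRenames(List<String> tokens, Map<String, String> renameMap) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) out.add(renameMap.getOrDefault(token, token));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fromDecompiler = new HashMap<>();
        fromDecompiler.put("uVar1", "HASH:0001");   // made-up hash values
        fromDecompiler.put("uVar2", "HASH:0002");
        Map<String, String> fromUser = new HashMap<>();
        fromUser.put("HASH:0001", "ovrCount");      // made-up user-supplied name

        Map<String, String> renames = buildRenameMap(fromDecompiler, fromUser);
        // prints: [ovrCount, =, uVar2, ;]
        System.out.println(applyRenames(List.of("uVar1", "=", "uVar2", ";"), renames));
    }
}
```

Like the patch, the second pass matches on raw token text, so any token that happens to spell the same as a renamed variable gets replaced too.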
Now Rename Variable works as expected.
But we are still far from doing the actual fun stuff.
The way is blocked by our first end level boss - the Borland overlay manager.
To be continued...
Current Mood: amused
Current Music: Pilotpriest - Don't Forget To Breathe