GHIDRA, Playstation 1 executables, FLIRT signatures and PsyQ

Hello everyone,

I don't know about you, but I have always wanted to reverse old console games, with a decompiler at hand as well. And now that joyful moment in my life has come: GHIDRA is out. I won't describe what it is; you can easily google it. The reviews are so mixed (especially from old-timers) that a beginner may find it hard to even decide to launch this miracle... Here's an example: "I've been in this business for 20 years, and I look at your Hydra with great distrust, because NSA. But some day I'll launch it and try it on a real job."

In a nutshell: launching Hydra is not scary. And what you get after launching it will outweigh any fear of implants and backdoors from the omnipresent NSA.

So, what am I getting at... There is a console called the Sony Playstation 1 (PS1, PSX, "Ployka"). A lot of cool games were made for it, and a number of franchises that are still popular were born there. At some point I wanted to find out how these games work: what the data formats are, whether resource compression is used, and to try translating something into Russian (I'll say right away that I haven't translated a single game so far).

I started by writing, together with a friend, a cool Delphi utility for working with the TIM format (something like BMP in the Playstation world): Tim2View. It was fairly popular at one time (and maybe still is). Then I wanted to dig into compression.

And then the problems started. I was not yet familiar with MIPS, so I set out to study it. I was not familiar with IDA Pro either (I came to reversing Sega Mega Drive games later than Playstation ones). But thanks to the Internet I learned that IDA Pro supports loading and analyzing PS1 executables: PS-X EXE. I tried loading a game file (I think it was Lemmings) with a strange name and extension, something like SLUS_123.45, into IDA, got a pile of assembly listing (fortunately I already had an idea of what that was, thanks to x86 Windows exe files), and started figuring things out.

The first thing that was hard to understand was the instruction pipeline, or more precisely the branch delay slot. For example, you see a function call, and right after it an instruction that loads into a register the very parameter this function is supposed to use. In short, for any jump or function call, the instruction that follows the jump/call is executed first, and only then does the jump or call itself take effect.
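To make the ordering concrete, here is a toy sketch in Java. It is not an emulator: the Insn type, the executionOrder() helper and the three-instruction program are all invented for this illustration. It only records the order in which instructions run when every jump takes effect after the instruction that follows it (the delay slot):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelaySlotDemo {
    /** A toy instruction: its text and, for jumps, the index of the target (-1 otherwise). */
    static final class Insn {
        final String text;
        final int jumpTarget;

        Insn(String text, int jumpTarget) {
            this.text = text;
            this.jumpTarget = jumpTarget;
        }
    }

    /** Hypothetical program: a call whose argument is loaded in the delay slot. */
    static List<Insn> demoProgram() {
        return Arrays.asList(
            new Insn("jal func", 3),        // call the function at index 3
            new Insn("li a0, 5", -1),       // delay slot: still runs before func
            new Insn("(skipped)", -1),      // not reached in this trace
            new Insn("func: uses a0", -1)); // first instruction of the callee
    }

    /** Returns the instruction texts in the order they are executed. */
    static List<String> executionOrder(List<Insn> program, int maxSteps) {
        List<String> trace = new ArrayList<>();
        int pc = 0;
        int pendingTarget = -1; // set by a jump, applied after its delay slot
        while (pc >= 0 && pc < program.size() && trace.size() < maxSteps) {
            Insn insn = program.get(pc);
            trace.add(insn.text);
            if (pendingTarget >= 0) {
                // We have just executed the delay slot, so now the jump happens.
                pc = pendingTarget;
                pendingTarget = -1;
            } else {
                if (insn.jumpTarget >= 0) {
                    pendingTarget = insn.jumpTarget;
                }
                pc++;
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String line : executionOrder(demoProgram(), 10)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that "li a0, 5" appears in the trace between the call and the first instruction of the callee, which is exactly what looks strange in a disassembly listing at first.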

Once I got past all those difficulties, I managed to write several packers/unpackers for game resources. But I never got seriously into studying the code. Why? The reasons are trivial: there was a lot of code, there were calls to the BIOS and to functions that were practically impossible to understand (they came from libraries, and I had no SDK for the console at the time), instructions that operate on three registers at once, and no decompiler.

And now, many, many years later, GHIDRA comes out. MIPS is among the platforms supported by its decompiler. Oh, joy! Let's decompile something right away! But... a letdown was waiting for me: PS-X EXE is not supported by Hydra. No problem, we'll write our own loader!

The actual code

Enough lyrical digressions, let's write code. I already had an idea of how to create custom loaders for Ghidra from what I wrote earlier. So all that remained was to find the memory map of the first Playstation and the addresses of its registers, and then the binaries could be built and loaded. No sooner said than done.

The code was ready, registers and regions were added and recognized, but there was still a big blank spot where the calls to library and BIOS functions were. And, unfortunately, Hydra had no FLIRT support. If it's not there, let's add it.

The FLIRT signature format is known and is described in the pat.txt file that can be found in the IDA SDK. IDA also comes with a utility that creates these signatures specifically from Playstation library files; it is called ppsx. I downloaded the SDK for the console, the PsyQ Playstation Development Kit, found the lib files there and tried to create at least some signatures from them. It worked. The result is a small text file in which each line has a specific format. All that remains is to write code that parses these lines and applies them to the code.

Pat parser

Since each line has a specific format, it is logical to write a regular expression. It turned out like this:

    private static final Pattern linePat = Pattern.compile(
        "^((?:[0-9A-F\\.]{2})+) ([0-9A-F]{2}) ([0-9A-F]{4}) ([0-9A-F]{4}) "
        + "((?:[:\\^][0-9A-F]{4}@? [\\.\\w]+ )+)((?:[0-9A-F\\.]{2})+)?$");

And to later pull the offset, type and name of each entry out of the module list, we write a separate regexp:

 private static final Pattern modulePat = Pattern.compile("([:\\^][0-9A-F]{4}@?) ([\\.\\w]+) ");
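As a quick sanity check, the two expressions can be tried on a signature line. The sketch below is only an illustration: the sample line is invented (it does not come from a real PsyQ library), and the class and helper names are hypothetical. Note that every name entry, including the last one, is followed by a space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatLineDemo {
    // The two expressions from the article, unchanged.
    static final Pattern LINE_PAT = Pattern.compile(
        "^((?:[0-9A-F\\.]{2})+) ([0-9A-F]{2}) ([0-9A-F]{4}) ([0-9A-F]{4}) "
        + "((?:[:\\^][0-9A-F]{4}@? [\\.\\w]+ )+)((?:[0-9A-F\\.]{2})+)?$");
    static final Pattern MODULE_PAT = Pattern.compile("([:\\^][0-9A-F]{4}@?) ([\\.\\w]+) ");

    // Invented sample line, made up for this example; note the trailing space.
    static final String SAMPLE =
        "0800E003........ 04 1A2B 0010 :0000 sample_func :0008@ loc_tail ^0004 other_sym ";

    /** Returns {crcLength, crc16, totalLength}, or null if the line does not match. */
    static int[] header(String line) {
        Matcher m = LINE_PAT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(2), 16),
            Integer.parseInt(m.group(3), 16),
            Integer.parseInt(m.group(4), 16)
        };
    }

    /** Lists the names found in a line as "TYPE name @0xOFFSET". */
    static List<String> parseModules(String line) {
        List<String> res = new ArrayList<>();
        Matcher lineMatcher = LINE_PAT.matcher(line);
        if (!lineMatcher.matches()) {
            return res;
        }
        Matcher m = MODULE_PAT.matcher(lineMatcher.group(5));
        while (m.find()) {
            String marker = m.group(1);
            String type = marker.startsWith("^") ? "REF"
                        : marker.endsWith("@") ? "LOCAL" : "GLOBAL";
            int offset = Integer.parseInt(marker.replaceAll("[:^@]", ""), 16);
            res.add(String.format("%s %s @0x%X", type, m.group(2), offset));
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(header(SAMPLE)));
        parseModules(SAMPLE).forEach(System.out::println);
    }
}
```

Group 1 of the line pattern is the masked byte sequence, groups 2 to 4 are the numeric fields, group 5 is the whole list of names, and the optional group 6 holds the tail bytes.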

Now let's go through the components of a signature line one by one:

First comes a hex sequence of bytes (0-9A-F), in which some bytes can be arbitrary (the dot character "."). So we create a class that will store such a sequence. I called it MaskedBytes:

MaskedBytes.java

    package pat;

    public class MaskedBytes {
        private final byte[] bytes, masks;

        public final byte[] getBytes() { return bytes; }

        public final byte[] getMasks() { return masks; }

        public final int getLength() { return bytes.length; }

        public MaskedBytes(byte[] bytes, byte[] masks) {
            this.bytes = bytes;
            this.masks = masks;
        }

        public static MaskedBytes extend(MaskedBytes src, MaskedBytes add) {
            return extend(src, add.getBytes(), add.getMasks());
        }

        public static MaskedBytes extend(MaskedBytes src, byte[] addBytes, byte[] addMasks) {
            int length = src.getBytes().length;
            byte[] tmpBytes = new byte[length + addBytes.length];
            byte[] tmpMasks = new byte[length + addMasks.length];
            System.arraycopy(src.getBytes(), 0, tmpBytes, 0, length);
            System.arraycopy(addBytes, 0, tmpBytes, length, addBytes.length);
            System.arraycopy(src.getMasks(), 0, tmpMasks, 0, length);
            System.arraycopy(addMasks, 0, tmpMasks, length, addMasks.length);
            return new MaskedBytes(tmpBytes, tmpMasks);
        }
    }

The length of the block over which the CRC16 is calculated.
The CRC16 itself, which uses its own polynomial (0x8408):

CRC16 calculation code

    public static boolean checkCrc16(byte[] bytes, short resCrc) {
        // An empty block has nothing to verify
        if (bytes.length == 0) {
            return true;
        }

        int crc = 0xFFFF;

        for (int i = 0; i < bytes.length; ++i) {
            int a = bytes[i];

            // Process the byte bit by bit, lowest bit first
            for (int x = 0; x < 8; ++x) {
                if (((crc ^ a) & 1) != 0) {
                    crc = (crc >> 1) ^ 0x8408;
                } else {
                    crc >>= 1;
                }

                a >>= 1;
            }
        }

        // Invert the result and swap its two bytes
        crc = ~crc;
        int x = crc;
        crc = (crc << 8) | ((x >> 8) & 0xFF);
        crc &= 0xFFFF;

        return (short)crc == resCrc;
    }
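For experimenting outside of Ghidra, the same loop can be turned into a function that returns the checksum instead of comparing it. This is a minimal sketch with hypothetical names (Crc16Demo, crc16); the parameters used in checkCrc16() (initial value 0xFFFF, reflected polynomial 0x8408, final inversion) are the ones commonly listed as CRC-16/X-25, with the two result bytes swapped at the end.

```java
import java.nio.charset.StandardCharsets;

class Crc16Demo {
    /** Same bit-by-bit loop as checkCrc16(), but returning the value (0..0xFFFF). */
    static int crc16(byte[] bytes) {
        int crc = 0xFFFF;
        for (byte b : bytes) {
            int a = b;
            for (int bit = 0; bit < 8; ++bit) {
                if (((crc ^ a) & 1) != 0) {
                    crc = (crc >> 1) ^ 0x8408;
                } else {
                    crc >>= 1;
                }
                a >>= 1;
            }
        }
        crc = ~crc & 0xFFFF;                       // final inversion
        return ((crc << 8) | (crc >> 8)) & 0xFFFF; // swap the two bytes
    }

    /** Convenience wrapper mirroring the article's check, for non-empty input. */
    static boolean check(byte[] bytes, short expected) {
        return (short) crc16(bytes) == expected;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("crc16 = 0x%04X%n", crc16(data));
    }
}
```

The standard X-25 check value for the ASCII string "123456789" is 0x906E, so with the byte swap this sketch should yield 0x6E90 for that input.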

The total length of the "module" in bytes.
List of global names (what we need).
List of links to other names (also needed).
Tail bytes.

Each name in a module has a type and an offset from the start of the module. The type is indicated by the characters :, ^ and @ placed around the four-digit hex offset (XXXX below):

":XXXX NAME": a global name. These names are the reason I started all this;
":XXXX@ NAME": a local name / label. They could be left unmarked, but let them be;
"^XXXX NAME": a reference to a name.

On the one hand, everything is simple. But a reference may well point not to a function (in which case the jump is relative) but to a global variable. What's the problem, you ask? The problem is that on the PSX you cannot put a whole DWORD into a register with a single instruction: every MIPS instruction is exactly four bytes long, so the value has to be loaded as two halves. It would seem that you just need to take one half from one instruction, then disassemble the next one and take the other half. But it is not that simple. The first half may be loaded some five instructions earlier, and the reference in the module is attached only to the instruction that loads the second half. I had to write an intricate parser (it can probably be improved).

As a result, we create an enum for the three types of names:
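The arithmetic behind the two halves can be sketched in a few lines of Java. The names below (HiLoDemo, combine, split) are hypothetical, and the sketch assumes the common case where the upper half comes from a lui and the lower half from an instruction with a sign-extended 16-bit immediate (addiu, lw and similar); a lui/ori pair would zero-extend the lower half instead.

```java
class HiLoDemo {
    /** Combines the 16-bit immediates of a lui + addiu/lw pair into a full address. */
    static int combine(int hi16, int lo16) {
        // The low half is sign-extended before it is added to the shifted high half.
        return (hi16 << 16) + (short) lo16;
    }

    /** Splits an address into the {hi, lo} immediates such a pair would carry. */
    static int[] split(int address) {
        int lo = address & 0xFFFF;
        // Adding 0x8000 compensates for the sign extension of a "negative" low half.
        int hi = (address + 0x8000) >>> 16;
        return new int[] { hi, lo };
    }

    public static void main(String[] args) {
        System.out.printf("0x%08X%n", combine(0x8001, 0x0010));
        System.out.printf("0x%08X%n", combine(0x8002, 0x8000));
        int[] halves = split(0x80018000);
        System.out.printf("hi=0x%04X lo=0x%04X%n", halves[0], halves[1]);
    }
}
```

Because of the sign extension, the high half of an address such as 0x80018000 is stored as 0x8002 rather than 0x8001, which is one more reason why both instructions have to be found before the target can be computed.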

ModuleType.java

    package pat;

    public enum ModuleType {
        GLOBAL_NAME,
        LOCAL_NAME,
        REF_NAME;

        public boolean isGlobal() {
            return this == GLOBAL_NAME;
        }

        public boolean isLocal() {
            return this == LOCAL_NAME;
        }

        public boolean isReference() {
            return this == REF_NAME;
        }

        @Override
        public String toString() {
            if (isGlobal()) {
                return "Global";
            } else if (isLocal()) {
                return "Local";
            } else {
                return "Reference";
            }
        }
    }

Let's write code that converts a text sequence of hex digits and dots into the MaskedBytes type:

hexStringToMaskedBytesArray()

    private MaskedBytes hexStringToMaskedBytesArray(String s) {
        MaskedBytes res = null;

        if (s != null) {
            int len = s.length();
            byte[] bytes = new byte[len / 2];
            byte[] masks = new byte[len / 2];

            for (int i = 0; i < len; i += 2) {
                char c1 = s.charAt(i);
                char c2 = s.charAt(i + 1);

                masks[i / 2] = (byte) (
                    (((c1 == '.') ? 0x0 : 0xF) << 4) |
                    (((c2 == '.') ? 0x0 : 0xF) << 0)
                );
                bytes[i / 2] = (byte) (
                    (((c1 == '.') ? 0x0 : Character.digit(c1, 16)) << 4) |
                    (((c2 == '.') ? 0x0 : Character.digit(c2, 16)) << 0)
                );
            }

            res = new MaskedBytes(bytes, masks);
        }

        return res;
    }

Now we can think about a class that will store information about each individual name: the name itself, its offset in the module, and its type:
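To see what this conversion produces, here is a stand-alone sketch of the same idea that returns plain arrays instead of a MaskedBytes object. The class name MaskedHexDemo and its helpers are hypothetical, and the input strings are made up for the example.

```java
import java.util.Arrays;

class MaskedHexDemo {
    /** Returns {bytes, masks} for a string such as "27BD..F8"; '.' is a wildcard nibble. */
    static byte[][] parse(String s) {
        int n = s.length() / 2;
        byte[] bytes = new byte[n];
        byte[] masks = new byte[n];
        for (int i = 0; i < n; ++i) {
            int value = 0;
            int mask = 0;
            for (int k = 0; k < 2; ++k) {
                char c = s.charAt(2 * i + k);
                value <<= 4;
                mask <<= 4;
                if (c != '.') {
                    value |= Character.digit(c, 16); // known nibble
                    mask |= 0xF;                     // and it must match
                }
            }
            bytes[i] = (byte) value;
            masks[i] = (byte) mask;
        }
        return new byte[][] { bytes, masks };
    }

    /** Formats a byte array as space-separated two-digit hex values. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[][] res = parse("27BD..F8");
        System.out.println("bytes: " + hex(res[0]));
        System.out.println("masks: " + hex(res[1]));
    }
}
```

A fully known byte gets the mask FF, a ".." byte gets the mask 00 (and the value 00), and a half-known byte such as "2." would get the value 20 with the mask F0.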

ModuleData.java

    package pat;

    public class ModuleData {
        private final long offset;
        private final String name;
        private final ModuleType type;

        public ModuleData(long offset, String name, ModuleType type) {
            this.offset = offset;
            this.name = name;
            this.type = type;
        }

        public final long getOffset() { return offset; }

        public final String getName() { return name; }

        public final ModuleType getType() { return type; }
    }

And finally, a class that stores everything specified in a line of the pat file, that is, the bytes, the CRC, and the list of names with offsets:

SignatureData.java

    package pat;

    import java.util.Arrays;
    import java.util.List;

    public class SignatureData {
        private final MaskedBytes templateBytes, tailBytes;
        private MaskedBytes fullBytes;
        private final int crc16Length;
        private final short crc16;
        private final int moduleLength;
        private final List<ModuleData> modules;

        public SignatureData(MaskedBytes templateBytes, int crc16Length, short crc16,
                int moduleLength, List<ModuleData> modules, MaskedBytes tailBytes) {
            this.templateBytes = this.fullBytes = templateBytes;
            this.crc16Length = crc16Length;
            this.crc16 = crc16;
            this.moduleLength = moduleLength;
            this.modules = modules;
            this.tailBytes = tailBytes;

            if (this.tailBytes != null) {
                // Fill the gap between the template and the tail with "any byte" entries
                int addLength = moduleLength - templateBytes.getLength() - tailBytes.getLength();
                byte[] addBytes = new byte[addLength];
                byte[] addMasks = new byte[addLength];
                Arrays.fill(addBytes, (byte)0x00);
                Arrays.fill(addMasks, (byte)0x00);

                this.fullBytes = MaskedBytes.extend(this.templateBytes, addBytes, addMasks);
                this.fullBytes = MaskedBytes.extend(this.fullBytes, tailBytes);
            }
        }

        public MaskedBytes getTemplateBytes() { return templateBytes; }

        public MaskedBytes getTailBytes() { return tailBytes; }

        public MaskedBytes getFullBytes() { return fullBytes; }

        public int getCrc16Length() { return crc16Length; }

        public short getCrc16() { return crc16; }

        public int getModuleLength() { return moduleLength; }

        public List<ModuleData> getModules() { return modules; }
    }

Now the main part: the code that creates all these classes:

Parsing the list of names of a pat line

    private List<ModuleData> parseModuleData(String s) {
        List<ModuleData> res = new ArrayList<ModuleData>();

        if (s != null) {
            Matcher m = modulePat.matcher(s);

            while (m.find()) {
                String __offset = m.group(1);
                ModuleType type = __offset.startsWith(":") ? ModuleType.GLOBAL_NAME : ModuleType.REF_NAME;
                type = (type == ModuleType.GLOBAL_NAME && __offset.endsWith("@")) ? ModuleType.LOCAL_NAME : type;
                String _offset = __offset.replaceAll("[:^@]", "");

                long offset = Integer.parseInt(_offset, 16);
                String name = m.group(2);

                res.add(new ModuleData(offset, name, type));
            }
        }

        return res;
    }

Parsing all lines of a pat file

    private void parse(List<String> lines) {
        modulesCount = 0L;
        signatures = new ArrayList<SignatureData>();

        int linesCount = lines.size();
        monitor.initialize(linesCount);
        monitor.setMessage("Reading signatures...");

        for (int i = 0; i < linesCount; ++i) {
            String line = lines.get(i);
            Matcher m = linePat.matcher(line);

            if (m.matches()) {
                MaskedBytes pp = hexStringToMaskedBytesArray(m.group(1));  // leading masked bytes
                int ll = Integer.parseInt(m.group(2), 16);                 // length of the CRC16 block
                short ssss = (short)Integer.parseInt(m.group(3), 16);      // CRC16 value
                int llll = Integer.parseInt(m.group(4), 16);               // total module length
                List<ModuleData> modules = parseModuleData(m.group(5));    // names and references

                MaskedBytes tail = null;
                if (m.group(6) != null) {
                    tail = hexStringToMaskedBytesArray(m.group(6));
                }

                signatures.add(new SignatureData(pp, ll, ssss, llll, modules, tail));
                modulesCount += modules.size();
            }

            monitor.incrementProgress(1);
        }
    }

The code that creates a function where one of the signatures was recognized:

Create function

    private static void disasmInstruction(Program program, Address address) {
        DisassembleCommand cmd = new DisassembleCommand(address, null, true);
        cmd.applyTo(program, TaskMonitor.DUMMY);
    }

    public static void setFunction(Program program, FlatProgramAPI fpa, Address address,
            String name, boolean isFunction, boolean isEntryPoint, MessageLog log) {
        try {
            if (fpa.getInstructionAt(address) == null)
                disasmInstruction(program, address);

            if (isFunction) {
                fpa.createFunction(address, name);
            }

            if (isEntryPoint) {
                fpa.addEntryPoint(address);
            }

            // A function with a symbol at this address needs no extra label
            if (isFunction && program.getSymbolTable().hasSymbol(address)) {
                return;
            }

            program.getSymbolTable().createLabel(address, name, SourceType.IMPORTED);
        } catch (InvalidInputException e) {
            log.appendException(e);
        }
    }

The hardest part, as mentioned earlier, is resolving a reference to another name/variable (the code could probably be improved):

Resolving references

    public static void setInstrRefName(Program program, FlatProgramAPI fpa, PseudoDisassembler ps,
            Address address, String name, MessageLog log) {
        ReferenceManager refsMgr = program.getReferenceManager();
        Reference[] refs = refsMgr.getReferencesFrom(address);

        if (refs.length == 0) {
            // No reference yet: disassemble the instruction and look again
            disasmInstruction(program, address);
            refs = refsMgr.getReferencesFrom(address);

            if (refs.length == 0) {
                // Still nothing: try the next instruction
                refs = refsMgr.getReferencesFrom(address.add(4));

                if (refs.length == 0) {
                    refs = refsMgr.getFlowReferencesFrom(address.add(4));
                    Instruction instr = program.getListing().getInstructionAt(address.add(4));

                    if (instr == null) {
                        disasmInstruction(program, address.add(4));
                        instr = program.getListing().getInstructionAt(address.add(4));

                        if (instr == null) {
                            return;
                        }
                    }

                    FlowType flowType = instr.getFlowType();

                    if (refs.length == 0 && !(flowType.isJump() || flowType.isCall() || flowType.isTerminal())) {
                        return;
                    }

                    // The next instruction changes the flow, so look one instruction
                    // further (presumably its delay slot)
                    refs = refsMgr.getReferencesFrom(address.add(8));

                    if (refs.length == 0) {
                        return;
                    }
                }
            }
        }

        try {
            program.getSymbolTable().createLabel(refs[0].getToAddress(), name, SourceType.IMPORTED);
        } catch (InvalidInputException e) {
            log.appendException(e);
        }
    }

And the final touch: applying the signatures:

applySignatures()

    public void applySignatures(ByteProvider provider, Program program, Address imageBase,
            Address startAddr, Address endAddr, MessageLog log) throws IOException {
        BinaryReader reader = new BinaryReader(provider, false);
        PseudoDisassembler ps = new PseudoDisassembler(program);
        FlatProgramAPI fpa = new FlatProgramAPI(program);

        monitor.initialize(getAllModulesCount());
        monitor.setMessage("Applying signatures...");

        for (SignatureData sig : signatures) {
            MaskedBytes fullBytes = sig.getFullBytes();
            MaskedBytes tmpl = sig.getTemplateBytes();

            // Step 1: look for the masked byte pattern
            Address addr = program.getMemory().findBytes(startAddr, endAddr,
                    fullBytes.getBytes(), fullBytes.getMasks(), true, TaskMonitor.DUMMY);

            if (addr == null) {
                monitor.incrementProgress(sig.getModules().size());
                continue;
            }

            // Step 2: verify the CRC16 of the bytes that follow the template
            addr = addr.subtract(imageBase.getOffset());
            byte[] nextBytes = reader.readByteArray(addr.getOffset() + tmpl.getLength(), sig.getCrc16Length());

            if (!PatParser.checkCrc16(nextBytes, sig.getCrc16())) {
                monitor.incrementProgress(sig.getModules().size());
                continue;
            }

            addr = addr.add(imageBase.getOffset());
            List<ModuleData> modules = sig.getModules();

            // Step 3: create functions for the global names
            for (ModuleData data : modules) {
                Address _addr = addr.add(data.getOffset());

                if (data.getType().isGlobal()) {
                    setFunction(program, fpa, _addr, data.getName(), data.getType().isGlobal(), false, log);
                }

                monitor.setMessage(String.format("%s function %s at 0x%08X", data.getType(), data.getName(), _addr.getOffset()));
                monitor.incrementProgress(1);
            }

            // Step 4: name the targets of the references.
            // Progress was already counted once per name in the loop above.
            for (ModuleData data : modules) {
                Address _addr = addr.add(data.getOffset());

                if (data.getType().isReference()) {
                    setInstrRefName(program, fpa, ps, _addr, data.getName(), log);
                }

                monitor.setMessage(String.format("%s function %s at 0x%08X", data.getType(), data.getName(), _addr.getOffset()));
            }
        }
    }

Here it is worth mentioning one interesting function: findBytes(). It searches for a given byte sequence, with a bit mask specified for each byte. The method is called like this:

 Address addr = program.getMemory().findBytes(startAddr, endAddr, bytes, masks, forward, TaskMonitor.DUMMY);

The result is the address from which the bytes begin, or null .
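The idea of a masked search can be illustrated without Ghidra. The sketch below is not Ghidra's implementation; it is a naive stand-alone version with hypothetical names (MaskedSearchDemo, find), written under the assumption that a position matches when (data & mask) == (pattern & mask) holds for every byte of the pattern.

```java
class MaskedSearchDemo {
    /**
     * Returns the first index at which every pattern byte matches the data
     * under its mask, or -1 if there is no such position.
     */
    static int find(byte[] data, byte[] pattern, byte[] masks) {
        for (int start = 0; start + pattern.length <= data.length; ++start) {
            boolean ok = true;
            for (int j = 0; j < pattern.length && ok; ++j) {
                // Only the bits selected by the mask take part in the comparison.
                ok = (data[start + j] & masks[j]) == (pattern[j] & masks[j]);
            }
            if (ok) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data    = { 0x00, 0x27, (byte) 0xBD, 0x12, (byte) 0xF8, 0x00 };
        byte[] pattern = { 0x27, (byte) 0xBD, 0x00, (byte) 0xF8 };
        byte[] masks   = { (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0xFF };
        System.out.println("found at index " + find(data, pattern, masks));
    }
}
```

Here the third pattern byte has the mask 00, so any value is accepted at that position, which is how the "." wildcards of a signature line are expressed.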

Writing the analyzer

Let's do this nicely: instead of applying the signatures unconditionally, we leave that choice to the user. For that we need to write our own code analyzer (you have probably seen similar ones in Ghidra's list of analyzers; yes, all of those):

So, to get into this list, we need to inherit from the AbstractAnalyzer class and override several methods:

Constructor. It has to call the base class constructor with the name and description of the analyzer and its type (more on that later). Mine looks something like this:

    public PsxAnalyzer() {
        super("PSYQ Signatures", "PSX signatures applier", AnalyzerType.INSTRUCTION_ANALYZER);
    }

getDefaultEnablement(). Determines whether our analyzer is enabled by default, or only when certain conditions are met (for example, if our loader was used).
canAnalyze(). Can this analyzer be applied to the loaded binary at all?
Both getDefaultEnablement() and canAnalyze() can, in principle, be answered by one and the same function:

    public static boolean isPsxLoader(Program program) {
        return program.getExecutableFormat().equalsIgnoreCase(PsxLoader.PSX_LOADER);
    }

Here PsxLoader.PSX_LOADER stores the name of the loader and is defined in the loader itself.

In total, we have:

    @Override
    public boolean getDefaultEnablement(Program program) {
        return isPsxLoader(program);
    }

    @Override
    public boolean canAnalyze(Program program) {
        return isPsxLoader(program);
    }

registerOptions(). Overriding this method is optional, but if we need to ask the user for something before the analysis, for example the path to the pat file, this is the best place to do it. We get:

    private static final String OPTION_NAME = "PSYQ PAT-File Path";
    private File file = null;

    @Override
    public void registerOptions(Options options, Program program) {
        try {
            file = Application.getModuleDataFile("psyq4_7.pat").getFile(false);
        } catch (FileNotFoundException e) {
            // No bundled pat file: leave the default empty, the user can pick one
        }

        options.registerOption(OPTION_NAME, OptionType.FILE_TYPE, file, null,
                "PAT-File (FLAIR) created from PSYQ library files");
    }

A clarification is needed here. The static getModuleDataFile() method of the Application class returns the full path to a file in the data directory of our module's tree; that directory can hold any files we want to refer to later.

The registerOption() method, in turn, registers an option with the name given in OPTION_NAME, the File type (that is, the user will be able to select a file through the usual dialog box), a default value and a description.

Next. Since there is no convenient way to read the registered option later, we need to override the optionsChanged() method:

    @Override
    public void optionsChanged(Options options, Program program) {
        super.optionsChanged(options, program);

        file = options.getFile(OPTION_NAME, file);
    }

Here we simply update the field with the new value.

The added() method. Now the main part: the method that is called when the analyzer runs. It receives the set of addresses available for analysis, but we need only those that contain code, so we have to filter. The resulting code:

Added method

    @Override
    public boolean added(Program program, AddressSetView set, TaskMonitor monitor, MessageLog log)
            throws CancelledException {
        // Nothing to do if no pat file was selected
        if (file == null) {
            return true;
        }

        Memory memory = program.getMemory();
        AddressRangeIterator it = memory.getLoadedAndInitializedAddressSet().getAddressRanges();

        while (!monitor.isCancelled() && it.hasNext()) {
            AddressRange range = it.next();

            try {
                MemoryBlock block = program.getMemory().getBlock(range.getMinAddress());

                // Only initialized, loaded, executable blocks can contain library code
                if (block.isInitialized() && block.isExecute() && block.isLoaded()) {
                    PatParser pat = new PatParser(file, monitor);
                    RandomAccessByteProvider provider = new RandomAccessByteProvider(new File(program.getExecutablePath()));
                    pat.applySignatures(provider, program, block.getStart(), block.getStart(), block.getEnd(), log);
                }
            } catch (IOException e) {
                log.appendException(e);
                return false;
            }
        }

        return true;
    }

Here we walk through the executable address ranges and try to apply the signatures there.

Conclusion

Looks like that's it. In fact, there is nothing terribly complicated here. There are examples, the community is alive, and you can always ask about anything unclear while you write your code. The bottom line: a working loader and analyzer for Playstation 1 executables.

All source codes are available here: ghidra_psx_ldr
Releases here: Releases

Source: https://habr.com/ru/post/448098/
