CGEN for IDA Pro

It all started when I wanted to analyze some MeP code. Usually, I do all my disassembly in IDA Pro, but this is one of the few processors that isn’t supported by IDA. Luckily, there is objdump for this obscure architecture. After fumbling around for a bit, I was convinced that porting the disassembler to IDA would be a much better use of my time than manually drawing arrows and annotating the objdump output.

Process

Turns out, there isn’t too many resources online on writing IDA processor modules. The readme of the SDK was minimal (it tells you to read the sample code and the header files) and refers to two documents: an online guide that is now gone and The IDA Pro Book by Chris Eagle. Opening the book to the chapter on writing processor modules, you’re greeted with a dire warning to turn back (along with a note about the lack of documentation) as many have tried and failed.

One of the reasons why writing a processor module is so challenging is that the processor_t struct contains 56 fields that must be initialized, and 26 of those fields are function pointers, while 1 of the fields is a pointer to an array of one or more struct pointers that each point to a different type of struct (asm_t) that contains 59 fields requiring initialization. Easy enough, right?

Chris Eagle, The IDA Book 2nd Edition

Well, I’m not easily discouraged, so I read on and familiarized myself with the process of creating a processor module. I’m not going to describe the process here in detail because Chris does a great job in the book, but I’ll give a brief outline

IDA Processor Module

There are four components of a processor module. The “analyzer” parses the raw bits of the machine code and generates the information about an instruction. The “emulator” uses that information to help IDA with further analysis. For example, if an instruction references data, your module can tell IDA to look for data at that address. If the instruction performs a function call, your module can tell IDA to create a function there. Contrary to its name, it does not actually “emulate” the instruction set. The “outputter” does just that: given the data generated from the analyzer, it prints out the disassembly to the user. Finally, there’s the architectural information, which is not a component mentioned elsewhere but I consider it one. This is not code, but static structures that tell IDA information such as the names of the registers, the mnemonic of the instructions, alignments and so on.

CGEN

The binutils (objdump) for MeP are machine generated by CGEN. CGEN attempts to abstract away the task of writing CPU tools (assemblers, disassemblers, simulators, etc) into writing CPU definitions. The definitions describe the CPU (including the hardware elements, the instruction sets, operands, etc) with the Scheme language. CGEN takes the definition and outputs C/C++ code for all the needed CPU tools. Originally I wanted to avoid CGEN and just wrap the (generated) binutils code to an IDA module (à la Hexagon). In theory, your module does not have to follow the convention laid out above. You can make the analyzer record the raw bits, the emulator do nothing, and the outputter use binutils to generate a complete line and print it out. However, in doing so, you essentially lose most of the power of IDA (finding xrefs, stack variables, etc). It would also be a shame to not use all the information given to us by the CGEN CPU definitions. These definitions (in theory) are strong enough to generate RTL code to implement the processor, so we would like to give as much of this information to IDA as possible.

CGEN Generators

The generators themselves (CGEN docs refer to them as “applications”) are also written in Scheme, in the Guile dialect. I have never written a line of functional code before, so it took me a day to understand the relatively small codebase. CGEN has its own object system that they call COS. Everything defined in the CPU descriptions becomes objects, and each generator gives these objects methods to print themselves out. For example: the simulator might give the operand object a “generate code to get value” method. Then a call to generate the semantics of an instruction into C code would use these objects’ methods. Like a true software engineer, I cut out functions from the generators for the simulator, disassembler, and architecture description and stitched them together with my own code to generate various components of an IDA module.

The analyzer used, as its base, the simulator instruction decode generator. I had to modify CGEN to record the order of operands as specified by the instruction syntax (the only modification to CGEN itself, everything else are additions). Then, I overwrote the simulator’s method definitions for extracting the operands from the instruction with code to populate the “cmd” structure in IDA (which requires the operands to be ordered).

The emulator used the simulator model generator as its base and was the most difficult to write (in terms of code complexity). The one major issue here is that while the generated simulator expects code to run in order and have state information stored, the IDA emulator does not store state information and IDA does not guarantee that your emulator will be called in the order that the instructions appear. That means we cannot make any assumptions about the state and our emulator can only make decisions based on the instruction alone. Because we only care about finding data and code references with the emulator, we can make the following simplifications:

  • Any conditionals will have the condition stripped out and all paths will be taken
  • Using any values from registers will stop the emulator and return immediately
  • Setting any values to registers will evaluate the value but discard the result

The first point allows for xrefs to be found regardless of the condition. A conditional branch, for example, will allow for a code xref to be generated. The second point is there because we do not know the state, so any dependencies on a register value that’s not already stripped out will make finding a xref difficult. In theory, we can still find offset xrefs this way, but we would have to know that only additions/subtractions are used and only a single register is used and that adds a lot of complexity. The third point allows memory reads to be captured. With those simplifications in place, we know that any memory reads/writes and any PC reads/writes that are found without knowing the state can be turned into xrefs.

The outputter used the syntax parser (binutils’ opcode builder) as its base. It reads the instruction definition in order to output the right orderings of parenthesis, commas, and so on. I just replaced the generate print methods of the hardware objects to generate calls to IDA output functions.

Results

MeP executable loaded and recognized by IDA Pro

MeP executable loaded and recognized by IDA Pro. All the blue is a result of the auto-analysis.

At the most basic level, the generated modules outputs what you would expect from objdump. The analyzer will find the right type for the operands (if possible). The emulator tries to find all constant addresses and adds xrefs to them (code and data). The outputter will print all instructions correctly, and the operands with the right type/size/name if needed.

The main thing it doesn’t do right now is keeping track of the stack pointer. It also does not identify if branches are jumps or calls (requiring CF_CALL flag). It does not identify if an instruction does not continue flow (requiring the CF_STOP flag) (it’s actually trivial to add this, but harder to add it without introducing emulator code to other generators. Since it’s easy to identify the instructions by hand, I decided to leave it out).

Usage

Once you generate the IDA module components, you still need to manually write the processor_t structure, the notify() function (optional), and implement and special print functions (as defined by the CPU definition). Then you can copy the CGEN headers from binutils and compile it with the IDA SDK. Take a look at the MeP module as an example. You can reuse most of the non-generated code as-is (just change some strings and constants). If you run into any issues, feel free to contact me. I haven’t tested this on anything other than MeP because of laziness but I hope the code works more generally.

Downloads

The CGEN code is here and the Toshiba MeP module is here. The MeP module has basic stack tracking and call recognition added manually. When I have the time, I’ll port the rest of the CGEN supported modules that IDA does not support over.

On the future of Rejuvenate

Since, the announcement ten days ago, Rejuvenate received tons of positive reception and thousands of downloads. Progress on both SDK projects is moving at fast speeds. There are already Vita homebrew projects in the works. No doubt, there are more to come. However, Sony’s response has not been positive. Yesterday, Sony released firmware 3.52 which revokes access to PSM DevAssistant and PSM Unity DevAssistant along with a friendly request for PSM developers to delete the DevAssistant from their devices. This means that if you ever want to run homebrew on your Vita, regardless of your opinions on the current limitations and regardless of your ability to use PSM, do NOT update to 3.52.

CHIP-8 emulator by @xerpi, picture by @Chihab_rm

CHIP-8 emulator by @xerpi, picture by @Chihab_rm

Continue reading

Hacking the PS Vita

The following was taken from a series of unpublished posts I wrote back in September 2012 (almost three years ago). The posts not only detail the exploit I found but also the thought process that led me to it. I intended to publish it as soon as the exploit was patched by Sony or after someone found another exploit on the system by examining the memory dumps. However, as of today, the PSM privilege escalation is still the only known way to execute native ARM code on the PS Vita. Apologizes for the outdated references.

Pathfinding

To start, lets brainstorm the different ways we can attack this black box of a device. Typically, a new device is unlocked in a process that usually involves: 1) dumping the device’s RAM/ROM/NAND, 2) analyzing the dumps for information and vulnerabilities, 3) using the vulnerability to create a tool that allows others to easily gain root access.
Continue reading

Rejuvenate Public Beta Release

Rejuvenate, announced last week allows users to install unofficial applications and games (homebrew) onto their PS Vita device. Please read that announcement post for more information. Today, the public beta is ready for testing.

The beta is only for those who were able to obtain a publisher’s license (whose application was approved by Sony before the deadline on May 31). For the rest of you who do not have the publisher license (and no friends with a publisher license) but only the DevAssist app on your Vita, please wait for further instructions to come.

Continue reading

Credits to egarrote from elotrolado.net for the logo!

Rejuvenate: Native homebrew for PSVita

(Sadly, they did not give me a spot at the Sony E3 conference, so I have to make do with this blog post.) I am excited to announce Rejuvenate, a native homebrew platform for PS Vita. The tools that will be released through the next couple of weeks will allow developers (not in contract with Sony) to develop and test games, apps, and more on the PS Vita. These unofficial software can run on any PS Vita handheld device without approval by Sony. These tools cannot enable pirated or backup games to run (I’m not just saying this… the exploits used does not have enough privilege to enable such tasks). Rejuvenate requires PlayStation Mobile Development Assistant to be installed on your Vita! Sony will remove this from PSN soon, so if you wish to ever run homebrew apps on your PS Vita, you must download this app now!

Continue reading

Calling all coders: We need you to help create an open Vita SDK!

One of largest barrier to native PS Vita homebrew is the lack of an open toolchain and SDK. Essentially, we need something like pspsdk for the Vita. The reason why we don’t have it is because there are people who have an understanding of how the Vita’s executable format works but lack the time to code up the tools and there are people who have the time and ability to create such tools but lack the knowledge of Vita’s internals. The solution, I believe is to publish a comprehensive document detailing how the Vita’s executable format is laid out and the requirements for an open toolchain. Anyone with coding skills can now work on an open SDK; no Vita knowledge required! Continue reading