CGEN for IDA Pro

It all started when I wanted to analyze some MeP code. Usually, I do all my disassembly in IDA Pro, but this is one of the few processors that isn’t supported by IDA. Luckily, there is objdump for this obscure architecture. After fumbling around for a bit, I was convinced that porting the disassembler to IDA would be a much better use of my time than manually drawing arrows and annotating the objdump output.

Process

Turns out, there isn’t too many resources online on writing IDA processor modules. The readme of the SDK was minimal (it tells you to read the sample code and the header files) and refers to two documents: an online guide that is now gone and The IDA Pro Book by Chris Eagle. Opening the book to the chapter on writing processor modules, you’re greeted with a dire warning to turn back (along with a note about the lack of documentation) as many have tried and failed.

One of the reasons why writing a processor module is so challenging is that the processor_t struct contains 56 fields that must be initialized, and 26 of those fields are function pointers, while 1 of the fields is a pointer to an array of one or more struct pointers that each point to a different type of struct (asm_t) that contains 59 fields requiring initialization. Easy enough, right?

Chris Eagle, The IDA Book 2nd Edition

Well, I’m not easily discouraged, so I read on and familiarized myself with the process of creating a processor module. I’m not going to describe the process here in detail because Chris does a great job in the book, but I’ll give a brief outline

IDA Processor Module

There are four components of a processor module. The “analyzer” parses the raw bits of the machine code and generates the information about an instruction. The “emulator” uses that information to help IDA with further analysis. For example, if an instruction references data, your module can tell IDA to look for data at that address. If the instruction performs a function call, your module can tell IDA to create a function there. Contrary to its name, it does not actually “emulate” the instruction set. The “outputter” does just that: given the data generated from the analyzer, it prints out the disassembly to the user. Finally, there’s the architectural information, which is not a component mentioned elsewhere but I consider it one. This is not code, but static structures that tell IDA information such as the names of the registers, the mnemonic of the instructions, alignments and so on.

CGEN

The binutils (objdump) for MeP are machine generated by CGEN. CGEN attempts to abstract away the task of writing CPU tools (assemblers, disassemblers, simulators, etc) into writing CPU definitions. The definitions describe the CPU (including the hardware elements, the instruction sets, operands, etc) with the Scheme language. CGEN takes the definition and outputs C/C++ code for all the needed CPU tools. Originally I wanted to avoid CGEN and just wrap the (generated) binutils code to an IDA module (à la Hexagon). In theory, your module does not have to follow the convention laid out above. You can make the analyzer record the raw bits, the emulator do nothing, and the outputter use binutils to generate a complete line and print it out. However, in doing so, you essentially lose most of the power of IDA (finding xrefs, stack variables, etc). It would also be a shame to not use all the information given to us by the CGEN CPU definitions. These definitions (in theory) are strong enough to generate RTL code to implement the processor, so we would like to give as much of this information to IDA as possible.

CGEN Generators

The generators themselves (CGEN docs refer to them as “applications”) are also written in Scheme, in the Guile dialect. I have never written a line of functional code before, so it took me a day to understand the relatively small codebase. CGEN has its own object system that they call COS. Everything defined in the CPU descriptions becomes objects, and each generator gives these objects methods to print themselves out. For example: the simulator might give the operand object a “generate code to get value” method. Then a call to generate the semantics of an instruction into C code would use these objects’ methods. Like a true software engineer, I cut out functions from the generators for the simulator, disassembler, and architecture description and stitched them together with my own code to generate various components of an IDA module.

The analyzer used, as its base, the simulator instruction decode generator. I had to modify CGEN to record the order of operands as specified by the instruction syntax (the only modification to CGEN itself, everything else are additions). Then, I overwrote the simulator’s method definitions for extracting the operands from the instruction with code to populate the “cmd” structure in IDA (which requires the operands to be ordered).

The emulator used the simulator model generator as its base and was the most difficult to write (in terms of code complexity). The one major issue here is that while the generated simulator expects code to run in order and have state information stored, the IDA emulator does not store state information and IDA does not guarantee that your emulator will be called in the order that the instructions appear. That means we cannot make any assumptions about the state and our emulator can only make decisions based on the instruction alone. Because we only care about finding data and code references with the emulator, we can make the following simplifications:

  • Any conditionals will have the condition stripped out and all paths will be taken
  • Using any values from registers will stop the emulator and return immediately
  • Setting any values to registers will evaluate the value but discard the result

The first point allows for xrefs to be found regardless of the condition. A conditional branch, for example, will allow for a code xref to be generated. The second point is there because we do not know the state, so any dependencies on a register value that’s not already stripped out will make finding a xref difficult. In theory, we can still find offset xrefs this way, but we would have to know that only additions/subtractions are used and only a single register is used and that adds a lot of complexity. The third point allows memory reads to be captured. With those simplifications in place, we know that any memory reads/writes and any PC reads/writes that are found without knowing the state can be turned into xrefs.

The outputter used the syntax parser (binutils’ opcode builder) as its base. It reads the instruction definition in order to output the right orderings of parenthesis, commas, and so on. I just replaced the generate print methods of the hardware objects to generate calls to IDA output functions.

Results

MeP executable loaded and recognized by IDA Pro

MeP executable loaded and recognized by IDA Pro. All the blue is a result of the auto-analysis.

At the most basic level, the generated modules outputs what you would expect from objdump. The analyzer will find the right type for the operands (if possible). The emulator tries to find all constant addresses and adds xrefs to them (code and data). The outputter will print all instructions correctly, and the operands with the right type/size/name if needed.

The main thing it doesn’t do right now is keeping track of the stack pointer. It also does not identify if branches are jumps or calls (requiring CF_CALL flag). It does not identify if an instruction does not continue flow (requiring the CF_STOP flag) (it’s actually trivial to add this, but harder to add it without introducing emulator code to other generators. Since it’s easy to identify the instructions by hand, I decided to leave it out).

Usage

Once you generate the IDA module components, you still need to manually write the processor_t structure, the notify() function (optional), and implement and special print functions (as defined by the CPU definition). Then you can copy the CGEN headers from binutils and compile it with the IDA SDK. Take a look at the MeP module as an example. You can reuse most of the non-generated code as-is (just change some strings and constants). If you run into any issues, feel free to contact me. I haven’t tested this on anything other than MeP because of laziness but I hope the code works more generally.

Downloads

The CGEN code is here and the Toshiba MeP module is here. The MeP module has basic stack tracking and call recognition added manually. When I have the time, I’ll port the rest of the CGEN supported modules that IDA does not support over.

Opening Up CARDBOARD: Crafting an American New 3DS (non-XL)

Last time, I analyzed now update checks worked on the 3DS. That was a straightforward process. CARDBOARD (known colloquially as “System Transfer”) is a bundle of complexity with no less than three separate servers communicating with each other as well as the device. A custom proprietary protocol is used for 3DS to 3DS communication. Finally, we have multiple unique identifiers the console uses to identify itself with Nintendo (serial, certificates, console id, account id, etc). I can’t imagine this will be comprehensive, but I hope that whoever is reading can gain new insight on the complexity of the 3DS ecosystem. Continue reading

Reversing Gateway Ultra Stage 3: Owning ARM9 Kernel

First, some background: the 3DS has two main processors. Last time, I went over how Gateway Ultra exploited the ARM11 processor. However, most of the interesting (from a security perspective) functionalities are handled by a separate ARM946 processor. The ARM9 processor is in charge of the initial system bootup, some system services, and most importantly all the cryptographic functions such as encryption/decryption and signature/verification. In this post, we will look at how to run (privileged) code on the ARM9 processor with privileged access to the ARM11 processor. Please note that this writeup is a work in progress as I have not completely figured out how the exploit works (only the main parts of it). Specifically there are a couple of things that I do not know if it is done for the sake of the exploit or if it is done purely for stability or obfuscation. From a developer’s perspective, it doesn’t matter because as long as you perform all the steps, you will achieve code execution. But from a hacker’s perspective, the information is not complete unless all aspects are known and understood. I am posting this now as-is because I do not know when I’ll have time to work on the 3DS again. However, when I do, I will update the post and hopefully clear up all confusion. Continue reading

Reversing Gateway Ultra Stage 2: Owning ARM11 Kernel

It’s been a couple of days since my initial analysis of Gateway Ultra, released last week to enable piracy on 3DS. I spent most of this time catching up on the internals of the 3DS. I can’t thank the maintainers of 3dbrew enough (especially yellows8, the master of 3DS reversing) for the amount of detailed and technical knowledge found on the wiki. The first stage was a warmup and did not require any specific 3DS knowledge to reverse. The problem with the second stage is that while it is easy to see the exploit triggered and code to run, the actual exploit itself was not as clear. I looked at all the function calls made and made a couple of hypothesis of where the vulnerability resided, and reversed each function to the end to test my hypothesis. Although there was many dead ends and false leads, the process of reversing all these functions solidified my understanding of the system. Continue reading

Reversing Gateway Ultra First Stage (Part 2)

When we last left off, we looked at the ROP code that loaded a larger second-part of the payload. Now we will walk through what was loaded and how userland native code execution was achieved. I am still an amateur at 3DS hacking so I am sure to get some things wrong, so please post any corrections you have in the comments and I will update the post as needed. Continue reading

Reversing Gateway Ultra First Stage (Part 1)

And now for something completely different…

As a break from Vita hacking, I’ve decided to play around with the Nintendo 3DS exploit released by Gateway yesterday. The 3DS is a much easier console to hack, but unfortunately, the scene is dominated by a piracy company who, ironically, implement various “features” to protect their intellectual property (one such feature purposely bricks any user of a cloned piracy cart–and also “legitimate” users too). Ethics aside, it would be useful to reverse Gateway’s exploits and use them for homebrew loading so I took a quick look at it. The first stage of the exploit is an entry-point into the system that allows code to run in the unprivileged user-mode. It is usually used to exploit a kernel vulnerability, which is the second stage. In the unique case of Gateway, the first stage is broken up into two parts (in order for them to obfuscate their payload). I am only going to look at the first part for now. Continue reading

Unlimited Backgrounding on iOS

Since iOS4, developers have the ability to perform background tasks with some limitations. Background tasks must fit one of the five different categories for background supported apps. Music and streaming apps can be backgrounded as long as they play music. Newsstand apps can wake once a day to download updates. Location aware apps can wake up once in a while to update their position. VOIP apps can have one socket (I found out the hard way that the one socket does not include listener sockets) connected in the background. General apps can request up to 10 minutes to finish some task. While this is enough for most backgrounding purposes, sometimes we need backgrounding for more advanced tasks. Specifically, I wanted to write a HTTP proxy server that runs on the device (in the future, this proxy server will work as an ad-blocking proxy) in the background. I will show you the steps of making this work. Please note that Apple will certainly reject any app that abuses their backgrounding policy so doing so would only be useful for personal and enterprise uses. Continue reading