Cosmo3DS: The CFW nobody wanted

In the last article, I talked about my plan for creating 3DS mods. Now, I will put that plan to the test with a CFW (modified firmware) that nobody wants except me.

The idea for this CFW is that I want

  • Keep my 3DS on the hackable 9.2 firmware but still use the latest system software (emuNAND)
  • Play games region free right from the home menu
  • Change the system region without possibly bricking the device
  • Use the eShop with region changed systems

and only those things. Specifically, I don’t want to patch signatures that allow for installing pirated content (personal choice, I don’t care what you do). I don’t want threads running in the background to support fancy features such as replacing version strings or taking screenshots or in-game menus. So I created my own CFW in two components: a stripped down version of ReiNAND where everything except emuNAND is removed and an implementation of my custom loader which does all the patches. No threads. No glut.

I understand that I’m the only one who needs such a CFW and most people can make do with one of the ones out there. This is mostly written just for my own use and for me to test out my idea of patching the system. However, I’m happy that my N3DS is now a cosmopolitan.

If you want to use it, you need the CFW as well as the custom loader. Then, the important part is injecting the custom loader into the right FIRM file. (If you search for “reinand 3.1 firmware.bin pastebin” you should be able to find it)

3DS Code Injection through “Loader”

I’ve seen many CFWs (custom firmware; actually they’re just modified firmwares) for the 3DS but there seems to be a lack of organization and design in most of them. I believe that without a proper framework for patching the system, writing mods for the 3DS is extremely difficult and usually requires an in depth knowledge of the system even to make simple modifications. So here I present a plan that I hope developers will pick up on and contribute to.

3DS System Firmware: An Introduction 

The 3DS has a microkernel architecture. This means most functionalities are implemented as modules (running as separate processes) and IPC is used by modules to talk to one another. The kernel has limited functionality (basic memory management, threading and synchronization, and access to IO memory) and these basic functions are exposed to modules (where the main work is done). All modules run in userland, but the security is not equal. There are three categories of modules: system, applets, and applications. (This is not an official categorization, as different modules have permission settings based on service-call access, access to other modules, memory space, and so on. However, you can basically group modules by these security properties into these three groups.) System modules do the backend work: installing content, TLS stack, communication with Nintendo’s servers, etc. Applets are system software the user “sees”: home menu, notifications, Nintendo ID login, keyboard, etc. Applications are the rest: settings app, camera app, games, etc.

The reason why it’s hard to write patches or mods for the Nintendo 3DS is because of this modularization of the system. When we take control of the kernel, patching system features is not as simple as finding the right address in memory and overwriting it. Because everything is in different processes (and each have their own virtual memory space), you either have to recognize the kernel layout and parse kernel objects and map the right memory addresses to patch (NTR does this) or you have to stupidly scan all of physical memory (slow and high risk of false positives) for the right thing to patch. There’s additional complications if you’re using a hack that reboots the firmware or requires patching a process that’s currently not running: you have to know when the process launches and modify the code when it does. It is possible to insert hooks when you first gain control in order to run code when processes are launched. However, most of the CFWs out there does something insanely non-practical: it busy-loops in a separate thread and scans all of physical memory for the right thing to patch… ad infinitum.

So we want to do the following

  • Patch code for any module (system, applet, or application) we want easily
  • Ideally we can insert the hook early in the boot process so we can patch most modules BEFORE they start (an example would be to change the region of the system: once the CFG module is loaded, every other module gets the region information from CFG; so we can either patch CFG or patch every other module that uses it)
  • Everything should be fast and use minimal resources (linear scan of physical memory is NOT okay)
  • Ideally the code should not be overly dependent on offsets–you should not have to compile the code separately for each firmware version

3ds-bootLuckily there is a place in the boot-chain we can patch that does just that. The “loader” module is in charge of parsing the CXI executable format and loading the code into memory. That means that if we change “loader” to patch the code after it is read from the CXI file (and decompressed) but before it signals to the kernel to run (schedule) it, we can patch most modules in the system. The kernel bootstraps the five main modules: fs (file IO), loader (CXI loader), pm (process manager), sm (IPC), and pxi (talks to security processor) and then pm uses loader to start the rest of the modules. Other modules can start processes using pm, which in turn uses loader.

So loader is a good place to start if we want flexibility in patching the system and it fits the bill for what we require in loading patches, but there are some complications. First, the loader module does not have permission to access the SD card (where we want to use to store patches). Second, and more importantly, the code to read and parse patches from the SD card and run them when a process is being loaded seems like a good amount of code. If we wish to insert it into the loader module, we might have to move stuff around–a big pain with object code. So we need a better plan than just hijacking “loader.”

[2016/01/19 04:15:01] <shinyquag> You could probably code your own “loader”, I’d wager it’s probably doable

Shinyquag would have won the wager because, luckily, loader is also the smallest module on the system. It has a couple of commands and no complicated logic. It’s small enough (20KB) to completely reverse in a day. Once we rewrite the module, adding functionality is a trivial matter. In fact, since Nintendo uses a lot of boilerplate code so our loader implementation would actually be even more lean. Finally, since this module has not changed much since 2.x, we can likely use it on future firmwares without much changes. However, just because it is doable, does not mean it will be easy…

Debugging without a Debugger

First I reversed the module and rewrote it in C. Then it got difficult. Since this was the first time anyone tried to build a sysmodule with the homebrew development environment, I had to make changes to many parts of the toolchain. Then there’s the problem of testing: we’re replacing a key component of the system very early in the boot process. There’s no iterative development since we cannot isolate the module without ending up rewriting the entire operating system. So, I just wrote the entire module in one sitting and hoped for the best. Unfortunately the best is a black screen and we don’t have access to anything like gdb or even “printf”. The only output we have is that the console boots or it does not boot. After re-reading my code over and over again and fixing bugs each time until I can’t spot anything else obvious, I’ve discovered, thanks to #3dsdev, that prayer was not the only means of debugging this.

Enter XDS, the 3DS emulator we all forgot about. ichfly managed to get it to boot the 3DS home menu a while ago, but development has since ceased (likely because of other more promising emulators in the works). There’s one thing XDS does that the other 3DS emulators does not do is that XDS fully emulates the 5 sysmodules bootstrapped by the kernel (the other guys use HLE). This is perfect because we only need to run “loader” in the boot process and let it interact with the other bootstrapped modules until enough modules are loaded by “loader” that we can consider it working. After porting XDS to OSX, I was able to make use of the generous debug logs (comparing the IPC requests/responses from my loader with stock) to find the final bugs in my loader and get it to boot on the 3DS.

Our loader can now correctly duplicate the functionality of the original but ultimately, we wish to add more features. The most important one is SD card reading. The way that access permissions work is that each module specifies in its CXI header what filesystem devices it has access to. However, in the case of bootstrapped modules, the access fields are ignored. So in order to get “loader” the right permissions to read the SD card, it is not enough to modify the CXI headers. We need to talk with the “fs” module and request the permissions directly (this is what “pm” does with other modules after loader puts the code into memory). This involves reversing enough of the “fs” module to figure out how the permissions are stored and then making the right requests to “fs” to give “loader” those permissions. Since XDS does not emulate the SD card, I had to find a different avenue to test my code. What I decided to do was modify the URL to the Themes Store in the Home Menu to point to my server + the return value of the file IO calls. Then I would load up the Themes Store to find the return value and look through my reversed object code to see what caused the error. This worked well because I did not want to add more code (e.g: framebuffer or networking) just for debugging. (Who debugs the debugger?) With SD card access, future debugging is much easier because we can just write logs to the SD card.

Loader Based CFWs

I believe that using a custom “loader” will make it much easier to write mods for the 3DS. We could see hacks such as a “Homebrew” button in the Home Menu or custom keyboards or custom themes outside of what Nintendo officially supports. We might also see hacks for games similar to HANS but without requiring access to a dump of the game. I hope 3DS developers will pick up on this and make cool mods and hacks for the system.

The code can be found here. Developers should modify patcher.c and replace it with their own implementation.

CGEN for IDA Pro

It all started when I wanted to analyze some MeP code. Usually, I do all my disassembly in IDA Pro, but this is one of the few processors that isn’t supported by IDA. Luckily, there is objdump for this obscure architecture. After fumbling around for a bit, I was convinced that porting the disassembler to IDA would be a much better use of my time than manually drawing arrows and annotating the objdump output.

Process

Turns out, there isn’t too many resources online on writing IDA processor modules. The readme of the SDK was minimal (it tells you to read the sample code and the header files) and refers to two documents: an online guide that is now gone and The IDA Pro Book by Chris Eagle. Opening the book to the chapter on writing processor modules, you’re greeted with a dire warning to turn back (along with a note about the lack of documentation) as many have tried and failed.

One of the reasons why writing a processor module is so challenging is that the processor_t struct contains 56 fields that must be initialized, and 26 of those fields are function pointers, while 1 of the fields is a pointer to an array of one or more struct pointers that each point to a different type of struct (asm_t) that contains 59 fields requiring initialization. Easy enough, right?

Chris Eagle, The IDA Book 2nd Edition

Well, I’m not easily discouraged, so I read on and familiarized myself with the process of creating a processor module. I’m not going to describe the process here in detail because Chris does a great job in the book, but I’ll give a brief outline

IDA Processor Module

There are four components of a processor module. The “analyzer” parses the raw bits of the machine code and generates the information about an instruction. The “emulator” uses that information to help IDA with further analysis. For example, if an instruction references data, your module can tell IDA to look for data at that address. If the instruction performs a function call, your module can tell IDA to create a function there. Contrary to its name, it does not actually “emulate” the instruction set. The “outputter” does just that: given the data generated from the analyzer, it prints out the disassembly to the user. Finally, there’s the architectural information, which is not a component mentioned elsewhere but I consider it one. This is not code, but static structures that tell IDA information such as the names of the registers, the mnemonic of the instructions, alignments and so on.

CGEN

The binutils (objdump) for MeP are machine generated by CGEN. CGEN attempts to abstract away the task of writing CPU tools (assemblers, disassemblers, simulators, etc) into writing CPU definitions. The definitions describe the CPU (including the hardware elements, the instruction sets, operands, etc) with the Scheme language. CGEN takes the definition and outputs C/C++ code for all the needed CPU tools. Originally I wanted to avoid CGEN and just wrap the (generated) binutils code to an IDA module (à la Hexagon). In theory, your module does not have to follow the convention laid out above. You can make the analyzer record the raw bits, the emulator do nothing, and the outputter use binutils to generate a complete line and print it out. However, in doing so, you essentially lose most of the power of IDA (finding xrefs, stack variables, etc). It would also be a shame to not use all the information given to us by the CGEN CPU definitions. These definitions (in theory) are strong enough to generate RTL code to implement the processor, so we would like to give as much of this information to IDA as possible.

CGEN Generators

The generators themselves (CGEN docs refer to them as “applications”) are also written in Scheme, in the Guile dialect. I have never written a line of functional code before, so it took me a day to understand the relatively small codebase. CGEN has its own object system that they call COS. Everything defined in the CPU descriptions becomes objects, and each generator gives these objects methods to print themselves out. For example: the simulator might give the operand object a “generate code to get value” method. Then a call to generate the semantics of an instruction into C code would use these objects’ methods. Like a true software engineer, I cut out functions from the generators for the simulator, disassembler, and architecture description and stitched them together with my own code to generate various components of an IDA module.

The analyzer used, as its base, the simulator instruction decode generator. I had to modify CGEN to record the order of operands as specified by the instruction syntax (the only modification to CGEN itself, everything else are additions). Then, I overwrote the simulator’s method definitions for extracting the operands from the instruction with code to populate the “cmd” structure in IDA (which requires the operands to be ordered).

The emulator used the simulator model generator as its base and was the most difficult to write (in terms of code complexity). The one major issue here is that while the generated simulator expects code to run in order and have state information stored, the IDA emulator does not store state information and IDA does not guarantee that your emulator will be called in the order that the instructions appear. That means we cannot make any assumptions about the state and our emulator can only make decisions based on the instruction alone. Because we only care about finding data and code references with the emulator, we can make the following simplifications:

  • Any conditionals will have the condition stripped out and all paths will be taken
  • Using any values from registers will stop the emulator and return immediately
  • Setting any values to registers will evaluate the value but discard the result

The first point allows for xrefs to be found regardless of the condition. A conditional branch, for example, will allow for a code xref to be generated. The second point is there because we do not know the state, so any dependencies on a register value that’s not already stripped out will make finding a xref difficult. In theory, we can still find offset xrefs this way, but we would have to know that only additions/subtractions are used and only a single register is used and that adds a lot of complexity. The third point allows memory reads to be captured. With those simplifications in place, we know that any memory reads/writes and any PC reads/writes that are found without knowing the state can be turned into xrefs.

The outputter used the syntax parser (binutils’ opcode builder) as its base. It reads the instruction definition in order to output the right orderings of parenthesis, commas, and so on. I just replaced the generate print methods of the hardware objects to generate calls to IDA output functions.

Results

MeP executable loaded and recognized by IDA Pro

MeP executable loaded and recognized by IDA Pro. All the blue is a result of the auto-analysis.

At the most basic level, the generated modules outputs what you would expect from objdump. The analyzer will find the right type for the operands (if possible). The emulator tries to find all constant addresses and adds xrefs to them (code and data). The outputter will print all instructions correctly, and the operands with the right type/size/name if needed.

The main thing it doesn’t do right now is keeping track of the stack pointer. It also does not identify if branches are jumps or calls (requiring CF_CALL flag). It does not identify if an instruction does not continue flow (requiring the CF_STOP flag) (it’s actually trivial to add this, but harder to add it without introducing emulator code to other generators. Since it’s easy to identify the instructions by hand, I decided to leave it out).

Usage

Once you generate the IDA module components, you still need to manually write the processor_t structure, the notify() function (optional), and implement and special print functions (as defined by the CPU definition). Then you can copy the CGEN headers from binutils and compile it with the IDA SDK. Take a look at the MeP module as an example. You can reuse most of the non-generated code as-is (just change some strings and constants). If you run into any issues, feel free to contact me. I haven’t tested this on anything other than MeP because of laziness but I hope the code works more generally.

Downloads

The CGEN code is here and the Toshiba MeP module is here. The MeP module has basic stack tracking and call recognition added manually. When I have the time, I’ll port the rest of the CGEN supported modules that IDA does not support over.

Opening Up CARDBOARD: Crafting an American New 3DS (non-XL)

Last time, I analyzed now update checks worked on the 3DS. That was a straightforward process. CARDBOARD (known colloquially as “System Transfer”) is a bundle of complexity with no less than three separate servers communicating with each other as well as the device. A custom proprietary protocol is used for 3DS to 3DS communication. Finally, we have multiple unique identifiers the console uses to identify itself with Nintendo (serial, certificates, console id, account id, etc). I can’t imagine this will be comprehensive, but I hope that whoever is reading can gain new insight on the complexity of the 3DS ecosystem. Continue reading

Reversing Gateway Ultra Stage 3: Owning ARM9 Kernel

First, some background: the 3DS has two main processors. Last time, I went over how Gateway Ultra exploited the ARM11 processor. However, most of the interesting (from a security perspective) functionalities are handled by a separate ARM946 processor. The ARM9 processor is in charge of the initial system bootup, some system services, and most importantly all the cryptographic functions such as encryption/decryption and signature/verification. In this post, we will look at how to run (privileged) code on the ARM9 processor with privileged access to the ARM11 processor. Please note that this writeup is a work in progress as I have not completely figured out how the exploit works (only the main parts of it). Specifically there are a couple of things that I do not know if it is done for the sake of the exploit or if it is done purely for stability or obfuscation. From a developer’s perspective, it doesn’t matter because as long as you perform all the steps, you will achieve code execution. But from a hacker’s perspective, the information is not complete unless all aspects are known and understood. I am posting this now as-is because I do not know when I’ll have time to work on the 3DS again. However, when I do, I will update the post and hopefully clear up all confusion. Continue reading

Reversing Gateway Ultra Stage 2: Owning ARM11 Kernel

It’s been a couple of days since my initial analysis of Gateway Ultra, released last week to enable piracy on 3DS. I spent most of this time catching up on the internals of the 3DS. I can’t thank the maintainers of 3dbrew enough (especially yellows8, the master of 3DS reversing) for the amount of detailed and technical knowledge found on the wiki. The first stage was a warmup and did not require any specific 3DS knowledge to reverse. The problem with the second stage is that while it is easy to see the exploit triggered and code to run, the actual exploit itself was not as clear. I looked at all the function calls made and made a couple of hypothesis of where the vulnerability resided, and reversed each function to the end to test my hypothesis. Although there was many dead ends and false leads, the process of reversing all these functions solidified my understanding of the system. Continue reading

Reversing Gateway Ultra First Stage (Part 2)

When we last left off, we looked at the ROP code that loaded a larger second-part of the payload. Now we will walk through what was loaded and how userland native code execution was achieved. I am still an amateur at 3DS hacking so I am sure to get some things wrong, so please post any corrections you have in the comments and I will update the post as needed. Continue reading

Reversing Gateway Ultra First Stage (Part 1)

And now for something completely different…

As a break from Vita hacking, I’ve decided to play around with the Nintendo 3DS exploit released by Gateway yesterday. The 3DS is a much easier console to hack, but unfortunately, the scene is dominated by a piracy company who, ironically, implement various “features” to protect their intellectual property (one such feature purposely bricks any user of a cloned piracy cart–and also “legitimate” users too). Ethics aside, it would be useful to reverse Gateway’s exploits and use them for homebrew loading so I took a quick look at it. The first stage of the exploit is an entry-point into the system that allows code to run in the unprivileged user-mode. It is usually used to exploit a kernel vulnerability, which is the second stage. In the unique case of Gateway, the first stage is broken up into two parts (in order for them to obfuscate their payload). I am only going to look at the first part for now. Continue reading

Unlimited Backgrounding on iOS

Since iOS4, developers have the ability to perform background tasks with some limitations. Background tasks must fit one of the five different categories for background supported apps. Music and streaming apps can be backgrounded as long as they play music. Newsstand apps can wake once a day to download updates. Location aware apps can wake up once in a while to update their position. VOIP apps can have one socket (I found out the hard way that the one socket does not include listener sockets) connected in the background. General apps can request up to 10 minutes to finish some task. While this is enough for most backgrounding purposes, sometimes we need backgrounding for more advanced tasks. Specifically, I wanted to write a HTTP proxy server that runs on the device (in the future, this proxy server will work as an ad-blocking proxy) in the background. I will show you the steps of making this work. Please note that Apple will certainly reject any app that abuses their backgrounding policy so doing so would only be useful for personal and enterprise uses. Continue reading

libVitaMTP & OpenCMA: Vita content management on Linux (and more)

More than a year ago, I’ve analyzed how the Vita communicates with the computer. I mentioned at the end that I started a project that will be an open source implementation of the protocol that the Vita uses. This protocol is just MTP (media transfer protocol) with some additional commands that I had to figure out. MTP is used by most Windows supported media players and cameras, so I was able to use a lot of existing code from libmtp and gphoto2. After lots of on and off work, I am happy to announce the first (beta) version of libVitaMTP and OpenCMA. Continue reading