Published

Wed 09 February 2022


Zynq Part 4: CVE-2021-44850

Introduction

Also known via DAAR-76964.

I hadn't intended to revisit this, but somewhere between idly clicking through my IDB and looking at the ZU+, I ended up finding another ROM flaw in the Zynq-7000. This time, no complex external hardware is required - all you need is a target that can boot via SD card...and of course the ability to write to that SD card. This also means that anything else granting raw SD card or boot partition writes can trivially be turned into persistence. The rest of the impact is the same as before: secure boot/bitstream bypass and disclosure.

TL;DR // Executive Summary

  • Zynq-7000 secure boot continues to be insufficient to protect bitstreams or boot integrity.
  • The attacker only needs to place a crafted BOOT.BIN image on the FAT partition from which the Zynq boots. This can certainly serve as a persistence technique.
  • Impact is effectively the same as the previous bug, but with a much lower bar for exploitation and no prerequisite of physical modification.

Bug

The bug here is, like many, an intersection of a few factors:

The Zynq family's boot images all contain something Xilinx refers to as "Register Initialization List(s)". These are just a series of (address, data) tuples ostensibly used to initialize hardware (clocks, muxes, power) before the bootrom moves into any heavy lifting. I guess it's primarily a performance thing?
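
To make the (address, data) layout concrete, here is a minimal sketch of serializing such a list. It assumes 32-bit little-endian pairs, a 256-entry table, and an all-ones address as the terminator; those three details are assumptions taken from memory of the TRM rather than from this post, and the function names are made up for illustration.

```python
import struct

RIL_TERMINATOR = 0xFFFFFFFF  # assumption: an all-ones address ends the list
RIL_MAX_PAIRS = 256          # assumption: capacity of the header's init table


def pack_register_init_list(pairs):
    """Serialize (address, data) tuples as little-endian 32-bit word pairs."""
    if len(pairs) > RIL_MAX_PAIRS:
        raise ValueError("too many register init entries")
    blob = b"".join(struct.pack("<II", addr, data) for addr, data in pairs)
    # Fill the unused slots with terminator entries.
    unused = RIL_MAX_PAIRS - len(pairs)
    return blob + struct.pack("<II", RIL_TERMINATOR, 0) * unused


def unpack_register_init_list(blob):
    """Recover the (address, data) tuples up to the first terminator."""
    pairs = []
    for addr, data in struct.iter_unpack("<II", blob):
        if addr == RIL_TERMINATOR:
            break
        pairs.append((addr, data))
    return pairs
```

The data values fed to this are entirely up to the image author, which is why the ROM's address filtering matters so much.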

In the Zynq ROM, these are covered by secure boot signatures. However, that signature is checked long after they're parsed and used - so any issue here is fair game! The same goes for the field in the bootrom header which indicates secure/non-secure boot.

The only constraint is a whitelist of addresses - This list is further reduced when the header is marked as part of a secure boot image. Another issue: nothing prevents us from providing a locked Zynq with an insecure image and reaching the larger "insecure" whitelist of registers. It just (ostensibly) won't boot once the signatures are checked.
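
The ordering problem can be sketched as a toy model. Everything here is hypothetical: the whitelist ranges are made-up placeholders (not the ROM's real lists), the header is a plain dict, and skipping a blocked address is an assumption (the real ROM may abort instead). The only point is that the writes land before the signature verdict exists, and that the header itself picks the whitelist.

```python
# Placeholder ranges only; the real ROM whitelists are not reproduced here.
SECURE_WHITELIST = [(0x1000, 0x1010)]
INSECURE_WHITELIST = SECURE_WHITELIST + [(0x2000, 0x2100)]


def allowed(addr, whitelist):
    return any(lo <= addr < hi for lo, hi in whitelist)


def boot(header, device_locked, log):
    # 1. The whitelist is selected by an unauthenticated header field.
    whitelist = SECURE_WHITELIST if header["secure"] else INSECURE_WHITELIST
    # 2. Register writes are applied immediately.
    for addr, data in header["ril"]:
        if allowed(addr, whitelist):
            log.append(("write", addr, data))
    # 3. Only now does the authenticity check happen.
    if device_locked:
        ok = bool(header["secure"] and header["signature_valid"])
    else:
        ok = True
    log.append(("verify", ok))
    return ok
```

Feeding a locked device an "insecure" header in this model still produces a write entry ahead of the failed verify entry, which is the whole problem.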

Xilinx did a decent job of constraining these lists: most importantly, no DMA destination/source addresses are allowed. There apparently was an issue in pre-release silicon that resulted in insufficient protection here, so I wrote a little script to test the released version. The details of that are a little out of scope here, but in short I used Unicorn's Python bindings to emulate the ROM, hooking some functions and some memory regions as required to progress the boot flow. The results look a little like this:

emulated bootrom panicking (invalid source offset field)

Back to register init lists (RIL, from now on). I modified this emulator to just run arbitrary addresses (or ranges of addresses) through do_register_init_lists() (0x05BC0) - this allowed me to quickly test any suspicions against something close to ground truth.

In a modern (ish) SoC like Zynq, DMA is not all unified: The SDIO controller has its own SDMA (Simple? DMA) controller and sure enough: The DMA destination address is blocked for writing. Interestingly, the rest of the SDIO peripheral is not...

emulated bootrom showing blocked DMA address

The controller also contains an ADMA (Advanced?) subsystem, which instead uses a table of descriptors in memory. This is used in the ZU+ bootrom, which at least mitigates this at face value.

My initial thought was that I could just stuff a new value for the DMA block size into the registers, or something that persists...But the ROM resets all of the interesting SDIO config registers each time it goes to send a command:

ROM:000096D4 ; int __fastcall sdio_cmd(int r0_command, int r1_argument, _DWORD *r2_out)
ROM:000096D4 CMP             R2, #0
ROM:000096D8 PUSH            {R4}
ROM:000096DC MOVNE           R3, #0
ROM:000096E0 MOV             R4, #0
ROM:000096E4 STRNE           R3, [R2]
ROM:000096E8 MOVT            R4, #0xE010       ; R4 = 0xe0100000
ROM:000096EC loc_96EC:
ROM:000096EC DMB             SY
ROM:000096F0 LDR             R12, [R4,#0x24]
ROM:000096F4 MOV             R3, #0
ROM:000096F8 TST             R12, #3
ROM:000096FC MOVT            R3, #0xE010       ; R3 = 0xE0100000
ROM:00009700 BNE             loc_96EC          ; block for pending operations
ROM:00009704 MOV             R12, #0xFFFFFFFF
ROM:00009708 STR             R12, [R3,#0x30]   ; clear pending interrupts
ROM:0000970C DMB             SY
ROM:00009710 MOV             R12, #sdio_buf
ROM:00009718 STR             R12, [R3]         ; set DMA destination pointer
ROM:0000971C DMB             SY
ROM:00009720 MOV             R12, #0x200
ROM:00009724 STRH            R12, [R3,#4]      ; transfer sz: 512B. SDMA_buffer_size: 4KiB
ROM:00009728 DMB             SY
ROM:0000972C MOV             R12, #1
ROM:00009730 STRH            R12, [R3,#6]      ; ask for 1 block
ROM:00009734 DMB             SY
ROM:00009738 MOV             R12, #0xA
ROM:0000973C STRB            R12, [R3,#0x2E]   ; clear resets, set timeout
ROM:00009740 DMB             SY
ROM:00009744 STR             R1, [R3,#8]       ; set 32 bit argument from r1_argument
ROM:00009748 DMB             SY
ROM:0000974C MOV             R1, #0x13
ROM:00009750 STRH            R1, [R3,#0xC]     ; DMA Enable, Block Count Enable, DAT used for reads
; proceeds to a big jumptable and other misc housekeeping before firing off the command

Notably, even though it blocks on the transaction completion...It doesn't clear out the DMA base address register. So assuming any SDIO command has been sent (and this is a gimme: we're booting from the SD card!), there will be a pointer to the ROM's 512 byte page buffer! Actually: pointing to the end of it - the DMA hardware updates this address after a transaction, which is even better!
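
A toy model of that behaviour, assuming the address register simply advances by the number of bytes transferred. The class and method names are made up, and the sdio_buf address is inferred from the 0x07021c figure quoted later in the post (0x07021c minus 512).

```python
SDIO_BUF = 0x7001C   # inferred: 0x07021c - 512, not stated directly in the post
BLOCK_SIZE = 0x200


class SdmaModel:
    """Tiny stand-in for the SDMA engine: one address register plus RAM."""

    def __init__(self):
        self.sdma_addr = 0
        self.ram = {}

    def read(self, data):
        # Copy the incoming bytes to wherever the address register points...
        for offset, byte in enumerate(data):
            self.ram[self.sdma_addr + offset] = byte
        # ...and leave the register pointing just past what was written.
        self.sdma_addr += len(data)


dma = SdmaModel()
dma.sdma_addr = SDIO_BUF           # what sdio_cmd does before each command
dma.read(bytes(BLOCK_SIZE))        # the ROM's own single-block read
stale_pointer = dma.sdma_addr      # nobody resets this afterwards
dma.read(b"\x41" * 4)              # a later read that never touched the register
```

After the first read the register holds 0x7021C, so the second read lands immediately past the ROM's page buffer.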

Something else caught my eye when I was digging into this: Let's take a look at how the DMA transaction is actually started. From the TRM:

SDIO controller DMA usage from the Zynq TRM

Turns out, to start a transaction we don't have to touch the base address register at all! Writing to the command register (bits 29:24 of Transfer_Mode_Command, for the sdio0 controller this is located at 0xe010000c) kicks things off. Checking back in with the ROM shows that it will happily write to all of these other registers (see above)!
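
For reference, a small sketch of how that 32-bit word splits up. The command index position (bits 29:24) and the meaning of the ROM's 0x13 value come from the text and listing above; the individual bit positions of the three flags follow the standard SD host controller layout and should be treated as an assumption here. The helper names are made up, and a real command also needs the other command-register fields (response type and so on), which this ignores.

```python
SDIO0_BASE = 0xE0100000
TRANSFER_MODE_COMMAND = SDIO0_BASE + 0x0C

COMMAND_INDEX_SHIFT = 24
COMMAND_INDEX_MASK = 0x3F          # bits 29:24

# Assumed bit positions within the low (transfer mode) half.
DMA_ENABLE = 1 << 0
BLOCK_COUNT_ENABLE = 1 << 1
DATA_DIRECTION_READ = 1 << 4


def command_index(word):
    return (word >> COMMAND_INDEX_SHIFT) & COMMAND_INDEX_MASK


def with_command_index(word, index):
    if not 0 <= index <= COMMAND_INDEX_MASK:
        raise ValueError("command index must fit in 6 bits")
    cleared = word & ~(COMMAND_INDEX_MASK << COMMAND_INDEX_SHIFT) & 0xFFFFFFFF
    return cleared | (index << COMMAND_INDEX_SHIFT)


def transfer_mode_flags(word):
    return {
        "dma_enable": bool(word & DMA_ENABLE),
        "block_count_enable": bool(word & BLOCK_COUNT_ENABLE),
        "read": bool(word & DATA_DIRECTION_READ),
    }


# The ROM's transfer mode value (0x13) combined with CMD17 (READ_SINGLE_BLOCK).
example = with_command_index(0x13, 17)
```

Since a register init entry is just an (address, data) pair, a single entry targeting TRANSFER_MODE_COMMAND covers both the transfer mode and the command fields at once.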

Looking back at the ROM's RAM layout, the easiest target for corruption is the boot_func pointer at 0x7068c - this is where the read callback for the current boot method is stored. Assuming we're able to survive (avoid crashing) long enough to hit another read, it should be trivial to get code execution from here.

You might notice that this is well outside of the range of a single block (512 bytes) - This is true, and helpfully the SDIO controller supports multi-block reads. Unfortunately, this does mean that we're stuck reading a whole 0x600 bytes in order to hit this pointer. Some care will need to be taken to fix up some structures in this range, since this includes the FAT filesystem contexts.
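
The 0x600 figure falls out of some quick arithmetic, assuming the DMA pointer starts at 0x07021c (the end of the ROM's page buffer) and that boot_func is a 4-byte pointer:

```python
DMA_START = 0x7021C    # sdio_buf + 512, where the stale DMA pointer sits
BOOT_FUNC = 0x7068C    # read callback pointer we want to overwrite
POINTER_SIZE = 4       # assumption: 32-bit pointer
BLOCK_SIZE = 0x200     # SD block size used by the ROM


def blocks_to_cover(start, target, target_size, block_size):
    """Whole blocks needed, starting at `start`, to fully overwrite the target."""
    needed = target + target_size - start
    return -(-needed // block_size)  # ceiling division


blocks = blocks_to_cover(DMA_START, BOOT_FUNC, POINTER_SIZE, BLOCK_SIZE)
payload_offset = BOOT_FUNC - DMA_START

print(blocks, hex(blocks * BLOCK_SIZE), hex(payload_offset))  # 3 0x600 0x470
```

So the pointer sits 0x470 bytes into the payload, which is past the end of the second block, hence three full blocks.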

But...That won't be necessary to get a crash. I threw together a PoC to check everything out on silicon - sure enough, I was readily able to read arbitrary amounts of data out of the SD card during the RIL processing, presumably stomping all over RAM after 0x07021c (sdio_buf + 512 from the DMA auto incrementing).

Exploit

I assumed this would be a simpler exploit than the ONFI bug, and I was...mostly right.

I spent a lot of time chasing memory coherency issues. Basically, I needed the A9 core's dcache to evict the line containing whatever corruption I intended to take advantage of. If the A9 wrote to that line before the cache was flushed, it would overwrite my corruption. Luckily, that wasn't a problem here: boot_func is written only once. The first read after the corruption still uses a stale value, so the exploit payload contains some fixups for the other corrupted data structures in order to prevent panics until the cacheline containing our pointer of interest is re-fetched. Unfortunately, this first read is a re-read of the header checksum, and the variable it's read into lies right in the middle of the region we're stomping all over. Rather than worry about the details of memory ordering here, I just abused some unused fields in the bootrom header to force the checksum to 0 and made sure the corresponding region in my payload was likewise zero.
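
The checksum trick can be sketched as follows. This assumes the header checksum is the bitwise inverse of a 32-bit sum over a run of header words; that definition is an assumption here (check the TRM), and which slot is actually free to use as filler is left as a hypothetical index.

```python
MASK = 0xFFFFFFFF


def header_checksum(words):
    """Assumed definition: inverted 32-bit sum of the covered header words."""
    return ~sum(words) & MASK


def filler_for_zero_checksum(words, filler_index):
    """Value for words[filler_index] that makes header_checksum() return 0."""
    others = sum(w for i, w in enumerate(words) if i != filler_index)
    # The total must be all ones (mod 2**32) so that inverting it gives zero.
    return (MASK - others) & MASK


# Arbitrary example words; index 6 stands in for an unused header field.
words = [0xAA995566, 0x12345678, 0, 0x01010000, 0x1700, 0x8000, 0, 0, 0x8000, 1]
words[6] = filler_for_zero_checksum(words, 6)
```

With the checksum pinned to zero, it no longer matters whether the ROM sees the original value or the zeroed bytes from the payload.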

While I did hand-craft a few of these remaining structures, the intent in doing so was primarily to cause as many fetches of the boot_func pointer as possible. To do so, I simply dumped the FATFS structures from the emulator (one could also use another exploit to grab them from RAM on a live device), and fixed up some fields to cause f_read() (0x008f7c) to return as quickly as possible. There does exist enough complexity in this struct to generate a clever write-what-where primitive, but that's far too much work ;)

While the sdio_cmd function does block on pending sdio transactions, we don't quite have that luxury when abusing the RIL: it was necessary to stuff a bunch of dummy writes between sdio transactions to give them enough time to complete. I used writes of 0 to XQSPIPS_IDR_OFFSET, which should have no relevant side effects. Finally, another trick I added was using RIL to futz with the clocks - I used a relatively fast SD card, and sped up the clock as much as I could (basically just watching the logic analyzer and waiting until it failed...). Similarly, I downclocked the CPU as much as possible by writing the maximum allowed clock divisor to 0xf8000120 (ARM_CLK_CTRL) - after writing the last sdio command, of course. You can find the script I used to build the final header image here.
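
The padding idea looks roughly like this. It is a sketch, not the script linked above: the QSPI base address and the 0x0C offset for XQSPIPS_IDR_OFFSET are assumptions taken from Xilinx's driver headers, pad_with_delays is a made-up name, and the right number of dummy writes has to be found by experiment.

```python
QSPI_BASE = 0xE000D000        # assumption: Zynq-7000 QSPI controller base
XQSPIPS_IDR_OFFSET = 0x0C     # assumption: value from Xilinx's xqspips_hw.h
DUMMY_WRITE = (QSPI_BASE + XQSPIPS_IDR_OFFSET, 0)


def pad_with_delays(transactions, dummy_count):
    """Flatten groups of (address, data) writes, padding after each group.

    Each group is the set of register writes that starts one SDIO
    transaction; the dummy writes that follow only exist to burn time.
    """
    entries = []
    for group in transactions:
        entries.extend(group)
        entries.extend([DUMMY_WRITE] * dummy_count)
    return entries
```

Keep in mind that the dummy writes eat into the init table's limited capacity, so the delay budget is not unlimited.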

This should be fairly straightforward for the motivated reader to reproduce - You'll need to generate your own header image and update the inits array in that script in order to read from the right partition on your SD card. My payload should work out of the box, but it's been long enough between writing the exploit and this blog post that I've forgotten if that's particularly correct...

Everything else should be mostly plug-and-play.

Closing Thoughts

This pretty conclusively breaks Zynq's security guarantees, but no one's relying on a decade-old chip...right?

I didn't do the math, but it might be possible to find a slow enough SD card such that this isn't exploitable. Or use another FPGA to prevent the attack...Uh, good luck!
