Implement ARM64 support and RiscyROP chaining algorithm #124

bkrl · 2025-01-04T08:44:25Z

For my intern project at Trail of Bits, I modified angrop to add support for ARM64 and to implement algorithms from the paper "RiscyROP: Automated Return-Oriented Programming Attacks on RISC-V and ARM64". The new algorithm can generate more complex register-setting chains, such as the one below, which controls all eight argument registers and the return address register on ARM64 glibc:

Example chain

# gadget 1
0x429b68:  ldp  x19, x20, [sp, #0x10]  # load value for x5 into x20
0x429b6c:  ldp  x21, x22, [sp, #0x20]  # load value for x4 into x22,
0x429b70:  ldp  x29, x30, [sp], #0x30  # and set x21 to address of gadget 3
0x429b74:  ret

# gadget 2
0x4d3b54:  mov  x5, x20                # set x5
0x4d3b58:  mov  x4, x22                # set x4
0x4d3b5c:  mov  x0, x19
0x4d3b60:  mov  x6, #0
0x4d3b64:  blr  x21                    # x21 set in gadget 1

# gadget 3
0x46ae00:  ldr  x0, [sp, #0x18]        # set x0 for conditional branch in gadget 4
0x46ae04:  ldp  x29, x30, [sp], #0x20
0x46ae08:  ret

# gadget 4
0x4502dc:  cmn  w0, #1                 # x0 set in gadget 3
0x4502e0:  ldr  x3, [sp, #0x98]        # set x3
0x4502e4:  mov  x8, #1
0x4502e8:  b.eq  #0x44f5f0             # conditional branch
0x44f5f0:  mov  w19, #-1
0x44f5f4:  adrp  x0, #0x59f000
0x44f5f8:  ldr  x0, [x0, #0xe68]
0x44f5fc:  ldr  x2, [sp, #0x198]       # set x2 for conditional branch
0x44f600:  ldr  x1, [x0]
0x44f604:  subs  x2, x2, x1            # x1 is a constant
0x44f608:  mov  x1, #0
0x44f60c:  b.ne  #0x450acc             # conditional branch
0x44f610:  ldp  x21, x22, [sp, #0x20]
0x44f614:  mov  w0, w19
0x44f618:  ldp  x19, x20, [sp, #0x10]
0x44f61c:  ldp  x23, x24, [sp, #0x30]
0x44f620:  ldp  x25, x26, [sp, #0x40]
0x44f624:  ldp  x27, x28, [sp, #0x50]
0x44f628:  ldp  x29, x30, [sp], #0x1a0
0x44f62c:  ret

# gadget 5
0x4d4174:  ldp  x21, x30, [sp, #0x10]
0x4d4178:  ldp  x19, x20, [sp], #0x20  # set x20 to address of gadget 7
0x4d417c:  ret

# gadget 6
0x43ec80:  mov  x2, x21                # set x2
0x43ec84:  blr  x20                    # x20 set in gadget 5

# gadget 7
0x46ae00:  ldr  x0, [sp, #0x18]        # set x0 for conditional branch in gadget 8
0x46ae04:  ldp  x29, x30, [sp], #0x20
0x46ae08:  ret

# gadget 8
0x4c0e78:  cmn  w0, #1                 # x0 set in gadget 7
0x4c0e7c:  ldr  x7, [sp, #0x78]        # set x7
0x4c0e80:  ldp  w6, w11, [sp, #0x80]
0x4c0e84:  ldp  w10, w9, [sp, #0x88]
0x4c0e88:  b.eq  #0x4c0e18             # conditional branch
0x4c0e18:  ldp  x19, x20, [sp, #0x10]
0x4c0e1c:  mov  w0, #-1
0x4c0e20:  ldp  x21, x22, [sp, #0x20]
0x4c0e24:  ldp  x23, x24, [sp, #0x30]
0x4c0e28:  ldp  x25, x26, [sp, #0x40]
0x4c0e2c:  ldp  x27, x28, [sp, #0x50]
0x4c0e30:  ldp  x29, x30, [sp], #0x90
0x4c0e34:  ret

# gadget 9
0x434af4:  ldr  x1, [sp, #0x18]        # set x1 for conditional branch in gadget 12
0x434af8:  ldp  x29, x30, [sp], #0x20
0x434afc:  mov  x0, x1
0x434b00:  ret

# gadget 10
0x44a398:  ldp  x21, x22, [sp, #0x20]  # load address of gadget 12 into x21
0x44a39c:  ldp  x25, x26, [sp, #0x40]
0x44a3a0:  ldr  x0, [sp, #0x78]        # load value for x6 into x0
0x44a3a4:  ldp  x29, x30, [sp], #0xc0
0x44a3a8:  ret

# gadget 11
0x49c270:  mov  x16, x21               # move address of gadget 12 into x16
0x49c274:  ldr  x21, [sp, #0x20]
0x49c278:  ldp  x29, x30, [sp], #0x90  # set x30 to address of gadget 13
0x49c27c:  br  x16

# gadget 12
0x44a9c0:  mov  x6, x0                 # set x6
0x44a9c4:  mov  x0, #0
0x44a9c8:  cbz  x1, #0x44aad8          # conditional branch, x1 set in gadget 9
0x44aad8:  ret                         # x30 set in gadget 11

# gadget 13
0x434af4:  ldr  x1, [sp, #0x18]        # set x1
0x434af8:  ldp  x29, x30, [sp], #0x20
0x434afc:  mov  x0, x1
0x434b00:  ret

# gadget 14
0x44a398:  ldp  x21, x22, [sp, #0x20]  # load final jump address into x21
0x44a39c:  ldp  x25, x26, [sp, #0x40]
0x44a3a0:  ldr  x0, [sp, #0x78]        # set x0
0x44a3a4:  ldp  x29, x30, [sp], #0xc0
0x44a3a8:  ret

# gadget 15
0x49c270:  mov  x16, x21               # move final jump address to x16
0x49c274:  ldr  x21, [sp, #0x20]
0x49c278:  ldp  x29, x30, [sp], #0x90  # set x30
0x49c27c:  br  x16

I'm making this a draft PR for now since there's probably more work to be done before this can be merged and I would appreciate any feedback. I'm happy to discuss things and continue working on this.

Changes

Chaining algorithm

The new chaining algorithm is a recursive DFS that generates the chain backwards instead of forwards. It keeps track of the set of registers that it needs to control, and it tries to prepend gadgets to the chain that control some of those registers. The target register set is updated after each gadget is prepended, and this repeats until the set becomes empty. A heuristic is used to estimate how easy it is to set each register by counting the number of gadgets that pop the register, and gadgets are selected based on the estimated difficulty of the resulting set of registers.
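
As a rough illustration of the backward search described above, the following is a minimal sketch on a toy gadget model. It is not the code in this PR: the `Gadget` fields (`controls`, `requires`, `clobbers`), the `difficulty()` weighting, and the three sample gadgets are all assumptions made up for the example, and real gadgets are analyzed symbolically rather than described by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gadget:
    """Toy gadget summary (hypothetical fields, not angrop's RopGadget)."""
    name: str
    controls: frozenset                 # registers the gadget lets us set
    requires: frozenset = frozenset()   # registers that must be controlled beforehand
    clobbers: frozenset = frozenset()   # registers left with uncontrolled values


def difficulty(regs, gadgets):
    """Estimate how hard a register set is: rarely popped registers cost more."""
    total = 0.0
    for reg in regs:
        pops = sum(1 for g in gadgets if reg in g.controls and not g.requires)
        total += 1.0 / (1 + pops)
    return total


def build_chain(targets, gadgets, max_len=8):
    """Recursive DFS that prepends gadgets until no target registers remain."""
    targets = frozenset(targets)
    if not targets:
        return []            # nothing left to control: the chain is complete
    if max_len == 0:
        return None          # depth limit reached, backtrack
    candidates = []
    for g in gadgets:
        if not (g.controls & targets) or (g.clobbers & targets):
            continue         # useless, or it would destroy a value set later
        new_targets = (targets - g.controls) | g.requires
        candidates.append((difficulty(new_targets, gadgets), g.name, g, new_targets))
    # Try the gadgets whose resulting target set looks easiest first.
    for _, _, g, new_targets in sorted(candidates, key=lambda c: c[:2]):
        prefix = build_chain(new_targets, gadgets, max_len - 1)
        if prefix is not None:
            return prefix + [g]
    return None


GADGETS = [
    Gadget("pop_x19_x20", frozenset({"x19", "x20"})),
    Gadget("mov_x0_x19", frozenset({"x0"}), requires=frozenset({"x19"}),
           clobbers=frozenset({"x1"})),
    Gadget("pop_x1", frozenset({"x1"})),
]

chain = build_chain({"x0", "x1"}, GADGETS)
print([g.name for g in chain])
```

In this toy setup the mov gadget clobbers x1, so it can only appear before the gadget that pops x1. Working backwards, that ordering falls out of the target-set check without any lookahead.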

Building the chain backwards instead of forwards is advantageous in situations where gadgets can be used to control multiple registers, but not at the same time. This commonly happens with gadgets like gadget 9 in the example above that copy a value from one register to another. They allow us to control either the source register or the destination register, but not both since the registers will always have the same value. When we build the chain backwards, we will know which of the registers need to be controlled, and if we need both registers then we know that the gadget can't be used.
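
The either-or constraint can be sketched as a small check, assuming (hypothetically) that each gadget is summarized as groups of registers that end up holding the same controlled value. The function name and the group representation are illustrative and are not taken from angrop.

```python
def satisfiable_targets(same_value_groups, targets):
    """Return the target registers a gadget can set, or None if it cannot be used.

    same_value_groups: iterable of sets; registers in one set always end up equal.
    targets: registers that must hold independent values after the gadget.
    """
    targets = set(targets)
    satisfied = set()
    for group in same_value_groups:
        wanted = set(group) & targets
        if len(wanted) > 1:
            return None      # two targets would be forced to the same value
        satisfied |= wanted
    return satisfied


# Gadget 9 in the example chain loads x1 from the stack and copies it to x0
# (simplified: its x29/x30 loads are left out).
GADGET_9_GROUPS = [{"x0", "x1"}]

print(satisfiable_targets(GADGET_9_GROUPS, {"x1", "x6"}))
print(satisfiable_targets(GADGET_9_GROUPS, {"x0", "x1"}))
```

A forward search would have to guess which of x0 or x1 the gadget is being used for; here the target set already carries that information.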

A disadvantage with the DFS algorithm is that if the first few gadgets that it chooses are bad then it can take a long time to backtrack. This hasn't happened much in my testing but I think there are further improvements that can be made to the algorithm.

Conditional branches

Gadgets with conditional branches are now supported. For each gadget, the set of registers that affect branch conditions is stored, and the chaining algorithm adds those registers to the set of target registers. A list of basic block addresses is stored for each gadget so that we know which branch to execute when the chain is built.

For compatibility, the gadget analyzer doesn't return gadgets with conditional branches unless the new allow_conditional_branches option is enabled. With conditional branches, there can be multiple different gadgets at the same address, so the gadget analyzer returns lists when the option is enabled, and I updated some tests to account for this. Chain builders other than RegSetter discard gadgets with conditional branches for now.
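
A minimal sketch of the bookkeeping described in this section, under assumed names: `BranchGadget` and its fields are hypothetical stand-ins rather than the attributes added to `RopGadget`, and the register sets are simplified. It shows two gadgets that share one start address but follow different block paths, and how branch-condition registers join the target set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchGadget:
    """Hypothetical gadget record; field names are illustrative only."""
    addr: int
    bbl_addrs: tuple         # basic block addresses executed, in order
    popped_regs: frozenset   # registers loaded from the stack (simplified)
    branch_regs: frozenset   # registers that affect branch conditions


def targets_before(gadget, targets):
    """Registers to control before the gadget: leftovers plus branch inputs."""
    return (frozenset(targets) - gadget.popped_regs) | gadget.branch_regs


# Gadget 4 from the example chain, following the taken b.eq branch.
taken = BranchGadget(0x4502DC, (0x4502DC, 0x44F5F0, 0x44F610),
                     frozenset({"x3", "x30"}), frozenset({"x0"}))
# Made-up sibling that falls through the b.eq instead (contents not in the PR).
fallthrough = BranchGadget(0x4502DC, (0x4502DC, 0x4502EC),
                           frozenset({"x3"}), frozenset({"x0"}))

# With conditional branches enabled, one address maps to a list of gadgets.
gadgets_by_addr = defaultdict(list)
for g in (taken, fallthrough):
    gadgets_by_addr[g.addr].append(g)

print(sorted(targets_before(taken, {"x3", "x4"})))
```

Using gadget 4 to set x3 therefore puts x0 on the list of registers an earlier gadget must control, which is the role gadget 3 plays in the example chain.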

Chain construction

I rewrote most of the Builder._build_reg_setting_chain() function. It supports conditional branches now, and it determines which stack location corresponds to which gadget address or register value by checking which symbolic stack variable gets loaded into the instruction pointer or target register. It creates a dictionary mapping symbolic stack variables to gadgets or RopValues, which is used when generating the payload. I think this is simpler and more robust than comparing concrete values or inspecting the solver constraints. It also doesn't assume that gadget addresses appear in order of execution, which is often not the case. For example, in the chain above, gadget 5 loads the address of gadget 7 into a register before jumping to gadget 6, and the address of gadget 7 has to be placed before the address of gadget 6.
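
To make the mapping idea concrete, here is a small sketch using the stack frame of gadget 5 from the example chain. It assumes 8-byte little-endian stack slots and replaces symbolic execution with a hand-written table of which stack variable ends up in which register. The variable names, `build_payload()`, the filler value, and the `0xDEADBEEF` register value are all hypothetical.

```python
import struct

FILLER = 0x4141414141414141

# Stack variables in memory order for gadget 5's 0x20-byte frame.
STACK_VARS = ["stack_0x00", "stack_0x08", "stack_0x10", "stack_0x18"]

# Which stack variable each register receives (normally found symbolically).
LOADED_FROM = {"x19": "stack_0x00", "x20": "stack_0x08",
               "x21": "stack_0x10", "x30": "stack_0x18"}

GADGET_6 = 0x43EC80   # reached through ret, so its address goes into x30
GADGET_7 = 0x46AE00   # reached through blr x20 at the end of gadget 6


def build_payload(stack_vars, var_values, filler=FILLER):
    """Pack each stack slot with its mapped value, or filler if unmapped."""
    words = [var_values.get(var, filler) for var in stack_vars]
    return struct.pack(f"<{len(words)}Q", *words)


# Desired register contents after gadget 5 runs.
wanted = {"x20": GADGET_7, "x21": 0xDEADBEEF, "x30": GADGET_6}
var_values = {LOADED_FROM[reg]: value for reg, value in wanted.items()}

payload = build_payload(STACK_VARS, var_values)
print(payload.hex())
```

Because the payload is laid out by stack variable rather than by execution order, the address of gadget 7 lands at offset 0x08, ahead of the address of gadget 6 at offset 0x18.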

For compatibility, the address of the first gadget is added as the first value in the payload because the existing code assumes that this is the case, but on ARM64 the initial address would usually have to be placed somewhere else since GCC puts the return address near the start of the stack frame instead of at the end. I think it makes more sense to separate the initial address from the rest of the payload, since where it has to be placed depends on how the chain is entered.

Gadget filtering

I disabled the code in the chain builders that compares every gadget with every other gadget, since that doesn't scale well and can be slow when there are a lot of gadgets. Disabling it doesn't seem to significantly affect the chain building.

Minor changes

Fixed a bug in the calculation of the first address to check, which caused the minimum address to be skipped if it was aligned.
Fixed GadgetAnalyzer.is_in_kernel() throwing an exception on some addresses because obj is None. The code was mostly a duplicate of rop_utils.is_in_kernel() so I made it call that function instead.
Removed the broad except clause in the gadget analyzer since it would catch things that we don't want to catch like KeyboardInterrupt.
The pc_reg and jump_reg attributes of gadgets seem to always be the same and the jump_reg computation doesn't work with conditional branches so I just set jump_reg to pc_reg.
Gadgets with pc_offset >= stack_change are rejected since we assume that gadgets don't read past where they leave the stack pointer. These gadgets can cause conflicts since other gadgets might load a value from the same location.
Fixed (act.data.ast == ip).symbolic causing an exception in the gadget analyzer because act.data.ast is a floating point expression instead of a bitvector, so the == operator returns False instead of a claripy expression.
For each popped register, the symbolic stack variable that gets popped into the register is stored so that the chain builder can check if two registers are popped from the same location on the stack and can't be independently controlled. For example, in gadget 9 of the chain above, both x0 and x1 are popped registers but they can't both be controlled.
In some function calling tests, max_steps is set when executing the chain since the chain doesn't know how many steps the fake function gadget needs.
test_roptest_x86_64() in test_rop.py is rewritten so that it executes the chain and checks the resulting register values instead of checking for a fixed sequence of gadget addresses.
execute_chain() in test_rop.py checks chain.next_pc_idx(). The chain concretization code tries to append a no-op gadget if next_pc_idx is not None, but this doesn't always work on architectures like ARM64 that don't have a stack return instruction.
The gadget analyzer catches timeout exceptions.
Worker processes are periodically restarted to work around the memory leak issue described below.
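
The start-address fix in the first item of this list can be illustrated with a generic rounding sketch. The buggy expression below is only one plausible way such an off-by-one arises and is not copied from angrop; both function names are hypothetical.

```python
def first_aligned_buggy(min_addr, alignment):
    """Always advances, so an already aligned min_addr is skipped."""
    return min_addr + alignment - min_addr % alignment


def first_aligned(min_addr, alignment):
    """Round up to a multiple of alignment, keeping aligned addresses as-is."""
    return (min_addr + alignment - 1) // alignment * alignment


for addr in (0x1000, 0x1001):
    print(hex(addr), hex(first_aligned_buggy(addr, 4)), hex(first_aligned(addr, 4)))
```

The two versions agree on unaligned inputs and differ only when the minimum address is already a multiple of the alignment.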

Limitations

Gadget finding speed and memory usage

The new gadget finding code is a lot slower than the existing code. With fast mode disabled (which is necessary to do anything useful on ARM64), finding gadgets on glibc takes one to two hours with 16 processes. I think there are probably ways to improve this, though the RiscyROP paper's implementation is even slower.

There appears to be some sort of memory leak issue involving z3 that causes the memory usage to keep going up when finding gadgets. The z3 version has a big effect on the memory usage, and with the latest version it quickly grows to several GBs per process. It's not as bad with older versions like 4.12.6.0, and I added a workaround that periodically restarts the worker processes. This seems to mostly keep the memory usage below 2 GB per process.

I've also encountered an issue a few times where things would hang at the end of the gadget finding process, though this might be caused by one of the processes getting killed due to running out of memory.

Compatibility and integration

I've integrated the new algorithms with the existing code enough that I think all of the tests pass. The use_partial_controller option is not supported in the new RegSetter at this point and the generated chains might have longer payloads since the new algorithm doesn't necessarily return the chain with the smallest stack change. The new algorithm doesn't yet support using gadgets that set registers to concrete values, so I also used the existing _find_all_candidate_chains() function in combination with the new algorithm.

Architecture support

Some of the chain builders other than RegSetter might not work on ARM64 because they still have some architecture specific assumptions. For example, shift gadgets that shift by one value are sometimes used as no-op gadgets. This works on architectures like x86 with stack return instructions, but on other architectures like ARM64 there might not be any gadgets that shift exactly one value, especially due to stack alignment.

This fixes the handling of gadgets where the address of the next gadget has to be inserted in the middle instead of just concatenated to the end. angrop is now able to set registers if there is a gadget that loads into the register from the stack, and it can also chain gadgets together as long as each gadget ends with a jump to an address loaded from the stack.

If a gadget pops a value into a register but the value is used in a branch condition or memory access address, we can't fully control it so we remove it from the set of popped registers. This also makes gadget.constraint_regs include registers that affect memory access addresses.

This preserves the previous behavior where if the chain ends with a jump to some address in the middle of the chain, the concretization function would attempt to append a no-op gadget so that the chain jumps to the address immediately after it instead.

If chain.next_pc_idx() is not None, the address to jump to after the chain should be placed at that index instead of after the chain. In cases like this the chain concretization function will attempt to append a no-op gadget so that the chain jumps to the address immediately after it, but this is not always possible on architectures without a stack return instruction.

The BFS algorithm is able to use gadgets that set registers to concrete values which the new algorithm doesn't support yet. This tries the BFS algorithm first since it's fast while the new algorithm can take a long time if it can't find a chain.

Kyle-Kyle · 2025-01-05T03:52:46Z

Hi, thank you so much for the amazing work. This PR seems to have addressed a lot of features missing from angrop for years. I'd definitely love to merge it after review.
Currently, this PR seems to be failing CI. Would you mind fixing that?

bkrl · 2025-01-06T06:38:08Z

I'll fix the lints but it looks like the rest of the tests are failing because they're using the old gadget caches. I added some fields to the RopGadget class for things like conditional branch support so the gadget caches will have to be regenerated.

Kyle-Kyle · 2025-01-16T01:02:42Z

Btw. We are currently working on https://github.com/angr/angrop/commits/feat/aarch64/ trying to merge this amazing PR to angrop :D

Kyle-Kyle · 2025-01-16T23:52:57Z

btw, I'm going to re-enable gadget filtering and improve its implementation.
I just did an experiment. Without it, indeed, it takes no time to initialize the chainbuilder, but it takes 20s to generate a chain (the 20s here is for finishing the whole script, so includes initialization and chain building).
With gadget filtering, even though it takes 8s to initialize the chainbuilder, it only takes 17s in total to generate a chain.
So I think it is worth it to do gadget filtering.

Also, the algorithm I wrote was never meant to be fast, I'm now going to write a faster version for it. Thanks for pointing it out though!

Kyle-Kyle · 2025-01-17T23:14:50Z

Now gadget filtering is pretty fast. The whole process of finding a chain in libc is reduced to 8s. (That is for the slow arm_func_call test case; it is even faster for x86_64.)

bkrl added 30 commits December 17, 2024 11:20

Add aarch64 detection

13e5a00

This makes angrop run on aarch64 binaries, but it's not able to do anything useful yet.

Start implementing RiscyROP gadget chaining

525dfa8

Check for constrained writes to target registers

21e46f0

Implement concrete chain generation

5bfcff5

Handle gadgets with conditional branches

cd4a4ff

Prevent infinite recursion

9eedc43

Support printing gadgets containing jumps

1883f53

Fix default argument mutation bug

2d54d7f

Optimize gadget search order

0b6ee13

Prioritize gadgets that result in fewer register dependencies and gadgets with less instructions.

Limit maximum chain length

887ee58

Increase max block size for aarch64

df088b7

Work around memory leak issue

8607399

There appears to be some kind of memory leak in angr that causes the memory usage to keep going up during gadget finding. This periodically restarts the worker processes to work around that.

Catch timeout exceptions when analyzing gadgets

b2e9e78

Avoid gadgets with hooked addresses

3241788

We don't want to go into SimProcedures when finding gadgets since those are probably not useful.

Ensure path constraints are controllable

7dfc26b

Remove timeout message

93b83b7

Some timeouts are normal such as when finding gadgets, so we shouldn't always print a message. It's also not that useful to print a timeout message without more information about where it came from so it makes more sense to do this in exception handlers instead.

Restart worker processes more frequently

92b2a0e

Don't print timeout exceptions when gadget finding

f6c8219

SimSolverModeError gets raised when there's a timeout in claripy.

Constrain memory access addresses

6395d04

If gadgets access memory outside of the stack, make sure the addresses are valid so that it doesn't crash.

Catch exceptions from initial symbolic execution

bfa7927

Avoid printing timeout exceptions

88f3c0e

Set initial ip properly when concretizing chain

b39ef7c

Turns out that project.factory.blank_state() initializes the ip to the entry point instead of making it unconstrained.

Update gadget comparison for gadgets with jumps

eefde9f

gadget.block_length is only the size of the first block.

Prune search tree

12d91ce

Backtrack if we've already seen an equivalent chain that is not longer than our current chain.

Add register weight heuristic

3897bc8

Estimate how hard it is to set a register by counting the number of gadgets that pop the register, and prioritize gadgets that result in target registers which are easier to set.

Limit number of instructions in gadgets

15fe7d3

Handle two regs being popped with the same value

4baeb2f

If two registers are popped from the same location on the stack, we can control either one of them but not both.

Disable slow gadget filtering code

f47cedd

The gadget filtering code in RegSetter and RegMover tries to compare every gadget with every other gadget, so it doesn't scale well when there are a large number of gadgets. Things seem to work fine without the filtering so I'm removing it for now.

bkrl added 24 commits January 1, 2025 19:06

Fix stack size when building the chain

fbf1aa1

The stack needs to have one extra value initialized to account for the address of the first gadget being popped off.

Try to minimize the payload size

8b56e1c

When choosing gadgets, break ties by choosing the gadget with the smaller stack change to minimize the size of the payload.

Fix bug in compatibility workaround

e0defb4

Document memory leak workaround

71ff1b6

Disable slow gadget filtering

fb550d2

See reasoning in commit f47cedd.

Increase chain building timeout

eb81667

Long chains can take a couple of seconds to build.

Only use conditional branch gadgets in RegSetter

275cb14

The other chain builders can't handle conditional branches.

Fix start address calculation

62c9298

The previous code incorrectly skips the first address if it is aligned.

Add tests for new chaining algorithm

e980fe4

Set max execution steps for func call test

3da7f48

The chain doesn't know how many steps the fake function gadget needs so we have to override the max steps calculation.

Fix gadget filtering

19d0dc7

I accidentally broke some chain builders when trying to remove the slow filtering loops that compare every gadget with every other gadget.

Make new chain builder work with RegMover

2e2ea93

Set max steps for func call tests

a7be855

See explanation in commit 3da7f48.

Set chain execution max steps conservatively

20f7cd9

Setting the maximum number of steps to the total number of blocks doesn't work with the fake gadgets used in function call chains, so use the old default of twice the number of gadgets if it's higher.

Prioritize minimizing payload size

00cac37

This makes the behavior closer to the previous algorithm which tries to find the chain with the smallest payload size.

Update tests for conditional branches

f55bad7

The gadget comparison code has to handle multiple gadgets at the same address due to conditional branches.

Use block.pp() instead of block.capstone.pp()

56de100

Rewrite test to not rely on exact chain

11f519e

Tests should check if the chains do what we want, and they should not rely on the chain matching a fixed sequence of gadgets since there can be multiple chains that do the same thing.

Fix is_in_kernel check

8dc6df5

The previous code can throw exceptions if the address is not in any object and there is already a function in rop_utils for this.

Fix exception during gadget finding

8892835

act.data.ast can be a floating-point expression instead of a bitvector, which would cause the == operator to return False instead of a claripy expression.

Remove unused function

abc163d

I rewrote the existing Builder._build_reg_setting_chain() function instead.

Kyle-Kyle self-assigned this Jan 15, 2025
