Implement ARM64 support and RiscyROP chaining algorithm #124
This makes angrop run on aarch64 binaries, but it's not able to do anything useful yet.
This fixes the handling of gadgets where the address of the next gadget has to be inserted in the middle instead of just concatenated to the end. angrop is now able to set registers if there is a gadget that loads into the register from the stack, and it can also chain gadgets together as long as each gadget ends with a jump to an address loaded from the stack.
Prioritize gadgets that result in fewer register dependencies and gadgets with fewer instructions.
There appears to be some kind of memory leak in angr that causes the memory usage to keep going up during gadget finding. This periodically restarts the worker processes to work around that.
We don't want to go into SimProcedures when finding gadgets since those are probably not useful.
Some timeouts are normal, such as when finding gadgets, so we shouldn't always print a message. A timeout message is also not very useful without more information about where it came from, so it makes more sense to print it in exception handlers instead.
SimSolverModeError gets raised when there's a timeout in claripy.
If gadgets access memory outside of the stack, make sure the addresses are valid so that it doesn't crash.
Turns out that project.factory.blank_state() initializes the ip to the entry point instead of making it unconstrained.
gadget.block_length is only the size of the first block.
If a gadget pops a value into a register but the value is used in a branch condition or memory access address, we can't fully control it so we remove it from the set of popped registers. This also makes gadget.constraint_regs include registers that affect memory access addresses.
Backtrack if we've already seen an equivalent chain that is not longer than our current chain.
Estimate how hard it is to set a register by counting the number of gadgets that pop the register, and prioritize gadgets that result in target registers which are easier to set.
If two registers are popped from the same location on the stack, we can control either one of them but not both.
The gadget filtering code in RegSetter and RegMover tries to compare every gadget with every other gadget, so it doesn't scale well when there are a large number of gadgets. Things seem to work fine without the filtering so I'm removing it for now.
The stack needs to have one extra value initialized to account for the address of the first gadget being popped off.
When choosing gadgets, break ties by choosing the gadget with the smaller stack change to minimize the size of the payload.
See reasoning in commit f47cedd.
Long chains can take a couple of seconds to build.
The other chain builders can't handle conditional branches.
The previous code incorrectly skips the first address if it is aligned.
The chain doesn't know how many steps the fake function gadget needs so we have to override the max steps calculation.
I accidentally broke some chain builders when trying to remove the slow filtering loops that compare every gadget with every other gadget.
See explanation in commit 3da7f48.
Setting the maximum number of steps to the total number of blocks doesn't work with the fake gadgets used in function call chains, so use the old default of twice the number of gadgets if it's higher.
This makes the behavior closer to the previous algorithm which tries to find the chain with the smallest payload size.
The gadget comparison code has to handle multiple gadgets at the same address due to conditional branches.
This preserves the previous behavior where if the chain ends with a jump to some address in the middle of the chain, the concretization function would attempt to append a no-op gadget so that the chain jumps to the address immediately after it instead.
If chain.next_pc_idx() is not None, the address to jump to after the chain should be placed at that index instead of after the chain. In cases like this the chain concretization function will attempt to append a no-op gadget so that the chain jumps to the address immediately after it, but this is not always possible on architectures without a stack return instruction.
Tests should check if the chains do what we want, and they should not rely on the chain matching a fixed sequence of gadgets since there can be multiple chains that do the same thing.
The BFS algorithm is able to use gadgets that set registers to concrete values which the new algorithm doesn't support yet. This tries the BFS algorithm first since it's fast while the new algorithm can take a long time if it can't find a chain.
The previous code can throw exceptions if the address is not in any object and there is already a function in rop_utils for this.
act.data.ast can be a floating-point expression instead of a bitvector, which would cause the == operator to return False instead of a claripy expression.
I rewrote the existing Builder._build_reg_setting_chain() function instead.
Hi, thank you so much for the amazing work. This PR seems to have addressed a lot of features missing from angrop for years. I'd definitely love to merge it after review.
I'll fix the lints but it looks like the rest of the tests are failing because they're using the old gadget caches. I added some fields to the [...]
Btw, we are currently working on https://github.com/angr/angrop/commits/feat/aarch64/ trying to merge this amazing PR to angrop :D
btw, I'm going to re-enable gadget filtering and improve its implementation. Also, the algorithm I wrote was never meant to be fast, I'm now going to write a faster version for it. Thanks for pointing it out though!
Now gadget filtering is pretty fast. The whole process of finding a chain in libc is reduced to 8s. (for the slow [...]
For my intern project at Trail of Bits, I modified angrop to add support for ARM64 and implement algorithms from this paper: RiscyROP: Automated Return-Oriented Programming Attacks on RISC-V and ARM64. The new algorithm is able to generate more complex register setting chains like this one which controls all eight argument registers and the return address register on ARM64 glibc:
Example chain
I'm making this a draft PR for now since there's probably more work to be done before this can be merged and I would appreciate any feedback. I'm happy to discuss things and continue working on this.
Changes
Chaining algorithm
The new chaining algorithm is a recursive DFS that generates the chain backwards instead of forwards. It keeps track of the set of registers that it needs to control, and it tries to prepend gadgets to the chain that control some of those registers. The target register set is updated after each gadget is prepended, and this repeats until the set becomes empty. A heuristic is used to estimate how easy it is to set each register by counting the number of gadgets that pop the register, and gadgets are selected based on the estimated difficulty of the resulting set of registers.
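The backward search described above can be sketched with a toy gadget model. This is a minimal illustration, not angrop's implementation: ToyGadget, its fields, difficulty() and build_chain() are hypothetical names, and a "pop" gadget is modelled simply as a gadget that controls a register without requiring any other register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyGadget:
    """Hypothetical, simplified gadget model (not angrop's RopGadget)."""
    name: str
    controls: frozenset                # registers we control after the gadget runs
    requires: frozenset                # registers that must be controlled before it runs
    clobbers: frozenset = frozenset()  # registers overwritten with values we don't control


def difficulty(regs, gadgets):
    """Lower is easier: a register popped by many gadgets is cheap to set."""
    score = 0.0
    for reg in regs:
        poppers = sum(1 for g in gadgets if reg in g.controls and not g.requires)
        score += 1.0 / poppers if poppers else float("inf")
    return score


def build_chain(targets, gadgets, max_depth=8):
    """Return gadgets in execution order that control `targets`, or None."""
    targets = frozenset(targets)
    if not targets:
        return []          # nothing left to control: the chain is complete
    if max_depth == 0:
        return None        # give up on this branch and backtrack
    candidates = []
    for g in gadgets:
        if not (g.controls & targets):
            continue       # gadget does not help with any target register
        if g.clobbers & (targets - g.controls):
            continue       # gadget would destroy a register needed later
        remaining = (targets - g.controls) | g.requires
        candidates.append((difficulty(remaining, gadgets), len(remaining), g.name, g, remaining))
    # Try the gadgets whose resulting target set looks easiest first.
    for _, _, _, g, remaining in sorted(candidates, key=lambda c: c[:3]):
        prefix = build_chain(remaining, gadgets, max_depth - 1)
        if prefix is not None:
            return prefix + [g]   # g was prepended to the rest, so it runs last here
    return None


GADGETS = [
    ToyGadget("pop_x0", frozenset({"x0"}), frozenset()),
    ToyGadget("pop_x19", frozenset({"x19"}), frozenset()),
    ToyGadget("mov_x1_x19", frozenset({"x1"}), frozenset({"x19"})),
]
```

Calling build_chain({"x0", "x1"}, GADGETS) starts from the full target set and works backwards: the move gadget is chosen first because the resulting set {x0, x19} only contains registers that can be popped directly. The sketch omits the equivalent-chain check and the stack-change tie-break mentioned in the commit messages.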
Building the chain backwards instead of forwards is advantageous in situations where gadgets can be used to control multiple registers, but not at the same time. This commonly happens with gadgets like gadget 9 in the example above that copy a value from one register to another. They allow us to control either the source register or the destination register, but not both since the registers will always have the same value. When we build the chain backwards, we will know which of the registers need to be controlled, and if we need both registers then we know that the gadget can't be used.
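The rule for copy gadgets can be written as a small function over register sets. This is a sketch under the assumption that a gadget is a plain `dst = src` copy; targets_before_copy() is a hypothetical helper, not part of angrop.

```python
def targets_before_copy(targets, src, dst):
    """Registers that must be controlled before a `dst = src` gadget runs.

    Returns None when the gadget cannot be used for this target set.
    """
    targets = frozenset(targets)
    if src in targets and dst in targets:
        # Both registers hold the same value afterwards, so they cannot
        # be given independent values: reject the gadget.
        return None
    if dst in targets:
        # Controlling dst after the gadget means controlling src before it.
        return (targets - {dst}) | {src}
    # dst is not needed, so the gadget leaves the target set unchanged.
    return targets
```

Because the target set is known when the gadget is considered, the "both registers needed" case is detected immediately instead of being discovered later in a forward search.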
A disadvantage with the DFS algorithm is that if the first few gadgets that it chooses are bad then it can take a long time to backtrack. This hasn't happened much in my testing but I think there are further improvements that can be made to the algorithm.
Conditional branches
Gadgets with conditional branches are now supported. For each gadget, the set of registers that affect branch conditions is stored, and the chaining algorithm adds those registers to the set of target registers. A list of basic block addresses is stored for each gadget so that we know which branch to execute when the chain is built.
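A rough sketch of the bookkeeping this implies is shown below. The class and field names are illustrative assumptions and do not necessarily match the attributes used in angrop.

```python
from dataclasses import dataclass, field


@dataclass
class BranchingGadget:
    """Hypothetical record for one path through a gadget with branches."""
    addr: int
    popped_regs: set
    branch_dependencies: set                       # registers that affect branch conditions
    bbl_addrs: list = field(default_factory=list)  # basic blocks along this path, in order


def targets_before(gadget, targets):
    """Target set after prepending `gadget`: branch registers become targets too."""
    return (set(targets) - gadget.popped_regs) | gadget.branch_dependencies


def follows_expected_path(gadget, executed_blocks):
    """True if execution took the branch that this gadget record describes."""
    return list(executed_blocks) == gadget.bbl_addrs


# Two records can share an address and differ only in the path they take.
taken = BranchingGadget(0x1000, {"x0"}, {"x2"}, [0x1000, 0x1010])
not_taken = BranchingGadget(0x1000, {"x0", "x1"}, {"x2"}, [0x1000, 0x1020])
```

Keeping one record per path is what allows several distinct gadgets to exist at the same address.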
For compatibility, the gadget analyzer doesn't return gadgets with conditional branches unless the new allow_conditional_branches option is enabled. With conditional branches, there can be multiple different gadgets at the same address. I updated some tests to account for this and the gadget analyzer returns lists when the option is enabled. Chain builders other than RegSetter discard gadgets with conditional branches for now.
Chain construction
I rewrote most of the Builder._build_reg_setting_chain() function. It supports conditional branches now, and it determines which stack location corresponds to which gadget address or register value by checking which symbolic stack variable gets loaded into the instruction pointer or target register. It creates a dictionary mapping symbolic stack variables to gadgets or RopValues, which is used when generating the payload. I think this is simpler and more robust than comparing concrete values or inspecting the solver constraints. It also doesn't assume that gadget addresses appear in order of execution, which is often not the case. For example, in the chain above, gadget 5 loads the address of gadget 7 into a register before jumping to gadget 6, and the address of gadget 7 has to be placed before the address of gadget 6.
For compatibility, the address of the first gadget is added as the first value in the payload because the existing code assumes that this is the case, but on ARM64 the initial address would usually have to be placed somewhere else since GCC puts the return address near the start of the stack frame instead of at the end. I think it makes more sense to separate the initial address from the rest of the payload, since where it has to be placed depends on how the chain is entered.
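The mapping-based payload generation can be illustrated without a solver. In this sketch, stack slot indices stand in for the symbolic stack variables, plain integers stand in for gadget addresses and RopValues, and build_payload() with its filler value is a hypothetical helper rather than angrop's code.

```python
import struct


def build_payload(slot_values, num_slots, word_size=8, filler=0x4141414141414141):
    """Pack a little-endian payload from a {slot index: value} mapping.

    Slots without an entry are not read by the chain and get a filler word.
    """
    fmt = {4: "<I", 8: "<Q"}[word_size]
    mask = (1 << (8 * word_size)) - 1
    words = [slot_values.get(i, filler) & mask for i in range(num_slots)]
    return b"".join(struct.pack(fmt, word) for word in words)


# Made-up addresses: the gadget that executes later (0x7000) sits in an
# earlier slot than the gadget that executes before it (0x6000).
slots = {0: 0x5000, 1: 0x7000, 3: 0x6000}
payload = build_payload(slots, num_slots=4)
```

Since each value is placed by the slot it was loaded from, the layout does not depend on the order in which the gadgets execute.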
Gadget filtering
I disabled the code in the chain builders that compares every gadget with every other gadget, since that doesn't scale well and can be slow when there are a lot of gadgets. Disabling it doesn't seem to significantly affect the chain building.
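For context, an all-pairs comparison over n gadgets needs on the order of n² checks. One possible way to avoid that, shown here purely as an assumption and not as what this PR or angrop does, is to bucket gadgets by an effect signature and keep the cheapest gadget in each bucket in a single pass. The dictionary keys used below are hypothetical.

```python
def filter_gadgets(gadgets):
    """Keep one gadget per effect signature, preferring the smallest stack change.

    Each gadget is a dict with hypothetical keys: addr, popped_regs,
    changed_regs and stack_change. This only removes gadgets with
    identical effects; it does not detect one gadget dominating another.
    """
    best = {}
    for g in gadgets:
        signature = (frozenset(g["popped_regs"]), frozenset(g["changed_regs"]))
        cost = (g["stack_change"], g["addr"])
        current = best.get(signature)
        if current is None or cost < (current["stack_change"], current["addr"]):
            best[signature] = g
    return sorted(best.values(), key=lambda g: g["addr"])


example_gadgets = [
    {"addr": 0x10, "popped_regs": {"x0"}, "changed_regs": {"x0"}, "stack_change": 32},
    {"addr": 0x20, "popped_regs": {"x0"}, "changed_regs": {"x0"}, "stack_change": 16},
    {"addr": 0x30, "popped_regs": {"x1"}, "changed_regs": {"x1", "x2"}, "stack_change": 16},
]
```

The work here is linear in the number of gadgets, at the cost of catching fewer redundant gadgets than a full pairwise comparison would.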
Minor changes
- Fixed GadgetAnalyzer.is_in_kernel() throwing an exception on some addresses because obj is None. The code was mostly a duplicate of rop_utils.is_in_kernel() so I made it call that function instead.
- [...] except clause in the gadget analyzer since it would catch things that we don't want to catch like KeyboardInterrupt.
- The pc_reg and jump_reg attributes of gadgets seem to always be the same and the jump_reg computation doesn't work with conditional branches so I just set jump_reg to pc_reg.
- Gadgets with pc_offset >= stack_change are rejected since we assume that gadgets don't read past where they leave the stack pointer. These gadgets can cause conflicts since other gadgets might load a value from the same location.
- Fixed (act.data.ast == ip).symbolic causing an exception in the gadget analyzer because act.data.ast is a floating point expression instead of a bitvector, so the == operator returns False instead of a claripy expression.
- [...] x0 and x1 are popped registers but they can't both be controlled.
- max_steps is set when executing the chain since the chain doesn't know how many steps the fake function gadget needs.
- test_roptest_x86_64() in test_rop.py is rewritten so that it executes the chain and checks the resulting register values instead of checking for a fixed sequence of gadget addresses.
- execute_chain() in test_rop.py checks chain.next_pc_idx(). The chain concretization code tries to append a no-op gadget if next_pc_idx is not None, but this doesn't always work on architectures like ARM64 that don't have a stack return instruction.
Limitations
Gadget finding speed and memory usage
The new gadget finding code is a lot slower than the existing code. With fast mode disabled (which is necessary to do anything useful on ARM64), finding gadgets on glibc takes one to two hours with 16 processes. I think there are probably ways to improve this, though the RiscyROP paper's implementation is even slower.
There appears to be some sort of memory leak issue involving z3 that causes the memory usage to keep going up when finding gadgets. The z3 version has a big effect on the memory usage, and with the latest version it quickly grows to several GBs per process. It's not as bad with older versions like 4.12.6.0, and I added a workaround that periodically restarts the worker processes. This seems to mostly keep the memory usage below 2 GB per process.
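One standard-library way to get periodic worker restarts is the maxtasksperchild argument of multiprocessing.Pool. The sketch below is an assumption about how such a workaround could look, not the code in this PR; analyze_address() and find_gadgets() are hypothetical stand-ins for the real gadget analysis.

```python
import multiprocessing


def analyze_address(addr):
    """Stand-in for the per-address gadget analysis (hypothetical)."""
    return addr if addr % 4 == 0 else None


def find_gadgets(addrs, processes=4, tasks_per_worker=100):
    """Analyze addresses in a pool whose workers are replaced periodically."""
    # maxtasksperchild makes the pool replace a worker process after it has
    # completed that many tasks, so memory leaked by a worker is released
    # when the worker exits.
    with multiprocessing.Pool(processes=processes,
                              maxtasksperchild=tasks_per_worker) as pool:
        results = pool.imap_unordered(analyze_address, addrs)
        # Consume the results before the pool is torn down.
        return sorted(r for r in results if r is not None)
```

When run as a script, the call to find_gadgets() should sit under an `if __name__ == "__main__":` guard so that it also works with the spawn and forkserver start methods. A smaller tasks_per_worker bounds memory more tightly but pays the process start-up cost more often.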
I've also encountered an issue a few times where things would hang at the end of the gadget finding process, though this might be caused by one of the processes getting killed due to running out of memory.
Compatibility and integration
I've integrated the new algorithms with the existing code enough that I think all of the tests pass. The use_partial_controller option is not supported in the new RegSetter at this point and the generated chains might have longer payloads since the new algorithm doesn't necessarily return the chain with the smallest stack change. The new algorithm doesn't yet support using gadgets that set registers to concrete values, so I also used the existing _find_all_candidate_chains() function in combination with the new algorithm.
Architecture support
Some of the chain builders other than RegSetter might not work on ARM64 because they still have some architecture-specific assumptions. For example, shift gadgets that shift by one value are sometimes used as no-op gadgets. This works on architectures like x86 with stack return instructions, but on other architectures like ARM64 there might not be any gadgets that shift exactly one value, especially due to stack alignment.