Surfaced from @informalsystems audit of Althea Gravity Bridge at commit 19a4cfe

- severity: High
- type: Implementation bug
- difficulty: Intermediate

## Involved artifacts

- orchestrator/gravity_utils/src/types/valsets.rs
- orchestrator/gravity_utils/src/types/batches.rs
- orchestrator/relayer/src/find_latest_valset.rs
- orchestrator/relayer/src/valset_relaying.rs
- orchestrator/ethereum_gravity/src/utils.rs
- orchestrator/ethereum_gravity/src/valset_update.rs
- orchestrator/gbt/src/client/cosmos_to_eth.rs
- orchestrator/gbt/src/client/eth_to_cosmos.rs

## Description

The codebase makes generous use (hundreds) of panic-causing statements: `unwrap()`, `panic!()`, `assert!`, `assert_eq!`, `assert_ne!`, and `expect()`. In some cases they are safe, but we have identified many cases for which it is unclear whether the asserted condition always holds, or whether it can be violated due to, e.g., network errors or misbehavior/bugs in other Gravity Bridge components.

Below we provide some examples of those statements, grouped into three categories.
### Assertions that depend on other Gravity Bridge components
These assertions may be fine as long as the other Gravity Bridge components behave correctly. Checking whether this is indeed the case is an extremely complex and time-consuming process. Moreover, even if those components do behave correctly, there is always the possibility of a bug or a malicious action.
orchestrator/gravity_utils/src/types/valsets.rs#L151

```rust
for member in self.members.iter() {
    if let Some(eth_address) = member.eth_address {
        if let Some(sig) = signatures_hashmap.get(&eth_address) {
            assert_eq!(sig.get_eth_address(), eth_address);
            assert!(sig.get_signature().is_valid());
```
This is probably safe, as the valset updates are coming from trusted validators. Now the question is:

- What if, due to some bug, an invalid signature is included?
- What if a malicious validator intentionally includes a wrong signature? This would lead to all other Orchestrators stopping, and all other validators being slashed.
orchestrator/gravity_utils/src/types/valsets.rs#L167

```rust
// the go code verifies signatures, if we ever see this it means
// that something has gone horribly wrong with our parsing or ordering
// in the orchestrator, therefore we panic.
panic!(
    "Found invalid signature for {} how did this get here?",
    sig.get_eth_address()
)
```

Same as above.
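For illustration only, here is a minimal sketch of how such a check could report an error to its caller instead of panicking. The error type, function name, and raw-address representation below are hypothetical stand-ins, not the project's actual API:

```rust
/// Hypothetical error type for signature validation (illustrative only).
#[derive(Debug)]
pub enum SignatureError {
    /// The recovered signer differs from the expected validator address.
    AddressMismatch { expected: [u8; 20], got: [u8; 20] },
    /// The signature bytes failed validation.
    InvalidSignature { signer: [u8; 20] },
}

/// Sketch of the assertions rewritten as a fallible check: the caller
/// decides whether to alert, skip the entry, or halt.
pub fn check_member_signature(
    expected: [u8; 20],
    recovered: [u8; 20],
    signature_valid: bool,
) -> Result<(), SignatureError> {
    if recovered != expected {
        return Err(SignatureError::AddressMismatch { expected, got: recovered });
    }
    if !signature_valid {
        return Err(SignatureError::InvalidSignature { signer: expected });
    }
    Ok(())
}
```

A caller at the event-processing level can then log or alert on the malformed entry and continue, rather than bringing the whole Orchestrator down.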
orchestrator/relayer/src/find_latest_valset.rs#L93

```rust
if cosmos_valset != *ethereum_valset {
    // if this is not true then we have a logic error on the Cosmos chain
    // or with our Ethereum search
    assert_eq!(cosmos_valset.nonce, ethereum_valset.nonce);
```

If, as the comment says, there is indeed some logic error, is this a valid reason for all Orchestrators to stop and validators to be slashed?

orchestrator/relayer/src/valset_relaying.rs#L49-53

This is probably safe, because the RPC calls ask for the specific nonce; but what if there is a bug in the RPC call?
orchestrator/ethereum_gravity/src/utils.rs#L144
orchestrator/ethereum_gravity/src/utils.rs#L168
orchestrator/ethereum_gravity/src/utils.rs#L119
orchestrator/ethereum_gravity/src/utils.rs#L94

```rust
// the go represents all nonces as u64, there's no
// reason they should ever overflow without a user
// submitting millions or tens of millions of dollars
// worth of transactions. But we properly check and
// handle that case here.
let real_num = Uint256::from_bytes_be(&val);
Ok(downcast_uint256(real_num).expect("Valset nonce overflow! Bridge Halt!"))
```

This depends on the correct behavior of the Solidity contract, and on the nonces never overflowing 64 bits. While unlikely, both conditions could still be violated.
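A checked downcast can surface the overflow as an error rather than halting the bridge via `expect`. The following self-contained sketch uses `downcast_be_bytes_to_u64` as a hypothetical stand-in for the codebase's `downcast_uint256`:

```rust
/// Error signalling that a 256-bit value does not fit into a u64.
#[derive(Debug)]
pub struct NonceOverflow;

/// Hypothetical checked downcast from 32 big-endian bytes to u64.
pub fn downcast_be_bytes_to_u64(be_bytes: [u8; 32]) -> Result<u64, NonceOverflow> {
    // The value fits in a u64 only if the 24 high-order bytes are all zero.
    if be_bytes[..24].iter().any(|&b| b != 0) {
        return Err(NonceOverflow);
    }
    let mut low = [0u8; 8];
    low.copy_from_slice(&be_bytes[24..]);
    Ok(u64::from_be_bytes(low))
}
```

The caller can then map `NonceOverflow` into its own error type with `?` and decide centrally how a halted bridge should be signalled, instead of panicking in place.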
### Assertions in the initialization phases or in the client code
orchestrator/gbt/src/client/cosmos_to_eth.rs#L59
orchestrator/gbt/src/client/cosmos_to_eth.rs#L109

These ignore `CosmosGRPCError`, which can happen for reasons not related to balance or requesting a batch.

orchestrator/gbt/src/client/eth_to_cosmos.rs#L25
orchestrator/gbt/src/client/eth_to_cosmos.rs#L32
orchestrator/gbt/src/client/eth_to_cosmos.rs#L39

These ignore `Web3Error`, which can happen even without an incorrect Gravity address or ERC20 queries.
There are other `expect`s in the code, but we do not list them here as they occur in the initialization phase or stem from user-specific errors, e.g.:

- Failed to create directory or write to file.
- Failed to load cosmos or eth keys.
These errors probably won't lead to severe consequences, such as validator slashing. But they will definitely occur many times, and each will be communicated to the user via a panic and a printed stack trace. This is not the best way to communicate errors to the user, and should be improved.
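As a minimal sketch of friendlier reporting (the error type and messages here are illustrative, not the project's actual ones), the client's entry point can match on a `Result` and print a readable message instead of unwinding with a stack trace:

```rust
use std::process::exit;

/// Illustrative stand-in for the client's initialization errors.
#[derive(Debug)]
enum InitError {
    KeyLoad(String),
    Io(std::io::Error),
}

fn run() -> Result<(), InitError> {
    // The actual client logic would go here, propagating errors with `?`.
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        // One uniform, human-readable message instead of a panic backtrace.
        match e {
            InitError::KeyLoad(msg) => eprintln!("Could not load keys: {}", msg),
            InitError::Io(err) => eprintln!("File system error: {}", err),
        }
        exit(1);
    }
}
```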
### Vacuously true assertions
These assertions are guaranteed to be safe due to the code structure, and can thus be removed. Keeping them in the code makes code inspection exceptionally difficult, as time is spent on false positives. Minimizing their usage would be helpful for future code inspections. Here are a couple of examples:
orchestrator/gravity_utils/src/types/batches.rs#L59-L60

Safe, as the `Vec`s are populated in a same-length loop.

orchestrator/ethereum_gravity/src/valset_update.rs#L26

Safe, as the check is done before reaching this call.
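To illustrate the pattern (a schematic sketch, not the code at the links above): both vectors are filled by the same loop, so the final assertion is true by construction and adds no safety.

```rust
/// Both output Vecs grow in lock-step, one push each per iteration.
fn split_pairs(pairs: &[(u64, String)]) -> (Vec<u64>, Vec<String>) {
    let mut amounts = Vec::new();
    let mut destinations = Vec::new();
    for (amount, destination) in pairs {
        amounts.push(*amount);
        destinations.push(destination.clone());
    }
    // Vacuously true: removing it loses no checking, but keeping it
    // costs inspection time on a false positive.
    assert_eq!(amounts.len(), destinations.len());
    (amounts, destinations)
}
```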
## Problem Scenarios
As there is a very large number (hundreds) of panic-producing statements, it is close to impossible to enumerate all possible problem scenarios. Here are a few examples:

- Due to a bug in another Gravity Bridge component, an input arrives at the Orchestrator that triggers one of the panic statements;
- The input triggering a panic is maliciously crafted (this may happen even from a validator's side);
- A network error occurs and propagates to one of the panic statements.
The effect depends on the part of the Orchestrator in which the panic occurs. A panic in the initialization code will probably result only in user inconvenience and annoyance. If the panic occurs long after the Orchestrator has started, the Orchestrator will stop, and this may lead to validators being slashed without any malicious action on their part.

It should be noted that the Orchestrator is a single binary responsible for all kinds of operations: a panic triggered during any of them (even a seemingly unimportant one) will cause the whole process to stop, with the aforementioned severe consequences.
## Recommendation
### Short term
We recommend a different course of action depending on the panic category:
- Assertions that depend on other Gravity Bridge components:
  - Carefully inspect the code in the other components on which the assertion depends;
  - Move error-handling code in the Orchestrator to the input-receiving boundary;
  - For the cases when this is not possible, propagate the errors using `Result`s to the top level;
  - At the top level, implement uniform error handling with alerts issued to validators (a sketch follows these lists);
  - Do not stop processing other legitimate, non-erroneous requests.
- Assertions in the initialization phases or in the client code:
  - Move error-handling code to the input-receiving boundary, or propagate errors to the top level;
  - At the top level, implement uniform error handling with meaningful error messages issued to the user.
- Vacuously true assertions:
  - Inspect the code, and if the assertion proves to be vacuously true, remove it.
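The following sketch illustrates how these recommendations combine for the first category. All names here (`GravityError`, `handle_valset`, `alert_validators`) are hypothetical stand-ins rather than the Orchestrator's actual API:

```rust
/// Illustrative top-level error type; variants mirror the assertions above.
#[derive(Debug)]
enum GravityError {
    InvalidSignature { signer: String },
    NonceMismatch { cosmos: u64, ethereum: u64 },
}

/// Boundary check: what used to be an assert_eq!/panic! becomes an Err
/// that intermediate callers propagate upwards with `?`.
fn handle_valset(sig_valid: bool, signer: &str, cosmos: u64, ethereum: u64) -> Result<(), GravityError> {
    if !sig_valid {
        return Err(GravityError::InvalidSignature { signer: signer.to_string() });
    }
    if cosmos != ethereum {
        return Err(GravityError::NonceMismatch { cosmos, ethereum });
    }
    Ok(())
}

/// Stand-in for a real alerting channel (logs, metrics, operator paging).
fn alert_validators(err: &GravityError) {
    eprintln!("ALERT: {:?}", err);
}

fn main() {
    // Top level: uniform handling; a bad input is reported, not fatal,
    // and processing of the remaining legitimate requests continues.
    let inputs = [(true, "0xaaaa", 7, 7), (true, "0xbbbb", 8, 9), (false, "0xcccc", 10, 10)];
    for (sig_valid, signer, cosmos, ethereum) in inputs {
        if let Err(err) = handle_valset(sig_valid, signer, cosmos, ethereum) {
            alert_validators(&err);
            continue; // do not stop serving other requests
        }
        println!("processed valset with nonce {}", cosmos);
    }
}
```

Whether a given alert should additionally trigger a controlled shutdown for safety-critical mismatches remains a policy decision; the point is that this decision is made in one place at the top level, not scattered across hundreds of panic sites.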
### Long term
Refactor the Orchestrator into independent binaries, each responsible for a different kind of operation. Ensure that all binaries whose operation could lead to validator slashing adhere to the highest possible security standards: they should not stop under any circumstances, and should employ defensive programming techniques to guard against possible failures in other components.