Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe error handling in Orchestrator #351

Open
andrey-kuprianov opened this issue Oct 11, 2021 · 0 comments
Open

Unsafe error handling in Orchestrator #351

andrey-kuprianov opened this issue Oct 11, 2021 · 0 comments

Comments

@andrey-kuprianov
Copy link

Surfaced from @informalsystems audit of Althea Gravity Bridge at commit 19a4cfe

severity: High
type: Implementation bug
difficulty: Intermediate

Involved artifacts

Description

The codebase has generous use (hundreds) of panic-causing statements: unwrap(), panic!(), assert, assert_eq, assert_ne and expect.
In some cases they are safe, but we have identified many cases for which it is unclear whether the asserted condition is always true, or can be violated due to e.g. network errors or misbehavior/bugs in other Gravity Bridge components.

Below we provide some examples of those statements, characterized into three categories:

Assertions that depend on other Gravity Bridge components

These assertions may be fine as soon as the other Gravity components behave correctly. Checking whether this is indeed true is an extremely complex and time-consuming process. Also, even if those components do behave correctly, there is always a possibility of some bug, or a malicious action.

  • orchestrator/gravity_utils/src/types/valsets.rs#L151

    for member in self.members.iter() {
        if let Some(eth_address) = member.eth_address {
            if let Some(sig) = signatures_hashmap.get(&eth_address) {
                assert_eq!(sig.get_eth_address(), eth_address);
                assert!(sig.get_signature().is_valid());

    This is probably safe as the valset updates are coming from trusted validators. Now the question is:

    • what if due to some bug an invalid signature is included?
    • what if a malicious validator intentionally includes a wrong signature? This will lead to all other Orchestrators stopping, and all other validators being slashed.
  • orchestrator/gravity_utils/src/types/valsets.rs#L167

    // the go code verifies signatures, if we ever see this it means
    // that something has gone horribly wrong with our parsing or ordering
    // in the orchestrator, therefore we panic.
    panic!(
        "Found invalid signature for {} how did this get here?",
        sig.get_eth_address()
    )

    Same as above.

  • orchestrator/relayer/src/find_latest_valset.rs#L93

    if cosmos_valset != *ethereum_valset {
        // if this is not true then we have a logic error on the Cosmos chain
        // or with our Ethereum search
        assert_eq!(cosmos_valset.nonce, ethereum_valset.nonce);

    If, as the comment says, there is indeed some logic error, is this the valid reason for all Orchestrators stopping and validators being slashed?

  • orchestrator/relayer/src/valset_relaying.rs#L49-53

    let valset = get_valset(grpc_client, latest_nonce).await;
    if let Ok(Some(valset)) = valset {
        assert_eq!(valset.nonce, latest_nonce);

    This is probably safe, because the rpc calls ask for the specific nonce; but what if there is a bug in the RPC call?

  • orchestrator/ethereum_gravity/src/utils.rs#L144

  • orchestrator/ethereum_gravity/src/utils.rs#L168

  • orchestrator/ethereum_gravity/src/utils.rs#L119

  • orchestrator/ethereum_gravity/src/utils.rs#L94

    // the go represents all nonces as u64, there's no
    // reason they should ever overflow without a user
    // submitting millions or tens of millions of dollars
    // worth of transactions. But we properly check and
    // handle that case here.
    let real_num = Uint256::from_bytes_be(&val);
    Ok(downcast_uint256(real_num).expect("Valset nonce overflow! Bridge Halt!"))

    This depends on the correct behavior of the Solidity contract, and on the nonces never overflowing 64 bits. While unlikely, both conditions still could be violated.

Assertions in the initialization phases or in the client code

These errors probably won't lead to severe consequences, such as validator slashing. But they will definitely occur many times, and the error will be communicated to the user via panicing and printing a stack trace. This is not the best way to communicate errors to the user, and should be improved.

Vacuously true assertions

These assertions are guaranteed to be safe due to the code structure, and can thus be removed. Keeping them in the code makes code inspection exceptionally difficult, as we spend time on false positives. Minimizing the usage of them would be helpful for future code inspections. Here are a couple of examples:

Problem Scenarios

As there is a very large number (hundreds) of panic-resulting statements, it is close to impossible to enumerate all possible problem scenarios. Here are a few example ones:

  • Due to a bug in other Gravity Bridge component, an input comes to the Orchestrator that triggers one of the panic-statements;
  • The input triggering a panic is maliciously crafted (this may happen even from a validator side);
  • A network error occurs, which propagates to one of the panic statements.

The effect depends on in the part of the Orchestrator in which the panic occurs. A panic in the initialization code will probably result only in the user inconvenience and annoyance. If the panic occurs a long time after the Orchestrator has been started, it will stop, and this may lead to validators being slashed without malicious actions from their part.

It should be noted that Orchestrator is a single binary responsible for all kinds of operations: a panic triggered during any of those (even seemingly unimportant) will cause the whole processing to stop, with the aforementioned severe consequences.

Recommendation

Short term

We recommend different course of action depending of the panic category:

  • Assertions that depend on other Gravity Bridge components:
    • Carefully inspect the code in the other components on which the assertion depends;
    • Move error-handling code in the Orchestrator to the input-receiving boundary;
    • For the cases when this is not possible, propagate the errors using Results to the top level;
    • At the top level, implement a uniform error handling with alerts issued to validators;
    • Do not stop processing of other legitimate, non-erroneous requests.
  • Assertions in the initialization phases or in the client code:
    • Move error-handling code to the input-receiving boundary or propagate errors to the top level;
    • At the top level, implement a uniform error handling with meaningful error messages issued to the user.
  • Vacuously true assertions:
    • Inspect the code, and if the assertion proves to be vacuously true, remove it.

Long term

Refactor the Orchestrator into independent binaries each responsible for different operation kind. Ensure that all binaries responsible for possible validator slashing adhere to the highest possible security standards; they should not stop under any circumstances, and should employ defensive programming techniques against possible failures in other components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant