# Audit of Zeko's circuits

- **Client**: Zeko
- **Date**: May 21, 2025
- **Tags**: mina, ocaml, pickles, kimchi, zkapp, circuits

## Introduction

On January 13th 2025, zkSecurity started a security audit of Zeko's circuits. The audit was performed on the public repository https://github.com/zeko-labs/zeko at commit `fab6a03555`. The engagement lasted six days and was conducted by two consultants.

Zeko is a zk-rollup on the Mina blockchain. We can also refer to Zeko as a Layer 2 (L2) to disambiguate it from Mina's base layer (L1).

Zeko is isomorphic to Mina, which means that it offers the same interface to zkapp developers (apart from a few deliberate exceptions). Compared to the L1, it promises a better experience thanks to differentiators like high throughput and removal of transaction size limits.

### Scope

Zeko is built as a fork of the Mina repository. It uses Mina's zk libraries Snarky and Pickles, and many of Mina's protocol-related data structures and tooling. Furthermore, the circuits for proving state transitions of the L2 are largely shared with Mina's own "transaction snark" which proves state transitions of the L1.

Within the repo, Zeko's main logic that is not shared with Mina is located at `src/app/zeko`. A few additional changes were made to other parts of Mina to modify the transaction snark's behavior in specific ways.

The scope of the audit was the following:

- All of `src/app/zeko/circuits`, which hosts Zeko's own circuits, including
  - Zeko's "outer" contracts, which are zkapps on the Mina L1, and
  - "inner" contracts that operate on the L2.
- The diff to the original Mina repo, as far as it affects Zeko's circuits and in particular the transaction snark.

At the time of the audit, the circuit code was newly rewritten and not yet integrated into the overall rollup. Even witness generation code was, in many parts, not implemented. This put a hard limit on the audit scope: we reviewed the circuits themselves, not the wider context in which they are used.

Nonetheless, we believe that the audit covers a significant part of the attack surface of the zk-rollup as a whole, because the circuits are the largest critical component.

As much as possible, we tried to not only review circuits for local bugs such as missing constraints, but also audit the security of the protocol design as a whole.

### Summary and recommendations

Zeko's overall protocol design is deemed to be sound. Various attack vectors seem to be carefully considered and mitigated. We found a few significant bugs, each of which seems easy to fix, in a code base that is quite fresh and untested.

Given the partial scope and the number of high-severity findings, a small follow-up audit when the code base has stabilized could provide additional assurance.

## Protocol Overview

At its heart, Zeko is a zkapp on Mina, which holds a commitment to the L2 ledger in its state. The contract's central method, `Rule_commit`, moves the ledger state forward by proving that there exists a sequence of transactions that, when applied to the source ledger, results in the target ledger.
(Note: Contract methods are called "rules" in Zeko.)

This zkapp is called the _outer account_, where "outer" refers to the L1. There is also an _inner account_ on the L2. The two cooperate to support a messaging scheme between L1 and L2 that can be used by other zkapps.

Apart from the outer and inner rollup accounts, Zeko also maintains _bridge accounts_ on both sides, which have rules to bridge tokens to and from the L2. The bridge contracts can be considered the primary examples of how to use messaging.

### Messaging

Messaging is based on _actions_, a Mina protocol construct. In general, an action is arbitrary data attached to a transaction, which is appended to a Merkle list commitment on the target account when the transaction is applied. The list commitment is called the _action state_ of the account.

Both the outer and inner accounts have rules that submit an action: `Rule_action_witness` and `Rule_inner_action_witness`, respectively. Both rules allow arbitrary child account updates, and the action contains the hash of those child updates plus an additional field element defined by the caller (for committing to application-specific data).

Consequently, the **outer action witness rule** allows you to perform any interaction on the L1, and records it on the outer account's action state for later reference. Similarly, the **inner action witness rule** allows you to record interactions on the L2.

Messaging is achieved by making the outer action state available on the inner account's app state, and vice versa. In pseudo-code:

```ocaml
outer_account = {
  app_state: { inner_action_state, ledger_hash, ... };
  action_state
}

inner_account = {
  app_state: { outer_action_state };
  action_state
}
```

To see how this is useful, let's say you want to move tokens to the L2. You deposit them to an L1 bridge account, inside a call to the outer action witness rule. On the L2, you can refer to `inner_account.outer_action_state` to prove that this L1 deposit happened. Based on this proof, an inner bridge account lets you withdraw the tokens on the L2.

#### How are action states made available across layers?

Making the action state available from **L2 to L1** is straightforward: The outer **commit rule**, after proving transition of the L2 ledger, looks up the updated `inner_account` in that ledger, and sets its own `inner_action_state` to `inner_account.action_state`. Since this is the only way that `inner_action_state` can be updated, we know it commits to actions submitted on the inner account.

The direction from **L1 to L2** is harder because the inner contract doesn't have a proof of what happened on the L1. Instead, it has a **sync rule** which moves the `outer_action_state` field forward by applying _some_ actions, without proving anything about those actions.

Proving correctness of `inner_account.outer_action_state` is deferred to the outer commit rule: after updating the L2 ledger, it checks that `outer_action_state` is an ancestor of the current `outer_account.action_state`.
In other words, we can only ever commit to an L2 in which the inner `outer_action_state` contains real L1 actions.

Let's try to understand why this deferred proof scheme is secure. On the one hand, nothing in the sync rule prevents it from being called with L1 actions that never happened, and the L2 could go into an "unlawful" state. For example, you could invent deposits, and use them to withdraw tokens out of thin air from the inner bridge account.

However, an unlawful L2 can never _finalize_: It wouldn't pass the proof of the outer commit rule. In particular, transactions on the L1 can always rely on the L2 state stored in the outer account: it can never originate from L2 interactions that assumed a fake `inner_account.outer_action_state`.

Put simply: while you could print yourself tokens on a fake L2, you could never get those tokens out on the L1.

In practice, as clarified by the Zeko team, sequencers will control calls to the inner sync rule so that their L2 state never becomes invalid.

#### Action state extensions

Recall that action states are Merkle list commitments: They are the result of hashing all historical actions in a linear chain. One step in that chain looks like $S_{i+1} = hash(S_i, A)$ where $A$ is the action appended to the action state $S_i$.[^1]

[^1]: Technical detail: when using "action" in the same sense as Mina does, $A$ in the equation above actually represents a non-empty list of actions. We change terminology and call those lists "actions".

Frequently in Zeko's contracts, we need to show that some action $A$ is contained in a given action state $S$. This means proving that there exists a previous action state $S_0$ and some remaining actions $A_1, \ldots, A_n$ such that

$$
S = \left( hash(\cdot, A_n) \circ \ldots \circ hash(\cdot, A_1) \circ hash(\cdot, A) \right) (S_0)
$$

Since the hash chain can have an arbitrary length, we can't verify it in a fixed-size circuit. To solve this, Zeko uses a recursive proof of the hash chain. This is called an ASE, for "action state extension".
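As a concrete (non-circuit) illustration, here is a small Python model of the hash chain and of the extension relation that an ASE attests to. SHA-256 stands in for Mina's Poseidon hash, and all names (`append_actions`, `is_extension`) are ours:

```python
import hashlib

def h(state: bytes, action: bytes) -> bytes:
    # One chain step: S_{i+1} = hash(S_i, A)
    return hashlib.sha256(state + action).digest()

GENESIS = b"\x00" * 32  # stand-in for the initial action state

def append_actions(state: bytes, actions: list[bytes]) -> bytes:
    for a in actions:
        state = h(state, a)
    return state

def is_extension(s0: bytes, s: bytes, actions: list[bytes]) -> bool:
    # What an ASE proves: `s` extends `s0` by the given actions
    return append_actions(s0, actions) == s
```

An ASE replaces the unbounded loop in `is_extension` with a recursive proof, so that verification stays fixed-size regardless of how many actions separate the two states.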

ASEs are created using the `Folder` module that implements a generic recursive state machine proof. We [found a missing constraint](#finding-high-00-missing-merge-consistency) in `Folder` that breaks the proof.

The impact of proving invalid ASEs is devastating: It allows you to invent actions that never happened. For example, to withdraw from the L1 bridge, you need to show existence of a deposit into the L2 bridge. Since the deposit (an action on the inner account) can be spoofed, an attacker could easily withdraw arbitrary amounts. This is the most severe issue found during the audit, a "billion-dollar-bug".

### Bridge contracts

Zeko is designed to have two bridge contracts for every token: One on the L1 and one on the L2. The L1 contract will be deployed to _multiple different accounts_, only one of which is enabled at a time. Essentially, the idea is to spread the risk of holding funds.

Bridge accounts have the following main methods:

- Inner bridge account: **finalize-deposit rule**
- Outer bridge accounts: **finalize-withdrawal rule** and **finalize-cancelled-deposit rule**.

In addition, outer bridge accounts have methods to be disabled and enabled within certain windows of time.

We already hinted at how deposits and withdrawals work, but didn't discuss important aspects yet, like cancelling and prevention of double-spends. To recap, here are the simplified steps to deposit tokens from the L1 to the L2:

- Deposit tokens into one of the L1 bridge accounts (no matter which one), inside the outer action witness rule.
- Wait until the L1 deposit is synced to the L2, which happens when the inner sync rule is called.
- Call the finalize-deposit rule on the inner bridge account, which sends the tokens to an account of your choosing.

Along with the deposit, the user posts additional parameters: a receiver and timeout.
The receiver is a public key that is able to finalize the deposit, by signature.

The timeout enables a lightweight way to cancel a deposit, via the finalize-cancelled-deposit rule, which doesn't require cooperation by a sequencer. A deposit can be cancelled as soon as any action is posted on the outer account that has slot bounds higher than the timeout. "Slots" are the measure of time on Mina, and transactions can specify slot bounds outside of which they are invalid. This establishes a fuzzy timestamp for onchain events, used extensively by Zeko.

By contrast, a deposit is marked as "accepted" if followed by a commit action that came before the timeout. At that point, it can be finalized on the L2. Being cancellable and finalizable are mutually exclusive, and once either of them is true, the deposit status cannot change. This is important: otherwise, you could get the funds out on both sides and double your money.
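A simplified Python sketch of this status logic (names and exact slot-bound comparisons are our reading of the description above, not Zeko's code): the first qualifying action posted after the deposit fixes its status.

```python
from dataclasses import dataclass

@dataclass
class OuterAction:
    is_commit: bool
    slot_lower: int  # transaction's lower slot bound
    slot_upper: int  # transaction's upper slot bound

def deposit_status(timeout: int, later_actions: list[OuterAction]) -> str:
    # walk the actions posted after the deposit, in order
    for a in later_actions:
        if a.is_commit and a.slot_upper < timeout:
            return "accepted"     # can be finalized on the L2
        if a.slot_lower > timeout:
            return "cancellable"  # finalize-cancelled-deposit applies
    return "pending"
```

Because the loop returns on the first match, a deposit that became "accepted" can never later become "cancellable", and vice versa.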

#### Preventing double-spends

Another attack vector that has to be defended against is finalizing the same deposit twice (or cancelling it twice). The same applies for withdrawals.

To prevent double-spends, finalize-deposit introduces a helper account that has the receiver's public key and a token id owned by the L2 bridge contract. (Note: this relies on Mina's "token owner" feature, not explained in this report.) On the helper account, we store the action index of the last processed deposit, and require that this index is _strictly increasing_.

Since both the helper account and the deposit's action index are uniquely determined by the deposit, this ensures that a deposit can be finalized at most once.
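A minimal Python sketch of the strictly-increasing-index rule (names ours):

```python
class HelperAccount:
    # app state on the helper account: index of the last processed deposit
    def __init__(self) -> None:
        self.last_deposit_index = -1

def finalize_deposit(helper: HelperAccount, deposit_index: int) -> None:
    # require a strictly increasing index, ruling out double-finalization
    if deposit_index <= helper.last_deposit_index:
        raise ValueError("deposit already processed or out of order")
    helper.last_deposit_index = deposit_index
```

Note that the same check also rejects a skipped index that is presented later, which is the out-of-order hazard mentioned in the side note below.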

Side note: If deposits are processed out of order, it could happen that one is skipped and funds are irredeemably lost. That must be prevented by appropriate software to construct user transactions, and we view it as outside the scope of the present audit.

We have several interesting findings related to double-spends. One, pointed out to us early on by the Zeko team, is that in the case of L1 withdrawals, the helper account is _not_ determined uniquely by the withdrawal: Since multiple L1 accounts are deployed for every token, there are multiple possible helper accounts, and double-spends are possible. Details are in the [write-up of the finding](#finding-high-02-missing-helper-token-owner).

Another issue is related to a peculiar property of Mina's transaction snark: it doesn't prevent the same account from being added to the ledger multiple times. Allowing that would, again, destroy the assumption that the helper account is unique, and enable double-spends on the L2 bridge. Not only that: since duplicate accounts break a fundamental assumption of the protocol, they would most likely pose a security risk for userland zkapps as well.

To prove that no account is added twice, Zeko adds a new construction to its modified transaction snark: An _indexed Merkle tree_, which supports non-inclusion proofs and can therefore be used to implement a _set of unique accounts_.

Auditing Zeko's account set was fascinating. We recommend reading our [finding on unconstrained account set updates](#finding-high-01-unconstrained-account-set-update). In addition to that finding, we found five low-level issues in the field comparison gadget, a complex primitive needed for the indexed Merkle tree; see [1](#finding-high-03-range-check-not-canonical), [2](#finding-high-04-range-check-not-strict), [3](#finding-high-05-range-check-missing-lookups), [4](#finding-medium-05-range-check-overconstrained-zero-carry), [5](#finding-low-hard-coded-decrement-constant).
In total, four of the seven high-severity issues in our report are related to breaking uniqueness of the account set.

## Findings

### Arbitrary Token Withdrawals Allowed By Not Linking Proofs In Merge Rule

- **Severity**: High
- **Location**: folder.ml

**Description.** `Folder.Make(...)` implements a recursive proof of a general state machine.

The proof has a public output of the form `{ source: Stmt; target: Stmt }`. Given some function `step: Elt -> Stmt -> Stmt`, it asserts that there exists some sequence of elements `Elt` such that `step`, when applied repeatedly to the elements, transitions from `source : Stmt` to `target : Stmt`.

One of the allowed branches of this proof is a merge rule. It accepts a "left" transition `left.source -> left.target` and a "right" transition `right.source -> right.target`, and merges them to the combined transition `left.source -> right.target`, by verifying two recursive proofs.

The problem is that the merge rule never performs the consistency check `left.target == right.source`. As a result, nothing in the proof output forces `source` and `target` to be related in any way.
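To model the issue, here is a Python sketch (types and names ours) of what the merge computes, with and without the consistency check:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    source: str
    target: str

def merge_broken(left: Transition, right: Transition) -> Transition:
    # as audited: no check that left.target == right.source
    return Transition(left.source, right.target)

def merge_fixed(left: Transition, right: Transition) -> Transition:
    # the missing constraint: the two transitions must chain
    assert left.target == right.source, "transitions must chain"
    return Transition(left.source, right.target)
```

With `merge_broken`, a prover holding proofs of `A -> B` and `X -> Y` obtains a "valid" proof of the unrelated transition `A -> Y`.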

**Impact.** Folder is used throughout the rollup contracts for critical components, such as the different variants of `Ase` (action state extension) which prove that one action state is the result of applying actions to another. Consequently, this issue entirely breaks the bridging protocol and allows users to prove arbitrary actions on either the L1 or the L2.

For example, this could be exploited by anyone to withdraw all available tokens from the L1 bridge contracts, by witnessing an L2 withdrawal action that didn't actually happen.

**Recommendation.** Add the missing consistency check to `Rule_merge.main`:

```diff
let%snarkydef_ main (w : Witness.t V.t) =
  let* Witness.{ left; left_proof; right; right_proof } =
    exists ~compute:(V.get w) Witness.typ
  in
+ let* () = assert_equal Stmt.typ left.target right.source in
  let new_stmt : Trans.var =
    { source = left.source; target = right.target }
  in
  Checked.return
    Compile_simple.
      { prevs =
          Two_prevs
            ( { proof_must_verify = Boolean.true_
              ; public_input = left
              ; proof = left_proof
              }
            , { proof_must_verify = Boolean.true_
              ; public_input = right
              ; proof = right_proof
              } )
      ; out = new_stmt
      }
```

**Client Response.** The issue was fixed as recommended.

### Unconstrained Account Set Update Caused By Skipped Indexed Merkle Tree Check

- **Severity**: High
- **Location**: indexed_merkle_tree.ml, zeko_transaction_snark.ml

**Description.** Zeko tracks all accounts in an _account set_, a Merkle tree. We found a bug which means a prover can update the account set in arbitrary ways. The issue allows sequencers to prove invalid state transitions with far-reaching consequences, like double-spending of L2 deposits.

First, some background. In contrast to the main account ledger, another Merkle tree used by Mina, Zeko's account set supports non-inclusion proofs, thanks to being an _indexed Merkle tree_. For more on indexed Merkle trees, see [the paper that introduced them](https://eprint.iacr.org/2021/1263.pdf) and this [talk and blog post by Aztec](https://docs.aztec.network/aztec/concepts/storage/trees/indexed_merkle_tree).
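To sketch the core idea in Python (names ours; real trees commit to leaves via Merkle paths, elided here): each leaf stores a key together with the next-largest key in the set, so a single leaf `(x, z)` with `x < y < z` proves that `y` is absent.

```python
def proves_non_inclusion(leaf: tuple[int, int], y: int) -> bool:
    # leaf = (key, next_key), assumed proven (via a Merkle path) to be in the tree
    x, z = leaf
    return x < y < z

def insert(leaves: dict[int, int], y: int) -> None:
    # find the predecessor leaf (x, z), then split it into (x, y) and (y, z)
    x = max(k for k in leaves if k < y)
    z = leaves[x]
    assert proves_non_inclusion((x, z), y), "y is already in the set"
    leaves[x] = y  # x's next_key becomes y
    leaves[y] = z  # new leaf for y
```

Initializing the tree with a minimal key pointing at a maximal sentinel makes every absent key provable this way, and every insertion keeps the leaves sorted into a linked list of `(key, next_key)` pairs.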

The account set construction exists to close a gap in Mina's "transaction snark", the circuit(s) that Zeko uses to prove valid state transitions of the L2 ledger. In the original transaction snark, when processing an account update, the prover is free to _either_ prove it belongs to an existing account in the current ledger, _or_ to witness a Merkle path to an "empty" account. In the empty case, no connection to the account id of the account update is proved. The empty account is updated to have the account id matching the account update, and this update is recorded in the ledger. This is how new accounts are created in the protocol.

In other words, nothing in Mina's original circuit prevents a prover from creating the same account multiple times, and choosing which copy to apply updates to later on. The Mina L1 relies on consensus to prevent duplicated accounts from being accepted. Zeko, on the other hand, is centralized and doesn't have consensus, and must rely on proofs to guarantee that only valid state transitions are performed on its L2.

This is where the account set comes in. In Zeko, every processed account, along with its "is empty" flag, is passed to a function called `update_acc_set`. If the flag is true, the function will _prove that the account doesn't exist in the account set yet_, and insert it. This is just the piece of circuit logic missing from the original transaction snark.

```ocaml
let update_acc_set accounts init ~witness =
  Checked.List.fold accounts ~init
    ~f:(fun set (account_id, is_empty_and_writeable) ->
      (* ... ZKSECURITY: skipping lines ... *)
      let* y = derive_token_id ~owner:account_id in
      (* ZKSECURITY: `y`, a hash of the `account_id`, is added to the set. *)
      let* `Before_adding_y set', `After_adding_y new_set =
        Account_set.add_key_var ~x ~path_x ~y ~path_y ~z
          (* ZKSECURITY: Via the `check` flag, an insertion is skipped for
             accounts that already existed *)
          ~check:is_empty_and_writeable ()
      in
      let*| () = assert_equal ~label:__LOC__ Account_set.typ set set' in
      new_set )
```

The problem happens in the case where `is_empty_and_writeable` is false, and `y` is in the tree already. Note that in this case, we don't really want to perform any check here, since elsewhere we already proved inclusion of the `account_id` in the main ledger. To skip the check, this function passes `check:false` to `Account_set.add_key_var`.

Let's look at `add_key_var`, the main method on Zeko's indexed Merkle tree implementation. The function inserts a node `y` in the tree, using witnesses `x` and `z` such that `{key: x, next_key: z}` exist in the original tree and `x < y < z`. It turns out that the method is both over- and under-constrained if the check is `false` and `y` was already in the tree.

```ocaml
let add_key_var ?check ~x ~path_x ~y ~path_y ~z () =
  let* () =
    with_label __LOC__ (fun () -> assert_x_less_than_y_less_than_z ~x ~y ~z)
  in
  (* ZKSECURITY: after showing that x < y < z, we prove inclusion of `(x, z)` in the input root.
     this alone implies that y did not exist in the tree before! *)
  let* root = implied_root { key = x; next_key = z } path_x in
  let* root_intermediate = implied_root { key = x; next_key = y } path_x in
  (* ZKSECURITY: `root_intermediate'` commits to _some_ tree that has an empty entry for `y`.
     But this root is not connected to the input root yet... *)
  let* root_intermediate' =
    implied_root_raw Field.(constant typ zero) path_y
  in
  let* root_intermediate' =
    match check with
    | Some check ->
        if_ check ~typ:F.typ ~then_:root_intermediate'
          ~else_:root_intermediate
    | None ->
        Checked.return root_intermediate'
  in
  (* ZKSECURITY: we check that `path_y` really belongs to the same tree as `path_x`,
     by asserting the roots are equal. But this assertion is SKIPPED if check=false *)
  let* () =
    assert_equal ~label:__LOC__ F.typ root_intermediate root_intermediate'
  in
  (* ZKSECURITY: we update the root with the insertion of y and return it.
     this is a valid insertion if check=true or None, but an unconstrained update if check=false. *)
  let* root_new = implied_root { key = y; next_key = z } path_y in
  Checked.return (`Before_adding_y root, `After_adding_y root_new)
```

There are two issues:

1. The **input root is over-constrained**: If `y` is already in the tree of `Before_adding_y root`, then the first check `assert_x_less_than_y_less_than_z` must fail. This is a core property of indexed Merkle trees, maintained by this function by updating `x`'s `next_key` to `y`. It means that `update_acc_set` fails for accounts already in the set.

2. The **output root is under-constrained**: Using `check: false` makes the function skip the check that `root_intermediate == root_intermediate'`. This means that `path_y` is not validated to be a path in the same tree, and the updated root `After_adding_y root_new` could belong to any tree.

In `update_acc_set`, the returned root is just accepted as the new value. Because of the second issue, the prover could just switch to a different account set whenever an account already existed (i.e. during a majority of transactions).

**Impact.** The first issue, the over-constrained input root, simply means that the implementation would not have worked. Note that the audited code is very new and was not tested prior to the audit.

The more serious issue is the second one, which would not have surfaced on its own: Exploiting it, a prover can add the same account to the L2 ledger multiple times. This is just the problem that the account set was there to address in the first place.

For an example attack enabled by non-unique accounts, consider the bridge method for finalizing an L2 deposit. It prevents double-spending of the deposit by tracking a `deposit_index` field on a specific receiver account. If that account could be added another time, the same deposit can be withdrawn again.

Such a double-spending attack might only be accessible to the sequencer (depending on the exact operating setup of the rollup). But that seems bad enough, as we want to force sequencers to behave according to the protocol, using zk proofs.

**Recommendation.** Both issues can be addressed by adjusting `add_key_var`. To fix the over-constrained input root, the method could accept the root to update as an argument, instead of returning it. If `check` is false, the input root is replaced (using if-else) with that of an empty Merkle tree, before asserting it to equal the root implied by the `(x, z)` node. In this case, the witnesses passed in for `x` and `z` need to be the default min and max nodes that are in the indexed Merkle tree initially, to make sure the `assert_x_less_than_y_less_than_z` passes.

To fix the under-constrained output root, we just need another if-else at the end that replaces the output with the _original_ input root, if `check` is false.

**Client Response.** The issue was fixed, by swapping out the _original_ root in `add_key_var` in the `check=false` case. This addresses both the soundness and the completeness issue, without changing the function signature.

### Multiple L1 Bridge Accounts Allow Double-Spending Without Helper Token Owner

- **Severity**: High
- **Location**: rule_bridge_finalize_withdrawal.ml, rule_bridge_finalize_cancelled_deposit.ml

This finding was pointed out to us by the Zeko team during the audit. We include it here for completeness and to flesh out its impact.

As explained in the protocol spec, Zeko plans to deploy multiple bridge accounts on the L1 for the same token. This is a conservative measure: Just one of the accounts will be enabled at any given time, which means that even if the rollup is compromised, most of the L1 funds remain inaccessible to the attacker.

However, using the code base at the time of audit, deploying multiple versions of the same bridge contract would enable double-spending by processing the same withdrawal on each of the accounts.

Double-spending prevention works by checking a certain state field called `next_withdrawal` on a `helper_account`. This account has the `helper_token_id`, which is derived from and owned by the bridge contract itself:

```ocaml
(* ZKSECURITY: helper token id is derived from bridge account *)
let helper_token_id =
  let account_id =
    Account_id.Checked.create public_key
      (constant Token_id.typ token_id_l1)
  in
  Account_id.Checked.derive_token_id ~owner:account_id

(* ... ZKSECURITY: skipping lines ... *)

let helper_account =
  { default_account_update with
    public_key = base_params.recipient
  (* ZKSECURITY: this account is only unique if `helper_token_id` is *)
  ; token_id = helper_token_id
  ; authorization_kind = authorization_signed ()
  ; use_full_commitment = Boolean.true_
  ; may_use_token = constant May_use_token.typ Parents_own_token
  (* ZKSECURITY: if this account is not unique, we can do the state update
     multiple times with the same `next_withdrawal` index,
     which means we can double-spend a withdrawal *)
  ; update =
      { default_account_update.update with
        app_state =
          Outer_user_state.fine
            { next_withdrawal = Some next_withdrawal
            ; next_cancelled_deposit = None
            }
          |> var_to_app_state_fine
```

The problem with multiple L1 bridge contracts is that we also get multiple different helper accounts, so the fact that `next_withdrawal` is incremented on _one_ of them doesn't prevent us from repeating the withdrawal using _another_ one.

The same issue is present in the method for cancelling a deposit, which also must only be allowed once since it involves paying back the deposit.

The solution to this issue is already outlined in the `bridge-spec.md`: There is a _single_ zkapp called `helper_token_owner_l1`, which owns the `helper_account` token id, and sits between the bridge and helper account updates in a transaction similar to the one above.

**Recommendation.** Implement the `helper_token_owner_l1` and add it to the withdrawal and cancel-deposit rules. This zkapp needs two methods, one for withdrawal and one for cancel-deposit, each producing the account update layout needed by the bridge rules, with no further restrictions on when the methods can be called.

Note that there is currently no way to assert that `helper_token_owner_l1` is only called in the context of a bridge call, but this is fine: thanks to the signature `authorization_kind` on the `helper_account`, only the `recipient` of the withdrawal could ever call this method. So this allows voiding a withdrawal by user error, but not maliciously for a different user.

**Client Response.** The issue was fixed in commit [cf78d9](https://github.com/zeko-labs/zeko/pull/303/commits/cf78d9b84efbfe93f02dddf9004c51d1b04b3bc8).

### Underconstrained Comparison: Mistake In Checking Canonical Representation

- **Severity**: High
- **Location**: zeko_transaction_snark.ml

**Description**. For its indexed Merkle tree, Zeko implements an $x > y$ assertion where x and y can range over the full field. This is achieved by representing x and y as bigints.

The conversion of field elements to bigints works by asserting that

$$
x = x_0 + 2^\ell x_1 + 2^{2 \ell} x_2 \mod{fp}.
$$

Here, $x$ is the field element, $(x_0, x_1, x_2)$ its representation as bigint, $\ell = 88$ is the limb size, and $fp$ is the native field size.

Because $3 \ell = 264$ is more bits than the field size, the split of $x$ into limbs is underconstrained: it allows converting $x$ into a bigint that represents $x$ plus a small multiple of $fp$, such as $x + fp$.

In order to assert that $x > y$, we need to confirm two conditions:

1. The bigint version of $x$ is larger than that of $y$
2. The bigint version of $x$ is less than $fp$

To see why the second condition is necessary, imagine we chose a bigint representation of $x + fp$. In that case, the first point would just prove that $x + fp > y$, which is trivially true and doesn't tell us that $x > y$.
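
This ambiguity can be demonstrated with plain integers. The following sketch is an illustrative Python model (not Zeko's code), using a stand-in 255-bit prime rather than Mina's actual modulus:

```python
# Illustrative model: with limb size ell = 88, three limbs hold 264 bits,
# more than a ~255-bit field, so the decomposition of x modulo fp is not
# unique: the limbs of x + fp satisfy the same in-circuit constraint.
ELL = 88
FP = 2**255 - 19  # stand-in 255-bit prime (not Mina's actual modulus)

def to_limbs(v):
    """Split an integer < 2^264 into three 88-bit limbs (x0, x1, x2)."""
    assert 0 <= v < 2 ** (3 * ELL)
    return v & (2**ELL - 1), (v >> ELL) & (2**ELL - 1), v >> (2 * ELL)

def bigint_value(limbs):
    x0, x1, x2 = limbs
    return x0 + 2**ELL * x1 + 2 ** (2 * ELL) * x2

def decomposition_ok(x, limbs):
    """The only constraint enforced in-circuit: limbs recombine to x mod fp."""
    return bigint_value(limbs) % FP == x

x, y = 5, 10  # x <= y
malicious = to_limbs(x + FP)  # a second, malicious decomposition of x
assert decomposition_ok(x, to_limbs(x))
assert decomposition_ok(x, malicious)
# Without a "bigint < fp" check, the limb comparison "proves" x > y:
assert bigint_value(malicious) > bigint_value(to_limbs(y))
```

The same model shows why condition 2 restores soundness: requiring the bigint to be below $fp$ rules out the `malicious` decomposition.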

The second check can be performed using `sub_then_dec` which calculates $fp - x - 1$ and range-checks the result.
However, in `assert_greater_than_full`, this step is done with y instead of x as input, and therefore only checks that $y < fp$.

```ocaml
let* () =
  sub_then_dec
    ~dec:Field.(constant typ one)
    ~x0:(constant Field.typ fp0) ~x1:(constant Field.typ fp1)
    ~x2:(constant Field.typ fp2) ~y0 ~y1 ~y2
(* ZKSECURITY: we range-checked fp - y - 1, which doesn't help *)
```

With the $x < fp$ check missing, the gadget is unsound, and can be trivially exploited to "prove" $x > y$ even if $x \le y$ by adding a multiple of the modulus to the bigint $x$.

**Impact.** Faking a less-than assertion allows you to break the indexed Merkle tree and add duplicate entries. Due to its use for the account set, the finding lets the prover make old accounts appear as if they are new. As explained in the intro, this implies attacks like double-spending L2 deposits, by any entity that is able to call the commit rule on the L1.

**Recommendation**.
Update the code to ensure that `x0`, `x1`, and `x2` are passed to `sub_then_dec` as the `y0`, `y1`, and `y2` parameters.

**Client Response.** The issue was fixed as recommended.

### Underconstrained Comparison: Equal Inputs Accepted Due To Range-Check On Wrong Variable

- **Severity**: High
- **Location**: zeko_transaction_snark.ml

<!--
DUMP FROM CHAT:

Gregor Mitscha-Baude, [21.01.25 16:54]
ah and you should multi-range check the final result w, not the intermediate z. (otherwise your assert-less-than allows x and y to be equal)
 -->

**Description**.
The `sub_then_dec` function subtracts two big integers $x$ and $y$ (each represented as three limbs) and then decrements the result by a constant $dec$, which can be 0 or 1. Two further big integers $z$ and $w$ represent the intermediate result $x - y$ and the final result $z - dec$, respectively.

The current implementation of `sub_then_dec` applies the multi-range check constraint to the intermediate result $z$ instead of the final result $w$, as seen in the following lines of code:

```ocaml
let sub_then_dec ~dec ~x0 ~x1 ~x2 ~y0 ~y1 ~y2 =
    let* (z0, z1), z2 =
    exists
        Typ.(F.typ * F.typ * F.typ)
        ~compute:
        (let- x0 in
            let- x1 in
            let- x2 in
            let- y0 in
            let- y1 in
            let- y2 in
            Zeko_as_prover.sub ~x0 ~x1 ~x2 ~y0 ~y1 ~y2 |> As_prover.return )
    in
    let* (w0, w1), w2 =
    exists
        Typ.(F.typ * F.typ * F.typ)
        ~compute:
        (let- z0 in
            let- z1 in
            let- z2 in
            Zeko_as_prover.sub ~x0:z0 ~x1:z1 ~x2:z2 ~y0:Field.one
            ~y1:Field.zero ~y2:Field.zero
            |> As_prover.return )

    (* ... ZKSECURITY: skipping lines ... *)

    in
    (* ZKSECURITY: this should have been w0 w1 w2 *)
    multi_range_check z0 z1 z2
```

The idea of the check is that by proving $w = x - y - 1 \ge 0$, you guarantee that $x > y$. Because the function only performs a range-check on the intermediate value $z = x - y$, it only shows that $x \ge y$.

Consequently, when `sub_then_dec` is used within `assert_greater_than_full`, it allows x and y to be equal.
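
A plain-integer model (illustrative Python, not the circuit code) shows the off-by-one directly:

```python
# Illustrative model: sub_then_dec computes z = x - y and w = z - dec with
# dec = 1; the soundness of "x > y" depends on range-checking w, not z.
def in_range(v):
    """Stand-in for the 264-bit multi-range check on a bigint result."""
    return 0 <= v < 2**264

def buggy_assert_greater(x, y):
    z = x - y      # intermediate result
    w = z - 1      # final result (unused by the check -- that's the bug)
    return in_range(z)  # only proves x >= y

def fixed_assert_greater(x, y):
    z = x - y
    w = z - 1
    return in_range(w)  # proves x > y

assert buggy_assert_greater(42, 42)      # wrongly accepts x == y
assert not fixed_assert_greater(42, 42)  # correctly rejects x == y
assert fixed_assert_greater(43, 42)
```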

**Impact.** This "off-by-one" error is enough to break the indexed Merkle tree: You can insert an existing entry y by providing _itself_ as low node witness $x := y$ and then "prove" that $x < y$. Thus, this finding breaks the account set.

**Recommendation**.
Update the `multi_range_check` statement to apply it to `(w0, w1, w2)`.

**Client Response.** The issue was fixed as recommended.

### Underconstrained Comparison: Mixup Of Range-Checked Slices

- **Severity**: High
- **Location**: zeko_transaction_snark.ml

<!-- TODO might add this general background info to introduction -->

**Description**. Kimchi, the proof system underlying Zeko, supports "multi-range checks": using a combination of 3 gates (2x `range_check0` and 1x `range_check1`) to perform an 88-bit range check on all three limbs of a 264-bit bigint. These range checks were originally added to Kimchi to support foreign field arithmetic, and were later reused for bigint arithmetic on the _native field_, to enable relatively cheap full-field comparisons.

However, circuits using multi-range checks were only added to o1js, Kimchi's TypeScript frontend, and not to the OCaml frontend Snarky that Zeko uses. Therefore, Zeko had to reimplement some gadgets down to the lowest-level parts, like `multi_range_check` and wrappers around the `range_check0` and `range_check1` gates.

In `multi_range_check`, two calls to `range_check0` serve to range-check the lower 64 bits of each of the lower two limbs. The remaining 24 bits of each of those limbs are left to be range-checked by `range_check1` (which uses up two PLONK rows, with four 12-bit lookups each). To enable this, `range_check0` returns two 12-bit slices of the limb, which are passed on to `range_check1`.

```ocaml
let multi_range_check x y z =
  let* v0p0, v0p1 = range_check0 x in
  let* v1p0, v1p1 = range_check0 y in
  range_check1 ~v2:z ~v0p0 ~v0p1 ~v1p0 ~v1p1
```

Unfortunately, `range_check0` returns the wrong 12-bit slices: `(v0p4, v0p5)` instead of `(v0p0, v0p1)`. As a result, some slices are range-checked twice while `v0p0`, `v0p1` and `v1p0`, `v1p1` (the most significant slices of their respective limb) are not range-checked at all.

Due to the mistake, the entire multi-range check is ineffective, since an arithmetically valid decomposition of each limb can move an arbitrarily large value into the unchecked slices. With its multi-range check turned off, the higher-level `assert_greater_than_full x y` gadget can successfully be called with any x and y.
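
The effect can be modeled with plain integers. This is an illustrative simplification of the gate equations, not the actual constraints:

```python
# Illustrative model: an 88-bit limb is decomposed as
#   limb = p0 * 2^76 + p1 * 2^64 + low64,
# where low64 is checked to 64 bits inside range_check0 and the two 12-bit
# slices p0, p1 are meant to be looked up in range_check1. If the wrong
# slices are forwarded, p0 and p1 go unchecked, and the decomposition
# admits values far above 2^88.
def limb_constraint_holds(limb, p0, p1, low64, slices_checked):
    if not (0 <= low64 < 2**64):
        return False
    if slices_checked and not (0 <= p0 < 2**12 and 0 <= p1 < 2**12):
        return False
    return limb == p0 * 2**76 + p1 * 2**64 + low64

huge = 2**200  # far outside the 88-bit range
p0, rest = divmod(huge, 2**76)
p1, low64 = divmod(rest, 2**64)

# With the slices unchecked (the bug), the "range check" accepts 2^200:
assert limb_constraint_holds(huge, p0, p1, low64, slices_checked=False)
# With the slices checked (the fix), it is rejected:
assert not limb_constraint_holds(huge, p0, p1, low64, slices_checked=True)
# A genuine 88-bit value passes either way:
v = 2**88 - 1
q0, r = divmod(v, 2**76)
q1, l = divmod(r, 2**64)
assert limb_constraint_holds(v, q0, q1, l, slices_checked=True)
```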

**Impact.** As in the two previous findings, the indexed Merkle tree is broken and the prover is free to insert entries that already existed.

**Recommendation**. Change the return values of `range_check0` to be `(v0p0, v0p1)`.

**Client Response.** The issue was fixed as recommended.

<!--
DUMP FROM CHAT:

Gregor Mitscha-Baude, [21.01.25 18:05]
final bug in the range check gadgets: in range_check0 you return the wrong two limbs, it's v0p0 and v0p1 that have to be returned (see the naming scheme in range_check1!). these get range-checked in the other gate, so it's important to return the right ones
 -->

### Withdrawal Delay Not Enforced Due To Missing Equality Check

- **Severity**: High
- **Location**: rule_bridge_finalize_withdrawal.ml

<!-- TODO: This could arguably be a "high", since it completely destroys an important defense-in-depth mechanism in case of a rollup hack -->

<!-- TODO: it seems the spec is mis-designed, since it requires `withdrawal_delay` time since the _commit which commits to the current inner action state_.
  this is necessarily the latest commit, and you'd have to wait additional time (withdrawal_delay) on TOP of that before you can withdraw.
  so withdrawal in this design is made hard and attackable simply by frequent commits
-->

**Description.** On Zeko, withdrawals to the L1 are subject to a `withdrawal_delay`: a minimum time that must pass before the withdrawal can be finalized, counted in number of slots since the first L1 commit which includes the L2 withdrawal transaction.

We found that the method to finalize a withdrawal does not actually check that the withdrawal is contained in the mentioned commit. Consequently, a caller of the method could witness an unrelated, much older commit. In that case, the check that `withdrawal_delay` slots passed since the commit tells us nothing about the withdrawal itself. In other words, the withdrawal delay is not enforced.

In contrast to the implementation, the pseudo-code in `bridge-spec.md` does include an assertion that enforces the withdrawal delay:

```ocaml
(* ZKSECURITY: this is pseudo-code taken from the spec *)
let do_finalize_withdrawal
  (* ... ZKSECURITY: skipping lines ... *)
  let inner_action_state =
    List.append actions_after_withdrawal withdraw_action withdrawal_params :: action_state_before_withdrawal
  in
  let outer_action_state =
    List.append actions_after_commit @@ Commit commit :: action_state_before_commit
  in
  (* ZKSECURITY: the following assertion is MISSING from the implementation! *)
  assert commit.inner_action_state = inner_action_state ;
  let inner_action_state_length = List.length actions_after_withdrawal + 1 + withdrawal_index in
  { account_id = holder_account_l1
  ; balance_change = -withdrawal.amount
  ; may_use_token
  ; authorization_kind = Proof
  ; children =
    [ { public_key = helper_token_owner_l1
      (* ... ZKSECURITY: skipping lines ... *)
      }
    ; { public_key = zeko_pk
      ; authorization_kind = outer_authorization_kind
      ; preconditions =
        { action_state = outer_action_state
        ; app_state = { inner_action_state ; inner_action_state_length ; paused = false }
        ; valid_while =
          { lower = commit.valid_while.upper + withdrawal_delay
          ; upper = infinity }
        }
      }
    ]
  }
```

However, in our view, that spec design is also not ideal because it imposes tight bounds on when it is possible to withdraw at all. To understand this point, note that in the pseudo-code, the `inner_action_state` has to precisely match the rollup's current app state:

```ocaml
      ; preconditions =
        (* ... *)
        ; app_state = { inner_action_state ; inner_action_state_length ; paused = false }
```

Because of this, the assertion `commit.inner_action_state = inner_action_state` implies that this is in fact the most recent commit, since any later commit would overwrite the app state with a different inner action state.

But the transaction is only allowed at least `withdrawal_delay` slots _after_ the commit:

```ocaml
      ; preconditions =
        (* ... *)
        ; valid_while =
          { lower = commit.valid_while.upper + withdrawal_delay
```

That means, for your withdrawal to succeed, you have to know the latest commit, wait for the withdrawal delay, and only _then_ squeeze in your transaction before the next commit comes along.

Depending on how long the withdrawal delay is and how frequently commits happen, this might only leave a small time window in which withdrawal is even possible. Also, it makes it easy for rollup sequencers to prevent any withdrawals just by committing too often.
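
To make the timing constraint concrete, here is a small arithmetic sketch (illustrative Python with hypothetical names, not protocol code): under the spec's design, finalization is only possible in the gap between `commit.valid_while.upper + withdrawal_delay` and the next commit.

```python
# Illustrative model: the withdrawal window under the spec's design.
# The valid_while precondition opens the window at
#   commit.valid_while.upper + withdrawal_delay,
# and the implicit "latest commit" requirement closes it at the next commit.
def withdrawal_window(commit_upper, next_commit_slot, withdrawal_delay):
    """Number of slots in which finalization is possible; 0 if none."""
    open_at = commit_upper + withdrawal_delay
    return max(0, next_commit_slot - open_at)

# Infrequent commits leave a usable window:
assert withdrawal_window(commit_upper=100, next_commit_slot=300,
                         withdrawal_delay=50) == 150
# Committing more often than the delay closes the window entirely:
assert withdrawal_window(commit_upper=100, next_commit_slot=140,
                         withdrawal_delay=50) == 0
```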

**Impact.** Turning back to the issue that withdrawal delays are not enforced: This will not be a problem as long as the rollup operates normally.

The withdrawal delay is a defense-in-depth mechanism. The idea is that, when the rollup gets hacked or is known to be vulnerable, it can be paused, which makes withdrawals impossible. The delay simply makes sure that there is enough time for honest operators to pause the rollup in such a scenario, before funds are lost to the attacker.

Because of the issue we found, the pausing mechanism is unlikely to help, since an attacker can commit (potentially invalid state) and withdraw in a single transaction.

**Recommendation.** The issue should obviously be fixed, but we advise against following the current spec. The witnessed commit should not be required to exactly have `commit.inner_action_state = inner_action_state`.

Instead, a third ASE should be used to show that there is a line of actions starting from the `withdraw_action` and ending at `commit.inner_action_state`. This would let you use any commit which includes the withdrawal, not necessarily the latest one.

**Client response.** The issue was fixed in commit [c291d4d](https://github.com/zeko-labs/zeko/pull/287/commits/c291d4d37e7fcf0008670f73a5de135c0e93e43d) by connecting the commit's inner action state to the end of the withdrawal ASE. It turns out no third ASE was needed: the withdrawal ASE was freed up to be used for this purpose, by no longer tying its target to the rollup's `Outer_state`. In fact, the method doesn't need any precondition on the `Outer_state.inner_action_state`: the fact that the withdrawal is contained in the commit, and the commit is contained in the outer action state, is enough to show that the withdrawal happened on a valid L2.

<!-- Similar flexibility is already allowed in the finalize-cancelled-deposit rule. -->

<!--
DUMP FROM CHAT:

Gregor Mitscha-Baude, [20.01.25 15:40]
I found what seems like unintended behaviour in the finalize-withdrawal rule:
You don't prove any connection between the commit action (on the L1) and the withdrawal action (on the inner action state).
So, while it does prove that the withdrawal action was part of _some_ commit (otherwise it couldn't be in the inner action state), it doesn't prove that it was part of the specific commit you have in the circuit.

In the spec, you have the following line to connect the two, which is not present in the circuit:
assert commit.inner_action_state = inner_action_state ;

The impact is that the withdrawal_delay is ineffective: It is only enforced by requiring that at least withdrawal_delay slots passed since the commit, but since the commit is an arbitrary commit, this doesn't say anything about the actual delay since the withdrawal.

Gregor Mitscha-Baude, [20.01.25 15:45]
Having said that, tbh I struggle to understand why you even have a withdrawal delay. It seems there's something I'm missing. In my current understanding of the protocol, being part of a commit proves that the withdrawal has a legit money source and is safe to withdraw from the bridge contract.

Las, [20.01.25 16:27]
It’s supposed to be some commit. The minimum delay possible is n + slot of earliest commit with withdrawal.

Las, [20.01.25 16:27]
Prevention of exploits since we can pause before it happens.

Gregor Mitscha-Baude, [20.01.25 16:28]
but it doesn't prove that this commit has the withdrawal it could be a much earlier commit for example

Las, [20.01.25 16:29]
It does in the spec.

Las, [20.01.25 16:29]
Did I misimplement it?

Gregor Mitscha-Baude, [20.01.25 16:29]
but not in the code, is what I'm saying
 -->

### Withdrawal Impossible Since L2 Bridge Account Is Missing Method To Receive Funds

- **Severity**: Medium
- **Location**: bridge_rules.ml

This finding was pointed out to us by the Zeko team during the audit. We include it here for completeness and to flesh out its impact.

**Description**. To withdraw tokens from the L2, users need to first send them to the "inner" bridge account. Later, they finalize the withdrawal on the L1 by witnessing the L2 transaction.

The same L2 account is also used to send out funds for finalizing deposits, and in that method it functions as a "token contract": It controls a child account that stores the deposit index in its state.

To make the deposit design secure, the inner bridge account needs to have its "access" permission set to "proof". Otherwise the deposit index on child token accounts could be manipulated with unproved transactions.

However, "access = proof" implies that it's not possible to simply send tokens to the inner bridge contract, without a dedicated method on that contract to support receiving. Currently, this method is missing.

In other words, in the current design, it's possible to deposit tokens into the L2 but impossible to withdraw them.

**Impact.** It seems likely that this issue would have been noticed quickly after deploying or testing the rollup. It might have necessitated a redeployment, manual repayment of funds and other damage control.

**Recommendation.** Add the missing "receive" rule on the inner bridge contract.

**Client response.** The issue was fixed, and the receive rule added in commit [ea5b251](https://github.com/zeko-labs/zeko/commit/ea5b2516c9d052734231a2326188484d33682b06).

<!--
DUMP FROM CHAT:

Las, [15.01.25 14:38]
Since you might find this, to save you time:
- While looking over the code it seems like I forgot to implement the helper token owner for L1 which is needed since you have many L1 holder accounts.
- The inner holder account needs to have access to = Proof because it has child accounts / it’s a token owner. This also means because of bad design decisions that even though receive might be None it would still need a proof, so it needs an additional rule to allow receiving funds unconditionally (but no children) I think.
-->

### Slot Ranges Use Lower In Place Of Upper Bound

- **Severity**: Medium
- **Location**: zeko_transaction_snark.ml

<!--
DUMP FROM CHAT:

Gregor Mitscha-Baude, [20.01.25 20:05]
found a bug in merge_slot_ranges: you use one of the lower bounds for the upper one
  let*| upper =
    Slot.Checked.(x.upper < y.upper)
    >>= if_ ~typ:Slot.typ ~then_:x.lower ~else_:y.lower
  in
 -->

**Description.**

In Zeko, slot ranges define time windows during which specific operations are valid. When multiple operations are batched together, their slot ranges need to be merged to find a valid intersection.

```ocaml
let merge_slot_ranges (x : Slot_range.var) (y : Slot_range.var) :
    Slot_range.var Checked.t =
  let* lower =
    Slot.Checked.(x.lower < y.lower)
    >>= if_ ~typ:Slot.typ ~then_:y.lower ~else_:x.lower
  in
  let*| upper =
    Slot.Checked.(x.upper < y.upper)
    >>= if_ ~typ:Slot.typ ~then_:x.lower ~else_:y.lower
  in
  ({ lower; upper } : Slot_range.var)
```

We found a bug: when determining the upper bound of the merged range, the code incorrectly uses one of the lower bounds (`x.lower` or `y.lower`) instead of the upper bounds (`x.upper` or `y.upper`). This could lead to deposits being incorrectly marked as timed out, and users might lose the ability to reclaim funds when they should be able to. The sequencer might fail to create valid transaction batches, and valid transactions could be rejected due to invalid time windows.
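
The intended semantics, the intersection of the two ranges, can be contrasted with the bug in a plain-Python model (illustrative, not the circuit code):

```python
# Illustrative model of merge_slot_ranges; ranges are (lower, upper) tuples.
def merge_buggy(x, y):
    lower = y[0] if x[0] < y[0] else x[0]  # max of lowers -- correct
    upper = x[0] if x[1] < y[1] else y[0]  # lower bounds used by mistake
    return (lower, upper)

def merge_fixed(x, y):
    lower = max(x[0], y[0])
    upper = min(x[1], y[1])  # max/min give the proper intersection
    return (lower, upper)

x, y = (10, 100), (20, 80)
assert merge_fixed(x, y) == (20, 80)  # intersection of the two windows
assert merge_buggy(x, y) == (20, 20)  # degenerate window: upper is y.lower
```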

**Impact.** It seems likely that this issue would have been noticed quickly after deploying or testing the rollup. It might have necessitated a redeployment, manual repayment of funds and other damage control.

**Recommendation.** Use the upper bounds to get the merged upper bound.

```diff
  let*| upper =
    Slot.Checked.(x.upper < y.upper)
-    >>= if_ ~typ:Slot.typ ~then_:x.lower ~else_:y.lower
+    >>= if_ ~typ:Slot.typ ~then_:x.upper ~else_:y.upper
```

**Client Response.** The issue has been fixed.

### Transaction Snark: Unsuccessful Account Update Not Prevented

- **Severity**: Medium
- **Location**: transaction_snark.ml

**Description.** In Mina's original transaction snark, invalid transactions are processed without error, but have no effect except that their fee payments are applied. In other words, if your transaction fails, you don't get your transaction fee back.

Zeko chose to change this behaviour: It doesn't apply fees for invalid transactions. To ensure this, constraints are added to the transaction snark designed to make it impossible to prove the state transition that belongs to a failed transaction.

We believe the change, as currently made to the Mina code base, is ineffective.

The following line is added in `Zkapp_command_snark.main`:

```ocaml
(* ZEKO NOTE: We do not accept failure. *)
with_label __LOC__ (fun () -> Boolean.Assert.is_true local.success) ;
```

However, _before_ this line is called, the account update in question was fully processed, with the following logic at the very end (in `zkapp_command_logic.ml`, `Make(...).apply`):

```ocaml
let local_state =
  (* Make sure to reset the local_state at the end of a transaction.
      The following fields are already reset
      - zkapp_command
      - transaction_commitment
      - full_transaction_commitment
      - excess
      so we need to reset
      - token_id = Token_id.default
      - ledger = Frozen_ledger_hash.empty_hash
      - success = true
      - account_update_index = Index.zero
      - supply_increase = Amount.Signed.zero
  *)
  { local_state with
    ledger =
      Inputs.Ledger.if_ is_last_account_update
        ~then_:(Inputs.Ledger.empty ~depth:0 ())
        ~else_:local_state.ledger
  ; success =
      Bool.if_ is_last_account_update ~then_:Bool.true_
        ~else_:local_state.success
```

In words: if this was the last account update in the transaction, `local_state.success` is reset to `true`, no matter its value up until then.

Therefore, in the `is_last_account_update` case, the assertion that requires `.success` to be true has no effect: It happens too late in the logical flow.
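
The ordering problem can be modeled with a few lines of Python (an illustrative model of the control flow, with hypothetical names, not the actual snark logic):

```python
# Illustrative model: the per-transaction success flag is reset to True at
# the last account update *before* the added assertion runs, so the
# assertion never fires for a failure in the last update.
def apply_account_update(success, update_ok, is_last):
    success = success and update_ok
    # end-of-transaction reset, as in zkapp_command_logic's `apply`:
    if is_last:
        success = True
    return success

def main_with_late_assert(updates):
    success = True
    for i, ok in enumerate(updates):
        success = apply_account_update(success, ok,
                                       is_last=(i == len(updates) - 1))
        assert success  # the added check -- but it runs after the reset
    return True

# A transaction whose last update fails still passes the late assertion:
assert main_with_late_assert([True, False]) is True
```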

**Impact.** Assessing the impact with 100% certainty is infeasible as it relies on the details of a large amount of out-of-scope code (the entire transaction snark). But from reviewing the diff to Mina, and the way Zeko uses the transaction snark, it seems likely to us that failing transactions are not ruled out, and are allowed to behave the same as in Mina's original code, with fees applied.

**Recommendation.** We recommend to add the assertion `Boolean.Assert.is_true local_state.success` inside `apply`, right before local state is reset in the code section quoted above. This is the appropriate location for such an assertion, and comes for free even if added in addition to the assertion that already exists.

**Client Response.** The issue was fixed, by modifying the definition of `add_check`, the method used to set `local_state.success` to false. Instead of doing that, it now asserts that the input check passes, so that `success` can never become `false`. In addition, the recommended assertion was also added.

<!--
discussed live with Las in the meeting

notes:
in zeko, unsuccessful account updates could still be used for bridging

wait.. what if an AU is unsuccessful on L1 but used to finalize deposit on L2??

doesn't affect L1 action state => can't be used

same goes for L2 -> L1 direction, so unsuccessful updates might not be as bad as I thought

TODO discuss this in intro
-->

<!--
  TODO(gregor): something I might have missed here is that in Zeko, the sequencer pays fees.
  if unsuccessful transactions can take fees, the sequencer account can be drained by failing txs with high fees.

  but who receives those fees? probably the sequencer again. to it doesn't matter?
-->

### Complex Assumptions On Unproven Initial Rollup State

- **Severity**: Medium
- **Location**: rollup_state.ml

**Description.** We found instances where Zeko's outer contract only guarantees correct behavior under additional assumptions on the initial rollup state.

- The initial account set `acc_set` must contain the same accounts as the initial ledger `ledger_hash`
  - If it contains fewer, accounts can be added twice; if it contains more, some accounts can never be processed. Both can have serious security implications.
- The initial nodes in the account set must form a linked list ordered by `key`, and there has to be a minimum and maximum node such that hashes outside the min-max range can not be efficiently computed.
  - Again, if this is not the case it could allow duplicated accounts or forbidden accounts.
- Genesis ledger accounts must not contain positive token balances, unless they either legitimately correspond to initial funds in the L1 bridge contracts; or serve as liquidity for L2 bridge contracts that cannot be withdrawn in any way except by finalizing L2 deposits.
  - Note: Details depend on the permissions configuration on these accounts, which is not part of the codebase yet.

Finally, the commit rule assumes that the inner rollup account is stored at the left-most node in the ledger Merkle tree. It does check that the public key of that account matches the expected one, but does not check the token id:

```ocaml
(* We check that we're dealing with the correct account. *)
let* () =
  with_label __LOC__ (fun () ->
      PC.Checked.Assert.equal old_inner_acc.public_key
        (constant PC.typ inner_public_key) )
(* ZKSECURITY: we didn't check the token id, so it could still be an incorrect account *)
```

If the left-most tree node contained a different account (with the same public key), the sequencer could potentially commit L2 actions that don't correspond to L2 transactions. This could be exploited, for example, to withdraw tokens the sequencer doesn't own.

In summary, security critically relies on the correctness of initial state on the outer rollup account.

Zeko's contracts don't contain any provable initialization logic, which means that initial state must be set in a signed transaction by the controller of Zeko's private key, before (potentially) changing account permissions and yielding control over state to the zkapp code.

**Impact.** On Mina, it's hard to transparently ensure that an account was initialized correctly. The `proved_state` state flag shows whether the entire account state was last updated by a proof, but this is not reliable information if the account allows verification key upgrades: the entire state can be swapped out by switching to a different verification key and back, in a single transaction, and `proved_state` would still be true.

Therefore, any way of verifying correct initialization has to involve looking at the transaction history. However, the assumptions on Zeko's rollup account are sufficiently complex that even that seems hard to achieve for a motivated user. It's unlikely that state initialization, on which security of the rollup relies, will be independently verified.

**Recommendation.** We recommend to add a provable init method to the outer contract, which sets the entire state and is called after deploying the rollup. The method should have a way of ensuring it can only be called once.

Adding an init method would make it easier to verify validity of the rollup's state, with a script that loops through all transactions and checks that the verification key was not changed since initialization.

Furthermore, designing the init method would force the team to precisely nail down the requirements on initial state, making the init process easier to audit and better documented.

Regarding `Rule_commit` and its reliance on having the inner account in the left-most Merkle node: we recommend to simply add a check for the token id. This would eliminate one more complex assumption at no cost.

**Client response.** The Zeko team opted for a different way of documenting the intended initial state: They added an official spec for the initial state, that rollup implementations can be checked against. This avoids the overhead of adding an entire zkapp method that can only be used once, and mostly addresses the concern we raised regarding auditability.

<!--
DUMP FROM CHAT:

Gregor Mitscha-Baude, [16.01.25 09:28]
I have something, which at first glance looked like a critical issue:
* In rule_comit.ml, to prove that you correctly move the inner action state forward, you look up some account in the ledger, and show that it's the inner account by checking it has the same public key
* However, having the same public key still leaves open the possibility that they are different accounts with different token ids
* If an operator could use an inner account with a different token id here, they could move the inner action state in arbitrary ways

Gregor Mitscha-Baude, [16.01.25 09:30]
I realize that the inner public key is special in that it has an inaccessible private key, and the only way its accounts can end up on the L2 ledger is by making them genesis accounts

Gregor Mitscha-Baude, [16.01.25 09:32]
Still, AFAIU the sequencer can pick the genesis ledger, and could maliciously put some inner token accounts there to later trick the commit rule with actions that never happened?

Gregor Mitscha-Baude, [16.01.25 09:33]
So I would recommend to assert equality of token ids in that snippet

Las, [16.01.25 11:15]
The public key check is completely redundant actually. It’s fixed to being the left-most account in the merkle tree.

Las, [16.01.25 11:17]
Genesis ledger being correct is our assumption. We can’t really guard against this in circuit

In the future I’ve considered having a separate piece of software that verifies that the zkapp has been correctly deployed.

Gregor Mitscha-Baude, [16.01.25 12:38]
ah so it could also be fixed by fixing all path entries to have is_left = true!
but currently that's not the case - the merkle path is witnessed and can point anywhere in the tree

(btw, assert_equal is just adding wires between variables and is for free in terms of constraints, so efficiency-wise it doesn't matter how you constrain it to be the correct account)

Gregor Mitscha-Baude, [16.01.25 12:50]
that makes sense, so this makes my finding above non-critical to fix since you already rely on genesis ledger correctness
(and now I'm interested in how else the genesis ledger could mess with users in hard-to-detect ways)

I think in any case you want to try and minimize the amount of damage that can be done by tweaking the genesis ledger, wdyt? If only to keep the complexity of any security analysis low

Las, [16.01.25 12:52]
The Merkle path should already be all left I think

Las, [16.01.25 12:52]
Good to know about equality assertions, I wasn’t entirely confident that was the case.

Las, [16.01.25 12:53]
If it’s low cost then sure, even if just to prevent accidentally initializing it incorrectly.

Gregor Mitscha-Baude, [16.01.25 12:53]
ah damn, I missed that you used a special implied_root function, yep ok

Las, [16.01.25 12:54]
But in general you can’t have any confidence that it’s been done correctly without verifying it externally since proved state is borked
 -->

### Overconstrained Comparison Due To Forcing Zero Carry

- **Severity**: Medium
- **Location**: zeko_transaction_snark.ml

**Description.** For its indexed Merkle tree, Zeko implements an $x > y$ assertion where x and y can range over the full field. This is achieved by representing x and y as bigints, and leveraging snarky's custom foreign field gates. We found that the circuit is overconstrained and will not work for about half the (x, y) input pairs.

If x and y are bigints, we can check that $x > y$ by performing a bigint subtraction $w = x - y - 1$ followed by a range-check on $w$. This is implemented in `sub_then_dec` by calling the `ForeignFieldAdd` gate twice: once for $z = x - y$ and once for $w = z - 1$.

In general, `ForeignFieldAdd` allows for overflow around a field, but here, the gate is instantiated with a modulus of 0 and a `field_overflow` of 0. For the sign, -1 is used to instantiate a subtraction. In this simplified form, the gate asserts two equations:

$$
\begin{aligned}
x_{01} - y_{01} &= z_{01} + 2^{176} c \\
x_2 - y_2 &= z_2 - c
\end{aligned}
$$

Here $x_2$ is the highest 88-bit limb (of three), $x_{01} = x_0 + 2^{88} x_1$ is a concatenation of the lower two limbs, and same for y and z. The carry $c$ is constrained by the gate to be either 0, 1 or -1.

In `sub_then_dec`, two addition gates are chained where the second subtracts the hard-coded bigint $1 = (1, 0, 0)$. This implies the constraints

$$
\begin{aligned}
x_{01} - y_{01} - 1 &= w_{01} + 2^{176} (c_0 + c_1) \\
x_2 - y_2 &= w_2 - (c_0 + c_1)
\end{aligned}
$$

for some carries $c_0$ and $c_1$, where w is the final output.

Since the limbs of $x$, $y$ and $w$ are range-checked, these equations must hold over the integers and not just modulo the circuit field. Adding the first equation to $2^{176}$ times the second cancels out the carries and proves that $x - y - 1 = w$, as desired. This way of using the gates is sound.
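As a sanity check, this argument can be replayed over the integers in a standalone model (Python here for brevity; this is our own sketch, not the circuit code): for random $x > y$, the correct witness and carry satisfy both equations, and recombining them yields $x - y - 1 = w$.

```python
import random

L = 88
LO = 1 << (2 * L)  # 2^176, bound for the two combined low limbs

def split(v):
    """Split an integer into (combined low limbs, high limb)."""
    return v % LO, v // LO

random.seed(0)
for _ in range(1000):
    x = random.getrandbits(254)
    y = random.getrandbits(254)
    if x <= y:
        x, y = y + 1, x  # ensure x > y

    x01, x2 = split(x)
    y01, y2 = split(y)

    # Correct witness w = x - y - 1 and the total carry c = c0 + c1.
    w = x - y - 1
    w01, w2 = split(w)
    c = (x01 - y01 - 1 - w01) // LO

    assert c in (-1, 0)  # within the gate's allowed range {-1, 0, 1}
    assert x01 - y01 - 1 == w01 + LO * c  # first equation
    assert x2 - y2 == w2 - c              # second equation
    # Adding the first equation to 2^176 times the second cancels c:
    assert x - y - 1 == w01 + LO * w2
```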

The only remaining challenge is witnessing the correct carries and outputs so that the constraints hold. This is where the implementation takes a shortcut and hard-codes both carries to 0, so that the first equation simplifies to

$$
x_{01} - y_{01} - 1 = w_{01}.
$$

This forces $x_{01} > y_{01}$, since $w_{01} \in [0, 2^{176})$. However, even if $x > y$ we cannot deduce any relationship between the lower two limbs; in fact, we expect $x_{01} \le y_{01}$ in about half of all cases. In other words, this circuit is overconstrained.
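This failure mode is easy to reproduce outside the circuit. The following standalone Python model (our own, not the circuit code) forces both carries to zero and counts how often no valid $w_{01} \in [0, 2^{176})$ exists for random pairs with $x > y$:

```python
import random

LO = 1 << 176  # bound for the two combined low limbs

random.seed(1)
trials, unsatisfiable = 10_000, 0
for _ in range(trials):
    x = random.getrandbits(254)
    y = random.getrandbits(254)
    if x <= y:
        x, y = y + 1, x  # ensure x > y
    # With both carries hard-coded to zero, the first equation demands
    #   x01 - y01 - 1 = w01  with  w01 in [0, 2^176),
    # which is impossible whenever x01 <= y01.
    if x % LO <= y % LO:
        unsatisfiable += 1

print(f"unsatisfiable fraction: {unsatisfiable / trials:.3f}")  # close to 0.5
```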

**Impact.** With proper testing before deployment, this bug would have surfaced quickly. If the rollup were deployed as is, it would soon come to a halt because of a transaction that cannot be applied in-circuit: `sub_then_dec` is called with hashes of account ids as inputs, and these are essentially random field elements.

**Recommendation.** Implement the computation of at least one carry instead of hard-coding both to zero.

To see that one carry is enough, consider the extreme case where $x > y$, but $x_{01} = 0$ and $y_{01} = 2^{176} - 1$. The first equation becomes

$$
-2^{176} = w_{01} + 2^{176} (c_0 + c_1)
$$

This is satisfiable with $c_0 = -1$, so it's fine to hard-code $c_1 = 0$.
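The complete witness computation is a small amount of code. In this standalone Python sketch (our own model, not the circuit code), the carry is $-1$ exactly when the low limbs need a borrow, and the resulting witness satisfies both gate equations for all $x > y$:

```python
import random

L = 88
LO = 1 << (2 * L)  # 2^176

def sub_then_dec_witness(x, y, dec=1):
    """Witness w = x - y - dec with a single carry, assuming x - y >= dec."""
    x01, x2 = x % LO, x // LO
    y01, y2 = y % LO, y // LO
    lo = x01 - y01 - dec
    c = -1 if lo < 0 else 0   # borrow from the high limb when needed
    w01 = lo - LO * c         # in [0, 2^176)
    w2 = x2 - y2 + c          # high limb; no further borrow since x - y >= dec
    assert 0 <= w01 < LO and w2 >= 0
    assert x01 - y01 - dec == w01 + LO * c  # low-limb gate equation
    assert x2 - y2 == w2 - c                # high-limb gate equation
    return w01 + LO * w2

random.seed(2)
for _ in range(1000):
    x = random.getrandbits(254)
    y = random.getrandbits(254)
    if x <= y:
        x, y = y + 1, x  # ensure x > y
    assert sub_then_dec_witness(x, y) == x - y - 1
```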

In fact, this also means that the subtraction and decrement could be done in a single `ForeignFieldAdd` gate:

```ocaml
(* Subtraction and decrement in one gate: adding [dec] to the lowest limb
   of [y] folds the decrement into the subtraction, and [carry] is the
   witnessed carry instead of a hard-coded zero. *)
let* () =
  add_plonk_constraint
    (ForeignFieldAdd
       { left_input_lo = x0
       ; left_input_mi = x1
       ; left_input_hi = x2
       ; right_input_lo = Field.Var.add y0 dec
       ; right_input_mi = y1
       ; right_input_hi = y2
       ; sign = Field.of_int (-1)
       ; carry
       ; field_overflow = Field.(constant typ zero)
       ; foreign_field_modulus0 = Field.zero
       ; foreign_field_modulus1 = Field.zero
       ; foreign_field_modulus2 = Field.zero
       } )
```

**Client response.** The issue was fixed by computing and witnessing the carry, and collapsing the two foreign field gates into one, as recommended.


### Bridge Rules Must Be Compiled As Separate Rule Sets

- **Severity**: Medium
- **Location**: bridge_rules.ml

**Description.** Zeko's design features two bridge contracts: An "outer" bridge contract on the L1, and an "inner" bridge contract on the L2.

Furthermore, the outer contract can be in two different "states": enabled and disabled. When disabled, the contract should have a different verification key, with only a single method: the method to enable it.

In total, we end up with three separate sets of contract methods / verification keys:

- Outer contract, when enabled:
  - `Rule_bridge_finalize_withdrawal`
  - `Rule_bridge_finalize_cancelled_deposit`
  - `Rule_bridge_disable`
- Outer contract, when disabled:
  - `Rule_bridge_enable`
- Inner contract:
  - `Rule_bridge_finalize_deposit`
  - A currently missing rule to receive tokens, see [finding](#finding-medium-01-missing-receive-rule-inner-bridge)

In the current `bridge_rules.ml` implementation, instead of compiling three separate rule sets, all rules are compiled together into a single verification key.

**Impact.** If the mixed rule set were deployed to both the L1 and the L2, contract interactions outside the actual protocol design would be available to users.

In particular, a "disabled" outer contract with all the other rules still present would not actually be disabled.

**Recommendation.** Split up compilation into three different rule sets.

**Client response.** The issue was fixed in commit [c291d4d](https://github.com/zeko-labs/zeko/pull/287/commits/c291d4d37e7fcf0008670f73a5de135c0e93e43d).


### Hard-Coded Constant Used As Decrement Value

- **Severity**: Low
- **Location**: zeko_transaction_snark.ml

**Description.**
The function `sub_then_dec` performs the subtraction $w = x - y - dec$, where $x$, $y$, and $w$ are big integers represented as three limbs and $dec$ is a small constant value.

However, when computing the witness of $w$, the implementation uses a hard-coded constant of $1$ instead of deriving the value from the `dec` parameter, as seen in the following snippet.

```ocaml
(* The decrement is hard-coded as the bigint 1 = (1, 0, 0) instead of
   being derived from the [dec] parameter. *)
let* (w0, w1), w2 =
  exists
    Typ.(F.typ * F.typ * F.typ)
    ~compute:
      (let- z0 in
       let- z1 in
       let- z2 in
       Zeko_as_prover.sub ~x0:z0 ~x1:z1 ~x2:z2 ~y0:Field.one
         ~y1:Field.zero ~y2:Field.zero
       |> As_prover.return )
in
```

**Impact.**
The circuit is incomplete if the value of $dec$ is not equal to one: the witnessed $w$ does not satisfy the constraints in that case. In practice, no issue arises because in the audited code base, the function is always called with $dec = 1$.

**Recommendation.**
Derive the `(y0, y1, y2)` values from the `dec` parameter when computing the witness of `(w0, w1, w2)`.
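The incompleteness can be illustrated with a minimal standalone model (Python, our own sketch, not the circuit code) that separates witness generation from the constraints: the witness hard-codes the decrement to $1$, while the constraints enforce the actual `dec`, so the two disagree whenever `dec` is not $1$.

```python
def witness_w(x, y):
    # Mirrors the snippet above: the decrement is hard-coded to 1,
    # regardless of the dec parameter.
    return x - y - 1

def constraints_hold(x, y, dec, w):
    # The gates enforce w = x - y - dec.
    return w == x - y - dec

x, y = 1000, 7
w = witness_w(x, y)
assert constraints_hold(x, y, 1, w)      # dec = 1: the circuit is satisfiable
assert not constraints_hold(x, y, 2, w)  # dec = 2: the witness fails
```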

### Configuration Error In Custom Token Bridge Contract

- **Severity**: Informational
- **Location**: bridge_rules.ml

**Description.** The bridge contracts are designed to work for bridging either MINA or custom tokens. However, due to differences in how the zkapp protocol handles the two kinds of tokens, the logic sometimes has to disambiguate between the two cases.

This is why the bridge rules depend on abstract configuration modules called `Deposit_params` and `Withdrawal_params` which have two instantiations, one for MINA and one for custom tokens: `Deposit_params_base` vs `Deposit_params_custom`, and same for withdrawals.

In `bridge_rules.ml`, two different functors called `Make_mina` and `Make_custom` are defined for the two cases. However, both functors accidentally use `Deposit_params_base` and `Withdrawal_params_base`, i.e. the native MINA configuration.

The Zeko team was already aware of this bug when we notified them.


### Duplicate Branch Name Used For Multiple Rules

- **Severity**: Informational
- **Location**: rule_action_witness.ml, rule_inner_action_witness.ml, rule_pause.ml

**Description.** The rules defined in `rule_action_witness.ml`, `rule_inner_action_witness.ml`, and `rule_pause.ml` all set the same branch name:

```ocaml
branch_name = "zeko action witness"
```

This string appears to be used as the identifier of the Pickles inductive rule. A second identifier derived from it also includes a hash of the constraint system, so the duplication is likely harmless; still, distinct names would make the rules easier to tell apart.

### No Rule To Unpause The Outer Contract

- **Severity**: Informational
- **Location**: rule_pause.ml

**Description.** The rule in `rule_pause.ml` allows pausing the outer zkapp, but no contract method exists to unpause it.

The Zeko team clarified that unpausing is intended to be done with a signed transaction, so no dedicated contract method is needed.

---

This report was published on the [zkSecurity Audit Reports](https://reports.zksecurity.xyz) site by [ZK Security](https://www.zksecurity.xyz), a leading security firm specialized in zero-knowledge proofs, MPC, FHE, and advanced cryptography. For the full list of audit reports, see [llms.txt](https://reports.zksecurity.xyz/llms.txt).
