# Audit of Aleo Upgradability Update

- **Client**: Aleo
- **Date**: June 14th, 2025
- **Tags**: Protocols, Upgrades

## Introduction

Between June 1st and June 14th, zkSecurity was tasked by Provable to audit the upcoming "Program Upgradability" update for Aleo.
Two consultants worked over two weeks to review the codebase for potential security issues and provide design feedback and recommendations.

Prior to the engagement, the team spent two weeks getting acquainted with the SnarkVM codebase and the Aleo ecosystem. In the last several days of the audit, we also verified the fixes for all findings reported in this assessment.

## Aleo Program Upgradability

Prior to this upgrade, all Aleo programs were *immutable*:
a program was deployed once and could not be modified or upgraded thereafter.
With the upcoming "Program Upgradability" update, programs may now change after initial deployment.
This has interesting implications throughout the system, parts of which may have relied on the immutability of programs up until this point.
We explore these implications and note a number of security considerations with the proposed design and implementation (prior to the release).

### Glossary

For the reader's convenience, we include a brief glossary of central terms used within the Aleo SnarkVM:

- **Program** : Collection of functions, mappings, records, and closures. Called a "contract" in other systems.
- **Program ID** : Unique program identifier, composed of a name (e.g. `example.aleo`) and a network identifier.
- **Transition** : Call to a single function in a program.
- **Execution** : Sequence of *transitions*, for the root call and any internal calls.
- **Deployment** : Deployment of a new program or (after this update) an upgrade of an existing program.
- **Transaction** : An execution or a deployment.
- **Constructor** (new) : Function run during deployment; restricts upgradability of the program.

### Constructors

This upgrade introduces the ability to upgrade programs on-chain by redeploying them.
Every time a program is upgraded, the `edition` of the program must increment by one; prior to this update,
the edition was not exposed by snarkVM and was internally fixed to zero.
When, and under which conditions, a program can be upgraded is controlled by a new method added to all newly deployed programs, called "the constructor".

For instance, the following constructor disallows any upgrades by requiring the `edition` of the *new program* to be zero:

```text
program example.aleo;

constructor:
  assert.eq example.aleo/edition 0u16;
```

Note that the constructor is also run during the initial deployment of the program,
and that the constructor above can only be satisfied during the initial deployment.
As a consequence, a constructor that can never be satisfied makes its program undeployable. For example, the following program can never be deployed:

```text
program example.aleo;

constructor:
  assert.eq false true;
```
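
The rules above can be summarized as a predicate over each (re)deployment. The following is a minimal, hypothetical Rust model rather than snarkVM code: `ProgramState`, `check_deployment`, and the closure standing in for the constructor are illustrative names, and the model assumes, as described above, that the initial deployment carries edition 0 and each upgrade increments the edition by exactly one.

```rust
/// Hypothetical, simplified model of an on-chain program entry (not snarkVM's types).
#[derive(Debug, Clone, PartialEq)]
pub struct ProgramState {
    pub edition: u16,
    pub owner: String,
}

/// Hypothetical check for a (re)deployment: edition 0 for a fresh program, exactly
/// `previous + 1` for an upgrade, and the constructor must accept in both cases.
pub fn check_deployment<F>(
    current: Option<&ProgramState>,
    new_edition: u16,
    new_owner: &str,
    constructor: F,
) -> Result<ProgramState, String>
where
    F: Fn(u16, &str) -> bool,
{
    let expected = match current {
        None => 0,
        Some(state) => state
            .edition
            .checked_add(1)
            .ok_or_else(|| "edition overflow".to_string())?,
    };
    if new_edition != expected {
        return Err(format!("expected edition {expected}, got {new_edition}"));
    }
    // The constructor runs on every deployment, including the very first one.
    if !constructor(new_edition, new_owner) {
        return Err("constructor rejected the deployment".to_string());
    }
    Ok(ProgramState { edition: new_edition, owner: new_owner.to_string() })
}
```

With a constructor equivalent to `assert.eq example.aleo/edition 0u16`, such as `|edition: u16, _owner: &str| edition == 0`, the first deployment succeeds and every later upgrade is rejected; a constructor that always returns `false` rejects even the first deployment.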

The following constructor requires that the program and any upgrades be deployed by a specific address:

```text
program example.aleo;

constructor:
  assert.eq example.aleo/program_owner <ADDRESS>;
```

Constructors have access to the mappings of a program, and hence the rules
for upgrading a program can be controlled dynamically by manipulating the mappings through the program's other functions.
However, the constructor itself is immutable and cannot be modified or upgraded.
All legacy programs, which do not have a constructor, are immutable.
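
As an illustration of such a dynamic policy, the hypothetical Rust model below (not Aleo instructions) stores an `admin` entry in a mapping. The constructor logic never changes, but a separate program function can rotate the admin, which changes who may deploy future upgrades. `Mappings`, the `config` mapping, the `admin` key, and `set_admin` are names assumed for this sketch only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a program's on-chain mappings: a single
/// `config` mapping from key to value.
pub struct Mappings {
    pub config: HashMap<String, String>,
}

impl Mappings {
    /// Returns true if `who` is the admin currently stored in the mapping.
    fn admin_is(&self, who: &str) -> bool {
        self.config.get("admin").map_or(false, |admin| admin == who)
    }

    /// Hypothetical program function that rotates the admin; only the current admin may call it.
    pub fn set_admin(&mut self, caller: &str, new_admin: &str) -> Result<(), String> {
        if !self.admin_is(caller) {
            return Err("caller is not the admin".to_string());
        }
        self.config.insert("admin".to_string(), new_admin.to_string());
        Ok(())
    }
}

/// The constructor itself is fixed; it only reads the current mapping value and
/// compares it against the program owner of the incoming deployment.
pub fn constructor(mappings: &Mappings, program_owner: &str) -> bool {
    mappings.admin_is(program_owner)
}
```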

### Permissible Upgrades

Upgrades may only expand the interface of a program or leave it unchanged,
e.g. by *adding* new functions or new mappings (which can be read externally).
This is important to avoid breaking dependent programs that call functions of the upgraded program: such programs would no longer type-check after the upgrade,
as they would reference e.g. functions which no longer exist.
Note, however, that functions can be "functionally" deleted by making them trivially unsatisfiable,
and may otherwise change their behavior in arbitrary ways; thus there is no
guarantee that a dependent program will remain satisfiable after the upgrade.
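
A simplified model of this rule is a subset check on the externally visible names: every function and mapping of the old program must still exist in the new one. This is a hypothetical sketch (`Interface` and `is_permissible_upgrade` are illustrative names); the actual check in snarkVM, referred to later in this report as `check_upgrade_is_valid`, is more detailed and is not reproduced here. Note that the sketch, like the rule it models, never compares function bodies.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of a program's externally visible interface.
pub struct Interface {
    pub functions: BTreeSet<&'static str>,
    pub mappings: BTreeSet<&'static str>,
}

/// An upgrade may add items but never remove them. Function *bodies* are not
/// compared, so a function can still be made trivially unsatisfiable.
pub fn is_permissible_upgrade(old: &Interface, new: &Interface) -> bool {
    old.functions.is_subset(&new.functions) && old.mappings.is_subset(&new.mappings)
}
```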

### Program Owner

The program owner is the address which deployed (the latest edition of) a program.
The constructor uses the program owner to identify the party deploying the upgrade,
allowing it to check whether that party is eligible to deploy the program; depending on the constructor logic, this party may therefore have special privileges.
Cryptographically, the program owner is bound to the deployment by signing the "deployment id",
which is meant to uniquely identify the program being deployed in the transaction.

## Findings

### Unstable Program Load Order Can Stall Node Bootup

- **Severity**: High
- **Location**: snarkVM

During node bootup, programs are loaded in order of block height. However, within a single block, the load order of multiple programs is not stable. This instability can cause loading failures and stall node bootup.

```rust
/// Initializes the VM from storage.
#[inline]
pub fn from(store: ConsensusStore<N, C>) -> Result<Self> {
    [...]
    // Retrieve the list of deployment transaction IDs and their associated block heights.
    let deployment_ids = transaction_store.deployment_transaction_ids().collect::<Vec<_>>();
    let mut deployment_ids = cfg_into_iter!(deployment_ids)
        .map(|transaction_id| {
            // Retrieve the height.
            let height =
                match block_store.find_block_hash(&transaction_id)?.map(|hash| block_store.get_block_height(&hash))
                {
                    Some(Ok(Some(height))) => height,
                    _ => {
                        bail!("Block height for deployment transaction '{transaction_id}' is not found in storage.")
                    }
                };
            Ok((transaction_id, height))
        })
        .collect::<Result<Vec<_>>>()?;
    // Sort the deployment transaction IDs by their block heights.
    deployment_ids.sort_unstable_by(|(_, a), (_, b)| a.cmp(b));

    // Load the deployments in order of their block heights.
    const PARALLELIZATION_FACTOR: usize = 256;
    for (i, chunk) in deployment_ids.chunks(PARALLELIZATION_FACTOR).enumerate() {
        // Load the deployments.
        let deployments = cfg_iter!(chunk)
            .map(|(transaction_id, _)| {
                // Retrieve the deployment from the transaction ID.
                match transaction_store.get_deployment(transaction_id)? {
                    Some(deployment) => Ok(deployment),
                    None => bail!("Deployment transaction '{transaction_id}' is not found in storage."),
                }
            })
            .collect::<Result<Vec<_>>>()?;
        // Add the deployments to the process.
        // Note: This iterator must be serial, to ensure deployments are loaded in the order of their dependencies.
        deployments.iter().try_for_each(|deployment| process.load_deployment(deployment))?;
    }
    [...]
}
```

SnarkVM enforces restrictions on `finalize_cost` and `number_of_calls` for programs, which are checked during program initialization. If Program B imports and calls Program A, an upgrade to Program A may cause Program B's functions to exceed these restrictions. This is not an issue when the upgrade is deployed, since Program B is not re-checked after Program A is upgraded. However, during node bootup every program is re-checked, and because the load order within a block is not stable, Program B may be loaded after Program A's upgrade. The restriction check for Program B then fails, preventing the node from booting.

Example sequence:

1. Block 1: Deploy Program A.
2. Block 2: Deploy Program B (which imports A) and upgrade Program A, increasing its calls or finalize instructions.
3. During node bootup, when loading programs in Block 2, if Program B is loaded after Program A's upgrade, the restriction check fails and node bootup is stalled.

#### Recommendation

It is recommended to load programs in the order of their transaction index within each block during node bootup to ensure a stable and deterministic load order.
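
A minimal sketch of the recommended ordering, assuming each deployment is annotated with its block height and its index within that block (the `DeploymentRef` type and its field names are hypothetical). Because no two transactions occupy the same `(height, index)` position, the sort key is unique, so even an unstable sort yields a single deterministic order.

```rust
/// Hypothetical record of where a deployment transaction sits on chain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeploymentRef {
    pub id: u64,
    pub height: u32,
    pub index: usize,
}

/// Sort by (block height, transaction index). The key is unique per transaction,
/// so `sort_unstable_by_key` produces the same order regardless of the input order.
pub fn load_order(mut deployments: Vec<DeploymentRef>) -> Vec<u64> {
    deployments.sort_unstable_by_key(|d| (d.height, d.index));
    deployments.into_iter().map(|d| d.id).collect()
}
```

For the example sequence above, Program A (block 1), then Program B (block 2, index 0), then the upgrade of Program A (block 2, index 1) are always loaded in that order, matching the order in which they were originally processed.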

#### Client Response

The client fixed this by sorting the deployment transactions according to their block height and transaction index:

```rust
let mut deployment_ids = cfg_into_iter!(deployment_ids)
    .map(|transaction_id| {
        // Retrieve the block hash for the deployment transaction ID.
        let Some(hash) = block_store.find_block_hash(&transaction_id)? else {
            bail!("Deployment transaction '{transaction_id}' is not found in storage.")
        };
        // Retrieve the height.
        let Some(height) = block_store.get_block_height(&hash)? else {
            bail!("Block height for deployment transaction '{transaction_id}' is not found in storage.")
        };
        // Get the corresponding block's transactions.
        let Some(transactions) = block_store.get_block_transactions(&hash)? else {
            bail!("Transactions for deployment transaction '{hash}' is not found in storage.")
        };
        // Find the index of the deployment transaction ID in the block's transactions.
        let Some(index) = transactions.transactions().get_index_of(transaction_id.deref()) else {
            bail!("Transaction for deployment transaction '{transaction_id}' is not found in storage.")
        };
        Ok((transaction_id, (height, index)))
    })
    .collect::<Result<Vec<_>>>()?;
```

### Edition of a Deployment is Malleable

- **Severity**: High
- **Location**: snarkVM

When a program is deployed, the deployment structure contains an `edition` field that tracks the version number:

```rust
Deployment {
    edition: 0,
    program: PROGRAM_A,
    verification_keys: VKS_A,
    program_checksum: CHECKSUM_A,
}
```

The program owner signs this deployment, and subsequent upgrades increment the edition number.
However, since the edition field is not included in the signature, an attacker can:

1. Take an old deployment with its valid signature.
2. Modify the edition number to be higher than the current version.
3. Redeploy the old program version with the manipulated edition number.

#### Attack

Consider this sequence of events:

1. Initial deployment (edition 0) with `PROGRAM_A` - signed by the program owner
2. Upgrade deployment (edition 1) with `PROGRAM_B` - signed by the program owner
3. Attacker takes the old deployment, changes edition to 2, and redeploys `PROGRAM_A`

This works because the edition field is not included in the signature,
and results in a potentially unauthorized (as defined by the constructor) rollback to an older version.
Note that the rollback must satisfy the conditions in `check_upgrade_is_valid`, which means that the old version of the program, `PROGRAM_A`,
must have the *same* interface as the new version, `PROGRAM_B`; for instance, `PROGRAM_B` might be an updated version of `PROGRAM_A` which includes a bug fix but otherwise has the same functionality.
Even if a program is *only deployed once*, an attacker can still cause a denial of service by redeploying it with the maximum edition number `u16::MAX`; this makes the program unupgradable regardless of the conditions in the constructor.
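
The root cause can be illustrated with a toy model in which the signed "deployment id" covers the program and verification keys but not the edition, mirroring the flaw described above. This is a hypothetical sketch: `ToyDeployment`, `deployment_id_v1`, and `owner_signature_valid` are illustrative names, and `DefaultHasher` (which is *not* cryptographic) stands in for snarkVM's hashes and signatures.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified deployment (program text and keys as strings).
#[derive(Debug, Clone)]
pub struct ToyDeployment {
    pub edition: u16,
    pub program: String,
    pub verification_keys: String,
}

/// Flawed toy id: the edition is NOT part of the hashed data.
pub fn deployment_id_v1(d: &ToyDeployment) -> u64 {
    let mut h = DefaultHasher::new();
    d.program.hash(&mut h);
    d.verification_keys.hash(&mut h);
    h.finish()
}

/// Toy stand-in for the owner signature check: the owner signed one specific id.
pub fn owner_signature_valid(signed_id: u64, d: &ToyDeployment) -> bool {
    signed_id == deployment_id_v1(d)
}
```

Re-using the id signed for edition 0 on a copy of the deployment with its edition changed to 2, or to `u16::MAX`, still passes the toy signature check, which is exactly the replay described in the attack.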

#### Recommendation

Include the edition field in the program owner's signature (or add it to the deployment id).

More generally, we recommend making the deployment id depend on the contents of the *whole* deployment to avoid *any possible* malleability issues.

#### Client Response

The fix implemented by Provable changes the computation of the `deployment_tree` (of which the `deployment_id` is the root) into:

```rust
pub fn deployment_tree_v2(deployment: &Deployment<N>) -> Result<DeploymentTree<N>> {
    // Ensure the number of leaves is within the Merkle tree size.
    Self::check_deployment_size(deployment)?;

    // Compute a hash of the deployment bytes.
    let deployment_hash = N::hash_sha3_256(&to_bits_le!(deployment.to_bytes_le()?))?;

    // Prepare the header for the hash.
    let header = to_bits_le![deployment.version()? as u8, deployment_hash];

    // Prepare the leaves.
    let leaves = deployment.program().functions().values().enumerate().map(|(index, function)| {
        // Construct the transaction leaf.
        Ok(TransactionLeaf::new_deployment(
            u16::try_from(index)?,
            N::hash_bhp1024(&to_bits_le![header, function.to_bytes_le()?])?,
        )
        .to_bits_le())

    });

    // Compute the deployment tree.
    N::merkle_tree_bhp::<TRANSACTION_DEPTH>(&leaves.collect::<Result<Vec<_>>>()?)
}
```

This means every leaf in the tree (one per function) is bound to the hash `deployment_hash` of the entire deployment,
which includes the `edition` (as well as the verification keys).
As a result, the owner signature effectively covers the *whole* deployment, as recommended.
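
A toy version of this construction shows the effect: because every leaf mixes in a hash of the whole serialized deployment, changing only the edition changes every leaf, and therefore the root that the owner signs. This is a hypothetical sketch: `ToyDeployment`, `to_bytes`, and `leaves_v2` are illustrative names, `DefaultHasher` is a non-cryptographic stand-in for SHA3-256 and BHP, and the Merkle tree over the leaves is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ToyDeployment {
    pub edition: u16,
    pub functions: Vec<String>,
    pub verification_keys: Vec<String>,
}

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Hypothetical length-prefixed serialization covering *all* fields, including the edition.
fn to_bytes(d: &ToyDeployment) -> Vec<u8> {
    let mut bytes = d.edition.to_le_bytes().to_vec();
    for list in [&d.functions, &d.verification_keys] {
        bytes.extend_from_slice(&(list.len() as u64).to_le_bytes());
        for item in list {
            bytes.extend_from_slice(&(item.len() as u64).to_le_bytes());
            bytes.extend_from_slice(item.as_bytes());
        }
    }
    bytes
}

/// Each leaf binds (version, hash of the whole deployment) to one function,
/// following the structure of the header used in `deployment_tree_v2`.
pub fn leaves_v2(d: &ToyDeployment, version: u8) -> Vec<(u16, u64)> {
    let deployment_hash = hash_of(&to_bytes(d));
    d.functions
        .iter()
        .enumerate()
        .map(|(index, function)| (index as u16, hash_of(&(version, deployment_hash, function))))
        .collect()
}
```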

### `Operand::Edition` and `Operand::Checksum` Can Return Stale Values in the Function Scope

- **Severity**: Medium
- **Location**: snarkVM

The new operands `Operand::Edition` and `Operand::Checksum` are designed to retrieve the edition and checksum of a given program. Currently, they are valid in both the function scope (off-chain execution) and the finalize scope (on-chain execution). Since the edition and checksum of a program can change after an upgrade, these operands are expected to always provide the latest values. However, in the function scope, they are assigned as constants in the circuit:

```rust
match operand {
    // If the operand is the checksum, retrieve the checksum from the stack.
    Operand::Checksum(program_id) => {
        let checksum = match program_id {
            Some(program_id) => *self.get_external_stack(program_id)?.program_checksum(),
            None => *self.program_checksum(),
        };
        Ok(circuit::Value::Plaintext(circuit::Plaintext::from(checksum.map(circuit::U8::constant))))
    }
    // If the operand is the edition, retrieve the edition from the stack.
    Operand::Edition(program_id) => {
        let edition = match program_id {
            Some(program_id) => *self.get_external_stack(program_id)?.program_edition(),
            None => *self.program_edition(),
        };
        Ok(circuit::Value::Plaintext(circuit::Plaintext::from(circuit::Literal::U16(
            circuit::U16::new(circuit::Mode::Constant, edition),
        ))))
    }
}
```

The verifying key of the circuit is fixed at program deployment. As a result, in the function scope, the edition and checksum values are also fixed. Even if another program is upgraded, these values remain unchanged, so the operands may return stale information. For example:

1. Deploy program `foo.aleo`, which retrieves the edition of `bar.aleo` using the `Operand::Edition` operand. The current edition of `bar.aleo` is `0`.
2. Upgrade `bar.aleo`, increasing its edition to `1`.
3. Call `foo.aleo` to get the edition of `bar.aleo`. It still returns `0`, which is now outdated.
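
The staleness can be reproduced with a small model in which the function-scope value is captured once when `foo.aleo` is deployed (playing the role of a constant fixed by the verifying key), while the finalize scope reads live on-chain state. `Registry` and `DeployedFoo` are hypothetical names used only for this sketch, which ignores proving entirely.

```rust
use std::collections::HashMap;

/// Hypothetical on-chain registry of program editions.
pub struct Registry {
    pub editions: HashMap<&'static str, u16>,
}

/// Hypothetical deployed `foo.aleo`: the edition of `bar.aleo` seen in the function
/// scope is captured at deployment time, like a constant baked into the circuit.
pub struct DeployedFoo {
    bar_edition_constant: u16,
}

impl DeployedFoo {
    pub fn deploy(registry: &Registry) -> Self {
        Self { bar_edition_constant: registry.editions["bar.aleo"] }
    }

    /// Function scope (off-chain proof): returns the baked-in constant.
    pub fn function_scope_edition(&self) -> u16 {
        self.bar_edition_constant
    }

    /// Finalize scope (on-chain): reads the current state.
    pub fn finalize_scope_edition(registry: &Registry) -> u16 {
        registry.editions["bar.aleo"]
    }
}
```

After `bar.aleo` is upgraded to edition 1, the function-scope value of the already deployed `foo.aleo` still reports 0, while the finalize scope reports 1.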

#### Recommendation

It is recommended to disallow the use of `Operand::Edition` and `Operand::Checksum` in the function scope to prevent returning stale values.

#### Client Response

The client opted to remove both `Operand::Edition` and `Operand::Checksum` from the set of allowed operands for in-circuit Aleo instructions.
They remain accessible from finalize, which can still be used to obtain these values inside the circuit, should the user wish to do so:
the values are witnessed in the circuit and passed to finalize, which then checks that they agree with `Operand::Edition` and `Operand::Checksum` as obtained in finalize.
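
The pattern described by the client can be sketched as follows: the function exposes the edition it assumed as an output, and finalize rejects the transaction unless that value equals the live edition. This is a hypothetical Rust model (`FunctionOutput` and `finalize` are illustrative names); in an actual program the comparison would be an assertion in finalize against the value obtained there.

```rust
/// Hypothetical output of the off-chain function: the edition it assumed (witnessed in circuit).
pub struct FunctionOutput {
    pub claimed_bar_edition: u16,
}

/// Hypothetical finalize logic: abort unless the claimed value matches the on-chain edition.
pub fn finalize(output: &FunctionOutput, live_bar_edition: u16) -> Result<(), String> {
    if output.claimed_bar_edition == live_bar_edition {
        Ok(())
    } else {
        Err(format!(
            "stale edition: function assumed {}, chain has {}",
            output.claimed_bar_edition, live_bar_edition
        ))
    }
}
```

A proof generated against an outdated edition is still valid as a proof, but the transaction fails in finalize, so the stale value can never be acted upon on-chain.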

### No Explicit Binding Between Requests and Programs

- **Severity**: Medium
- **Location**: snarkVM

Aleo allows the delegation of SNARK computation to a third party by creating
"requests", which are subsequently proved by that third party.
Requests are signatures on the inputs/outputs of every function called (each call being a "transition") during the execution of the transaction.
Prior to the upgradability update, all programs were immutable, meaning the transitions in a request were guaranteed
to lead to the execution of a specific and static set of instructions.
With the upgradability update, programs can now be upgraded.
Since a request only signs the inputs/outputs of the transitions it calls and not the program itself,
there is no binding between a request and the current version of the program being invoked,
or any of its dependencies.

This means that the semantics of a request can change between the point of creation (when the user signs the request)
and the point of execution (when the request is proved).
This is most obvious when the callgraph remains the same but the functions in the callgraph are upgraded.
However, because there is no explicit binding between a parent transition (with `is_root = True`)
and its child transitions (from functions invoked by the parent transition),
a malicious prover could theoretically "stitch together" requests into one which proves the execution of a newer version of the root program.
For example, suppose the root program contains a function of the form:

```asm
import foo.aleo;

program bar.aleo;

function root:
  ...
  call foo.aleo/sub r0 into r1;
  ...
```

This makes one call from `bar.aleo/root` to `foo.aleo/sub`.
The user creates two transactions calling `bar.aleo/root`; together, these include two transitions invoking `foo.aleo/sub`.
The `bar.aleo` program is now upgraded to a newer version:

```asm
import foo.aleo;

program bar.aleo;

function root:
  ...
  call foo.aleo/sub r0 into r1;
  call foo.aleo/sub r2 into r3;
  ...
```

This makes two calls from `bar.aleo/root` to `foo.aleo/sub`.
Note that this upgrade is allowed, as the interfaces of both `foo.aleo` and `bar.aleo` remain unchanged.
However, a malicious prover can still "stitch together" the requests from the two original transactions into one which proves the execution of the new version of the root program,
assuming the two original calls to `foo.aleo/sub` have the same inputs as the calls in the new version of `bar.aleo/root`.

#### Recommendation

A request should be bound to the (versions of) all programs in the callgraph:
by including a hash of all the checksums into the signature, ensuring that if any of the programs involved in the transaction changes, the request becomes invalid.
Additionally, the monotonically increasing "edition" of all the programs should be included in the hash as well,
such that an *invalid request* cannot become valid again at a later time due to a program rolling back to an older version.
Observe that since the request is verified in-circuit,
this requires feeding the hash as public input to the circuit.
For security, this hash is only required to be included for the root transition.

We believe that this is the most straightforward semantics for the user to reason about.
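
A minimal sketch of the recommended binding, under stated assumptions: the root request's signed message includes a digest over the `(program id, checksum, edition)` of every program in the callgraph, sorted so that the digest does not depend on traversal order. `ProgramVersion`, `callgraph_digest`, and `root_message` are hypothetical names, and `DefaultHasher` is a non-cryptographic stand-in for a real hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical identity of one program version in the callgraph.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ProgramVersion {
    pub program_id: &'static str,
    pub checksum: [u8; 32],
    pub edition: u16,
}

/// Digest over every program in the callgraph; sorting makes it order-independent.
pub fn callgraph_digest(mut programs: Vec<ProgramVersion>) -> u64 {
    programs.sort();
    programs.dedup();
    let mut h = DefaultHasher::new();
    programs.hash(&mut h);
    h.finish()
}

/// Hypothetical message signed for the root request: (inputs, callgraph digest).
pub fn root_message(inputs: &[u64], programs: Vec<ProgramVersion>) -> (Vec<u64>, u64) {
    (inputs.to_vec(), callgraph_digest(programs))
}
```

Including the edition alongside the checksum is what prevents a rollback from reviving an old request: reverting `bar.aleo` to its old code at a higher edition yields the old checksum but a different digest.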

#### Client Response

The chosen mitigation is different from the suggested mitigation above, but successfully mitigates the issue as well.
The signature of every request is computed over a message which includes:

- The checksum of the program to which the called function belongs.
- The `root_tvk` (the root transition view key).

The checksum is exposed from the SNARK as a public input directly,
whereas the `root_tvk` is exposed from the SNARK as a public input *indirectly*, via the `scm`, which is a commitment to the `root_tvk`:

```rust
let root_tvk = root_tvk.unwrap_or(tvk);
let scm = N::hash_psd2(&[signer.deref().to_x_coordinate(), root_tvk])?;
```

This means that the `root_tvk` acts as a per-authorization nonce (an authorization consisting of multiple transitions/requests).
This binds each request to a unique authorization, which prevents "cut-and-paste" attacks,
where a malicious prover constructs a new valid authorization from requests in multiple different authorizations.
Observe that this alone does not prevent "cut" attacks, where a malicious prover simply removes requests from an authorization.

The overall security argument is then fairly straightforward:

- The checksum of a request uniquely identifies the program and its version.
- Since the Aleo VM is deterministic, the set of child calls is uniquely determined by:
  - The checksum of the parent
  - The arguments to the parent

Applying this observation inductively over the callgraph from the root,
we conclude that the execution is uniquely identified by the set of checksums.
Next, the set of checksums cannot be mauled across different authorizations, because the checksum
of each request is bound to the authorization via a unique nonce, the `root_tvk`,
which separates the domain of signatures across different authorizations.
Finally, "cut" attacks are also ruled out: since the set of calls is uniquely determined,
an honestly produced authorization contains exactly one signed request per call in the callgraph (as determined by the arguments to the root function and the set of checksums),
so removing any request from it would result in a call with a missing request.
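
The domain separation provided by the `root_tvk` can be illustrated with a toy model in which each request carries a tag over `(root_tvk, checksum, inputs)` and the verifier recomputes every tag using the authorization's own `root_tvk`. This is a hypothetical sketch: `Request`, `sign`, and `verify_authorization` are illustrative names, and a keyed `DefaultHasher` with a shared secret stands in for a real public-key signature scheme. It models only the domain separation, not the `scm` commitment or the proof system.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct Request {
    pub checksum: u64,
    pub inputs: Vec<u64>,
    pub tag: u64, // toy "signature" over (root_tvk, checksum, inputs)
}

/// Toy signing: a keyed hash of (root_tvk, checksum, inputs). NOT a real signature.
fn toy_tag(secret: u64, root_tvk: u64, checksum: u64, inputs: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    (secret, root_tvk, checksum, inputs).hash(&mut h);
    h.finish()
}

pub fn sign(secret: u64, root_tvk: u64, checksum: u64, inputs: Vec<u64>) -> Request {
    let tag = toy_tag(secret, root_tvk, checksum, &inputs);
    Request { checksum, inputs, tag }
}

/// Every request in an authorization is checked against that authorization's root_tvk.
pub fn verify_authorization(secret: u64, root_tvk: u64, requests: &[Request]) -> bool {
    requests.iter().all(|r| r.tag == toy_tag(secret, root_tvk, r.checksum, &r.inputs))
}
```

A request lifted from another authorization was tagged under a different `root_tvk`, so an authorization stitched together from requests of two different authorizations fails verification under either nonce.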

### Program Upgrade and Constructor Execution In Deployment Transaction Are Not Atomic

- **Severity**: Medium
- **Location**: synthesizer/src/vm/finalize.rs

During a program upgrade, snarkVM first executes the program constructor and then replaces the old program code (`Stack`) with the new code. Ideally, the replacement of `Stack` should happen immediately after constructor execution to ensure atomicity. However, currently, these steps are performed separately: the program replacement occurs at the end of the entire block execution. As a result, other transactions can be executed in between. This lack of atomicity allows these transactions to read the updated map (modified by the constructor) but still see the old edition/checksum of the program.

For example:

1. Program `foo.aleo` imports `bar.aleo` and reads its edition and map.
2. The upgrade transaction for `bar.aleo` is executed; the constructor runs and updates the map.
3. A transaction for `foo.aleo` is executed, which reads the updated map but the old edition of `bar.aleo` (since the program `Stack` has not been replaced yet).
4. The old program `bar.aleo` is then replaced with the new version.
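
The ordering issue can be reproduced with a small model of block finalization in which the constructor's mapping writes take effect immediately, while the program's edition only changes when its stack is replaced. `Chain`, `apply_upgrade`, and the `replace_stack_immediately` flag are hypothetical names; the model captures only the ordering of the two effects.

```rust
use std::collections::HashMap;

/// Hypothetical view of `bar.aleo`'s state while a block is being finalized.
pub struct Chain {
    pub bar_edition: u16,
    pub bar_map: HashMap<&'static str, u64>,
    pending_edition: Option<u16>,
}

impl Chain {
    pub fn new() -> Self {
        Chain { bar_edition: 0, bar_map: HashMap::from([("limit", 10)]), pending_edition: None }
    }

    /// Upgrade transaction: the constructor writes to the mapping, then the new stack
    /// (and with it the new edition) is installed either immediately or at end of block.
    pub fn apply_upgrade(&mut self, replace_stack_immediately: bool) {
        self.bar_map.insert("limit", 20); // constructor side effect
        let new_edition = self.bar_edition + 1;
        if replace_stack_immediately {
            self.bar_edition = new_edition;
        } else {
            self.pending_edition = Some(new_edition);
        }
    }

    /// A later transaction in the same block (e.g. from `foo.aleo`) reading both values.
    pub fn foo_reads(&self) -> (u64, u16) {
        (self.bar_map["limit"], self.bar_edition)
    }

    /// Deferred stack replacement at the end of the block.
    pub fn end_of_block(&mut self) {
        if let Some(edition) = self.pending_edition.take() {
            self.bar_edition = edition;
        }
    }
}
```

With deferred replacement, `foo_reads` observes the updated mapping together with the old edition, which is the inconsistent state described in step 3; with immediate replacement, it observes the updated mapping and the new edition together.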

#### Recommendation

It is recommended to replace the program `Stack` immediately after executing the constructor to ensure atomicity.

#### Client Response

The client fixed this by replacing the program `Stack` immediately after executing the constructor for the deployment transaction.

### Non-Deterministic Constructor Execution

- **Severity**: Low
- **Location**: synthesizer/process/src/finalize.rs

During deployment, snarkVM's constructor finalization uses the transition ID of the fee transition to seed the ChaCha random number generator. Since there is no cryptographic binding between the deployment transition (in particular, the `program_owner` signature)
and the fee transition paying for the deployment,
an attacker can manipulate the "randomness" produced by `rand.chacha` inside the constructor finalize
by creating a new transaction with a different fee transition while reusing the same deployment.

During program deployment, when a constructor exists, the system executes:

```rust
if deployment.program().contains_constructor() {
    let operations = finalize_constructor(state, store, &stack, *fee.transition_id())?;
    finalize_operations.extend(operations);
    lap!(timer, "Execute the constructor");
}
```

The `fee.transition_id()` is passed to the constructor finalization process and subsequently used in the ChaCha random number generator's seed computation. The seed preimage includes the transition ID as a key component:

```rust
let preimage = if (ConsensusVersion::V1..=ConsensusVersion::V2).contains(&consensus_version) {
    to_bits_le![
        registers.state().random_seed(),
        **registers.transition_id(),  // This comes from fee.transition_id()
        stack.program_id(),
        registers.function_name(),
        self.destination.locator(),
        self.destination_type.type_id(),
        seeds
    ]
} else {
    // Similar structure with additional nonce field
}
```

#### Attack

An attacker can exploit this vulnerability through the following steps:

1. **Create Initial Deployment**: The attacker creates a legitimate program deployment transaction with a constructor that uses `rand.chacha` operations.
2. **Extract Deployment Transition**: The attacker extracts the deployment transition from the original transaction, leaving it completely unchanged.
3. **Generate New Fee Transitions**: The attacker creates multiple new transactions, each containing:
   - The same unchanged deployment transition
   - A different fee transition with a different transition ID
4. **Grind `rand.chacha` Outputs**: By controlling the fee transition ID, the attacker can influence the ChaCha seed and thereby potentially affect the execution of the constructor.

#### Recommendation

The constructor execution should be deterministic based upon:

- The new deployment.
- The current program state.

To achieve this, the constructor should use a deterministic seed that cannot be manipulated by changing fee transitions.

Two obvious solutions exist:

1. **Use Deployment ID**: Replace `fee.transition_id()` with the deployment ID instead.
2. **Use Constant Seed**: Use a constant transition ID for all deployment finalizations.
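
To make the difference concrete, here is a toy model of the seed derivation. `seed_with_fee_id` mimics the audited behavior, where the fee transition ID is part of the preimage, and `seed_with_default_id` mimics the second option, a constant transition ID. The preimage is abbreviated to three fields, `DefaultHasher` stands in for the real hash, and all names, including the placeholder value of `DEFAULT_TRANSITION_ID`, are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical placeholder for a fixed default transition ID.
pub const DEFAULT_TRANSITION_ID: u64 = 0;

/// Abbreviated toy preimage: (state random seed, transition ID, program ID).
fn seed(random_seed: u64, transition_id: u64, program_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (random_seed, transition_id, program_id).hash(&mut h);
    h.finish()
}

/// Audited behavior: the submitter chooses the fee transition, and hence part of the seed.
pub fn seed_with_fee_id(random_seed: u64, fee_transition_id: u64, program_id: &str) -> u64 {
    seed(random_seed, fee_transition_id, program_id)
}

/// Constant-ID behavior: only chain state and the program determine the seed.
pub fn seed_with_default_id(random_seed: u64, program_id: &str) -> u64 {
    seed(random_seed, DEFAULT_TRANSITION_ID, program_id)
}
```

Under the first function, an attacker who resubmits the same deployment with many different fee transitions obtains many different seeds and can keep the most favorable one; under the second, every resubmission yields the same seed for a given chain state.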

#### Client Response

The client decided to seed `rand.chacha` using the default transition ID.

---

This report was published on the [zkSecurity Audit Reports](https://reports.zksecurity.xyz) site by [ZK Security](https://www.zksecurity.xyz), a leading security firm specialized in zero-knowledge proofs, MPC, FHE, and advanced cryptography. For the full list of audit reports, see [llms.txt](https://reports.zksecurity.xyz/llms.txt).
