Description. Aleo’s implementation of Bullshark includes a new feature that allows the validator set to update itself on every newly created block. Here, a block is an object containing a DAG of batches of transactions that are being committed, potentially including several anchor batches (as explained in “Commit Flow Can Lead To Safety Violation”).
This dynamic committee feature is not included in the Bullshark or Narwhal papers, and is not specified by Aleo, which makes it hard to determine whether the protocol is safely designed and implemented. Liveness issues could arise if nodes get stuck because they cannot determine the current set of validators. Worse, safety issues could arise if the dynamic change of the quorum threshold required to commit leads to forks.
Currently, the reconfiguration works by allowing a change of committee at every single block. This committee of validators is dictated by the execution of smart contracts triggered by the transactions of a committed block. A different committee means that different validators, with different stakes, will now be part of advancing the consensus protocol. We summarize how reconfiguration affects the protocol in the following diagram:

Reconfiguring a validator set is arguably one of the trickiest features to get right. In the rest of this section we give a number of examples of unexpected or surprising behavior that result from the addition of a dynamic committee.
Handling outdated quorum size. As the set of validators as well as their stake can change dynamically, and validators might have missed commits, the value of f+1 (resp. 2f+1) to commit (resp. certify) vertices might appear too low or too high to a validator.
This can result in a scenario where a validator wrongly commits a batch: because it relies on an outdated committee, a validator could believe that a certified batch is an anchor and that it has received enough certified votes.
While this issue seems addressable by fixing “Commit Flow Can Lead To Safety Violation”, it can still lead to previously uncommitted batches being committed. For example, if the previous scenario triggers a commit, which leads to traversing the DAG back to a previous anchor, that anchor will be committed (because it is on the path) even if it should not have been.
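To make the outdated-quorum scenario concrete, the following is a minimal sketch in Rust. The `Committee` type, its threshold formulas, and `can_commit_anchor` are simplified assumptions introduced for illustration only; they are not Aleo's actual types or arithmetic. The sketch shows that the same set of votes can clear the commit threshold under a stale committee and fail it under the current one.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified committee: validator name -> stake.
pub struct Committee {
    stakes: HashMap<&'static str, u64>,
}

impl Committee {
    pub fn new(members: &[(&'static str, u64)]) -> Self {
        Self { stakes: members.iter().copied().collect() }
    }

    pub fn total_stake(&self) -> u64 {
        self.stakes.values().sum()
    }

    /// Assumed "f + 1" threshold: strictly more than one third of the total stake.
    pub fn availability_threshold(&self) -> u64 {
        self.total_stake() / 3 + 1
    }

    /// Assumed "2f + 1" threshold: strictly more than two thirds of the total stake.
    pub fn quorum_threshold(&self) -> u64 {
        self.total_stake() * 2 / 3 + 1
    }

    /// Sums the stake of the voters that this committee recognizes.
    pub fn stake_of(&self, voters: &HashSet<&'static str>) -> u64 {
        voters.iter().filter_map(|v| self.stakes.get(v)).sum()
    }
}

/// Returns `true` if `voters` carry enough stake to commit an anchor under `view`.
pub fn can_commit_anchor(view: &Committee, voters: &HashSet<&'static str>) -> bool {
    view.stake_of(voters) >= view.availability_threshold()
}
```

With a stale view of four validators holding 10 stake each, two votes (20 stake) exceed the assumed commit threshold of 14. Once a fifth validator with 40 stake has joined, the threshold becomes 27 and the same two votes are no longer sufficient, so a lagging validator and an up-to-date validator reach different conclusions about the same anchor.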
Handling new validators. Could a node get stuck because it cannot receive enough messages to advance in the protocol, due to new validators being unable to communicate with the lagging node? As can be seen in process_batch_certificate_from_peer, messages from new validators are discarded:
    async fn process_batch_certificate_from_peer(
        &self,
        peer_ip: SocketAddr,
        certificate: BatchCertificate<N>,
    ) -> Result<()> {
        // TRUNCATED...
        if !self.gateway.is_authorized_validator_address(author) {
            // Proceed to disconnect the validator.
            self.gateway.disconnect(peer_ip);
            bail!("Malicious peer - Received a batch certificate from a non-committee member ({author})");
        }
        // TRUNCATED...
    }
where is_authorized_validator_address considers only the previous and current committees:
    /// Returns `true` if the given address is an authorized validator.
    pub fn is_authorized_validator_address(&self, validator_address: Address<N>) -> bool {
        // Determine if the validator address is a member of the previous or current committee.
        // We allow leniency in this validation check in order to accommodate these two scenarios:
        //  1. New validators should be able to connect immediately once bonded as a committee member.
        //  2. Existing validators must remain connected until they are no longer bonded as a committee member.
        //     (i.e. meaning they must stay online until the next block has been produced)
        self.ledger
            .get_previous_committee_for_round(self.ledger.latest_round())
            .map_or(false, |committee| committee.is_committee_member(validator_address))
            || self
                .ledger
                .current_committee()
                .map_or(false, |committee| committee.is_committee_member(validator_address))
    }
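The effect of this two-committee check on a lagging node can be sketched as follows. `LocalView` and `accepted_authors` are hypothetical names introduced for this illustration, and validators are identified by plain strings instead of addresses; only the "previous OR current" shape of the check is taken from the code above.

```rust
use std::collections::HashSet;

/// Hypothetical local view of a node: only the previous and current committees are known.
pub struct LocalView {
    previous: HashSet<&'static str>,
    current: HashSet<&'static str>,
}

impl LocalView {
    pub fn new(previous: &[&'static str], current: &[&'static str]) -> Self {
        Self {
            previous: previous.iter().copied().collect(),
            current: current.iter().copied().collect(),
        }
    }

    /// Same shape as the check above: membership in the previous OR the current committee.
    pub fn is_authorized(&self, validator: &str) -> bool {
        self.previous.contains(validator) || self.current.contains(validator)
    }
}

/// Returns the certificate authors that `view` would accept; the others are dropped.
pub fn accepted_authors<'a>(view: &LocalView, authors: &[&'a str]) -> Vec<&'a str> {
    authors.iter().copied().filter(|a| view.is_authorized(a)).collect()
}
```

If the network has moved on to a committee {a, b, e, f} while a lagging node still sees {a, b, c, d} as both its previous and current committee, that node accepts certificates only from a and b. Depending on stake, this may be too few for it to ever reach a quorum and catch up.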
Handling timers. Timeout-related logic designed for non-dynamic settings might not work as intended in dynamic settings, which might affect the liveness of the protocol. For example, lagging validators will wait and attempt to include the wrong certificates in their edges in odd rounds:
    // Compute the stake for the leader certificate.
    let (stake_with_leader, stake_without_leader) =
        self.compute_stake_for_leader_certificate(leader_certificate_id, current_certificates, &previous_committee);
    // Return 'true' if any of the following conditions hold:
    stake_with_leader >= previous_committee.availability_threshold()
        || stake_without_leader >= previous_committee.quorum_threshold()
        || self.is_timer_expired()
as well as in even rounds:
    // Determine the leader of the current round.
    let leader = match previous_committee.get_leader(current_round) {
        Ok(leader) => leader,
        Err(e) => {
            error!("BFT failed to compute the leader for the even round {current_round} - {e}");
            return false;
        }
    };
    // Find and set the leader certificate, if the leader was present in the current even round.
    let leader_certificate = current_certificates.iter().find(|certificate| certificate.author() == leader);
    *self.leader_certificate.write() = leader_certificate.cloned();

    self.is_even_round_ready_for_next_round(current_certificates, previous_committee, current_round)
Nodes will also perform checks on incorrect thresholds:
    if self.is_timer_expired() {
        debug!("BFT (timer expired) - Checking for quorum threshold (without the leader)");
        // Retrieve the certificate authors.
        let authors = certificates.into_iter().map(|c| c.author()).collect();
        // Determine if the quorum threshold is reached.
        return committee.is_quorum_threshold_reached(&authors);
    }
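The even-round case depends on every validator deriving the same leader from its committee view. As a rough illustration, the sketch below uses a round-robin election over a sorted member list. This election rule and the function names are assumptions made for the example, since the report does not show how get_leader is implemented. The point is only that any election that is a function of the committee can yield different leaders for different committee views.

```rust
/// Hypothetical round-robin leader election over a sorted member list.
/// This is an assumption for illustration, not the election used by `get_leader`.
pub fn leader_for_round<'a>(members: &[&'a str], round: u64) -> Option<&'a str> {
    if members.is_empty() {
        return None;
    }
    // Sort so that every node with the same member set derives the same order.
    let mut sorted = members.to_vec();
    sorted.sort_unstable();
    let index = (round % sorted.len() as u64) as usize;
    Some(sorted[index])
}

/// Returns `true` if two committee views agree on the leader of `round`.
pub fn views_agree_on_leader(view_a: &[&str], view_b: &[&str], round: u64) -> bool {
    leader_for_round(view_a, round) == leader_for_round(view_b, round)
}
```

Under this assumed rule, a lagging validator with the view {a, b, c, d} expects a to lead round 4, while a validator that already knows about a fifth member e expects e. The lagging validator would therefore wait for, and reference, a leader certificate that the rest of the network does not treat as the anchor.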
Handling new connections. In the layers below, connecting to new validators (or ignoring old ones) might lead to unexpected behavior. For example, in Primary::propose_batch, the primary will not propose a batch if it is not connected to enough validators (according to its own view of the validator set):
    // Check if the primary is connected to enough validators to reach quorum threshold.
    {
        // Retrieve the committee to check against.
        let committee = self.ledger.get_previous_committee_for_round(round)?;
        // Retrieve the connected validator addresses.
        let mut connected_validators = self.gateway.connected_addresses();
        // Append the primary to the set.
        connected_validators.insert(self.gateway.account().address());
        // If quorum threshold is not reached, return early.
        if !committee.is_quorum_threshold_reached(&connected_validators) {
            debug!(
                "Primary is safely skipping a batch proposal {}",
                "(please connect to more validators)".dimmed()
            );
            trace!("Primary is connected to {} validators", connected_validators.len() - 1);
            return Ok(());
        }
    }
Recommendations. This finding is a tricky one as it appears that fixing the dynamic committee feature is not trivial. Ensuring that safety is correctly guarded throughout validator set changes, especially as validators might have an outdated view of the committee while advancing through newer rounds of the protocol, is not easy.
We recommend specifying a solution and writing up a proof that it is safe. Note also that recently Sui Lutris came out with a description of committee reconfigurations in a similar protocol.
Client Response. Aleo fixed the finding by relying on a previous committee from 100 rounds in the past. This fix relies on the assumption that the network will not remain asynchronous for more than 100 rounds. In addition, Aleo added a “committee ID” field to the batch headers to ensure that validators don’t process messages with incompatible committee IDs, allowing the protocol to give up on liveness instead of safety in the rare cases where the 100-round lookback is not enough.
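The general idea of the fix, a fixed lookback combined with a committee ID check, can be sketched as follows. All names here (`COMMITTEE_LOOKBACK_ROUNDS`, `CommitteeRecord`, `CommitteeHistory`, `accepts_header`) are hypothetical and the committee ID is reduced to a plain integer. This is an illustration of the approach described in the response, under those assumptions, and not Aleo's implementation.

```rust
use std::collections::BTreeMap;

/// Assumed lookback distance, taken from the "100 rounds" in the client response.
pub const COMMITTEE_LOOKBACK_ROUNDS: u64 = 100;

/// Hypothetical committee record: an opaque ID plus the round at which it took effect.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CommitteeRecord {
    pub id: u64,
    pub starting_round: u64,
}

/// Hypothetical ledger view: committees indexed by the round at which they took effect.
pub struct CommitteeHistory {
    by_round: BTreeMap<u64, CommitteeRecord>,
}

impl CommitteeHistory {
    pub fn new(records: Vec<CommitteeRecord>) -> Self {
        Self { by_round: records.into_iter().map(|r| (r.starting_round, r)).collect() }
    }

    /// Returns the committee in effect `COMMITTEE_LOOKBACK_ROUNDS` rounds before `round`.
    pub fn lookback_committee(&self, round: u64) -> Option<&CommitteeRecord> {
        let target = round.saturating_sub(COMMITTEE_LOOKBACK_ROUNDS);
        // Latest committee whose starting round is at or before the target round.
        self.by_round.range(..=target).next_back().map(|(_, record)| record)
    }

    /// Accepts a batch header only if its committee ID matches the local lookback committee.
    pub fn accepts_header(&self, header_round: u64, header_committee_id: u64) -> bool {
        self.lookback_committee(header_round).map_or(false, |c| c.id == header_committee_id)
    }
}
```

In this sketch, a node that knows committees starting at rounds 0 and 150 uses the first one for round 200 and the second one for round 250. A node that has fallen more than 100 rounds behind and only knows the first committee rejects round-250 headers carrying the second committee's ID, so it stalls instead of processing them against the wrong committee.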