Introduction
On October 20th, 2025, Celestia tasked zkSecurity with auditing its high-performance data availability codec, RSEMA1D (Reed-Solomon Evans-Mohnblatt-Angeris 1D). The specific code to review was shared via a private GitHub repository. The audit lasted one week with two consultants.
The RSEMA1D codec is based on the ZODA paper, specifically implementing the Hadamard variant from Appendix D with a single column, which served as the primary theoretical foundation for this audit. Additionally, the Celestia team shared a specification as well as an internal Notion document that provided a high-level overview of the codec.
Scope
The scope of this audit covered the entire RSEMA1D repository at commit 65cc9d4c3887936393763d32e0cf4e665d9d2705. All files and components present in that commit were considered in-scope. Dependencies outside this repository (in particular, Celestia’s Leopard Reed–Solomon codec) were not formally in scope, although we occasionally referred to them for context.
Strategic Recommendations
In addition to the findings discussed later in this report, we have the following strategic recommendations:
Audit of underlying Reed-Solomon implementation. This audit assumed that Celestia’s Leopard Reed-Solomon GF16 codec was correctly and securely implemented, particularly the core functions New, Encode, Reconstruct, and GF16Mul. While we reviewed the RSEMA1D layer built on top of this codec, the correctness and security of the entire system fundamentally depend on the underlying Reed-Solomon implementation. We recommend a dedicated audit of the Leopard library, if one has not already been performed, to ensure that its cryptographic primitives are sound and properly implemented.
Review of system integration. The scope of this audit was limited to the RSEMA1D codec library itself. However, the security of a data availability system depends not only on the codec’s correctness but also on how it is integrated and used throughout the broader Celestia stack. We recommend an audit of the integration layer to verify that the codec is being invoked correctly, that data flows between components are secure, and that the overall system maintains the intended security properties when all components interact together.
Overview
In this section we give an overview of data availability, the original ZODA construction, and some detail of the construction implemented by Celestia.
What is Data Availability?
Data availability (DA) refers to the guarantee that a blob of data (e.g. all transaction data in a block of a rollup protocol) has been published and is accessible to a network. Without DA, if even a small portion of the data were withheld, honest nodes would be unable to recover the full original data. In protocols like rollups, where block data is produced off-chain, the lack of a decentralized way to ensure that the rollup data is available (for a certain period of time) could prevent users from verifying and trusting the protocol, or leave them stuck after a protocol halt, unable to download the data needed to reconstruct the latest state. In traditional blockchains, full nodes trivially ensure data availability by downloading 100% of the block data themselves. However, this approach does not scale well: requiring every node to fetch every block in its entirety quickly becomes a throughput bottleneck.
The modern alternative to downloading every block is known as Data Availability Sampling (DAS). In a nutshell, DAS lets nodes probabilistically verify that a block’s full data is available by downloading random samples of it rather than the entire block. If enough nodes have performed enough sampling, we are convinced that the data is available and recoverable if needed, even if some pieces of it go missing.
This probabilistic verification becomes possible through a specific way of coding the block’s data. Before a block is broadcast, its data is processed with erasure coding to introduce redundancy. Specifically, many modern DA layers use a one- or two-dimensional Reed-Solomon encoding approach: the block’s data is arranged in a matrix and then extended with parity chunks for each row/column. This means the block is effectively duplicated with error-correction data in the horizontal/vertical directions. For this extended data matrix, even if some parts are lost, any chunk can be recovered from the redundant pieces. In other words, as long as a sufficient subset of pieces is available, the entire original block can be reconstructed.
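As a toy illustration of how erasure coding enables recovery, the sketch below implements a tiny Reed-Solomon-style code over the prime field $\mathbb{F}_{65537}$. Note that the field choice, evaluation points, and parameters here are ours for illustration only; the actual codec works over the binary field $\mathbb{F}_{2^{16}}$ with a very different algorithm. Two data symbols are extended to four codeword symbols, any two of which suffice to recover the data:

```go
package main

import "fmt"

const p = 65537 // a prime modulus for the toy field (illustrative only)

func add(a, b int) int { return (a + b) % p }
func sub(a, b int) int { return ((a-b)%p + p) % p }
func mul(a, b int) int { return a * b % p }

// inv computes a^(p-2) mod p (Fermat's little theorem).
func inv(a int) int {
	r := 1
	for e := p - 2; e > 0; e >>= 1 {
		if e&1 == 1 {
			r = mul(r, a)
		}
		a = mul(a, a)
	}
	return r
}

// lagrangeAt evaluates, at point t, the unique polynomial passing
// through the points (xs[i], ys[i]).
func lagrangeAt(xs, ys []int, t int) int {
	acc := 0
	for i := range xs {
		num, den := 1, 1
		for j := range xs {
			if j != i {
				num = mul(num, sub(t, xs[j]))
				den = mul(den, sub(xs[i], xs[j]))
			}
		}
		acc = add(acc, mul(ys[i], mul(num, inv(den))))
	}
	return acc
}

func main() {
	data := []int{12345, 6789} // k = 2 data symbols, placed at x = 0, 1

	// Systematic encoding: the codeword is the degree-(k-1) polynomial through
	// the data points, evaluated at x = 0..3 (n = 4); positions 0,1 carry the data.
	codeword := make([]int, 4)
	for x := 0; x < 4; x++ {
		codeword[x] = lagrangeAt([]int{0, 1}, data, x)
	}

	// Suppose positions 0 and 2 are withheld: any k = 2 survivors recover them.
	recovered := lagrangeAt([]int{1, 3}, []int{codeword[1], codeword[3]}, 0)
	fmt.Println("recovered symbol:", recovered, "original:", data[0])
}
```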
DA schemes typically include commitment and proof systems for the encoded data. The block producer publishes a commitment to the entire extended matrix (for example, a KZG or Merkle commitment). Then, for any chunk a node wants to sample, the producer (or any full node) can provide a short proof that this chunk is part of the committed block. This proof allows the node to verify the integrity of that chunk without having to download the whole block.
In short, DA protocols are typically a combination of redundant encoding and succinct proofs, which allow nodes to ask for just a few random pieces of data to contribute to a blob’s availability as if they had downloaded the whole blob.
The ZODA Protocol
ZODA, or Zero-Overhead Data Availability, is a trust-minimized and highly-optimized protocol for providing data availability. Its main contribution is the use of the Ligero check to produce a secure DA scheme based on hash functions.
Note that rsema1d, the protocol implemented by Celestia, is not a direct implementation of the full 2D ZODA protocol, but rather corresponds to the Hadamard variant described in Appendix D of the ZODA paper, simplified to use a single column. This means Celestia only encodes vertically (along columns) rather than applying a full 2D tensor code. We still provide an overview of the full ZODA protocol here in order to introduce the necessary context.
Let’s first consider how data is typically encoded in DA protocols. Typically, an encoder encodes data expressed as a matrix $\tilde{X}$ over both the rows and the columns. This is often referred to as a tensor code, as it is equivalent to applying the tensor product of two separate codes. As an example, a tensor encoding of some data matrix $\tilde{X}$ would be as follows:

$$X = G \tilde{X} G'^\top$$

Where $G$ and $G'$ are the generator matrices for the two respective codes.
Here, ZODA applies a random factor to part of the encoding. Specifically, the final tensor encoding looks like the following:

$$Z = G \tilde{X} D_{\bar{g}} G'^\top$$

Where $D_{\bar{g}}$ is a diagonal matrix composed of random values $\bar{g}_1, \dots, \bar{g}_{k'}$. Assuming $\tilde{X}$ is a $k \times k'$ matrix and $G$, $G'$ are $n \times k$ and $n' \times k'$ generator matrices, the resulting matrix $Z$ is an $n \times n'$ matrix.
(Note that there are multiple variants to how the random factor can be applied in the Appendix of the ZODA paper, but the core idea is the same.)
The protocol is essentially split into three algorithms: encoding, sampling, and decoding, which we introduce in more detail below.
Encoding. To encode the original data matrix $\tilde{X}$, the encoder follows these steps:
- Compute an encoding of the columns and commit to its rows:

$$X_c = G \tilde{X}$$

- Receive a uniformly sampled vector $\bar{g}$ and construct a diagonal matrix $D_{\bar{g}}$ from its uniformly sampled random values.
- Multiply the columns of the data matrix by the random factor and compute the encoding of the resulting rows and commit to its columns:

$$X_r = \tilde{X} D_{\bar{g}} G'^\top$$

- Compute the complete encoding and commit to it:

$$Z = G \tilde{X} D_{\bar{g}} G'^\top = G X_r = X_c D_{\bar{g}} G'^\top$$
Sampling. Other nodes can now “sample” by retrieving random rows and columns from the committed encoding:
- Sample a set $S$ of rows of $X_c$ from the encoder, written as $\{(X_c)_{i,:}\}_{i \in S}$.
- Sample a set $S'$ of columns of $Z$ from the encoder. Denote each column as $x_j$ for every $j \in S'$.
- Compute, for every $i \in S$ and $j \in S'$, from the opened row:

$$w_{i,j} = \left\langle (X_c)_{i,:},\ \bar{g} \circ g'_j \right\rangle$$

where $g'_j$ denotes the $j$-th row of $G'$ and $\circ$ the element-wise product.
- Verify that $(x_j)_i = w_{i,j}$ for each $i \in S$ and $j \in S'$.
The check in the last two steps is the main check, ensuring that the columns of $X_c$ are the correct encodings of the columns of an original matrix $\tilde{X}$. This is similar to the check done in the Ligero paper, which checks that many vectors are codewords using the fact that if a random linear combination of multiple vectors is also a codeword, then they must be codewords as well with high probability.

Similarly, to check that each column of $X_c$ (i.e. each encoded column of $\tilde{X}$) is a codeword, we can check whether a random linear combination of all columns in $X_c$ is a codeword or not. Taking a single row of $G'$ as $g'_j$, we can see that applying $D_{\bar{g}}$ results in a scaled vector $\bar{g} \circ g'_j$. Then, we compute an inner product of this scaled vector with the $i$-th row of $X_c$ (i.e. $(X_c)_{i,:}$) to get a scalar value $w_{i,j} = \langle (X_c)_{i,:}, \bar{g} \circ g'_j \rangle$.

(Note that if $G'$ is the generator matrix of a systematic code, its first $k'$ rows form the identity matrix. This means that for any $j < k'$, the vector $\bar{g} \circ g'_j$ turns all but the $j$-th column of $X_c$ to zero (i.e. it selects only $\bar{g}_j (X_c)_{:,j}$), thus failing to capture a random linear combination of all columns of $X_c$, as is required for the Ligero check. Consequently, when using a systematic code for $G'$, we need to sample columns with $j \geq k'$ to avoid the identity part.)

Since an honestly computed column $x_j = G\,(\tilde{X} D_{\bar{g}} g'^\top_j)$ is a vector multiplied by the generator matrix $G$, we can be certain that it is a valid codeword. Therefore, the $i$-th entry of the sampled column $x_j$ can be compared against the value $w_{i,j}$ computed from the sampled row.

Whether this check is sufficient to ensure that all columns of $X_c$ are valid encodings of $\tilde{X}$ is a probabilistic question that depends on the properties of the code. We direct the reader to Section 3.2 of the ZODA paper for a detailed discussion on this topic.
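The core Ligero fact above — a random linear combination of codewords is itself a codeword, so a non-codeword among the combined vectors is caught with high probability — can be illustrated with a toy even-parity code over $\mathbb{F}_2$. This is our own minimal example, unrelated to the actual Reed-Solomon code or sampling logic:

```go
package main

import "fmt"

// isCodeword tests membership in a toy linear code over GF(2):
// vectors whose bits XOR to zero (even parity).
func isCodeword(v []byte) bool {
	var acc byte
	for _, b := range v {
		acc ^= b
	}
	return acc == 0
}

// combine returns the GF(2) linear combination of the columns selected by
// the 0/1 coefficient vector (addition in GF(2) is XOR).
func combine(cols [][]byte, coeffs []byte) []byte {
	out := make([]byte, len(cols[0]))
	for i, c := range coeffs {
		if c == 1 {
			for j := range out {
				out[j] ^= cols[i][j]
			}
		}
	}
	return out
}

func main() {
	// Four honest columns, each with even parity (valid codewords).
	cols := [][]byte{
		{1, 0, 1, 0},
		{1, 1, 1, 1},
		{0, 1, 1, 0},
		{1, 1, 0, 0},
	}
	fmt.Println("honest RLC is a codeword:", isCodeword(combine(cols, []byte{1, 0, 1, 1})))

	// Tamper with one column: a random combination that selects it is no
	// longer a codeword, so the cheat is detected with probability 1/2 per
	// random coefficient (amplified by repeating with fresh coefficients).
	cols[2] = []byte{0, 1, 1, 1} // odd parity: not a codeword
	fmt.Println("tampered RLC is a codeword:", isCodeword(combine(cols, []byte{1, 0, 1, 1})))
}
```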
Decoding. The decoder can reconstruct the original data by first running the sampling check and then sampling either enough rows or enough columns to decode the original data. More formally, the decoder can reconstruct the original data by performing either of the following options:
- Sample $k$ rows of $Z$ and decode via an erasure decoding algorithm for the row code generated by $G'$.
- Sample $k'$ columns of $Z$ and decode via an erasure decoding algorithm for the column code generated by $G$.
The Adversarial Reconstruction Property of ZODA. Another guarantee that ZODA provides is the ability to reconstruct the original data even under adversarial conditions. Specifically, when the decoder requests rows or columns from the encoder, the encoder may give the decoder rows and columns of their choosing. As long as the decoder has run the sampling check beforehand, however, ZODA guarantees that the decoder can still decode the original data. We direct the reader to Section 3.2.4 of the ZODA paper for a detailed discussion on this topic.
RSEMA1D Fields
In this section, we give a brief overview of the fields used in the RSEMA1D protocol and of the conversions between these fields and the encoding format. The protocol uses two binary fields: the base field $\mathbb{F}_{2^{16}}$ and a degree-$d$ extension field $\mathbb{F}_{(2^{16})^d}$ of it.
Leopard Interleaved Format. The Reed-Solomon encoding is computed over $\mathbb{F}_{2^{16}}$, i.e. the generator matrix $G$, the input data matrix $X$ and the encoding $\hat{X}$ contain elements from $\mathbb{F}_{2^{16}}$. The Leopard codec library (which is used to compute the encoding) processes input data in an interleaved format as shown below:
The above diagram shows a single row containing $2m$ bytes, or $m$ symbols. The bytes indexed $0$ to $m-1$ contain the low bytes and the bytes indexed $m$ to $2m-1$ contain the high bytes of the symbols. Thus the $i$-th symbol $s_i$ from $\mathbb{F}_{2^{16}}$ in the little-endian format is computed as:

$$s_i = b_i + 2^8 \cdot b_{m+i}$$

This conversion is illustrated in the following diagram.
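The conversion can be sketched as follows (helper names are ours; the real library operates on larger buffers and multiple rows):

```go
package main

import "fmt"

// toInterleaved lays out m little-endian GF(2^16) symbols in the interleaved
// row format: bytes 0..m-1 hold the low bytes, bytes m..2m-1 the high bytes.
func toInterleaved(symbols []uint16) []byte {
	m := len(symbols)
	buf := make([]byte, 2*m)
	for i, s := range symbols {
		buf[i] = byte(s)        // low byte at index i
		buf[m+i] = byte(s >> 8) // high byte at index m+i
	}
	return buf
}

// fromInterleaved inverts the packing: s_i = b_i + 2^8 * b_{m+i}.
func fromInterleaved(buf []byte) []uint16 {
	m := len(buf) / 2
	out := make([]uint16, m)
	for i := range out {
		out[i] = uint16(buf[i]) | uint16(buf[m+i])<<8
	}
	return out
}

func main() {
	row := []uint16{0x1234, 0xabcd, 0x00ff}
	buf := toInterleaved(row)
	fmt.Printf("interleaved: %x\n", buf) // all low bytes first, then all high bytes
	fmt.Println("round-trip:", fromInterleaved(buf))
}
```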
Extension Field. For computing the random linear combination, the random coefficients are sampled from the extension field $\mathbb{F}_{(2^{16})^d}$. In the implementation, a single element from $\mathbb{F}_{(2^{16})^d}$ is represented as a vector of $d$ elements of $\mathbb{F}_{2^{16}}$.
Suppose we want to compute a random linear combination of the columns of the input data matrix $X$, which contains $m$ symbols in a single row. We sample a random vector $r$ containing $m$ symbols from $\mathbb{F}_{(2^{16})^d}$, and the random linear combination is then computed as a matrix-vector product $y = X r$. If the number of rows of $X$ is $n$, then $y$ will be a vector of $n$ elements from $\mathbb{F}_{(2^{16})^d}$. Let us represent the element at index $i$ in the vector by $y_i$. We can represent each $y_i$ using $d$ elements from $\mathbb{F}_{2^{16}}$, i.e.

$$y_i = \sum_{j=0}^{d-1} y_{i,j} \, \beta_j$$

where the $y_{i,j}$ are elements from $\mathbb{F}_{2^{16}}$ and $\beta_0, \dots, \beta_{d-1}$ are the basis of $\mathbb{F}_{(2^{16})^d}$ over $\mathbb{F}_{2^{16}}$. This is illustrated in the following figure:
This gives us the following two representations for $y$:
- As a vector with $n$ elements from $\mathbb{F}_{(2^{16})^d}$
- As an $n \times d$ matrix with elements from $\mathbb{F}_{2^{16}}$
Encoding an extension field vector. Now we want to extend the random linear combination $y$ to compute the encoding $\hat{y}$. We can represent the vector $y$ as coefficients of a polynomial in $\mathbb{F}_{(2^{16})^d}[X]$, i.e.

$$Y(X) = \sum_{i=0}^{n-1} y_i \, p_i(X)$$

where the $p_i(X)$ are some basis polynomials. Let us further decompose $Y(X)$ using the decomposition of $y_i$:

$$Y(X) = \sum_{i=0}^{n-1} \sum_{j=0}^{d-1} y_{i,j} \, \beta_j \, p_i(X)$$

Rearranging the summations, we get:

$$Y(X) = \sum_{j=0}^{d-1} \beta_j \sum_{i=0}^{n-1} y_{i,j} \, p_i(X) = \sum_{j=0}^{d-1} \beta_j \, Y_j(X)$$

where $Y_j(X)$ is a polynomial with coefficients from the $j$-th column of $y$ in its matrix representation, as shown in the following figure.
The Reed-Solomon encoding of $y$ is just evaluations of the polynomial $Y(X)$ over an extended domain from $\mathbb{F}_{2^{16}}$. Since the evaluation points lie in $\mathbb{F}_{2^{16}}$, this is equivalent to encoding each of the $d$ columns $Y_j$ separately and then interpreting each row of the encoding, which contains $d$ elements, as a single element from $\mathbb{F}_{(2^{16})^d}$. The entire encoding process is illustrated in the following diagram.
In the above diagram, we assume extension by a factor of two, i.e. the number of rows of the encoding is $2n$. The columns of the encoding are denoted by $\hat{y}_{:,j}$, which are extensions of the columns of the data $y$. The elements of the encoding are symbols from $\mathbb{F}_{2^{16}}$, and each row contains $d$ symbols which can be packed into a single element of $\mathbb{F}_{(2^{16})^d}$ using the same packing equation as follows:

$$\hat{y}_i = \sum_{j=0}^{d-1} \hat{y}_{i,j} \, \beta_j$$

Here, even though we have shown that we can compute $\hat{y}_i$ from the $\mathbb{F}_{2^{16}}$ symbols of a row, in the implementation we don’t compute it, since a symbol in $\mathbb{F}_{(2^{16})^d}$ is represented as a vector of $\mathbb{F}_{2^{16}}$ symbols.
Note that before extending $y$, we convert the rows of $\mathbb{F}_{2^{16}}$ symbols into the Leopard interleaved format described earlier. Similarly, after extending, we convert back from the Leopard interleaved format to rows of symbols. We omit this conversion from the diagram for simplicity.
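The equivalence between encoding the extension-field vector and encoding its $d$ base-field columns holds for any $\mathbb{F}_{2^{16}}$-linear map. The toy sketch below demonstrates this with a trivial linear "code" that appends the sum (XOR) of the symbols as one parity symbol — our own stand-in, not the Leopard Reed-Solomon code — and models an extension element as a uint64 made of $d = 4$ little-endian uint16 limbs (the limb count here is an illustrative assumption):

```go
package main

import "fmt"

const d = 4 // limbs per extension element (illustrative choice)

// encodeBase is a toy F_{2^16}-linear code on base-field columns: it appends
// the sum of all symbols (addition in GF(2^16) is XOR) as one parity symbol.
func encodeBase(col []uint16) []uint16 {
	var parity uint16
	for _, s := range col {
		parity ^= s
	}
	return append(append([]uint16{}, col...), parity)
}

// encodeExt applies the same linear map to extension elements, modeled as
// uint64 values holding d little-endian 16-bit limbs.
func encodeExt(vec []uint64) []uint64 {
	var parity uint64
	for _, s := range vec {
		parity ^= s
	}
	return append(append([]uint64{}, vec...), parity)
}

// limb extracts the j-th 16-bit limb of an extension element.
func limb(x uint64, j int) uint16 { return uint16(x >> (16 * j)) }

func main() {
	vec := []uint64{0x1111222233334444, 0xaaaabbbbccccdddd, 0x0123456789abcdef}
	ext := encodeExt(vec)

	// Encoding each limb column separately gives the same result limb-by-limb.
	match := true
	for j := 0; j < d; j++ {
		col := make([]uint16, len(vec))
		for i, x := range vec {
			col[i] = limb(x, j)
		}
		enc := encodeBase(col)
		for i := range enc {
			if enc[i] != limb(ext[i], j) {
				match = false
			}
		}
	}
	fmt.Println("column-wise encoding matches extension encoding:", match)
}
```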
RSEMA1D Codec
In this section we review the ZODA-inspired protocol implemented by Celestia.
Note that the 1D variant implemented by Celestia is essentially the Hadamard variant specified in Appendix D of the ZODA paper, but with a single column. This simplification means that instead of applying a full 2D tensor code (encoding both rows and columns), the protocol only encodes the columns vertically using Reed-Solomon encoding.
Encoding Algorithm. The encoding algorithm uses Reed-Solomon encoding over $\mathbb{F}_{2^{16}}$ to vertically extend the input data matrix, i.e. the generator matrix $G$ and the input data matrix $X$ contain elements from $\mathbb{F}_{2^{16}}$. The generator matrix is in systematic form, i.e. $G = \begin{bmatrix} I \\ P \end{bmatrix}$, where $I$ is the identity matrix and $P$ is the matrix containing the parity check information.
We outline a simplified description of the protocol:
- Compute the encoding of the columns $\hat{X} = G X$
- Commit to the encoding using a Merkle tree where each leaf of the Merkle tree is the hash of a row of the encoding, and let `rowRoot` represent the root of this tree
- Derive, using Fiat-Shamir, a uniformly random vector $r$ which contains elements from an extension field of $\mathbb{F}_{2^{16}}$
- Compute the random linear combination of the columns $y = X r$
- Compute the encoding of the random linear combination as $\hat{y} = G y$
- Commit to the random encoding using a Merkle tree and let `rlcRoot` represent the root of this tree
- Commit to both the roots as the main commitment, combining `rowRoot` and `rlcRoot`
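The steps above can be sketched end-to-end with toy stand-ins. Our simplifications: a repetition code in place of Reed-Solomon, GF(2) coefficients in place of extension-field ones, a single leaf hash as the RLC commitment, and a plain hash of both roots as the final commitment — only the overall flow matches the report:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Leaf and node hashing with the domain-separation prefixes used by the scheme.
func hashLeaf(row []byte) [32]byte { return sha256.Sum256(append([]byte{0x00}, row...)) }
func hashNode(l, r [32]byte) [32]byte {
	return sha256.Sum256(append(append([]byte{0x01}, l[:]...), r[:]...))
}

// merkleRoot commits to the rows (row count assumed to be a power of two).
func merkleRoot(rows [][]byte) [32]byte {
	level := make([][32]byte, len(rows))
	for i, r := range rows {
		level[i] = hashLeaf(r)
	}
	for len(level) > 1 {
		next := make([][32]byte, len(level)/2)
		for i := range next {
			next[i] = hashNode(level[2*i], level[2*i+1])
		}
		level = next
	}
	return level[0]
}

// extend models a systematic code G = [I; P] with the toy choice P = I
// (repetition): the extension is the data rows followed by a copy of them.
func extend(rows [][]byte) [][]byte {
	return append(append([][]byte{}, rows...), rows...)
}

// rlc computes y = X*r with GF(2) coefficients: y[i] is the XOR of the
// bytes of row i whose column is selected by the coefficient's low bit.
func rlc(rows [][]byte, coeffs []byte) []byte {
	y := make([]byte, len(rows))
	for i, row := range rows {
		for j := range row {
			if coeffs[j]&1 == 1 {
				y[i] ^= row[j]
			}
		}
	}
	return y
}

func main() {
	data := [][]byte{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}}

	ext := extend(data)                          // column-wise encoding
	rowRoot := merkleRoot(ext)                   // commit to the rows
	coeffs := rowRoot[:4]                        // Fiat-Shamir coefficients
	y := rlc(data, coeffs)                       // RLC of the columns
	yExt := append(append([]byte{}, y...), y...) // encode the RLC
	rlcRoot := hashLeaf(yExt)                    // commit to it

	// Bind both roots into the final commitment.
	commitment := sha256.Sum256(append(rowRoot[:], rlcRoot[:]...))
	fmt.Printf("commitment: %x\n", commitment[:8])

	// The consistency property a sampler relies on: the RLC of each
	// *extended* row matches the corresponding entry of the extended RLC.
	fmt.Println("rows consistent with RLC:", string(rlc(ext, coeffs)) == string(yExt))
}
```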
Sampling Algorithm. To sample rows of the committed encoding $\hat{X}$, and knowing $y$, a node/verifier requests Merkle proofs for a set of rows and verifies them in the following way:
- Recompute the extended RLC vector `rlcExtended` (i.e. $\hat{y} = G y$)
- Verify each row opening using the known root commitment `rowRoot`
- Rederive the random vector $r$ using Fiat-Shamir
- Compute the random linear combination (using $r$) of each row and check that it matches the expected entry in `rlcExtended`
- Combine `rowRoot` and the known `rlcRoot` to check the main commitment.
Standalone Proof. A node can directly verify a row of the original data given a Merkle proof of the expected result in `rlcExtended`, assuming that a proof of data availability exists (i.e. at least one honest node successfully sampled the encoding via the previous algorithm). The verification steps go as follows:
- Verify the row opening using the known root commitment `rowRoot`
- Rederive the random vector $r$ using Fiat-Shamir
- Compute the random linear combination (using $r$) for the row
- Ensure that the result correctly opens at the expected path in the RLC Merkle tree (given the opening proof)
- Combine both roots to recompute the commitment and check that it matches the public one.
Note on Commitments. Merkle tree commitments are used throughout the scheme. The leaves are SHA-256 hashes (with the domain-separation prefix 0x00) of the rows of a matrix, and the tree nodes are stored in breadth-first order in a contiguous array. Internal nodes use the prefix 0x01 when hashing their children.

