# Audit of Celestia's RSEMA1D Codec

- **Client**: Celestia
- **Date**: October 20th, 2025
- **Tags**: Data Availability, DA, ZODA, Codes, Go

## Introduction

On October 20th, 2025, Celestia tasked zkSecurity with auditing its high-performance *data availability codec*, RSEMA1D (Reed-Solomon Evans-Mohnblatt-Angeris 1D). The specific code to review was shared via a private GitHub repository. The audit lasted one week with two consultants.

The RSEMA1D codec is based on the [ZODA paper](https://eprint.iacr.org/2025/034.pdf), specifically implementing the Hadamard variant from Appendix D with a single column, which served as the primary theoretical foundation for this audit. Additionally, the Celestia team shared a specification as well as an internal Notion document that provided a high-level overview of the codec.

### Scope

The scope of this audit covered the entire RSEMA1D repository at commit `65cc9d4c3887936393763d32e0cf4e665d9d2705`. All files and components present in that commit were considered in-scope. Dependencies outside this repository (in particular, Celestia’s Leopard Reed–Solomon codec) were not formally in scope, although we occasionally referred to them for context.

### Strategic Recommendations

In addition to the findings discussed later in this report, we have the following strategic recommendations:

**Audit of underlying Reed-Solomon implementation**. This report assumed that Celestia's Leopard Reed-Solomon GF16 codec was safely implemented, particularly the core functions `New`, `Encode`, `Reconstruct`, and `GF16Mul`. While we reviewed the RSEMA1D layer built on top of this codec, the correctness and security of the entire system fundamentally depends on the underlying Reed-Solomon implementation. We recommend a dedicated audit of the Leopard library, unless already done, to ensure its cryptographic primitives are sound and properly implemented.

**Review of system integration**. The scope of this audit was limited to the RSEMA1D codec library itself. However, the security of a data availability system depends not only on the codec's correctness but also on how it is integrated and used throughout the broader Celestia stack. We recommend an audit of the integration layer to verify that the codec is being invoked correctly, that data flows between components are secure, and that the overall system maintains the intended security properties when all components interact together.

## Overview

In this section we give an overview of data availability, the original ZODA construction, and some details of the construction implemented by Celestia.

### What is Data Availability?

**Data availability (DA)** refers to the guarantee that a blob of data (e.g. all transaction data in a block of a rollup protocol) has been published and is accessible to a network. 
Without DA, if even a small portion of the data were to be withheld, honest nodes wouldn't be able to recover the full original data.
In protocols like rollups, where block data is produced off-chain, the lack of a decentralized way to ensure that the rollup data is available (for a certain period of time) could prevent users from verifying and trusting the protocol, or leave them stuck after a protocol halt, unable to download the data needed to reconstruct the latest state.
In traditional blockchains, full nodes trivially ensure data availability by downloading 100% of the block data themselves. However, this approach does not scale well: requiring every node to fetch every block in its entirety quickly becomes a throughput bottleneck.

The modern alternative to downloading every block is known as **Data Availability Sampling (DAS).** In a nutshell, DAS lets nodes probabilistically verify that a block’s full data is available by downloading random samples of it rather than the entire block. If enough nodes have performed enough sampling, we are convinced that the data is available and recoverable if needed, even if some pieces of it go missing.
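As a back-of-the-envelope illustration (not part of the codec under review), the following Go sketch computes the chance that uniform sampling misses a withheld portion of the data; the function name and the numbers are ours:

```go
package main

import (
	"fmt"
	"math"
)

// survivalProb returns the probability that s uniformly random samples
// (with replacement, for simplicity) all miss the withheld portion when
// a fraction f of the encoded chunks is unavailable.
func survivalProb(f float64, s int) float64 {
	return math.Pow(1-f, float64(s))
}

func main() {
	// Illustrative numbers only: with half the extended data withheld,
	// 30 samples all miss it with probability 2^-30.
	fmt.Println(survivalProb(0.5, 30))
}
```

The probability decays exponentially in the number of samples, which is why a modest number of samples per node suffices in practice.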

This probabilistic verification becomes possible through a specific way of coding the block’s data. Before a block is broadcast, its data is processed with erasure coding to introduce redundancy. Specifically, many modern DA layers use a one- or two-dimensional Reed-Solomon encoding approach: the block’s data is arranged in a matrix and then extended with parity chunks for each row/column. This means the block is effectively duplicated with error-correction data in the horizontal/vertical directions. In this extended data matrix, even if some parts go missing, any chunk can be recovered from the redundant pieces. In other words, as long as a sufficient subset of pieces is available, the entire original block can be reconstructed.

DA schemes typically include commitment and proof systems for the encoded data. The block producer publishes a commitment to the entire extended matrix (for example, a KZG or Merkle commitment). Then, for any chunk a node wants to sample, the producer (or any full node) can provide a short proof that this chunk is part of the committed block. This proof allows the node to verify the integrity of that chunk without having to download the whole block.

In short, DA protocols are typically a combination of redundant encoding and succinct proofs, which allow nodes to ask for just a few random pieces of data to contribute to a blob's availability as if they had downloaded the whole blob.

### The ZODA Protocol

ZODA, or Zero-Overhead Data Availability, is a trust-minimized and highly-optimized protocol for providing data availability. Its main contribution is the use of the [Ligero check](https://eprint.iacr.org/2022/1608) to produce a secure DA scheme based on hash functions.

Note that rsema1d, the protocol implemented by Celestia, is not a direct implementation of the full 2D ZODA protocol, but rather corresponds to the Hadamard variant described in Appendix D of the ZODA paper, simplified to use a single column. This means Celestia only encodes vertically (along columns) rather than applying a full 2D tensor code. We still provide an overview of the full ZODA protocol here in order to introduce the necessary context.

Let's first consider how data is typically encoded in DA protocols. Typically, an _encoder_ encodes data, expressed as a matrix, over both its rows and its columns. This is often referred to as a tensor code, as it is equivalent to applying the tensor product of two separate codes. As an example, a tensor encoding of some data matrix $\tilde{X} \in F^{n\times n'}$ would be as follows:

$$
G \tilde{X} G'^T
$$

Where $G$ and $G'$ are the generator matrices for the two respective codes.

Here, ZODA applies a random factor to part of the encoding. Specifically, the final tensor encoding looks like the following:

$$
Z = G \tilde{X} D_{\overline{r}} G'^T
$$

Where $D_{\overline{r}}$ is a diagonal matrix composed of random values $\overline{r}$. The resulting matrix $Z$ is an $m \times m'$ matrix.

(Note that there are multiple variants to how the random factor can be applied in the Appendix of the ZODA paper, but the core idea is the same.)

The protocol is essentially split into three algorithms: encoding, sampling, and decoding, which we introduce in more detail below.

**Encoding**. To encode the original data matrix $\tilde X$ the encoder follows these steps:

1. Compute an encoding of the columns $X = G\tilde{X}$ and commit to its rows:

<img src="./img/encoding-1.png" 
     alt="image" 
     style="width:50%; display:block; margin:auto;">

2. Receive a uniformly sampled random vector $\overline{r}$ and construct the diagonal matrix $D_{\overline{r}}$ from it.

3. Multiply the columns of the data matrix by the random factor and compute the encoding of the resulting rows $Y = \tilde{X} D_{\overline{r}}G'^T$ and commit to its columns:

<img src="./img/encoding-2.png" 
     alt="image" 
     style="width:90%; display:block; margin:auto;">

4. Compute the complete encoding $Z = G\tilde{X} D_{\overline{r}}G'^T = GY$ and commit to it:

<img src="./img/encoding-3.png" 
     alt="image" 
     style="width:90%; display:block; margin:auto;">

**Sampling**. Other nodes can now "sample" by retrieving random rows and columns from the committed encoding:

1. Sample a set $S$ of rows of $X$ from the encoder, written as $X_S$.

2. Sample a set $S'$ of columns of $Y$ from the encoder, written as $Y_{S'}$. Denote each column as $y_j$ for every $j \in S'$.

3. Check $X_i D_{\overline{r}}g_j'^T = (Gy_j)_i$ for every $i \in S$ and $j \in S'$:

<img src="./img/sampling-1.png" 
     alt="image" 
     style="width:90%; display:block; margin:auto;">

4. Verify that $Z_{ij} = g_i^Ty_j$ for each $i \in S$ and $j \in S'$.

The 3rd step is the main check that ensures that the rows of $X$ are the correct encodings of the columns of an original matrix $\tilde{X}$.
This is similar to the check done in the [Ligero](https://eprint.iacr.org/2022/1608) paper, which checks that many vectors are codewords using the fact that if a random linear combination of multiple vectors is also a codeword, then they must be codewords as well with high probability.

Similarly, to check that each vector in $X$ (i.e. each column of $X$) is a codeword, we can check whether a random linear combination of all columns in $X$ is a codeword or not. Taking a single row $i$ in $X$ as $X_i$, we can see that $X_iD_{\overline{r}}$ results in a scaled vector $[X_i[0]\cdot r_0, X_i[1]\cdot r_1, \dots, X_i[n-1]\cdot r_{n-1}]$. Then, we compute an inner product sum of this scaled vector with the $j$-th column of $G'^T$ (i.e. $g_j'^T$) to get a scalar value $X_iD_{\overline{r}}g_j'^T$.

(Note that if $G'$ is a generator matrix for a systematic code (i.e. $G' = \begin{bmatrix} I \\ P' \end{bmatrix}$), the first $n$ columns of $G'^T$ will be the identity matrix. This means that for any $j < n$, $g_j'^T$ will turn all but one $j$-th column of $X_iD_{\overline{r}}$ to zero (i.e. $[0, 0,\dots, X_i[j] \cdot r_j, \dots, 0]$), thus failing to capture the random linear combination of _all columns_ of $X$, as is required for the Ligero check. Consequently, when using a systematic code for $G'$, we need to sample columns $j \geq n$ to avoid using the identity matrix.)

Since the RHS is a matrix multiplied by the generator matrix $G$, we can be certain that $GY$ consists of valid codewords. Therefore, we can compute the $j$-th column of $GY$ and compare its $i$-th entry to $X_iD_{\overline{r}}g_j'^T$.

Whether this check is sufficient to ensure that all rows of $X$ are valid encodings of $\tilde{X}$ is a probabilistic question that depends on the properties of the code. We direct the reader to Section 3.2 of the [ZODA paper](https://eprint.iacr.org/2025/034.pdf) for a detailed discussion on this topic.
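To make the consistency check concrete, here is a self-contained Go sketch of the $X_i D_{\overline{r}} g_j'^T = (Gy_j)_i$ check on a toy instance. For readability it works over a small prime field rather than the binary fields used in the implementation, and the Vandermonde generator matrices and data values are hypothetical:

```go
package main

import "fmt"

const p = 97 // a small prime field, for illustration only

func mod(a int) int { return ((a % p) + p) % p }

// matMul multiplies two matrices over F_p.
func matMul(A, B [][]int) [][]int {
	C := make([][]int, len(A))
	for i := range A {
		C[i] = make([]int, len(B[0]))
		for j := range B[0] {
			for k := range B {
				C[i][j] = mod(C[i][j] + A[i][k]*B[k][j])
			}
		}
	}
	return C
}

func transpose(A [][]int) [][]int {
	T := make([][]int, len(A[0]))
	for j := range T {
		T[j] = make([]int, len(A))
		for i := range A {
			T[j][i] = A[i][j]
		}
	}
	return T
}

func main() {
	Xt := [][]int{{3, 1}, {4, 2}} // toy 2x2 data matrix
	// Vandermonde generators: row i evaluates a degree-1 polynomial at
	// point i, so any two rows suffice to decode.
	G := [][]int{{1, 0}, {1, 1}, {1, 2}, {1, 3}} // 4x2
	Gp := [][]int{{1, 0}, {1, 1}, {1, 2}}        // 3x2, plays the role of G'
	r := []int{5, 7}                             // diagonal of D_r (fixed here, sampled in the protocol)

	X := matMul(G, Xt) // column encoding X = G X~
	Dr := [][]int{{r[0], 0}, {0, r[1]}}
	Y := matMul(matMul(Xt, Dr), transpose(Gp)) // Y = X~ D_r G'^T
	Z := matMul(G, Y)                          // full encoding Z = G Y

	// Sampling check: X_i D_r g'_j^T must equal Z_ij for all sampled i, j.
	for i := range X {
		for j := range Gp {
			lhs := 0
			for k := range r {
				lhs = mod(lhs + X[i][k]*r[k]*Gp[j][k])
			}
			if lhs != Z[i][j] {
				panic("consistency check failed")
			}
		}
	}
	fmt.Println("all sampled entries consistent")
}
```

The two sides of the comparison are computed through independent paths (one via the committed $X$, one via the committed $Y$), which is exactly what makes the check meaningful when the two commitments come from an untrusted encoder.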

**Decoding**. The decoder can reconstruct the original data by first running the sampling check and then sampling enough rows or columns to decode the original data. More formally, the decoder can reconstruct the original data by performing either of the following options:

1. Sample $|S| > m - d$ rows and decode $G_S\tilde{X}=X_S$ via an erasure decoding algorithm for $G$.

2. Sample $|S'| > m' - d'$ columns and decode $G'_{S'}D_{\overline{r}}\tilde{X}^T = (Y^T)_{S'}$ via an erasure decoding algorithm for $G'$.

**The Adversarial Reconstruction Property of ZODA**. Another guarantee that ZODA provides is the ability to reconstruct the original data even under adversarial conditions. Specifically, when the decoder requests rows or columns from the encoder, the encoder may give the decoder rows and columns of their choosing. As long as the decoder has run the sampling check beforehand, however, ZODA guarantees that the decoder can still decode the original data. We direct the reader to Section 3.2.4 of the [ZODA paper](https://eprint.iacr.org/2025/034.pdf) for a detailed discussion on this topic.

### RSEMA1D Fields

In this section, we give a brief overview of the fields used in the RSEMA1D protocol and of the conversion between field representations and the encoding format. The protocol uses two binary fields: $\mathsf{GF}(2^{16})$ and $\mathsf{GF}(2^{128})$.

**Leopard Interleaved Format**. The Reed-Solomon encoding is computed over $\mathsf{GF}(2^{16})$  i.e. the generator matrix $G$, the input data matrix $\widetilde{X}$ and the encoding $X = G \widetilde{X}$ contain elements from $\mathsf{GF}(2^{16})$. The [Leopard codec](https://github.com/catid/leopard) library (which is used to compute the encoding) processes input data in an interleaved $\mathsf{GF}(2^{16})$ format as shown below:

<img src="./img/0-leopard-interleaved.svg" 
     alt="image" 
     style="width:75%; display:block; margin:auto;">

The above diagram shows a single row containing $64$ bytes, or $32$ $\mathsf{GF}(2^{16})$ symbols. The bytes indexed $0$-$31$ contain the low bytes and the bytes indexed $32$-$63$ contain the high bytes of the $32$ $\mathsf{GF}(2^{16})$ symbols. Thus a symbol $s_i$ from $\mathsf{GF}(2^{16})$, stored low byte first (little-endian), is computed as:
$$
s_i = l_i \; | \; (h_i \ll 8), \quad \text{for } i \in [0,32)
$$
This conversion is illustrated in the following diagram.

<img src="./img/1-leopard-conversion.svg" 
     alt="image" 
     style="width:75%; display:block; margin:auto;">
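A minimal Go sketch of this de-interleaving step, assuming the 64-byte layout described above (the real codec operates on rows that are multiples of 64 bytes; the function name is ours):

```go
package main

import "fmt"

// deinterleave converts a 64-byte Leopard row (32 low bytes followed by
// 32 high bytes) into 32 GF(2^16) symbols: s_i = low_i | high_i << 8.
func deinterleave(row [64]byte) [32]uint16 {
	var out [32]uint16
	for i := 0; i < 32; i++ {
		out[i] = uint16(row[i]) | uint16(row[32+i])<<8
	}
	return out
}

func main() {
	var row [64]byte
	row[0] = 0x34  // low byte of symbol 0
	row[32] = 0x12 // high byte of symbol 0
	fmt.Printf("%#x\n", deinterleave(row)[0]) // 0x34 | 0x12<<8 = 0x1234
}
```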

**Extension Field**. For computing the random linear combination, the random coefficients are sampled from the extension field $\mathsf{GF}(2^{128})$. In the implementation, a single element from $\mathsf{GF}(2^{128})$ is represented as a vector of $8$ elements of $\mathsf{GF}(2^{16})$. 

Suppose we want to compute a random linear combination of the columns of the input data matrix $\widetilde{X}$, which contains $32$ $\mathsf{GF}(2^{16})$ symbols in a single row. We sample a random vector $g_r$ containing $32$ $\mathsf{GF}(2^{128})$ symbols, and the random linear combination is then computed as the matrix-vector product $\widetilde{X}_r = \widetilde{X} g_r$. If the number of rows of $\widetilde{X}$ is $K$, then $\widetilde{X}_r$ is a vector of $K$ elements from $\mathsf{GF}(2^{128})$. Let us represent the element at index $i$ in the vector $\widetilde{X}_r$ by $\tilde{a}_i$. We can represent each $\tilde{a}_i \in \mathsf{GF}(2^{128})$ using $8$ elements from $\mathsf{GF}(2^{16})$, i.e.
$$
\tilde{a}_i = \sum_{j=0}^{7} \tilde{a}_{i,j} \cdot \beta_j
$$
where $\tilde{a}_{i,j}$ are elements from $\mathsf{GF}(2^{16})$ and the $\beta_j$ form a basis of $\mathsf{GF}(2^{128})$ over $\mathsf{GF}(2^{16})$. This is illustrated in the following figure:

<img src="./img/2-extension.svg" 
     alt="image" 
     style="width:90%; display:block; margin:auto;">

This gives us the following two representations for $\widetilde{X}_r$:

1. As a vector with elements $\tilde{a}_i$ from $\mathsf{GF}(2^{128})$
2. As a matrix with elements $\tilde{a}_{i,j}$ from $\mathsf{GF}(2^{16})$
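A sketch of this representation in Go (the types and names are ours, not the codec's API). One consequence worth noting: since addition in a binary field is XOR, adding two $\mathsf{GF}(2^{128})$ elements in this representation is simply coordinate-wise XOR of their $\mathsf{GF}(2^{16})$ coordinates, independent of the chosen basis:

```go
package main

import "fmt"

// Hypothetical types mirroring the representation above: an element of
// GF(2^128) is stored as its 8 coordinates over GF(2^16) with respect
// to the basis β_0..β_7.
type GF16 uint16
type GF128 [8]GF16

// Add is coordinate-wise XOR: addition in a binary field is XOR, and
// this holds coordinate by coordinate regardless of the basis.
func (a GF128) Add(b GF128) GF128 {
	var c GF128
	for j := range c {
		c[j] = a[j] ^ b[j]
	}
	return c
}

func main() {
	a := GF128{0x00ff}
	b := GF128{0x0f0f}
	fmt.Printf("%#x\n", a.Add(b)[0]) // 0x00ff ^ 0x0f0f = 0x0ff0
}
```

Multiplication, by contrast, does depend on the basis and the field's reduction polynomial, which is why the implementation needs dedicated $\mathsf{GF}(2^{128})$ arithmetic.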

**Encoding an extension field vector**. Now we want to extend the random linear combination $\widetilde{X}_r$ to compute the encoding $X_r = G\widetilde{X}_r$. We can represent the vector $\widetilde{X}_r$ as coefficients of a polynomial $\widetilde{X}_r(x)$ in $\mathsf{GF}(2^{128})$ i.e.
$$
\widetilde{X}_r(x) = \sum_{i=0}^{K-1} \tilde{a}_i \cdot L_i(x)
$$
where $L_i(x)$ are some basis polynomials. Let us further decompose $\widetilde{X}_r(x)$ using the decomposition of $\tilde{a}_i$.
$$
\widetilde{X}_r(x) = \sum_{i=0}^{K-1} \bigg(\sum_{j=0}^{7} \tilde{a}_{i,j} \cdot \beta_j\bigg) \cdot L_i(x)
$$
Rearranging the summations, we get:
$$
\widetilde{X}_r(x) = \sum_{j=0}^{7} \bigg(\sum_{i=0}^{K-1} \tilde{a}_{i,j} \cdot L_i(x)\bigg) \cdot \beta_j = \sum_{j=0}^{7} \widetilde{A}_j(x) \cdot \beta_j
$$
where $\widetilde{A}_j(x) = \sum_{i=0}^{K-1} \tilde{a}_{i,j} \cdot L_i(x)$ is a polynomial with coefficients from the $j$th column of $\widetilde{X}_r$ in its matrix representation as shown in the following figure.

<img src="./img/3-extension.svg" 
     alt="image" 
     style="width:50%; display:block; margin:auto;">

The Reed-Solomon encoding of $\widetilde{X}_r$ is just evaluations of the polynomial $\widetilde{X}_r(x)$ over an extended domain from $\mathsf{GF}(2^{16})$. This is equivalent to encoding each of the columns $\widetilde{A}_j$ and then interpreting each row of the encoding, which contains $8$ $\mathsf{GF}(2^{16})$ elements, as a single element from $\mathsf{GF}(2^{128})$. The entire encoding process is illustrated in the following diagram.

<img src="./img/4-rlc-encoding.svg" 
     alt="image" 
     style="width:90%; display:block; margin:auto;">

In the above diagram, we assume extension by $N$, i.e. the number of rows of the encoding is $N+K$. The columns of the encoding $X_r$ are denoted by $A_j$, which are extensions of the columns $\widetilde{A}_j$ of the data $\widetilde{X}_r$. The elements $a_{i, j}$ of the encoding $X_r$ are symbols from $\mathsf{GF}(2^{16})$, and each row contains $8$ $\mathsf{GF}(2^{16})$ symbols which can be packed into a single element of $\mathsf{GF}(2^{128})$ using the same packing equation:
$$
{a}_i = \sum_{j=0}^{7} {a}_{i,j} \cdot \beta_j
$$
Even though we have shown that $a_i \in \mathsf{GF}(2^{128})$ can be computed from the $8$ $\mathsf{GF}(2^{16})$ symbols of a row, the implementation never computes $a_i$, since a $\mathsf{GF}(2^{128})$ symbol is represented as a vector of $8$ $\mathsf{GF}(2^{16})$ symbols.

Note that before extending $\widetilde{X}_r$, we convert the rows of $\mathsf{GF}(2^{16})$ symbols into Leopard interleaved format which was described earlier. Similarly, after extending $\widetilde{X}_r$, we convert from Leopard interleaved format to rows of $\mathsf{GF}(2^{16})$ symbols. We omit this conversion from the diagram for simplicity.

### RSEMA1D Codec

In this section we review the ZODA-inspired protocol implemented by Celestia.

Note that the 1D variant implemented by Celestia is essentially the Hadamard variant specified in Appendix D of the ZODA paper, but with a single column. This simplification means that instead of applying a full 2D tensor code (encoding both rows and columns), the protocol only encodes the columns vertically using Reed-Solomon encoding.

**Encoding Algorithm**. The encoding algorithm uses Reed-Solomon encoding over $\mathbb{F}_{2^{16}}$ to vertically extend the input data matrix, i.e. the generator matrix $G$ and the input data matrix $\widetilde{X}$ contain elements from $\mathbb{F}_{2^{16}}$. The generator matrix $G$ is in systematic form, i.e. $G = \begin{bmatrix} I \\ A \end{bmatrix}$, where $I$ is the identity matrix and $A$ is the matrix containing the parity check information.

We outline a simplified description of the protocol:

1. Compute the encoding of the columns $X = G \widetilde{X}$
2. Commit to the encoding $X$ using a Merkle tree where each leaf of the Merkle tree is the hash of a row of the encoding $X$ and let $\mathsf{root}_{X}$ represent the root of this tree
3. Derive, using Fiat-Shamir, a uniformly random vector $g_r$ which contains elements from an extension field $\mathbb{F}_{2^{128}}$
4. Compute the random linear combination of the columns $\widetilde{X}_r = \widetilde{X} g_r$
5. Compute the encoding of the random linear combination as $X_r = G \widetilde{X}_r$
6. Commit to the random encoding $X_r$ using a Merkle tree and let $\mathsf{root}_{X_r}$ represent the root of this tree
7. Commit to both the roots as $\mathsf{Sha256}(\mathsf{root}_{X} \; || \;\mathsf{root}_{X_r})$

![commitment](./img/commitment.png)

**Sampling Algorithm**. To sample rows of the committed encoding, a node/verifier that knows $\widetilde{X}_r$ requests Merkle proofs for a set of rows and verifies them in the following way:

1. Recompute the extended RLC vector `rlcExtended` (i.e. $G \widetilde{X}_r$)
2. Verify each Row opening using the known root commitment `rowRoot`
3. Rederive the random vector $g_r$ using Fiat-Shamir
4. Compute the random linear combination (using $g_r$) of each Row and check that they match the expected entry in `rlcExtended`
5. Combine `rowRoot` and the known `rlcRoot` to check the main commitment.

**Standalone Proof**. A node can directly verify a row of the original data given a Merkle proof of the expected result in `rlcExtended`, assuming that a proof of data availability exists (i.e. at least one honest node successfully sampled the encoding via the previous algorithm). The verification steps are as follows:

1. Verify the Row opening using the known root commitment `rowRoot`
2. Rederive the random vector $g_r$ using Fiat-Shamir
3. Compute the random linear combination for the row
4. Ensure that the result correctly opens at the expected path in the RLC Merkle tree (given the opening proof)
5. Combine both roots to recompute the commitment and check it matches the public one.

**Note on Commitments**. Merkle tree commitments are used throughout the scheme: leaves are SHA-256 hashes (with prefix `0x00`) of the rows of a matrix, and the tree nodes are stored in breadth-first order in a contiguous array. Internal nodes use the prefix `0x01` when hashing their children.

![commitments](./img/merkle-encoding.png)

## Findings

### Multiple Openings to the Same Committed Row Allowed

- **Severity**: High
- **Location**: proof.go

**Description**. `VerifyStandaloneProof` and `VerifyRowWithContext` are missing length checks on both `proof.Row` and `proof.RowProof`, which makes it possible to open the row at `proof.Index` with different data.

An adversary can construct a malicious Merkle tree like the one below. With different proof lengths, one can construct proofs to different nodes with the same index:

<img src="./img/multiple-openings.png" 
     alt="image" 
     style="width:100%; display:block; margin:auto;">

Suppose that $(r_1, r_2, ..., r_n)$ and $(r'_1, r'_2, ..., r'_{n'})$ are the symbols that are extracted from the rows of the two nodes. Their RLCs are respectively defined by

$$
\text{RLC} = \sum_{i=1}^n r_i \cdot c_i, \qquad
\text{RLC}' = \sum_{i=1}^{n'} r'_i \cdot c_i.
$$

Since the length of `proof.Row` is not checked, it is possible that $n \neq n'$. One can set 

$$(r'_1, r'_2, ..., r'_n, r'_{n+1}, ..., r'_{n'}) = (r_1, r_2, ..., r_n, 0, ..., 0),$$

implying that $\text{RLC} = \text{RLC}'$.

The below test case shows that an adversary can create multiple proofs to the same index, ending up having different `proof.Row`s:

```go
func TestVerifyRowWithContextWithMultipleOpenings(t *testing.T) {
	config := &Config{
		K:           8,
		N:           8,
		RowSize:     256,
		WorkerCount: 1,
	}
	config.Validate() // populate .kPadded and .totalPadded

	// === PROVER ===
	// the prover are being malicious, and they hope that no one will open at index 0

	data := make([][]byte, 8)
	for i := 0; i < 8; i++ {
		digest := sha512.Sum512([]byte{byte(i)})
		data[i] = append(digest[:], make([]byte, 192)...) // 64-byte digest padded with 192 zero bytes to fill the 256-byte row
	}

	// mostly copied from the Encode function from codec.go
	extended, err := encoding.ExtendVertical(data, config.N)
	assert.NoError(t, err)

	nodes, asNodes := buildAdversarialPaddedRowTree(extended)
	rowRoot := nodes[0]

	coeffs := deriveCoefficients([32]byte(rowRoot), config)
	rlcOrig := computeRLCOrig(data, coeffs, config)
	rlcExtended, err := encoding.ExtendRLCResults(rlcOrig, config.N)
	assert.NoError(t, err)
	rlcTree := buildPaddedRLCTree(rlcExtended, config)
	rlcRoot := rlcTree.Root()

	h := sha256.New()
	h.Write(rowRoot[:])
	h.Write(rlcRoot[:])
	var commitment Commitment
	h.Sum(commitment[:0])

	// === VERIFIER ===
	// assuming that the verifier wants to open at index 3
	ctx, err := CreateVerificationContext(rlcOrig, config)
	assert.NoError(t, err)

	// ...it is possible to open as some data (doing nicely)
	proof1 := &RowProof{
		Index:    3,
		Row:      extended[3],
		RowProof: [][]byte{nodes[17], nodes[7], nodes[4], nodes[2]},
	}
	if err := VerifyRowWithContext(proof1, commitment, ctx); err != nil {
		t.Error("VerifyStandaloneProof error:", err)
	}

	// ...it is also possible to open as some data, with zeros concatenated
	proof2 := &RowProof{
		Index:    3,
		Row:      extended[3][:256-64],
		RowProof: [][]byte{asNodes[17], asNodes[7], asNodes[4], asNodes[2], nodes[16], nodes[8], nodes[4], nodes[2]},
	}
	if err := VerifyRowWithContext(proof2, commitment, ctx); err != nil {
		t.Error("VerifyStandaloneProof error:", err)
	}
}

func buildAdversarialPaddedRowTree(extended [][]byte) ([][]byte, [][]byte) {
	nodes := make([][]byte, 31)
	asNodes := make([][]byte, 31) // adversary subtree nodes

	// build the adversary subtree

	for i := 0; i < 16; i++ {
		digest := sha256.Sum256(append([]byte{0}, extended[i][:256-64]...))
		asNodes[15+i] = digest[:] // SHA256(00 || data) for leaf nodes
	}
	for i := 0; i < 8; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(asNodes[15+2*i], asNodes[15+2*i+1]...)...))
		asNodes[7+i] = digest[:] // SHA256(01 || left || right) for non-leaf nodes
	}
	for i := 0; i < 4; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(asNodes[7+2*i], asNodes[7+2*i+1]...)...))
		asNodes[3+i] = digest[:]
	}
	for i := 0; i < 2; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(asNodes[3+2*i], asNodes[3+2*i+1]...)...))
		asNodes[1+i] = digest[:]
	}
	for i := 0; i < 1; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(asNodes[1+2*i], asNodes[1+2*i+1]...)...))
		asNodes[0+i] = digest[:]
	}

	nodes[15+0] = asNodes[0]
	for i := 1; i < 16; i++ { // starts from one
		digest := sha256.Sum256(append([]byte{0}, extended[i][:256]...))
		nodes[15+i] = digest[:]
	}
	for i := 0; i < 8; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(nodes[15+2*i], nodes[15+2*i+1]...)...))
		nodes[7+i] = digest[:]
	}
	for i := 0; i < 4; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(nodes[7+2*i], nodes[7+2*i+1]...)...))
		nodes[3+i] = digest[:]
	}
	for i := 0; i < 2; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(nodes[3+2*i], nodes[3+2*i+1]...)...))
		nodes[1+i] = digest[:]
	}
	for i := 0; i < 1; i++ {
		digest := sha256.Sum256(append([]byte{1}, append(nodes[1+2*i], nodes[1+2*i+1]...)...))
		nodes[0+i] = digest[:]
	}

	return nodes, asNodes
}
```

The prover can thus open the row at `proof.Index` with two different data values (one carrying additional trailing null bytes) as long as `proof.Index` is nonzero.

**Recommendation**. Also check that the length of `proof.Row` equals `config.RowSize`, and that the length of `proof.RowProof` equals the binary logarithm of `c.totalPadded`.
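A sketch of the recommended checks, using hypothetical stand-in types (the real `RowProof` and `Config` types live in the rsema1d package, and the exact integration point may differ):

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// Hypothetical stand-ins for the real rsema1d types.
type rowProof struct {
	Index    int
	Row      []byte
	RowProof [][]byte
}

type cfg struct {
	RowSize     int
	totalPadded int // number of padded leaves, a power of two
}

// checkProofShape sketches the recommended validation: the row must be
// exactly RowSize bytes, and the Merkle path must contain exactly
// log2(totalPadded) sibling hashes.
func checkProofShape(p *rowProof, c *cfg) error {
	if len(p.Row) != c.RowSize {
		return errors.New("row length mismatch")
	}
	if len(p.RowProof) != bits.Len(uint(c.totalPadded))-1 {
		return errors.New("merkle proof length mismatch")
	}
	return nil
}

func main() {
	p := &rowProof{Row: make([]byte, 256), RowProof: make([][]byte, 4)}
	c := &cfg{RowSize: 256, totalPadded: 16}
	fmt.Println(checkProofShape(p, c)) // a 16-leaf tree needs exactly 4 siblings
}
```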

**Client Response**. This was fixed in https://github.com/celestiaorg/rsema1d/pull/8

### Verification API Can Crash If Passed Invalid Inputs

- **Severity**: Medium
- **Location**: proof.go

**Description**. Both the `VerifyRowWithContext` and `VerifyStandaloneProof` API assume that their `proof` and `context` inputs are not `nil`, and immediately dereference internal fields of these inputs. For example, `VerifyRowWithContext` starts with:

```go
func VerifyRowWithContext(proof *RowProof, commitment Commitment, context *VerificationContext) error {
    if proof.Index < 0 || proof.Index >= context.config.K+context.config.N {
        …
    }
```

An attacker can deliberately pass `nil` for `proof` or `context`, and the very first access (`proof.Index` or `context.config`) will immediately panic. Thus, a single malformed request can deterministically crash every verifier, resulting in a denial of service.

**Recommendation**. Add explicit `nil` guards for proofs and contexts so the verifiers fail with a normal error instead of panicking.
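A minimal sketch of the recommended guard, with hypothetical stand-in types:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real types.
type RowProof struct{ Index int }
type VerificationContext struct{}

// verifyRow sketches the recommended guard: reject nil inputs with a
// normal error before dereferencing any field, instead of panicking.
func verifyRow(proof *RowProof, context *VerificationContext) error {
	if proof == nil || context == nil {
		return errors.New("nil proof or verification context")
	}
	// ... the existing range and Merkle checks would follow here ...
	return nil
}

func main() {
	fmt.Println(verifyRow(nil, nil)) // returns an error instead of panicking
}
```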

**Client Response**. This was fixed in https://github.com/celestiaorg/rsema1d/pull/11.

### Include the Configuration in Fiat-Shamir

- **Severity**: Low
- **Location**: commitment.go

**Description**. The protocol implements the Fiat-Shamir transformation by hashing, using SHA-256, the root of the encoded ODS (Original Data Square) matrix. 
This derivation is implemented in `deriveCoefficients`, and produces a vector of random values that can later be used to compute a random linear combination of the columns of that same matrix.

```go
func deriveCoefficients(rowRoot [32]byte, config *Config) []field.GF128 {
	seed := sha256.Sum256(rowRoot[:])
	numSymbols := config.RowSize / 2 // Each GF16 symbol is 2 bytes
```

The derivation of the random values does not include the configuration itself. Especially, we're interested in the configuration fields that form our problem instance:

```go
type Config struct {
	// Core parameters (required)
	K       int // Number of original rows (can be arbitrary)
	N       int // Number of parity rows (can be arbitrary)
	RowSize int // Size of each row in bytes (multiple of 64)
```

This means that different *compatible* instances of the same matrix, but with different `K`, `N`, or `RowSize` values will produce the same exact random coefficients.

This does not seem to be a problem in practice, as one would still need to produce the exact same Merkle tree root. This restricts `K` and `N` to values that only add zero rows to the matrix without changing the padding (i.e. `K` and `N` must not push the derived padded configuration to the next power of two), and restricts `RowSize` to values whose hashed leaves produce the same Merkle root (effectively "impossible").

That being said, a collision of Fiat-Shamir transcripts is still somewhat odd, and the overhead of including these values is negligible.
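One possible shape for the fix is sketched below with a hypothetical helper; the serialization actually chosen by the fix may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// deriveSeed sketches one way to bind the instance parameters into the
// Fiat-Shamir seed: hash the row root together with K, N and RowSize.
func deriveSeed(rowRoot [32]byte, k, n, rowSize int) [32]byte {
	buf := append([]byte{}, rowRoot[:]...)
	for _, v := range []int{k, n, rowSize} {
		buf = binary.BigEndian.AppendUint64(buf, uint64(v))
	}
	return sha256.Sum256(buf)
}

func main() {
	var root [32]byte
	a := deriveSeed(root, 8, 8, 256)
	b := deriveSeed(root, 16, 8, 256)
	fmt.Println(a != b) // a different K now yields a different seed
}
```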

**Client Response**. This was fixed in https://github.com/celestiaorg/rsema1d/pull/10

### Negative Indices Allow for Incorrect Proofs

- **Severity**: Low
- **Location**: proof.go

**Description**. There is a missing check in `VerifyStandaloneProof` which allows `proof.Index` to be negative. Since the index is only used in `merkle.ComputeRootFromProof` and is never range-checked, a proof with a negative index can still verify as correct. The test case below passes even though it should not:

```go
func TestVerifyStandaloneProofWithNegativeProofIndex(t *testing.T) {
	config := &Config{
		K:           8,
		N:           8,
		RowSize:     256,
		WorkerCount: 1,
	}

	data := makeTestData(8, 256)
	extData, commitment, _, err := Encode(data, config)

	index := 6
	proof, err := extData.GenerateStandaloneProof(index)
	if err != nil {
		t.Fatalf("GenerateStandaloneProof(%d) error: %v", index, err)
	}

	// negative proof.Index is bad
	proof.Index = -index

	// Verify standalone proof
	if err := VerifyStandaloneProof(proof, commitment, config); err != nil {
		t.Errorf("VerifyStandaloneProof(%d) error: %v", index, err)
	}

	// Verify proof contains correct row data
	if !bytes.Equal(proof.Row, extData.rows[index]) {
		t.Errorf("Standalone proof data doesn't match extended data for index %d", index)
	}
}
```

If `proof.Index` is used as an index to read from an array elsewhere, it will cause a panic by reading out of bounds.

**Recommendation**. Check that `proof.Index` is non-negative.
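A minimal sketch of the recommended bound check (the helper name is ours; the upper bound is already enforced by `VerifyStandaloneProof`, the lower bound is not):

```go
package main

import (
	"errors"
	"fmt"
)

// checkIndex sketches the full range check for a standalone proof index:
// both ends of the range [0, k) must be enforced.
func checkIndex(index, k int) error {
	if index < 0 || index >= k {
		return errors.New("proof index out of range")
	}
	return nil
}

func main() {
	fmt.Println(checkIndex(-6, 8)) // rejected instead of silently accepted
}
```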

**Client Response**. This was fixed in https://github.com/celestiaorg/rsema1d/pull/9

### Commit to rlcOrig, Not rlcExtended

- **Severity**: Low
- **Location**: codec.go

**Description**. The current implementation commits to `rlcExtended` ($K+N$ RLC values) in the Merkle tree, but only the original $K$ values (`rlcOrig`) are ever used via Merkle proofs. The extended $N$ parity RLC values are never accessed individually through the tree. Committing to only `rlcOrig` would be simpler, more efficient, and less error-prone.

In `codec.go`, the prover commits to all $K+N$ RLC values:

```go
// 6. Extend RLC results
rlcExtended, err := encoding.ExtendRLCResults(rlcOrig, config.N)
if err != nil {
    return nil, Commitment{}, nil, fmt.Errorf("failed to extend RLC results: %w", err)
}

// 7. Build padded RLC Merkle tree matching row tree structure
rlcTree := buildPaddedRLCTree(rlcExtended, config)
rlcRoot := rlcTree.Root()
```

Only the standalone proof API makes use of the committed RLC Merkle tree, and it supports **only original rows** (i.e. `index < K`). This is enforced in `proof.go`:

```go
func VerifyStandaloneProof(proof *StandaloneProof, commitment Commitment, config *Config) error {
    // ...
    if proof.Index >= config.K {
        return errors.New("standalone verification only supports original rows")
    }
```

The extended RLC values (indices $K$ to $K+N-1$) are **never accessed via Merkle proofs**.

It is important that, in the sampling API, a verifier/sampler does not use the extended RLC vector directly, but instead verifies that it is a correct encoding. In practice, the verifier does this by re-encoding the RLC vector themselves.

The current approach is error-prone, as it leaves room for an alternative API in which the sampler would not produce the RLC encoding themselves.

**Recommendation**. Commit to only `rlcOrig` ($K$ values) instead of the extended version.

### Cleanup Suggestions for Clarity and Consistency

- **Severity**: Informational
- **Location**: -

During the audit, we noted a handful of tidy-up items that do not impact security but will pay off in readability and long-term maintenance.

- **Redundant Row-Size Check in *config.go***
`Validate` currently rejects `RowSize <= 0`, but a later check already enforces `RowSize >= 64`, making the former check redundant. Removing the `RowSize <= 0` check simplifies the validation logic and clarifies the true lower bound.
- **Missing check in *config.go***
`Validate` should assert that `K` and `N` are both `<= 65536` in order to avoid integer overflows when `K` or `N` is close to `2^63`:
```go
if c.K > 65536 {
    return fmt.Errorf("k must be <= 65536, got %d", c.K)
}
if c.N > 65536 {
    return fmt.Errorf("n must be <= 65536, got %d", c.N)
}
```
- **Lingering TODO in *commitment.go***
`deriveCoefficients` still contains an open TODO about error propagation. Consider addressing the `binary.Write` failure explicitly.
- **Unused Input Parameter in `computeRLC`**
In *commitment.go*, the function signature of `computeRLC` accepts `config *Config`, but never references it. Consider removing the argument.
- **Inconsistent Specification of Irreducible Polynomial**
In section 2.1 Field Definitions, *SPEC.md* contains an inconsistency regarding the irreducible polynomial used by the Leopard RS codec. More specifically, the specification lists the irreducible polynomial as `x^{16} + x^{12} + x^3 + x + 1` (which does *not* correspond to `0x1002D`), while the actual code uses `x^{16} + x^5 + x^3 + x^2 + 1` (which does correspond to `0x1002D`). Consider updating the specification accordingly.
- **`DefaultConfig` Comments in *config.go* Should Say “MiB”**
The comments of `DefaultConfig` say “128 MB,” yet the values are computed in multiples of 1,048,576 bytes. Switching the wording to “128 MiB” keeps the documentation precise.
- **Unused Function in *gf128.go***
Outside of tests, the function `FromBytes128` is unreferenced. Consider dropping the function unless there is a concrete plan to use it in the future.

**Client Response.**

Pending.

### `HashToGF128` Truncates Input Beyond 32 Bytes

- **Severity**: Informational
- **Location**: field/gf128.go

**Description**. `HashToGF128` requires an input slice of at least 32 bytes but silently ignores everything beyond the first 32. As a result, any two inputs that only differ after the first 32 bytes produce the same GF128 element. For example, the following test will pass.

```go
t.Run("extra_bytes_ignored", func(t *testing.T) {
    base := make([]byte, 32)
    for i := range base {
        base[i] = byte(i)
    }

    extended := make([]byte, 35)
    copy(extended, base)
    extended[32] = 0xAA
    extended[33] = 0xBB
    extended[34] = 0xCC

    resultBase := HashToGF128(base)
    resultExtended := HashToGF128(extended)

    t.Logf("HashToGF128(base) = %v", resultBase)
    t.Logf("HashToGF128(extended) = %v", resultExtended)

    if !Equal128(resultBase, resultExtended) {
        t.Errorf("HashToGF128 should ignore bytes beyond 32: %v != %v", resultBase, resultExtended)
    }
})
```

**Impact**

Currently, this does not impact security because the only caller (`deriveCoefficients` in *commitment.go*) always feeds in a 32-byte SHA-256 digest. Still, the implicit truncation could surprise future callers who expect the full slice to influence the result.

**Recommendation**

Either document the truncation explicitly or reject inputs longer than 32 bytes.
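A sketch of the stricter variant, using a hypothetical `checkDigestLength` guard (`HashToGF128` itself is not reproduced here):

```go
package main

import "fmt"

// checkDigestLength is a hypothetical guard sketching the recommendation:
// reject anything that is not exactly a 32-byte digest, rather than
// silently truncating longer inputs.
func checkDigestLength(b []byte) error {
	if len(b) != 32 {
		return fmt.Errorf("HashToGF128 expects exactly 32 bytes, got %d", len(b))
	}
	return nil
}

func main() {
	fmt.Println(checkDigestLength(make([]byte, 32))) // accepted
	fmt.Println(checkDigestLength(make([]byte, 35))) // rejected
}
```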

**Client Response.**

Pending.

### Integer Overflow Allows `Config.Validate` Bypass

- **Severity**: Informational
- **Location**: config.go

**Description**

Relevant code:

```go
if c.K + c.N > 65536 {
    return fmt.Errorf("k + n must be <= 65536, got %d", c.K + c.N)
}
```

The validation intended to ensure that the total number of rows (`K + N`) does not exceed $2^{16}$ can be bypassed due to integer overflow.

Because both `K` and `N` are of type `int`, setting a large value for `N` (e.g., $N = 2^{63}-1$) causes `K + N` to overflow into a **negative** integer.

As a result, the check `c.K + c.N > 65536` incorrectly evaluates to **false**, allowing the configuration to pass validation even though the actual total far exceeds the intended maximum.

In that case, `totalPadded` will equal `1`, since `nextPowerOfTwo` returns `1` for negative inputs.

**Impact**

Typical deployments are unlikely to use extreme values, but a malicious or misconfigured caller could:

- Bypass the upper-bound validation for `K + N`
- Cause later code relying on valid bounds to allocate or index out of range
- Potentially crash or corrupt Merkle tree computations

**Recommendation**

Assert that both `K` and `N` are at most $2^{16}$ before computing their sum, so that `K + N` cannot overflow.

**Client Response.**

Pending.

### Excessive Merkle Tree Padding May Exceed Expected Shards Limit

- **Severity**: Informational
- **Location**: config.go, padding.go

**Description**

During the Merkle tree construction step (in `buildPaddedRowTree` / `buildPaddedRLCTree`) the implementation ensures that the number of rows is a power of two, based on `config.kPadded` and `config.totalPadded`, by padding the data rows first, then appending the parity rows, and finally applying a second round of padding to reach the next power of two.

When validating the config, there is no check that `totalPadded` stays within $2^{16}$.

Although the intended maximum number of total rows (data + parity) is $2^{16}$, the additional padding applied to balance the Merkle tree can increase the number of rows up to $2^{17}$.

For example:

```
K = 10 data rows
N = 65526 parity rows
→ kPadded      = 16
→ kPadded + N  = 65542
→ totalPadded  = 2^17 = 131072 rows
```
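The example can be reproduced with a small sketch, assuming `nextPowerOfTwo` behaves as its name suggests (the actual helper in *padding.go* is assumed to be equivalent for positive inputs):

```go
package main

import "fmt"

// nextPowerOfTwo sketches the padding helper described above: it returns
// the smallest power of two that is >= n.
func nextPowerOfTwo(n int) int {
	p := 1
	for p < n {
		p *= 2
	}
	return p
}

func main() {
	k, n := 10, 65526
	kPadded := nextPowerOfTwo(k)               // 16
	totalPadded := nextPowerOfTwo(kPadded + n) // 131072 = 2^17
	fmt.Println(kPadded, kPadded+n, totalPadded)
}
```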

**Impact**

Minor performance overhead in Merkle tree construction.

This does not affect correctness or security, but slightly increases computation and memory requirements when building the Merkle tree.

**Recommendation**

Add a sanity check in `config.Validate()` to avoid padding beyond the expected maximum total rows.

**Client Response.**

Pending.

---

This report was published on the [zkSecurity Audit Reports](https://reports.zksecurity.xyz) site by [ZK Security](https://www.zksecurity.xyz), a leading security firm specialized in zero-knowledge proofs, MPC, FHE, and advanced cryptography. For the full list of audit reports, see [llms.txt](https://reports.zksecurity.xyz/llms.txt).
