
1kx: The market needs trust-minimized and horizontally scalable blockchain systems

Ethereum is a permissionless world computer that (arguably) has the highest economic security of any chain at the time of writing, serving as the settlement ledger for a large number of assets, applications, and services. Ethereum does have its limitations: at Ethereum Layer 1 (L1), block space is a scarce and expensive resource. Layer 2 (L2) scaling has long been seen as the solution to this problem, and many projects have appeared on the market in recent years, mostly in the form of rollups. However, rollups in the strict sense (i.e., rollups that post their transaction data to Ethereum L1) do not allow Ethereum to scale indefinitely; they top out at a few thousand transactions per second.

Trust minimization – An L2 system is trust-minimized if, at runtime, it requires no trust beyond the underlying L1.

Horizontal scalability (scale-out) – A system is horizontally scalable if new instances can be added without creating a global bottleneck.

In this article, we argue that trust-minimized and horizontally scalable systems are the most promising ways to scale blockchain applications, but they are currently underexplored. We present our argument by exploring three questions:

Why should applications minimize trust?
Why build systems that scale out?
How can we further minimize trust in horizontally scalable systems?

(Disclaimer: while we use Ethereum as the base L1 throughout this article, much of the discussion applies to other decentralized settlement layers as well.)

Why should applications minimize trust?

Applications can connect to Ethereum in a trusted way – they can write to and read from the Ethereum blockchain, but users must trust the operator to execute the business logic correctly. Centralized exchanges like Binance and Coinbase are examples of trusted applications. Connecting to Ethereum means that applications can take advantage of the variety of assets on the global settlement network.

Trusted off-chain services present significant risks. The collapse of major exchanges and services like FTX and Celsius in 2022 is a stark warning of what happens when trusted services misbehave or fail.

Trust-minimized applications, on the other hand, can verifiably write to and read from Ethereum. Examples include smart contract applications such as Uniswap, rollups such as Arbitrum or zkSync, and coprocessors such as Lagrange and Axiom. Broadly speaking, trust is reduced as more functions (see below) are outsourced to L1 and the Ethereum network secures the application. As a result, trust-minimized financial services can be provided without counterparty or custodial risk.

Applications and services that can be outsourced to L1 have three key attributes:

Liveness (and sequencing): Transactions submitted by users should be included (executed and settled) in a timely manner.

Validity: Transactions are processed according to pre-defined rules.

Data (and state) availability: Users have access to historical data and current application state.

For each of these attributes, we can consider what the trust assumptions are – in particular, whether Ethereum L1 provides the attribute or whether external trust is required. The following table categorizes this for different architectural paradigms.

Why build a system that scales out?

Scale-out refers to scaling by adding independent or parallel instances of a system, such as an application or rollup. This requires that there are no global bottlenecks in the system. Scaling out enables and facilitates exponential growth.

Scaling up refers to scaling by increasing the throughput of a monolithic system, such as Ethereum L1 or a data availability layer. When scale-out hits a bottleneck on such a shared resource, further growth requires scaling up.

Claim 1: (Transactional data) rollups can't scale out because they're subject to data availability (DA) bottlenecks. Scaling up data availability solutions requires compromises in terms of decentralization.

Data availability (DA) remains the bottleneck for rollups. Currently, the maximum target size per L1 block is ~1 MB (85 KB/s). With EIP-4844, the available data increases to about 2 MB per block (171 KB/s). With Danksharding, Ethereum L1 could eventually support up to 1.3 MB/s of DA bandwidth. Ethereum L1 DA is a shared resource that many applications and services compete for. So while using L1 for DA provides the best security, it caps the potential scale of such systems: systems that rely on L1 for DA are (often) neither scalable nor cost-effective. Other DA layers, such as Celestia or EigenDA, have larger but still finite bandwidth (6.67 MB/s and 15 MB/s, respectively), and using them moves trust assumptions from Ethereum to another (often less decentralized) network, compromising (economic) security.
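The bandwidth figures above translate directly into throughput ceilings. A back-of-the-envelope sketch, assuming ~100 bytes of published data per transaction (a typical figure for transaction-data rollups, per the discussion below):

```python
# Back-of-the-envelope: max rollup TPS implied by a DA layer's bandwidth.
# Bandwidth figures are the approximate ones cited in the text; the
# bytes-per-transaction figure is an assumption for illustration.

def max_tps(da_bandwidth_bytes_per_s: float, bytes_per_tx: float) -> float:
    """Upper bound on transactions/second if DA is the only bottleneck."""
    return da_bandwidth_bytes_per_s / bytes_per_tx

BYTES_PER_TX = 100  # assumed average published data per tx

for name, bw in [
    ("Ethereum L1 today", 85 * 1024),        # ~85 KB/s
    ("EIP-4844 blobs", 171 * 1024),          # ~171 KB/s
    ("Danksharding", 1.3 * 1024 * 1024),     # ~1.3 MB/s
    ("Celestia", 6.67 * 1024 * 1024),        # ~6.67 MB/s
    ("EigenDA", 15 * 1024 * 1024),           # ~15 MB/s
]:
    print(f"{name}: ~{max_tps(bw, BYTES_PER_TX):,.0f} TPS")
```

Even the most generous DA layer caps out in the low hundreds of thousands of TPS at 100 bytes per transaction, which is why reducing the marginal L1 data per transaction (the subject of Claim 2) matters so much.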


Claim 2: The only way to scale out a trust-minimized service is to drive marginal L1 data per transaction to (near) zero. The two known approaches are state diff rollups (SDRs) and validiums.

A state diff rollup (SDR) is a rollup that publishes batches of aggregated transactions to Ethereum L1 as state diffs. For EVM chains, as batch size grows, the data published to L1 per transaction falls toward a constant that is much smaller than that of a transaction-data rollup.

For example, during a stress-test event with an influx of inscriptions, zkSync saw calldata drop to as little as 10 bytes per transaction. In contrast, transaction-data rollups like Arbitrum, Optimism, and Polygon zkEVM typically publish around 100 bytes per transaction under normal traffic.
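The amortization effect is easy to see in a toy model. The sketch below assumes (hypothetically) that transactions concentrate on a small "hot" set of storage slots; a state diff records each touched slot once per batch regardless of how many transactions wrote to it, so per-transaction data shrinks as batches grow:

```python
# Toy model of state-diff amortization (all parameters are assumptions):
# each tx writes one slot from a hot set; the published diff contains each
# touched slot exactly once, so bigger batches mean fewer bytes per tx.
import random

SLOT_BYTES = 64          # assumed bytes to encode one (key, value) diff entry
TX_DATA_BYTES = 100      # typical per-tx calldata in a tx-data rollup (see text)
HOT_SLOTS = 1_000        # assumed size of the hot storage-slot set

random.seed(0)

def bytes_per_tx_sdr(batch_size: int) -> float:
    touched = {random.randrange(HOT_SLOTS) for _ in range(batch_size)}
    return len(touched) * SLOT_BYTES / batch_size

for n in (100, 1_000, 10_000, 100_000):
    print(f"batch={n:>7}: SDR ~{bytes_per_tx_sdr(n):6.2f} B/tx "
          f"vs tx-data rollup {TX_DATA_BYTES} B/tx")
```

In this toy model the per-transaction cost of a transaction-data rollup stays flat at 100 bytes, while the SDR's cost collapses toward SLOT_BYTES × HOT_SLOTS / batch_size, i.e., well under one byte per transaction at large batch sizes.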

A validium is a system that publishes validity proofs of its state transitions to Ethereum, without the associated transaction data or state. Validiums are highly horizontally scalable, even under low traffic, especially because the settlement of different validiums can be aggregated.

Beyond horizontal scalability, validiums also offer privacy from public observers. A validium with private DA keeps data and state availability centralized and gated: users must authenticate before accessing data, and the operator can enforce good privacy practices. This makes the user experience similar to traditional web or financial services: user activity is not subject to public scrutiny, but there is a trusted custodian of user data (in this case, the validium operator).

What about centralized vs. decentralized sequencers? To preserve horizontal scalability, each instance must run its own independent sequencer (whether centralized or decentralized). Note that while systems sharing a sequencer gain atomic composability, they cannot scale out: the shared sequencer becomes a bottleneck as the number of systems grows.

What about interoperability? If all systems are settled to the same L1, scale-out systems can interoperate without additional trust because information can be sent from one system to another through a shared settlement layer. There is a trade-off between running costs and message latency (which may be addressed at the application layer).
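The shared-settlement interop pattern can be sketched in a few lines. This is a toy model under loud assumptions (all class and function names are hypothetical; real systems settle proofs and Merkle commitments, not raw hashes): instance A settles a message commitment to the shared layer, and instance B accepts the message only if it can verify the commitment against that layer, so no trust between A and B is needed.

```python
# Toy model of trust-minimized interop via a shared settlement layer.
# All names are hypothetical; a real L1 would verify proofs, not raw hashes.
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class SettlementLayer:
    """Stand-in for L1: a shared, append-only set of settled commitments."""
    def __init__(self):
        self._settled: set[str] = set()

    def settle(self, commitment: str) -> None:
        self._settled.add(commitment)

    def is_settled(self, commitment: str) -> bool:
        return commitment in self._settled

class ScaleOutInstance:
    def __init__(self, name: str, l1: SettlementLayer):
        self.name, self.l1 = name, l1

    def send(self, payload: bytes) -> str:
        commitment = h(self.name.encode() + payload)
        self.l1.settle(commitment)   # cost/latency: one L1 settlement
        return commitment

    def receive(self, sender: str, payload: bytes) -> bool:
        # Trust-minimized: accept only what the shared L1 has settled.
        return self.l1.is_settled(h(sender.encode() + payload))

l1 = SettlementLayer()
a, b = ScaleOutInstance("A", l1), ScaleOutInstance("B", l1)
a.send(b"hello from A")
print(b.receive("A", b"hello from A"))   # True
print(b.receive("A", b"forged message")) # False
```

The `settle` call is where the trade-off mentioned above lives: every cross-instance message pays L1 settlement cost and waits for L1 latency.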

Trust minimization for horizontally scalable systems

Can we further minimize trust requirements for validity, sequencing, and data availability in a horizontally scalable system?

It is worth noting that we already know how to recover trustless liveness and data availability at the cost of horizontal scalability. For example, an L2 transaction can be forced in from L1 to guarantee inclusion, and volitions can give users selective L1 state availability.

Another approach is decentralization without relying on L1. Systems can be made more decentralized by replacing the single sequencer with a decentralized sequencer network, such as Espresso Systems or Astria, which reduces the trust needed for liveness, sequencing, and data availability. This comes with limitations compared to single-operator designs: (1) performance is bounded by that of a distributed system, and (2) for validiums with private DA, if the decentralized sequencer network is permissionless, the default privacy guarantee is lost.

How much further can we reduce trust in a single-operator validium or SDR? Here are a few open directions.

Direction 1: Trust-minimized data (and state) availability. Plasma solves the state availability problem to some extent – it either handles only exits for certain state models (including the UTXO model) or requires users to come online regularly (Plasma Free).

Direction 2: Accountable preconfirmations for SDRs and validiums. The goal is to give users a fast preconfirmation, i.e., a sequencer commitment that the transaction will be included; if the inclusion promise is not kept, the confirmation should let the user challenge the sequencer and slash its stake. The challenge is that proving non-inclusion (a prerequisite for slashing) may require additional data, which the sequencer can simply withhold. It is therefore reasonable to assume that, at minimum, the SDR or validium would need a (possibly permissioned) data availability committee for all of its calldata or transaction history, so that the committee can furnish proofs of non-inclusion (of preconfirmed transactions) on request.
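The accountability loop described above can be sketched minimally. Everything here is hypothetical scaffolding: the "signature" is a stand-in hash (a real system would use ECDSA or BLS), and the batch history stands in for calldata attested by the DA committee:

```python
# Minimal sketch of an accountable preconfirmation (all names hypothetical):
# the sequencer signs a promise to include a tx by some block height; given
# the batch history (e.g. attested by a DA committee), anyone can check the
# promise and decide whether the sequencer's bond is slashable.
import hashlib
from dataclasses import dataclass

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass(frozen=True)
class Preconfirmation:
    tx_hash: str          # hash of the user's transaction
    deadline_block: int   # block height by which inclusion was promised
    sequencer_sig: str    # stand-in for a real signature over the promise

def sign_promise(tx: bytes, deadline: int, sequencer_key: str) -> Preconfirmation:
    tx_hash = h(tx)
    # Toy "signature": hash of key + promise. A real system signs with a key.
    sig = h(f"{sequencer_key}|{tx_hash}|{deadline}".encode())
    return Preconfirmation(tx_hash, deadline, sig)

def is_slashable(p: Preconfirmation, batches: dict[int, set[str]]) -> bool:
    """True if the promised tx appears in no batch up to the deadline.

    `batches` maps block height -> set of included tx hashes; in practice
    this history would come from the DA committee's attested calldata.
    """
    included = any(
        p.tx_hash in txs
        for height, txs in batches.items()
        if height <= p.deadline_block
    )
    return not included

tx = b"transfer 1 ETH"
p = sign_promise(tx, deadline=10, sequencer_key="seq-secret")
history = {8: {h(b"other tx")}, 9: set()}   # promised tx never included
print(is_slashable(p, history))  # True: promise broken, bond slashable
```

The withholding problem from the text shows up here as an assumption: `is_slashable` only works if someone other than the sequencer can produce the full `batches` history, which is exactly the role of the DA committee.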

Direction 3: Fast recovery from liveness failures. A single-operator system can suffer liveness failures (e.g., Arbitrum going offline during the inscriptions event). Can we design systems that minimize service disruption in such cases? In a sense, an L2 that allows self-sequencing and state proposals already guarantees there will be no prolonged liveness failure. What is lacking is deeper research into single-operator designs that are resilient to short-term liveness failures. One potential solution is to make liveness failures accountable by slashing for them. Another is to shorten the delay before takeover (currently set at about a week).

Conclusion

Scaling the global settlement ledger while minimizing trust is a challenge. In today's rollup and data availability landscape, the distinction between scaling up and scaling out is often blurred. To truly extend trust-minimized systems to every person on the planet, we need to build systems that are both trust-minimized and horizontally scalable.
