Scalable and Probabilistic Leaderless BFT Consensus through Metastability Team Rocket, Maofan Yin, Kevin Sekniqi, Robbert van Renesse, and Emin Gün Sirer Cornell University∗ Abstract—This paper introduces a family of leaderless Byzan- consumes around 63.49 TWh/year [20], about twice as all tine fault tolerance protocols, built around a metastable mecha- of Denmark [14]. Moreover, these protocols suffer from an nism via network subsampling. These protocols provide a strong inherent scalability bottleneck that is difficult to overcome probabilistic safety guarantee in the presence of Byzantine adver- saries while their concurrent and leaderless nature enables them through simple reparameterization [17]. arXiv:1906.08936v1 [cs.DC] 21 Jun 2019 to achieve high throughput and scalability. Unlike blockchains This paper introduces a new family of consensus protocols that rely on proof-of-work, they are quiescent and green. Unlike called Snow. Inspired by gossip algorithms, this family gains traditional consensus protocols where one or more nodes typically its properties through a deliberately metastable mechanism. process linear bits in the number of total nodes per decision, no Specifically, the system operates by repeatedly sampling the node processes more than logarithmic bits. It does not require accurate knowledge of all participants and exposes new possible network at random, and steering correct nodes towards a com- tradeoffs and improvements in safety and liveness for building mon outcome. Analysis shows that this metastable mechanism consensus protocols. is powerful: it can move a large network to an irreversible The paper describes the Snow protocol family, analyzes its state quickly, where the irreversibility implies that a sufficiently guarantees, and describes how it can be used to construct the core large portion of the network has accepted a proposal and a of an internet-scale electronic payment system called Avalanche, which is evaluated in a large scale deployment. Experiments conflicting proposal will not be accepted with any higher than demonstrate that the system can achieve high throughput (3400 negligible (ε) probability. tps), provide low confirmation latency (1.35 sec), and scale well Similar to Nakamoto consensus, the Snow protocol family compared to existing systems that deliver similar functionality. provides a probabilistic safety guarantee, using a tunable For our implementation and setup, the bottleneck of the system security parameter that can render the possibility of a consen- is in transaction verification. sus failure arbitrarily small. Unlike Nakamoto consensus, the I. I NTRODUCTION protocols are green, quiescent and efficient; they do not rely on proof-of-work [23] and do not consume energy when there are Achieving agreement among a set of distributed hosts lies at no decisions to be made. The efficiency of the protocols stems the core of countless applications, ranging from Internet-scale partly from removing the leader bottleneck: each node requires services that serve billions of people [12], [30] to cryptocurren- O(1) communication overhead per round and O(log n) rounds cies worth billions of dollars [1]. To date, there have been two in expectation, whereas classical consensus protocols have one main families of solutions to this problem. Traditional consen- or more nodes that require O(n) communication per round sus protocols rely on all-to-all communication to ensure that all (phase). Further, the Snow family tolerates discrepancies in correct nodes reach the same decisions with absolute certainty. knowledge of membership, as we discuss later. In contrast, Because they require quadratic communication overhead and classical consensus protocols require the full and accurate accurate knowledge of membership, they have been difficult knowledge of n as its safety foundation. to scale to large numbers of participants. On the other hand, Snow’s subsampled voting mechanism has two additional Nakamoto consensus protocols [8], [24], [26], [35], [43]–[46], properties that improve on previous approaches for consensus. [53]–[55] have become popular with the rise of Bitcoin. These Whereas the safety of quorum-based approaches breaks down protocols provide a probabilistic safety guarantee: Nakamoto immediately when the predetermined threshold f is exceeded, consensus decisions may revert with some probability ε. A Snow’s probabilistic safety guarantee degrades smoothly when protocol parameter allows this probability to be rendered Byzantine participants exceed f . This makes it easier to arbitrarily small, enabling high-value financial systems to be pick the critical threshold f . It also exposes new tradeoffs constructed on this foundation. This family is a natural fit for between safety and liveness: the Snow family is more efficient open, permissionless settings where any node can join the sys- when the fraction of Byzantine nodes is small, and it can be tem at any time. Yet, these protocols are costly, wasteful, and parameterized to tolerate more than a third of the Byzantine limited in performance. By construction, they cannot quiesce: nodes by trading off liveness. their security relies on constant participation by miners, even To demonstrate the potential of this protocol family, we when there are no decisions to be made. Bitcoin currently illustrate a practical peer-to-peer payment system, Avalanche. In effect, Avalanche executes multiple Snowball instances with ∗ Blasts off at the speed of light! — Team Rocket the aid of a Directed Acyclic Graph (DAG). The DAG serves to An earlier version of this paper published on May 16th 2018, IPFS, was titled piggyback multiple instances, reducing the cost from O(log n) Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies. to O(1) per node and streamlining the path where there are

no conflicting transactions. 100 Failure Probability Overall, the main contribution of this paper is to introduce 10−20 a brand new family of consensus protocols, based on ran- domized sampling and metastable decision. The next section 3f + 1 classical 10−50 provides the model, goals and necessary assumptions for the Bitcoin new protocols. Section III gives intuition behind the proto- Snowflake-7 cols, followed by their full specification, Section IV provides Snowflake-8 methodology used by our formal analysis of safety and liveness 10−107 in Appendix A, Section V describes Avalanche, a Bitcoin-like 0.0 0.1 0.2 0.3 0.4 0.5 payment system, Section VI evaluates Avalanche, Section VII presents related work, and finally, Section VIII summarizes Percentage of Byzantine Nodes (f /n) our contributions. Fig. 2: Figure 1 with log-scaled y-axis. II. M ODEL AND G OALS at the expense of safety. Lastly, unlike traditional consensus a) Key Guarantees protocols and similar to Nakamoto, our protocols benefit from lower adversarial presence, as discussed in property P3 below. Safety: Unlike classical consensus protocols, and similar to longest-chain-based consensus protocols such as Nakamoto consensus [43], we adopt an ε-safety guarantee that is proba- Number of Rounds Number of Blocks bilistic. In practice, this probabilistic guarantee is as strong as 103 α=8 traditional safety guarantees, since appropriately small choices 103 of ε can render consensus failure negligible, lower than the 102 probability of hardware failure due to random events. 102 101 α=7 1.0 Bitcoin Snowflake Failure Probability 3f + 1 classical 101 Bitcoin 0.0 0.1 0.3 0.5 0.0 0.1 0.3 0.5 Snowflake-7 Percentage of Byzantine Nodes (f /n) 0.5 Snowflake-8 Fig. 3: The relation between f /n and the convergence speed, given ε = 10−20 . The left figure shows the expected number of blocks to guarantee ε in Bitcoin, which, counter to commonly 0.0 accepted folk wisdom, is not a constant 6, but depends on 0.0 0.1 0.2 0.3 0.4 0.5 adversary size to withhold the same ε. The right figure shows Percentage of Byzantine Nodes (f /n) the maximum number of rounds required by Snowflake, where being different from Bitcoin, the asymptote is below 0.5 and Fig. 1: The relation between f /n and the probability of system varies by the choice of parameters. safety failure (decision of two conflicting proposals), given a choice of finality. Classical BFT protocols that tolerate f Formal Guarantees: Let the system be parameterized for failures will encounter total safety failure when the threshold an ε safety failure probability under a maximum expected f is exceeded even by one additional node. The Bitcoin curve number of adversarial nodes. Let O(log n) < tmax < ∞ be shows a typical finality choice for Bitcoin where a block the upper bound of the execution of the protocols. The Snow is considered final when it is “buried” in a branch having protocols then provide the following guarantees: 6 additional blocks compared to any other competing forks. P1. Safety. When decisions are made by any two correct Snowflake belongs to the Snow family, and it is configured nodes, they decide on conflicting transactions with negligible with k = 10, β = 150. Snowflake-7,8 uses α = 7 and α = 8 probability (≤ ε). respectively. P2. Liveness (Upper Bound). Snow protocols terminate with a strictly positive probability within tmax √ rounds. Liveness: All our protocols provide a non-zero probability P3. Liveness (Lower Bound). If f ≤ O( n), then the Snow guarantee of termination within a bounded amount of time. protocols terminate with high probability (≥ 1−ε) in O(log n) This bounded guarantee is similar to various protocols such rounds. as Ben-Or [7] and longest-chain protocols. In particular, for Nakamoto consensus, the number of required blocks for a b) Network transaction increases exponentially with the number of ad- In the standard definition of asynchrony [7], message trans- versarial nodes, with an asymptote at f = n/2 wherein mission is finite, but the distribution is undefined. This implies the number is infinite. In other words, the time required that the scheduling of message transmission itself could behave for finality approaches ∞ as f approaches n/2 (Figure 3). arbitrarily, and potentially even maliciously. We use a modified Furthermore, the required number of rounds is calculable version of this model, which is well-accepted [6], [22], [25], ahead of time, as to allow the system designer to tune liveness [33], [39] in the analysis of epidemic networks and gossip-

based stochastic systems. In particular, we fix the distribution can use an already established proof-of-stake based mecha- of message delay to that of the exponential distribution. We nism [27]. The full design of a peer-to-peer payment system note that, just like in the standard asynchronous model, there incorporating staking, unstaking and minting mechanism is is a strictly non-zero probability that any correct node may beyond the scope of this paper, whose focus is on the core execute its next local round only after an arbitrarily large consensus protocol. amount of time has passed. Furthermore, we also note that f) Flooding Attacks scheduling only applies to correct nodes, and the adversary may execute arbitrarily, as discussed later. Flooding/spam attacks are a problem for any distributed system. Without a protection mechanism, an attacker can c) Achieving Liveness generate large numbers of transactions and flood protocol Classical consensus that works with asynchrony does not data structures, consuming storage. There are a multitude of get stuck in a single phase of voting because the vote initiator techniques to deter such attacks, including network-layer pro- always polls votes from all known participants and wait for tection, proof-of-authority, local proof-of-work and economic n − f responses. In our system, however, nodes operate via mechanisms. In Avalanche, we use transaction fees, making subsampling, hence it is possible for a single sample to select a such attacks costly even if the attacker is sending money back majority of adversarial nodes, and therefore the node gets stuck to addresses under its control. waiting for the responses. To ensure liveness, a node should be g) Additional Assumptions able to wait with some timeout. Therefore, our protocols are synchronous in order to guarantee liveness. Lastly, it is worth We do not assume that all members of the network are noting that Nakamoto consensus is synchronous, in which known to all participants, but rather may temporarily have the required difficulty of proof-of-work is dependent on the some discrepancies in network view. We quantify the bounds maximum network delay [44]. on the discrepancy in Appendix A-F. We assume a safe boot- strapping mechanism, similar to that of Bitcoin, that enables a d) Adversary node to connect with sufficiently many correct nodes to acquire The adversarial nodes execute under their own internal a statistically unbiased view of the network. We do not assume scheduler, which is unbounded in speed, meaning that all a PKI. Finally, we make standard cryptographic assumptions adversarial nodes can execute at any infinitesimally small related to digital signatures and hash functions. point in time, unlike correct nodes. The adversary can view III. P ROTOCOL D ESIGN the state of every honest node at all times and can instantly modify the state of all adversarial nodes. It cannot, however, We start with a non-BFT protocol called Slush and pro- schedule or modify communication between correct nodes. gressively build up to Snowflake and Snowball, all based on Finally, we make zero assumptions about the behavior of the the same common majority-based metastable voting mecha- adversary, meaning that it can choose any execution strategy of nism. These protocols are single-decree consensus protocols its liking. In short, the adversary is computationally bounded of increasing robustness. We provide full specifications for the (it cannot forge digital signatures) but otherwise is point-to- protocols in this section, and defer the analysis to the next point informationally unbounded (knows all state) and round- section, and present formal proofs in the appendix. adaptive (can modify its strategy at any time). A. Slush: Introducing Metastability e) Sybil Attacks The core of our approach is a single-decree consensus Consensus protocols provide their guarantees based on as- protocol, inspired by epidemic or gossip protocols. The sim- sumptions that only a fraction of participants are adversarial. plest protocol, Slush, is the foundation of this family, shown These bounds could be violated if the network is naively left in Figure 4. Slush is not tolerant to Byzantine faults, only open to arbitrary participants. In particular, a Sybil attack [21], crash-faults (CFT), but serves as an illustration for the BFT wherein a large number of identities are generated by an protocols that follow. For ease of exposition, we will describe adversary, could be used to exceed the adversarial bound. the operation of Slush using a decision between two conflicting A long line of work, including PBFT [13], treats the colors, red and blue. Sybil problem separately from consensus, and rightfully so, In Slush, a node starts out initially in an uncolored state. as Sybil control mechanisms are distinct from the underlying, Upon receiving a transaction from a client, an uncolored node more complex agreement protocol1 . Nakamoto consensus, for updates its own color to the one carried in the transaction instance, uses proof-of-work [4] to limit Sybils, which requires and initiates a query. To perform a query, a node picks a miners to continuously stake a hardware investment. Other small, constant sized (k) sample of the network uniformly at protocols, discussed in Section VII, rely on proof-of-stake or random, and sends a query message. Upon receiving a query, proof-of-authority. The consensus protocols presented in this an uncolored node adopts the color in the query, responds with paper can adopt any Sybil control mechanism, although proof- that color, and initiates its own query, whereas a colored node of-stake is most aligned with their quiescent operation. One simply responds with its current color. Once the querying node 1 collects k responses, it checks if a fraction ≥ α are for the This is not to imply that every consensus protocol can be coupled/decoupled same color, where α > ⌊k/2⌋ is a protocol parameter. If the α with every Sybil control mechanism. threshold is met and the sampled color differs from the node’s

1: procedure ON Q UERY(v, col ′ ) 1: procedure SNOWFLAKE L OOP(u, col 0 ∈ {R, B, ⊥}) 2: if col = ⊥ then col := col ′ 2: col := col 0 , cnt := 0 3: RESPOND (v, col ) 3: while undecided do 4: procedure SLUSH L OOP(u, col 0 ∈ {R, B, ⊥}) 4: if col = ⊥ then continue 5: col := col 0 // initialize with a color 5: K := SAMPLE(N \u, k) 6: for r ∈ {1 . . . m} do 6: P := [QUERY(v, col ) for v ∈ K] 7: // if ⊥, skip until ON Q UERY sets the color 7: maj := false 8: if col = ⊥ then continue 8: for col ′ ∈ {R, B} do 9: // randomly sample from the known nodes 9: if P.COUNT(col ′ ) ≥ α then 10: K := SAMPLE(N \u, k) 10: maj := true 11: P := [QUERY(v, col ) for v ∈ K] 11: if col ′ 6= col then 12: for col ′ ∈ {R, B} do 12: col := col ′ , cnt := 1 13: if P.COUNT(col ′ ) ≥ α then 13: else 14: col := col ′ 14: if ++cnt > β then ACCEPT(col ) 15: ACCEPT (col ) 15: if maj = false then cnt := 0 Fig. 4: Slush protocol. Timeouts elided for readability. Fig. 5: Snowflake. 1: procedure SNOWBALL L OOP(u, col 0 ∈ {R, B, ⊥}) own color, the node flips to that color. It then goes back to 2: col := col 0 , lastcol := col 0 , cnt := 0 the query step, and initiates a subsequent round of query, for a 3: d[R] := 0, d[B] := 0 total of m rounds. Finally, the node decides the color it ended 4: while undecided do up with at time m. 5: if col = ⊥ then continue Slush has a few properties of interest. First, it is almost 6: K := SAMPLE(N \u, k) 7: P := [QUERY(v, col ) for v ∈ K] memoryless: a node retains no state between rounds other 8: maj := false than its current color, and in particular maintains no history 9: for col ′ ∈ {R, B} do of interactions with other peers. Second, unlike traditional 10: if P.COUNT(col ′ ) ≥ α then consensus protocols that query every participant, every round 11: maj := true involves sampling just a small, constant-sized slice of the 12: d[col ′ ]++ 13: if d[col ′ ] > d[col ] then network at random. Third, Slush makes progress under any 14: col := col ′ network configuration (even fully bivalent state, i.e. 50/50 split 15: if col ′ 6= lastcol then between colors), since random perturbations in sampling will 16: lastcol := col ′ , cnt := 1 cause one color to gain a slight edge and repeated samplings 17: else afterwards will build upon and amplify that imbalance. Finally, 18: if ++cnt > β then ACCEPT(col ) if m is chosen high enough, Slush ensures that all nodes will 19: if maj = false then cnt := 0 be colored identically with high probability (whp). Each node Fig. 6: Snowball. has a constant, predictable communication overhead per round, and m grows logarithmically with n. show, there exists an irreversible state after which a decision is The Slush protocol does not provide a strong safety guar- inevitable. Correct nodes begin to commit past the irreversible antee in the presence of Byzantine nodes. In particular, if the state to adopt the same color, whp. For additional intuition, correct nodes develop a preference for one color, a Byzantine which we do not expand in this paper, there also exists a phase- adversary can attempt to flip nodes to the opposite so as to keep shift point, where the Byzantine nodes lose ability to keep the network in balance, preventing a decision. We address this network in a bivalent state. in our first BFT protocol that introduces more state storage at C. Snowball: Adding Confidence the nodes. Snowflake’s notion of state is ephemeral: the counter gets B. Snowflake: BFT reset with every color flip. Snowball augments Snowflake with confidence counters that capture the number of queries that Snowflake augments Slush with a single counter that cap- have yielded a threshold result for their corresponding color tures the strength of a node’s conviction in its current color. (Figure 6). A node decides if it gets β consecutive chits for a This per-node counter stores how many consecutive samples color. However, it only changes preference based on the total of the network by that node have all yielded the same color. accrued confidence. The differences between Snowflake and A node accepts the current color when its counter exceeds Snowball are as follows: β, another security parameter. Figure 5 shows the amended 1) Upon every successful query, the node increments its protocol, which includes the following modifications: confidence counter for that color. 1) Each node maintains a counter cnt; 2) A node switches colors when the confidence in its current 2) Upon every color change, the node resets cnt to 0; color becomes lower than the confidence value of the new 3) Upon every successful query that yields ≥ α responses for color. the same color as the node, the node increments cnt. When the protocol is correctly parameterized for a given IV. A NALYSIS threshold of Byzantine nodes and a desired ε-guarantee, it can Due to space limits, we move some core details to Ap- ensure both safety (P1) and liveness (P2, P3). As we later pendix A, where we show that under certain independent and

distinct assumptions, the Snow family of consensus protocols the reversibility probabilities of the Slush process. All the other provide safety (P1) and liveness (P2, P3) properties. In this formal results in this paper are, informally speaking, intuitive section, we summarize our core results and provide some proof derivations and augmentations of this result. sketches. Theorem 1. Let the configuration of the system at time t be a) Notation St = n/2+δ, meaning that the network has drifted to 2δ more Let the network consist of a set of n nodes (represented by blue nodes than red nodes (δ = 0 means that red and blue are set N ), where c are correct nodes (represented by set C) and f equal). Let ξδ be the probability of absorption to the all-red are Byzantine nodes (represented by set B). Let u, v ∈ C refer state (minority). Then, for all 0 ≤ δ ≤ n/2, we have α k−α to any two correct nodes in the network. Let k, α, β ∈ Z+ 1/2 − δ/n 1/2 + δ/n ξδ ≤ be positive integers where α > ⌊k/2⌋. From now on, k will α/k 1 − α/k (1) always refer to the network sample size, where k ≤ n, and α ≤ e−2((α/k)−(1/2)+(δ/n)) 2 k will be the majority threshold required to consider the voting experiment a “success”. In general, we will refer to S as the Proof. This bound follows from the Hoeffding-derived tail state (or configuration) of the network at any given time. bounds of the hypergeometric distribution by Chvatal [15]. b) Modelling Framework We note that Chvatal’s bounds are introduced for simplicity To formally model our protocols, we use continuous-time of exposition and are extremely weak. We leave the full closed- Markov processes (CTMC). The state space is enumerable (and form expression in Theorem 2 to the appendix, which is also finite), and state transitions occur in continuous time. CTMCs significantly stronger than the Chvatal bound. Nonetheless, naturally model our protocols since state transitions do not using the loose Chvatal bound, we make the key observation occur in epochs and in lockstep for every node (at the end of that as the drift δ increases, given fixed α and k, the probability every time unit) but rather occur at any time and independently of moving towards the minority value decreases exponentially of each other. fast (in fact, even faster, since there is a quadratic term in We focus on binary consensus, although the safety results the inverse exponent). Additionally, the same result holds for generalize to more than two values. We can think of the increasing α given a fixed k. network as a set of nodes either colored red or blue, and we The outcomes of this theorem demonstrate a key property: will refer to this configuration at time t as St . We model once the network loses full bivalency (i.e. δ > 0), it tends to our protocols through a continuous-time process with two topple and converge rapidly towards the majority color, unable absorbing states, where either all nodes are red or all nodes to revert back to the minority with significant probability. are blue. The state space S of the stochastic process is a This is the fundamental property exploited by our protocols, condensed version of the full configuration space, where each and what makes them secure despite only sampling a small, state {0, . . . , n} represents the total number of blue nodes in constant-sized set of the network. The core result that follows the system. for the safety guarantees in Snowflake is in finding regions The simplification that allows us to analyze this system is (given specific parameter choices) where the reversibility holds to obviate the need to keep track of all of the execution paths, with no higher than ε probability even under adversarial as well as all possible adversarial strategies, and rather focus presence. entirely on a single state of interest, without regards to how b) Snowflake we achieve this state. More specifically, the core extractable For Snowflake, we relax the assumption that all nodes are insight of our analysis is in identifying the irreversibility state correct and assume that some fraction of nodes are adver- of the system, the state upon which so many correct nodes have sarial. In Slush, once the network gains significant majority usurped either red or blue that reverting back to the minority support for one proposal (e.g., the color blue), it becomes color is highly unlikely. unlikely for a minority proposal (e.g., the color red) to ever A. Safety become decided in the future (irreversibility). Furthermore, a) Slush in Slush nodes simply have to execute the protocol for a deterministic number of rounds, m, which is known ahead Unless explicitly stated, we assume that L(u) = N for of protocol execution. When introducing adversarial nodes all u ∈ N . We model the dynamics of the system through with arbitrary strategies, however, nodes cannot simply execute a continuous-time process where two states are absorbing, the protocol for a deterministic number of rounds, since namely the all-red or all-blue state2 . Let {Xt≥0 } be the random the adversary may nondeterministically affect the value of variable that describes the state of the system at time t, where m. Instead, correct nodes must implement a mechanism to X0 = {0, . . . , c}. We begin by immediately discussing the explicitly detect that irreversibility has been reached. To that most important result of the safety dynamics of our processes: end, in Snowflake, every correct node implements a decision function, D(u, St , blue) → {0, 1}, which is a random variable 2 that outputs 1 if node u detects that the network has reached an Note that, in reality, we do not require that all nodes be the same color in order to ensure that we decide on that color, only n − α − 1. This is only a irreversibility state at time t for blue. The decision mechanism simplification in our description. is probabilistic, meaning that it can fail, although it is designed

to do so with negligible probability. We now sketch the proof is strictly stronger than that of Snowflake. of Snowflake. Proof Sketch. We define safety failure to be the event wherein B. Liveness any two correct nodes u and v decide on blue and red, i.e. We assume that the observed adversarial presence 0 ≤ f ′ ≤ D(u, St , blue) → 1 and D(v, St′ , red) → 1, for any two n(k − α − ψ)/k ≤ f , where we refer to ψ as the buffer times t and t′ . We again model the system as a continuous zone. The bigger ψ, the quicker the ability of the decision time random process. The state space is defined the same mechanism to finalize a value. If, of course, ψ approaches way as in Slush. However, we note some critical subtleties. zero or becomes negative, then we violate the upper bound of First, unlike in Slush, where it is clear that, once nodes are adversarial tolerance for the parameterized system, and thus the same color, a decision has been made, this is no longer the the adversary can, with high probability, stall termination by case for Snowflake. In fact, even if all correct nodes accept a simply choosing to not respond, although the safety guarantees color, it is entirely possible for a correct node to switch again. may still hold. Second, we also have to consider the decision mechanism Assuming that ψ is strictly positive, termination is strictly D(∗). To analyze, we obviate the need to keep track of all finite under all network configurations where a proposal has at possible network configurations under all possible adversarial least α support. Furthermore, not only is termination finite with strategies and assume that a node u first decides on blue. Then, probability one, we also have a strictly positive probability conditioned on the state of the network upon u deciding, we of termination within any bounded amount of time tmax , as calculate the probability that another node v decides red, which discussed in Lemma 4, which follows from Theorem 3. This is a function of both the probability that the network reverts captures liveness property P2. towards a minority blue state and that v decides at that precise Proof Sketch. Using the construction of the system to prove state. We show that under appropriate choices of k, α, and irreversibility, we characterize the distribution of the average β, we can construct highly secure instances of Snowflake (i.e. time spent (sojourn times) at each state before the system safety failure with probability ≤ ε) when the network reaches terminates execution by absorption at either absorbing state. some bias of δ, as shown in Figure 7. A concrete example is The termination time is then a union of these times. provided in Figure 1. For non-conflicting transactions, since the adversary is un- able to forge a conflict, the time to decision is simply the δ mixing time of the network starting from a configuration where every correct node is uninitialized. 0 c/2 c Proof Sketch. Mixing times for gossip is well characterized to ≤ε be as O(log n), and this result holds for all our protocols. Liveness guarantees under a fully bivalent network config- uration reduce to an optimal convergence √ time of O(log n) Fig. 7: Representation of the irreversibility state, which exists rounds if the adversary is at most O( n), for α = ⌊k/2⌋ + 1. when – even under f Byzantine nodes – the number of blue We leave additional √ detains to Lemma 5. When the adversary correct nodes exceeds that of red correct nodes by more than surpasses O( n) nodes, the worst-case number of rounds 2δ. increases polynomially, and as f approaches n/2 it approaches exponential convergence rates. Proof Sketch. We modify Theorem 3 to include the adversary, c) Snowball which reverts any imbalances in the network by keeping Snowball is an improvement over Snowflake, where random network fully bivalent. perturbations in network samples are reduced by introducing a) Multi-Value Consensus a limited form of history, which we refer to as confidence. The fundamental takeaway is that the history enables Snow- Our binary consensus protocol could support multi-value ball to provide stronger security against safety failures than consensus by running logarithmic binary instances, one for Snowflake. each bit of the proposed value. However, such theoretical Proof Sketch. We structure the model via a game of balls reduction might not be efficient in practice. Instead, we could and urns, where each urn represents one of the correct nodes, directly incorporate multi-values as multi-colors in the proto- and the ball counts correspond to confidences in either color. col, where safety analysis could still be generalized. Using this model, the analysis applies martingale concentration As for liveness, we sketch a leaderless initialization mecha- inequalities to prove that once the system has reached the nism, which in expectation uses O(log n) rounds under the irreversibility state, then the growth of the confidence of the assumption that the network is synchronized. Every node majority decided color will perpetually grow and drift further operates in three phases: in the first phase, it gossips and away from those of the minority color, effectively rendering collects proposals for O(log n) rounds, where each round lasts reversibility less likely over time. If the drifts ever revert, then for the maximum message delay; in the second phase, each reversibility analysis becomes identical to that of Snowflake. node stops collecting proposals, and instead gossips all new Since now the adversary must overcome the confidence drifts, values for an additional O(log n) rounds; in the third phase, as well as the irreversibility dynamics, the security of Snowball each node samples the proposals it knows of locally, checking

for values that have an α majority, ordered deterministically, 1: procedure INIT such as by hash values. Finally, a node selects the first value by 2: T := ∅ // the set of known transactions 3: Q := ∅ // the set of queried transactions the order as its initial state when it starts the subsequent con- 4: procedure ON G ENERATE T X(data) sensus protocol. In a cryptocurrency setting, the deterministic 5: edges := {T ′ ← T : T ′ ∈ PARENT S ELECTION(T )} ordering function would incorporate fees paid out for every 6: T := T X(data, edges) new proposal, which means that the adversary is financially 7: ON R ECEIVE T X (T ) limited in its ability to launch a fairness attack against the 8: procedure ON R ECEIVE T X(T ) initialization. While the design of initialization mechanisms is 9: if T ∈/ T then 10: if PT = ∅ then interesting, note that it is not necessary for a decentralized 11: PT := {T }, PT .pref := T payment system, as we show in Section V. 12: PT .last := T, PT .cnt := 0 Finally, we discuss churn and view discrepancies in the 13: else PT := PT ∪ {T } appendix. 14: T := T ∪ {T }, cT := 0. V. P EER - TO -P EER PAYMENT S YSTEM Fig. 8: Avalanche: transaction generation. We have implemented a bare-bones payment system, The central challenge in the maintenance of the DAG is to Avalanche, which supports Bitcoin transactions. In this section, choose among conflicting transactions. The notion of conflict we describe the design and sketch how the implementation is application-defined and transitive, forming an equivalence can support the value transfer primitive at the center of cryp- relation. In our cryptocurrency application, transactions that tocurrencies. Deploying a full cryptocurrency involves boot- spend the same funds (double-spends) conflict, and form a strapping, minting, staking, unstaking, and inflation control. conflict set (shaded regions in Figure 11) , out of which only While we have solutions for these issues, their full discussion a single one can be accepted. Note that the conflict set of a is beyond the scope of this paper, whose focus is centered on virtuous transaction is always a singleton. the novel Snow consensus protocol family. Avalanche embodies a Snowball instance for each conflict In a cryptocurrency setting, cryptographic signatures enforce set. Whereas Snowball uses repeated queries and multiple that only a key owner is able to create a transaction that spends counters to capture the amount of confidence built in con- a particular coin. Since correct clients follow the protocol flicting transactions (colors), Avalanche takes advantage of the as prescribed and never double spend coins, in Avalanche, DAG structure and uses a transaction’s progeny. Specifically, they are guaranteed both safety and liveness for their virtuous when a transaction T is queried, all transactions reachable from transactions. In contrast, liveness is not guaranteed for rogue T by following the DAG edges are implicitly part of the query. transactions, submitted by Byzantine clients, which conflict A node will only respond positively to the query if T and with one another. Such decisions may stall in the network, but its entire ancestry are currently the preferred option in their have no safety impact on virtuous transactions. We show that respective conflict sets. If more than a threshold of responders this is a sensible tradeoff, and that resulting system is sufficient vote positively, the transaction is said to collect a chit. Nodes for building complex payment systems. then compute their confidence as the total number of chits in A. Avalanche: Adding a DAG the progeny of that transaction. They query a transaction just once and rely on new vertices and possible chits, added to Avalanche consists of multiple single-decree Snowball in- the progeny, to build up their confidence. Ties are broken by stances instantiated as a multi-decree protocol that maintains an initial preference for first-seen transactions. Note that chits a dynamic, append-only directed acyclic graph (DAG) of all are decoupled from the DAG structure, making the protocol known transactions. The DAG has a single sink that is the immune to attacks where the attacker generates large, padded genesis vertex. Maintaining a DAG provides two significant subgraphs. benefits. First, it improves efficiency, because a single vote on a DAG vertex implicitly votes for all transactions on the B. Avalanche: Specification path to the genesis vertex. Second, it also improves security, Each correct node u keeps track of all transactions it has because the DAG intertwines the fate of transactions, similar learned about in set Tu , partitioned into mutually exclusive to the Bitcoin blockchain. This renders past decisions difficult conflict sets PT , T ∈ Tu . Since conflicts are transitive, if Ti to undo without the approval of correct nodes. and Tj are conflicting, then they belong to the same conflict When a client creates a transaction, it names one or more set, i.e. PTi = PTj . It’s worth noting this relation may sound parents, which are included inseparably in the transaction and counter-intuitive: conflicting transitions have the equivalence form the edges of the DAG. The parent-child relationships relation, because they are equivocations spending the same encoded in the DAG may, but do not need to, correspond to funds. application-specific dependencies; for instance, a child trans- We write T ′ ← T if T has a parent edge to transaction T ′ , ∗ action need not spend or have any relationship with the funds The “←”-relation is its reflexive transitive closure, indicating received in the parent transaction. We use the term ancestor a path from T to T ′ . DAGs built by different nodes are set to refer to all transactions reachable via parent edges back guaranteed to be compatible, though at any one time, the two in history, and progeny to refer to all children transactions and nodes may not have a complete view of all vertices in the their offspring. system. Specifically, if T ′ ← T , then every node in the system

1: procedure AVALANCHE L OOP hcT1 , d(T1 )i = h1, 6i PT1 2: while true do 3: find T that satisfies T ∈ T ∧ T ∈ /Q T1 P T2 = P T 3 4: K := P SAMPLE (N \u, k) P T 9 = P T 6 = P T7 5: P := v∈K QUERY(v, T ) h1, 5i T2 T3 h0, 0i 6: if P ≥ α then 7: cT := 1 8: // update the preference for ancestors h1, 2i T4 T5 h1, 3i T7 ∗ 9: for T ′ ∈ T : T ′ ← T do T6 h0, 0i 10: if d(T ′ ) > d(PT ′ .pref ) then 11: PT ′ .pref := T ′ T8 T9 h0, 0i 12: if T ′ 6= PT ′ .last then h1, 1i h1, 1i 13: PT ′ .last := T ′ , PT ′ .cnt := 1 14: else Fig. 11: Example of hchit, confidencei values. Darker boxes 15: ++PT ′ .cnt indicate transactions with higher confidence values. At most 16: else one transaction in each shaded region will be accepted. ∗ 17: for T ′ ∈ T : T ′ ← T do 18: PT ′ .cnt := 0 19: // otherwise, cT remains 0 forever discovered transactions. In particular, when node u discovers 20: Q := Q ∪ {T } // mark T as queried a transaction T through a query, it starts a one-time query Fig. 9: Avalanche: the main loop. process by sampling k random peers and sending a message to them, after T is delivered via ON R ECEIVE T X. 1: function IS P REFERRED(T ) Node u answers a query by checking whether each T ′ ∗ 2: return T = PT .pref such that T ′ ← T is currently preferred among competing 3: function IS S TRONGLY P REFERRED(T ) ∗ transactions ∀T ′′ ∈ PT ′ . If every single ancestor T ′ fulfills 4: return ∀T ′ ∈ T , T ′ ← T : IS P REFERRED(T ′ ) this criterion, the transaction is said to be strongly preferred, 5: function IS ACCEPTED(T ) and receives a yes-vote (1). A failure of this criterion at any 6: return T ′ yields a no-vote (0). When u accumulates k responses, it ((∀T ′ ∈ T , T ′ ← T : IS ACCEPTED(T ′ )) checks whether there are α yes-votes for T , and if so grants ∧ |PT | = 1 ∧ PT .cnt > β1 ) // safe early commitment the chit (chit value cT := 1) for T . The above process will ∨(PT .cnt > β2 ) // consecutive counter yield a labeling of the DAG with a chit value and associated 7: procedure ON Q UERY(j, T ) confidence for each transaction T . 8: ON R ECEIVE T X (T ) Figure 11 illustrates a sample DAG built by Avalanche. 9: RESPOND (j, IS S TRONGLY P REFERRED (T )) Similar to Snowball, sampling in Avalanche will create a Fig. 10: Avalanche: voting and decision primitives. positive feedback for the preference of a single transaction in its conflict set. For example, because T2 has larger confidence that has T will also have T ′ and the same relation T ′ ← T ; than T3 , its descendants are more likely collect chits in the and conversely, if T ′ ✟ ← future compared to T3 . ✟T , then no nodes will end up with T′ ← T. Similar to Bitcoin, Avalanche leaves determining the accep- Each node u can compute a confidence value, du (T ), from tance point of a transaction to the application. An application the progeny as follows: supplies an IS ACCEPTED predicate that can take into account X the value at risk in the transaction and the chances of a decision du (T ) = cuT ′ being reverted to determine when to decide. ∗ T ′ ∈Tu ,T ←T ′ Committing a transaction can be performed through a safe where cuT ′ stands for the chit value of T ′ for node u. Each early commitment. For virtuous transactions, T is accepted transaction initially has a chit value of 0 before the node gets when it is the only transaction in its conflict set and has a the query results. If the node collects a threshold of α yes- confidence greater than threshold β1 . As in Snowball, T can votes after the query, the value cuT ′ is set to 1, otherwise also be accepted after a β2 number of consecutive successful remains 0 forever. Therefore, a chit value reflects the result queries. If a virtuous transaction fails to get accepted due to from the one-time query of its associated transaction and a problem with parents, it could be accepted if reissued with becomes immutable afterwards, while d(T ) can increase as the different parents. Figure 8 shows how Avalanche performs par- DAG grows by collecting more chits in its progeny. Because ent selection and entangles transactions. Because transactions cT ∈ {0, 1}, confidence values are monotonic. that consume and generate the same UTXO do not conflict In addition, node u maintains its own local list of known with each other, any transaction can be reissued with different nodes Nu ⊆ N that comprise the system. For simplicity, we parents. assume for now Nu = N , and elide subscript u in contexts Figure 9 illustrates the protocol main loop executed by each without ambiguity. node. In each iteration, the node attempts to select a transaction Each node implements an event-driven state machine, cen- T that has not yet been queried. If no such transaction exists, tered around a query that serves both to solicit votes on each the loop will stall until a new transaction is added to T . It then transaction and to notify other nodes of the existence of newly selects k peers and queries those peers. If more than α of those

peers return a positive response, the chit value is set to 1. After TXg that, it updates the preferred transaction of each conflict set of the transactions in its ancestry. Next, T is added to the set Q so it will never be queried again by the node. The code that selects additional peers if some of the k peers are unresponsive TXa Ina1 Ina2 Inb1 Inb2 TXb is omitted for simplicity. Figure 10 shows what happens when a node receives a query for transaction T from peer j. First it adds T to T , TXc Inc1 Inc2 unless it already has it. Then it determines if T is currently strongly preferred. If so, the node returns a positive response Fig. 12: The underlying logical DAG structure used by to peer j. Otherwise, it returns a negative response. Notice Avalanche. The tiny squares with shades are dummy vertices that in the pseudocode, we assume when a node knows T , which just help form the DAG topology for the purpose of it also recursively knows the entire ancestry of T . This can clarity, and can be replaced by direct edges. The rounded gray be achieved by postponing the delivery of T until its entire regions are the conflict sets. ancestry is recursively fetched. In practice, an additional gossip process that disseminates transactions is used in parallel, but is not shown in pseudocode for simplicity. the conflict set could be very large in practice, because a rogue client can generate a large volume of conflicting transactions. C. Multi-Input UTXO Transactions Instead of keeping a container data structure for each conflict In addition to the DAG structure in Avalanche, an unspent set, we create a mapping from each UTXO to the preferred transaction output (UTXO) [43] graph that captures spending transaction that stands as the representative for the entire dependency is used to realize the ledger for the payment conflict set. This enables a node to quickly determine future system. To avoid ambiguity, we denote the transactions that conflicts, and the appropriate response to queries. Finally, we encode the data for money transfer transactions, while we call speed up the query process by terminating early as soon as the the transactions (T ∈ T ) in Avalanche’s DAG vertices. α threshold is met, without waiting for k responses. We inherit the transaction and address mechanisms from Bit- coin. At their simplest, transactions consist of multiple inputs b) DAG and outputs, with corresponding redeem scripts. Addresses are identified by the hash of their public keys, and signatures are Compared to Snowball, Avalanche introduces a DAG struc- generated by corresponding private keys. The full scripting ture that entangles the fate of unrelated conflict sets, each of language is used to ensure that a redeem script is authenticated which is a single-decree instance. This entanglement embodies to spend a UTXO. UTXOs are fully consumed by a valid a tension: attaching a virtuous transaction to undecided parents transaction, and may generate new UTXOs spendable by helps propel transactions towards a decision, while it puts named recipients. Multi-input transactions consume multiple transactions at risk of suffering liveness failures when parents UTXOs, and in Avalanche, may appear in multiple conflict turn out to be rogue. We can resolve this tension and provide sets. To account for these correctly, we represent transaction- a liveness guarantee with the aid of two mechanisms. input pairs (e.g. Ina1 ) as Avalanche vertices. The conflict First we adopt an adaptive parent selection strategy, where relation of transaction-input pairs is transitive because of transactions are attached at the live edge of the DAG, and each pair only spends one unspent output. Then, we use the are retried with new parents closer to the genesis vertex. This conjunction of IS ACCEPTED for all inputs of a transaction procedure is guaranteed to terminate with uncontested, decided to ensure that no transaction will be accepted unless all its parents, ensuring that a transaction cannot suffer liveness inputs are accepted (Figure 12). In other words, a transaction failure due to contested, rogue transactions. A secondary is accepted only if all its transaction-input pairs are accepted mechanism ensures that virtuous transactions with decided in their respective Snowball conflict sets. Following this idea, ancestry will receive sufficient chits. Correct nodes examine we finally implement the DAG of transaction-input pairs such the DAG for virtuous transactions that lack sufficient progeny that multiple transactions can be batched together per query. and emit no-op transactions to help increase their confidence. With these two mechanisms in place, it is easy to see that, a) Optimizations at worst, Avalanche will degenerate into separate instances of We implement some optimizations to help the system scale. Snowball, and thus provide the same liveness guarantee for First, we use lazy updates to the DAG, because the recursive virtuous transactions. definition for confidence may otherwise require a costly DAG Unlike other cryptocurrencies [48] that use graph vertices traversal. We maintain the current d(T ) value for each active directly as votes, Avalanche only uses DAG for the purpose of vertex on the DAG, and update it only when a descendant batching queries in the underlying Snowball instances. Because vertex gets a chit. Since the search path can be pruned at confidence is built by collected chits, and not by just the accepted vertices, the cost for an update is constant if the presence of a vertex, simply flooding the network with vertices rejected vertices have limited number of descendants and the attached to the rejected side of a subgraph will not subvert the undecided region of the DAG stays at constant size. Second, protocol.

D. Communication Complexity B. Throughput We first measure the throughput of the system by saturating Let the DAG induced by Avalanche have an expected it with transactions and examining the rate at which transac- branching factor of p, corresponding to the width of the DAG, tions are confirmed in the steady state. For this experiment, and determined by the parent selection algorithm. Given the β1 we first run Avalanche on 125 nodes with 10 client processes, and β2 decision threshold, a transaction that has just reached each of which maintains 400 outstanding transactions at any the point of decision will have an associated progeny Y. given time. Let m be the expected depth of Y. If we were to let the As shown by the first group of bars in Figure 13, the system Avalanche network make progress and then freeze the DAG at achieves 6851 transactions per second (tps) for a batch size of a depth y, then it will have roughly py vertices/transactions, 20 and above 7002 tps for a batch size of 40. Our system is of which p(y − m) are decided in expectation. Only pm saturated by a small batch size comparing to other blockchains recent transactions would lack the progeny required for a with known performance: Bitcoin batches several thousands decision. For each node, each query requires k samples, of transactions per block, Algorand [27] uses 2–10 Mbyte and therefore the total message cost per transaction is in blocks, i.e., 8.4–41.9K tx/batch and Conflux [38] uses 4 Mbyte expectation (pky)/(p(y − m)) = ky/(y − m). Since m is blocks, i.e., 16.8K tx/batch. These systems are relatively slow a constant determined by the undecided region of the DAG in making a single decision, and thus require a very large batch as the system constantly makes progress, message complexity (block) size for better performance. Achieving high throughput per node is O(k), while the total complexity is O(kn). with small batch size implies low latency, as we will show later. VI. E VALUATION 7.1K 7.2K 7.4K 6.9K 7.0K 6.7K 6.8K 6.9K 6.9K 8000 Throughput (tps.) 6.5K A. Setup 6000 We conduct our experiments on Amazon EC2 by running 4000 from hundreds (125) to thousands (2000) of virtual machine instances. We use c5.large instances, each of which simu- 2000 lates an individual node. AWS provides bandwidth of up to 2 Gbps, though the Avalanche protocol utilizes at most around 0 100 Mbps. 125 250 500 1000 2000 Our implementation supports two versions of transactions: Number of nodes one is the customized UTXO format, while the other uses the Fig. 13: Throughput vs. network size. Each pair of bars is code directly from Bitcoin 0.16. Both supported formats use produced with batch size of 20 and 40, from left to right. secp256k1 crypto library from bitcoin and provide the same address format for wallets. All experiments use the customized format except for the geo-replication, where results for both C. Scalability are given. To examine how the system scales in terms of the number We simulate a constant flow of new transactions from users of nodes participating in Avalanche consensus, we run exper- by creating separate client processes, each of which maintains iments with identical settings and vary the number of nodes separated wallets, generates transactions with new recipient from 125 up to 2000. addresses and sends the requests to Avalanche nodes. We use Figure 13 shows that overall throughput degrades about several such client processes to max out the capacity of our 1.34% to 6909 tps when the network grows by a factor of system. The number of recipients for each transaction is tuned 16 to n = 2000. This degradation is minor compared to the to achieve average transaction sizes of around 250 bytes (1–2 fluctuation in performance of repeated runs. Note that the x- inputs/outputs per transaction on average and a stable UTXO size), the current average transaction size of Bitcoin. To utilize 19.1K 18.8K 19.2K 19.5K 18.1K Throughput (tps.) 20000 the network efficiently, we batch up to 40 transactions during a query, but maintain confidence values at individual transaction 15000 granularity. 7.4K 10000 7.0K 7.1K 7.2K 6.9K All reported metrics reflect end-to-end measurements taken from the perspective of all clients. That is, clients examine the 5000 total number of confirmed transactions per second for through- put, and, for each transaction, subtract the initiation timestamp 0 from the confirmation timestamp for latency. Each throughput 125 250 500 1000 2000 experiment is repeated for 5 times and standard deviation is Number of nodes indicated in each figure. As for security parameters, we pick k = 10, α = 0.8, β1 = 11, β2 = 150, which yields an MTTF Fig. 14: Throughput for batch size of 40, with (left) and of ˜1024 years. without (right) signature verification.

1.0 data show that median latency is more-or-less independent of hist. cdf. 0.03 network size. 0.02 0.5 1.0 2.0 b20-dsa 1.5 0.01 1.0 0.0 0.5 0.00 0.8 0.0 0.01 0.05 0.1 6 6 1 2 0.01 0.05 0.1 6 6 1 2 2.0 0.20 0.42 0.20 0.42 b40-dsa 1.5 Time (sec.) 1.0 0.6 Time (sec.) Fig. 15: Transaction latency distribution for n = 2000. The 0.5 x-axis is the transaction latency in log-scaled seconds, while 0.0 the y-axis is the portion of transactions that fall into the con- 2.0 b20-raw firmation time (normalized to 1). Histogram of all transaction 1.5 0.4 latencies for a client is shown on the left with 100 bins, while 1.0 its CDF is on the right. 0.5 0.0 axis is logarithmic. 2.0 Avalanche acquires its scalability from three sources: first, 0.2 b40-raw 1.5 maintaining a partial order that captures only the spending 1.0 relations allows for more concurrency than a classical BFT 0.5 replicated log that linearizes all transactions; second, the lack 0.0 of a leader naturally avoids bottlenecks; finally, the number of 125 250 500 1000 2000 messages each node has to handle per decision is O(k) and does not grow as the network scales up. Number of nodes D. Cryptography Bottleneck Fig. 16: Transaction latency vs. network size. “b” indicates batch size and “raw” is the run without signature verification. We next examine where bottlenecks lie in our current implementation. The purple bar on the right of each group in Figure 14 shows the throughput of Avalanche with signa- F. Misbehaving Clients ture verification disabled. Throughputs get approximately 2.6x We next examine how rogue transactions issued by misbe- higher, compared to the blue bar on the left. This reveals that having clients that double spend unspent outputs can affect cryptographic verification overhead is the current bottleneck of latency for virtuous transactions created by honest clients. our system implementation. This bottleneck can be addressed We adopt a strategy to simulate misbehaving clients where a by offloading transaction verification to a GPU. Even without fraction (from 0% to 25%) of the pending transactions conflict such optimization, 7K tps is far in excess of extant blockchains. with some existing ones. The client processes achieve this by E. Latency designating some double spending transaction flows among all simulated pending transactions and sending the conflicting The latency of a transaction is the time spent from the transactions to different nodes. We use the same setup with moment of its submission until it is confirmed as accepted. n = 1000 as in the previous experiments, and only measure Figure 15 tallies the latency distribution histogram using the throughput and latency of confirmed transactions. same setup as for the throughput measurements with 2000 Avalanche’s latency is only slightly affected by misbehaving nodes. The x-axis is the time in seconds while the y-axis is clients, as shown in Figure 17. Surprisingly, maximum laten- the portion of transactions that are finalized within the corre- cies drop slightly when the percentage of rogue transactions sponding time period. This figure also outlines the Cumulative increases. This behavior occurs because, with the introduction Distribution Function (CDF) by accumulating the number of of rogue transactions, the overall effective throughput is re- finalized transactions over time. duced and thus alleviates system load. This is confirmed by This experiment shows that most transactions are confirmed 1.0 2.0 Time (sec.) within approximately 0.3 seconds. The most common latencies b40-byz are around 206 ms and variance is low, indicating that nodes 1.5 converge on the final value as a group around the same time. 0.5 1.0 The second vertical line shows the maximum latency we 0.5 observe, which is around 0.4 seconds. 0.0 Figure 16 shows transaction latencies for different numbers 0% 5% 10% 15% 20% 25% of nodes. The horizontal edges of boxes represent minimum, Rogue transactions in percentage first quartile, median, third quartile and maximum latency respectively, from bottom to top. Crucially, the experimental Fig. 17: Latency vs. ratio of rogue transactions.

H. Comparison to Other Systems 8000 7.4K 7.1K 6.7K Throughput (tps.) 6.6K 6.5K 6.5K 6000 Though there are seemingly abundant blockchain or cryp- tocurrency protocols, most of them only present a sketch of 4000 their protocols and do not offer practical implementation or evaluation results. Moreover, among those who do provide re- 2000 sults, most are not evaluated in realistic, large-scale (hundreds to thousands of full nodes participating in consensus) settings. 0 0% 5% 10% 15% 20% 25% Therefore, we choose Algorand and Conflux for our compar- ison. Algorand, Conflux, and Avalanche are all fundamentally Rogue transactions in percentage different in their design. Algorand’s committee-scale consen- Fig. 18: Throughput vs. ratio of rogue transactions. sus algorithm falls into the classical BFT consensus category, and Conflux extends Nakamoto consensus by a DAG structure 1.0 to facilitate higher throughput, while Avalanche belongs to a 0.06 hist. cdf. new protocol family based on metastability. Additionally, we use Bitcoin [43] as a baseline. 0.04 Both Algorand and Avalanche evaluations use a decision 0.5 network of size 2000 on EC2. Our evaluation picked shared 0.02 c5.large instances, while Algorand used m4.2xlarge. These 0.0 two platforms are very similar except for a slight CPU clock 0.00 speed edge for c5.large, which goes largely unused because 0.5 1 6 10 0.5 1 6 10 our process only consumes 30% in these experiments. The 1.35 4.24 1.35 4.24 security parameters chosen in our experiments guarantee a Time (sec.) safety violation probability below 10−9 in the presence of Fig. 19: Latency histogram/CDF for n = 2000 in 20 cities. 20% Byzantine nodes, while Algorand’s evaluation guarantees a violation probability below 5 × 10−9 with 20% Byzantine nodes. Figure 18, which shows that throughput (of virtuous transac- Neither Algorand nor Conflux evaluations take into account tions) decreases with the ratio of rogue transactions. Further, the overhead of cryptographic verification. Their evaluations the reduction in throughput appears proportional to the number use blocks that carry megabytes of dummy data and present of misbehaving clients, that is, there is no leverage provided the throughput in MB/hour or GB/hour unit. So we use the to the attackers. average size of a Bitcoin transaction, 250 bytes, to derive their throughputs. In contrast, our experiments carry real G. Geo-replication transactions and fully take all cryptographic overhead into account. Next experiment shows the system in an emulated geo- The throughput is 3-7 tps for Bitcoin, 874 tps for Algorand replicated scenario, patterned after the same scenario in prior (with 10 Mbyte blocks), 3355 tps for Conflux (in the paper it work [27]. We selected 20 major cities that appear to be claims 3.84x Algorand’s throughput under the same settings). near substantial numbers of reachable Bitcoin nodes, according In contrast, Avalanche achieves over 3400 tps consistently to [9]. The cities cover North America, Europe, West Asia, on up to 2000 nodes without committee or proof-of-work. As East Asia, Oceania, and also cover the top 10 countries with for latency, a transaction is confirmed after 10–60 minutes in the highest number of reachable nodes. We use the latency and Bitcoin, around 50 seconds in Algorand, 7.6–13.8 minutes in jittering matrix crawled from [58] and emulate network packet Conflux, and 1.35 seconds in Avalanche. latency in the Linux kernel using tc and netem. 2000 nodes Avalanche performs much better than Algorand in both are distributed evenly to each city, with no additional network throughput and latency because Algorand uses a verifiable latency emulated between nodes within the same city. Like random function to elect committees, and maintains a totally- Algorand’s evaluation, we also cap our bandwidth per process ordered log while Avalanche establishes only a partial order. to 20 Mbps to simulate internet-scale settings where there are Algorand is leader-based and performs consensus by commit- many commodity network links. We assign a client process to tee, while Avalanche is leader-less. each city, maintaining 400 outstanding transactions per city at Avalanche has similar throughput to Conflux, but its latency any moment. is 337–613x better. Conflux also uses a DAG structure to In this scenario, Avalanche achieves an average throughput amortize the cost for consensus and increase the throughput, of 3401 tps, with a standard deviation of 39 tps. As shown in however, it is still rooted in Nakamoto consensus (PoW), Figure 19, the median transaction latency is 1.35 seconds, with making it unable to have instant confirmation compared to a maximum latency of 4.25 seconds. We also support native Avalanche. Bitcoin code for transactions; in this case, the throughput is In a blockchain system, one can usually improve throughput 3530 tps, with σ = 92 tps. at the cost of latency through batching. The real bottleneck

of the performance is the number of decisions the system Zeno guarantees eventual consistency rather than linearizabil- can make per second, and this is fundamentally limited by ity, meaning that participants can be inconsistent but eventually either Byzantine Agreement (BA∗ ) in Algorand and Nakamoto agree once the network stabilizes. By providing an even weaker consensus in Conflux. consistency guarantee, namely fork-join-causal consistency, Depot [40] describes a protocol that guarantees safety under VII. R ELATED W ORK 2f + 1 replicas. Bitcoin [43] is a cryptocurrency that uses a blockchain NOW [28] uses sub-quorums to drive smaller instances of based on proof-of-work (PoW) to maintain a ledger of UTXO consensus. The insight of this paper is that small, logarithmic- transactions. While techniques based on proof-of-work [4], sized quorums can be extracted from a potentially large set of [23], and even cryptocurrencies with minting based on proof- nodes in the network, allowing smaller instances of consensus of-work [49], [57], have been explored before, Bitcoin was protocols to be run in parallel. the first to incorporate PoW into its consensus process. Unlike Snow White [18] and Ouroboros [34] are some of the ear- more traditional BFT protocols, Bitcoin has a probabilistic liest provably secure PoS protocols. Ouroboros uses a secure safety guarantee and assumes honest majority computational multiparty coin-flipping protocol to produce randomness for power rather than a known membership, which in turn has leader election. The follow-up protocol, Ouroboros Praos [19] enabled an internet-scale permissionless protocol. While per- provides safety in the presence of fully adaptive adversaries. missionless and resilient to adversaries, Bitcoin suffers from HoneyBadger [42] provides good liveness in a network low throughput (˜3 tps) and high latency (˜5.6 hours for with heterogeneous latencies. Tendermint [10], [11] rotates a network with 20% Byzantine presence and 2−32 security the leader for each block and has been demonstrated with guarantee). Furthermore, PoW requires a substantial amount as many as 64 nodes. Ripple [51] has low latency by uti- of computational power that is consumed only for the purpose lizing collectively-trusted sub-networks in a large network. of maintaining safety. The Ripple company provides a slow-changing default list of Countless cryptocurrencies use PoW [4], [23] to maintain trusted nodes, which renders the system essentially centralized. a distributed ledger. Like Bitcoin, they suffer from inherent In the synchronous and authenticated setting, the protocol scalability bottlenecks. Several proposals for protocols exist in [3] achieves constant-3-round commit in expectation, at that try to better utilize the effort made by PoW. Bitcoin- the cost of quadratic message complexity. Stellar [41] uses NG [24] and the permissionless version of Thunderella [46] Federated Byzantine Agreement in which quorum slices enable use Nakamoto-like consensus to elect a leader that dictates heterogeneous trust for different nodes. Safety is guaranteed writing of the replicated log for a relatively long time so as when transactions can be transitively connected by trusted to provide higher throughput. Moreover, Thunderella provides quorum slices. Algorand [27] uses a verifiable random function an optimistic bound that, with 3/4 honest computational power to select a committee of nodes that participate in a novel and an honest elected leader, allows transactions to be con- Byzantine consensus protocol. firmed rapidly. ByzCoin [35] periodically selects a small set Some protocols use a Directed Acyclic Graph (DAG) struc- of participants and then runs a PBFT-like protocol within the ture instead of a linear chain to achieve consensus [5], [8], selected nodes. [53]–[55]. Instead of choosing the longest chain as in Bitcoin, Protocols based on Byzantine agreement [37], [47] typically GHOST [54] uses a more efficient chain selection rule that make use of quorums and require precise knowledge of mem- allows transactions not on the main chain to be taken into bership. PBFT [13], a well-known representative, requires a consideration, increasing efficiency. SPECTRE [53] uses trans- quadratic number of message exchanges in order to reach actions on the DAG to vote recursively with PoW to achieve agreement. The Q/U protocol [2] and HQ replication [16] use a consensus, followed up by PHANTOM [55] that achieves a quorum-based approach to optimize for contention-free cases linear order among all blocks. Like PHANTOM, Conflux also of operation to achieve consensus in only a single round of finalizes a linear order of transactions by PoW in a DAG communication. However, although these protocols improve structure, with better resistance to liveness attack [38]. Similar on performance, they degrade very poorly under contention. to Thunderella, Meshcash [8] combines a slow PoW-based Zyzzyva [36] couples BFT with speculative execution to protocol with a fast consensus protocol that allows a high block improve the failure-free operation case. Past work in permis- rate regardless of network latency, offering fast confirmation sioned BFT systems typically requires at least 3f + 1 replicas. time. Hashgraph [5] is a leader-less protocol that builds a DAG CheapBFT [32] leverages trusted hardware components to via randomized gossip. It requires full membership knowledge construct a protocol that uses f + 1 replicas. at all times, and it is a PBFT-variant that requires quadratic Other work attempts to introduce new protocols under messages in expectation. redefinitions and relaxations of the BFT model. Large-scale VIII. C ONCLUSION BFT [50] modifies PBFT to allow for arbitrary choice of num- ber of replicas and failure threshold, providing a probabilistic This paper introduced a novel family of consensus protocols, guarantee of liveness for some failure ratio but protecting coupled with the appropriate mathematical tools for analyzing safety with high probability. In another form of relaxation. them. These protocols are highly efficient and robust, com- Zeno [52] introduces a BFT state machine replication protocol bining the best features of classical and Nakamoto consensus. that trades consistency for high availability. More specifically, They scale well, achieve high throughput and quick finality,

work without precise membership knowledge, and degrade [18] DAIAN , P., PASS , R., AND S HI , E. Snow white: Provably secure proofs gracefully under catastrophic adversarial attacks. of stake. Cryptology ePrint Archive, Report 2016/919, 2016. https: //eprint.iacr.org/2016/919. There is much work to do to improve this line of research. [19] DAVID , B., G AZI , P., K IAYIAS , A., AND RUSSELL , A. Ouroboros One such improvement could be the introduction of an adver- Praos: An adaptively-secure, semi-synchronous proof-of-stake sarial network scheduler. Another improvement would be to blockchain. In Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications characterize the system’s guarantees under an adversary whose of Cryptographic Techniques, Tel Aviv, Israel, April 29 - May 3, 2018 powers are realistically limited, whereupon performance would Proceedings, Part II (2018), pp. 66–98. improve even further. Finally, more sophisticated initialization [20] D IGICONOMIST. Bitcoin energy consumption index. https:// digiconomist.net/bitcoin-energy-consumption. Accessed: 2018-04. mechanisms would bear fruitful in improving liveness of multi- [21] D OUCEUR , J. R. The sybil attack. In International Workshop on Peer- value consensus. Overall, we hope that the protocols and to-Peer Systems (2002), Springer, pp. 251–260. analysis techniques presented here add to the arsenal of the [22] D RAIEF, M., G ANESH , A., AND M ASSOULI É , L. Thresholds for virus distributed system developers and provide a foundation for new spread on networks. In Proceedings of the 1st international conference on Performance evaluation methodologies and tools (2006), ACM, p. 51. lightweight and scalable mechanisms. [23] DWORK , C., AND NAOR , M. Pricing via processing or combatting junk mail. In Advances in Cryptology - CRYPTO ’92, 12th Annual R EFERENCES International Cryptology Conference, Santa Barbara, California, USA, August 16-20, 1992, Proceedings (1992), pp. 139–147. [1] Crypto-currency market capitalizations. https://coinmarketcap.com. Ac- cessed: 2017-02. [24] E YAL , I., G ENCER , A. E., S IRER , E. G., AND VAN R ENESSE , R. [2] A BD -E L -M ALEK , M., G ANGER , G. R., G OODSON , G. R., R EITER , Bitcoin-NG: A scalable blockchain protocol. In 13th USENIX Sym- M. K., AND W YLIE , J. J. Fault-scalable byzantine fault-tolerant posium on Networked Systems Design and Implementation, NSDI 2016, services. In ACM SIGOPS Operating Systems Review (2005), vol. 39, Santa Clara, CA, USA, March 16-18, 2016 (2016), pp. 45–59. ACM, pp. 59–74. [25] G ANESH , A., M ASSOULI É , L., AND T OWSLEY, D. The effect of [3] A BRAHAM , I., D EVADAS , S., D OLEV, D., NAYAK , K., AND R EN , network topology on the spread of epidemics. In Proceedings IEEE 24th L. Efficient synchronous byzantine consensus. arXiv preprint Annual Joint Conference of the IEEE Computer and Communications arXiv:1704.02397 (2017). Societies. (2005), vol. 2, IEEE, pp. 1455–1466. [4] A SPNES , J., JACKSON , C., AND K RISHNAMURTHY, A. Exposing [26] G ARAY, J. A., K IAYIAS , A., AND L EONARDOS , N. The Bitcoin Back- computationally-challenged byzantine impostors. Tech. rep., Technical bone Protocol: Analysis and applications. In Advances in Cryptology Report YALEU/DCS/TR-1332, Yale University Department of Computer - EUROCRYPT 2015 - 34th Annual International Conference on the Science, 2005. Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, [5] BAIRD , L. Hashgraph consensus: fair, fast, byzantine fault tolerance. April 26-30, 2015, Proceedings, Part II (2015), pp. 281–310. Tech. rep., Swirlds Tech Report, 2016. [27] G ILAD , Y., H EMO , R., M ICALI , S., V LACHOS , G., AND Z ELDOVICH , [6] BANERJEE , S., C HATTERJEE , A., AND S HAKKOTTAI , S. Epidemic N. Algorand: Scaling byzantine agreements for cryptocurrencies. In thresholds with external agents. In IEEE INFOCOM 2014-IEEE Con- Proceedings of the 26th Symposium on Operating Systems Principles, ference on Computer Communications (2014), IEEE, pp. 2202–2210. Shanghai, China, October 28-31, 2017 (2017), pp. 51–68. [7] B EN -O R , M. Another advantage of free choice (extended abstract): [28] G UERRAOUI , R., H UC , F., AND K ERMARREC , A.-M. Highly dynamic Completely asynchronous agreement protocols. In Proceedings of the distributed computing with byzantine failures. In Proceedings of the second annual ACM symposium on Principles of distributed computing 2013 ACM symposium on Principles of distributed computing (2013), (1983), ACM, pp. 27–30. ACM, pp. 176–183. [8] B ENTOV, I., H UB ÁCEK , P., M ORAN , T., AND NADLER , A. Tortoise and [29] H OEFFDING , W. Probability inequalities for sums of bounded random Hares Consensus: the Meshcash framework for incentive-compatible, variables. Journal of the American statistical association 58, 301 (1963), scalable cryptocurrencies. IACR Cryptology ePrint Archive 2017 (2017), 13–30. 300. [30] H UNT, P., KONAR , M., J UNQUEIRA , F. P., AND R EED , B. Zookeeper: [9] B ITNODES. Global Bitcoin nodes distribution. https://bitnodes.earn. Wait-free coordination for internet-scale systems. In 2010 USENIX com/. Accessed: 2018-04. Annual Technical Conference, Boston, MA, USA, June 23-25, 2010 [10] B UCHMAN , E. Tendermint: Byzantine fault tolerance in the age of (2010). blockchains. PhD thesis, 2016. [31] J OHANSEN , H. D., VAN R ENESSE , R., V IGFUSSON , Y., AND J O - [11] B UCHMAN , E., K WON , J., AND M ILOSEVIC , Z. The latest gossip on HANSEN , D. Fireflies: A secure and scalable membership and gossip bft consensus, 2018. service. ACM Trans. Comput. Syst. 33, 2 (2015), 5:1–5:32. [12] B URROWS , M. The chubby lock service for loosely-coupled distributed [32] K APITZA , R., B EHL , J., C ACHIN , C., D ISTLER , T., K UHNLE , S., systems. In 7th Symposium on Operating Systems Design and Implemen- M OHAMMADI , S. V., S CHR ÖDER -P REIKSCHAT, W., AND S TENGEL , K. tation (OSDI’06), November 6-8, Seattle, WA, USA (2006), pp. 335–350. Cheapbft: resource-efficient byzantine fault tolerance. In Proceedings of [13] C ASTRO , M., AND L ISKOV, B. Practical byzantine fault tolerance. In the 7th ACM european conference on Computer Systems (2012), ACM, Proceedings of the Third USENIX Symposium on Operating Systems pp. 295–308. Design and Implementation (OSDI), New Orleans, Louisiana, USA, [33] K EELING , M. J., AND ROHANI , P. Modeling infectious diseases in February 22-25, 1999 (1999), pp. 173–186. humans and animals. Princeton University Press, 2011. [14] C ENTRAL I NTELLIGENCE AGENCY. The world factbook. https://www. [34] K IAYIAS , A., RUSSELL , A., DAVID , B., AND O LIYNYKOV, R. cia.gov/library/publications/the-world-factbook/geos/da.html. Accessed: Ouroboros: A provably secure proof-of-stake blockchain protocol. In 2018-04. Advances in Cryptology - CRYPTO 2017 - 37th Annual International [15] C HV ÁTAL , V. The tail of the hypergeometric distribution. Discrete Cryptology Conference, Santa Barbara, CA, USA, August 20-24, 2017, Mathematics 25, 3 (1979), 285–287. Proceedings, Part I (2017), pp. 357–388. [16] C OWLING , J., M YERS , D., L ISKOV, B., RODRIGUES , R., AND S HRIRA , [35] KOKORIS -KOGIAS , E., J OVANOVIC , P., G AILLY, N., K HOFFI , I., L. Hq replication: A hybrid quorum protocol for byzantine fault G ASSER , L., AND F ORD , B. Enhancing Bitcoin security and perfor- tolerance. In Proceedings of the 7th symposium on Operating systems mance with strong consistency via collective signing. In 25th USENIX design and implementation (2006), USENIX Association, pp. 177–190. Security Symposium, USENIX Security 16, Austin, TX, USA, August 10- [17] C ROMAN , K., D ECKER , C., E YAL , I., G ENCER , A. E., J UELS , A., 12, 2016. (2016), pp. 279–296. KOSBA , A. E., M ILLER , A., S AXENA , P., S HI , E., S IRER , E. G., S ONG , [36] KOTLA , R., A LVISI , L., DAHLIN , M., C LEMENT, A., AND W ONG , D., AND WATTENHOFER , R. On scaling decentralized blockchains - (a E. L. Zyzzyva: Speculative byzantine fault tolerance. ACM Trans. position paper). In Financial Cryptography and Data Security - FC Comput. Syst. 27, 4 (2009), 7:1–7:39. 2016 International Workshops, BITCOIN, VOTING, and WAHC, Christ [37] L AMPORT, L., S HOSTAK , R. E., AND P EASE , M. C. The byzantine Church, Barbados, February 26, 2016, Revised Selected Papers (2016), generals problem. ACM Trans. Program. Lang. Syst. 4, 3 (1982), 382– pp. 106–125. 401.

[38] L I , C., L I , P., X U , W., L ONG , F., AND YAO , A. C. Scaling a) Hypergeometric Distribution nakamoto consensus to thousands of transactions per second. CoRR abs/1805.03870 (2018). Each network query of k peers corresponds to a sample [39] L IGGETT, T. M., ET AL . Stochastic models of interacting systems. The without replacement out of a network of n nodes, also referred Annals of Probability 25, 1 (1997), 1–29. [40] M AHAJAN , P., S ETTY, S., L EE , S., C LEMENT, A., A LVISI , L., to as a hypergeometric sample. We let the random variable DAHLIN , M., AND WALFISH , M. Depot: Cloud storage with minimal H(N , x, k) → {0, . . . , k} denote the resulting counts of B in trust. ACM Transactions on Computer Systems (TOCS) 29, 4 (2011), the sample (unless otherwise stated), where x is the total count 12. [41] M AZIERES , D. The Stellar consensus protocol: A federated model for of B in the population. The probability that the query achieves internet-level consensus. Stellar Development Foundation (2015). the required threshold of α or more votes is given by: [42] M ILLER , A., X IA , Y., C ROMAN , K., S HI , E., AND S ONG , D. The k ! ! ! Honey Badger of BFT protocols. In Proceedings of the 2016 ACM X x n−x n P (H(N , x, k) ≥ α) = / (2) SIGSAC Conference on Computer and Communications Security, Vienna, j=α j k−j k Austria, October 24-28, 2016 (2016), pp. 31–42. [43] NAKAMOTO , S. Bitcoin: A peer-to-peer electronic cash system, 2008. For ease of notation, we overload H(∗) by implicitly referring [44] PASS , R., S EEMAN , L., AND S HELAT, A. Analysis of the blockchain to P (H(N , x, k) ≥ α) as H(N , x, k, α). protocol in asynchronous networks. In Advances in Cryptology - EUROCRYPT 2017 - 36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, b) Tail Bounds On Hypergeometric Distribution April 30 - May 4, 2017, Proceedings, Part II (2017), pp. 643–673. [45] PASS , R., AND S HI , E. Fruitchains: A fair blockchain. IACR Cryptology We can reduce some of the complexity in Equation 2 ePrint Archive 2016 (2016), 916. by introducing a bound on the hypergeometric distribution k [46] PASS , R., AND S HI , E. Thunderella: Blockchains with optimistic instant induced by HN ,x . Let p = x/n be the ratio of support for confirmation. In Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of B in the population. The expectation of H(N , x, k) is exactly Cryptographic Techniques, Tel Aviv, Israel, April 29 - May 3, 2018 kp. Then, the probability that H(N , x, k) will deviate from Proceedings, Part II (2018), pp. 3–33. the mean by more than some small constant ψ is given by the [47] P EASE , M. C., S HOSTAK , R. E., AND L AMPORT, L. Reaching agree- Hoeffding tail bound [29], as follows, ment in the presence of faults. J. ACM 27, 2 (1980), 228–234. [48] P OPOV, S. The tangle. https://www.iota.org/research/academic-papers. P (H(C, x, k) ≤ (p − ψ)k) ≤ e−kD(p−ψ,p) Accessed: 2018-04. 2 (3) [49] R IVEST, R., AND S HAMIR , A. Payword and micromint: Two simple ≤ e−2(p−ψ) k micropayment schemes. In Security protocols (1997), Springer, pp. 69– where D(p − ψ, p) is the Kullback-Leibler divergence, mea- 87. [50] RODRIGUES , R., KOUZNETSOV, P., AND B HATTACHARJEE , B. Large- sured as scale byzantine fault tolerance: Safe but not always live. In Proceedings a 1−a of the 3rd Workshop on Hot Topics in System Dependability (2007). D(a, b) = a log + (1 − a) log (4) b 1−b [51] S CHWARTZ , D., YOUNGS , N., B RITTO , A., ET AL . The Ripple protocol consensus algorithm. Ripple Labs Inc White Paper 5 (2014). c) Concentration of Sub-Martingales [52] S INGH , A., F ONSECA , P., K UZNETSOV, P., RODRIGUES , R., AND M ANIATIS , P. Zeno: Eventually consistent byzantine-fault tolerance. Let {X{t≥0} } be a sub-martingale and |Xt − Xt−1 | < ct In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2009, April 22-24, 2009, Boston, MA, almost surely. Then, for all positive reals ψ and all positive USA (2009), pp. 169–184. integers t, [53] S OMPOLINSKY, Y., L EWENBERG , Y., AND Z OHAR , A. SPECTRE: 2 Pt c2t A fast and scalable cryptocurrency protocol. IACR Cryptology ePrint P (Xt ≥ X0 + ψ) ≤ e−ψ /2 i=1 (5) Archive 2016 (2016), 1159. [54] S OMPOLINSKY, Y., AND Z OHAR , A. Secure high-rate transaction B. Slush processing in Bitcoin. In Financial Cryptography and Data Security, San Juan, Puerto Rico, January 26-30, 2015, Revised Selected Papers Slush operates in a non-Byzantine setting; that is, f = 0, c = (2015), pp. 507–527. [55] S OMPOLINSKY, Y., AND Z OHAR , A. PHANTOM: A scalable blockdag n. In this section, we will characterize the irreversibility prop- protocol. IACR Cryptology ePrint Archive 2018 (2018), 104. erties of Slush (which appear in Snowflake and Snowball), as [56] TAN , W. On the absorption probabilities and absorption times of finite well as the precise converge rate distribution. The distribution homogeneous birth-death processes. Biometrics (1976), 745–752. [57] V ISHNUMURTHY, V., C HANDRAKUMAR , S., AND S IRER , E. G. Karma: of of both safety and liveness of Slush translate well to the A secure economic framework for peer-to-peer resource sharing. In Byzantine setting. Workshop on Economics of Peer-to-peer Systems (2003), vol. 35. The procedural version of Slush in Figure 4 made use of [58] W ONDER N ETWORK. Global ping statistics: Ping times between wonder- network servers. https://wondernetwork.com/pings. Accessed: 2018-04. a parameter m, the number of rounds that a node executes Slush queries. What we ultimately want to extract is the total A PPENDIX A number of rounds φ that the scheduler will need to execute in A NALYSIS order to guarantee that the entire network is the same color, In this appendix, we provide an analysis of Slush, Snowflake whp. and Snowball. We analyze the system mainly using a continuous time process. Let {X{t≥0} } be a CTMC. The state space S of A. Preliminaries the stochastic process is a condensed version of the full We assume the network model as discussed in Section II. We configuration space, where each state {0, . . . , n} represents the let R (“red”) and B (“blue”) represent two generic conflicting total number of blue nodes in the system. choices. Without loss of generality, we focus our attention on Let FXs be the filtration, or the history pertaining to the counts of B, i.e. the total number of nodes that prefer blue. process, up to time s. This process is Markovian and time-

homogeneous, conforming to As a reminder, the stationary distribution can be found via P {Xt = j|FXs } = P {Xt = j|Xs } = P {Xt = j|X0 } limt→∞ P (t) = eQt , where we have Throughout the paper, we use Q ≡ (qij , i, j ∈ S) notation ∞ i ∞ i 0 ... 0 X t X t to refer to the infinitesimal generator of the process, where eQt = Qi = B i−1 W1 B i B i−1 Wn−1 i! i! death (i → i − 1) and birth (i → i + 1) rates of configuration i=0 i=0 0 ... 0 transitions are denoted via µi and λi (λi is distinct from the As Tan (eq. 2.3) shows, we have "∞ # clock parameter λ, and will be clear from context). These rates X are ξ(t) = B −1 B i − In−1 W1 ( i=0 µi = i H(N , c − i, k, α), for i → i − 1 Since we want the ultimate probabilities, we have that λi = (c − i) H(N , i, k, α), for j → i + 1 ξ = lim ξ(t) = −B −1 W1 t→∞ We can explicitly compute ξδ in terms of our rates µi and λi , for 1 ≤ i ≤ c − 1, and where i = 0 and i = c are absorbing. getting Let pij (t) refer to the probability of transitioning from state i n−δ X n−l n−1 to j at time t. We always assume that Y Y µi λj λi t + o(t), for j = i + 1 l=1 i=1 j=n−l+1 ξδ = µ t + o(t), for j = i − 1 n n−l n−1 i pij (t) = X Y Y 1 − (λi + µi )t + o(t), for j = i µi λj l=1 i=1 j=n−l+1 o(t), otherwise where all o(t) are uniform in i. However, we note that ui = λn−i . Algebraic manipulation from this observation leads to the two equations in the theorem. a) Irreversibility This expression is strictly lower than the Chvatal bounds used In Section IV, we discussed the loose Chvatal bound which in Section IV. provided intuitive understanding into the strong irreversibility Using the construction for the absorption (and dynamics of our core subsampling mechanism. In particular, (ir)reversibility) probabilities as discussed previously, a natural once the network drifts to some majority value, it tends to follow up computation is in regards to mean convergence revert back with only an exponentially small probability. We time. Let Tz (t) = inf{t ≥ 0 : Xt = {0, n}|X0 = z}, compute the closed-form expression for reversibility, and show and let τz = E[Tz (t)]. τz is the mean time to reach either that it is exponentially small. absorbing state, starting from state z, which corresponds to Theorem 2. Let ξδ be the probability of absorption into the the mean convergence time. The next theorem characterizes all-red state (s0 ), starting from a drift of δ (i.e. δ drift away this distribution. from n/2). Then, assuming δ > 1, Theorem 3. Let τz be the expected time to convergence, δ l−1 Y n−l X Y starting from state z > n/2, to any of the two converging µ2i λj states in the network (all-red or all-blue). Then, l=1 i=1 j=l n−1 ξδ = 1 − (6) X n/2 l−1 n−l x(d)y(d) X Y Y 2 µ2i µj d=1 τz = (8) l=1 i=1 j=l n/2 l−1 n−l X Y Y and 2 µ2i µj ξδ − ξδ+1 λδ+1 l=1 i=1 j=l = ⊓δ+1 = ξδ+1 − ξδ+2 µδ+1 where x(d) and y(d) are k X (n − δ − 1)k (δ + 1)k−j min(z,d) l−1 d−1 n−δ−1 X Y Y n2k−j (7) x(d) = µi λj j=α i=1 ≈ l=1 j=l k (9) X (δ + 1)k (n − δ − 1)k−j n−d−max(z−d,0) n−l n−1 δ+1 X Y Y n2k−j y(d) = µi λj j=α l=1 i=d+1 j=n−l+1 where from now on we refer to ⊓δ+1 as the drift of the process. Proof. Following the calculations from before, −B −1 at row Proof. Our results are derived based on constructions from z provides the number of traversals to each other state starting Tan [56]. We construct a sub-matrix of Q, denoted B, as from z. Calculating their sum, we have our result. The above shown in Figure 20. Let W1′ = (µ1 , 0, . . . , 0), Wn−1 ′ = equation is the full expression of the matrix row sum. (0, . . . , 0, λn−1 ). Then, we can express Q as Theorem 3 leads to the next lemma that captures property 0 ... 0 Q = W1 B Wn−1 P2, under the assumption that at the beginning of the protocol, 0 ... 0 one proposal has at least α support in the network.

−(λ1 + µ1 ) λ1 0 ··· ··· 0 µ2 −(λ2 + µ2 ) λ2 0 ··· 0 0 µ3 −(λ3 + µ3 ) λ3 ··· 0 .. .. .. .. .. .. B= . . . . . . .. .. . . µn−3 −(λn−2 + µn−2 ) λn−3 0 .. . ... 0 µn−1 −(λn−2 + µn−2 ) λn−2 0 ... 0 0 µn−1 −(λn−1 + µn−1 ) Fig. 20: Matrix B. Lemma 4. Slush reaches an absorbing state in finite time proceeds for Snowflake. almost surely. As stated in Section IV, the way to analyze the adversary using the same construction as in Slush is to condition re- Proof. Starting from any non-absorbing, transient state, there versibility on the first node u deciding on blue, which can is a non-zero probability of being absorbed. Additionally, since happen at any state (as specified by D(∗)). At that point, the termination is finite and everywhere differentiable, Theorem 3 adversarial strategy collapses to a single function, which is also implies that the probability of termination of any network to continually vote for red. The probabilities of reversibility, configuration where a proposal has ≥ α support in bounded for all states {1, . . . , c − 1} must encode the probability that time tmax is strictly positive. additional blue nodes commit, and the single function of the C. Snowflake adversary. The birth and death rates are transformed as follows: ( In Snowflake, the sampled set of nodes includes Byzantine µi = i(1 − I[D(∗, i, B)]) H(N , c − i + f, k, α) nodes. We introduce the decision function D(∗), which is λi = (c − i)(1 − I[D(∗, c − i, R)]) H(N , i, k, α) constructed by having each node also keep track of the total From here on, the analysis is the same as in Slush. Under number of consecutive times it has sampled a majority of various k and β, we can find the minimal α that provides the the same color (β). Finally, we introduce a function called system strong irreversibility properties. A(St ), the adversarial strategy, that takes as parameters the The next lemma captures P3, and the proof follows from entire configuration of the network at time t, as well as the central limit theorem. next set of nodes chosen by the scheduler to execute, and as √ a side-effect, modifies the set of nodes B to some arbitrary Lemma 5. If f < O( n), and α = ⌊k/2⌋+1, then Snowflake configuration of colors. terminates in O(log n) rounds with high probability. In order for our prior framework to apply to Snowflake, we Proof. The results follows from central limit theorem, wherein must deal with a key subtlety. Unlike in Slush, where it is for α = ⌊k/2⌋ + 1,√the expected bias in the network after clear that once the network has reached one of the converging sampling will be O( n). An adversary smaller than this bias states and therefore may not revert back, this no longer applies will be unable to keep the network in a fully-bivalent state to Snowflake, since any adversary f ≥ α has strictly positive for more than a constant number of rounds. The logarithmic probability of reverting the system, albeit this probability may factor remains from the mixing time lower bound. be infinitesimally small. The CTMC is flexible enough to deal with a system where there is only one absorbing state, but D. Snowball the long-term behavior of the system is no longer meaningful We make the following observation: if the confidences since, after an infinite amount of time, the system is guaranteed between red and blue are equal, then the adversary has the to revert, violating safety. We could trivially bound the amount same identical leverage in the irreversibility of the system of time, and show safety using this bounded time assumption as in Snowflake, regardless of network configuration. In fact, by simply characterizing the distribution of etQ , where Q is the Snowflake can be viewed as Snowball but where drifts in generator. However, we can make the following observation: confidences never exceed one. The same analysis applies to if the probability of going from state c (all-blue) to c − 1 is Snowball as in Snowflake, with the additional requirement exponentially small, then it will take the attacker exponential of bounding the long-term behavior of the confidences in time (in expectation; note, this is a lower bound, and in reality the network. To that end, analysis follows using martingale it will take much longer) to succeed in reverting the system. concentration inequalities, in particular the one introduced in Hence, we can assume that once all correct nodes are the same Equation 5. Snowball can be viewed as a two-urn system, color, the attack from the adversary will terminate since it is where each urn is a sub-martingale. The guarantees that can impractical to continue an attack. In fact, under reasonably be extracted hereon are that the confidences of the majority bounded timeframes, the variational distance between the exact committed value (in our frame of reference is always blue), approach and the approximation is very small. We leave details grow always more than those of the minority value, with high to the accompanying paper, but we briefly discuss how analysis probability, drifting away as t → tmax .

E. Safe Early Commitment will converge under the previous pessimal assumptions. The As we reasoned previously, each conflict set in Avalanche system designer can easily do this by picking an upper bound can be viewed as an instance of Snowball, where each progeny on γ, γ̄. instance iteratively votes for the entire path of the ancestry. The final step in assuring the correctness of a view change This feature provides various benefits; however, it also can is to account for a mix of nodes that straddle the τ boundary. lead to some virtuous transactions that depend on a rogue We would like the network to avoid an unsafe state no matter transaction to suffer the fate of the latter. In particular, rogue which nodes are using the old and the new views. The easiest transactions can interject in-between virtuous transactions and way to do this is to determine ∆t and ∆t+1 for desired bounds reduce the ability of the virtuous transactions to ever reach on γ, γ̄, and then to use the conservative value ∆t+1 during the required IS ACCEPTED predicate. As a thought experiment, epoch t. In essence, this ensures that no commitments are suppose that a transaction Ti names a set of parent transactions made in configuration St unless they conservatively fulfill the that are all decided, as per local view. If Ti is sampled over safety criteria in state space St+1 . As a result, there is no a large enough set of successful queries without discovering possibility of a node deciding red at time t, the network going any conflicts, then, since by assumption the entire ancestry of through an epoch change and finding itself to the left of the Ti is decided, it must be the case (probabilistically) that we new irreversibility state ∆t+1 . have achieved irreversibility. This approach trades off some of the feasibility space, to To then statistically measure the assuredness that Ti has add the ability to accommodate γ, γ̄ node churn per epoch. been accepted by a large percentage of correct nodes without Overall, if τ is in excess of the time required for a decision any conflicts, we make use of a one-way birth process, where (on the order of minutes to hours), and nodes are loosely a birth occurs when a new correct node discovers the conflict synchronized, they can add or drop up to γ, γ̄ nodes in each of Ti . Necessarily, deaths cannot exist in this model, because epoch using the conservative process described above. We a conflicting transaction cannot be unseen once a correct node leave the precise method of entering and exiting the network discovers it. Our births are as follows: ! by staking and unstaking to a subsequent paper, and instead n−i rely on a membership oracle that acts as a sequencer and γ- c−i k λi = 1− n (10) rate-limiter, using technologies like Fireflies [31]. c k Solving for the expected time to reach the final birth state pro- vides a lower bound to the β1 parameter in the IS ACCEPTED fast-decision branch. The table below shows an example of the analysis for n = 2000, α = 0.8, and various k, where ε ≪ 10−9 , and where β is the minimum required value before deciding. Overall, a very small number of iterations k 10 20 30 40 β 10.87625 10.50125 10.37625 10.25125 are sufficient for the safe early commitment predicate. This supports the choice of β in our evaluation. F. Churn and View Updates Any realistic system needs to accommodate the departure and arrival of nodes. We now demonstrate that Avalanche nodes can admit a well-characterized amount of churn, by showing how to pick parameters such that Avalanche nodes can differ in their view of the network and still safely make decisions. Consider a network whose operation is divided into epochs of length τ , and a view update from epoch t to t + 1 during which γ nodes join the network and γ̄ nodes depart. Under our static construction, the state space St of the network had a key parameter ∆t at time t, induced by ct , f t , nt and the chosen security parameters. This can, at worst, impact the network by adding γ nodes of color B, and remove γ̄ nodes of color R. At time t + 1, nt+1 = nt + γ − γ̄, while f t+1 and ct+1 will be modified by an amount ≤ γ − γ̄, and thus induce a new ∆t+1 for the chosen security parameters. This new ∆t+1 has to be chosen such that the probability of reversibility from state ct+1 /2 + ∆t+1 − γ is ≤ ε, which ensures that the system