Ripple Whitepaper

Thursday, May 3, 2018

The Ripple Protocol Consensus Algorithm

David Schwartz, Noah Youngs, Arthur Britto
Ripple Labs Inc, 2014

Abstract

While several consensus algorithms exist for the Byzantine Generals Problem, specifically as it pertains to distributed payment systems, many suffer from high latency induced by the requirement that all nodes within the network communicate synchronously. In this work, we present a novel consensus algorithm that circumvents this requirement by utilizing collectively-trusted subnetworks within the larger network. We show that the “trust” required of these subnetworks is in fact minimal and can be further reduced with principled choice of the member nodes. In addition, we show that minimal connectivity is required to maintain agreement throughout the whole network. The result is a low-latency consensus algorithm which still maintains robustness in the face of Byzantine failures. We present this algorithm in its embodiment in the Ripple Protocol.

Contents

1 Introduction
2 Definitions, Formalization and Previous Work
2.1 Ripple Protocol Components
2.2 Formalization
2.3 Existing Consensus Algorithms
2.4 Formal Consensus Goals
3 Ripple Consensus Algorithm
3.1 Definition
3.2 Correctness
3.3 Agreement
3.4 Utility
3.4.1 Convergence
3.4.2 Heuristics and Procedures
4 Simulation Code
5 Discussion
6 Acknowledgments
References

1. Introduction

Interest and research in distributed consensus systems have increased markedly in recent years, with a central focus being on distributed payment networks. Such networks allow for fast, low-cost transactions which are not controlled by a centralized source. While the economic benefits and drawbacks of such a system are worthy of much research in and of themselves, this work focuses on some of the technical challenges that all distributed payment systems must face. While these problems are varied, we group them into three main categories: correctness, agreement, and utility.

By correctness, we mean that it is necessary for a distributed system to be able to discern the difference between a correct and a fraudulent transaction. In traditional fiduciary settings, this is done through trust between institutions and cryptographic signatures that guarantee a transaction is indeed coming from the institution that it claims to be coming from. In distributed systems, however, there is no such trust, as the identity of any and all members in the network may not even be known. Therefore, alternative methods for correctness must be utilized.

Agreement refers to the problem of maintaining a single global truth in the face of a decentralized accounting system. While similar to the correctness problem, the difference lies in the fact that while a malicious user of the network may be unable to create a fraudulent transaction (defying correctness), it may be able to create multiple correct transactions that are somehow unaware of each other, and thus combine to create a fraudulent act. For example, a malicious user may make two simultaneous purchases, with only enough funds in their account to cover each purchase individually, but not both together. Thus each transaction by itself is correct, but if executed simultaneously in such a way that the distributed network as a whole is unaware of both, a clear problem arises, commonly referred to as the “Double-Spend Problem” [1]. Thus the agreement problem can be summarized as the requirement that only one set of globally recognized transactions exist in the network.

Utility is a slightly more abstract problem, which we define generally as the “usefulness” of a distributed payment system, but which in practice most often simplifies to the latency of the system. A distributed system that is both correct and in agreement but which requires one year to process a transaction, for example, is obviously not a viable payment system. Additional aspects of utility may include the level of computing power required to participate in the correctness and agreement processes or the technical proficiency required of an end user to avoid being defrauded in the network.

Many of these issues were explored long before the advent of modern distributed computer systems, via a problem known as the “Byzantine Generals Problem” [2]. In this problem, a group of generals each control a portion of an army and must coordinate an attack by sending messengers to each other. Because the generals are in unfamiliar and hostile territory, messengers may fail to reach their destination (just as nodes in a distributed network may fail, or send corrupted data instead of the intended message). An additional aspect of the problem is that some of the generals may be traitors, either individually, or conspiring together, and so messages may arrive which are intended to create a false plan that is doomed to failure for the loyal generals (just as malicious members of a distributed system may attempt to convince the system to accept fraudulent transactions, or multiple versions of the same truthful transaction that would result in a double-spend). Thus a distributed payment system must be robust both in the face of standard failures, and so-called “Byzantine” failures, which may be coordinated and originate from multiple sources in the network.

In this work, we analyze one particular implementation of a distributed payment system: the Ripple Protocol. We focus on the algorithms utilized to achieve the above goals of correctness, agreement, and utility, and show that all are met (within necessary and predetermined tolerance thresholds, which are well-understood). In addition, we provide code that simulates the consensus process with parameterizable network size, number of malicious users, and message-sending latencies.

2. Definitions, Formalization and Previous Work

We begin by defining the components of the Ripple Protocol. In order to prove correctness, agreement, and utility properties, we first formalize those properties into axioms. These properties, when grouped together, form the notion of consensus: the state in which nodes in the network reach correct agreement. We then highlight some previous results relating to consensus algorithms, and finally state the goals of consensus for the Ripple Protocol within our formalization framework.

2.1 Ripple Protocol Components

We begin our description of the Ripple network by defining the following terms:

• Server: A server is any entity running the Ripple Server software (as opposed to the Ripple Client software which only lets a user send and receive funds), which participates in the consensus process.

• Ledger: The ledger is a record of the amount of currency in each user’s account and represents the “ground truth” of the network. The ledger is repeatedly updated with transactions that successfully pass through the consensus process.

• Last-Closed Ledger: The last-closed ledger is the most recent ledger that has been ratified by the consensus process and thus represents the current state of the network.

• Open Ledger: The open ledger is the current operating status of a node (each node maintains its own open ledger). Transactions initiated by end users of a given server are applied to the open ledger of that server, but transactions are not considered final until they have passed through the consensus process, at which point the open ledger becomes the last-closed ledger.

• Unique Node List (UNL): Each server, s, maintains a unique node list, which is a set of other servers that s queries when determining consensus. Only the votes of the other members of the UNL of s are considered when determining consensus (as opposed to every node on the network). Thus the UNL represents a subset of the network which, when taken collectively, is “trusted” by s to not collude in an attempt to defraud the network. Note that this definition of “trust” does not require that each individual member of the UNL be trusted (see section 3.2).

• Proposer: Any server can broadcast transactions to be included in the consensus process, and every server attempts to include every valid transaction when a new consensus round starts. During the consensus process, however, only proposals from servers on the UNL of a server s are considered by s.

2.2 Formalization

We use the term nonfaulty to refer to nodes in the network that behave honestly and without error. Conversely, a faulty node is one which experiences errors which may be honest (due to data corruption, implementation errors, etc.), or malicious (Byzantine errors). We reduce the notion of validating a transaction to a simple binary decision problem: each node must decide from the information it has been given on the value 0 or 1.

As in Attiya, Dolev, and Gill, 1984 [3], we define consensus according to the following three axioms:

1. (C1): Every nonfaulty node makes a decision in finite time.
2. (C2): All nonfaulty nodes reach the same decision value.
3. (C3): 0 and 1 are both possible values for all nonfaulty nodes. (This removes the trivial solution in which all nodes decide 0 or 1 regardless of the information they have been presented.)

2.3 Existing Consensus Algorithms

There has been much research done on algorithms that achieve consensus in the face of Byzantine errors. This previous work has included extensions to cases where not all participants in the network are known ahead of time, where the messages are sent asynchronously (there is no bound on the amount of time an individual node will take to reach a decision), and where there is a delineation between the notion of strong and weak consensus.

One pertinent result of previous work on consensus algorithms is that of Fischer, Lynch, and Paterson, 1985 [4], which proves that in the asynchronous case, non-termination is always a possibility for a consensus algorithm, even with just one faulty process. This introduces the necessity for time-based heuristics, to ensure convergence (or at least repeated iterations of non-convergence). We shall describe these heuristics for the Ripple Protocol in section 3.

The strength of a consensus algorithm is usually measured in terms of the fraction of faulty processes it can tolerate. It is provable that no solution to the Byzantine Generals problem (which already assumes synchronicity, and known participants) can tolerate more than (n − 1)/3 Byzantine faults, or 33% of the network acting maliciously [2]. This result does not, however, assume verifiable authenticity of the messages delivered between nodes (digital signatures). If a guarantee on the unforgeability of messages is possible, algorithms exist with much higher fault tolerance in the synchronous case.

Several algorithms with greater complexity have been proposed for Byzantine consensus in the asynchronous case. FaB Paxos [5] will tolerate (n − 1)/5 Byzantine failures in a network of n nodes, amounting to a tolerance of up to 20% of nodes in the network colluding maliciously. Attiya, Dolev, and Gill [3] introduce a phase algorithm for the asynchronous case, which can tolerate (n − 1)/4 failures, or up to 25% of the network. Lastly, Alchieri et al., 2008 [6] present BFT-CUP, which achieves Byzantine consensus in the asynchronous case even with unknown participants, with the maximal bound of a tolerance of (n − 1)/3 failures, but with additional restrictions on the connectivity of the underlying network.

2.4 Formal Consensus Goals

Our goal in this work is to show that the consensus algorithm utilized by the Ripple Protocol will achieve consensus at each ledger-close (even if consensus is the trivial consensus of all transactions being rejected), and that the trivial consensus will only be reached with a known probability, even in the face of Byzantine failures.
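To make the tolerance bounds quoted in Section 2.3 concrete, the following minimal sketch evaluates (n − 1)/3, (n − 1)/4 and (n − 1)/5 for a given network size. The function name `max_faults`, the example size n = 100, and the choice to round down to whole nodes are illustrative assumptions, not part of the paper.

```python
def max_faults(n: int, divisor: int) -> int:
    """Largest whole number of faulty nodes not exceeding (n - 1) / divisor.

    Rounding down is an assumption made here for illustration; the paper
    states the bounds as fractions of n.
    """
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return (n - 1) // divisor


# Divisors taken from the bounds quoted in Section 2.3.
BOUND_DIVISORS = {
    "Byzantine Generals limit, (n - 1)/3": 3,
    "Attiya, Dolev, and Gill, (n - 1)/4": 4,
    "FaB Paxos, (n - 1)/5": 5,
}

n = 100  # hypothetical network size
tolerances = {name: max_faults(n, d) for name, d in BOUND_DIVISORS.items()}
for name, faults in tolerances.items():
    print(f"{name}: up to {faults} faulty nodes out of {n}")
```

For a network of 100 nodes this gives 33, 24 and 19 tolerated faults respectively, which matches the roughly 33%, 25% and 20% figures in the text.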

Since each node in the network only votes on proposals from a trusted set of nodes (the other nodes in its UNL), and since each node may have differing UNLs, we also show that only one consensus will be reached amongst all nodes, regardless of UNL membership. This goal is also referred to as preventing a “fork” in the network: a situation in which two disjoint sets of nodes each reach consensus independently, and two different last-closed ledgers are observed by nodes on each node-set.

Lastly, we will show that the Ripple Protocol can achieve these goals in the face of (n − 1)/5 failures, which is not the strongest result in the literature, but we will also show that the Ripple Protocol possesses several other desirable features that greatly enhance its utility.

3. Ripple Consensus Algorithm

The Ripple Protocol consensus algorithm (RPCA) is applied every few seconds by all nodes, in order to maintain the correctness and agreement of the network. Once consensus is reached, the current ledger is considered “closed” and becomes the last-closed ledger. Assuming that the consensus algorithm is successful, and that there is no fork in the network, the last-closed ledger maintained by all nodes in the network will be identical.

3.1 Definition

The RPCA proceeds in rounds. In each round:

• Initially, each server takes all valid transactions it has seen prior to the beginning of the consensus round that have not already been applied (these may include new transactions initiated by end-users of the server, transactions held over from a previous consensus process, etc.), and makes them public in the form of a list known as the “candidate set”.

• Each server then amalgamates the candidate sets of all servers on its UNL, and votes on the veracity of all transactions.

• Transactions that receive more than a minimum percentage of “yes” votes are passed on to the next round, if there is one, while transactions that do not receive enough votes will either be discarded, or included in the candidate set for the beginning of the consensus process on the next ledger.

• The final round of consensus requires a minimum percentage of 80% of a server’s UNL agreeing on a transaction. All transactions that meet this requirement are applied to the ledger, and that ledger is closed, becoming the new last-closed ledger.

3.2 Correctness

In order to achieve correctness, given a maximal amount of Byzantine failures, it must be shown that it is impossible for a fraudulent transaction to be confirmed during consensus, unless the number of faulty nodes exceeds that tolerance. The proof of the correctness of the RPCA then follows directly: since a transaction is only approved if 80% of the UNL of a server agrees with it, as long as 80% of the UNL is honest, no fraudulent transactions will be approved. Thus for a UNL of n nodes in the network, the consensus protocol will maintain correctness so long as:

$$f \le \frac{n - 1}{5} \tag{1}$$

where f is the number of Byzantine failures. In fact, even in the face of (n − 1)/5 + 1 Byzantine failures, correctness is still technically maintained. The consensus process will fail, but it will still not be possible to confirm a fraudulent transaction. Indeed it would take (4n + 1)/5 Byzantine failures for an incorrect transaction to be confirmed. We call this second bound the bound for weak correctness, and the former the bound for strong correctness.

It should also be noted that not all “fraudulent” transactions pose a threat, even if confirmed during consensus. Should a user attempt to double-spend funds in two transactions, for example, even if both transactions are confirmed during the consensus process, after the first transaction is applied, the second will fail, as the funds are no longer available. This robustness is due to the fact that transactions are applied deterministically, and that consensus ensures that all nodes in the network are applying the deterministic rules to the same set of transactions.

For a slightly different analysis, let us assume that the probability that any node will decide to collude and join a nefarious cartel is $p_c$. Then the probability of correctness is given by $p^*$, where:

$$p^* = \sum_{i=0}^{\lceil (n-1)/5 \rceil} \binom{n}{i} \, p_c^{\,i} \, (1 - p_c)^{n-i} \tag{2}$$

This probability represents the likelihood that the size of the nefarious cartel will remain below the maximal threshold of Byzantine failures, given $p_c$. Since this likelihood is a binomial distribution, values of $p_c$ greater than 20% will result in expected cartels of size greater than 20% of the network, thwarting the consensus process. In practice, a UNL is not chosen randomly, but rather with the intent to minimize $p_c$. Since nodes are not anonymous but rather cryptographically identifiable, selecting a UNL of nodes from a mixture of continents, nations, industries, ideologies, etc. will produce values of $p_c$ much lower than 20%. As an example, the probability of the Anti-Defamation League and the Westboro Baptist Church colluding to defraud the network is certainly much, much smaller than 20%. Even if the UNL has a relatively large $p_c$, say 15%, the probability of correctness is extremely high even with only 200 nodes in the UNL: 97.8%.

A graphical representation of how the probability of incorrectness scales as a function of UNL size for differing values of $p_c$ is depicted in Figure 1. Note that here the vertical axis represents the probability of a nefarious cartel thwarting consensus, and thus lower values indicate greater probability of consensus success. As can be seen in the figure, even with a $p_c$ as high as 10%, the probability of consensus being thwarted very quickly becomes negligible as the UNL grows past 100 nodes.

3.3 Agreement

To satisfy the agreement requirement, it must be shown that all nonfaulty nodes reach consensus on the same set of transactions, regardless of their UNLs. Since the UNLs for each server can be different, agreement is not inherently guaranteed by the correctness proof. For example, if there are no restrictions on the membership of the UNL, and the size of the UNL is not larger than $0.2 \cdot n_{total}$, where $n_{total}$ is the number of nodes in the entire network, then a fork is possible. This is illustrated by a simple example (depicted in Figure 2): imagine two cliques within the UNL graph, each larger than $0.2 \cdot n_{total}$. By cliques, we mean a set of nodes where each node’s UNL is the selfsame set of nodes. Because these two cliques do not share any members, it is possible for each to achieve a correct consensus independently of each other, violating agreement. If the connectivity of the two cliques surpasses $0.2 \cdot n_{total}$, then a fork is no longer possible, as disagreement between the cliques would prevent consensus from being reached at the 80% agreement threshold that is required.

Figure 2. An example of the connectivity required to prevent a fork between two UNL cliques.

An upper bound on the connectivity required to prove agreement is given by:

$$|UNL_i \cap UNL_j| \ge \frac{1}{5} \max(|UNL_i|, |UNL_j|) \quad \forall i, j \tag{3}$$

This upper bound assumes a clique-like structure of UNLs, i.e. nodes form sets whose UNLs contain other nodes in those sets. This upper bound guarantees that no two cliques can reach consensus on conflicting transactions, since it becomes impossible to reach the 80% threshold required for consensus. A tighter bound is possible when indirect edges between UNLs are taken into account as well. For example, if the structure of the network is not clique-like, a fork becomes much more difficult to achieve, due to the greater entanglement of the UNLs of all nodes.

It is interesting to note that no assumptions are made about the nature of the intersecting nodes. The intersection of two UNLs may include faulty nodes, but so long as the size of the intersection is larger than the bound required to guarantee agreement, and the total number of faulty nodes is less than the bound required to satisfy strong correctness, then both correctness and agreement will be achieved. That is to say, agreement is dependent solely on the size of the intersection of nodes, not on the size of the intersection of nonfaulty nodes.

3.4 Utility

While many components of utility are subjective, one that is indeed provable is convergence: that the consensus process will terminate in finite time.
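The two quantitative conditions from Sections 3.2 and 3.3 can be checked numerically. The sketch below evaluates the binomial sum of Equation 2 and tests the pairwise overlap condition of Equation 3. The function names `correctness_probability` and `unl_overlap_ok`, and the example UNLs, are hypothetical and chosen only for illustration.

```python
from math import ceil, comb
from typing import Set


def correctness_probability(n: int, p_c: float) -> float:
    """Equation 2: probability that a cartel stays at or below
    ceil((n - 1) / 5) members when each of n UNL nodes colludes
    independently with probability p_c."""
    limit = ceil((n - 1) / 5)
    return sum(
        comb(n, i) * p_c**i * (1 - p_c) ** (n - i)
        for i in range(limit + 1)
    )


def unl_overlap_ok(unl_i: Set[str], unl_j: Set[str]) -> bool:
    """Equation 3 for one pair of UNLs:
    |UNL_i ∩ UNL_j| >= max(|UNL_i|, |UNL_j|) / 5.
    Integer arithmetic avoids floating-point rounding."""
    return 5 * len(unl_i & unl_j) >= max(len(unl_i), len(unl_j))


# The case discussed in Section 3.2: 200 nodes, p_c = 15%.
p_star = correctness_probability(200, 0.15)
print(f"p* for n=200, p_c=0.15: {p_star:.3f}")

# Two hypothetical 10-node UNLs sharing exactly two members.
unl_a = {f"node{k}" for k in range(0, 10)}
unl_b = {f"node{k}" for k in range(8, 18)}
print("overlap condition met:", unl_overlap_ok(unl_a, unl_b))
```

The first call corresponds to the example in the text, for which the paper quotes 97.8%. The overlap check covers a single pair; applying Equation 3 to a whole network would mean running it over every pair of UNLs.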

Figure 1. Probability of a nefarious cartel being able to thwart consensus as a function of the size of the UNL, for different values of $p_c$, the probability that any member of the UNL will decide to collude with others. Here, lower values indicate a higher probability of consensus success.

3.4.1 Convergence

We define convergence as the point at which the RPCA reaches consensus with strong correctness on the ledger, and that ledger then becomes the last-closed ledger. Note that while technically weak correctness still represents convergence of the algorithm, it is only convergence in the trivial case, as proposition C3 is violated, and no transactions will ever be confirmed. From the results above, we know that strong correctness is always achievable in the face of up to (n − 1)/5 Byzantine failures, and that only one consensus will be achieved in the entire network so long as the UNL-connectedness condition is met (Equation 3). All that remains is to show that when both of these conditions are met, consensus is reached in finite time.

The consensus algorithm itself is deterministic and has a preset number of rounds, t, before consensus is terminated and the current set of transactions is declared approved or not-approved (even if at this point no transactions have more than the 80% required agreement, and the consensus is only the trivial consensus). The limiting factor for the termination of the algorithm is therefore the communication latency between nodes. In order to bound this quantity, the response-time of nodes is monitored, and nodes whose latency grows larger than a preset bound b are removed from all UNLs. While this guarantees that consensus will terminate with an upper bound of tb, it is important to note that the bounds described for correctness and agreement above must be met by the final UNL, after all nodes that will be dropped have been dropped. If the conditions hold for the initial UNLs for all nodes, but then some nodes are dropped from the network due to latency, the correctness and agreement guarantees do not automatically hold but must be satisfied by the new set of UNLs.

3.4.2 Heuristics and Procedures

As mentioned above, a latency bound heuristic is enforced on all nodes in the Ripple Network to guarantee that the consensus algorithm will converge. In addition, there are a few other heuristics and procedures that provide utility to the RPCA.

• There is a mandatory 2 second window for all nodes to propose their initial candidate sets in each round of consensus. While this does introduce a lower bound of 2 seconds to each consensus round, it also guarantees that all nodes with reasonable latency will have the ability to participate in the consensus process.

• As the votes are recorded in the ledger for each round of consensus, nodes can be flagged and removed from the network for some common, easily-identifiable malicious behaviors. These include nodes that vote “No” on every transaction, and nodes that consistently propose transactions which are not validated by consensus.

• A curated default UNL is provided to all users, which is chosen to minimize $p_c$, described in section 3.2. While users can and should select their own UNLs, this default list of nodes guarantees that even naive users will participate in a consensus process that achieves correctness and agreement with extremely high probability.

• A network split detection algorithm is also employed to avoid a fork in the network. While the consensus algorithm certifies that the transactions on the last-closed ledger are correct, it does not prohibit the possibility of more than one last-closed ledger existing on different subsections of the network with poor connectivity. To identify whether such a split has occurred, each node monitors the size of the active members of its UNL. If this size suddenly drops below a preset threshold, it is possible that a split has occurred. In order to prevent a false positive in the case where a large section of a UNL has temporary latency, nodes are allowed to publish a “partial validation”, in which they do not process or vote on transactions, but declare that they are still participating in the consensus process, as opposed to a different consensus process on a disconnected subnetwork.

• While it would be possible to apply the RPCA in just one round of consensus, utility can be gained through multiple rounds, each with an increasing minimum-required percentage of agreement, before the final round with an 80% requirement. These rounds allow for detection of latent nodes in the case that a few such nodes are creating a bottleneck in the transaction rate of the network. These nodes will be able to initially keep up during the lower-requirement rounds but fall behind and be identified as the threshold increases. In the case of one round of consensus, it may be the case that so few transactions pass the 80% threshold that even slow nodes can keep up, lowering the transaction rate of the entire network.

4. Simulation Code

The provided simulation code demonstrates a round of RPCA, with parameterizable features (the number of nodes in the network, the number of malicious nodes, latency of messages, etc.). The simulator begins in perfect disagreement (half of the nodes in the network initially propose “yes”, while the other half propose “no”), then proceeds with the consensus process, showing at each stage the number of yes/no votes in the network as nodes adjust their proposals based upon the proposals of their UNL members. Once the 80% threshold is reached, consensus is achieved. We encourage the reader to experiment with different values of the constants defined at the beginning of “Sim.cpp”, in order to become familiar with the consensus process under different conditions.

5. Discussion

We have described the RPCA, which satisfies the conditions of correctness, agreement, and utility which we have outlined above. The result is that the Ripple Protocol is able to process secure and reliable transactions in a matter of seconds: the length of time required for one round of consensus to complete. These transactions are provably secure up to the bounds outlined in section 3, which, while not the strongest available in the literature for asynchronous Byzantine consensus, do allow for rapid convergence and flexibility in network membership. When taken together, these qualities allow the Ripple Network to function as a fast and low-cost global payment network with well-understood security and reliability properties.

While we have shown that the Ripple Protocol is provably secure so long as the bounds described in equations 1 and 3 are met, it is worth noting that these are maximal bounds, and in practice the network may be secure under significantly less stringent conditions. It is also important to recognize, however, that satisfying these bounds is not inherent to the RPCA itself, but rather requires management of the UNLs of all users. The default UNL provided to all users is already sufficient, but should a user make changes to the UNL, it must be done with knowledge of the above bounds. In addition, some monitoring of the global network structure is required in order to ensure that the bound in equation 3 is met, and that agreement will always be satisfied.

We believe the RPCA represents a significant step forward for distributed payment systems, as the low latency allows for many types of financial transactions previously made difficult or even impossible with other, higher-latency consensus methods.

6. Acknowledgments

Ripple Labs would like to acknowledge all of the people involved in the development of the Ripple Protocol consensus algorithm. Specifically, Arthur Britto, for his work on transaction sets, Jed McCaleb, for the original Ripple Protocol consensus concept, and David Schwartz, for his work on the “failure to agree is agreement to defer” aspect of consensus. Ripple Labs would also like to acknowledge Noah Youngs for his efforts in preparing and reviewing this paper.

References

[1] Nakamoto, Satoshi. “Bitcoin: A peer-to-peer electronic cash system.” Consulted 1.2012 (2008): 28.

[2] Lamport, Leslie, Robert Shostak, and Marshall Pease. “The Byzantine generals problem.” ACM Transactions on Programming Languages and Systems (TOPLAS) 4.3 (1982): 382-401.

[3] Attiya, C., D. Dolev, and J. Gill. “Asynchronous Byzantine Agreement.” Proc. 3rd. Annual ACM Symposium on Principles of Distributed Computing. 1984.

[4] Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. “Impossibility of distributed consensus with one faulty process.” Journal of the ACM (JACM) 32.2 (1985): 374-382.

[5] Martin, J-P., and Lorenzo Alvisi. “Fast byzantine consensus.” Dependable and Secure Computing, IEEE Transactions on 3.3 (2006): 202-215.

[6] Alchieri, Eduardo AP, et al. “Byzantine consensus with unknown participants.” Principles of Distributed Systems. Springer Berlin Heidelberg, 2008. 22-40.
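As a supplementary illustration of the round structure defined in Section 3.1, the following toy sketch runs several voting rounds with an increasing threshold, ending at 80%. It is not the authors' “Sim.cpp” and makes simplifying assumptions that the paper does not specify: the intermediate thresholds (50%, 60%, 70%) are invented, all servers update simultaneously, there is no latency or malicious behavior, and a transaction is kept when the share of UNL members holding it is at least the threshold. The names `ROUND_THRESHOLDS` and `run_consensus` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical thresholds; the paper only fixes the final 80% round.
ROUND_THRESHOLDS: List[float] = [0.5, 0.6, 0.7, 0.8]


def run_consensus(
    candidate_sets: Dict[str, Set[str]],
    unls: Dict[str, Set[str]],
    thresholds: List[float] = ROUND_THRESHOLDS,
) -> Dict[str, Set[str]]:
    """Return each server's transaction set after the final round."""
    positions = {server: set(txs) for server, txs in candidate_sets.items()}
    for threshold in thresholds:
        next_positions: Dict[str, Set[str]] = {}
        for server, unl in unls.items():
            # Amalgamate the server's own set with those of its UNL members.
            seen = set(positions[server])
            for peer in unl:
                seen |= positions[peer]
            # Keep a transaction only if enough UNL members currently hold it.
            kept = set()
            for tx in seen:
                yes_votes = sum(1 for peer in unl if tx in positions[peer])
                if yes_votes >= threshold * len(unl):
                    kept.add(tx)
            next_positions[server] = kept
        positions = next_positions  # simultaneous update of all servers
    return positions


# Five hypothetical servers, each trusting the other four.
servers = ["A", "B", "C", "D", "E"]
unls = {s: set(servers) - {s} for s in servers}
candidate_sets = {
    "A": {"tx1", "tx2", "tx3"},
    "B": {"tx1", "tx2"},
    "C": {"tx1", "tx2"},
    "D": {"tx1"},
    "E": {"tx1"},
}
final_sets = run_consensus(candidate_sets, unls)
for server in servers:
    print(server, sorted(final_sets[server]))
```

In this example tx1 starts with full support, tx2 gathers enough support in the first round to be adopted by every server, and tx3, seen by only one server, is dropped, so all five servers finish with the same set.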