Arweave Whitepaper

Friday, February 19, 2021

Arweave Lightpaper
Version 0.9
Samuel Williams, William Jones
April 24, 2018

Contents

1 Introduction
2 Background
3 Motivation
4 Technology
4.1 Blockweave
4.2 Proof of Access
4.3 Wildfire
4.4 Blockshadows
4.5 Democratic Content Policy
4.6 Discussion
4.6.1 Storage Pools
5 Building Apps
5.1 Client-Server Architecture
5.2 Serverless Architecture
5.3 Event Based
5.4 Trustless and Provable
6 Use Cases
6.1 Authenticity
7 Conclusion

Abstract

Typical blockchains have several major well-known problems with data storage. These problems require new third-party protocols to be integrated on top of existing blockchains, as fees are too high for on-chain storage to be feasible. Therefore, with typical blockchains there is always going to be a cost to access content, and content is never stored permanently. As the demand for data storage grows exponentially, a decentralized, low-cost data storage protocol that can scale becomes a necessity. In this work we present Arweave, which is built on a new blockchain-like structure called the blockweave. The blockweave is a platform designed to provide scalable on-chain storage in a cost-efficient manner for the very first time. As the amount of data stored in the system increases, the amount of hashing needed for consensus decreases, thus reducing the cost of
storing data. The protocol's existing REST API makes it trivially simple to build decentralised applications on top of the blockweave, reflecting Arweave's focus on the developer community and its ability to drive adoption of emerging and novel protocols. In this paper, we also introduce novel concepts such as block-shadowing, a flexibly-sized transaction block distribution algorithm that improves on the 'sharding' techniques used by other blockchains; a self-optimising network topology; and a new consensus mechanism called proof of access.

1 Introduction

In this information age we often succumb to the illusion that because information is readily available, it can never be altered or lost. This is fundamentally untrue [7]. While in the internet we have built a monumental system of decentralised information dissemination, we have yet to build a corresponding system of permanent knowledge storage.

Modern history is full of examples of the destruction and loss of vital information, from fires at libraries and archives [9, 10, 3, 8] to book burning in authoritarian states [12, 11]. When we look up information on the internet, we depend on being allowed access to centralised stores of that data. Access to the servers that hold this information can be revoked by their owners at any time. Similarly, as serving information on the internet requires paying server and upkeep costs, websites can simply disappear when funds are no longer available.

Further still, a number of governments are taking increasing steps to censor and remove access to politically sensitive information on the internet [13, 5, 4]. The same is true of media and news organizations: where we once held physical and irrevocable copies, we now simply access the information and then discard it. It has become commonplace for media organisations to update the contents of their articles over time. While this provides a number of advantages over the previous system, most prominently the ability to disseminate real-time updates about unfolding situations, it also allows important context to be lost or obscured.

2 Background

All blockchain innovations stand on the shoulders of giants, including Bitcoin itself, a symphony of data structures, distributed networking and cryptography. We too have sought to further the space, solving specific shortcomings of existing blockchain networks, namely storage, and along the way developing a novel approach to transaction speeds. Most blockchain technologies today insist that a "full node" must maintain a copy of the entire blockchain in order to verify future transactions. While the Merkle data structures that make this possible are an impressive feat in and of themselves and add a layer of unparalleled security, we feel that some performance enhancements around this process could reduce the burden of synchronization for a full node. In Section 4 we present several technologies that address block, node, and wallet
synchronization.

The full blockchain requirement is perhaps even more of a hindrance for existing blockchain technologies when it comes to storing data. In the case of Ethereum, a decentralised world computer, storage paid for with its native token is extremely costly. Arweave's primary motivation is to make permanent, immutable storage, of the kind Ethereum offers in principle, a practical reality. On Ethereum, high fees make this storage increasingly impractical: while it is possible to store data there, previous attempts have been impractical due to data storage costs. Other blockchain technologies have focused on improving consensus algorithms between nodes, notably Stellar Lumens and delegated-consensus architectures such as Ark and Neo. While this may improve transaction speeds, storage remains the long-term hurdle many of these networks will face. By focusing on solving storage first, we have arrived at several performance enhancements that can be applied to facilitate high-throughput currency transactions.

3 Motivation

We have designed and implemented a blockchain network where permanent, low-cost storage is a reality. Weaving storage access into consensus, combined with novel approaches to transaction bundling and arbitrarily sized blocks, creates a high-throughput cryptocurrency that improves on other cryptocurrencies like Bitcoin [10] and Ethereum [12].

In the past, archives (internet or otherwise) have typically been maintained by a single institution (or even an individual), making them vulnerable to two primary forms of manipulation. The first is the modification of documents during their storage [2]. The second is that documents could have been forged or modified prior to their entry into storage [1]; consider, for example, the many works attributed to Socrates that are believed to have been penned by his disciples [6]. Arweave solves both of these problems. Once a document is stored on the weave, it is cryptographically linked with every other block on the weave. This ensures that any attempt to change the contents of the document will be detected and rejected by the network. In this way, no subversion of the information on the weave is possible. Arweave is a browsable sister network to the internet, providing the long-term, permanent data storage that the internet desperately needs but currently lacks.

A critical property of the Arweave system is that it is designed for developers to easily build applications that interface with, create, and use data from the network. These apps, built with a language-agnostic REST API, will act as nodes that listen to the network. Their functions will be wide and varied, from decentralised and immutable social networks to discussion websites and news aggregators. In order to submit information to the weave, a small number of tokens will be required. These tokens will be used to pay miners for their work in maintaining the weave and network, as well as to disincentivize the propagation of spam. This
represents a great improvement over typical centralized storage systems. Similarly, it empowers individuals to ensure that the information they personally care about will be perpetuated into the future. The incentive to maintain the weave also increases as the network and its documents reinforce the value of the tokens. As these effects compound, we expect Arweave tokens to become a valuable asset for the information age, inseparably and intrinsically linked to a vast trove of important documents.

4 Technology

Arweave is built on four core technologies that work together to create low-cost, high-throughput, permanent storage on a new blockchain. These innovations are:
• Blockweave
• Proof of Access
• Wildfire
• Blockshadows
While these technologies are intertwined, each plays a pivotal role in creating a new type of network suited for both fast transactions and low-cost permanent storage.

4.1 Blockweave

A well-known property of most blockchains is that every block must be stored to participate in validating transactions as a "full node". This is not the case with Arweave. Instead, Arweave introduces two new concepts that allow nodes to fulfil key network functions without possessing the whole chain. The first of these is the block hash list, a list of the hashes of all previous blocks. This allows old blocks to be verified, and potential new blocks to be evaluated effectively. The second is the wallet list, a list of all active wallets in the system. This allows transactions to be verified without possessing the block in which the wallet's last transaction was included. Using the block hash list and wallet list, which are synchronized by the network and available for download by miners, nodes are able to join the network and participate in mining the weave almost immediately.

Further, instead of having each miner verify the entire block structure from the genesis block to the current block when they join the network, Arweave uses a system of 'ongoing verification'. When miners join the Arweave network, they download the current block and retrieve the block hash list and wallet list from it. Since these lists have been continuously verified through the ongoing progress of each block, new miners can start participating immediately without verifying the entire weave themselves. Full weave verification is of course available to any node that wishes to perform it. In this way, miners do not need to find the previous transaction associated with a wallet in order to verify a new transaction. Instead, miners simply need to verify that the transaction has been appropriately signed with the wallet owner's private key. To prevent recall block forging attacks, the hash of the block hash list is
distributed with every new block.

Figure 1: An illustration of the blockweave data structure, demonstrating the link to both the previous block and the recall block.

4.2 Proof of Access

Arweave's consensus mechanism is based on proof of access (PoA) and proof of work (PoW). While typical PoW systems depend only on the previous block in order to generate each successive block, the PoA algorithm also incorporates data from a randomly chosen previous block. Combined with the blockweave data structure, this means miners do not need to store all blocks (forming a blockchain), but rather can store any subset of previous blocks, incentivised by PoA and wildfire, forming a weave of blocks: a blockweave. The 'recall block' to incorporate into the next block is chosen by taking the hash of the current block and calculating its value modulo the current block height.

The transactions in the recall block are hashed alongside those found in the current block in order to generate the next block. When a miner finds an appropriate hash, they distribute the new block along with the recall block to other members of the network. This allows the other members of the network, even those without their own copy of the recall block, to independently verify that the new block is valid.

4.3 Wildfire

As a data storage system, Arweave requires not only the ability to store large amounts of information, but also the ability to provide access to that data as expediently as possible. Further, an important part of the Arweave system is costless access to data at the point of request. Consequently, Arweave has an added layer of incentives to encourage miners to share data freely.

Wildfire is a system that solves the problem of data sharing in a decentralised network by making the rapid fulfilment of data requests a necessary part of participation in the network. Wildfire works by creating a ranking system local to each node that determines how quickly new blocks and transactions are distributed to peers, based on how quickly those peers respond to requests and accept data from others. Peers are served in order of their rank, with poorly performing peers being blacklisted from the network entirely. Peers are financially incentivised to stay well positioned in each other's rankings
so that they can spend the largest amount of time efficiently mining.

This strongly encourages nodes in the system to behave in the most friendly manner possible towards other peers, without cost to those who are receiving the data, even those who may be making one-time requests. Even further, it creates a network topology that adapts to the most efficient routes for global distribution, as connections that allow fast transfer of new data around the system are preferred. In practice, the wildfire mechanism builds a network topology that maps the underlying physical connection substrate of the internet, adapting to changes in its architecture over time. Overall, the wildfire system ensures high-speed distribution of new blocks and keeps data available with low latency.

Figure 2: Illustration of the wildfire system. Each node ranks its peers based on how favourably these peers have behaved towards them previously.
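Sections 4.1 and 4.2 describe how a node that holds only the block hash list can still check the recall block forwarded with a new block. The Python sketch below illustrates that check under stated assumptions: SHA-256 as the hash function, the block hash list committed to by hashing its concatenation, and the current block hash read as a big-endian integer before taking it modulo the height. None of these choices come from the lightpaper, and all helper names are hypothetical.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    """Hash function used throughout this sketch (assumed, not taken from the lightpaper)."""
    return hashlib.sha256(data).digest()


def hash_list_digest(block_hashes: list[bytes]) -> bytes:
    """Commit to the whole block hash list with one digest (hypothetical scheme)."""
    return sha256(b"".join(block_hashes))


def recall_index(current_block_hash: bytes, height: int) -> int:
    """Recall block index: the current block hash, read as an integer, modulo the height."""
    return int.from_bytes(current_block_hash, "big") % height


def verify_recall_block(recall_block: bytes, current_block_hash: bytes,
                        block_hashes: list[bytes], committed_digest: bytes) -> bool:
    """Check a recall block forwarded with a new block, without storing the weave.

    block_hashes holds the hashes of every block before the current one, so its
    length equals the current block height (genesis is height 0).
    """
    # The local hash list must match the digest distributed with the new block,
    # which is the lightpaper's stated defence against recall block forging.
    if hash_list_digest(block_hashes) != committed_digest:
        return False
    # The forwarded block must be the one that the current block hash selects.
    idx = recall_index(current_block_hash, len(block_hashes))
    return sha256(recall_block) == block_hashes[idx]
```

Only the hash list and the forwarded recall block are needed here, which is why a node without its own copy of that block can still validate the new one.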
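The wildfire ranking of Section 4.3 can be sketched as a per-node table of peers ordered by how quickly they respond. This is a hypothetical illustration rather than Arweave's implementation: the exponential moving average, its weight `alpha`, and the `blacklist_after` threshold are assumed parameters, chosen only to show peers being served in rank order and poor performers being dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WildfireTable:
    """Peer ranking kept locally by one node (hypothetical sketch)."""
    alpha: float = 0.3            # weight of the newest observation in the average (assumed)
    blacklist_after: float = 5.0  # average response time in seconds that gets a peer dropped (assumed)
    scores: dict[str, float] = field(default_factory=dict)
    blacklisted: set[str] = field(default_factory=set)

    def record(self, peer: str, response_seconds: float) -> None:
        """Update a peer's moving-average response time after an exchange."""
        if peer in self.blacklisted:
            return
        prev = self.scores.get(peer, response_seconds)
        avg = (1 - self.alpha) * prev + self.alpha * response_seconds
        self.scores[peer] = avg
        if avg > self.blacklist_after:
            # Poorly performing peers are removed from the ranking entirely.
            del self.scores[peer]
            self.blacklisted.add(peer)

    def broadcast_order(self) -> list[str]:
        """Peers to send new blocks and transactions to, fastest first."""
        return sorted(self.scores, key=self.scores.__getitem__)
```

A node would call `record` after each exchange and walk `broadcast_order()` when gossiping a new block, so well-behaved peers receive data first and have more time to mine.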

4.4 Blockshadows

In a traditional blockchain system, when a new block is mined, the entire block is distributed to every node in the network, no matter how much of the block data a node already possesses. This is not only an enormous waste of bandwidth, but also significantly slows down the rate at which a network can reach consensus about a block. Arweave therefore introduces a new technology, blockshadows, that not only minimises this waste of data but also enables fast block consensus and massive transaction throughput.

Blockshadowing works by partially decoupling transactions from blocks, and only sending between nodes a minimal block "shadow" that allows peers to reconstruct a full block, instead of transmitting the full block itself. These blockshadows contain a hash of the wallet list and a hash of the block hash list and, in place of the transactions inside a block, only a list of transaction hashes. From this information (likely only a few kilobytes), a node that already holds all of the transactions inside the block and an up-to-date block hash list and wallet list can reconstruct an entire block of almost arbitrary size. To facilitate this, nodes will immediately share transactions with one another, but only attempt to place transactions inside a block once they have a high certainty that other nodes in the network also have the transaction.

The result of this blockshadowing system is a fast and flexible block distribution system that allows transactions to be processed as fast as they can be distributed around the network, and consensus about blocks to be achieved at near network speed. Further, this system ensures transaction fees do not increase dramatically when network usage is high. The theoretical limit on transaction throughput on an optimistic 100 Mbps network is around 5,000 transactions per second.

4.5 Democratic Content Policy

To support the freedom of individual participants in the network to control what content they store, and to allow the network as a whole to democratically reject content that is widely reviled, the Arweave software provides a blacklisting system. Each node maintains an (optional) blacklist containing, for example, the hashes or substrings of data that it never wishes to store, and will never write to disk content that matches it. These blacklists can be built by individuals or collaboratively, or can be imported from other sources.

At a local level, these blacklists allow nodes to control their own content, but the sum of these local rejections also creates network-wide content rejection. Content that is rejected by more than half the network will not only be rejected by each of those individual nodes, but also by the wider network as a whole. This creates a democratic, network-wide content rejection system that can merge blacklists across a variety of cultures and opinions into a tiny, specific blacklist of content that is universally reviled. This near-universal, democratic blacklist shields the network from outside censorship by a small number of actors while still
allowing it the freedom to protect itself in a democratic manner.

4.6 Discussion

4.6.1 Storage Pools

One potential theoretical attack, once the Arweave network has become extraordinarily large, is that miners may work co-operatively to maintain a single copy of the weave, which they all access to retrieve recall blocks. While this kind of behaviour may at first seem problematic, this is not in fact the case. If such 'storage pools' were employed by a large proportion of the miners, the incentive for other miners to store rare blocks would increase. This is because if the centralised stores become unavailable, miners with a copy of the rare blocks will be highly likely to receive the reward when that block becomes the recall block in the future. This self-interested behaviour provides a risk-offsetting function to the network, which scales as the potential for data loss (caused by centralised storage pools) grows.

5 Building Apps

Applications using the weave can be built using a simple REST API. The REST endpoints are HTTP and access the network directly, such that any Arweave wallet is capable of reading and writing data. The client only needs to bring their Arweave wallet to a website through a Chrome extension or a native application with Arweave wallet integration in order to read or write data from/to the network. There are several architectures that can be built on top of the weave.

5.1 Client-Server Architecture

Traditional web or native applications have a client-server architecture. A server running in the cloud will be "Arweave enabled", interacting with one or more Arweave nodes and reading and writing data on behalf of clients. These services can be websites with clients as visitors, or they can be native applications passing client requests to a server operated by the developers. These servers will need to maintain a float of AR tokens in order to ensure that requests for writing data can be processed. Reading data from the weave, however, is still free using this architecture.

The monetization model for this architecture is simple. A developer will need to accrue more value through advertising, monthly subscriptions or direct payments for a wrapper "credit" within their application than the value of the AR tokens they use to power their storage. There are many applications for permanent immutable storage, for example storing quantum-resistant, encrypted legal case files, identity records or medical records. While some legislation needs to accommodate sensitive information storage, geographical boundaries and the right to be forgotten, these concerns can be somewhat mitigated through encryption and key management. Several revenue-generating models can be layered on top of the weave, with the primary value proposition being permanent, immutable on-chain storage.
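Returning to the blockshadows of Section 4.4, the reconstruction step can be sketched as follows. The field layout of `BlockShadow` and `Block`, SHA-256 as the transaction hash, and the in-memory transaction pool are assumptions for illustration. The check of the wallet list and block hash list hashes against the node's own lists is left out to keep the sketch short.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def tx_hash(tx: bytes) -> bytes:
    """Transaction identifier carried in the shadow (SHA-256 is an assumption)."""
    return hashlib.sha256(tx).digest()


@dataclass(frozen=True)
class Block:
    wallet_list_hash: bytes
    hash_list_hash: bytes
    txs: tuple[bytes, ...]


@dataclass(frozen=True)
class BlockShadow:
    """What travels between nodes: the two list hashes plus transaction hashes only."""
    wallet_list_hash: bytes
    hash_list_hash: bytes
    tx_hashes: tuple[bytes, ...]


def make_shadow(block: Block) -> BlockShadow:
    """Sender side: replace every transaction with its 32-byte hash."""
    return BlockShadow(block.wallet_list_hash, block.hash_list_hash,
                       tuple(tx_hash(tx) for tx in block.txs))


def reconstruct(shadow: BlockShadow, known_txs: dict[bytes, bytes]) -> Block | None:
    """Receiver side: rebuild the full block from transactions it already holds.

    Returns None if any transaction is missing; a real node would then request it.
    """
    txs = []
    for h in shadow.tx_hashes:
        tx = known_txs.get(h)
        if tx is None:
            return None
        txs.append(tx)
    return Block(shadow.wallet_list_hash, shadow.hash_list_hash, tuple(txs))
```

The shadow grows with the number of transactions (32 bytes each here) rather than with their payload size, which is why it stays small even for very large blocks.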
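The majority rule of Section 4.5 can likewise be sketched: each node applies its own blacklist before writing data to disk, and an entry becomes network-wide only when strictly more than half of the nodes reject it. The function names, the representation of blacklists as sets of strings, and SHA-256 hex digests as hash entries are hypothetical; the lightpaper does not specify how blacklists are encoded or tallied.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def network_blacklist(node_blacklists: list[set[str]]) -> set[str]:
    """Entries rejected by strictly more than half of the nodes (hypothetical tally)."""
    # Each node's blacklist is a set, so a node counts at most once per entry.
    counts = Counter(entry for bl in node_blacklists for entry in bl)
    n = len(node_blacklists)
    return {entry for entry, c in counts.items() if 2 * c > n}


def would_store(data: bytes, hash_blacklist: set[str],
                substring_blacklist: list[bytes]) -> bool:
    """A single node's local check before writing data to disk."""
    if hashlib.sha256(data).hexdigest() in hash_blacklist:
        return False
    return not any(s in data for s in substring_blacklist)
```

Because only entries shared by a majority survive the tally, idiosyncratic local entries drop out, which matches the lightpaper's description of a tiny network-wide list.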

5.2 Serverless Architecture

Applications can live on the weave itself, accessed by a client through an Arweave-enabled browser. Due to the ubiquity of browsers and the proliferation of web technology, it makes most sense to store these applications as standard frontend web applications using HTML/CSS/JS. However, if the client's native application included an interpreter or parser for other languages, such as LLVM bytecode or a scripting language like Python, such applications could run on the client and perhaps benefit from the same upgradability found in web applications.

Developers will not only be able to deploy serverless applications to Arweave; these applications will also be able to write persistent and provable state to the network. Since Arweave does not impose a particular data structure, developers are free to store their data in whatever format makes the most sense for them. If the application is best served by a highly optimized Merkle structure such as the one found in the Ethereum Virtual Machine (EVM), it can be easily implemented on the weave. If text-blob-style storage is what the developer is looking for, this is trivial as well.

Serverless applications are extremely interesting as they can write their own data. Layering on distributed computation will, for example, allow neural networks under training to store their results and possibly share the resulting models with other networks.

5.3 Event Based

In the early days of Twitter, there was a thriving ecosystem of cottage-industry applications and developers building on top of the "firehose" APIs that streamed tweets to anyone willing to pay for access. This is no longer the case, and in the wake of the Facebook Cambridge Analytica fiasco, many "trusted partners" of these services that provided data analytics to their clients are being arbitrarily shut off.

Arweave is a decentralised network of public data and thus can never censor data access or the data itself, with the exception of democratically rejected content. This means that developers are free to build on top of Arweave and can listen for incoming data using the REST API. As events are triggered, the listeners will fire the appropriate function calls of the clients subscribed to those events. Developers need not fear being throttled or shut down, as the network is incentivised to provide them with reliable access to the data feed.

5.4 Trustless and Provable

Application architectures can easily be designed so that information is stored with a guarantee that it is tamper-proof. Additionally, provably fair runtime code can be stored on the weave and interpreted directly by the client. Using the transaction ID of the content, the client can verify the payload from the weave prior to computation and be guaranteed that the code it is running is both trustless and provably fair, i.e. it is the same code that other clients are running. This opens up interesting possibilities for trustless random number generators and other oracle-based services, perhaps serving other blockchain networks.

6 Use Cases

Permanent storage has several use cases, for example meeting regulations that require documents to be archived for a certain number of years. Provable media reporting, academic research and immutable records are becoming increasingly important in our modern world of echo chambers and the proliferation of fake news.

6.1 Authenticity

Too often the legal system is tied up with litigation over the authenticity of documents. Arweave solves this problem by providing an indefinite and verifiable store of any digital content from an author. In 2017, the state of Delaware ruled blockchain evidence admissible in court proceedings. Such records could dramatically speed up disputes over artistic attribution and intellectual property. The effects are twofold for the creative economy, allowing artists to license their work to others instantly and to avoid frivolous litigation.

7 Conclusion

We have presented a new blockchain network powering low-cost immutable data storage and a high-throughput cryptocurrency. The Arweave protocol is made possible through the use of a new blockchain-like data structure called the blockweave; flexible-size transaction block distribution via blockshadowing; a new consensus mechanism, called proof of access, that reduces dependency on proof of work; and a self-optimising network topology called wildfire. Much like the Bitcoin network, our technical advancements are not terribly complex in isolation; however, when combined to form the whole of the network, the emergent behavior is extremely powerful. We have seen from our testnet results that secure, reliable and immutable data storage is possible on a public, permissionless and decentralised network protocol. In addition to data storage, arbitrary-size blocks make a secure high-throughput cryptocurrency possible without having to resort to complicated consensus mechanisms such as dBFT or dPoS.

Arweave is tightly woven into the fabric of the internet through its REST API, and several revenue-generating businesses are being built using the Arweave mainnet. Bridges between Arweave and other popular cryptocurrencies, secure computation, and smart contract protocols will enable a low-cost and permanent data store to be easily integrated into the technology stack of decentralised applications. A fully globalized world of information and financial exchange requires permanent records. Through a combination of cryptography and distributed systems, we have provided the basis for those permanent records. We hope Arweave will become an essential companion to existing internet protocols such as the World Wide Web, working with
others to build a more open and transparent future.

References

[1] The National Archives: Investigation into forged documents discovered amongst authentic public records. /details/r/C16525.
[2] North's ex-secretary tells of altering memos. us/north-s-ex-secretary-tells-of-altering-memos.html.
[3] The patent fire of 1836. of-1836/patent-act-of-1836-patent-fire-of-1836.
[4] Mustafa Akgul and Melih Kirlidog. Internet censorship in Turkey. Internet Policy Review, 4(2):1–22, 2015.
[5] Fernando Baez. A universal history of the destruction of books: From ancient Sumer to modern Iraq. Atlas Books, 2008.
[6] Anton-Hermann Chroust. Socrates–a source problem. The New Scholasticism, 19(1):48–72, 1945.
[7] Anne Frank and Storm Jameson. Anne Frank's diary. Vallentine, Mitchell, 1971.
[8] Brewster Kahle. Fire update: Lost many cameras, 20 boxes. No one hurt., 2013. nning-center-fire-please-help-rebuild/.
[9] Birmingham Public Libraries. Notes on the history of the Birmingham Public Libraries, 1861-1961. Birmingham Public Libraries, Birmingham, 1962.
[10] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
[11] Jonathan Rose. The holocaust and the book: destruction and preservation. Univ. of Massachusetts Press, 2008.
[12] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper, 151, 2014.
[13] Xueyang Xu, Z. Morley Mao, and J. Alex Halderman. Internet Censorship in China: Where Does the Filtering Occur?, pages 133–142. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.