Elastic Whitepaper

Tuesday, June 19, 2018
Download document
Save for later
Add to list

Whitepaper Revision 0001 c xel.org 2018 May 24, 2018 Embarrassingly Parallel Grid-computing on the Blockchain A High-level Overview of the Machinations of XEL www.xel.org ABSTRACT In this overview, we discuss a Blockchain-based, decentralized grid-computing platform which allows tasks with immense computational require- ments, usually set in the scientific domain, to utilize a virtually infinite number of computational nodes over large physical distances. In contrast to traditional volunteer-computing where computer owners altruistically donate their spare computational power to one or more research projects, these projects can now create an additional incentive encouraging users to participate by getting rewarded with automated crypto-currency pay- ments in exchange for conducted work. This overcomes the problem that most participants in traditional volunteer-computing network are both typically found in scientific communities and as such very picky about the projects they volunteer to; while an altruistic project like [email protected] may excite and attract plenty of volunteers, that does not necessarily apply to a “boring” corporate calculation with little to no scientific impact. By introducing crypto-currency payments, our system creates an additional incentive, eliminates a necessary scientific value for attracting volunteers, and contributes to the creation of a system that is open and equally suitable for both scientific and non-scientific use-cases. Key words. Multiple Data Stream Architectures; Blockchain; Distributed Ledger Technology; Performance 1. Introduction entific advances but which are still in need of immense computa- tional resources. In this paper, we address this issue and propose A vast majority of the world’s computational power is not cen- a distributed computing system that builds around Blockchain tered around supercomputer centers or laboratories but instead technology and rewards users with crypto-currency payments distributed among millions of personal computers in people’s for sharing their computational resources with scientific projects. homes all around the globe. Volunteer computing[6] is a con- The proposed approach creates a non-altruistic incentive for the cept which was first introduced in 1996 by the Great Mersenne broader general public to contribute computational resources and Prime Search project [3] and comprises a paradigm which uses attracts users outside the boundaries of scientific communities. these resources to conduct previously infeasible scientific com- This allows for the system to be accessible not only for scientif- putations. The potential of distributed computation was quickly ically meaningful computations but also for, e.g., computation- adopted by other projects such as [email protected] [2], which ally intense computations in the corporate world. In the remain- was launched in 1999 and is dedicated to searching for extra- der of this document, we will describe the end-to-end workflow terrestrial life by analyzing radio signals from different parts in detail from the user’s point of view. of the sky. In 2004, BOINC[1] emerged from the [email protected] project, making this technology accessible to the general public allowing virtually any research project to be aided by distributed 2. Paradigm computation. Typical for these approaches is that they take "em- barrassingly parallel" [5] problems, i.e. problems which can be Our goal was to create an open-source Blockchain-driven easily broken up into multiple smaller, independent subtasks. middle-ware system for the distributed computation of embar- These subtasks can then be assigned to a magnitude of differ- rassingly parallel computational problems. A great focus has ent computers which perform the computations all in parallel. In been put on making the platform sustainable, i.e. building a plat- the case of [email protected], e.g. every computational entity analy- form that does not depend on any person, central infrastructure ses just a small portion of the sky which can be done indepen- which is required to be maintained, or any continuous funding. dently from the analysis of other portions. That being said, virtu- We believe that Blockchain technology holds great potential to ally any embarrassingly parallel problem can be accelerated this achieve this goal; it offers a solid layer of robustness as it is a way. This includes, but is not limited to, evolutionary compu- highly redundant decentralized system and allows us to rethink tation, metaheuristics, simulation of particle physics, numerical the way we design systems from the ground up. In particular, weather prediction, CNF solving and crypto-currency mining. we leverage this new technology to build a “better” version of However, volunteer computing relies on altruistic partici- BOINC on-top of a decentralized digital ledger across an arbi- pants who donate their computational power to one or more trarily scalable and fault resistant network of computers with- research projects and who are most likely found in the scien- out the need for a central authority ensuring fair play with all tific community only. Furthermore, these volunteers typically are participating entities acting as independent equal agents in an very picky about which project they contribute their resources open, free and fair environment. While censorship, a monopoly to; it is very common that a project must provide some scien- on the fair market value of a certain amount of computational tific impact or otherwise gain the interest of volunteers in order power, and favoring certain (more pleasant) projects is theoret- to attract any meaningful amount of computational resources. ically possible in centralized systems, this no longer is the case However, there are many use-cases that do not provide any sci- in such decentralized environment. page 1 of 3

3. Terminology 4.2. Working for Bounties In this paper, the term "scientist" is used in a broader sense and Jobs can be generally understood as programs which take a refers to any entity that seeks to use the computational resources pseudo-random input, run a logic coded by the scientist and out- available in the network in order to solve some complex compu- put whether this particular input constitutes a PoW, a bounty or tational task which is referred to as a "job". Entities who con- nothing. To give a better understanding, such logic could be for tribute their computational resources in exchange for Crypto- example a variant of the Travelling-Salesman Problem[4] with currency payments are called "workers". the random input translating to one particular solution candidate. Furthermore, the bounty criterion could be a check whether that There are two ways workers can earn units of the underly- particular solution candidate – in this example, a Hamiltonian ing Crypto-currency: they can be either rewarded for their on- path from a start to a destination point – has a total cost which is going work on a problem or for finding an actual solution to below a certain threshold. the problem. The on-going commitment is rewarded by allowing Workers are searching for bounty solutions by continuously workers to frequently submit partial (yet incorrect) solutions that generating new pseudo-random inputs to the function until they meet certain criteria. These solutions are called "PoW submis- find an input that yields in a bounty solution. These random in- sions" and the underlying challenge is called "PoW". The term puts used by the logic will need to be calculated by the com- "bounty" in this study will refer to the latter of the two pathways putation node using a methodology we called personalizedInts. and described the actual solutions to the problems. Respectively, Rather than simply using a random number generator to cre- the amounts workers are paid for each of these submissions are ate random inputs for each computation node, personalizedInts termed "PoW reward" and "bounty reward". uses publicly known node and job-specific values such as the worker’s public key, the job’s ID, the block ID of the block the job was announced in and some random noise that the worker can randomly pick. This is necessary to ensure each solution can 4. End-to-end Work Flow be directly tied back to the node that found the bounty solution in order to prevent certain types of solution stealing attacks. If a 4.1. Work Creation node were to intercept a solution and submit it as their own, their public key would be different resulting in a different pseudo- The entire workflow typically starts with scientists composing random input, which would likely result in a solution that does jobs which contain all the logic to solve for the actual problem not meet the threshold required by the job author. as well as information regarding the rewards being offered to When a worker finds a bounty solution, they will submit the incentivize participation in solving the problem. To be more pre- values used to generate the random inputs to the calculation as cise, scientists express their algorithmic problem using ePL, a well as the output of the calculation as defined by the job author. DSL (domain specific language) particularly designed for this If the bounty solution can be verified by the rest of the network, use-case, which is highly parallelizable and allows to define spe- the worker is paid the bounty reward by the scientist. cific criteria that determine when a solution to their problem has been found. While some may argue that introducing a new pro- gramming language may limit the adoption of this decentralized 4.3. Working for PoW Solutions platform, we’d like to point out that it is relatively straightfor- PoW Solutions can be found at almost no additional cost after ward to overlay an IDE (integrated development environment) performing a full evaluation of the scientist’s code while check- which allows scientists to code in familiar languages such as ing if a certain pseudo-random input constituted a valid bounty Python, C, and Java while the IDE converts their code to con- solution. The POW task is simply to perform a MD5 hash of form to the requirements of ePL. the results of each run of the job’s code and compare that to a That being said, workers that find actual solutions – or boun- PoW target hash set by the network. If the computational node’s ties – are rewarded with a "bounty reward" by the scientist paid run produces a hash less than the target, they are eligible for a in units of the underlying Crypto-currency. These bounty re- POW reward. When a PoW solution is found, the inputs and cal- wards are set by the scientist when the job is created and should culation outputs are submitted, similar to submitting a bounty be ideally set to an amount that attracts participation in running solution. The network recalculates the PoW target hash once per their job. As the network grows, competition among scientists to block with a goal of ten PoW rewards per block (with a maxi- attract participation in running their job increases; therefore, sci- mum of 25 per block) averaged over all currently running jobs. entists will want to calibrate their bounty rewards to a fair market That is, given a block-time of 60 seconds, the network tries to value to ensure interest in their job. converge towards a state where all jobs, in total, payout ten PoW While bounties are the primary incentive to attract compu- rewards per minute. Similar to the case outlined above, workers tation nodes to work on a job, there is a risk that nodes could get rewarded the PoW rewards for every PoW submission that be working on jobs where no bounty solution even exists. This the network can positively validate. could be due to an intentional malicious act by the scientist, or simply a bug in the job’s code. To mitigate this risk, scientists 4.4. Validation of Results will be required to additionally provide PoW rewards. PoW is defined as moderately hard tasks which are easily verifiable and The validation is the most important part of the system. How- which are randomly found at a certain rate, regardless of how ever, while its absolutely critical to perform this validation step, hard or easy the underlying task is, in order ensure workers stay many calculations are far too complex and time-consuming to motivated. PoW rewards should be calibrated to roughly match perform in a timely manner across the entire network. While cal- the average electricity cost of running a computational node in culations, which are carried out on the worker’s hardware, can order to alleviate any concerns of participants that they could run for a while without bothering anyone, that cannot be said potentially lose money by running the job. for the verification part which every single node in the network page 2 of 3

www.xel.org et al.: Embarrassingly Parallel Grid-computing on the Blockchain must be able to execute in a reasonable amount of time. Hence, statement that requires scientist to provide an upper limit for there is a very strict requirement on how complex a verification the number of iterations which allows the pre-processor to es- can be. If a scientist happens to have a too complex algorithmic timate the maximum computational effort it will take to execute logic, he has the option to provide an alternative simplified ver- the loop. That allows for a precise WCET (worst-case execu- ification logic that checks some key variables from the solution tion time) estimation of the entire program logic which would to ensure the results were valid and at the same time stays within be not the case for deeply nested, highly-pipelined traditional the permitted boundaries. loops. Now, to ensure the logic runs in a timely manner, the During the verification process – for both bounty and PoW pre-processor estimates the amount of computation effort using submissions – it has to be ensured that the solution was actually WCET analysis; only programs with a verification (and execu- found by the node submitting it. This part is simple: the pseudo- tion) WCET below a certain threshold are allowed to be sub- random input for the verification is derived from the values the mitted to the Blockchain in order to prevent clogging down the worker has submitted. Since these values contain the workers re- entire network. ported public-key, it can be easily verified that the public-key in Additionally, the DSL does only allow to use an isolated, re- the values actually matches the worker’s public key that tried to stricted and highly limited memory space which prevents mali- claim credit for that solution. In the case of a PoW solution, we cious code to both eavesdrop on any other data and attack the additionally need to ensure that the computation node is actually worker’s platform by allocating more resources on the target working on the calculation coded in the job and not just search- platform than are available. ing for MD5 hashes below the target. This is accomplished by requiring PoW solutions to include the same data required for a bounty solution, plus the MD5 hash that the computation node 6. Summary found while running the job. Because the verifying nodes now have all the data required to verify a bounty along with the PoW In this paper, we have described the general principles of public- hash, they can simply run the bounty verification logic described resource volunteer-computing paradigms and identified a few as- above and then apply an MD5 hash to the outputs. This will al- pects that could be improved upon. Hence, we have proposed a low the node to quickly verify that the submitting node is valid slightly adapted version of these traditional approaches, XEL, as well as confirming that the submitted POW hash does, in fact, that leverages Blockchain technology in order to allow scien- correspond to the output from running the actual job. tists to create an additional incentive for workers to share their It is felt that the proposed verification steps offer a robust computational resources. These adaptations attract a number of solution that can deter the most common types of attacks that people to participate even if they have no further interest in the would be expected to be encountered in this type of setting. scientific projects they are working on – a preliminary assump- tion in traditional volunteer-computing. We have given a brief overview of all important aspects of the end-to-end workflow, 5. Benefits of an intermediary DSL beginning with the creation of new jobs all the way to the verifi- cation of solutions that workers find. While very similar to tradi- 5.1. Handling large numbers of platforms tional volunteer-computing, a number of crucial changes to the While it is expected that the bulk of platforms contributing com- way jobs and solutions are processed had to be made since the putational resources will comprise traditional x86 and x86-64 monetary incentive is likely to attract all sorts of different attacks architecture CPUs, we focused on designing the system in a on the system. The exact details of this approach are beyond the way that virtually any platform is supported, e.g. GPUs running scope of this high-level paper but will be covered in a follow-up OpenCL or Cuda and FPGAs which operate using logic gates in- full paper version. stead of a sequential stream of byte-code instructions. Our DSL provides top-down high-level capabilities to describe algorithms in a platform-independent manner using a very generic set of References operators. This DSL then allows workers to locally translate the logic into platform-specific code and locally compile it for the [1] David P Anderson. “Boinc: A system for public-resource desired target architecture. computing and storage”. In: proceedings of the 5th IEEE/ACM International Workshop on Grid Computing. IEEE Computer Society. 2004, pp. 4–10. 5.2. Preventing Malleability [2] David P Anderson et al. “[email protected] home: an experiment A primary focus of the DSL is to ensure that all jobs terminate in public-resource computing”. In: Communications of the gracefully in a timely manner and post no thread to the worker’s ACM 45.11 (2002), pp. 56–61. platform whatsoever. [3] Matt Cutts. “An introduction to the gimp”. In: Crossroads The DSL consists of a pre-processor that checks that any sub- 3.4 (1997), pp. 28–30. mitted job conforms to the standards of the proposed language, [4] Robert L Karg and Gerald L Thompson. “A heuristic ap- as well as a runtime component to prevent crashes of the pro- proach to solving travelling salesman problems”. In: Man- gram. Overflows, division by zero and other illegal statements agement science 10.2 (1964), pp. 225–248. identified by the pre-processor will cause a job to be rejected [5] Wikipedia. Embarrassingly parallel — Wikipedia, The by the network. Also, because many of these illegal statements Free Encyclopedia. http : / / en . wikipedia . org / w / won’t manifest themselves until the code is run, the DSL con- index.php?title=Embarrassingly\%20parallel& tains a runtime component that continuous checks for illegal oldid = 830119483. [Online; accessed 17-May-2018]. conditions. If such a condition is encountered, the offending step 2018. in the logic is bypassed rather than causing the job to crash. [6] Wikipedia. Volunteer computing — Wikipedia, The Free To prevent jobs from running endlessly, traditional FOR, Encyclopedia. http://en.wikipedia.org/w/index. WHILE, and DO loops as well as GOTO statements have been re- php ? title = Volunteer \ %20computing & oldid = moved from the language. Instead, we have provided a REPEAT 840807707. [Online; accessed 17-May-2018]. 2018. page 3 of 3