by Christopher Tozzi, Jun 29, 2017

Byzantine Fault Tolerance: The Key for Blockchains


Blockchain companies have attracted billions of dollars in investment, and many Fortune 500 companies are now exploring applications for distributed ledgers. By now, blockchain technology is clearly on the road to mainstream adoption. But to make the most of it, it helps to understand the concepts that make it so powerful.

The characteristic known as “Byzantine fault tolerance” (BFT) is one of those concepts worth understanding. The ability to tolerate what computer scientists call “Byzantine failures” is a crucial part of a blockchain’s ability to maintain reliable records of transactions in a transparent, tamper-resistant way.

The Byzantine Generals’ Problem

BFT is so named because it represents a solution to the “Byzantine generals’ problem,” a logical dilemma that researchers Leslie Lamport, Robert Shostak and Marshall Pease described in an academic paper published in 1982. Essentially, it imagines a group of Byzantine generals and their armies surrounding a castle and preparing to attack. To succeed, the armies must all attack at the same time. But they know that there may be traitors in their midst. The problem they face is how to agree on a common plan and launch a successful attack when one or more of them, whose identities are unknown, may be working against the rest.

The metaphor describes a problem that plagues many computer networks. When a group is trying to make a collective decision about how it will act, there is a risk that traitors within the group may send mixed messages about their preferences. The traitors may tell some members of the group that they wish to do one thing, and tell other members of the group the opposite. This can cause problems for the group’s ability to coordinate its actions effectively. If some members of the group are led to believe one thing and others believe something different, group members will fail to act in unison. The group’s cohesiveness and effectiveness will then break down, exactly as the traitors would desire.
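The mixed-messages scenario, and how a group can still reach agreement despite it, can be illustrated with the simplest case of the “oral messages” algorithm from the 1982 paper, OM(1): one commander, three lieutenants and at most one traitor. What follows is a simplified sketch under stated assumptions, not a full implementation: orders are just “attack” or “retreat,” each lieutenant relays the order it received to the others, and each decides by majority vote, falling back to “retreat” on a tie. The helper names (`om1`, `majority`, `honest`, `traitor_2`) are hypothetical. The paper also proves that with such unsigned messages, m traitors can be tolerated only if there are at least 3m + 1 generals, which is why four participants is the minimum here.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def majority(values, default=RETREAT):
    """Return the most common value, falling back to `default` on a tie."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def om1(commander_orders, relay):
    """Simulate OM(1): a commander plus len(commander_orders) lieutenants.

    commander_orders[i] is the order the commander sent to lieutenant i; a
    traitorous commander may send different orders to different lieutenants.
    relay(j, i, value) is what lieutenant j tells lieutenant i that it received.
    Returns the order each lieutenant decides to follow.
    """
    n = len(commander_orders)
    decisions = []
    for i in range(n):
        # Lieutenant i weighs its own order against every other lieutenant's report.
        reports = [relay(j, i, commander_orders[j]) for j in range(n) if j != i]
        decisions.append(majority([commander_orders[i]] + reports))
    return decisions


def honest(j, i, value):
    """A loyal lieutenant passes on exactly what it received."""
    return value


def traitor_2(j, i, value):
    """Hypothetical: lieutenant 2 is a traitor who always reports "retreat"."""
    return RETREAT if j == 2 else value


# Case 1: the commander is the traitor and sends mixed orders.
print(om1([ATTACK, ATTACK, RETREAT], honest))  # ['attack', 'attack', 'attack']
# Case 2: the commander is loyal; lieutenant 2 lies when relaying.
print(om1([ATTACK, ATTACK, ATTACK], traitor_2))  # ['attack', 'attack', 'attack']
```

In both cases the loyal lieutenants end up acting in unison. Only the loyal lieutenants’ decisions matter; a traitor is free to ignore its own computed decision.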

Digital Metaphor

You may be wondering what any of this has to do with computers.

In any distributed computing environment, meaning one in which multiple users, applications, servers or other types of nodes work together as a single system (a blockchain is one example), there is a risk that rogue or unreliable participants could cause the system to break apart. A server cluster won’t work well if some servers within it fail to pass data consistently to other servers. A computer network will fail if the devices on it do not agree on a common networking protocol to use when exchanging information.

To remain reliable even when some of its participants behave arbitrarily or maliciously, a distributed computing environment has to be designed in a way that solves the Byzantine generals’ problem; in other words, it must provide BFT.

Perhaps nowhere is BFT more essential than on a blockchain. Most traditional distributed computing environments have central configuration databases or authorities that can help correct problems when Byzantine failures occur. But on a public blockchain there is, by definition, no central authority. Blockchains’ ability to validate transactions based on community consensus alone is what makes them so powerful.

This heavy reliance on community consensus also makes Byzantine faults a particularly important challenge for blockchain. If some members of the community send inconsistent information to others about transactions, the reliability of the blockchain breaks down, and there is no authority that can step in to correct it. So, unless you can place absolute trust in everyone who participates in your blockchain (which you can’t in most situations), you need a way to protect against the Byzantine faults that could occur in the event that some members distribute inaccurate, misleading or malicious transaction information.

Potential Solutions

There is no single, official way to achieve Byzantine fault tolerance in blockchain systems.

Many of the most influential blockchain systems to emerge so far, including Bitcoin, have relied on a concept called proof of work (PoW). Under this model, anyone who wants to add to the blockchain must first perform a computationally expensive task using information from the existing blockchain. In Bitcoin’s case, miners must find a block whose hash (SHA-256 applied twice) falls below a target value set by the network. Computing any single hash is fast, but because hash outputs are unpredictable, finding one that meets the target takes an enormous number of attempts on average, while checking a claimed solution takes just one hash.
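The asymmetry between finding and checking a proof of work can be sketched in a few lines. This is a simplified, hashcash-style illustration, not Bitcoin’s actual format: it assumes a hypothetical “block” made of the previous hash, a payload string and a nonce joined with `|`, hashes it once with SHA-256, and expresses difficulty as a number of leading zero bits. Bitcoin instead double-hashes an 80-byte binary header and compares it against a compactly encoded target. The function names are illustrative only.

```python
import hashlib
from itertools import count


def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    """Hash a simplified 'block': previous hash, payload and nonce joined by '|'."""
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode()).hexdigest()


def meets_target(digest: str, difficulty_bits: int) -> bool:
    """True if the 256-bit digest starts with at least `difficulty_bits` zero bits."""
    return int(digest, 16) < (1 << (256 - difficulty_bits))


def mine(prev_hash: str, data: str, difficulty_bits: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash meets the target.

    Each extra difficulty bit doubles the expected number of attempts
    (about 2**difficulty_bits hashes on average).
    """
    for nonce in count():
        digest = block_hash(prev_hash, data, nonce)
        if meets_target(digest, difficulty_bits):
            return nonce, digest


def verify(prev_hash: str, data: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking someone else's work costs a single hash."""
    return meets_target(block_hash(prev_hash, data, nonce), difficulty_bits)


genesis = "00" * 32
nonce, digest = mine(genesis, "alice pays bob 5", difficulty_bits=16)
print(nonce, digest)
print(verify(genesis, "alice pays bob 5", nonce, 16))  # True
```

Because each block’s hash covers the previous block’s hash, changing an old block invalidates its proof of work and that of every block built on top of it, all of which would have to be redone.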

In a PoW system, data can’t be added to the blockchain without a significant investment of computing time by the party adding it. This provides practical protection against manipulation of the blockchain because, to undermine group consensus, a malicious party would need to produce more PoW than the honest participants, which in practice means controlling a large share of the network’s total computing power.
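The Bitcoin whitepaper quantifies this protection. It models the race between honest miners and an attacker who controls a fraction q of the total hash power, and computes the probability that the attacker ever catches up from z blocks behind. The sketch below is a Python port of the calculation the whitepaper gives in C (its section 11); the function name is a Python-style rendering, and the result holds only under the paper’s simplified model of block discovery.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with fraction q of the hash power
    eventually overtakes an honest chain that is z blocks ahead
    (Python port of the calculation in section 11 of the Bitcoin whitepaper)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # with at least half the hash power, the attacker catches up eventually
    lam = z * (q / p)  # expected attacker progress while honest miners find z blocks
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has found exactly k blocks so far.
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        # Subtract the cases in which the attacker, k blocks in, never closes the gap.
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


for q in (0.1, 0.3):
    for z in (0, 2, 5, 10):
        print(f"q={q} z={z} P={attacker_success_probability(q, z):.7f}")
```

For q = 0.1 and z = 1, this returns about 0.2045873, matching the corresponding entry in the whitepaper’s table. The probability drops rapidly as z grows, so an attacker with a minority of the hash power is very unlikely to rewrite blocks that are buried under several others.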

On a blockchain that is sufficiently large, the PoW requirement effectively provides BFT, as long as honest participants control most of the network’s computing power. This approach has a limitation, however: it requires the expenditure of a large amount of computational effort for no purpose other than fault tolerance.

An alternative approach, and one that does not require compute-intensive operations, relies on node votes and majority consensus to root out faults. The downside of this strategy is that it protects against Byzantine faults only so long as a relatively large majority of nodes on the blockchain continue to act legitimately. Classic voting-based protocols, such as Practical Byzantine Fault Tolerance (PBFT), guarantee agreement only while fewer than one-third of the nodes are faulty, so consensus on legitimate transactions can break down well before the number of rogue nodes approaches fifty percent.
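The one-third threshold comes from simple quorum arithmetic. With n voting nodes, at most f of which are Byzantine, a common choice is to commit a decision only once n − f nodes agree (2f + 1 when n = 3f + 1). Any two such quorums overlap in at least n − 2f nodes, and that overlap is guaranteed to include an honest node, which will not vote for two conflicting transactions, only when n ≥ 3f + 1. The sketch below is a hypothetical illustration of that arithmetic plus a naive vote tally; it is not a complete protocol such as PBFT, which also needs signed messages, multiple voting phases and a way to replace a faulty leader.

```python
from collections import Counter
from typing import Optional


def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1: the most Byzantine nodes n voters can tolerate."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Votes needed to commit: n - f, which equals 2f + 1 when n = 3f + 1."""
    return n - max_faulty(n)


def min_honest_overlap(n: int) -> int:
    """Two quorums share at least 2q - n nodes; removing the f possible
    traitors leaves the number of honest nodes guaranteed to be in both."""
    q, f = quorum_size(n), max_faulty(n)
    return 2 * q - n - f


def decide(votes: dict[str, str], n: int) -> Optional[str]:
    """Return the value backed by a quorum of distinct nodes, or None if no value has one."""
    if not votes:
        return None
    value, tally = Counter(votes.values()).most_common(1)[0]
    return value if tally >= quorum_size(n) else None


for n in (4, 7, 10):
    print(n, max_faulty(n), quorum_size(n), min_honest_overlap(n))

votes = {"node1": "tx-A", "node2": "tx-A", "node3": "tx-A", "node4": "tx-B"}
print(decide(votes, n=4))  # tx-A
```

Because each node casts at most one vote in the dictionary, a single traitor among four nodes cannot push a conflicting transaction to the three-vote quorum on its own.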

Why It Matters

The intricacies of BFT may sound like something that only computer scientists or digital currency designers should worry about. To a certain extent, that is true. Once a well-designed blockchain is implemented, end users should not have to think about Byzantine faults.

Yet because many blockchain projects are still in the design and planning phases, understanding BFT matters for anyone who wants to apply blockchains to problems beyond digital currency. Bitcoin’s approach to Byzantine faults may not be practical for other types of blockchain applications.

For example, requiring healthcare providers to expend large amounts of computing resources hashing data to produce PoW would be very inefficient. In industries like healthcare, it may make more sense to rely on node votes. Because participants in a healthcare blockchain are more likely to act in good faith and to operate under real identities than users of a pseudonymous, open system like Bitcoin, the benefits of avoiding PoW may outweigh the risks of relying on node voting to address Byzantine faults.

BFT is a crucial part of an effective blockchain, and there are multiple ways to implement it. Deciding which approach to take requires weighing the nature and priorities of the community associated with the blockchain an organization wants to build. The solutions to BFT that have made systems like Bitcoin possible may not work well in the blockchain applications of the future.

Christopher Tozzi

Contributing Writer