In computer science, a Byzantine fault occurs when a component of a network fails in an arbitrary way, for example by generating information that is inconsistent with the data generated by other sources, or by reporting different information to different parts of the network. Unless the network is able to handle the fault effectively, it will cease to function properly.
The term comes from the “Byzantine generals’ problem,” a thought experiment introduced by Leslie Lamport, Robert Shostak and Marshall Pease in a 1982 paper, in which multiple generals of the Byzantine army try to coordinate an attack without being able to trust one another.
As an example of a Byzantine fault, consider a data center composed of multiple servers. Some of the servers host data, while others host websites. The servers hosting websites need to connect to the servers hosting data in order to retrieve and record information from the websites.
A Byzantine fault could occur if one of the data servers is compromised by attackers and, as a result, begins sending inaccurate data to the web servers. If other data servers remain uncompromised and continue sending different versions of the data to the web servers, the web servers would not know which version of the data was the right one. Unless the inconsistency is resolved, the cohesion of the network will break down.
If a network is able to deal with a Byzantine fault in such a way that the network continues operating after the fault occurs, the network is said to have Byzantine fault tolerance.
Byzantine fault tolerance can be implemented in different ways depending on the size of the network involved, the type of information exchanged between network nodes and the type of systems that belong to the network.
In the data center example above, a simple way of building fault tolerance into the system would be to set up a master server that decides which data servers the web servers can interact with. If the master server detects unusual activity on one of the data servers, it could tell the rest of the network to ignore that server. The network would thus remain intact and the fault would be successfully handled. This approach works only as long as the master server itself can be trusted.
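As a rough illustration of the master server's check, the sketch below treats "unusual activity" as disagreeing with the answer that most data servers give for the same query. That definition, the function name `find_suspect_servers` and the server names are assumptions made for this example, not part of any particular product.

```python
from collections import Counter


def find_suspect_servers(responses):
    """Return (majority_value, suspects) for one query.

    `responses` maps a hypothetical data-server name to the value it
    returned. If no value is reported by a strict majority, nothing is
    trusted and nobody is flagged.
    """
    if not responses:
        return None, set()
    value, votes = Counter(responses.values()).most_common(1)[0]
    # Require a strict majority before trusting the most common answer.
    if votes * 2 <= len(responses):
        return None, set()
    suspects = {name for name, v in responses.items() if v != value}
    return value, suspects


responses = {
    "data-1": "balance=100",
    "data-2": "balance=100",
    "data-3": "balance=999",  # the compromised server in this example
    "data-4": "balance=100",
}
trusted_value, ignored = find_suspect_servers(responses)
print(trusted_value, sorted(ignored))  # balance=100 ['data-3']
```

The master server would then tell the web servers to stop querying anything in `ignored`. When the servers split evenly, the sketch declines to pick a winner, which is the safer choice for a system that cannot tell which side is lying.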
Although Byzantine faults can occur on any type of computer network, the problem poses particular difficulties on a decentralized network of the kind used to create a blockchain.
That is because, on a blockchain, there is no master node or central authority that can make decisions for the rest of the network. Decisions are instead made via consensus. As long as a majority of the network agrees about which data is legitimate, that is the data that gets recorded. How a majority is measured depends on the consensus mechanism, and it is not necessarily a simple head count of nodes. If one node tries to cheat by sending false data, the other nodes override the cheater and the data remains accurate.
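The majority rule can be sketched without any coordinator: every node applies the same rule to the same set of proposals and reaches the same result on its own. The example below is a simplification that assumes each node sees every proposal exactly as it was sent and counts one vote per node. The `Node` class and the record strings are hypothetical.

```python
from collections import Counter


def majority_record(proposals):
    """Return the record proposed by a strict majority of nodes, else None."""
    if not proposals:
        return None
    record, votes = Counter(proposals.values()).most_common(1)[0]
    return record if votes > len(proposals) / 2 else None


class Node:
    """A hypothetical node that keeps its own copy of the ledger."""

    def __init__(self, name):
        self.name = name
        self.ledger = []

    def process_round(self, proposals):
        # Each node decides independently; no master node is consulted.
        record = majority_record(proposals)
        if record is not None:
            self.ledger.append(record)


proposals = {
    "node-a": "alice pays bob 5",
    "node-b": "alice pays bob 5",
    "node-c": "alice pays mallory 500",  # the cheating node
    "node-d": "alice pays bob 5",
    "node-e": "alice pays bob 5",
}
nodes = [Node(name) for name in proposals]
for node in nodes:
    node.process_round(proposals)

print(nodes[0].ledger)  # ['alice pays bob 5']
```

Because every node ends the round with the same ledger entry, the cheater's proposal is simply outvoted. Real networks cannot assume that all nodes see identical messages, which is part of what makes the problem hard.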
It is possible to achieve Byzantine fault tolerance on a blockchain. In fact, a practical solution to the Byzantine fault conundrum was one of the key innovations that made Bitcoin, the first blockchain-based cryptocurrency, possible. The idea of chaining records together cryptographically existed before Bitcoin, but Bitcoin’s big step forward was to create a way to handle Byzantine faults on an open network with no central authority.
The Bitcoin blockchain does this via a consensus mechanism called proof of work. In order to add data to the Bitcoin blockchain, nodes have to complete hashing operations that require a significant amount of computing power and time. These requirements deter malicious actors because sending inaccurate data to the Bitcoin blockchain (or to any blockchain that uses proof of work) in large volumes requires enormous amounts of energy. Honest nodes reject data that breaks the network’s rules, so bad actors who try to play by different rules simply waste the energy they spent on mining.
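The cost asymmetry behind proof of work can be seen in a toy example: finding a valid nonce takes many hash attempts, while checking one takes a single hash. The sketch below is a simplification and not Bitcoin's actual block format. Bitcoin applies SHA-256 twice to a binary block header and compares the result against a numeric target, whereas this example hashes a string once and counts leading zero hex digits. The function names `mine` and `verify` are illustrative.

```python
import hashlib


def _digest(block_data, nonce):
    """SHA-256 of the block data with the nonce appended, as a hex string."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()


def mine(block_data, difficulty):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty):
    """Checking a claimed solution costs one hash, however long mining took."""
    return _digest(block_data, nonce).startswith("0" * difficulty)


# Difficulty 4 needs about 16**4 = 65,536 attempts on average.
nonce, digest = mine("alice pays bob 5", difficulty=4)
print(nonce, digest)
print(verify("alice pays bob 5", nonce, 4))  # True
```

Each extra zero digit multiplies the expected work by 16, which is how a network of this kind could tune the cost of adding data. Anyone who alters the block data has to redo that search from scratch.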
The downside of proof of work is that it requires legitimate members of the network to expend time and resources to record legitimate data on the network. In other words, lots of resources are spent on operations whose sole purpose is to achieve Byzantine fault tolerance. For that reason, other frameworks have been developed to address Byzantine faults.
The leading alternative is proof of stake. Under this model, nodes must place cryptocurrency in reserve in order to prove that they have a “stake” in the network, and their ability to confirm transactions is commensurate with the amount of cryptocurrency that they stake. Therefore, in order to override the rest of the network, a malicious node would have to acquire a large amount of cryptocurrency and place it in reserve: enough to control more than 50 percent of all the cryptocurrency staked on the network. The exact threshold varies by protocol, but in every case the requirement makes malicious activity prohibitively expensive in blockchains that rely on proof of stake.
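A minimal sketch of stake-weighted influence, assuming that the chance of being picked to confirm a transaction is proportional to stake and that an attacker needs more than half of the total stake to override the network. The validator names, the stake amounts and the helper functions are hypothetical, and real protocols add details such as lock-up periods and penalties that are not modeled here.

```python
import random


def stake_share(stakes, name):
    """Fraction of the total staked amount held by `name`."""
    return stakes[name] / sum(stakes.values())


def select_validator(stakes, rng):
    """Pick one validator with probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def can_override(stakes, name, threshold=0.5):
    """True if `name` holds more than `threshold` of the total stake."""
    return stake_share(stakes, name) > threshold


stakes = {"alice": 40, "bob": 35, "mallory": 25}
rng = random.Random(0)  # seeded so the simulation is repeatable

picks = [select_validator(stakes, rng) for _ in range(10_000)]
print(picks.count("mallory") / len(picks))  # roughly 0.25, matching her share
print(can_override(stakes, "mallory"))      # False: 25% is below the threshold
```

To cross the 50 percent line in this example, mallory would need to stake more than the other two validators combined, which is the economic barrier the paragraph above describes.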