In the simplest of terms, a blockchain is data that is processed and recorded by a group of computers, which work together to ensure the authenticity and security of these data transactions. From a more abstract and forward-thinking perspective, it’s the potential future of financial transactions, one that is not bound by geographic location or beholden to overseeing bodies.
What is Blockchain?
One of blockchain’s main features is the way it records data, which is immutable, transparent and decentralised.
Immutable means that no hackers can modify the transaction records; transparent implies that everyone can see and verify the transactions on a blockchain through the internet; and decentralised means no single entity can govern the whole network.
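To make the idea of immutable records more concrete, here is a minimal sketch. The article does not describe how immutability is achieved, so this assumes one common approach: each block stores a hash of the previous block, so changing an old record breaks every later link. The names `block_hash`, `build_chain` and `is_valid` are hypothetical helpers for illustration, not part of any real blockchain software.

```python
import hashlib
import json


def block_hash(index, transactions, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each storing the hash of its predecessor."""
    chain = []
    prev_hash = "0" * 64  # placeholder value for the very first block
    for index, transactions in enumerate(batches):
        current = block_hash(index, transactions, prev_hash)
        chain.append({
            "index": index,
            "transactions": transactions,
            "prev_hash": prev_hash,
            "hash": current,
        })
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check that the links between blocks hold."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        recomputed = block_hash(block["index"], block["transactions"], prev_hash)
        if recomputed != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain([["A pays B 5"], ["B pays C 2"], ["C pays A 1"]])
print(is_valid(chain))  # True

chain[1]["transactions"] = ["B pays C 200"]  # tamper with an old record
print(is_valid(chain))  # False
```

In this toy version anyone holding a copy of the chain can rerun `is_valid`, which is also a small illustration of the transparency property: verification does not depend on trusting whoever produced the data.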
Bitcoin was the first application to use blockchain technology (ironically, the term ‘blockchain’ was only introduced after Bitcoin). The technology is now used in other cryptocurrency projects and in business applications such as trade finance, remittance and e-commerce. Even Maersk, a shipping and transportation company, has unveiled plans for a blockchain solution to streamline marine insurance.
Blockchain technology is still under active research and development to make it more useful in daily life.
Introduction to the Distributed System
Blockchain is a distributed system. Many of the problems that blockchain encounters have already been studied, or even solved, in the field of distributed systems.
A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
The key characteristics of a distributed system are:
- Concurrency
- No global clock
- Independent failure
Concurrency means multiple computations are happening at the same time in different machines. It may seem intuitive, but the complexity arises when you think about how numerous machines should work together (see below).
No global clock
In a distributed system, every participant, or machine, has equal weight in deciding what is right or wrong. Machine A may believe it is now 11:13 a.m., while machine B may believe it is 11:14 a.m. Network delay may further complicate the situation; even if two machines have the same time locally, they won’t know how long the network has taken to transfer the data. There is always some potential noise involved when a machine has to confirm the time with its neighbour. There is no single source of truth in a distributed system.
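The effect of network delay on timekeeping can be sketched with a little arithmetic. The example below is an illustration rather than something the article prescribes: it assumes machine A asks machine B for its clock reading and measures the round trip on its own clock, then uses the midpoint of the round trip as its best guess (the idea behind Cristian's algorithm). The function name and the sample numbers are made up.

```python
def estimate_remote_clock(sent_at_ms, received_at_ms, remote_reading_ms):
    """Estimate B's clock at the moment A receives the reply.

    sent_at_ms and received_at_ms are read from A's own clock;
    remote_reading_ms is the clock value B put in its reply.
    """
    round_trip_ms = received_at_ms - sent_at_ms
    # A does not know how the delay was split between the two directions,
    # so it assumes half of the round trip passed since B read its clock.
    estimate_ms = remote_reading_ms + round_trip_ms / 2
    # The true value can be off by up to half the round trip either way.
    uncertainty_ms = round_trip_ms / 2
    return estimate_ms, uncertainty_ms


# A sends at 1000 ms, gets the reply at 1800 ms, and B reported 60300 ms.
estimate, uncertainty = estimate_remote_clock(1000, 1800, 60300)
print(estimate, uncertainty)  # 60700.0 400.0
```

The uncertainty never reaches zero while the network delay is non-zero, which is the 'noise' described above: A can narrow down B's clock, but it cannot read it exactly.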
Independent failure is best explained as a hypothetical situation. For example, ‘machine A’ may need data from ‘machine B’ to continue its work, so it needs to communicate to machine B and wait for the response. However, machine B can fail (e.g. shut down due to overheating), and the network can delay arbitrarily or even disconnect. The system designer must design with communication and failure responses in mind to ensure the system stays intact.
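A common way to design for this situation is to put a time limit on every wait. The sketch below is a simulation under stated assumptions: machine B is played by a local thread, and `machine_b`, `request_from_b` and the timeout values are hypothetical names and numbers chosen for illustration.

```python
import queue
import threading
import time


def machine_b(reply_box, delay_seconds, crashed):
    """Simulated machine B: replies after a delay, or never if it has crashed."""
    if crashed:
        return
    time.sleep(delay_seconds)
    reply_box.put("data from B")


def request_from_b(delay_seconds=0.0, crashed=False, timeout_seconds=0.2):
    """Machine A's side: ask B for data, but give up after timeout_seconds."""
    reply_box = queue.Queue()
    worker = threading.Thread(
        target=machine_b,
        args=(reply_box, delay_seconds, crashed),
        daemon=True,
    )
    worker.start()
    try:
        return reply_box.get(timeout=timeout_seconds)
    except queue.Empty:
        # A cannot tell whether B crashed or the reply is merely slow.
        return None


print(request_from_b())                                        # data from B
print(request_from_b(crashed=True, timeout_seconds=0.05))      # None
print(request_from_b(delay_seconds=0.5, timeout_seconds=0.05)) # None
```

Note that the last two calls look identical from machine A's point of view. A crashed peer and a slow network are indistinguishable within the timeout, which is why the failure response has to be part of the design.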
The Byzantine Generals Problem
The Byzantine Generals Problem describes a condition of computer systems, particularly distributed computing systems, in which components may fail and there is imperfect information about whether a component has failed.
To illustrate the problem clearly, let’s start with a story:
A group of generals, each commanding a portion of the Byzantine Army, encircle a city. They must decide whether to attack or retreat. Whatever they decide, the most important thing is that they reach a consensus. However, consensus is difficult to reach because the generals don’t know each other’s decisions.
Consider the following:
- There are three generals, general A, B and C
- The generals must attack their enemy at the same time. Otherwise, they may risk failure
- The generals have no effective way to communicate instantly
- Therefore, they need to send a courier to others to transmit the message
- The generals need to reach a consensus before attacking
- They must confirm that the other generals will attack at the same time
- Therefore, the generals must relay messages and confirmations among themselves before launching an attack (For simplicity’s sake, the animation only shows one general sending out a message and receiving two confirmations. In a true consensus model, the other two generals will need to reciprocate this action)
The problem gets more complicated when we consider traitors may exist. We have no way to guarantee all of the messengers are trustworthy, and on top of that, a messenger could be captured and forced to deliver a forged message.
From the above story, we can infer that:
The Byzantine generals represent nodes on a chain.
Each consensus reached by the group of generals represents a block (i.e. a set of valid transactions). All the generals must confirm each other’s decision to reach a consensus before launching a coordinated attack. Similarly, in a blockchain, all nodes must agree on the next block to be written.
Therefore, the system is subject to failure when:
One of the nodes shares inconsistent information (a malicious node) or, in another scenario, fails to respond due to a network failure.
This is why every participant must know, and acknowledge, the information that every other participant holds. The information acknowledged and known by the majority then becomes the final decision, i.e. the consensus.
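The majority rule, and the way a traitor undermines it, can be sketched in a few lines. This is a deliberately naive single round of voting among the three generals A, B and C from the story, not a real Byzantine fault-tolerant protocol; the function names and the `lies` parameter are hypothetical.

```python
from collections import Counter


def decide(own_vote, received_votes):
    """Pick the majority among a general's own vote and the votes received."""
    tally = Counter([own_vote] + list(received_votes))
    return tally.most_common(1)[0][0]


def run_round(votes, lies=None):
    """Every general sends its vote to every other general, then decides.

    votes maps each general to the vote it honestly holds.
    lies maps (sender, receiver) to a forged vote sent instead.
    """
    lies = lies or {}
    decisions = {}
    for general, own_vote in votes.items():
        received = [
            lies.get((sender, general), vote)
            for sender, vote in votes.items()
            if sender != general
        ]
        decisions[general] = decide(own_vote, received)
    return decisions


votes = {"A": "attack", "B": "retreat", "C": "attack"}

# Everyone is honest: the majority vote wins and all generals agree.
honest = run_round(votes)
print(honest["A"], honest["B"])  # attack attack

# C is a traitor and tells B "retreat" while telling A "attack".
betrayed = run_round(votes, lies={("C", "B"): "retreat"})
print(betrayed["A"], betrayed["B"])  # attack retreat
```

With honest messages the loyal generals converge on the majority decision. One inconsistent message is enough to split A and B, which is why practical consensus mechanisms need more than a single round of naive voting.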
What is Consensus?
Consensus is not hard to understand once you understand the Byzantine Generals Problem. It requires agreement among a number of processes (or agents) on a single data value.
Consensus protocols must be fault-tolerant or resilient, as some of the processes (agents) may fail or be unreliable in other ways. The processes must somehow put forth their candidate values, communicate with one another, and agree on a single consensus value.
Those participating in a decentralised network do so using decentralised servers called nodes. Each node needs to agree on a predefined set of rules (called ‘consensus mechanisms’) to participate in the blockchain network and reach an agreement. Using these mechanisms, we can solve the Byzantine Generals Problem.
Consensus has been studied since the 1970s.
Some notable results include:
1980: Byzantine exact consensus for 3f+1 nodes
1983: Impossibility of distributed consensus with one faulty process (known as FLP impossibility)
1986: Approximate consensus with asynchrony & failure
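The ‘3f+1’ in the 1980 result can be read as a simple counting rule: to tolerate f Byzantine (arbitrarily misbehaving) nodes, the system needs at least 3f+1 nodes in total. A small sketch of that arithmetic, with hypothetical function names:

```python
def min_nodes_for(faulty_nodes):
    """Smallest network size that tolerates the given number of Byzantine nodes."""
    return 3 * faulty_nodes + 1


def max_byzantine_faults(total_nodes):
    """Largest number of Byzantine nodes a network of this size can tolerate."""
    return (total_nodes - 1) // 3


for n in (3, 4, 7, 10):
    print(n, max_byzantine_faults(n))
# 3 0
# 4 1
# 7 2
# 10 3
```

Under this rule the three generals in the story above cannot tolerate even one traitor; a fourth general is needed before a single traitor can be outvoted.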
There are many different variations/considerations of consensus mechanisms:
- No failures vs failures allowed
  - Crash failure: Faulty node stops taking any steps at some point
  - Byzantine failure: Faulty node may behave arbitrarily
    - Drop or tamper with messages
    - Send inconsistent messages
- Synchronous vs asynchronous
  - Synchronous: Bound on time required for all operations
  - Asynchronous: No bound
- Deterministic vs probabilistic
  - Deterministic: Always results in a correct outcome (agreement, validity)
  - Probabilistic: Correctness only with high probability
- Exact agreement vs approximate agreement
  - Exact agreement: All nodes agree on exactly identical value (output)
  - Approximate agreement: Nodes agree on approximately equal values
    - Identical in the limit as time → ∞
- Leader-based vs leaderless
  - Leader-based: There exists a virtual leader that proposes the end-state result, and other nodes vote for it
  - Leaderless: There is no virtual leader that proposes the end-state result; any node may propose a new state, and other nodes vote for it
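Approximate agreement, where values become ‘identical in the limit as time → ∞’, is the least intuitive item in the list above, so here is a toy illustration. It assumes five honest nodes arranged in a ring, each repeatedly averaging its value with its two neighbours. There are no faulty nodes in this sketch, so it shows only the convergence idea, not the fault-tolerant algorithms from the literature.

```python
def averaging_round(values):
    """Each node replaces its value with the mean of itself and its two ring neighbours."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3
        for i in range(n)
    ]


def spread(values):
    """Distance between the largest and smallest value held by any node."""
    return max(values) - min(values)


values = [0.0, 10.0, 20.0, 30.0, 40.0]
print(spread(values))  # 40.0

for _ in range(50):
    values = averaging_round(values)

# After many rounds the nodes hold nearly, but not necessarily exactly,
# the same value: the disagreement shrinks towards zero.
print(spread(values) < 1e-6)  # True
```

Stopping after a fixed number of rounds gives values that are approximately equal, which is the trade-off named in the list: exact agreement is given up in exchange for a simpler procedure.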
Most studies in the pre-Bitcoin era focused on safety and assumed a fixed number of participants. Blockchain’s main breakthrough was to open up a new research direction in distributed systems, one that focuses on liveness and relaxes the constraint on the number of participants to enable a permissionless setup.
Differences between Classic Consensus and Blockchain Consensus
Permissioned vs permissionless blockchain
In a permissioned blockchain, permission is required to participate in the network, and participants may have full or selective privileges. In a permissionless blockchain, by contrast, anyone can join the network.
Private vs public blockchain
A private blockchain is one that runs in a private network. For example, you may have deployed a private Ethereum network on your computer to learn about smart contracts. It is isolated, and others cannot see or join your network. To gain access to a private blockchain network, members have to be invited and then validated, either by the network starter or according to specific rights and restrictions.
A public blockchain is completely open, and anyone can join the network and contribute. Examples of public permissioned blockchains include Ripple and EOS, where a normal participant has fewer privileges than a privileged participant such as a member of the Unique Node List (for Ripple) or a Block Producer (for EOS). Public permissionless blockchains include Bitcoin and Ethereum.
An example of a private permissioned blockchain is launching your own Hyperledger or Ethereum network on your laptop. Companies may launch their private blockchain internally as well.
The last type is the consortium blockchain, where a group of participants (most likely MNCs) forms a network (imagine this as some kind of “intranet”) and only they can join and use it. Examples of consortium blockchains include R3, the Linux Foundation’s Hyperledger, and JP Morgan’s Quorum.
Centralised, Decentralised and Distributed Systems
While a centralised system is easy to understand, differentiating a decentralised vs a distributed system is often confusing.
A distributed system stores and processes data in different locations or computers; the data is usually replicated.
A decentralised system means that no single participant can decide how the system behaves. It must aggregate responses from multiple parties before reaching a decision. A decentralised system must be distributed, while a distributed system may or may not be decentralised. In other words, decentralised systems are a subset of distributed systems.
For readers wanting to take a deeper dive, this article by Vitalik Buterin explores the meaning of decentralisation further.
Different layers in a blockchain system
Blockchain is a complicated system. It is a database, so it needs to store data. It is a distributed system, so it needs to pass data through the network. It needs to solve the Byzantine Generals Problem, so it needs a consensus mechanism. Finally, it needs to transfer money, and some blockchains even allow smart contracts to be written, so there is an application running on top of it all.
A blockchain can be roughly divided into four layers: data storage, networking, consensus and application.
Blockchain vs distributed ledger technology
You may have heard the term ‘Distributed Ledger Technology’ (DLT) used on some occasions instead of ‘blockchain’. What are the differences between the two?
Since the invention of bitcoin, people have generalised the technology behind it and called it ‘blockchain.’ However, the original bitcoin design is not the only design that may work.
Some people have suggested the use of alternative data structures like Directed Acyclic Graphs (DAGs) or Block Lattice instead of a blockchain. They still fulfil the vision of a decentralised system as originally wished, but we can no longer call them a ‘blockchain.’ Hence, the term DLT was invented to generalise all similar systems that aim to solve consensus problems in a decentralised way.
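The structural difference between a chain and a DAG can be shown with plain dictionaries. In this sketch each entry lists the earlier entries it refers to (its ‘parents’): in a chain every entry has at most one parent and at most one child, while in a DAG an entry may reference several earlier entries. The function `is_chain` and the sample data are hypothetical, and the check does not verify that the graph is acyclic.

```python
def is_chain(parents):
    """True when every entry has at most one parent and at most one child."""
    child_count = {}
    for entry, entry_parents in parents.items():
        if len(entry_parents) > 1:
            return False  # more than one parent: not a simple chain
        for parent in entry_parents:
            child_count[parent] = child_count.get(parent, 0) + 1
    return all(count <= 1 for count in child_count.values())


# A blockchain: each block points to exactly one previous block.
blockchain = {"b0": [], "b1": ["b0"], "b2": ["b1"]}

# A DAG-style ledger: t3 references two earlier entries, t1 and t2.
dag_ledger = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}

print(is_chain(blockchain))  # True
print(is_chain(dag_ledger))  # False
```

Both structures record history by pointing back to earlier entries; the DAG simply drops the requirement that history forms a single line.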
Alternative data structures
In short, blockchain is a subset of DLT. However, when people say ‘blockchain,’ they often mean DLT, which represents all similar technology in the space.
Hopefully you can now better understand how blockchain, and similar systems, integrate security mechanisms into their processes. Moving from a single centralised security system, like that of traditional financial institutions, to the crypto world of sophisticated computing, which requires a unified effort and transparency, helps minimise (but not necessarily eradicate) corruption by individual parties. This is just the beginning, and the system is set to become even better as it evolves.