Part one of this article looked at blockchain pilot deployment challenges and suggested that having DevOps skills on the implementation team could help. This article aims to highlight complex data integration issues as another aspect of rollout that will be critical in moving blockchains from proofs-of-concept (POCs) to production.
Whether or not blockchains are a “fancy type of database” as Blythe Masters, CEO of Digital Asset Holdings, famously described them, blockchains and distributed ledgers are about data, whether it be related to the processing of financial instruments, tracking of goods in a supply chain or recording of land registry information in mortgage processing.
To date, many blockchain POCs have concentrated, as might be expected, on determining whether the technology can benefit a specific use case, often focusing on features such as immutability and shared ledgers and how they might be applied to the specific application. As a result, it’s common for the details of how the complete application might work in practice to be noted as “TBD” items.
Data management and integration will surely be one aspect of building a complete application that will need to be addressed when implementing blockchain approaches as production pilots. And it’s a big issue to address, spanning business processes, system governance and technology architecture.
At a business process level, it’s important to understand that, contrary to the impression that might be optimistically formed after reading numerous reports from major consulting firms, simply deploying a blockchain is not going to immediately solve entrenched business problems.
For example, in the financial markets, blockchain approaches are often cited as a solution to the slow and costly burden of post-trade reconciliation, which essentially compares all of the details of a transaction as recorded by the buyer and by the seller to ensure they agree.
Current processes in many markets often see each side of a transaction record it separately and then compare, or attempt to reconcile, the two records at the end of each trading day. Not surprisingly, when some kind of mismatch occurs, it takes substantial time and effort to identify which data elements have been incorrectly recorded. Sometimes a negotiation is required to reach an agreement, and that can cut into profits.
For reconciliations, blockchain’s role is to make a single record of the transaction in a shared ledger. But to do this, the two sides of the transaction need to be matched continuously, as soon after execution as possible. Governance rules need to be agreed upon, determining who will take responsibility for committing the matched transaction to the blockchain. Smart contract technology can be implemented to run within blockchains to control or assist with this matching. At a high level, such an approach is highly desirable from a business perspective, but it will likely require big changes to business processes and for the people who oversee them.
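As a rough illustration of matching at the point of execution, the Python sketch below compares the buyer's and seller's records field by field and commits a single shared record only when they agree. The `TradeRecord` fields, the `reconcile` function and the in-memory `ledger` list are all assumptions made for illustration; they do not reflect any particular blockchain platform or smart contract API.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class TradeRecord:
    # Hypothetical fields; a real trade record carries many more.
    trade_id: str
    instrument: str
    quantity: int
    price: str  # kept as text so both sides compare exactly what they recorded
    settlement_date: str


def mismatched_fields(buyer: TradeRecord, seller: TradeRecord) -> List[str]:
    """Return the names of the fields on which the two sides disagree."""
    b, s = asdict(buyer), asdict(seller)
    return [name for name in b if b[name] != s[name]]


def reconcile(buyer: TradeRecord, seller: TradeRecord, ledger: List[Dict]) -> List[str]:
    """Commit one shared record if both sides match; otherwise report the breaks."""
    breaks = mismatched_fields(buyer, seller)
    if not breaks:
        ledger.append(asdict(buyer))  # stand-in for a commit to the shared ledger
    return breaks


ledger: List[Dict] = []
buy = TradeRecord("T1", "ABC", 100, "99.50", "2017-03-01")
sell_same = TradeRecord("T1", "ABC", 100, "99.50", "2017-03-01")
sell_diff = TradeRecord("T1", "ABC", 100, "99.55", "2017-03-01")

matched_breaks = reconcile(buy, sell_same, ledger)
price_breaks = reconcile(buy, sell_diff, ledger)
```

In this sketch a mismatch is reported immediately and names the disputed field, rather than being discovered in an end-of-day comparison. Who is allowed to call something like `reconcile` and commit the result is exactly the governance question raised above.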
Fortunately, the ongoing move to electronic trading of financial instruments provides an ideal opportunity to conduct reconciliation as the transaction occurs, since the output of the trading system is a stream of matched trades, suitable for recording in a blockchain.
Maintaining static reference data on financial instruments — such as security codes, custodian and bank settlement information, interest payment details, etc. — in a single, shared ledger is a good approach to ensuring the data remains consistent and available to both sides of the transaction. But governance issues need to be addressed with regard to ownership of the ledger and to ensure that such information is correctly recorded at the outset. Perhaps the service being developed by CONCUR Reference Data points to what is possible.
From a business process perspective, moving to blockchain-based reconciliation is likely to require organizational effort and upheaval. But once that transformation has been completed, the ongoing operational efficiencies promise to be very significant: various reports have suggested billions of dollars of savings across the financial markets industry.
At a technical level, the data management and integration aspect of moving to blockchain also presents challenges, a number of which have only begun to be realized as a result of running POCs.
Existing applications are likely to already leverage some kind of (local) database technology, whether it be relational, NoSQL or something more exotic by nature. While the benefits of moving from local databases to a shared blockchain might be substantial, the cost and risk involved in application redesign, coding, testing and deployment can also be significant.
Approaches to more straightforward integration of blockchains do exist, but they will be highly dependent on individual applications and their design. For example, it might be possible to tap into existing messaging middleware in order to access the same data stream that is being committed to a local database. Also, some databases have “event trigger” interfaces so that when data is written to the local database, it is also made available for other applications via an event-driven API. In these scenarios, a new “agent application” might be implemented to run alongside existing code and the local database, and used to feed data to a blockchain.
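To make the “agent application” idea more concrete, here is a minimal Python sketch. It assumes the existing application already emits each record it writes as an event; a `queue.Queue` stands in for the messaging middleware or database trigger feed, and a plain list stands in for a blockchain client. The names `write_trade`, `run_agent` and `submit` are hypothetical.

```python
import json
import queue

events = queue.Queue()  # stand-in for messaging middleware or a trigger feed
local_db = {}           # stand-in for the application's existing local database
ledger = []             # stand-in for a blockchain client (assumption)


def write_trade(trade_id, record):
    """Existing application path: write locally and emit the same record as an event."""
    local_db[trade_id] = record
    events.put((trade_id, record))


def run_agent(submit):
    """Drain pending events and forward each one to the ledger via `submit`."""
    forwarded = 0
    while True:
        try:
            trade_id, record = events.get_nowait()
        except queue.Empty:
            return forwarded
        # Serialize deterministically so every node sees identical bytes.
        submit({"key": trade_id, "payload": json.dumps(record, sort_keys=True)})
        forwarded += 1


write_trade("T1", {"instrument": "ABC", "quantity": 100})
forwarded_count = run_agent(ledger.append)
```

The appeal of this arrangement is that the existing application and its local database are left untouched; only the agent needs to know anything about the blockchain. A production agent would also need to handle retries, ordering and failed submissions, none of which is shown here.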
Also, before one looks at redesigning an application for a blockchain, one has to consider how blockchains typically store data and any limitations they might exhibit that will affect the overall data management architecture.
In general, blockchains are implemented using a simple “key-value store” database technology running on each node. The open source LevelDB is popular and is used by both Ethereum and Hyperledger’s Fabric. This technology is generally fast and lightweight, but it offers limited functionality, storing data records as unstructured binary large objects (BLOBs). BLOBs are flexible in what can be stored in them, but processing must be performed at the application level to make sense of the content and to subsequently search on it (by contrast, SQL databases can be searched by specific fields within a record).
Some blockchains, such as R3’s Corda, are built upon a relational database model, which can be queried directly using SQL. But such approaches are currently not common, since the limitations of the likes of LevelDB are only now beginning to surface. In the future, established database vendors and their tried-and-tested technologies may well play a key role in implementing blockchains.
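The practical difference between the two storage models can be sketched in a few lines of Python. In this illustration a dictionary of bytes stands in for a key-value store such as LevelDB, and the standard library's in-memory SQLite database stands in for a relational store; neither reflects how any specific blockchain actually lays out its data, and the sample trades are invented.

```python
import json
import sqlite3

trades = [
    {"id": "T1", "instrument": "ABC", "quantity": 100},
    {"id": "T2", "instrument": "XYZ", "quantity": 250},
    {"id": "T3", "instrument": "ABC", "quantity": 75},
]

# Key-value style: values are opaque bytes, so the store can only look up by key.
kv_store = {t["id"].encode(): json.dumps(t).encode() for t in trades}


def kv_find_by_instrument(store, instrument):
    """Decode every value in application code in order to filter on one field."""
    hits = []
    for blob in store.values():
        record = json.loads(blob)
        if record["instrument"] == instrument:
            hits.append(record["id"])
    return sorted(hits)


# Relational style: the database knows the fields and can filter on them itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id TEXT PRIMARY KEY, instrument TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO trades VALUES (:id, :instrument, :quantity)", trades)

kv_hits = kv_find_by_instrument(kv_store, "ABC")
sql_hits = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM trades WHERE instrument = ? ORDER BY id", ("ABC",)
    )
]
```

Both approaches return the same trades, but the key-value version has to read and decode every record to answer the question, while the SQL version delegates the search to the database.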
With the majority of blockchains, and even with Corda, which uses the open source H2 database as standard, there are other potential limitations, such as scalability, since blockchains tend to limit how much data is stored within them in order to maintain performance as they scale across nodes. With Corda, for example, it is possible to store documents (such as the legalese related to smart contracts) along with transactions, but only up to a 10MB limit.
As a result of storage limits — and they vary from one blockchain offering to another — an evolving architectural approach is to store large data sets outside of a blockchain (typically referred to as “off chain”) while creating a hash of it and storing the hash on the blockchain with a link to the source data and perhaps other key data.
This hybrid on-/off-chain architecture has elegance in that it leverages the immutability of blockchains in order to provide data integrity, while also making use of storage approaches that are “fit for purpose” for recording large data sets.
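The hybrid pattern can be sketched as follows, using SHA-256 from Python's standard library. The `off_chain` dictionary stands in for whatever external storage is chosen and the `on_chain` list stands in for records on the blockchain; the `anchor` and `verify` functions and the sample document path are hypothetical.

```python
import hashlib

off_chain = {}  # stand-in for cloud or file storage (assumption)
on_chain = []   # stand-in for records committed to a blockchain (assumption)


def anchor(location, data):
    """Store the data off chain and record only its hash and location on chain."""
    off_chain[location] = data
    entry = {"location": location, "sha256": hashlib.sha256(data).hexdigest()}
    on_chain.append(entry)
    return entry


def verify(entry):
    """Re-hash the off-chain data and compare it with the on-chain hash."""
    data = off_chain.get(entry["location"])
    return data is not None and hashlib.sha256(data).hexdigest() == entry["sha256"]


entry = anchor("contracts/loan-001.txt", b"full legal text of the agreement")
intact = verify(entry)

# Any later change to the off-chain copy no longer matches the recorded hash.
off_chain["contracts/loan-001.txt"] = b"altered legal text"
tamper_detected = not verify(entry)
```

Note that the hash only proves whether the off-chain data has changed; it does nothing to keep that data available or confidential, which is where the choice of off-chain storage matters.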
However, the hybrid data storage model raises questions as to where the bulk of the source data set is actually stored, how secure it is and sometimes how centralized it is. Some blockchain-aligned cloud storage mechanisms such as InterPlanetary File System (IPFS), Storj and BigchainDB are emerging as potential solutions, but in their current early state of development they are unlikely to be deemed enterprise-ready by major corporations.
More likely, the corporate world will turn to traditional commercial cloud
vendors, including Amazon AWS, IBM Cloud and Microsoft’s Azure, as off-chain
storage options. Such clouds are also leading contenders for hosting
blockchains and associated smart contracts, so leveraging them also as a data
store makes sense.
Another reason to architect applications using on- and off-blockchain data sets is to support business analytics, which often work best when driven by column-oriented databases, such as Kx Systems’ kdb+ or SAP’s IQ.
There’s no doubt that the unique properties of blockchain technology will lead to its popularity for many applications, but just as big data architectures like Hadoop are not a replacement for databases from the likes of Oracle and MongoDB, so blockchain technologies will be implemented as one element of a holistic data management architecture. And it’s not going to be easy.