In recent years blockchain became another buzzword sharing the fame with BitCoin. Blockchain solves certain technical problems, and could be implemented quite easily. In this blog post I will dive deeper into what blockchain is and how it works by going through a little technical exercise I’ve created.
Blockchain
Blockchain as the name suggests is a chain of blocks. Each block is made of two parts – data and a hash of the data, so we could verify the data hasn’t been modified. To connect blocks together, there is a hash of a previous block included into the data part. Meaning, each hash part also hashes a hash of a previous block, which recursively leads to the first block. This way blocks make a chain where effectively each hash is responsible for all the previous hashes, hence the whole chain can be verified. See the diagram below for the illustrated example.
Exercise
I’ve built a simple car sales registry (ledger) to keep track of all sales records. You can list all the cars, and sales transactions for each of them. The application allows adding new records but doesn’t allow any modifications.
Tech stack: ASP.NET Core 2.1 web application with Entity Framework Core 2.1 using MS SQL database for persistence. For the sake of simplicity, DB is re-created and re-populated with the same sample data each time the application is restarted, similar like in this post.
The source code is available on my GitHub.
Blockchain implementation
Whenever the individual car details are loaded for display, or a new transaction is attempted to be added, the full history of a car (our blockchain) is verified. If the chain is invalid, an error is displayed.
See Blockchain folder in the source code solution for the blockchain implementation.
- BlockChainHelper — performs verification of the blockchain.
- BlockHelper — concatenates all block fields in the consistent fashion to always get the same hash, e.g. date-to-string conversion could depend on the local machine settings.
- HashHelper — calculates SHA256 hash of a given string.
See CarSalesEntry class below — it’s the actual block in my blockchain.
public class CarSalesEntry : IBlock
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }
[ForeignKey("CarEntryId")]
public int CarEntryId { get; set; }
[ForeignKey("Id")]
public int? PreviousId { get; set; }
[Required]
public string CarNumber { get; set; }
[Required]
public decimal Price { get; set; }
[Required]
[DataType(DataType.Date)]
public DateTimeOffset TransactionDate { get; set; }
[Required]
[ValidateNever]
public string Hash { get; set; }
[NotMapped]
public bool IsValid { get; set; }
public virtual CarSalesEntry Previous { get; set; }
}
The CarSalesEntry class implements a marker interface IBlock to indicate that it’s a block. The class has a Hash property to store the hash of the current block, and PreviousId which refers to the previous block if available, so we can verify it’s hash as well. The actual orchestration of the hash creation is done in the CarSalesRepository, code below.
// ...
string previousBlockHash = null;
if (carSalesEntries.Any())
{
var previousCarSalesEntry = carSalesEntries.Last();
carSalesEntry.PreviousId = previousCarSalesEntry.Id;
previousBlockHash = previousCarSalesEntry.Hash;
}
var blockText = BlockHelper.ConcatData(carSalesEntry.CarEntryId, carSalesEntry.CarNumber,
carSalesEntry.Price, carSalesEntry.TransactionDate, previousBlockHash);
carSalesEntry.Hash = HashHelper.Hash(blockText);
// ..
Firstly, I check if there is a previous block, since if it’s the beginning of the chain then there is no one. Otherwise, I take the last block (sorted by SQL PK), and get it’s hash. Secondly, the hash of the previous block can now be included into the new block’s data (concatenated with other fields), and finally hashed. Having the data (multiple fields in my case) and the hash is the ultimate goal of forming the new block.
Tampering data
Even though the application doesn’t expose any UI for editing or deleting transactions, however let’s look deeper. I’ve used MS SQL for the persistence, and all the data stored in MS SQL is editable, the relational database technology allows editing data whenever you have appropriate permissions.
For example, if you were an admin on the specific SQL server, you could alter the data as in screenshots below.
Now try to refresh the car details page, and the error will indicate that the blockchain verification failed! Yay, the blockchain is protecting us!
The blockchain works! But lets see if it’s possible to bypass the data verification.
Bypass the verification
As mentioned above, MS SQL allows data modifications by nature. Meaning, we could attempt to delete data (and potentially re-add modified data if recalculated hashes).
The first attempt failed due to ForeignKey (FK) constraint violation. MS SQL is protecting us! :) But try to revert the price to the previous valid one, and delete the most recent transaction…
Voilla!! The transaction is gone, and if you refreshed the UI page, the blockchain would be valid — see screenshot below. Scary?!
This happens, since each block contains a hash of a previous block only. Meaning you cannot have a hash of a block that still doesn’t exist.
Persistence consideration
MS SQL is not ideal for blockchain, since you need to carefully manage permissions and access to the server. An append only persistence technology like EventStore would be better, even though you can technically delete the whole events stream.
You might think, if an underlying system guarantees that all the data is read-only, why do you need a blockchain at all? Hashing and verifying every time you make a write operation can add quite some overhead. The answer could be that even if the persistence technology guarantees data to be read-only, you might still need to prove this for auditors (remember deleting the whole EventStore events stream!). In such cases blockchain could become very handy.
Conclusions
Using MS SQL for this Blockchain exercise helped to illustrate data tampering effects easily, however it could cause more problems than benefits if used in production, since all data is editable by default.
Blockchain technology is relatively simple to implement as POC (see the full source code on my GitHub), however to make it production-ready to handle large amounts of data and meet scale-ability requirements might be a bit of challenge. Unless there is a good reason, use a fully managed service like AWS QLDB that solves technical challenges for you.
Blockchain trap
Be careful when signing up for a blockchain-based technology for persistence to design a new solution, since you might find out that some data should have been kept encrypted for example for PCI compliance, or some data should be delete-able, e.g. for GDPR compliance.