parallax circle item parallax circle item parallax circle item
Insights & Blogs

Architecting a robust, scalable system

Grey building

One of the key architectural decisions that needs to be made when designing a transactional system is the trade-off between concurrency and accuracy. The decision is not always straightforward, yet it plays a key part in the success or otherwise of the system in the eyes of the stakeholders and key users. Too little accuracy and people won’t trust the system. Too little concurrency and the system will eventually grind to a halt.

In a database, accuracy is maintained by locking, a process whereby only one user can interact with a piece of data at a time. By preventing multiple people from interacting with the same piece of data we can be sure that we are not going to end up with inconsistent data. Take a bank account balance update as a simple example:

 

Opening balance Transaction amount Closing balance
$250.00 -$50.00 $200.00
$200.00 -$20.00 $180

 

The closing balance is calculated based on the opening balance and transaction amount. If two transactions occurred at the exact same time and both used the same opening balance then the closing balance after the second transaction would be incorrect. Thus it is of utmost importance that only one transaction alters the balance of the bank account at one time. By locking the account balance the first transaction prevents the second transaction from accessing the data until the first transaction has completely finished, resulting in the closing balance being correct at all times.

There is however a price to pay with locking and that is concurrency. As the volume and complexity of transactions occurring in the system grows, the chance that two transactions requiring the same data also grows, increasing the likelihood that blocking will occur. Blocking is bad from a user perspective as it introduces tangible delays that can grow to the point where the system becomes unresponsive to users. Blocking can be removed by telling the system to ignore any locks when reading data, but this increases the risk of the system returning incorrect or inconsistent data. In technical terms, telling the system whether to respect or ignore locks is known as setting the isolation level.

The question a system architect asks is how important that any given piece of data in the system is accurate. Undoubtedly, when recording transactions against a bank account you’ll want to use locking to ensure that everything is 100% accurate. But how about when reading that data? If an account holder went to an ATM and requested their account balance while these two transactions were being processed the system architect would have to decide whether to:

  1. Force the user to wait until all transactions have been processed so that they get the correct closing balance.
  2. Ignore any locks and immediately show the user the last known closing balance, with the knowledge that the balance shown could be incorrect.

It is this balancing act between accuracy and concurrency that can make the difference between a system that is a success and one that is deemed a failure. Determining which isolation level to use within a system comes down to a combination of experience and understanding what the system aims to do. Different isolation levels can be applied to different screens and processes within the same system so with correct planning it is possible to design a system that is both accurate where it matters while still allowing a degree of concurrency that will allow the system to remain usable even as it scales.

Typically when determining an isolation level for a given screen or process the architect will ask:

  • How important is it that the data for this process is accurate? In a typical OLTP system any transactional processes will use locks in order to ensure data is accurate
  • If I use inaccurate data in this process can the system heal itself later on? If a process was continually reading data from an OLTP system to populate a data warehouse without using locks, even if it retrieved some incomplete data right now the data warehouse would eventually heal itself by retrieving the remaining data at a later time.
  • How likely is it that the data will change? Data that changes infrequently (such as a Customer’s address) is something that could be read without locks without sacrificing accuracy, whereas something that changes frequently (such as a Customer’s account balance) should be read with locks to prevent the system from making incorrect decisions.

There are many variables that come into play when designing a transactional system that is both robust and scalable. In order to get the best results the system architect needs a technical understanding of how the database handles concurrent requests and also a strong understanding of each business process so a judgement on an appropriate isolation level can be made.

Follow us for the latest insights

More Articles