ACID properties in Database Systems

What is ACID?

ACID is an acronym of the following words:

1. Atomicity

2. Consistency

3. Isolation

4. Durability.

If we have a certain operation in a database that satisfies these properties, we can call this operation “transaction”.

Ok, that’s nice, but these words are just words without an explanation. Let’s dig deeper into each of them!

Transactions

A quick note about transactions. A transaction is a collection of instructions. All transactions must follow ACID properties in order to guarantee database validity. I’ll use banking transactions between customers below to illustrate the 4 properties.

Atomicity

Atomicity ensures that each transaction is treated as a single unit that follows the “all or nothing” rule; a transaction either happens completely or not at all. If any part of the transaction fails, the entire transaction will fail.

For example, customer A wants to withdraw $100 from their account and then deposit it into customer B’s account. If any of the instructions fail (insufficient funds or server crashes, etc.), the transaction will fail and any changes will be rolled back.

Consistency

Consistency guarantees that a transaction must be valid before it gets written to the database. A transaction is valid when it follows all of the defined rules, including constraints, cascades, triggers, and any combination of these. Inconsistent transactions will result in the database being rolled back to a previous state that complies with the given database rules.

If customer A wants to withdraw $100 from their account but only has $50 in their account, consistency prevents them from withdrawing the funds and the transaction will be aborted.

Isolation

As transactions are typically executed at the same time (i.e. multiple transactions that read and write to a table at the same time), isolation sees that each transaction acts as an individual transaction and would receive the same state if the transactions were executed sequentially. Isolation is important for concurrency control.

If customer C has an account balance of $1000 that both customers A and B can make withdrawals from (say $50 and $100, respectively), one of the customers will have to wait until the other customer transaction is finished, in order to avoid inconsistent data.

Durability

Durability ensures that once a transaction has been completed and committed, it will remain committed even in the event of a system failure (crash or power failure). These completed transactions are stored in non-volatile memory.

If customer A successfully deposited $500 in their account, this transaction should not disappear if any system failure occurs.

As I mentioned above, a majority of SQL/relational databases are ACID compliant. NoSQL databases tend to sacrifice ACID compliance for faster performance. There is strong data reliability and a guarantee that the transactions are safe.

There may be clear-cut scenarios where an ACID-compliant database is the best solution, such as e-commerce, healthcare, financial services apps (when data integrity is of the utmost concern). Yet, there are other times where you may want to take other factors into consideration in deciding between a SQL or NoSQL database.

All the things that can go wrong with your transactions

Data Scientists worry about long analytical queries and warehousing, but for developers, databases are all about transactions. A database transaction is a series of logically grouped database operations: insert a row here, update a record there, and more stuff like that. Your application code is constantly making transactions every time you sign up a new user or that user updates their account information.

The thing about transactions, though, is that they can go very wrong. Any number of things can happen when you’re trying to write to your database: you can lose connection to a remote instance, you can encounter value errors or anything else under the sun. You’ve seen it, you’ve dealt with it, and it can mean disaster for your underlying data. Let’s take a look at a quick example that a company like Amazon might run into:

User updates order quantity and clicks “order now” →
- Update order quantity in the pending orders table
- Add row to orders table
- Apply the purchase to user’s balance / charge credit card

If something goes wrong in the middle of this group of operations but the system continues executing them, the user will get charged the wrong amount. And if the charge doesn’t work, they’ll get their order for free. It turns out that these kinds of data errors have names, and there are a bunch of them. A few examples:

Dirty Reads

If a transaction is in the middle of updating some data and hasn’t committed yet, and another transaction is allowed to read that uncommitted data, that’s dirty, and could lead to your app showing incorrect data that got rolled back.

An example of a dirty read could be a transaction that invalidates login tokens when a user changes their password. If as the first transaction loads the token, a second one reads that token before the first invalidates it, you’d have yourself a dirty read.

In terms of actual SQL, here’s what a dirty read might look like:

### Transaction 1 ###
SELECT user_login_token_id
FROM tokens
UPDATE tokens
SET token_status = "INVALID"
WHERE token_id = user_login_token_id
### Transaction 2 ###
SELECT user_login_token_id
FROM tokens

Non-Repeatable Reads If you’ve got two consecutive reads in one transaction with a concurrent update in between, those reads are going to show different results even though they’re part of the same transaction.

An example might be two writers working on a blog. Our first user starts a transaction that reads a post’s title, writes to the post, and then reads that post’s title again. If a second user changes that post’s title in the middle of the first user’s transaction, the first user is going to see different values for the title across the two reads; or in other words, a non-repeatable read.

Here’s what a non-repeatable read might look like in SQL:

### Transaction 1 ###
SELECT post_title
FROM posts
SELECT
    post_title,
    post_content
FROM posts
### Transaction 2 ###
UPDATE posts
SET post_title = "something_new"
WHERE post_title = post_title

Phantom Reads

If a transaction reads data and then a concurrent transaction inserts data that would have been read in the original transaction, that’s a phantom read.

Let’s use the same example as a non-repeatable read: if our second user adds content in between our first user’s two reads, the first read will be missing data that appears in the second read (this is actually really similar to a non-repeatable read, which is why the same example works).

In SQL, a phantom read might look like this:

### Transaction 1 ###
SELECT post_title
FROM posts
SELECT
    post_title,
    post_content
FROM posts
### Transaction 2 ###
INSERT INTO posts
VALUES "something_new", ...

These are the 3 transactional errors as defined by the SQL standard — the big three, you might say. A lot of these errors sound like one another and tend to overlap in practice, so don’t sweat the details.

ACID concepts in NoSQL and distributed systems

ACID was a building block of the reliable relational DBs we’re familiar with today, but NoSQL has changed the game: many NoSQL DBs are built as distributed systems, and they can’t always ensure complete transactional consistency. There’s actually a theory that governs this — called CAP theorem — that in distributed systems, you can never have full consistency and full availability; you need to choose one. So what does ACID look like for something like MongoDB or Cassandra?

A new quasi-standard has emerged for NoSQL databases called BASE (a rare overlap between SQL and chemistry jokes), and it’s a weak or soft consistency model that relaxes some of the assumptions of ACID in order to achieve scalability:

-Basic Availability: the database basically works most of the time, even though it’s not perfect

-Soft-State: nodes of the database aren’t necessarily consistent with each other all the time

-Eventual Consistency: data will be consistent across nodes eventually, like by reading time

In a lot of ways, this is the exact opposite of ACID, and prioritizes availability over perfect consistency; but that’s kind of the point of NoSQL in the first place. As NoSQL becomes a more established part of app development (more than 25% of developers are using MongoDB already), expect more advancements on this front.

Today, we’re dealing with more data than we ever have, and we’re building database systems that can scale and handle that load. ACID might not be the exact transactional standard for the future, but its building blocks will certainly make their way in there.

What are the ACID properties in Database Systems?