Choosing NoSQL Storage with CAP and Eventual Consistency

NoSQL is not a single database model. It is an umbrella term for several storage approaches that trade some relational database behavior for flexibility, scale, or performance.

That tradeoff is the important part. A NoSQL database can be the right choice for a distributed Java system, but only when the team understands what it is giving up and what it is gaining.

The most useful starting point is the CAP theorem.

The Problem

A relational database is a strong choice when the application needs structured data and ACID transactions. But many modern systems have different needs.

They may need to:

Store documents with variable fields.
Scale horizontally across several nodes.
Process very high read or write volume.
Represent graph relationships naturally.
Store sparse or wide data.
Favor throughput over strict transactionality.

A traditional relational model can be forced into these scenarios, but it may become expensive, rigid, or difficult to scale.

NoSQL systems offer alternatives, but they usually come with different consistency and reliability behavior.

Core Idea

A distributed data store runs across multiple processes or nodes. Those nodes communicate over a network. When the network works, many designs look simple. The hard part appears when the network is partitioned and some nodes cannot communicate with the others.

Normal state:
Node A <-> Node B <-> Node C

Partitioned state:
Node A <-> Node B     Node C
        no communication

The CAP theorem describes the tension between three characteristics:

Characteristic	Meaning
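Consistency	Every read returns the most recent successful write, so all nodes appear to hold the same data
Availability	Every request that reaches a working node receives a response, for reads and writes alike
Partition tolerance	The system keeps operating even when messages between nodes are lost or delayed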

When a partition happens, the system must choose between preserving consistency and preserving availability.

If the system preserves availability, both sides of the split may keep accepting writes. That can produce conflicting data.

If the system preserves consistency, one side may stop accepting writes or enter a degraded mode to avoid conflict.
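
The two choices can be modeled in a few lines of Java. This is a minimal sketch, not how any particular database handles partitions. The class PartitionAwareReplica, its Mode enum, and the setPartitioned switch are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replica that models the choice a node makes during a partition. */
class PartitionAwareReplica {
    /** PREFER_CONSISTENCY rejects writes while partitioned; PREFER_AVAILABILITY accepts them. */
    enum Mode { PREFER_CONSISTENCY, PREFER_AVAILABILITY }

    private final Mode mode;
    private final Map<String, String> data = new HashMap<>();
    private boolean partitioned = false;

    PartitionAwareReplica(Mode mode) {
        this.mode = mode;
    }

    /** Simulates losing or regaining contact with the rest of the cluster. */
    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    /** Returns true if the write was accepted, false if it was rejected. */
    boolean write(String key, String value) {
        if (partitioned && mode == Mode.PREFER_CONSISTENCY) {
            return false; // refuse rather than risk diverging from unreachable peers
        }
        data.put(key, value); // during a partition this may conflict with the other side
        return true;
    }

    String read(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        PartitionAwareReplica cp = new PartitionAwareReplica(Mode.PREFER_CONSISTENCY);
        PartitionAwareReplica ap = new PartitionAwareReplica(Mode.PREFER_AVAILABILITY);
        cp.setPartitioned(true);
        ap.setPartitioned(true);
        System.out.println("Consistency-first accepted write: " + cp.write("balance", "100"));
        System.out.println("Availability-first accepted write: " + ap.write("balance", "100"));
    }
}
```

The application sees the difference directly: one store returns an error it must handle, the other returns success for data that may later need conflict resolution.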

Why CAP Matters in Architecture

CAP is not only theory. It explains why distributed storage behaves differently from single-node storage.

In enterprise systems, teams can reduce the probability of network splits with redundant connections and careful infrastructure. But a distributed store still needs a strategy for what happens when communication fails.

Some systems elect a primary partition. The larger surviving part of the cluster may continue operating while the smaller part shuts down or becomes read-only.

Network split:
Partition A: 3 nodes -> keeps accepting writes
Partition B: 2 nodes -> read-only or stopped

This avoids conflicting writes from both sides, but it sacrifices availability for the minority partition.
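
One common way to pick the surviving side is a strict majority rule. The sketch below assumes that rule. The helper name QuorumCheck.hasMajority is hypothetical, and real products may use other criteria such as weighted votes or an external arbiter.

```java
/** Hypothetical helper: decides whether a partition may keep accepting writes. */
class QuorumCheck {
    /** True only when the reachable nodes form a strict majority of the cluster. */
    static boolean hasMajority(int reachableNodes, int clusterSize) {
        if (clusterSize <= 0 || reachableNodes < 0 || reachableNodes > clusterSize) {
            throw new IllegalArgumentException("invalid node counts");
        }
        return reachableNodes > clusterSize / 2;
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        // prints "Partition A (3 nodes) writable: true"
        System.out.println("Partition A (3 nodes) writable: " + hasMajority(3, clusterSize));
        // prints "Partition B (2 nodes) writable: false"
        System.out.println("Partition B (2 nodes) writable: " + hasMajority(2, clusterSize));
    }
}
```

With an even cluster size, a perfectly even split leaves neither side with a majority, which is one reason odd node counts are a popular choice.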

The key point is that the database choice affects application behavior. Developers need to know whether the system can return stale data, reject writes, or temporarily diverge.

Eventual Consistency

Many NoSQL systems use eventual consistency.

Eventual consistency means the system may have short periods where different nodes return different answers for the same data. Over time, the system converges.

Write happens on node A
  |
  v
Node A has new value
Node B still has old value
Node C still has old value
  |
  v
Replication completes
  |
  v
All nodes have new value

This can be perfectly acceptable for some use cases. For example, a profile cache or analytics counter may tolerate temporary differences.

It is more dangerous for use cases such as payments, balances, inventory reservation, or legal records, where users expect strong consistency.
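
The stale-read window can be reproduced with a toy model. The sketch below is an in-memory simulation, not the replication protocol of any real database. The class EventuallyConsistentCluster is hypothetical, and the single global version counter is a deliberate simplification (real systems typically rely on timestamps or vector clocks).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of asynchronous replication with newest-version-wins conflict handling. */
class EventuallyConsistentCluster {
    private static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    /** A replication message waiting to be delivered to one node. */
    private static final class Update {
        final int target;
        final String key;
        final Versioned versioned;
        Update(int target, String key, Versioned versioned) {
            this.target = target; this.key = key; this.versioned = versioned;
        }
    }

    private final List<Map<String, Versioned>> nodes = new ArrayList<>();
    private final Queue<Update> pending = new ArrayDeque<>();
    private long nextVersion = 1; // assumption: one global counter orders all writes

    EventuallyConsistentCluster(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    /** Applies the write on one node and queues it for every other node. */
    void write(int nodeIndex, String key, String value) {
        Versioned v = new Versioned(value, nextVersion++);
        apply(nodeIndex, key, v);
        for (int i = 0; i < nodes.size(); i++) {
            if (i != nodeIndex) pending.add(new Update(i, key, v));
        }
    }

    /** Reads whatever this node currently holds, which may be stale. */
    String read(int nodeIndex, String key) {
        Versioned v = nodes.get(nodeIndex).get(key);
        return v == null ? null : v.value;
    }

    /** Delivers all queued updates; afterwards every node holds the newest version. */
    void replicate() {
        while (!pending.isEmpty()) {
            Update u = pending.remove();
            apply(u.target, u.key, u.versioned);
        }
    }

    private void apply(int nodeIndex, String key, Versioned incoming) {
        Map<String, Versioned> store = nodes.get(nodeIndex);
        Versioned current = store.get(key);
        if (current == null || incoming.version > current.version) store.put(key, incoming);
    }

    public static void main(String[] args) {
        EventuallyConsistentCluster cluster = new EventuallyConsistentCluster(3);
        cluster.write(0, "profile:1", "new value");
        System.out.println("Node B before replication: " + cluster.read(1, "profile:1"));
        cluster.replicate();
        System.out.println("Node B after replication: " + cluster.read(1, "profile:1"));
    }
}
```

Between write and replicate, a reader that happens to hit node B or C gets the old answer. That gap is exactly what the application has to tolerate.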

The chapter makes a practical point: if transactionality and data consistency are the primary requirements, a relational database is usually the better choice.

NoSQL Categories

NoSQL databases are not interchangeable. The chapter describes several categories, each with different strengths.

Key-Value Stores

Key-value stores are the simplest category. They use direct key-based access and are often designed for performance and scalability.

key: session:123
value: serialized session state

They are useful for caching, session data, and fast lookup by known key. They are weak when the application needs complex searches over value content.
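
The access pattern is easy to see with an in-memory stand-in. The sketch below uses a ConcurrentHashMap in place of a real key-value store; the SessionStore class and its method names are hypothetical. It shows why lookup by key is cheap while searching inside values forces a scan of every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a key-value store holding serialized session state. */
class SessionStore {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    void put(String key, String serializedState) {
        entries.put(key, serializedState);
    }

    /** Direct lookup by known key: the operation this model is built for. */
    Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    /** Searching by value content has no index to use, so it visits every entry. */
    List<String> keysWhereValueContains(String fragment) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            if (entry.getValue().contains(fragment)) matches.add(entry.getKey());
        }
        return matches;
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        store.put("session:123", "user=alice;cart=3");
        System.out.println(store.get("session:123").orElse("<missing>"));
        System.out.println(store.keysWhereValueContains("alice"));
    }
}
```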

Document Stores

Document stores save documents instead of rows. Documents are often serialized as JSON or XML.

This makes them useful when records do not all share the same fixed set of fields.

{
  "paymentId": "pay-100",
  "sender": "user-1",
  "recipient": "user-2",
  "amount": 30.5,
  "metadata": {
    "channel": "mobile",
    "campaign": "spring"
  }
}

Document stores can be searched by document content. The chapter mentions MongoDB, Couchbase, and Elasticsearch as examples.

Use document storage when a flexible structure matters and the document itself is a meaningful unit.
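
To make "searched by document content" concrete, here is a small in-memory sketch. It does not use the API of MongoDB, Couchbase, or Elasticsearch. DocumentCollection, insert, and find are hypothetical names, and documents are plain nested maps standing in for JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** In-memory stand-in for a document collection; documents may have different fields. */
class DocumentCollection {
    private final List<Map<String, Object>> documents = new ArrayList<>();

    void insert(Map<String, Object> document) {
        documents.add(document);
    }

    /** Returns documents whose value at a dotted path, e.g. "metadata.channel", equals expected. */
    List<Map<String, Object>> find(String path, Object expected) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            Object actual = resolve(doc, path);
            if (actual != null && Objects.equals(actual, expected)) result.add(doc);
        }
        return result;
    }

    /** Walks nested maps one path segment at a time; returns null when a field is absent. */
    private static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        DocumentCollection payments = new DocumentCollection();
        payments.insert(Map.of("paymentId", "pay-100", "amount", 30.5,
                "metadata", Map.of("channel", "mobile", "campaign", "spring")));
        payments.insert(Map.of("paymentId", "pay-101", "amount", 12.0)); // no metadata at all
        System.out.println(payments.find("metadata.channel", "mobile").size());
    }
}
```

The second document has no metadata field, and the query simply skips it. No schema change is needed to store both shapes side by side.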

Graph Databases

Graph databases model data as graph concepts such as nodes and links. They are useful where relationships are the main problem.

Examples include road networks, hyperlinks, social relationships, and algorithms such as shortest-path search.

Customer A -- knows --> Customer B
Customer B -- paid --> Merchant C
Customer A -- visited --> Merchant C

The chapter mentions Neo4j as a well-known graph database implementation.

Use graph databases when relationship traversal is central, not just when data has relationships.
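
Relationship traversal is the kind of query a graph model makes natural. The sketch below runs a breadth-first shortest-path search over an in-memory adjacency map. It is not Neo4j's API. RelationshipGraph, link, and shortestPath are hypothetical names, and relationship labels such as "knows" or "paid" are left out to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** In-memory directed graph with a breadth-first shortest-path query. */
class RelationshipGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    /** Adds a directed link from one node to another. */
    void link(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Returns the path with the fewest hops from start to goal, or an empty list if none exists. */
    List<String> shortestPath(String start, String goal) {
        Map<String, String> cameFrom = new HashMap<>(); // visited node -> predecessor
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        cameFrom.put(start, null);
        while (!frontier.isEmpty()) {
            String current = frontier.remove();
            if (current.equals(goal)) {
                List<String> path = new ArrayList<>();
                for (String node = goal; node != null; node = cameFrom.get(node)) path.add(node);
                Collections.reverse(path);
                return path;
            }
            for (String next : edges.getOrDefault(current, Collections.emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    frontier.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        RelationshipGraph graph = new RelationshipGraph();
        graph.link("Customer A", "Customer B");
        graph.link("Customer B", "Merchant C");
        // prints "[Customer A, Customer B, Merchant C]"
        System.out.println(graph.shortestPath("Customer A", "Merchant C"));
    }
}
```

Expressing the same multi-hop query in a relational schema usually means repeated self-joins or recursive queries, which is where a graph database starts to earn its place.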

Wide-Column Databases

Wide-column stores look similar to relational tables, but each row can have its own set of columns, differing in name, number, and type.

The chapter mentions Apache Cassandra and Apache Accumulo as examples.

This can be useful when data is sparse, distributed, or shaped differently across rows.
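
The row-with-its-own-columns idea can be pictured as a map of maps. The sketch below is a simplified in-memory model and does not reflect Cassandra's or Accumulo's actual storage engine or API. SparseTable and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified wide-column model: each row key maps to its own sorted set of columns. */
class SparseTable {
    // row key -> (column name -> value); a column a row never wrote is simply absent
    private final Map<String, TreeMap<String, Object>> rows = new HashMap<>();

    void put(String rowKey, String column, Object value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the cell value, or null when the row or column does not exist. */
    Object get(String rowKey, String column) {
        Map<String, Object> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Lists the columns that this particular row actually contains, in sorted order. */
    Set<String> columnsOf(String rowKey) {
        TreeMap<String, Object> row = rows.get(rowKey);
        if (row == null) return Collections.emptySet();
        return Collections.unmodifiableSet(row.keySet());
    }

    public static void main(String[] args) {
        SparseTable sensors = new SparseTable();
        sensors.put("device-1", "temperature", 21.5);
        sensors.put("device-1", "humidity", 40);
        sensors.put("device-2", "firmware", "1.4.2"); // different columns, different types
        System.out.println(sensors.columnsOf("device-1"));
        System.out.println(sensors.columnsOf("device-2"));
    }
}
```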

Choosing NoSQL Safely

The safest way to choose NoSQL is to start with the access pattern.

Ask what the application does most often:

Known key lookup?
  consider key-value store

Flexible JSON-like records?
  consider document store

Relationship traversal?
  consider graph database

Sparse or distributed wide records?
  consider wide-column store

Strong ACID transactions?
  prefer relational database

Do not choose NoSQL only because it is modern. Choose it because the shape of the data and the behavior of the system fit the model.
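
The questions above can be written down as a tiny decision helper. This is only a sketch of the rule of thumb in this section, not a substitute for real evaluation. StorageAdvisor and both enums are hypothetical names.

```java
/** Encodes the rule of thumb: ACID needs win first, then the dominant access pattern decides. */
class StorageAdvisor {
    enum AccessPattern { KNOWN_KEY_LOOKUP, FLEXIBLE_RECORDS, RELATIONSHIP_TRAVERSAL, SPARSE_WIDE_RECORDS }

    enum StorageModel { KEY_VALUE, DOCUMENT, GRAPH, WIDE_COLUMN, RELATIONAL }

    static StorageModel recommend(AccessPattern dominantPattern, boolean needsStrongAcid) {
        if (needsStrongAcid) {
            return StorageModel.RELATIONAL; // transactionality outranks data shape
        }
        switch (dominantPattern) {
            case KNOWN_KEY_LOOKUP:
                return StorageModel.KEY_VALUE;
            case FLEXIBLE_RECORDS:
                return StorageModel.DOCUMENT;
            case RELATIONSHIP_TRAVERSAL:
                return StorageModel.GRAPH;
            case SPARSE_WIDE_RECORDS:
                return StorageModel.WIDE_COLUMN;
            default:
                throw new IllegalArgumentException("unknown pattern: " + dominantPattern);
        }
    }

    public static void main(String[] args) {
        // prints "GRAPH"
        System.out.println(recommend(AccessPattern.RELATIONSHIP_TRAVERSAL, false));
        // prints "RELATIONAL"
        System.out.println(recommend(AccessPattern.FLEXIBLE_RECORDS, true));
    }
}
```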

Testing Distributed Data Behavior

A NoSQL design should be tested for failure behavior, not only for normal reads and writes.

A useful test plan can include:

Write data to one node or endpoint.
Read the data from different nodes or paths.
Simulate a slow or unavailable node.
Verify whether the system rejects writes, returns stale values, or continues normally.
Check how long convergence takes.
Confirm application behavior during stale reads.
Verify that critical workflows do not depend on immediate consistency unless the store provides it.
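
The "check how long convergence takes" step can be automated with a small polling helper. The sketch below is store-agnostic: the read is passed in as a Supplier, so the test decides which node or endpoint to query. ConvergenceProbe and awaitValue are hypothetical names, and the timeout and poll interval are values the team would have to choose.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Supplier;

/** Polls a read path until it reflects an expected value, and reports how long that took. */
class ConvergenceProbe {
    /**
     * Returns the elapsed time once the read returns the expected value,
     * or empty if the timeout passes (or the thread is interrupted) first.
     */
    static Optional<Duration> awaitValue(Supplier<String> read, String expected,
                                         Duration timeout, Duration pollInterval) {
        long start = System.nanoTime();
        while (true) {
            if (Objects.equals(read.get(), expected)) {
                return Optional.of(Duration.ofNanos(System.nanoTime() - start));
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return Optional.empty(); // still stale when the time budget ran out
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        long readyAt = System.nanoTime() + Duration.ofMillis(50).toNanos();
        // Fake replica that only shows the new value after roughly 50 ms.
        Supplier<String> staleReplica = () -> System.nanoTime() >= readyAt ? "new" : "old";
        Optional<Duration> took = awaitValue(staleReplica, "new",
                Duration.ofSeconds(1), Duration.ofMillis(5));
        System.out.println(took.map(d -> "Converged after " + d.toMillis() + " ms")
                .orElse("Did not converge in time"));
    }
}
```

In a real test the supplier would wrap a read against a specific replica, and the returned duration would be compared with the staleness the workflow is documented to tolerate.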

A simple architecture note can document the expected behavior:

Customer profile store:
- Document database
- Temporary stale reads acceptable
- Profile updates may take time to appear everywhere
- Payment authorization does not depend on this store

This helps prevent accidental use of eventually consistent data in strongly consistent workflows.
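
The same note can also be enforced in code, so a misuse fails loudly instead of silently. The sketch below is one possible approach, not an established pattern from any library. StoreContract, its Consistency enum, and requireStrongConsistency are hypothetical names.

```java
/** Hypothetical descriptor that records a store's documented consistency behavior. */
class StoreContract {
    enum Consistency { STRONG, EVENTUAL }

    private final String name;
    private final Consistency consistency;

    StoreContract(String name, Consistency consistency) {
        this.name = name;
        this.consistency = consistency;
    }

    /** Call at the start of a workflow that cannot tolerate stale reads. */
    void requireStrongConsistency(String workflow) {
        if (consistency != Consistency.STRONG) {
            throw new IllegalStateException(
                    workflow + " must not depend on " + name + " (" + consistency + ")");
        }
    }

    public static void main(String[] args) {
        StoreContract profiles = new StoreContract("customer-profile-store", Consistency.EVENTUAL);
        try {
            profiles.requireStrongConsistency("payment authorization");
        } catch (IllegalStateException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
    }
}
```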

Common Mistakes

The first mistake is treating NoSQL as a faster relational database. It is a different model with different guarantees.

The second mistake is ignoring CAP behavior. Network partitions may be rare, but the system still needs a defined response.

The third mistake is building financial or transactional workflows on eventually consistent storage without a compensation strategy.

The fourth mistake is choosing a document store when the real problem is graph traversal, or choosing a graph database when simple key lookup is enough.

The fifth mistake is tuning a NoSQL store toward full transactionality and then expecting the same performance and availability benefits that motivated the NoSQL choice.

Checklist

The data model fits the selected NoSQL category.
Access patterns are known.
Consistency expectations are documented.
Partition behavior is understood.
Eventual consistency is acceptable for the use case.
Critical transactional workflows use a suitable storage model.
Application code handles stale reads where needed.
Query needs are supported by the chosen database type.
Operational behavior during node failure is tested.
The team understands the tradeoff, not only the API.

Conclusion

NoSQL databases are useful when the application needs storage behavior that relational databases do not naturally provide. They can improve flexibility, scalability, throughput, and distributed operation.

The cost is usually paid in transactionality, consistency, or operational complexity. Use NoSQL when the data shape and access pattern justify it. Use relational databases when strong consistency and ACID behavior are the real requirements.