How Distributed Systems Keep Our Apps Running

When you edit a Google Doc and see changes appear instantly, or tap "like" on Instagram and the counter updates, you're seeing distributed systems at work.


Why Distributed Systems?

A single server has two big problems:

  • Scalability: One machine can't handle millions of users.
  • Reliability: If it crashes, the entire app goes down.

Distributed systems solve both problems by spreading the load across multiple servers (nodes):

  • If one fails, others take over (availability).
  • If traffic grows, more servers can be added (scalability).
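
To make the two bullets above concrete, here is a minimal sketch of a round-robin dispatcher that skips failed nodes and accepts new ones. The `Cluster` class, its method names, and the node names are hypothetical, chosen only for illustration; a real load balancer would also need health checks, timeouts, and retries.

```python
class Cluster:
    """Toy round-robin dispatcher over named nodes (illustrative only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next node to try

    def mark_down(self, node):
        # Availability: a failed node is skipped, the others keep serving.
        self.healthy.discard(node)

    def add_node(self, node):
        # Scalability: new capacity joins the rotation.
        self.nodes.append(node)
        self.healthy.add(node)

    def route(self, request):
        # Try each node at most once per request, skipping unhealthy ones.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return f"{node} handled {request}"
        raise RuntimeError("no healthy nodes available")


cluster = Cluster(["a", "b", "c"])
first = cluster.route("r1")    # "a" is first in the rotation
cluster.mark_down("b")
second = cluster.route("r2")   # "b" is down, so "c" takes over
```

Note that this sketch only routes requests. It does nothing to keep data on the nodes identical.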

But once you spread work across nodes, you must keep them in sync. That's where complexity begins.


The CAP Theorem: You Can't Have It All

Eric Brewer's CAP theorem describes a fundamental limit of distributed systems. It states that a system can guarantee at most two of these three properties at once:

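  • Consistency (C): Every read returns the most recent write, so all users see the same data at the same time.
  • Availability (A): Every request to a working node gets a response, even if that response does not reflect the latest write.
  • Partition Tolerance (P): The system keeps working even when network failures stop some nodes from talking to each other.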

[Figure: CAP theorem (Consistency, Availability, Partition Tolerance)]


Why Partition Tolerance Is Non-Negotiable

In the real world, networks fail: cables break, servers restart, data centers lose connectivity.

That means partition tolerance (P) is unavoidable. You can't build a distributed system that assumes perfect networks.

So in practice, the trade-off is always Consistency vs. Availability during a partition.
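
The choice can be illustrated with a toy replica that behaves differently once it is cut off from its peers. This is a sketch under simplifying assumptions: the `Replica` class, its `mode` parameter, and the `partitioned` flag are hypothetical, and a real system would detect partitions through timeouts or heartbeats, not a boolean.

```python
class Replica:
    """Toy replica holding one value (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when this node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: the local copy might be stale, so refuse.
            raise ConnectionError("unavailable: cannot reach peers")
        # Availability first (or no partition): answer with the local copy.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
cp.value = ap.value = "v1"

# A partition isolates both replicas; elsewhere the value may move on to "v2".
cp.partitioned = ap.partitioned = True

stale = ap.read()  # the AP replica still answers, possibly with old data
```

Calling `cp.read()` at this point raises `ConnectionError`: the CP replica gives up availability because it cannot prove its data is current.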


The Trade-offs in Action

Letter | Meaning             | Real-World Example
C      | Consistency         | Everyone sees the same data. Banks rely on this.
A      | Availability        | System responds even if some nodes are disconnected. Instagram likes can lag slightly.
P      | Partition Tolerance | System keeps running even when network failures split nodes. Cloud storage across continents relies on this.

Understanding the Trade-offs

Because network partitions (P) are inevitable in distributed systems, the real choice is what to give up while one lasts:

CP Systems: Choose consistency over availability

  • Example: Data stores that reject reads or writes on nodes cut off from the majority, such as the coordination services ZooKeeper and etcd
  • Use case: Financial transactions where accuracy is critical
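
One common way to get CP behavior is a majority quorum: a write is accepted only if more than half of the replicas can be reached, so two isolated halves can never both accept conflicting writes. The sketch below assumes replicas are plain dictionaries and that the caller already knows which ones are reachable. The function names are hypothetical, and real protocols such as Raft or Paxos involve considerably more than this check.

```python
def has_quorum(reachable, cluster_size):
    """Return True when strictly more than half of the replicas are reachable."""
    return reachable > cluster_size // 2


def quorum_write(replicas, reachable_ids, key, value):
    """Apply a write only if a majority of replicas can be reached.

    replicas: dict mapping replica id -> dict used as that replica's store.
    reachable_ids: ids of the replicas this side of the partition can contact.
    """
    if not has_quorum(len(reachable_ids), len(replicas)):
        return False  # minority side: reject the write, stay consistent
    for rid in reachable_ids:
        replicas[rid][key] = value
    return True


replicas = {"n1": {}, "n2": {}, "n3": {}}

# The partition splits the cluster into {n1, n2} and {n3}.
majority_ok = quorum_write(replicas, ["n1", "n2"], "balance", 100)
minority_ok = quorum_write(replicas, ["n3"], "balance", 50)
```

Here the majority side accepts the write and the isolated node rejects it, so no conflicting balance is ever recorded. The cost is that clients connected only to `n3` cannot write until the partition heals.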

AP Systems: Choose availability over consistency

  • Example: Social media feeds that might show slightly outdated content
  • Use case: User-facing applications where uptime matters most
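
AP systems typically accept writes on every node and reconcile later. A like counter is a good fit for a grow-only counter (G-Counter), a simple conflict-free replicated data type: each node counts only its own likes, and merging takes the per-node maximum. The sketch below is an illustration of that technique, not a description of how any particular social network implements likes; the `LikeCounter` class and node ids are hypothetical.

```python
class LikeCounter:
    """Grow-only counter (G-Counter) replicated across nodes (illustrative)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> likes recorded by that node

    def like(self):
        # Each node increments only its own slot, so writes never conflict.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def merge(self, other):
        # Taking the per-node maximum makes merging safe to repeat
        # and independent of the order in which replicas sync.
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)

    def total(self):
        return sum(self.counts.values())


a, b = LikeCounter("a"), LikeCounter("b")

# During a partition both nodes keep accepting likes independently.
a.like()
a.like()
b.like()
before_sync = (a.total(), b.total())  # the two nodes disagree for now

# Once the partition heals, the nodes exchange state and converge.
a.merge(b)
b.merge(a)
after_sync = (a.total(), b.total())
```

Between the likes and the merge, users on different nodes see different totals. That temporary disagreement is the consistency the system trades away to stay available.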

Key Takeaways

  • Distributed systems solve scalability and availability challenges but introduce complexity
  • The CAP theorem forces architectural decisions based on business requirements
  • Understanding these trade-offs helps design systems that meet user expectations
  • No system is perfect; the goal is to choose the right compromises for your use case
