Designing Distributed Systems: 8 Transactional Saga Patterns

Coupling

→ the degree of interdependence between software modules and components.

There are two types: static and dynamic.

Static Coupling

→ fixed, code/deployment-level dependencies.

Example: a shared library, or direct code references — regardless of whether services are actively communicating.

Dynamic Coupling

→ runtime dependencies.

Example: when Service A calls Service B synchronously, A becomes temporarily coupled to B’s:

Availability — if B is down, A fails too
Performance — if B is slow, A is slow
Error behaviour — B’s errors propagate directly to A

Transactional Coupling

→ data processing dependencies.

Example: a customer request involves an update to 3 tables in a database. For the update to be atomic, either all 3 table updates happen or none of them do.

Contract Coupling

→ data transfer level: microservices that exchange data must agree on the format of the data being exchanged (JSON schema, XML schema, RPC, etc).

More details in the Managing Contracts section below.

Temporal Coupling

→ availability of the interacting microservices.

Example: microservice A requires microservice B to be available at a specific time to perform some functionality.

The Three Primal Dynamic-Coupling Forces

Every distributed system is shaped by three fundamental tensions:

Spectrum of primal dynamic coupling forces

Each of these forces acts along its own axis, and the axes combine to form the 8 saga patterns covered later in Section 4.

Dynamic coupling forces axes

Force	Options	Impact on Distributed System Characteristics
Communication	Synchronous ↔ Asynchronous	★★
Consistency	Atomic ↔ Eventual	★★★
Coordination	Orchestration ↔ Choreography	★

The impact of each of these forces will become apparent when we compare the 8 types of sagas formed by combining the options.

1. Communication

Synchronous vs. Asynchronous

Synchronous	Asynchronous
Tasks run in order	Tasks run independently
Often blocking the thread	Using callbacks for completion

Blocking vs. Non-blocking

There is a subtle difference between synchronous/asynchronous and blocking/non-blocking:

Synchronous/Asynchronous	Blocking/Non-blocking
On a system level	On a thread level
Relates to when a task finishes and how the result is handled (flow control)	Relates to whether the thread is suspended while waiting for a task (resource state)

Examples

Synchronous & Blocking (Standard): You call a function, and your code stops (blocks) and waits for the result before moving to the next line. Example: Reading a large file from disk; the program pauses until the file is fully loaded.
Synchronous & Non-blocking: You start a task and immediately get control back (non-blocking), but you must manually check back (polling) until the task is complete. Example: Starting a download, then checking the progress bar every few seconds, but you still wait for that specific download to finish to proceed.
Asynchronous & Blocking (Rare/Inefficient): The task runs in the background (asynchronous), but you still stop and wait for it to finish. Example: Starting a background task, but immediately sitting idle waiting for the “done” notification.
Asynchronous & Non-blocking (Modern): You start a task and immediately move on to other work (non-blocking). When the task finishes, you are notified (asynchronous). Example: Sending an API request with a callback function; the main thread continues executing other code and handles the result only when the request completes.

Sample of sync/async blocking/non-blocking combinations
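The four combinations above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration rather than a recommended design: `slow_task` is a hypothetical stand-in for a slow remote call, and the sleep durations are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(x: int) -> int:
    """Hypothetical stand-in for a slow remote call."""
    time.sleep(0.05)
    return x * 2


# 1. Synchronous & blocking: the caller waits on the call itself.
sync_blocking = slow_task(1)

with ThreadPoolExecutor(max_workers=2) as pool:
    # 2. Synchronous & non-blocking: control returns at once,
    #    but the caller polls until the result is ready.
    future = pool.submit(slow_task, 2)
    while not future.done():
        time.sleep(0.01)  # the caller could do small bits of work here
    sync_polling = future.result()

    # 3. Asynchronous & blocking: the work runs in the background,
    #    yet the caller immediately sits idle waiting for it.
    async_blocking = pool.submit(slow_task, 3).result()

    # 4. Asynchronous & non-blocking: register a callback and move on.
    callback_results: list[int] = []
    pending = pool.submit(slow_task, 4)
    pending.add_done_callback(lambda f: callback_results.append(f.result()))
    other_work = sum(range(10))  # unrelated work proceeds meanwhile
# Leaving the with-block waits for outstanding tasks to finish.
```

In the fourth case the caller never asks for the result; the callback delivers it when the task completes.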

Architectural Quantum

An architecture quantum establishes the scope for a set of architectural characteristics.

Characteristics of a quantum:

Independent deployment
High functional cohesion
Low external-implementation static coupling
Synchronous communication within the quantum

Synchronous calls create “dynamic quantum entanglement” — they couple the operational characteristics of separate services. A slow or unavailable downstream service degrades the entire call chain, i.e. the weakest characteristic of a service becomes the characteristic of the system.

Your system is only as good as its weakest characteristic.

Asynchronous calls, on the other hand, retain the architecture quantum of each individual service in a system.

Example:

Architectural quanta: sample case

In the above picture, the microservices on the right side form 1 architectural quantum because of the synchronous communication among the services.

When Portfolio Management System communicates with Trade Order Orchestrator, 2 things can happen:

If they communicate synchronously, they merge into 1 architectural quantum => architectural characteristics drop to the lowest level in the workflow.

2 architectural quanta become 1 architectural quantum

If they communicate asynchronously, they stay separated as 2 architectural quanta => architectural characteristics are retained.

Architectural quanta are retained

So how do we determine whether the communication should be synchronous or asynchronous?

It depends on whether the service needs to wait for the response from another service. If so, then it’s synchronous.

Also for simplicity, default to synchronous communication. Synchronous → simpler

Trade-off Summary

	Synchronous	Asynchronous
Easy to reason about	✅
Mimics non-distributed calls	✅
Easier to implement	✅
Highly decoupled		✅
High performance and scale		✅
Common performance tuning technique		✅
Performance impact on interactive systems	⚠️
Creates dynamic quantum entanglements	⚠️
Complex to build and debug		⚠️
Difficult for transactional behaviour		⚠️
Complex error handling		⚠️

Managing Contracts

When a service calls a method in another service, the 2 services share a contract that defines the request and response formats.

Strict vs. Loose Contracts

Contract spectrum

	Tight (Strict)	Loose
Guaranteed contract fidelity	✅
Build-time validation	✅
Better for decoupled architectures		✅
Easier to evolve		✅
Decouples integration from implementation platform		✅
Brittle integration points	⚠️
Requires versioning	⚠️
Less certainty; needs fitness functions		⚠️
Requires developer discipline		⚠️

Principle: Transfer values, not types. Value-based (loose) contracts decouple consumers from implementation changes.

Contract Fitness Functions

Consumer-Driven Contracts (CDCs) can automate contract verification.

The Pact framework (docs.pact.io) supports this for REST and event-driven architectures, with CI/CD integration.
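The idea behind a consumer-driven contract can be sketched without any framework. The snippet below is a hand-rolled illustration and not the Pact API: the contract format, the `contract_violations` helper, and the field names are all assumptions made for this example. The consumer declares only the fields it relies on, so the provider can add fields without breaking it.

```python
from typing import Any

# Hypothetical expectation published by a consumer:
# field name -> type the consumer relies on.
WISHLIST_CONSUMER_CONTRACT: dict[str, type] = {
    "customer_id": int,
    "name": str,
}


def contract_violations(contract: dict[str, type], response: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


# The provider returns an extra field; a loose contract tolerates it.
provider_response = {"customer_id": 123, "name": "Ada", "loyalty_tier": "gold"}
ok = contract_violations(WISHLIST_CONSUMER_CONTRACT, provider_response)

# A breaking change: the id becomes a string and the name disappears.
broken = contract_violations(WISHLIST_CONSUMER_CONTRACT, {"customer_id": "123"})
```

Run in the provider's build pipeline, a check of this kind acts as a fitness function: the build fails as soon as a change would break a known consumer.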

Event Payload Design

Should an event carry the full data payload or just key identifiers?

Trade-off Dimension	Full Payload	Key-Only Payload
Scalability & performance	✅ Good	⚠️ Poor
Contract management & versioning	⚠️ Complex	✅ Simple
Single system of record	⚠️ Multiple	✅ Single
Stamp coupling & bandwidth	⚠️ High	✅ Low

Choose Key-Only Payload if scalability and performance are not the priority.

Stamp coupling example: A customer profile returning 500 KB payloads at 2,000 req/s uses ~1,000,000 KB/s of bandwidth. Returning only required fields (~200 bytes) reduces this to ~400 KB/s — a 2,500× reduction.
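The arithmetic in the stamp coupling example can be checked with a few lines of code. The sketch assumes 1 KB = 1,000 bytes, which is the convention the figures above imply; the constant and function names are illustrative.

```python
REQUESTS_PER_SECOND = 2_000
FULL_PROFILE_BYTES = 500 * 1_000   # 500 KB, assuming 1 KB = 1,000 bytes
REQUIRED_FIELDS_BYTES = 200        # only the fields the consumer needs


def bandwidth_kb_per_s(payload_bytes: int, requests_per_second: int) -> float:
    """Bandwidth in KB/s for a given payload size and request rate."""
    return payload_bytes * requests_per_second / 1_000


full_bandwidth = bandwidth_kb_per_s(FULL_PROFILE_BYTES, REQUESTS_PER_SECOND)
trimmed_bandwidth = bandwidth_kb_per_s(REQUIRED_FIELDS_BYTES, REQUESTS_PER_SECOND)
reduction_factor = full_bandwidth / trimmed_bandwidth
```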

Looser contracts create less brittle software architectures.

2. Coordination

Orchestration vs. Choreography

Orchestration

generally one orchestrator per major workflow
orchestrator owns state and communicates update points to services

Orchestration
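A minimal in-process sketch of orchestration follows. The service functions (`reserve_stock`, `take_payment`) and the state names are hypothetical; in a real system each step would be a call to a separate service.

```python
from typing import Callable


# Hypothetical domain services; each one updates the shared order dict.
def reserve_stock(order: dict) -> None:
    order["stock_reserved"] = True


def take_payment(order: dict) -> None:
    order["paid"] = True


class Orchestrator:
    """Owns the workflow state and calls each service in turn."""

    def __init__(self, steps: list[tuple[str, Callable[[dict], None]]]) -> None:
        self.steps = steps
        self.state = "PENDING"
        self.completed: list[str] = []

    def run(self, order: dict) -> None:
        for name, call_service in self.steps:
            self.state = f"RUNNING:{name}"  # workflow state lives in one place
            call_service(order)
            self.completed.append(name)
        self.state = "COMPLETED"


orchestrator = Orchestrator([
    ("reserve_stock", reserve_stock),
    ("take_payment", take_payment),
])
order = {"id": 1}
orchestrator.run(order)
```

Because the orchestrator alone knows the step order and the current state, error handling and recovery have a single home.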

Choreography

services react to events
no central controller
the workflow emerges from the chain of events

Choreography
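Choreography can be sketched with a tiny in-memory event bus. The `EventBus` class, the event names, and the handlers are assumptions for illustration; a real deployment would typically use a message broker, and delivery would not be an in-process function call.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory bus: delivers each event to its subscribers."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self.handlers[event_type]:
            handler(payload)


bus = EventBus()
activity_log: list[str] = []


# Each service knows only which event it reacts to and which it emits.
def payment_service(order: dict) -> None:
    activity_log.append("payment taken")
    bus.publish("PaymentTaken", order)


def inventory_service(order: dict) -> None:
    activity_log.append("stock reserved")
    bus.publish("StockReserved", order)


def shipping_service(order: dict) -> None:
    activity_log.append("shipment scheduled")


bus.subscribe("OrderPlaced", payment_service)
bus.subscribe("PaymentTaken", inventory_service)
bus.subscribe("StockReserved", shipping_service)

bus.publish("OrderPlaced", {"id": 1})
```

No component holds the whole workflow; the order of steps is visible only by following the chain of subscriptions.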

Hybrids

an orchestrated outer workflow can delegate to choreographed sub-workflows

Hybrids

Summary

	Orchestration	Choreography
State owner	Central orchestrator	Distributed across services
Workflow control	Centralised → tighter coupling	Emergent / event-driven → loose coupling
Error handling	✅ Easier — one place	⚠️ Harder — spread across services
Responsiveness	Moderate	✅ High
Scalability / throughput	Moderate	✅ High
Fault tolerance	✅ Good	✅ Better
Recoverability	✅ Good	⚠️ Difficult
Workflow / state management	✅ Good	⚠️ Complex

3. Consistency

ACID Transactions

Atomicity — all steps in a transaction either fully succeed or fully roll back together; there is no partial completion.
Consistency — a transaction always moves the database from one valid state to another, never leaving data in a corrupt or rule-violating state.
Isolation — concurrent transactions are invisible to each other until committed, so one transaction cannot see another’s in-progress changes.
Durability — once a transaction is committed, the data is permanently saved and survives any subsequent system failure.

In a distributed context, standard ACID properties break down:

Property	Problem in Distributed Systems
Atomicity	Each service commits/rolls back independently — a failure mid-chain leaves partial state
Consistency	An error in one service (e.g. inventory) causes data inconsistency across others
Isolation	Inserted data may be visible to other services before the overall transaction completes
Durability	Data is only made permanent at the service level, not the transaction level

BASE Transactions

As an alternative, BASE (Basically Available, Soft state, Eventually consistent) is the natural fit for distributed, decoupled systems.

Eventual Consistency Patterns

Sample problem: Deleting a customer in one service must propagate to dependent services (wish list, preferences, etc.).

Sample problem: deleting customer 123 in a distributed system. 1 service is done, but 2 others still have customer 123 in their DB

3 patterns for propagating state changes from the sample above:

Background Synchronisation → a background job reads across databases to sync state.
⚠️ Creates direct database coupling between services.
Event-Based Data Synchronisation → the originating service fires an event; consumers react.
✅ Decoupled. Standard Event-Driven Architecture pattern.
Workflow Event Pattern → a dedicated workflow processor mediates between the event producer and consumers, handling ordering and retries.
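The event-based option can be sketched for the customer 123 example. The three dictionaries stand in for separate service databases and the `deque` stands in for a message queue; all names are hypothetical. The sketch also makes the inconsistency window visible: between the delete and the moment consumers process the event, the services disagree.

```python
from collections import deque

# Stand-ins for three independent service databases.
profile_db = {123: "Ada"}
wishlist_db = {123: ["book"]}
preferences_db = {123: {"newsletter": True}}

# Stand-in for a message queue between the services.
event_queue: deque = deque()


def delete_customer(customer_id: int) -> None:
    """Originating service: delete locally, then fire an event."""
    profile_db.pop(customer_id, None)
    event_queue.append(("CustomerDeleted", customer_id))


def drain_events() -> None:
    """Consumers react to queued events at their own pace."""
    while event_queue:
        event_type, customer_id = event_queue.popleft()
        if event_type == "CustomerDeleted":
            wishlist_db.pop(customer_id, None)
            preferences_db.pop(customer_id, None)


delete_customer(123)
# The profile is gone, but the dependent services have not caught up yet.
still_in_wishlist_before_drain = 123 in wishlist_db
drain_events()
```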

Compensating Updates

When a step fails partway through a distributed transaction, previously committed changes must be compensated (rolled back manually).

Fallacies of compensating updates:

The compensating update itself may fail
Side effects may have already occurred (e.g. an email sent, a charge processed)
State management becomes complex

Mitigation (by state management): Marking data as WIP (Work-In-Progress) vs PLACED allows services to avoid querying data not yet in a finalisable state, reducing inconsistency windows.

When a system incorporates a process for compensating updates, execution logs are crucial for traceability.
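A compensating workflow with state marking and an execution log might look like the sketch below. The step functions, the `StepFailed` exception, and the `CANCELLED` status are assumptions added for illustration; only WIP and PLACED come from the text above. The sketch does not handle a compensation that itself fails, which is the first fallacy listed.

```python
from typing import Callable


class StepFailed(Exception):
    """Raised by a step that cannot complete."""


# Hypothetical steps, each paired with its compensating update.
def reserve_stock(order: dict) -> None:
    order["reserved"] = True


def release_stock(order: dict) -> None:
    order["reserved"] = False


def charge_card(order: dict) -> None:
    raise StepFailed("card declined")


def refund_card(order: dict) -> None:
    order["charged"] = False


Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(order: dict, steps: list[Step], log: list[str]) -> bool:
    order["status"] = "WIP"  # other services should skip WIP orders
    done: list[tuple[str, Callable[[dict], None]]] = []
    for name, action, compensate in steps:
        try:
            action(order)
            log.append(f"done:{name}")
            done.append((name, compensate))
        except StepFailed:
            log.append(f"failed:{name}")
            # Undo committed steps in reverse order.
            for prev_name, undo in reversed(done):
                undo(order)
                log.append(f"compensated:{prev_name}")
            order["status"] = "CANCELLED"
            return False
    order["status"] = "PLACED"
    return True


execution_log: list[str] = []
order = {"id": 1}
succeeded = run_saga(
    order,
    [("reserve_stock", reserve_stock, release_stock),
     ("charge_card", charge_card, refund_card)],
    execution_log,
)
```

The execution log records every step and every compensation, which is what makes a failed saga traceable afterwards.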

4. Transactional Sagas

Transactional Sagas

8 Saga patterns can be formed by combining the 3 coupling forces:

Saga	Consistency	Coordination	Communication
Epic Saga	Atomic	Orchestration	Sync
Fantasy Fiction Saga	Atomic	Orchestration	Async
Phone Tag Saga	Atomic	Choreography	Sync
Horror Story Saga	Atomic	Choreography	Async
Fairy Tale Saga	Eventual	Orchestration	Sync
Parallel Saga	Eventual	Orchestration	Async
Time Travel Saga	Eventual	Choreography	Sync
Anthology Saga	Eventual	Choreography	Async
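The table is the full cross product of the three axes, which a short sketch can confirm. The names in the mapping come from the table above; the `saga_for` helper is an illustrative addition.

```python
from itertools import product

# (consistency, coordination, communication) -> saga name, as in the table.
SAGA_NAMES = {
    ("Atomic", "Orchestration", "Sync"): "Epic Saga",
    ("Atomic", "Orchestration", "Async"): "Fantasy Fiction Saga",
    ("Atomic", "Choreography", "Sync"): "Phone Tag Saga",
    ("Atomic", "Choreography", "Async"): "Horror Story Saga",
    ("Eventual", "Orchestration", "Sync"): "Fairy Tale Saga",
    ("Eventual", "Orchestration", "Async"): "Parallel Saga",
    ("Eventual", "Choreography", "Sync"): "Time Travel Saga",
    ("Eventual", "Choreography", "Async"): "Anthology Saga",
}

# 2 x 2 x 2 options give exactly the 8 combinations.
all_combinations = list(product(
    ["Atomic", "Eventual"],
    ["Orchestration", "Choreography"],
    ["Sync", "Async"],
))


def saga_for(consistency: str, coordination: str, communication: str) -> str:
    """Look up the saga pattern for one choice on each axis."""
    return SAGA_NAMES[(consistency, coordination, communication)]
```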

Saga Descriptions

🗡️ Epic Saga — “A long-running, heroic story”

Epic Saga star rating

Mimics a non-distributed transactional interaction
Holistic transactional coordination increases coupling
Easy to understand; difficult to implement

Use when: Each step must complete before the next starts; absolute transactionality matters more than responsiveness.

📖 Fantasy Fiction Saga — “A complex story that’s hard to believe in the end”

Fantasy Fiction Saga star rating

Moving to async communication improves performance/responsiveness…
…but introduces concurrency issues that may be worse than the original performance problems

Use when: A first attempt at improving an Epic Saga; responsiveness matters more than strict ordering.

🏡 Fairy Tale Saga — “An easy story with a pleasant ending”

Fairy Tale Saga star rating

Sync + orchestrated = easiest to reason about
The Orchestrated Saga in Chris Richardson’s Microservices Patterns book

Use when: Medium to complex workflows that don’t need extreme scale. Default choice for most situations.

⚡ Parallel Saga — “Multiple stories running at the same time”

Parallel Saga star rating

Orchestrator allows complex workflows with concurrency
Highly attractive for complex workflows at high scale
Difficult if ordering of updates matters

📞 Phone Tag Saga — “Like the game of phone tag”

Phone Tag Saga star rating

An unusual combination: atomic + choreography
Adds scalability to a simple transactional workflow when the orchestrator becomes a bottleneck
Not common in practice

😱 Horror Story Saga — “A nightmare”

Horror Story Saga star rating

Attempts atomic workflows without a coordinator, with concurrency on top
Workflow, error handling, boundary conditions, and transactionality are spread across domain services
Middling performance coupled with impossible-to-reproduce errors
Actually an anti-pattern

⚠️ Not uncommon in practice. Usually a well-intentioned but flawed attempt to achieve high performance with atomicity.

⏳ Time Travel Saga — “A problem that moves atomically through time”

Time Travel Saga star rating

No orchestrator makes complex workflows difficult
Best for pipeline problems (chain-of-responsibility, pipes-and-filters) with staging or additive workflows
Works best when synchronous communication is acceptable

📚 Anthology Saga — “A loosely associated group of short stories”

Anthology Saga star rating

The Choreographed Saga in Chris Richardson’s Microservices Patterns book
Polar opposite of the Epic Saga
Best for non-transactional pipes-and-filters architecture styles
Highly scalable due to lack of coupling - the least coupled option

Quantitative Trade-off Summary

Dynamic coupling forces axes

Across the 3 axes of dynamic coupling forces, the Consistency choice has the greatest impact on saga quality, followed by Coordination style, then Communication mode.

Consistency → Coordination → Communication

Axis	Impact on Decoupling / Scalability
Consistency: Atomic → Eventual	+163% improvement
Coordination: Orchestration → Choreography	+143% improvement
Communication: Sync → Async	+55% improvement

Saga Profiles

Below are the score cards for the 8 transactional sagas:

Saga profiles

The table above gives simplified guidance on which system architecture to choose for a given set of characteristic priorities.

For example, for a distributed system with low response latency, the table narrows the choice as shown below:

Choice of responsive distributed system architecture

Now the only choice left is between an orchestrated and a choreographed system. If scalability is the next priority, then the Anthology Saga is the pattern to follow. If simplicity is the next priority, the Parallel Saga is the pattern.

The table also shows that what drives an orchestrated system down on scalability and responsiveness is not the orchestration itself but the atomic transactions. Once we change the transactions to be eventually consistent, we should expect improvements in the responsiveness and scalability of the system.

Atomic vs eventual orchestrated system for better responsiveness and scalability

Transactional Sharding

For very high-volume scenarios, domain sharding distributes load across multiple Saga instances for better performance/responsiveness.

Example: a concert ticket sale spike can be handled by sharding by seat section or venue zone, avoiding a single transaction bottleneck.

Example - transactional sharding by seat section for high volume concert

Domain sharding maps naturally onto microservices: services can be sharded by geography or specialisation, allowing concurrent saga instances without contention.

Example - transactional sharding by geographical location
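Routing requests to shards can be sketched with a stable hash of the shard key. The shard count, the `shard_for` and `route` helpers, and the sample requests are assumptions for illustration. `zlib.crc32` is used because Python's built-in `hash` for strings varies between processes, so it cannot serve as a routing key across services.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed number of independent saga instances


def shard_for(section: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a seat section to a shard using a stable hash (crc32)."""
    return zlib.crc32(section.encode("utf-8")) % num_shards


def route(requests: list[tuple[str, str]]) -> dict[int, list[tuple[str, str]]]:
    """Group (section, seat) requests by the shard that should handle them."""
    by_shard: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for section, seat in requests:
        by_shard[shard_for(section)].append((section, seat))
    return dict(by_shard)


requests = [("A", "A-1"), ("B", "B-7"), ("A", "A-2")]
routed = route(requests)
```

All requests for the same section land on the same shard, so contention for a seat stays inside one saga instance while other sections proceed in parallel.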

Key Principles Summary

Principle	Implication
Synchronous calls create dynamic quantum entanglement	Default to async where responsiveness and scale matter
Looser contracts = less brittle software	Prefer value-based contracts in microservices
Stamp coupling wastes bandwidth	Send only the data a consumer needs
ACID breaks in distributed systems	Accept eventual consistency where business rules allow
Compensating updates have fallacies	Use state management to reduce inconsistency windows
Fairy Tale Saga is the default	Use it for most medium-complexity workflows
Horror Story Saga is an antipattern	Avoid atomic + choreography + async
Anthology Saga is the most scalable	Use for pipelines that don’t require transactionality

Rodney Taylor
