Bulkhead Pattern

A resilience pattern that isolates elements of an application into separate resource pools so that if one component fails or becomes overloaded, the others continue to function. This prevents resource exhaustion in a single component from causing total system failure.

Problem

When multiple services or features share the same resource pool (thread pool, connection pool, memory), an overloaded or failing component can exhaust all shared resources. This causes all other components to fail, even those that are otherwise healthy. A slow third-party service, for instance, can consume all available threads and take down the entire application.

Solution / Explanation

The pattern is named after the watertight bulkheads used in ships: separate compartments that can be individually flooded without sinking the whole vessel.

In software, a bulkhead isolates resources so that:

  • Each critical component or downstream dependency gets its own dedicated pool.
  • A failure or saturation in one pool does not bleed into others.
  • The rest of the application remains available even when one bulkhead is breached.

Implementation Approaches

Thread Pool Isolation

Assign each dependency (Service A, Service B, external API) its own thread pool with a fixed size. If Service A hangs, only its thread pool fills up. The main thread pool and Service B’s pool remain available.

┌─────────────────────────────────────────────┐
│ Application                                 │
│ ┌──────────────┐ ┌──────────────┐           │
│ │ Thread Pool A│ │ Thread Pool B│ ...       │
│ │ (Service A)  │ │ (Service B)  │           │
│ └──────────────┘ └──────────────┘           │
└─────────────────────────────────────────────┘
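
A minimal sketch of thread pool isolation, using Python's standard concurrent.futures module. The pool sizes, the sleep duration, and the functions call_service_a and call_service_b are assumptions made for illustration; they stand in for real remote calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One fixed-size pool per dependency; the sizes are illustrative only.
pool_a = ThreadPoolExecutor(max_workers=2, thread_name_prefix="service-a")
pool_b = ThreadPoolExecutor(max_workers=2, thread_name_prefix="service-b")


def call_service_a() -> str:
    """Hypothetical slow dependency: stands in for a hanging remote call."""
    time.sleep(0.2)
    return "a-ok"


def call_service_b() -> str:
    """Hypothetical healthy dependency."""
    return "b-ok"


# Saturate pool A: two calls run, the other four wait in pool A's queue.
a_futures = [pool_a.submit(call_service_a) for _ in range(6)]

# Pool B has its own threads, so this call is not stuck behind Service A.
b_result = pool_b.submit(call_service_b).result(timeout=1.0)

# Callers of A still bound their wait instead of blocking indefinitely.
try:
    a_futures[-1].result(timeout=0.05)
    a_timed_out = False
except FutureTimeout:
    a_timed_out = True

pool_a.shutdown(wait=True)
pool_b.shutdown(wait=True)
```

Note that ThreadPoolExecutor queues excess work without a limit, so this sketch isolates threads but not queue growth. A production bulkhead would probably also cap or reject queued calls.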

Connection Pool Isolation

Each downstream service gets its own database or HTTP connection pool. Exhaustion of connections to one database does not affect connections to others.
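
The idea can be sketched with a toy fixed-size pool per dependency. Everything here is hypothetical: the ConnectionPool class, the PoolExhausted error, the pool names, and the placeholder strings that stand in for real connections. A real service would rely on the pooling built into its database driver or HTTP client.

```python
import queue
from contextlib import ExitStack, contextmanager


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the wait limit."""


class ConnectionPool:
    """Tiny fixed-size pool; the 'connections' are placeholder strings."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self._free: queue.Queue = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"{name}-conn-{i}")

    @contextmanager
    def connection(self, wait_seconds: float = 0.05):
        try:
            conn = self._free.get(timeout=wait_seconds)
        except queue.Empty:
            raise PoolExhausted(self.name) from None
        try:
            yield conn
        finally:
            self._free.put(conn)  # always hand the connection back


# One pool per downstream store (names are hypothetical).
orders_db = ConnectionPool("orders-db", size=2)
reports_db = ConnectionPool("reports-db", size=2)

with ExitStack() as held:
    # Simulate a saturated reports database: hold every connection.
    for _ in range(2):
        held.enter_context(reports_db.connection())

    try:
        with reports_db.connection():
            reports_ok = True
    except PoolExhausted:
        reports_ok = False

    # The orders pool is unaffected by the exhaustion above.
    with orders_db.connection() as conn:
        orders_conn = conn
```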

Process / Container Isolation

The strongest form: each component runs in a separate process or container, with its own CPU, memory, and network limits. Failures are fully isolated at the OS level.
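
As a rough illustration of the process boundary only, the sketch below runs two hypothetical components in separate interpreter processes using the standard subprocess module. The component snippets and the run_isolated helper are assumptions. The sketch sets a time limit but no CPU, memory, or network limits; those would normally come from the container runtime or the operating system.

```python
import subprocess
import sys

# Hypothetical components, each executed in its own OS process.
CRASHING_COMPONENT = "import os; os._exit(3)"
HEALTHY_COMPONENT = "print('healthy')"


def run_isolated(source: str, timeout_seconds: float = 5.0) -> subprocess.CompletedProcess:
    """Run a snippet in a separate interpreter with a hard time limit."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )


# The crash is visible to the parent only as a non-zero exit code.
crashed = run_isolated(CRASHING_COMPONENT)

# The parent process, and any other component, keeps working.
healthy = run_isolated(HEALTHY_COMPONENT)
```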

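Semaphore-Based Isolation

A lighter alternative to thread pools: a semaphore caps the number of concurrent calls to a dependency. It avoids the cost of extra threads and hand-offs between them, but the call runs on the caller's own thread, so a slow dependency still blocks that thread until the call returns.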
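
A minimal sketch of a semaphore bulkhead built on threading.BoundedSemaphore. The SemaphoreBulkhead class, the BulkheadFull error, and the limit of two concurrent calls are assumptions for illustration.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the dependency already has the maximum calls in flight."""


class SemaphoreBulkhead:
    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once rather than queue the caller.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull()
        try:
            # fn runs on the caller's own thread, so a slow fn still holds
            # that thread until it returns.
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


bulkhead = SemaphoreBulkhead(max_concurrent=2)
release = threading.Event()
entered = threading.Semaphore(0)


def slow_call() -> str:
    entered.release()          # signal that a slot is now occupied
    release.wait(timeout=5)    # simulate a slow dependency
    return "done"


workers = [threading.Thread(target=bulkhead.call, args=(slow_call,)) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    entered.acquire(timeout=5)  # wait until both slots are held

try:
    bulkhead.call(lambda: "extra")
    rejected = False
except BulkheadFull:
    rejected = True             # third concurrent call is turned away

release.set()
for w in workers:
    w.join()
after_release = bulkhead.call(lambda: "extra")  # slots are free again
```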

Bulkhead + Circuit Breaker

Bulkheads and circuit breakers are complementary:

  • Bulkhead: limits how many calls can be in-flight simultaneously.
  • Circuit breaker: decides whether calls may proceed at all, rejecting them while the dependency has recently been failing.

Together they bound both the resources a slow dependency can hold and the number of calls sent to a dependency that is already failing.
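
One way the two guards might be combined is sketched below. The CircuitBreaker class, the guarded_call helper, the Rejected error, and the thresholds are all hypothetical, and the breaker is deliberately simplified: it counts consecutive failures and lets calls through again after a cool-down.

```python
import threading
import time


class Rejected(Exception):
    """Raised when either guard refuses the call."""


class CircuitBreaker:
    """Minimal consecutive-failure breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._failures = 0
        self._opened_at = None
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            if self._opened_at is None:
                return True
            # After the cool-down, let calls through again as a trial.
            return time.monotonic() - self._opened_at >= self.reset_after

    def record(self, success: bool) -> None:
        with self._lock:
            if success:
                self._failures = 0
                self._opened_at = None
            else:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._opened_at = time.monotonic()


def guarded_call(fn, slots: threading.BoundedSemaphore, breaker: CircuitBreaker):
    if not breaker.allow():                # breaker: may we call right now?
        raise Rejected("circuit open")
    if not slots.acquire(blocking=False):  # bulkhead: is a slot free?
        raise Rejected("bulkhead full")
    try:
        result = fn()
    except Exception:
        breaker.record(success=False)
        raise
    else:
        breaker.record(success=True)
        return result
    finally:
        slots.release()


def failing_dependency():
    raise ConnectionError("dependency down")  # hypothetical failure


slots = threading.BoundedSemaphore(2)
breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0)

outcomes = []
for _ in range(5):
    try:
        guarded_call(failing_dependency, slots, breaker)
    except ConnectionError:
        outcomes.append("failed")
    except Rejected as exc:
        outcomes.append(str(exc))
```

In this sketch the breaker is checked first, so calls rejected by an open circuit never occupy a bulkhead slot.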

When to Use

  • Applications that call multiple downstream services with different reliability characteristics.
  • Services where one consumer or tenant must not degrade the experience for others.
  • Critical components that must remain available even when non-critical ones fail.
  • High-throughput services where resource pool exhaustion is a realistic failure mode.

Not suitable when:

  • The overhead of separate pools outweighs the isolation benefit (very simple services).
  • Resources are constrained and cannot be divided (e.g., very low-memory environments).

Trade-offs

Benefit                                 Drawback
--------------------------------------  ------------------------------------------------------------
Limits blast radius of failures         Resource overhead: dedicated pools use more total resources
Critical paths remain available         Thread pool sizing requires careful capacity planning
Prevents cascading resource exhaustion  More complex configuration and monitoring
Enables graceful degradation            Undersized pools cause premature rejection of valid requests