Retry Pattern

A resilience pattern that automatically retries a failed operation a configurable number of times before propagating the failure to the caller. It is used to handle transient faults: temporary failures after which the same operation is likely to succeed on a subsequent attempt.

When to Apply

  • Transient network errors or timeouts
  • Temporary unavailability of a downstream service
  • Rate-limiting responses (e.g., HTTP 429) that will resolve after a short wait

Do not retry non-transient errors such as authentication failures, invalid request formats, or business logic rejections. Retrying those wastes resources without any chance of success.
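The distinction above can be sketched as a small classifier. This is an illustrative assumption rather than a fixed rule: the names TRANSIENT_STATUS_CODES, TRANSIENT_EXCEPTIONS, is_transient_status, and is_transient_exception are hypothetical, and the exact set of retryable status codes depends on the contract of the service being called.

```python
# Hypothetical classification; adjust both sets to the downstream service's contract.
# 408 Request Timeout, 429 Too Many Requests, 502/503/504 gateway and availability errors.
TRANSIENT_STATUS_CODES = frozenset({408, 429, 502, 503, 504})

# Built-in exception types that usually signal a temporary network problem.
TRANSIENT_EXCEPTIONS = (TimeoutError, ConnectionError)


def is_transient_status(status_code: int) -> bool:
    """Return True if an HTTP status code is worth retrying."""
    return status_code in TRANSIENT_STATUS_CODES


def is_transient_exception(exc: BaseException) -> bool:
    """Return True if an exception looks like a transient fault."""
    return isinstance(exc, TRANSIENT_EXCEPTIONS)
```

With this sketch, a 429 or a TimeoutError would be retried, while a 401 (authentication failure) or a ValueError raised by request validation would be propagated immediately.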

Retry Strategies

Strategy             Description
Fixed delay          Wait a constant interval between each attempt
Exponential backoff  Double the wait time after each failure (e.g., 1s, 2s, 4s, 8s)
Jitter               Add randomness to the backoff delay to avoid thundering herd across many callers
Immediate            Retry instantly (only for very short transients)

Exponential backoff with jitter is the recommended default for most distributed systems.
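A minimal sketch of how such delays might be computed. The function names, the base and cap parameters, and the choice of "full jitter" (a uniform random delay between zero and the exponential ceiling) are assumptions made for illustration; other jitter variants exist and the text above does not prescribe one.

```python
import random


def exponential_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Deterministic backoff: base * 2^attempt seconds, limited to cap.

    `attempt` is 0-based, so the first retry waits `base` seconds.
    """
    return min(cap, base * (2 ** attempt))


def jittered_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and the exponential ceiling.

    Randomizing the wait spreads retries from many callers over time
    instead of letting them all fire at the same instant.
    """
    return random.uniform(0.0, exponential_delay(attempt, base, cap))
```

With the default base of one second, exponential_delay yields 1, 2, 4, and 8 seconds for attempts 0 through 3, matching the schedule in the table; jittered_delay returns a value somewhere between zero and that ceiling. The cap keeps late attempts from waiting unreasonably long.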

Limits

Always set a maximum retry count and a total timeout. Unbounded retries can amplify load on an already-struggling downstream system. Combine with the Circuit Breaker Pattern to stop retrying once a service is confirmed to be unhealthy.
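The two limits can be enforced together in one loop. The sketch below is an assumption-laden illustration, not a reference implementation: the retry function, its parameters, and the fixed delay are hypothetical, and it does not include a circuit breaker. The sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time


def retry(operation, max_attempts=4, total_timeout=10.0, delay=0.5,
          retry_on=(TimeoutError, ConnectionError),
          sleep=time.sleep, clock=time.monotonic):
    """Call `operation` until it succeeds or a limit is reached.

    Stops when `max_attempts` calls have been made, or when waiting
    another `delay` seconds would pass the `total_timeout` deadline.
    Exceptions not listed in `retry_on` propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + total_timeout
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            out_of_attempts = attempt == max_attempts
            out_of_time = clock() + delay > deadline
            if out_of_attempts or out_of_time:
                raise  # propagate the last failure to the caller
            sleep(delay)
```

Whichever limit is hit first ends the loop, and the caller sees the most recent exception. A monotonic clock is used for the deadline so that system clock adjustments do not shorten or extend the retry budget.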

Idempotency Requirement

Retried operations must be idempotent: repeating the call should have the same effect as calling it once. Otherwise, a retry issued after a lost response can apply the operation twice. See Idempotency.
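One common way to make a non-idempotent operation safe to retry is to attach a client-generated key and have the receiver remember the result for that key. The sketch below assumes this approach; PaymentService, charge, and idempotency_key are hypothetical names, and the in-memory dictionary stands in for whatever durable store a real service would use.

```python
class PaymentService:
    """Toy service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._results = {}   # idempotency key -> stored receipt
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A repeated key means this is a retry: return the stored
        # receipt instead of charging a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount)
        receipt = {"charge_id": len(self.charges), "amount": amount}
        self._results[idempotency_key] = receipt
        return receipt
```

Calling charge twice with the same key returns the same receipt and records a single charge, so a caller that retries after a timeout does not bill the customer twice.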