You shipped a C service. It’s fast on your laptop. Then production heat hits: a bursty client floods you, a downstream dependency hiccups, latencies spike, and suddenly your process is paging the box to death. This post is a practical guide to making C services resilient under real load—where the kernel pushes back, networks wobble, and correctness must survive retries.
We’ll build from first principles: what backpressure actually is (and isn’t), how to budget finite resources, and how to wire small, boring loops that keep making progress without melting down. Along the way we’ll set the stage for retries (without storms) and idempotency (so “try again” doesn’t duplicate work).
Resilience in one minute
- Backpressure: A deliberate policy that limits intake and output to what the system can safely handle now. Bound bytes, messages, concurrency, and time. When a limit is hit, you shed, defer, or degrade.
- Retries: A recovery tool, not a throughput tool. Retrying without budgets and jitter amplifies load and turns small blips into outages.
- Idempotency: The property that handling the same logical request twice has the same effect as once. This is how you make retries safe.
If you only remember one thing: shape the flow with budgets, make wait loops cancelable and deadline-driven, and design handlers so the same input can be applied more than once without harm.
Failure modes you will actually see
- Unlimited queues grow until the OS kills you; GC-like stalls appear as the allocator churns under pressure.
- A single slow peer monopolizes buffers, starving others; tail latencies explode.
- Downstream hiccups trigger retry storms—more requests arrive than can complete, so the queue climbs and never drains.
- Duplicate side effects (double writes, duplicate payments) when clients retry but your server isn’t idempotent.
Backpressure: what it is and what it buys you
Backpressure is not “drop packets randomly.” It’s the disciplined application of limits with explicit behavior at the boundary. You choose:
- What to bound: bytes, messages, concurrent requests, CPU time, file descriptors.
- Where to bound: per-connection, per-tenant, per-queue, and globally.
- What to do on overflow: shed (reject early), wait (with a deadline), or degrade (serve a cheaper path).
Done right, backpressure converts chaos into predictable failure—requests time out and surface errors instead of accumulating silently.
Budgets: the vocabulary of limits
Think in four budgets:
- Byte budget: cap buffers (in/out queues) per connection and system-wide.
- Message budget: cap queued messages per connection and globally.
- Concurrency budget: cap in-flight requests per tenant and total.
- Time budget: every operation has a deadline; when it expires, you cancel and clean up.
Budgets compose. For example, a write path might enforce a per-connection byte cap, a global byte cap, and a per-request deadline. If any is exceeded, you stop pushing and surface a clear status.
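As a minimal sketch of that composition (the struct and status names below are illustrative, not from any library), a write path might run one cheap admission check that consults every budget and returns a clear status:
#include <stddef.h>
enum admit { ADMIT_OK, ADMIT_TIMEOUT, ADMIT_SHED_CONN, ADMIT_SHED_GLOBAL };
struct write_budgets {
  size_t conn_bytes, conn_bytes_cap;     // per-connection byte budget
  size_t global_bytes, global_bytes_cap; // system-wide byte budget
};
// Check every budget before queueing len more bytes; ms_left is the request's remaining time budget.
static enum admit admit_write(const struct write_budgets *b, size_t len, int ms_left) {
  if (ms_left <= 0) return ADMIT_TIMEOUT;
  if (b->conn_bytes + len > b->conn_bytes_cap) return ADMIT_SHED_CONN;
  if (b->global_bytes + len > b->global_bytes_cap) return ADMIT_SHED_GLOBAL;
  return ADMIT_OK;
}
Each non-OK status maps to one explicit behavior: timeout, shed this connection's write, or shed globally.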
Bounded queues everywhere
Unbounded buffers are memory leaks with better marketing. Use small ring buffers with explicit capacity, and make enqueue/dequeue return status codes that let callers apply policy.
#include <stdbool.h>
#include <stddef.h>
#define QCAP 64
struct bufseg { const void *data; size_t len; };
struct ringq {
struct bufseg q[QCAP];
unsigned head, tail; // head = next to pop, tail = next to push
unsigned count; // number of occupied entries
};
static inline void rq_init(struct ringq *rq) { rq->head = rq->tail = rq->count = 0; }
static inline bool rq_push(struct ringq *rq, struct bufseg s) {
if (rq->count == QCAP) return false; // full → caller must shed/defer
rq->q[rq->tail] = s; rq->tail = (rq->tail + 1u) % QCAP; rq->count++; return true;
}
static inline bool rq_pop(struct ringq *rq, struct bufseg *out) {
if (rq->count == 0) return false; // empty
*out = rq->q[rq->head]; rq->head = (rq->head + 1u) % QCAP; rq->count--; return true;
}
Policy lives at the edges: if rq_push returns false, decide whether to drop, delay, or send backpressure upstream. Keep the queue small; large queues don’t improve throughput, they only add latency variance.
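For example, a caller might wrap rq_push with its shed policy (this helper is a hypothetical sketch, not part of the queue API above):
// Hypothetical enqueue-with-policy helper: accept if there is room, otherwise
// count the drop so the shed rate shows up in metrics.
static bool enqueue_or_shed(struct ringq *rq, struct bufseg s, unsigned long *dropped) {
  if (rq_push(rq, s)) return true; // accepted
  (*dropped)++;                    // full: shed here, or signal backpressure upstream
  return false;
}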
Treat EAGAIN as a scheduling signal
On nonblocking sockets and pipes, EAGAIN/EWOULDBLOCK is how the kernel says “not now.” Your write path should:
- Try to flush from the head of the queue.
- Stop on EAGAIN and enable writable notifications.
- Resume when notified, with a bounded amount of work per wakeup (fairness).
#include <errno.h>
#include <sys/uio.h>
#include <unistd.h>
static void advance_iovecs(struct iovec **piov, int *piovcnt, size_t bytes) {
struct iovec *iov = *piov; int cnt = *piovcnt; size_t left = bytes;
while (cnt > 0 && left > 0) {
if (left >= iov->iov_len) { left -= iov->iov_len; ++iov; --cnt; }
else { iov->iov_base = (char*)iov->iov_base + left; iov->iov_len -= left; left = 0; }
}
*piov = iov; *piovcnt = cnt;
}
// Returns: 1 fully flushed; 0 stop for now (EAGAIN or byte budget spent) → re-arm POLLOUT; -1 hard error.
// *piov/*piovcnt are advanced in place so the caller can resume exactly where the flush stopped.
static int flush_budgeted(int fd, struct iovec **piov, int *piovcnt, size_t max_bytes) {
  size_t sent = 0;
  while (*piovcnt > 0 && sent < max_bytes) {
    struct iovec *iov = *piov;
    ssize_t w;
    if (iov[0].iov_len > max_bytes - sent) {
      // Head segment exceeds the remaining budget: clamp it to stay within max_bytes.
      struct iovec tmp = { .iov_base = iov[0].iov_base, .iov_len = max_bytes - sent };
      w = writev(fd, &tmp, 1);
    } else {
      // A full writev may overshoot the budget by the trailing segments; acceptable for coarse fairness.
      w = writev(fd, iov, *piovcnt);
    }
    if (w > 0) { sent += (size_t)w; advance_iovecs(piov, piovcnt, (size_t)w); continue; }
    if (w == -1 && errno == EINTR) continue;
    if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
    return -1;
  }
  return *piovcnt == 0 ? 1 : 0;
}
This function keeps fairness by honoring a byte budget. When it returns 0, re-arm writable interest; when it returns 1, disable it.
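One way to wire those return values into an event loop is to toggle writable interest on the fd. This is a Linux/epoll sketch; the registration details (and that the fd was already added with EPOLL_CTL_ADD) are assumptions:
#include <sys/epoll.h>
// Keep EPOLLIN always; add EPOLLOUT only while there is unflushed output.
static int set_writable_interest(int epfd, int fd, int want_write) {
  struct epoll_event ev;
  ev.events = EPOLLIN | (want_write ? EPOLLOUT : 0);
  ev.data.fd = fd;
  return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev); // 0 on success, -1 on error
}
// flush_budgeted(...) == 0 → set_writable_interest(epfd, fd, 1)
// flush_budgeted(...) == 1 → set_writable_interest(epfd, fd, 0)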
Admission control beats backlog
When global memory or CPU crosses a threshold, stop accepting new work and fail fast. Two practical levers:
- Listen socket gating: temporarily pause accepts (e.g., don’t register the listen fd for readable events) when global byte or request counters exceed a cap; resume with hysteresis.
- Per-tenant caps: if a single client exceeds its byte or request budget, drop or throttle that tenant only.
“Failing fast” sounds harsh, but it’s kinder than accepting work you cannot finish. Clients see a clear signal and can back off.
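To make the listen-gating lever concrete, here is a minimal sketch assuming a single global byte counter and two thresholds (the names and thresholds are illustrative):
#include <stdbool.h>
#include <stddef.h>
struct accept_gate { size_t high_bytes, low_bytes; bool paused; };
// Pause accepts above high_bytes; resume only below low_bytes (hysteresis avoids flapping).
static bool gate_should_accept(struct accept_gate *g, size_t global_queued_bytes) {
  if (!g->paused && global_queued_bytes >= g->high_bytes) g->paused = true;
  else if (g->paused && global_queued_bytes <= g->low_bytes) g->paused = false;
  return !g->paused; // false → leave the listen fd out of the poll set this round
}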
Deadlines and cancellation: time is a first-class budget
Every operation should carry a deadline. Reads and writes honor it by waiting only until the deadline and then surfacing a timeout. Your event loop integrates a timer source (e.g., timerfd on Linux or EVFILT_TIMER on BSD) so long waits are cancelable and the loop stays responsive.
#include <errno.h>
#include <poll.h>
#include <time.h>
static int ms_left(struct timespec deadline) {
struct timespec now; clock_gettime(CLOCK_MONOTONIC, &now);
long ms = (long)((deadline.tv_sec - now.tv_sec) * 1000)
+ (long)((deadline.tv_nsec - now.tv_nsec) / 1000000);
if (ms < 0) return 0; if (ms > 0x3fffffff) return 0x3fffffff; return (int)ms;
}
// Returns 1 ready, 0 timeout, -1 error
static int wait_readable_until(int fd, struct timespec deadline) {
struct pollfd p = { .fd = fd, .events = POLLIN };
for (;;) {
int r = poll(&p, 1, ms_left(deadline));
if (r > 0) return 1; if (r == 0) return 0; if (r < 0 && errno == EINTR) continue; return -1;
}
}
Deadlines keep the system honest: stalled peers don’t tie up memory forever.
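If the loop multiplexes many deadlines, the timer source mentioned above keeps the wait cancelable without computing poll timeouts by hand. A minimal Linux-specific sketch (the helper name is an assumption):
#include <sys/timerfd.h>
#include <time.h>
// Arm tfd to fire at an absolute CLOCK_MONOTONIC deadline (typically the loop's earliest one).
static int arm_deadline_timer(int tfd, struct timespec deadline) {
  struct itimerspec its = { .it_interval = { 0, 0 }, .it_value = deadline };
  return timerfd_settime(tfd, TFD_TIMER_ABSTIME, &its, NULL); // 0 on success
}
// Create once: int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
// then poll tfd for readability alongside your sockets.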
Preface: retries and idempotency (why we care)
- A retry without backoff and budgets is a DoS against yourself. If your server is slow because a downstream is struggling, unbounded retries add more load to the same hot path.
- Idempotency turns retries from “maybe double-apply” into “safe to re-apply.” That means request IDs, deduplication stores, and handlers that check before doing.
We’ll dig into concrete retry strategies (jitter, hedging, caps) and idempotency techniques (keys, stores, exactly-once side effects) next. For now, the backpressure foundation lets those tools work instead of backfiring.
Retries that recover instead of overload
Retries are powerful when scoped by time and attempts, targeted to transient failures, and randomized to avoid stampedes. Treat them as a way to harvest success from noise—not as a way to add throughput.
Classify before you retry
- Retryable: timeouts, EAGAIN on remote capacity, 5xx-like upstream errors, transient network resets.
- Do not retry: validation failures (4xx-like), idempotency violations, permanent resource errors (e.g., permission denied).
- Maybe retry once: connection establishment races, DNS lookups, and only if your budget allows.
Make the classification explicit in your error model so call sites don’t guess.
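As a hedged illustration (this mapping is an assumption, not a universal rule), the classification can be one switch over errno that every call site shares:
#include <errno.h>
enum err_class { ERR_RETRYABLE, ERR_RETRY_ONCE, ERR_FATAL };
static enum err_class classify_errno(int e) {
  switch (e) {
  case ETIMEDOUT: case EAGAIN: case EINTR:  return ERR_RETRYABLE;  // transient pressure
  case ECONNRESET: case ECONNREFUSED:       return ERR_RETRY_ONCE; // maybe once, if budget allows
  case EACCES: case EINVAL: case EPIPE:
  default:                                  return ERR_FATAL;      // don't retry at this layer
  }
}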
Exponential backoff with jitter
Plain exponential backoff aligns clients and creates waves. Add jitter so attempts smear over time. Two common flavors work well:
- Full jitter: sleep a random time uniformly in [0, backoff].
- Equal jitter: sleep around half the backoff ± half (tighter distribution).
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
static uint64_t now_ms(void) {
struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts);
return (uint64_t)ts.tv_sec*1000ull + (uint64_t)ts.tv_nsec/1000000ull;
}
static void sleep_ms(unsigned ms) {
struct timespec ts = { .tv_sec = ms/1000, .tv_nsec = (long)(ms%1000)*1000000L };
nanosleep(&ts, NULL);
}
static unsigned exp_backoff_ms(unsigned base_ms, int attempt, unsigned max_ms) {
  // attempt starts at 1; clamp the shift so a large attempt count cannot overflow
  int shift = attempt - 1; if (shift > 31) shift = 31;
  uint64_t d = (uint64_t)base_ms << shift;
  if (d > max_ms) d = max_ms;
  return (unsigned)d;
}
static unsigned jitter_full(unsigned ms) {
static int seeded = 0; if (!seeded) { seeded = 1; srand((unsigned)now_ms()); }
if (ms == 0) return 0;
return (unsigned)(rand() % (ms + 1)); // [0, ms]
}
static unsigned jitter_equal(unsigned ms) {
// ~ms/2 ± ms/2
unsigned half = ms/2u; unsigned r = jitter_full(ms - half);
return half + r;
}
A deadline-aware retry wrapper
Bind retries to a hard deadline and a max-attempts cap. The attempt function returns whether to retry based on the error class.
enum attempt_result { ATTEMPT_OK = 0, ATTEMPT_RETRY = 1, ATTEMPT_FATAL = -1 };
typedef int (*attempt_fn)(void *ctx, int attempt, int ms_left);
// Returns 0 on success, -1 on failure (fatal or deadline reached)
int retry_with_backoff(void *ctx,
attempt_fn fn,
int max_attempts,
unsigned base_ms,
unsigned max_backoff_ms,
int use_equal_jitter,
uint64_t deadline_ms) {
for (int a = 1; a <= max_attempts; ++a) {
int ms_left = (int)((deadline_ms > now_ms()) ? (deadline_ms - now_ms()) : 0);
if (ms_left <= 0) return -1; // deadline
int r = fn(ctx, a, ms_left);
if (r == ATTEMPT_OK) return 0;
if (r == ATTEMPT_FATAL) return -1;
if (a == max_attempts) return -1;
unsigned back = exp_backoff_ms(base_ms, a, max_backoff_ms);
unsigned sleep = use_equal_jitter ? jitter_equal(back) : jitter_full(back);
if ((int)sleep > ms_left) sleep = (unsigned)ms_left;
sleep_ms(sleep);
}
return -1;
}
Example attempt function for a write with timeout (pseudo-hardened for brevity):
#include <errno.h>
#include <unistd.h>
struct write_ctx { int fd; const void *buf; size_t len; };
static int attempt_write(void *vctx, int attempt, int ms_left) {
(void)attempt; // attempt index available for logging/branching
struct write_ctx *c = (struct write_ctx *)vctx;
ssize_t w = write(c->fd, c->buf, c->len);
if (w == (ssize_t)c->len) return ATTEMPT_OK;
if (w >= 0) return ATTEMPT_FATAL; // short write on blocking fd: treat as fatal
if (errno == EINTR) return ATTEMPT_RETRY;
if (errno == EAGAIN || errno == EWOULDBLOCK) {
// Wait for writable within ms_left (left as exercise to integrate poll/epoll)
if (ms_left <= 0) return ATTEMPT_FATAL;
return ATTEMPT_RETRY;
}
// Permanent errors like EPIPE/ECONNRESET usually shouldn’t be retried by this layer
return ATTEMPT_FATAL;
}
Key points:
- Cap attempts and honor deadlines strictly.
- Only retry errors you’ve classified as transient.
- Apply jitter; otherwise, many clients wake simultaneously.
Hedged requests (read-only, idempotent)
Hedging improves tail latency by sending a duplicate after a threshold (e.g., your P95). It must be used only for safe, idempotent reads. Cancel the loser promptly to avoid extra work.
Sketch:
// Start primary; if not done by hedge_ms and budget allows, start a secondary.
// Whichever completes first wins; cancel the other.
struct hedge_ctx { /* endpoints, request bytes, budgets, cancel fds */ };
int do_request_once(struct hedge_ctx *h, int which, int *out_status);
void cancel_request(struct hedge_ctx *h, int which);
int hedged_request(struct hedge_ctx *h, unsigned hedge_ms, int *out_status) {
int done = 0, status = -1;
uint64_t start = now_ms();
// Launch primary (0)
(void)do_request_once(h, 0, &status);
while (!done) {
uint64_t t = now_ms();
if (t - start >= hedge_ms) {
// Launch secondary (1) if not already
(void)do_request_once(h, 1, &status);
}
// Poll both for completion (details depend on your I/O model)
// On first completion:
// * store status
// * cancel the other: cancel_request(h, other)
// * done = 1
// Break here for brevity
break;
}
*out_status = status;
return done ? 0 : -1;
}
Operational guidance:
- Pick the hedge threshold from observed latency (e.g., dynamic P95) and apply a global cap to avoid overproduction under load.
- Count hedges against budgets; disable when the system is hot.
- Ensure handlers and storage layers are idempotent before enabling hedges.
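One rough way to derive the hedge threshold from observed latency, per the first point above (the window size and sort-based percentile are assumptions; production code would use proper histograms):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#define LAT_N 64
struct lat_window { uint32_t ms[LAT_N]; unsigned idx, count; };
static void lat_record(struct lat_window *w, uint32_t ms) {
  w->ms[w->idx] = ms; w->idx = (w->idx + 1u) % LAT_N;
  if (w->count < LAT_N) w->count++;
}
static int cmp_u32(const void *a, const void *b) {
  uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
  return (x > y) - (x < y);
}
static uint32_t lat_p95(const struct lat_window *w, uint32_t fallback_ms) {
  if (w->count == 0) return fallback_ms;            // no data yet → use a configured default
  uint32_t tmp[LAT_N]; memcpy(tmp, w->ms, w->count * sizeof(uint32_t));
  qsort(tmp, w->count, sizeof(uint32_t), cmp_u32);
  return tmp[(w->count * 95u) / 100u];              // approximate 95th percentile
}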
Circuit breakers: fail fast to recover faster
When a dependency is failing persistently, stop sending full traffic into the fire. A small state machine prevents useless attempts and gives time to recover.
States:
- Closed: normal traffic; count failures in a rolling window.
- Open: immediately fail; after a cool-down, transition to HalfOpen.
- HalfOpen: allow a limited number of probe requests; if they succeed, close; if they fail, open again.
enum cb_state { CB_CLOSED, CB_OPEN, CB_HALFOPEN };
struct circuit_breaker {
enum cb_state state;
uint64_t opened_at_ms;
int window_ms; // failure window for rate
int open_ms; // cool-down
int fail_threshold; // failures within window to trip
int halfopen_max; // concurrent probes allowed
int halfopen_inflight;
// Simple counters; production code uses a ring/time-bucketed window
int recent_failures;
uint64_t last_failure_ms;
};
static void cb_init(struct circuit_breaker *cb, int window_ms, int open_ms,
int fail_threshold, int halfopen_max) {
cb->state = CB_CLOSED; cb->opened_at_ms = 0; cb->window_ms = window_ms;
cb->open_ms = open_ms; cb->fail_threshold = fail_threshold;
cb->halfopen_max = halfopen_max; cb->halfopen_inflight = 0;
cb->recent_failures = 0; cb->last_failure_ms = 0;
}
static void cb_note_failure(struct circuit_breaker *cb) {
uint64_t n = now_ms();
if (cb->last_failure_ms == 0 || (n - cb->last_failure_ms) > (uint64_t)cb->window_ms) {
cb->recent_failures = 0; // window reset
}
cb->recent_failures++;
cb->last_failure_ms = n;
if (cb->state == CB_CLOSED && cb->recent_failures >= cb->fail_threshold) {
cb->state = CB_OPEN; cb->opened_at_ms = n;
} else if (cb->state == CB_HALFOPEN) {
// Failed probe → open again
cb->state = CB_OPEN; cb->opened_at_ms = n; cb->halfopen_inflight = 0;
}
}
static void cb_note_success(struct circuit_breaker *cb) {
if (cb->state == CB_HALFOPEN) {
// Close on successful probe (could require k successes)
cb->state = CB_CLOSED; cb->halfopen_inflight = 0; cb->recent_failures = 0;
} else if (cb->state == CB_CLOSED) {
// Success decays failure pressure
if (cb->recent_failures > 0) cb->recent_failures--;
}
}
// Returns 1 allow, 0 reject immediately
static int cb_allow(struct circuit_breaker *cb) {
uint64_t n = now_ms();
if (cb->state == CB_OPEN) {
if (n - cb->opened_at_ms >= (uint64_t)cb->open_ms) {
cb->state = CB_HALFOPEN; cb->halfopen_inflight = 0;
} else {
return 0; // fail fast
}
}
if (cb->state == CB_HALFOPEN) {
if (cb->halfopen_inflight >= cb->halfopen_max) return 0;
cb->halfopen_inflight++;
return 1;
}
// Closed
return 1;
}
static void cb_after_attempt(struct circuit_breaker *cb, int success) {
if (success) cb_note_success(cb); else cb_note_failure(cb);
}
Guidance:
- Trip on a rate within a short rolling window (time buckets) rather than a lifetime counter.
- Keep cool-down short and probes small; the goal is to sense recovery, not resume full blast instantly.
- Log breaker transitions; they’re vital signals during incidents.
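As hinted in the breaker comments, a time-bucketed window is a better failure-rate signal than a single timestamp. A small sketch, assuming one-second buckets (pass now_ms()/1000 as now_sec):
#include <stdint.h>
#include <string.h>
#define FW_BUCKETS 10 // 10 x 1 s = 10-second rolling window (assumption)
struct fail_window { uint32_t buckets[FW_BUCKETS]; uint64_t last_sec; };
// Zero out buckets for every second that elapsed since the last update.
static void fw_roll(struct fail_window *w, uint64_t now_sec) {
  uint64_t diff = now_sec - w->last_sec;
  if (diff >= FW_BUCKETS) memset(w->buckets, 0, sizeof w->buckets);
  else for (uint64_t i = 1; i <= diff; ++i) w->buckets[(w->last_sec + i) % FW_BUCKETS] = 0;
  w->last_sec = now_sec;
}
static void fw_add_failure(struct fail_window *w, uint64_t now_sec) {
  fw_roll(w, now_sec); w->buckets[now_sec % FW_BUCKETS]++;
}
static uint32_t fw_total(struct fail_window *w, uint64_t now_sec) {
  fw_roll(w, now_sec);
  uint32_t sum = 0; for (int i = 0; i < FW_BUCKETS; ++i) sum += w->buckets[i];
  return sum; // compare against fail_threshold instead of recent_failures
}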
Putting it together: budgeted, breaker-guarded retries
Before attempting a call, check the breaker. On each failure, update it. Respect the overall deadline and attempt cap.
int guarded_call_with_retries(struct circuit_breaker *cb,
attempt_fn fn, void *ctx,
int max_attempts,
unsigned base_ms, unsigned max_backoff_ms,
uint64_t deadline_ms) {
for (int a = 1; a <= max_attempts; ++a) {
if (!cb_allow(cb)) return -1; // fail fast; caller may degrade
int ms_left = (int)((deadline_ms > now_ms()) ? (deadline_ms - now_ms()) : 0);
if (ms_left <= 0) return -1;
int r = fn(ctx, a, ms_left);
if (r == ATTEMPT_OK) { cb_after_attempt(cb, 1); return 0; }
cb_after_attempt(cb, 0);
if (r == ATTEMPT_FATAL || a == max_attempts) return -1;
unsigned back = exp_backoff_ms(base_ms, a, max_backoff_ms);
unsigned sleep = jitter_full(back);
if ((int)sleep > ms_left) sleep = (unsigned)ms_left;
sleep_ms(sleep);
}
return -1;
}
This pattern ensures:
- You never hammer a downed dependency endlessly.
- Attempts spread out under load.
- The operation terminates within a predictable time bound.
Idempotency that actually holds under retries
Idempotency converts “try again” into “same effect as before.” Achieving it in practice means two things:
- A stable, operation-scoped key that names the logical request.
- A dedupe record that you check-before-do and update atomically with your side effect.
Designing the idempotency key
- Scope: per operation type (e.g., “POST /charge” vs “POST /refund”). A key valid for one operation shouldn’t collide with another.
- Stability: derived from client semantics (client-provided header or a hash of immutable fields). Do not include volatile fields (timestamps) unless part of the semantics.
- Lifetime (TTL): long enough to cover worst-case retries, short enough to bound storage. Hours to days, depending on domain.
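As one hedged example of deriving such a key (the field choices and FNV-1a hashing are assumptions; any stable digest of the immutable fields works):
#include <stdint.h>
#include <stdio.h>
#include <string.h>
// FNV-1a over the operation name plus the immutable request fields.
static uint64_t fnv1a64(const void *data, size_t len, uint64_t h) {
  const unsigned char *p = (const unsigned char *)data;
  for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 1099511628211ULL; }
  return h;
}
static void make_idem_key(char out[64], const char *op,
                          const char *account, uint64_t amount_cents) {
  uint64_t h = 14695981039346656037ULL; // FNV offset basis
  h = fnv1a64(op, strlen(op), h);
  h = fnv1a64(account, strlen(account), h);
  h = fnv1a64(&amount_cents, sizeof amount_cents, h);
  snprintf(out, 64, "%s:%016llx", op, (unsigned long long)h); // operation-scoped key
}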
What to store per key
- Outcome status (success/failure classification).
- A checksum or digest of the materialized effect (e.g., row id created, amount, currency).
- Optional response bytes (bounded) or a pointer to a durable record you can re-fetch.
If a duplicate arrives and your store has a hit, return the stored outcome. If the payload mismatches the stored checksum, reject with a conflict—this is a safety net against clients reusing keys across different payloads.
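In code, the decision on a dedupe hit is a two-line check (the 409 status below is an assumption standing in for your conflict error):
#include <stdint.h>
// rec_status/rec_checksum come from the stored dedupe record;
// payload_checksum is recomputed from the incoming request body.
static int on_dedupe_hit(int rec_status, uint64_t rec_checksum, uint64_t payload_checksum) {
  if (rec_checksum != payload_checksum) return 409; // same key, different payload → conflict
  return rec_status;                                // replay the stored outcome
}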
Minimal in-memory dedupe with TTL (single-process)
For a single-process service (or as a fast local cache in front of a shared store), a tiny TTL map goes a long way. Keep it bounded and evict oldest.
#include <stdint.h>
#include <string.h>
#define IDEMP_MAX 1024
#define IDEMP_KEY_MAX 64
struct idem_rec {
char key[IDEMP_KEY_MAX];
uint64_t expire_ms;
int status; // app-defined status code
uint64_t checksum; // of payload/result
};
struct idem_map { struct idem_rec recs[IDEMP_MAX]; };
static int idem_find(struct idem_map *m, const char *key, uint64_t now_ms) {
for (int i = 0; i < IDEMP_MAX; ++i) {
if (m->recs[i].key[0] && strcmp(m->recs[i].key, key) == 0) {
if (now_ms < m->recs[i].expire_ms) return i; // hit
// expired → clear slot
m->recs[i].key[0] = '\0';
return -1;
}
}
return -1;
}
static int idem_insert(struct idem_map *m, const char *key,
uint64_t expire_ms, int status, uint64_t checksum) {
for (int i = 0; i < IDEMP_MAX; ++i) {
if (m->recs[i].key[0] == '\0') {
strncpy(m->recs[i].key, key, IDEMP_KEY_MAX-1);
m->recs[i].key[IDEMP_KEY_MAX-1] = '\0';
m->recs[i].expire_ms = expire_ms;
m->recs[i].status = status;
m->recs[i].checksum = checksum;
return 1;
}
}
return 0; // full → consider LRU eviction in production
}
Usage pattern:
typedef int (*exec_fn)(void *ctx, int *out_status, uint64_t *out_checksum);
int with_idempotency(struct idem_map *m, const char *key, uint64_t ttl_ms,
exec_fn fn, void *ctx, int *out_status) {
uint64_t t0 = now_ms();
int idx = idem_find(m, key, t0);
if (idx >= 0) { *out_status = m->recs[idx].status; return 0; }
uint64_t checksum = 0; int st = -1;
int r = fn(ctx, &st, &checksum);
if (r != 0) return -1; // execution failed; do not record success
(void)idem_insert(m, key, t0 + ttl_ms, st, checksum);
*out_status = st;
return 0;
}
This single-process map is not a global guarantee, but it collapses local duplicates and reduces pressure on the shared store.
Exactly-once-ish with an outbox
For side effects that cross system boundaries (emails, payments, Kafka events), use an “outbox” table and a worker that drains it. The service transaction writes both the domain change and an outbox row with a unique idempotency key; the worker reads committed outbox rows and performs the external call, marking each row sent. If the worker crashes after sending, a retry will see the unique key and skip duplication.
Sketch (SQL-ish pseudocode around a C service):
-- Domain write and outbox are in the SAME transaction
BEGIN;
INSERT INTO orders(id, amount, ...) VALUES($1, $2, ...);
INSERT INTO outbox(idem_key, kind, payload, sent)
VALUES($idem_key, 'order.created', $json, false)
ON CONFLICT (idem_key) DO NOTHING; -- unique key prevents duplicates
COMMIT;
-- Worker loop (idempotent consumer)
SELECT idem_key, kind, payload FROM outbox WHERE sent = false LIMIT 100;
-- For each row: send; on success:
UPDATE outbox SET sent = true, sent_at = now() WHERE idem_key = $k;
Notes:
- The unique idem_key on outbox (and other effect tables) turns double-insert into a no-op.
- If the consumer sends twice due to a crash, the duplicate update is harmless.
- For reads of prior outcomes, keep a result table keyed by idem_key so the service can return the same response body for retries.
Transaction boundaries: check-before-do, then record
To avoid races, the idempotency check and the side effect must be coordinated. Two common patterns:
- Check in a shared dedupe store, then perform the action, then record the outcome under the same key atomically with the state change (single DB transaction).
- Always write a unique, idempotent row first (e.g., INSERT ... ON CONFLICT DO NOTHING with idem_key), then branch on whether you inserted or conflicted. If conflicted, read the existing outcome and return it.
For example (pseudocode):
BEGIN;
-- Reserve the operation if not already
INSERT INTO operations(idem_key, status)
VALUES($key, 'in_progress')
ON CONFLICT (idem_key) DO NOTHING;
-- Proceed only if we inserted (new op) or if status allows resume
-- ... perform domain writes here ...
UPDATE operations SET status = 'done', checksum = $chk WHERE idem_key = $key;
COMMIT;
On duplicate, you’ll get a conflict and can read the existing operations row to return the prior result.
Per-tenant quotas and fairness
Backpressure without fairness creates noisy neighbors. Apply caps per tenant (e.g., API key, auth id, IP) and globally. Two tools:
- A per-tenant token bucket limiter for request rate/weight.
- Byte/message caps per connection, already covered earlier.
Token bucket in C (monotonic time)
struct token_bucket {
double tokens; // current tokens (refilled lazily, clamped to capacity)
double capacity; // max tokens
double refill_per_s; // tokens added per second
uint64_t last_ns; // last refill time
};
static uint64_t now_ns(void) {
struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts);
return (uint64_t)ts.tv_sec*1000000000ull + (uint64_t)ts.tv_nsec;
}
static void tb_init(struct token_bucket *tb, double capacity, double per_s) {
tb->tokens = capacity; tb->capacity = capacity; tb->refill_per_s = per_s; tb->last_ns = now_ns();
}
static int tb_allow(struct token_bucket *tb, double cost) {
uint64_t n = now_ns();
double dt = (double)(n - tb->last_ns) / 1e9;
tb->last_ns = n;
tb->tokens += dt * tb->refill_per_s;
if (tb->tokens > tb->capacity) tb->tokens = tb->capacity;
if (tb->tokens >= cost) { tb->tokens -= cost; return 1; }
return 0;
}
You can maintain a small open-addressing hash map from tenant id → token_bucket. On each request, compute a “cost” (e.g., 1 per request or proportional to expected bytes) and call tb_allow. If it returns 0, reject early (429-equivalent) or degrade.
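A minimal sketch of that map, assuming fixed-size linear probing and no eviction (both assumptions; production code wants eviction and locking if threaded):
#include <string.h>
#define TEN_MAX 256
struct tenant_slot { char id[32]; struct token_bucket tb; int used; };
struct tenant_map { struct tenant_slot slots[TEN_MAX]; };
// Find or create the bucket for a tenant id; cap/per_s seed new buckets.
static struct token_bucket *tenant_bucket(struct tenant_map *m, const char *id,
                                          double cap, double per_s) {
  size_t h = 5381; for (const char *p = id; *p; ++p) h = h * 33 + (unsigned char)*p; // djb2
  for (int i = 0; i < TEN_MAX; ++i) {
    struct tenant_slot *s = &m->slots[(h + (size_t)i) % TEN_MAX];
    if (s->used && strcmp(s->id, id) == 0) return &s->tb; // existing tenant
    if (!s->used) {                                       // claim a free slot
      s->used = 1;
      strncpy(s->id, id, sizeof s->id - 1); s->id[sizeof s->id - 1] = '\0';
      tb_init(&s->tb, cap, per_s);
      return &s->tb;
    }
  }
  return NULL; // table full → reject, or fall back to a shared global bucket
}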
Integrating quotas with admission
Admission path (simplified):
- Check global byte/message/concurrency caps. If exceeded → reject or shed.
- Look up the tenant limiter; if tb_allow fails → reject/slow.
- Check per-connection queue length; if full → shed or wait with deadline.
- If a request has an idempotency key → consult dedupe store; if hit → return prior result immediately; else proceed.
This order prevents expensive work when you already know the answer (idempotency hit) and enforces fairness before allocating buffers.
Operational guardrails
- Size the idempotency TTL to your actual retry windows and client behavior; too short and retries slip past the dedupe, too long and the store bloats.
- Bound the dedupe store and prefer a real datastore (e.g., Redis with SET key value NX PX <ttl-ms>) for cross-process correctness.
- Make idempotent handlers “payload-aware”: reject reuse of a key with a different payload by comparing a checksum.
- Emit structured logs on: dedupe hits/misses, limiter rejections, breaker state changes, and retry outcomes. These are your incident dashboards.
End-to-end: composing the patterns
Let’s stitch the ideas together into a small, predictable handler path that applies quotas, idempotency, backpressure, retries, and circuit breaking under a single deadline.
struct req { const char *tenant; const char *idem_key; const void *payload; size_t len; };
struct deps {
struct token_bucket *tenant_tb; // per-tenant limiter
struct ringq *conn_outq; // per-connection bounded queue
struct idem_map *idem; // in-memory dedupe (plus shared store in prod)
struct circuit_breaker *cb; // downstream breaker
};
// Application-specific execution with idempotency-safe side effects
static int exec_business(void *ctx, int *out_status, uint64_t *out_checksum) {
  (void)ctx; // compute, do DB tx, write outbox, etc.
  *out_status = 200; *out_checksum = 0xDEADBEEF; return 0;
}
// Adapter for the retry wrapper, defined at file scope for portability
// (defining it inside handle_request would rely on the GNU nested-function extension).
static int app_attempt(void *ctx, int a, int ms_left) {
  (void)a; (void)ms_left; int st = 0; uint64_t chk = 0;
  return exec_business(ctx, &st, &chk) == 0 ? ATTEMPT_OK : ATTEMPT_RETRY;
}
int handle_request(struct deps *D, struct req *R, uint64_t deadline_ms) {
// 1) Tenant quota
struct token_bucket *tb = D->tenant_tb; // lookup by R->tenant
if (!tb_allow(tb, 1.0)) return 429;
// 2) Idempotency fast path
if (R->idem_key && *R->idem_key) {
int st = 0; if (with_idempotency(D->idem, R->idem_key, /*ttl*/ 6*60*60*1000ULL,
exec_business, R, &st) == 0) {
return st; // stored or newly produced
}
}
// 3) Guarded downstream call with breaker + deadline-aware retries (if needed)
if (!cb_allow(D->cb)) return 503; // immediate shed while dependency is down
int ok = guarded_call_with_retries(D->cb, app_attempt, R,
/*max_attempts*/ 5,
/*base_ms*/ 25, /*max_backoff_ms*/ 500,
deadline_ms) == 0;
if (!ok) return 504; // deadline/attempts exhausted
// 4) Backpressure on write path (enqueue response)
struct bufseg seg = { .data = "OK", .len = 2 };
if (!rq_push(D->conn_outq, seg)) return 503; // queue full → shed/degrade
return 200;
}
Notes:
- In real code, the idempotency store is shared (e.g., Redis/DB) and the business function records outcome under the key within the same transaction.
- The response write is queued; a nonblocking flush finishes it. If the queue is full, you degrade or drop based on policy.
- The breaker gates calls, and the retry wrapper guarantees a deadline-bound outcome.
Graceful degradation playbook
When budgets are tight or dependencies flap, don’t just fail—serve something cheaper:
- Return cached or partial data for read endpoints (bounded freshness window).
- Switch to simplified code paths (e.g., omit non-critical joins or expensive enrichments) when a “hot” flag is on.
- Shed low-priority traffic first: separate queues per priority and apply stricter caps to best-effort work.
- Prefer immediate, explicit errors (429/503) over accepting work you cannot complete.
Keep degradation reversible with a single feature flag or environment toggle, and log when it’s active.
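A sketch of a reversible toggle, assuming SIGUSR1/SIGUSR2 as the operator interface (any flag source works: env at startup, admin endpoint, config reload):
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
static atomic_int g_degraded; // 1 → serve the cheaper paths
static void on_toggle(int sig) { atomic_store(&g_degraded, sig == SIGUSR1 ? 1 : 0); }
static void install_degrade_toggle(void) {
  struct sigaction sa; memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_toggle; sigemptyset(&sa.sa_mask);
  sigaction(SIGUSR1, &sa, NULL); // degrade on
  sigaction(SIGUSR2, &sa, NULL); // degrade off
}
static int degraded(void) { return atomic_load(&g_degraded); }
Real code should also log every transition so operators can see when degradation is active.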
SLO-driven timeouts and budgets
Start from the SLO for each endpoint (e.g., 95% under 200 ms) and decompose the time budget:
- Admission + queueing: ≤ 5–10 ms
- Business logic + storage: ≤ 150 ms (with internal sub-budgets)
- Downstream calls: each with its own deadline (e.g., 40 ms primary, hedge at 30 ms)
- Write/flush: ≤ 10–20 ms (buffered)
Enforce budgets in code: compute a deadline once at ingress and pass it down. Each layer derives remaining time with now() and refuses work if not enough remains to succeed reliably.
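In practice that means carrying one absolute deadline and slicing it per layer; a small sketch (the percentage split is an assumption):
#include <stdint.h>
// Give a layer at most pct percent of whatever time remains before deadline_ms.
static uint64_t sub_deadline(uint64_t deadline_ms, uint64_t now, unsigned pct) {
  if (now >= deadline_ms) return now;              // nothing left: refuse the work upstream
  return now + ((deadline_ms - now) * pct) / 100u;
}
// e.g., storage gets sub_deadline(deadline, now_ms(), 75); the downstream call gets 40.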
Observability that matters
Track a minimal, high-signal set:
- Backpressure: per-conn queue depth, global queued bytes/messages, shed counts.
- Retries: attempts per call, backoff distribution, success-after-n.
- Breaker: state transitions, open duration, half-open probe outcomes.
- Idempotency: hits/misses, TTL expirations, payload-mismatch conflicts.
- Quotas: per-tenant allow/deny, bucket fill level distribution.
- Latency: P50/P95/P99 per endpoint; hedge rate; deadline timeouts.
Alert on: sustained shed/deny rates, breaker stuck-open, retry explosion, queue depth near caps, and SLO breaches.
Test like you’ll fail tomorrow
- Force partial reads/writes (socketpair, tiny buffers) and ensure handlers loop until EAGAIN/deadline.
- Inject downstream 5xx/timeouts and verify the breaker trips and resets; confirm retries respect caps and deadlines.
- Replay duplicate requests with the same key and different payload—expect conflict, not duplication.
- Run load with uneven tenants to prove fairness (noisy neighbor doesn’t starve others).
- Chaos toggles: flip the “hot” flag and verify degradation paths are correct and reversible.
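For the first test in the list above, a socketpair with shrunken buffers and O_NONBLOCK makes EAGAIN easy to hit. A sketch (buffer size and helper name are assumptions):
#include <fcntl.h>
#include <sys/socket.h>
// Returns 0 on success; sv[0] is the nonblocking end to write against until it backs up.
static int make_tight_pipe(int sv[2]) {
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
  int tiny = 4096; // small kernel buffers so writes back up quickly
  setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &tiny, sizeof tiny);
  setsockopt(sv[1], SOL_SOCKET, SO_RCVBUF, &tiny, sizeof tiny);
  return fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL, 0) | O_NONBLOCK);
}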
A short production checklist
- Bounded queues everywhere; enforce per-tenant and global caps.
- Every operation has a deadline; timeouts are explicit and logged.
- Retries: exponential backoff + jitter; attempt cap; only on transient errors.
- Circuit breaker in front of flaky dependencies; probe to recover.
- Idempotency keys on mutating endpoints; dedupe store with TTL and payload checks.
- Token buckets per tenant; reject or degrade when empty.
- Metrics and logs for budgets, retries, breaker, dedupe, quotas; alerts on sustained pressure.
Closing thoughts
Resilient services aren’t “heroically fast”—they’re predictably honest. By bounding memory and time, spreading retries with jitter, failing fast behind breakers, and making work idempotent, you turn production from a game of luck into a system of contracts. Start with budgets and small, boring loops. Everything else gets easier when the flow is shaped by design.