So you called write() once and assumed all your data magically teleported to disk or across the network. Adorable.
In real systems the kernel is a strict bouncer. You cross into its world via syscalls, you’re handed integer tokens called file descriptors, and you follow rules about what “write succeeded” actually means. If you don’t internalize those rules, you’ll ship data truncation bugs, rare deadlocks, and the occasional 3 a.m. incident.
This post is your practical field guide: what syscalls really promise, how file descriptors behave, and how to build I/O loops that survive signals, short writes, and nonblocking chaos—without turning your codebase into spaghetti.
The boundary: what a syscall actually is
Calling read(), write(), accept(), or open() transfers control to the kernel. That transition:
- Switches privilege levels and executes kernel code on your behalf
- May block your thread until the resource is ready (unless you asked for nonblocking)
- Returns either a non-negative result or -1 with errno set
Return-value truth table, condensed:
- >= 0: success. For read(), it’s the byte count read. For write(), it’s the byte count written. For open(), it’s a new file descriptor.
- -1: failure. Inspect errno for why (e.g., EINTR, EAGAIN, EPIPE).
Key mindset: success does not mean “all of it.” It means “some of it.” Your code must be prepared to loop.
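As a minimal sketch of honoring that contract at a single call site (the helper name is ours, not a standard API):
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// One read() call, interpreted strictly: >= 0 is success (0 means EOF),
// -1 sends you to errno. EINTR is retried; everything else is reported.
static ssize_t read_once(int fd, void *buf, size_t len) {
    for (;;) {
        ssize_t r = read(fd, buf, len);
        if (r >= 0) return r;         // r > 0: some bytes; r == 0: EOF
        if (errno == EINTR) continue; // interrupted before any transfer: retry
        fprintf(stderr, "read(fd=%d): %s\n", fd, strerror(errno));
        return -1;                    // EAGAIN, EPIPE, ... left to the caller
    }
}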
File descriptors: small integers, big responsibilities
File descriptors (FDs) are small integers indexing a per-process table of “open files.” An “open file” is a kernel object pointing to a driver/stream/regular file plus state (offset, flags, reference counts).
What an FD can represent:
- Regular files, directories (limited ops), character/block devices
- Pipes (pipe()), UNIX domain sockets, TCP/UDP sockets
- Eventfd, signalfd, epoll/kqueue descriptors, timerfd (Linux)
Semantics and lifecycle you must know:
- Creation: open(), socket(), accept(), pipe(), etc. On success you get a non-negative FD.
- Duplication: dup(), dup2(), dup3() create new FDs referencing the same open-file description (shared offset/flags). Useful for redirecting stdin/stdout/stderr or implementing tee-like behavior.
- Close: close(fd) decrements the reference count; when the last reference drops, the kernel releases the resource. Never leak FDs—long-running services will run out.
- Inheritance: after fork(), the child inherits copies of the parent’s FDs. After execve(), inherited FDs remain open unless marked CLOEXEC. Use O_CLOEXEC (on open/socket/accept4) or fcntl(F_SETFD, FD_CLOEXEC) to prevent accidental leaks into child processes.
- Per-FD flags: O_NONBLOCK governs blocking behavior; O_APPEND, O_SYNC, O_DIRECT and friends alter write semantics or caching. Set flags with open(..., O_NONBLOCK) or fcntl(F_SETFL, O_NONBLOCK).
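A minimal sketch tying those rules together (the path and helper name are illustrative): create the FD with CLOEXEC from the start, then adjust per-FD flags with fcntl().
#include <fcntl.h>
#include <unistd.h>

// Open with O_CLOEXEC atomically (no window where a concurrent fork/exec
// could inherit the FD), then enable O_NONBLOCK after the fact.
static int open_service_fd(const char *path) {
    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd == -1) return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd); // handle closure on the error path, too
        return -1;
    }
    return fd;
}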
A quick correctness checklist for FDs
- Always set CLOEXEC when creating long-lived FDs in services
- Decide up front: blocking vs nonblocking
- Handle closure on all paths (including error paths)
- Don’t mix blocking reads with nonblocking writes on the same socket without understanding backpressure
Blocking vs nonblocking: what you actually asked the kernel to do
With blocking FDs (the default), read() and write() may stall your thread:
- read() blocks until at least one byte is available or EOF
- write() blocks until the kernel can accept at least one byte into its buffers
With nonblocking FDs (O_NONBLOCK):
- read() returns -1 with errno == EAGAIN (or EWOULDBLOCK) if no data is ready
- write() returns -1 with errno == EAGAIN if kernel buffers are full
Nonblocking buys you control: you integrate readiness APIs (epoll/kqueue/poll) and implement timeouts and backpressure explicitly. It also forces you to write robust loops.
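A compact sketch of that division of labor, assuming a nonblocking fd: wait for readiness with poll(), then attempt the transfer (fuller loops appear later in this post).
#include <errno.h>
#include <poll.h>
#include <unistd.h>

// Wait up to timeout_ms for data, then read once. Returns bytes read,
// 0 on timeout or EOF, -1 on error. EINTR handling is omitted for brevity.
static ssize_t read_ready(int fd, void *buf, size_t len, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r <= 0) return r == 0 ? 0 : -1; // timeout or poll error
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0; // readiness raced with another consumer; treat as "not now"
    return n;     // > 0: bytes; 0: EOF; -1: hard error
}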
The small print: short reads, short writes, and EOF
On success, read(fd, buf, n) may return any 0 <= r <= n:
- r > 0: you got some bytes; loop if you need more
- r == 0: EOF for streams/files (peer closed for sockets)
On success, write(fd, buf, n) may return any 0 < w <= n:
- Regular files typically write fully, but partial writes can occur (signals, quotas, O_NONBLOCK, resource pressure)
- Pipes/sockets frequently produce short writes; treat any partial as routine
Errors that matter:
- EINTR: the syscall was interrupted by a signal before transferring any bytes; retry
- EAGAIN/EWOULDBLOCK: would block on nonblocking FDs; wait for readiness, then retry
- EPIPE: writing to a closed pipe/socket; the peer is gone (often accompanied by SIGPIPE unless suppressed)
If your code assumes “all bytes in one go,” it’s wrong by construction.
Robust I/O loops: minimal, correct, boring (the good kind)
Two foundational helpers cover 80% of production needs: “write all” and “read exactly N unless EOF.” They are EINTR-safe, handle partial transfers, and optionally integrate nonblocking retries.
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <unistd.h>

// Write the entire buffer (best-effort). Returns true on success, false on error.
// For nonblocking fds, the caller should ensure writability (e.g., epoll/kqueue) before calling.
bool write_all(int fd, const void *buf, size_t len) {
    const uint8_t *p = (const uint8_t *)buf;
    size_t remaining = len;
    while (remaining > 0) {
        ssize_t w = write(fd, p, remaining);
        if (w > 0) {
            p += (size_t)w;
            remaining -= (size_t)w;
            continue;
        }
        if (w == -1 && errno == EINTR) {
            continue; // interrupted, retry
        }
        if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Would block: the caller must wait for POLLOUT/EVFILT_WRITE then retry.
            return false;
        }
        return false; // other error (EPIPE, ENOSPC, etc.)
    }
    return true;
}
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <unistd.h>

// Read exactly N bytes unless EOF occurs first.
// Returns the number of bytes placed into buf (<= len). 0 means EOF from the start.
ssize_t read_exact(int fd, void *buf, size_t len) {
    uint8_t *p = (uint8_t *)buf;
    size_t total = 0;
    while (total < len) {
        ssize_t r = read(fd, p + total, len - total);
        if (r > 0) {
            total += (size_t)r;
            continue;
        }
        if (r == 0) {
            // EOF before we reached len
            return (ssize_t)total;
        }
        if (r == -1 && errno == EINTR) {
            continue; // interrupted, retry
        }
        if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Would block: the caller must wait for POLLIN/EVFILT_READ then retry.
            return (ssize_t)total;
        }
        return -1; // other error
    }
    return (ssize_t)total;
}
Notes:
- These helpers separate “transfer” from “readiness.” In a nonblocking design, you wait for readiness (epoll/kqueue), then call these until EAGAIN.
- For blocking FDs, these loop until completion or error, which is fine in simple CLI tools but can cause head-of-line blocking in servers.
- Suppress SIGPIPE on sockets (set SO_NOSIGPIPE on BSDs or use send(..., MSG_NOSIGNAL) on Linux) so a peer-close yields EPIPE instead of terminating the process.
Practical edge cases to cement your intuition
- read() on a TCP socket can return fewer bytes than requested even if the sender wrote a single, larger buffer. TCP is a byte stream; message boundaries don’t exist.
- write() to a pipe with an atomic write size limit (POSIX requires atomicity up to PIPE_BUF) may still short-write when O_NONBLOCK is set and buffers are tight.
- On regular files, a blocking write() is often full-sized, but a signal arriving mid-flight can cause EINTR without progress, or partial completion followed by an error. Loop regardless.
- Large buffers may be split due to kernel limits, cgroup I/O throttling, or filesystem peculiarities. Your loop doesn’t care—keep going until done.
Scatter/gather I/O that actually scales
Vectored I/O lets you move bytes from or to multiple non-contiguous buffers in a single syscall. Instead of stitching headers + payload into a temporary buffer (and copying), you describe them with an array of struct iovec and call writev() (or readv() for the reverse). Why this matters:
- Fewer syscalls: amortize syscall overhead under high throughput
- Fewer copies: keep data in place; better cache locality
- Cleaner code: describe segments declaratively
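For instance, a header-plus-payload send without a staging copy might look like the sketch below (a single call; looping on partials is covered just after). The 4-byte header size is illustrative.
#include <sys/uio.h>

// Describe two non-contiguous buffers and hand them to the kernel in one syscall.
// The return value is a single byte count that may land anywhere in the segments.
static ssize_t send_framed_once(int fd, const void *hdr, const void *payload, size_t paylen) {
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = 4 },
        { .iov_base = (void *)payload, .iov_len = paylen },
    };
    return writev(fd, iov, 2);
}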
The catch: partial completion can split anywhere across your segments. You must advance the iovec array after each call.
Advancing iovecs after a partial
#include <stddef.h>
#include <sys/uio.h>

// Advance an iovec array by `bytes` consumed, mutating base/len and iovcnt.
// On return, *piov points to the first unconsumed segment and *piovcnt is updated.
static void advance_iovecs(struct iovec **piov, int *piovcnt, size_t bytes) {
    struct iovec *iov = *piov;
    int cnt = *piovcnt;
    size_t remain = bytes;
    while (cnt > 0 && remain > 0) {
        if (remain >= iov->iov_len) {
            remain -= iov->iov_len;
            ++iov;
            --cnt;
        } else {
            iov->iov_base = (char *)iov->iov_base + remain;
            iov->iov_len -= remain;
            remain = 0;
        }
    }
    *piov = iov;
    *piovcnt = cnt;
}
Robust writev: full send with retries
Two variants are useful in practice—one that returns early on EAGAIN for nonblocking designs, and one that waits up to a timeout.
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/uio.h>
#include <unistd.h>

// Attempt to write all iovecs. Returns true when everything is written.
// On nonblocking fds: returns false with errno=EAGAIN when you should wait for POLLOUT.
bool writev_all_try(int fd, struct iovec *iov, int iovcnt) {
    while (iovcnt > 0) {
        ssize_t w = writev(fd, iov, iovcnt);
        if (w > 0) {
            advance_iovecs(&iov, &iovcnt, (size_t)w);
            continue;
        }
        if (w == -1 && errno == EINTR) {
            continue; // signal, retry
        }
        if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return false; // caller should poll for writable
        }
        return false; // hard error (EPIPE, ENOSPC, ...)
    }
    return true;
}
#include <time.h>

static int wait_writable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    for (;;) {
        int r = poll(&p, 1, timeout_ms);
        if (r > 0) return 1;  // ready
        if (r == 0) return 0; // timeout
        if (r < 0 && errno == EINTR) continue;
        return -1; // error
    }
}

// Deadline-based helper: returns milliseconds remaining, clamped to [0, 0x3fffffff]
static int ms_left(struct timespec deadline) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long ms = (long)((deadline.tv_sec - now.tv_sec) * 1000)
            + (long)((deadline.tv_nsec - now.tv_nsec) / 1000000);
    if (ms < 0) return 0;
    if (ms > 0x3fffffff) return 0x3fffffff;
    return (int)ms;
}

// Write all iovecs before the deadline. Returns true on success, false on timeout/error.
bool writev_all_until(int fd, struct iovec *iov, int iovcnt, struct timespec deadline) {
    while (iovcnt > 0) {
        ssize_t w = writev(fd, iov, iovcnt);
        if (w > 0) {
            advance_iovecs(&iov, &iovcnt, (size_t)w);
            continue;
        }
        if (w == -1 && errno == EINTR) {
            continue;
        }
        if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            int left = ms_left(deadline);
            int wr = wait_writable(fd, left);
            if (wr == 1) continue;
            return false; // timeout or poll error
        }
        return false; // hard error
    }
    return true;
}
Signal nuance for sockets: avoid SIGPIPE on peer-close. Options include ignoring SIGPIPE process-wide (sa_handler = SIG_IGN via sigaction), using send()/sendmsg() with MSG_NOSIGNAL (Linux), or SO_NOSIGPIPE (macOS/BSD). With plain writev(), ignoring SIGPIPE is the simplest.
Robust readv: fill buffers, honor EOF and timeouts
readv() mirrors writev(): it can return fewer bytes than requested even when data exists, and it can split across segments. The loop looks similar.
static int wait_readable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    for (;;) {
        int r = poll(&p, 1, timeout_ms);
        if (r > 0) return 1;  // ready
        if (r == 0) return 0; // timeout
        if (r < 0 && errno == EINTR) continue;
        return -1; // error
    }
}

// Read exactly the iovec payload or stop on EOF/timeout/error.
// Returns total bytes read (<= total iovec length), or -1 on error, 0 on immediate EOF.
ssize_t readv_exact_until(int fd, struct iovec *iov, int iovcnt, struct timespec deadline) {
    size_t consumed = 0;
    while (iovcnt > 0) {
        ssize_t r = readv(fd, iov, iovcnt);
        if (r > 0) {
            consumed += (size_t)r;
            advance_iovecs(&iov, &iovcnt, (size_t)r);
            continue;
        }
        if (r == 0) {
            return (ssize_t)consumed; // EOF
        }
        if (r == -1 && errno == EINTR) {
            continue;
        }
        if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            int left = ms_left(deadline);
            int rr = wait_readable(fd, left);
            if (rr == 1) continue;
            return (ssize_t)consumed; // timeout: return what we have
        }
        return -1; // hard error
    }
    return (ssize_t)consumed; // full success
}
Design choices to note:
- Timeout as a deadline, not a per-iteration slice: makes progress consistent under partial transfers
- On timeout, readv_exact_until returns the bytes gathered so far (like read_exact); callers can decide whether to fail or process partials
- EINTR is treated as a non-event—just retry
A small nonblocking pattern: toggling O_NONBLOCK
If you need time-bounded I/O on an FD you usually run in blocking mode, consider temporarily setting O_NONBLOCK around the operation so poll() can control waiting explicitly. Beware of races if other threads share the FD.
#include <fcntl.h>
#include <stdbool.h>

static bool set_nonblocking(int fd, bool on) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    int want = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    if (want == flags) return true;
    return fcntl(fd, F_SETFL, want) == 0;
}
Use with care in single-threaded tools; prefer dedicated nonblocking sockets in servers.
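Putting the toggle to work, a hedged sketch of a time-bounded read on an otherwise-blocking FD, reusing set_nonblocking from above and the wait_readable helper defined earlier:
#include <errno.h>
#include <unistd.h>

// Flip to nonblocking, read with a poll()-enforced budget, then restore.
// Returns bytes read, 0 on timeout or EOF, -1 on error. Not safe if another
// thread shares this FD: O_NONBLOCK lives on the open-file description.
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    if (!set_nonblocking(fd, true)) return -1;
    ssize_t result = -1;
    for (;;) {
        result = read(fd, buf, len);
        if (result >= 0) break; // bytes or EOF
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            int r = wait_readable(fd, timeout_ms); // budget reused whole per retry
            if (r == 1) continue;
            result = (r == 0) ? 0 : -1; // timeout or poll error
            break;
        }
        break; // hard error, result stays -1
    }
    (void)set_nonblocking(fd, false); // best-effort restore
    return result;
}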
Signals without surprises: making I/O signal-safe
Signals can interrupt syscalls, flipping success into -1 with errno == EINTR. Good news: if your loops already retry on EINTR, you’re most of the way there. A few additional practices make the system predictable.
- Prefer sigaction over signal() and set SA_RESTART when appropriate so many syscalls resume automatically. Still treat EINTR as routine.
- Ignore SIGPIPE globally or use MSG_NOSIGNAL/SO_NOSIGPIPE so a peer-close on sockets yields EPIPE instead of killing the process.
- For precise coordination between signals and timeouts, use ppoll/pselect with a signal mask to avoid classic races.
#include <signal.h>

static volatile sig_atomic_t g_got_sigint = 0;

static void on_sigint(int signo) { (void)signo; g_got_sigint = 1; }

static void install_signal_handlers(void) {
    // Ignore SIGPIPE so writes on closed sockets set EPIPE
    struct sigaction ign = {0};
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    sigaction(SIGPIPE, &ign, NULL);

    // Handle SIGINT and request graceful shutdown; SA_RESTART restarts many syscalls
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGINT, &sa, NULL);
}
Race-free timeouts with ppoll/pselect
poll() has a race: a signal can arrive after you check the flag but before you call poll(), leaving you blocked. ppoll/pselect solve this by atomically swapping the signal mask during the wait.
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <time.h>

// Wait for readiness while unblocking the provided signals during the wait.
// Returns 1 when any fd is ready, 0 on timeout, -1 on error.
static int wait_rw_with_mask(struct pollfd *fds, nfds_t nfds,
                             struct timespec *ts, const sigset_t *unblock) {
#if defined(_GNU_SOURCE) || defined(__linux__)
    // Linux: use ppoll
    for (;;) {
        int r = ppoll(fds, nfds, ts, unblock);
        if (r >= 0) return r;
        if (errno == EINTR) continue;
        return -1;
    }
#else
    // Portable fallback: temporarily set mask then poll; small race may remain
    sigset_t prev;
    pthread_sigmask(SIG_SETMASK, unblock, &prev);
    int r;
    for (;;) {
        r = poll(fds, nfds, ts ? (int)(ts->tv_sec * 1000 + ts->tv_nsec / 1000000) : -1);
        if (r >= 0) break;
        if (errno == EINTR) continue;
        break;
    }
    pthread_sigmask(SIG_SETMASK, &prev, NULL);
    return r;
#endif
}
Notes:
- On Linux, ppoll is the clean choice. On other platforms, consider pselect or the platform’s event loop, which usually integrates signal delivery.
- Use a deadline to compute struct timespec per wait.
Cancellation you can reason about: self-pipe/eventfd
Long waits should be cancelable. Two portable primitives make this easy:
- Self-pipe: create pipe(); include the read end in your poll set. To cancel, write a byte to the write end.
- Linux eventfd: cheaper, a 64-bit counter you can increment; also pollable (a sketch follows the pipe-based version below).
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

struct cancel_fd { int r; int w; };

static int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

static bool cancel_fd_init(struct cancel_fd *c) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    (void)make_nonblocking(fds[0]);
    (void)make_nonblocking(fds[1]);
    c->r = fds[0];
    c->w = fds[1];
    return true;
}

static void cancel_fd_signal(struct cancel_fd *c) {
    (void)write(c->w, "x", 1); // best-effort; nonblocking
}

static void cancel_fd_drain(struct cancel_fd *c) {
    char buf[64];
    while (read(c->r, buf, sizeof buf) > 0) {}
}
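The Linux eventfd variant mentioned above, as a minimal sketch (Linux-only; one FD instead of a pipe pair):
#ifdef __linux__
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

// A pollable 64-bit counter: watch the fd for POLLIN just like the pipe's read end.
static int cancel_efd_init(void) {
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); // -1 on failure
}

static void cancel_efd_signal(int efd) {
    uint64_t one = 1;
    (void)write(efd, &one, sizeof one); // adds to the counter; best-effort
}

static void cancel_efd_drain(int efd) {
    uint64_t val;
    (void)read(efd, &val, sizeof val); // reads and resets the counter
}
#endif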
Integrate into readiness waits by adding the cancel read FD to your poll set and returning a distinct status when it becomes readable.
enum wait_result { WAIT_READY = 1, WAIT_TIMEOUT = 0, WAIT_ERROR = -1, WAIT_CANCELLED = -2 };

static int io_wait_rw(int io_fd, short events, int cancel_fd, struct timespec *ts) {
    struct pollfd pfds[2];
    pfds[0].fd = io_fd;     pfds[0].events = events; pfds[0].revents = 0;
    pfds[1].fd = cancel_fd; pfds[1].events = POLLIN; pfds[1].revents = 0;
    for (;;) {
        int r = poll(pfds, 2, ts ? (int)(ts->tv_sec * 1000 + ts->tv_nsec / 1000000) : -1);
        if (r > 0) {
            if (pfds[1].revents) return WAIT_CANCELLED;
            if (pfds[0].revents) return WAIT_READY;
            continue;
        }
        if (r == 0) return WAIT_TIMEOUT;
        if (errno == EINTR) continue;
        return WAIT_ERROR;
    }
}
Framing reads: stop exactly at a delimiter
Many text protocols and CLI tools need to read until a delimiter (e.g., \n) with a time budget. The helper below accumulates into a caller-provided buffer, stops when the delimiter is found or capacity is reached, and respects a deadline. It returns the number of bytes stored (which may be partial on timeout/EOF) or -1 on error.
#include <string.h>

// Reads up to cap bytes or until delim is seen; returns count (>=0) or -1 on error.
// The returned count includes the delimiter if present. The buffer is not NUL-terminated.
ssize_t read_until_delim(int fd, char *buf, size_t cap, char delim, struct timespec deadline) {
    size_t used = 0;
    while (used < cap) {
        ssize_t r = read(fd, buf + used, cap - used);
        if (r > 0) {
            used += (size_t)r;
            // Only the newly read bytes can contain a new delimiter; scan just those
            char *pos = memchr(buf + used - (size_t)r, (unsigned char)delim, (size_t)r);
            if (pos) {
                return (ssize_t)(pos - buf + 1);
            }
            continue;
        }
        if (r == 0) {
            return (ssize_t)used; // EOF
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Wait for readability then retry
            int left_ms = ms_left(deadline);
            int rr = wait_readable(fd, left_ms);
            if (rr == 1) continue;
            return (ssize_t)used; // timeout or poll error: return what we have
        }
        return -1; // hard error
    }
    return (ssize_t)used; // buffer full without delimiter
}
Practical tips:
- Cap lines to a reasonable maximum and treat longer ones as an error to avoid unbounded memory usage.
- Reuse buffers between calls to reduce allocations and improve cache locality.
- For binary protocols, prefer length-prefix framing and use read_exact/readv_exact_until to collect exactly the announced size.
Backpressure: when the kernel says “not now”
EAGAIN is not an error—it’s a signal that downstream buffers are full. Good patterns:
- Treat EAGAIN as a scheduling event: stop writing, register interest in writability, and try again when notified
- Bound your userland output queue (bytes and messages). If limits are exceeded, shed work or apply upstream backpressure
- Prefer deadlines for end-to-end operations. A write that never makes progress should eventually time out and surface an error you can observe
Simple budgeted sender loop:
// Attempts to flush up to max_bytes from iovecs. Returns bytes flushed (>=0) or -1 on error.
ssize_t flush_budgeted(int fd, struct iovec *iov, int iovcnt, size_t max_bytes) {
    size_t sent = 0;
    while (iovcnt > 0 && sent < max_bytes) {
        size_t want = iov[0].iov_len;
        if (want > max_bytes - sent) want = max_bytes - sent;
        struct iovec tmp = { .iov_base = iov[0].iov_base, .iov_len = want };
        ssize_t w = writev(fd, &tmp, 1);
        if (w > 0) {
            sent += (size_t)w;
            advance_iovecs(&iov, &iovcnt, (size_t)w);
            continue;
        }
        if (w == -1 && errno == EINTR) continue;
        if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) break; // yield
        return -1; // hard error
    }
    return (ssize_t)sent;
}
This pattern is friendly to event loops: you push a bit each wakeup and never monopolize the thread.
Length-prefixed framing: robust, binary-friendly
Length-prefixed messages avoid delimiter corner cases and make partials easy to manage. A minimal pair of helpers:
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>

struct msg { const void *data; uint32_t len; };

bool send_msg(int fd, const void *data, uint32_t len, struct timespec deadline) {
    uint32_t nlen = htonl(len);
    struct iovec iov[2] = {
        { .iov_base = &nlen, .iov_len = sizeof nlen },
        { .iov_base = (void *)data, .iov_len = len }
    };
    return writev_all_until(fd, iov, 2, deadline);
}

// Returns malloc'd buffer on success (caller frees) and sets *out_len; NULL on error/timeout/EOF.
void *recv_msg(int fd, uint32_t *out_len, struct timespec deadline) {
    uint32_t nlen = 0;
    // Read the 4-byte header; anything short of 4 bytes means EOF/timeout/error
    struct iovec hiov = { .iov_base = &nlen, .iov_len = sizeof nlen };
    ssize_t hdr = readv_exact_until(fd, &hiov, 1, deadline);
    if (hdr != (ssize_t)sizeof nlen) return NULL;
    uint32_t len = ntohl(nlen);
    void *buf = malloc(len);
    if (!buf) return NULL;
    struct iovec piov = { .iov_base = buf, .iov_len = len };
    ssize_t body = readv_exact_until(fd, &piov, 1, deadline);
    if (body != (ssize_t)len) { free(buf); return NULL; }
    *out_len = len;
    return buf;
}
Notes:
- Enforce sane maximums (e.g., reject len > 16 MiB) to avoid memory bombs; a sketch of this guard follows below
- Consider a small fixed-size header struct for versioning and checksums
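A hedged sketch of that maximum-length guard: recv_msg with the check applied before allocation (the 16 MiB cap is an arbitrary example, not a protocol constant).
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <time.h>

enum { MSG_MAX_LEN = 16 * 1024 * 1024 }; // example cap; size it to your protocol

// Like recv_msg, but refuses oversized frames before malloc can be weaponized.
void *recv_msg_bounded(int fd, uint32_t *out_len, struct timespec deadline) {
    uint32_t nlen = 0;
    struct iovec hiov = { .iov_base = &nlen, .iov_len = sizeof nlen };
    if (readv_exact_until(fd, &hiov, 1, deadline) != (ssize_t)sizeof nlen) return NULL;
    uint32_t len = ntohl(nlen);
    if (len > MSG_MAX_LEN) return NULL; // memory-bomb guard
    void *buf = malloc(len);
    if (!buf) return NULL;
    struct iovec piov = { .iov_base = buf, .iov_len = len };
    if (readv_exact_until(fd, &piov, 1, deadline) != (ssize_t)len) { free(buf); return NULL; }
    *out_len = len;
    return buf;
}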
Nonblocking integration sketch (epoll-style)
The exact event loop will vary, but the principles are consistent:
- Always drain reads until EAGAIN
- For writes, try to flush the queue; if incomplete, enable POLLOUT/EPOLLOUT and resume on notification
- Bound per-connection buffers and enforce deadlines
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

struct bufseg { struct iovec iov[4]; int iovcnt; };
struct conn {
    int fd;
    struct bufseg outq[64];
    int q_head, q_tail; // ring
};

static bool conn_flush(struct conn *c) {
    // Try to write from the head segment only to keep fairness
    while (c->q_head != c->q_tail) {
        struct bufseg *s = &c->outq[c->q_head];
        if (s->iovcnt == 0) { c->q_head = (c->q_head + 1) % 64; continue; }
        ssize_t w = writev(c->fd, s->iov, s->iovcnt);
        if (w > 0) {
            // advance_iovecs takes a pointer-to-pointer; compact the surviving
            // segments back into the fixed array so state persists across calls
            struct iovec *iov = s->iov;
            int cnt = s->iovcnt;
            advance_iovecs(&iov, &cnt, (size_t)w);
            memmove(s->iov, iov, (size_t)cnt * sizeof *iov);
            s->iovcnt = cnt;
            continue;
        }
        if (w == -1 && errno == EINTR) continue;
        if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return false; // need POLLOUT
        return false; // hard error
    }
    return true; // queue empty
}

static void conn_on_readable(struct conn *c) {
    char buf[4096];
    for (;;) {
        ssize_t r = read(c->fd, buf, sizeof buf);
        if (r > 0) {
            // process buf[0..r)
            continue;
        }
        if (r == 0) { /* peer closed */ break; }
        if (r == -1 && errno == EINTR) continue;
        if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) break; // drained
        break; // hard error
    }
}
This sketch purposely omits epoll setup/teardown; its goal is to emphasize the robust read-until-EAGAIN/write-drain-until-EAGAIN pattern and bounded queues.
Production checklist
- Set CLOEXEC on all long-lived FDs; audit child processes
- Decide blocking vs nonblocking per component; default to nonblocking in servers
- Always handle partials and EINTR
- Treat EAGAIN as backpressure; never spin—wait for readiness
- Bound buffers and enforce deadlines; surface timeouts as errors
- Suppress SIGPIPE for socket I/O; expect EPIPE
- Log with context: fd, peer address, bytes attempted/achieved, errno
- Test with socketpair(), pipe(), and tiny buffers; inject signals; simulate timeouts (a sketch of such a test follows)
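To make that last point concrete, a hedged test sketch: a socketpair whose send buffer is shrunk so write_all and read_exact actually face short transfers (the sizes and the clamped SO_SNDBUF minimum are kernel-dependent).
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a reader so the writer's tiny send buffer fills repeatedly,
// exercising the partial-write path in write_all.
static void test_partial_writes(void) {
    int sv[2];
    assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);
    int tiny = 1; // the kernel clamps this upward, but it stays far below 64 KiB
    (void)setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &tiny, sizeof tiny);

    static char out[1 << 16], in[1 << 16];
    memset(out, 'x', sizeof out);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) { // child: drain the peer end
        close(sv[0]);
        ssize_t got = read_exact(sv[1], in, sizeof in);
        _exit(got == (ssize_t)sizeof in ? 0 : 1);
    }
    close(sv[1]);
    assert(write_all(sv[0], out, sizeof out)); // survives many short writes
    close(sv[0]);

    int status = 0;
    assert(waitpid(pid, &status, 0) == pid);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}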
Closing thoughts
Robust I/O in C isn’t about cleverness—it’s about discipline. Embrace the kernel’s contract: syscalls may be interrupted, reads and writes may be partial, and nonblocking is a conversation with the scheduler. Wrap these truths in small, boring helpers, add deadlines and backpressure, and your services will trade 3 a.m. incidents for predictable, observable behavior.