You wrote a tiny micro-optimization, shaved a few cycles, and suddenly a unit test prints 0xDEADBEEF where you expected 42. You flip to -O0 and it works; flip to -O3 and it breaks. Somewhere, an optimizer just took your code at its word, and your word didn’t mean what you thought.
This post is a practical tour of C’s strict aliasing and effective type rules: what the language guarantees, what compilers assume, and how to write code that’s both fast and correct. We’ll keep it grounded in real examples you can compile, and we’ll separate folklore from rules the compiler actually enforces.
Why you should care (even if “it works on my machine”)
Compilers are allowed to aggressively optimize as long as they preserve the program’s defined behavior. If your code reads memory through a type that the standard says cannot alias the original object’s type, the compiler is free to assume those locations are independent. It can cache values in registers, reorder loads/stores, and eliminate reads it believes cannot change—leading to “heisenbugs” that only show up with optimization or different architectures.
Symptoms you’ll see in the wild:
- “Changing an unrelated variable fixed my bug.”
- “-O0 works, -O3 misbehaves.”
- “Adding volatile or printing a value makes it pass.”
Those are not fixes. They’re red flags that your program relied on aliasing that the language forbids.
Effective type: the rule the optimizer uses
At the heart of strict aliasing is the notion of an object’s effective type. Roughly:
- An object’s effective type is the type of the lvalue used to store to it (or its declared type, for objects with a declared type).
- You must access an object through an lvalue of a compatible type, or through one of a few special exceptions (see below).
- If you don’t, behavior is undefined. The optimizer may assume those accesses never affect each other.
What counts as “compatible” is laid out by the standard: think “the same type, a qualified (const/volatile) version of it, or the corresponding signed/unsigned variant.”
The important exceptions you can rely on
- Pointers to char, signed char, or unsigned char may alias any object type. These are the byte-wise escape hatches for copying and inspecting raw storage.
- Access through a member of the same union has special latitude in C, but correctness and optimization interactions are subtle in practice (we’ll set the stage here and dig deeper later in the post series).
- The corresponding signed or unsigned variant of a type may alias it (e.g., int and unsigned int), but do not count on cross-size or unrelated-type aliasing.
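To make the character-type escape hatch concrete, here is a small sketch (the function is our own illustration) that inspects an arbitrary object’s bytes:

```c
#include <stddef.h>

/* Count zero bytes in any object's storage: unsigned char* may alias
 * any object type, so this byte walk is defined behavior. */
static size_t count_zero_bytes(const void *obj, size_t n) {
    const unsigned char *p = (const unsigned char *)obj;
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i)
        if (p[i] == 0)
            ++zeros;
    return zeros;
}
```

This works regardless of the object’s declared type or byte order, because the character-type view never violates the effective type rules.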
A small, concrete footgun
#include <stdio.h>
int alias_demo(float *pf) {
// UB: reading the float’s storage through an int* violates strict aliasing
int *pi = (int *)pf; // type-punning via pointer cast (not allowed)
*pf = 1.0f; // write a float
int x = *pi; // read the same bytes as int (undefined behavior)
return x;
}
int main(void) {
float f = 0.0f;
printf("%d\n", alias_demo(&f));
}
At -O0, you might “get lucky” and observe the byte pattern of 1.0f printed as an int. At -O3, the compiler can keep *pf in an FP register and assume *pi cannot observe that write, because an int * may not alias a float *. You’ve left the realm of defined behavior.
The safe way to examine bytes
Use a character-type pointer or memcpy when you mean “treat these bytes as bytes.”
#include <stdint.h>
#include <string.h>
uint32_t bitpattern_of_float(float v) {
uint32_t u = 0;
// Well-defined: memcpy moves bytes; no aliasing assumptions violated
memcpy(&u, &v, sizeof u);
return u;
}
This is not slower in practice: compilers recognize these patterns and emit optimal code (often a single load/store or a register move such as movd). The benefit is that you’ve stayed within the language rules, so optimizations remain correct.
“But unions…” — what C actually gives you
C permits writing to one union member and reading from another. However, two practical cautions keep seasoned systems programmers conservative here:
- Optimizers and alias analysis: while many compilers treat union-based punning as a valid alias, optimizations across translation units or inlining heuristics can still surprise you when the union object’s lifetime and visibility get complex.
- Portability and intent: memcpy-based punning communicates your intent unambiguously to both the compiler and readers. Union punning communicates layout intent but can interact awkwardly with effective type when the same storage is later accessed through a non-union pointer.
Rule of thumb used in robust codebases:
- Prefer memcpy for punning between unrelated types.
- If you use unions for layout or protocol overlays, keep accesses local and avoid mixing with non-union pointers that might re-enter the same storage through a different type.
We’ll return to union subtleties (and how to structure safe overlays) later, but for now, don’t depend on pointer-cast punning; it’s the classic source of alias-driven miscompiles.
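For contrast, here is what the union form of punning looks like when kept narrow; a minimal sketch, assuming IEEE-754 single precision and sizeof(float) == sizeof(uint32_t):

```c
#include <stdint.h>

/* C allows reading a union member other than the one last written.
 * Keep the union object local so no other typed pointer ever sees
 * its storage under a different type. */
static uint32_t float_bits_via_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

Note that this form is undefined behavior in C++; memcpy remains the choice that is valid in both languages.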
How optimizers exploit the rules (on purpose)
Once the compiler believes two pointers cannot alias, it can:
- Reorder independent loads/stores for better scheduling.
- Keep values in registers across stores through unrelated-typed pointers.
- Eliminate redundant loads (“of course this value hasn’t changed”).
That’s great when your code is well-typed. It’s catastrophic when a “clever” cast violates the aliasing contract. The fix isn’t to turn off optimization globally; it’s to make your intent explicit within the rules.
A minimal miscompile you can reason about
int g;
void store_float_through_alias(float *pf, int *pi) {
    *pf = 2.0f; // write as float
    *pi = 7;    // write as int; the compiler may freely reorder these two
                // stores, since a float* and an int* are assumed not to alias
}
int read_back(float *pf) {
    float x = *pf;    // may be hoisted or kept live in an FP register
    (void)g;          // an unrelated int global the float* "cannot" touch
    return x == 2.0f; // could be computed before any intervening int store
}
If a different part of the program smuggles the same storage behind an int * and writes to it, the optimizer is still allowed to assume a float * view cannot see that write. Your intent (shared storage) and the language model (independent objects) diverged.
What’s actually allowed to alias what?
Think of aliasing in terms of “access sets.” You may access an object’s storage via:
- The object’s type (and qualified versions of it), or a type compatible with it
- A character type (char, signed char, unsigned char)
- The corresponding signed or unsigned variant of the type
- A union member, when the storage has union type and the implementation’s rules admit it

Outside those sets, assume “no alias.” If you need aliasing across types, introduce a well-defined bridge: memcpy, explicit byte views, or a disciplined union overlay.
Safe byte-wise access patterns you can standardize today
If you need to interpret storage in multiple ways, choose one of these patterns and stick to it across your codebase:
- Use memcpy to move raw bits between differently typed objects.
- For serialization/deserialization, define on-the-wire formats as byte arrays and read/write fields with explicit load/store helpers that assemble bytes safely.
- If a union overlay is unavoidable (e.g., HW registers, protocol discriminated unions), keep access sites narrow and avoid mixing with pointer casts of unrelated types.
Here’s a minimal, high-speed helper for byte views that compilers optimize well:
#include <stdint.h>
#include <string.h>
static inline uint32_t value_as_u32(const void *p) {
    uint32_t u;
    memcpy(&u, p, sizeof u); // byte copy; no aliasing assumption violated
    return u;
}
static inline void store_u32_as(void *p, uint32_t u) {
    memcpy(p, &u, sizeof u);
}
These are easy to audit, friendly to alias analysis, and compile down to the same moves you’d hope for.
A word on “fixes” that aren’t fixes
- volatile is not a concurrency or aliasing tool. It changes optimization around accesses but does not make punning through unrelated types defined.
- Sprinkling barriers or inline asm won’t fix aliasing UB.
- Turning off optimization globally (-O0) just hides the bug. If you must, use targeted memcpy-based bridges or clearly documented union overlays.
Turning aliasing off: -fno-strict-aliasing (what you gain and what it costs)
When you compile with strict aliasing enabled (the default at -O2/-O3 in GCC and Clang for C), the optimizer assumes the effective type rules hold. Disabling it with -fno-strict-aliasing tells the compiler to be conservative: different-typed pointers might alias. That can “fix” code that was relying on undefined aliasing, but it also sacrifices optimizations globally.
Practical guidance:
- Prefer fixing the code (use memcpy, correct types, or well-structured overlays) over disabling alias analysis globally.
- If you must ship a dependency you cannot modify, isolate the flag to affected translation units only.
- Expect performance regressions in hot loops that previously benefited from non-aliasing assumptions.
Example: compile a single TU without strict aliasing
cc -O3 -fno-strict-aliasing -c legacy_codec.c -o legacy_codec.o
cc -O3 *.o -o app
This narrows the blast radius. Measure before and after; alias-inhibited optimizations can show up as extra loads/stores and reduced vectorization.
Using restrict to promise non-aliasing (safely)
Where -fno-strict-aliasing turns optimizations off globally, restrict turns them on locally by making an explicit promise: for the lifetime of a pointer, the object it points to is only accessed through that pointer (and pointers derived from it). That promise enables stronger scheduling and vectorization.
Minimal example (classic AXPY):
#include <stddef.h>
void saxpy(size_t n, float a,
const float *restrict x,
float *restrict y) {
for (size_t i = 0; i < n; ++i) {
y[i] = a * x[i] + y[i];
}
}
Why this helps:
- With restrict, the compiler may assume x and y do not overlap, so loads from x cannot be invalidated by stores to y and vice versa. That unlocks vectorized fused loops on many targets.
Important constraints:
- Violating a restrict promise (e.g., calling saxpy(n, a, buf, (float*)buf)) is undefined behavior. Treat restrict as a contract at the API boundary; document it.
- Use restrict only when you can guarantee non-overlap for the entire lifetime of the pointers inside the function.
memcpy vs memmove and restrict
- memcpy’s contract excludes overlap; it is declared with restrict-qualified pointers and can be optimized aggressively.
- memmove must handle overlap; it cannot make restrict guarantees, and compilers may generate a safe (potentially slower) path.

If you know your copies never overlap, prefer memcpy. If overlap is possible, use memmove and accept the necessary semantics.
A micro-pattern that scales
void add_arrays(size_t n,
const float *restrict a,
const float *restrict b,
float *restrict out) {
for (size_t i = 0; i < n; ++i) {
out[i] = a[i] + b[i];
}
}
Compilers commonly emit wide vector loads/stores here because restrict rules out overlap between a, b, and out.
Struct layout, padding, and field reordering that actually helps
Aliasing is only half the story; layout matters, too. Field order affects total size, alignment, cache behavior, and false sharing risk. The goal is to reduce padding without breaking external ABIs or accidental coupling.
Before/after: a compact, alignment-friendly layout
#include <stdint.h>
// Before: suboptimal padding
struct Bad {
uint8_t tag; // 1
uint64_t id; // 8 (requires 8-byte alignment)
uint8_t flags; // 1
// padding likely inserted after tag and flags
};
// After: group by alignment width
struct Good {
uint64_t id; // 8 first
uint8_t tag; // 1
uint8_t flags; // 1
// compiler may add minimal tail padding to satisfy struct alignment
};
On typical 64-bit ABIs, sizeof(struct Bad) may be larger than sizeof(struct Good) because of inserted padding between tag and id, and again at the end to meet alignment. Grouping the widest fields first reduces internal holes.
Guidelines that pay off:
- Group fields by decreasing alignment (e.g., 8-byte types, then 4-byte, then 2/1-byte).
- Keep frequently-updated fields together and consider padding to cache-line boundaries to avoid false sharing across threads when necessary.
- Do not reorder externally visible structs used across FFI boundaries or persisted on disk; define explicit wire formats instead.
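The false-sharing advice above can be realized with C11 alignment; a sketch that assumes a 64-byte cache line (the struct name is ours):

```c
/* C11: give each frequently-updated counter its own (assumed 64-byte)
 * cache line so concurrent writers don't contend on a shared line. */
struct padded_counter {
    _Alignas(64) unsigned long value;
};
```

Because the member is aligned to 64, sizeof(struct padded_counter) rounds up to 64, so adjacent array elements land on distinct cache lines.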
Packed structs: sharp knives
#pragma pack or __attribute__((packed)) can trim padding but may generate unaligned accesses that are slower or fault on some architectures. Use with caution and only for on-the-wire representations, with explicit load/store helpers that handle alignment safely.
Verifying and documenting layout
Add compile-time assertions (where available) or runtime checks in tests to lock in expected sizes. Document invariants so future refactors don’t regress layout:
#include <stdio.h>
// C11: lock in the size budget at compile time
_Static_assert(sizeof(struct Good) <= 16, "struct Good exceeded its size budget");
int main(void) {
    printf("Good=%zu\n", sizeof(struct Good));
    return 0;
}
Putting it together: fast and defined
- Keep aliasing within the rules: access objects through compatible types or via char*/memcpy bridges.
- Use restrict to tell the truth about non-overlap in hot paths; treat violations as UB.
- Avoid global switches like -fno-strict-aliasing unless you can isolate them; measure their cost.
- Lay out structs to reduce padding while respecting ABI/wire contracts.
Proving performance without breaking correctness
Optimizations are only wins when they’re measured and safe. A simple workflow:
- Write the defined, alias-safe version first (e.g., memcpy punning, restrict where guaranteed).
- Enable optimizations and vectorization reports; inspect the output.
- Benchmark with realistic data and sizes; compare steady-state and tail latencies.
Helpful compiler flags:
- GCC: -O3 -fstrict-aliasing -fopt-info-vec-optimized -fopt-info-vec-missed
- Clang: -O3 -fstrict-aliasing -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
Interpretation tips:
- If vectorization is missed due to “possible aliasing,” consider restrict on array parameters after verifying non-overlap.
- If you see many redundant loads/stores in disassembly, an aliasing assumption may be inhibiting register reuse.
Minimal timing harness sketch:
#include <stdint.h>
#include <stdio.h>
#include <time.h>
static uint64_t now_ns(void) {
struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts);
return (uint64_t)ts.tv_sec*1000000000ull + (uint64_t)ts.tv_nsec;
}
void saxpy(size_t n, float a, const float *restrict x, float *restrict y);
int main(void) {
enum { N=1<<20 }; static float x[N], y[N];
for (int i=0;i<N;++i){ x[i]=1.0f; y[i]=2.0f; }
uint64_t t0=now_ns();
for (int r=0;r<200;++r) saxpy(N, 1.5f, x, y);
uint64_t t1=now_ns();
printf("ns/elem=%.2f\n", (double)(t1-t0)/(double)(N*200));
}
Run with and without restrict to quantify the impact. Always validate results to avoid “fast but wrong.”
Case study: replacing pointer-cast punning with memcpy
Problem: a codec reads a 32-bit tag from a byte buffer and compares it against constants. A naive version casts the byte pointer to uint32_t* and dereferences, inviting both alignment faults and aliasing UB.
Defined, fast version:
#include <stdint.h>
#include <string.h>
static inline uint32_t load_u32_le(const unsigned char *p) {
uint32_t v; memcpy(&v, p, 4); // byte copy (no alias), then endian fix
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
v = ((v>>24)&0xff) | ((v>>8)&0xff00) | ((v<<8)&0xff0000) | ((v<<24)&0xff000000u);
#endif
return v;
}
int is_tag(const unsigned char *p) { return load_u32_le(p) == 0x6D6F6F76u; /* 'voom' */ }
Compilers turn the memcpy into a single load on platforms that support unaligned accesses, or a couple of byte loads otherwise. The key is that behavior is defined everywhere.
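The store direction can be handled the same way; a sketch using explicit shifts (the helper name is ours), which is endian-independent and never forms a misaligned pointer:

```c
#include <stdint.h>

/* Write v as four little-endian bytes. Plain unsigned-char stores are
 * defined regardless of alignment or host byte order. */
static void store_u32_le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)((v >> 8) & 0xffu);
    p[2] = (unsigned char)((v >> 16) & 0xffu);
    p[3] = (unsigned char)((v >> 24) & 0xffu);
}
```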
Case study: image blend kernel with restrict
void blend_rgba(size_t n,
const unsigned char *restrict a,
const unsigned char *restrict b,
unsigned char *restrict out) {
for (size_t i = 0; i < 4*n; i += 4) {
unsigned ar = a[i+0], ag = a[i+1], ab = a[i+2], aa = a[i+3];
unsigned br = b[i+0], bg = b[i+1], bb = b[i+2], ba = b[i+3];
unsigned r = (ar*aa + br*(255-aa)) / 255;
unsigned g = (ag*aa + bg*(255-aa)) / 255;
unsigned b2= (ab*aa + bb*(255-aa)) / 255;
unsigned a2= aa + ((255-aa)*ba)/255;
out[i+0] = (unsigned char)r;
out[i+1] = (unsigned char)g;
out[i+2] = (unsigned char)b2;
out[i+3] = (unsigned char)a2;
}
}
With restrict qualifiers and appropriate flags, compilers can vectorize the loop. Without restrict, they must assume a, b, and out may overlap and be more conservative.
API design that prevents aliasing bugs
- Prefer typed wrappers over raw void * for semantically distinct buffers.
- Add restrict to function parameters where non-overlap is a precondition; document that contract.
- Avoid exposing internal storage that invites re-interpretation; provide getters that copy into caller-provided typed objects via memcpy.
- For protocol/HW overlays, define explicit marshal/unmarshal helpers instead of sharing in-place struct pointers.
Example API shape:
#include <stdint.h>
#include <string.h>
struct packet { unsigned char bytes[64]; };
struct header { uint16_t kind; uint16_t len; };
void pkt_set_header(struct packet *p, const struct header *h) {
    memcpy(p->bytes, h, sizeof *h);
}
void pkt_get_header(const struct packet *p, struct header *h) {
    memcpy(h, p->bytes, sizeof *h);
}
This prevents callers from forming unrelated typed pointers into the packet storage.
Tooling notes and gotchas
- -Wstrict-aliasing[=2|3] (GCC) can warn on some risky patterns but is not exhaustive. Treat warnings seriously.
- Sanitizers don’t reliably catch strict-aliasing UB. Use differential builds and targeted tests instead.
- Avoid relying on compiler-specific extensions (e.g., __attribute__((may_alias))) unless you control all toolchains and document the portability cost.
Production checklist (printable)
- Access memory via compatible types or char*/memcpy bridges.
- Do not type-pun via pointer casts across unrelated types.
- Use restrict only when you can prove non-overlap; document it.
- Keep global -fno-strict-aliasing off; isolate it to legacy TUs if unavoidable.
- Reorder struct fields by alignment; guard external layouts with size/assert checks.
- Marshal/unmarshal across boundaries; don’t take typed pointers into raw byte buffers.
- Benchmark and inspect vectorization reports when optimizing hot loops.
Closing thoughts
Strict aliasing is not an academic edge case; it’s a practical contract between you and the optimizer. When you follow the language’s effective type rules, the compiler can be bold on your behalf: fewer loads, wider vectors, tighter schedules. When you violate them, optimizations turn into miscompiles. Keep your intent explicit, through compatible types, byte-wise bridges, and honest restrict, and you’ll ship code that is both faster and predictably correct.
Safe punning patterns that scale (catalog)
You can write high-performance code without tripping alias rules. Standardize these patterns and you’ll keep both compilers and reviewers happy.
1) Scalar reinterpretation (float ↔ int) via memcpy
#include <stdint.h>
#include <string.h>
static inline uint32_t u32_from_f32(float f) {
uint32_t u; memcpy(&u, &f, sizeof u); return u;
}
static inline float f32_from_u32(uint32_t u) {
float f; memcpy(&f, &u, sizeof f); return f;
}
Compilers recognize these as bit-casts and emit optimal moves. No aliasing UB.
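One property worth locking into a unit test is that the round trip is exact for any finite value; a self-contained sketch, assuming a 32-bit float:

```c
#include <stdint.h>
#include <string.h>

/* float -> bit pattern -> float must reproduce the value exactly. */
static float roundtrip_f32(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* bits out */
    float g;
    memcpy(&g, &u, sizeof g); /* bits back in */
    return g;
}
```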
2) Struct-to-bytes and back (protocol boundaries)
#include <stdint.h>
#include <string.h>
struct hdr { uint16_t type; uint16_t len; };
// Serialize to network byte order in a byte buffer
static inline void hdr_serialize(const struct hdr *h, unsigned char *out) {
unsigned char tmp[sizeof *h];
memcpy(tmp, h, sizeof *h); // byte image
// apply endian conversions explicitly if needed
memcpy(out, tmp, sizeof tmp);
}
// Parse from bytes into a properly typed object
static inline void hdr_parse(const unsigned char *in, struct hdr *out) {
memcpy(out, in, sizeof *out);
// apply endian conversions explicitly if needed
}
Do not create a struct hdr * that points into an arbitrary byte buffer; copy in/out with memcpy to establish a well-typed object.
3) Tagged unions (intent + safety)
enum kind { K_I32, K_F32 };
struct value {
enum kind k;
union { int i32; float f32; } u;
};
int to_int(struct value v) {
return v.k == K_I32 ? v.u.i32 : (int)v.u.f32; // well-defined: same union object
}
Keep union access localized and guarded by a tag. Avoid taking unrelated-typed pointers to the same storage elsewhere.
4) Byte views for inspection only
static inline const unsigned char *bytes_of(const void *p) {
return (const unsigned char *)p; // ok: char-types may alias anything
}
Use for hashing/checksums or debug prints; don’t write through this view unless you intend raw-byte semantics.
Effective type lifecycle, provenance, and malloc’d storage
Storage obtained from malloc has no declared type; it doesn’t acquire an effective type until you store to it through a typed lvalue (or memcpy an object into it). That first store sets the effective type for subsequent accesses.
Consequences:
- Writing bytes into malloc’d storage via memcpy from an object of type T and then reading via T* is well-defined.
- Reading via U* (an unrelated type) is not, unless you treat the storage as bytes (unsigned char*).
- Re-using the same raw storage for a different unrelated type later requires re-establishing the new type via assignment/memcpy before typed reads.
Minimal example:
#include <stdlib.h>
#include <string.h>
struct A { int x; };
struct B { float y; };
void demo(void) {
void *p = malloc(sizeof(struct A));
struct A a = {42};
memcpy(p, &a, sizeof a); // establish A in storage
int x = ((struct A *)p)->x; // OK
struct B b = {1.0f};
memcpy(p, &b, sizeof b); // now establish B
float y = ((struct B *)p)->y; // OK
free(p);
}
Avoid reading through an unrelated-typed pointer when the storage’s last effective type is different.
Alignment and endianness: silent footguns
Aliasing rules don’t save you from alignment and byte order issues—those are separate constraints.
- Alignment: on some architectures, misaligned loads fault. Even where they don’t, they can be slower. Don’t invent a T* to an unaligned byte address; use memcpy to move into a properly aligned T.
- Endianness: a byte image is not a value. For protocols/files, define explicit load/store helpers that marshal fields with htobe16/be32toh (or manual shifts) so the typed object in memory is native-endian.
Example helper for a 32-bit big-endian field:
static inline uint32_t load_be32(const unsigned char *p) {
return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
| ((uint32_t)p[2] << 8) | ((uint32_t)p[3]);
}
static inline void store_be32(unsigned char *p, uint32_t v) {
p[0] = (unsigned char)(v >> 24);
p[1] = (unsigned char)(v >> 16);
p[2] = (unsigned char)(v >> 8);
p[3] = (unsigned char)(v);
}
Hardware registers and packed overlays
Memory-mapped I/O often demands byte-precise layouts. Practical rules:
- Use fixed-width integer types and explicit masks/shifts; avoid relying on compiler bitfield layout, which is implementation-defined.
- Keep the MMIO region accesses volatile, but do not conflate volatile with aliasing; it doesn’t permit pointer-cast punning between unrelated types.
- For packed on-the-wire structs, prefer marshaling helpers and memcpy into aligned temporaries before typed access.
Bitfield alternative for clarity (still mindful of layout):
#include <stdint.h>
// Extract bits [lo, hi] of v; requires lo <= hi <= 31.
static inline unsigned get_bits_u32(uint32_t v, unsigned lo, unsigned hi) {
    unsigned w = hi - lo + 1U;
    uint32_t mask = (w == 32U) ? 0xFFFFFFFFu : ((1U << w) - 1U);
    return (v >> lo) & mask;
}
Diagnosing aliasing bugs in practice
There isn’t a runtime sanitizer for strict-aliasing UB, but you can triangulate:
- Differential builds: compare behavior under -O0, -O3, and -O3 -fno-strict-aliasing. Divergence flags suspicious code paths.
- Warnings: enable -Wstrict-aliasing (GCC) and high warning levels; treat new warnings as suspect hotspots. Clang is conservative here, but extra warnings still help.
- Code search: hunt for casts between unrelated pointer types; replace them with memcpy or unify types.
- Tests: construct minimal repros that alternate writes via type A and reads via type B; assert invariants under -O3.
Harness template you can adapt:
#include <assert.h>
#include <stdint.h>
#include <string.h>
static uint32_t pun_float(float f) { uint32_t u; memcpy(&u, &f, 4); return u; }
int main(void) {
float f = 1.0f;
uint32_t u = pun_float(f);
(void)u; // place breakpoints/print as needed
assert(pun_float(1.0f) == pun_float(1.0f));
}
A short refactor checklist
- Remove pointer-cast punning between unrelated types; insert memcpy bridges.
- Add restrict to hot loops where you can guarantee non-overlap; document the contract.
- Reorder struct fields by alignment; guard public layouts with _Static_assert(sizeof(struct X) == N) where applicable.
- Isolate any unavoidable legacy aliasing behind small, audited functions; if needed, compile those TUs with -fno-strict-aliasing and measure.