Python’s memory behavior is predictable when you adopt the right mental model. You don’t need folklore or guesswork—you need a few invariants that always hold, a map of where the GC helps (and where it doesn’t), and habits that keep leaks from sneaking in.
Important mindset:
- Refcounting is the primary lifetime mechanism; the cyclic GC is a safety net for cycles, not the main collector. [R1]
- CPython destroys most objects deterministically when their reference count reaches zero—often immediately. [R2]
- Cycles are real and common in modern code (closures, graphs, tasks); the GC finds and breaks them, but finalizers can complicate the story. [R3]
- Process RSS does not equal “live Python objects.” The small-object allocator (obmalloc) and arenas/pools/blocks create fragmentation and retention. RSS may stay high even when refcounts drop. [R4]
What refcounting actually promises
Every CPython object has a reference count. When it reaches zero, the object’s deallocator (`tp_dealloc`) runs and its memory is returned to CPython’s allocator. In CPython this is immediate and deterministic, in contrast with other Python implementations. [R1][R2]
When does the count change?
- Name binding and rebinding: `a = obj` increments; `a = None` decrements. [R2]
- Container membership: inserting into a `list`/`dict`/`set` increments; removing decrements. [R2]
- Attribute assignment: `x.attr = obj` increments; deleting decrements. [R2]
- Argument passing and return values: call frames hold temporary references. [R2]
- Temporaries you don’t see: the interpreter and C API create short-lived references (e.g., `sys.getrefcount` itself). [R2]
Two useful truths:
1) If you can make the last owning reference go away, the object goes away (modulo cycles and finalizers).
2) If RSS does not drop after that, it’s allocator behavior or other owners, not “mystery leaks.” [R4]
A careful look at counts
```python
import sys

obj = object()
print("baseline:", sys.getrefcount(obj))    # +1 from the call site
alias = obj
print("after alias:", sys.getrefcount(obj))
del alias
print("after del alias:", sys.getrefcount(obj))
```
Caution: `sys.getrefcount(x)` reports “real count + 1” because it creates a temporary reference while inspecting. Use it for deltas, not absolutes. [R2]
Why RSS may not go down when refcounts do
CPython’s small-object allocator (obmalloc) manages memory in arenas (~256 KiB) subdivided into pools (size classes) and blocks. Freed blocks usually return to their pool or arena rather than being `free()`d to the OS. That’s great for latency, but it confuses dashboards because RSS can plateau after pressure spikes. Long-lived processes can also fragment arenas across size classes. [R4]
Implication: the right metric for “do we leak Python objects?” is object counts and bytes inside the interpreter (e.g., `tracemalloc`, `gc.get_objects`, domain counters), not process RSS alone. We’ll use these in later sections. [R5]
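To see the gap between the two signals, compare interpreter-level block counts with RSS across an allocate-and-free burst. A minimal sketch, assuming `psutil` is installed; the workload and sizes are arbitrary:

```python
import os
import sys

import psutil  # assumption: third-party psutil is available

proc = psutil.Process(os.getpid())

def report(label: str) -> None:
    print(label, "blocks:", sys.getallocatedblocks(), "rss:", proc.memory_info().rss)

report("before")
junk = [{"i": i} for i in range(500_000)]   # many small dicts -> many pools
report("allocated")
del junk                                    # refcounts hit zero, blocks are freed...
report("freed")                             # ...while RSS typically stays near its peak
```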
Generational GC: a cycle collector, not your primary collector
CPython’s cyclic GC supplements refcounting to reclaim unreachable cycles. It tracks container objects and organizes them into three generations. Collections run more frequently on younger generations; survivors get promoted. Defaults are typically `(700, 10, 10)`, meaning: when gen0 allocations minus deallocations exceed 700, collect gen0; after 10 gen0 collections, collect gen1; after 10 gen1 collections, collect gen2. [R1][R6]
Key points you can rely on:
- Only container-like, GC-tracked objects participate (e.g., list, dict, set, many user classes); pure scalars may be untracked (see the check after this list). [R6]
- The collector identifies unreachable groups; if any object in the cycle had a finalizer (`__del__`), older Pythons used to “leak” them into `gc.garbage`. Modern finalization (PEP 442) makes this saner; details still matter. [R7]
- You can tune thresholds or temporarily disable GC for latency-sensitive bursts; don’t leave it off. [R6]
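The first point is easy to check directly with `gc.is_tracked`. A quick sketch; results follow the documented behavior, and dict tracking can vary with contents and version:

```python
import gc

print(gc.is_tracked(42))          # False: plain ints cannot form cycles
print(gc.is_tracked([]))          # True: lists are containers
print(gc.is_tracked({"a": 1}))    # False: dicts of atomic values may be untracked
print(gc.is_tracked({"a": []}))   # True: once it can hold a cycle, it is tracked
```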
Minimal knobs you should know (we’ll benchmark later):
```python
import gc

print("thresholds:", gc.get_threshold())  # e.g., (700, 10, 10)
print("enabled?", gc.isenabled())

gc.disable()  # do a latency-critical burst
try:
    # perform work that allocates many short-lived objects
    pass
finally:
    gc.enable()
    gc.collect()  # explicit sync cycle when safe
```
Cycles in practice (and why they’re easy to create)
- Object graphs: parents back-reference children, children back-reference parents.
- Closures: inner functions capture outer variables; registry globals keep references to closures; voila, a cycle. [R3]
- Async tasks/futures: task holds callback; callback closes over task or loop. [R3]
- Observers and caches: subscriber lists, `lru_cache` without size limits, or accidental globals.
The point is not to avoid cycles; it’s to make them either a) short-lived and collectible, or b) non-owning by using `weakref` where appropriate. [R8]
A tiny cycle you can see
```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None  # will point to another Node

def make_cycle():
    a = Node("a"); b = Node("b")
    a.peer = b; b.peer = a
    return a, b

gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage so we can inspect them
a, b = make_cycle()
del a, b
unreached = gc.collect()
print("unreachable found:", unreached)
print("garbage list size:", len(gc.garbage))
gc.garbage.clear()
```
Notes:
- Because the snippet sets `gc.DEBUG_SAVEALL`, everything found unreachable is appended to `gc.garbage` so you can inspect it; without that debug flag (and with no `__del__`), the GC simply reclaims the cycle and `gc.garbage` stays empty. [R6][R7]
- Add a `__del__` to `Node` and behavior changes; modern CPython (PEP 442) tries to finalize safely, but it’s still a sharp edge. We’ll cover finalization rules later. [R7]
What actually changes a refcount (practical checklist)
You can reason about lifetimes by listing the owners:
- Module globals; caches; singletons
- Containers: lists/sets/dicts/tuples; also composite attrs on instances
- Stack frames: parameters, locals; comprehensions and generators capture variables
- Live tasks/threads: pending callbacks, futures, loggers with closures
If an object isn’t going away, one of these still owns it. Find that owner; don’t guess. We’ll use `tracemalloc`, `objgraph`, and targeted heap diffs to locate owners later. [R5][R9]
Common growth patterns that aren’t “leaks”
- Free lists and interning: ints, small tuples, and some objects use free lists; memory returns to pools, not OS. [R4]
- Fragmentation after spikes: RSS stays high because arenas remain mapped for reuse. [R4]
- C extensions and buffers: memory may live outside Python’s GC view (NumPy arrays, memory-mapped files). [R10]
A quick sanity loop you can paste today
Use this to assert “no net growth” over a steady window in lower environments. It’s a canary, not a full diagnosis.
```python
import gc, time, tracemalloc

def assert_quiescent_growth(seconds: float = 30.0, max_kib: int = 256):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < seconds:
        time.sleep(1.0)
        gc.collect()
    cur, peak = tracemalloc.get_traced_memory()
    growth_kib = (cur - base) // 1024
    tracemalloc.stop()
    if growth_kib > max_kib:
        raise AssertionError(f"heap grew by {growth_kib} KiB (> {max_kib} KiB)")

# Run during a quiescent phase (no steady new allocations)
# assert_quiescent_growth()
```
Why this works: `tracemalloc` tracks Python-level allocations by traceback; it ignores obmalloc caching costs and native buffers, which is exactly what you want when judging “Python object ownership” first. We’ll refine this into snapshot diffs and owner attribution in later sections. [R5]
Recap you can apply immediately
- Think “owners,” not “mystery leaks.” List owners (globals, containers, frames, tasks); remove the last owner → object is freed (unless in a cycle). [R2][R6]
- Don’t use RSS to prove Python leaks; use interpreter-aware tools (`tracemalloc`, `gc`), then explain RSS via allocator behavior. [R4][R5]
- Cycles are fine; `__del__` is the sharp edge. Prefer `weakref` for non-owning relationships and keep finalizers tiny and rare. [R7][R8]
- GC is a cycle collector. Refcounting is your primary lifetime tool. Tune GC only with measurements. [R1][R6]
In the next section, we’ll turn these invariants into actionable diagnostics: targeted snapshot diffs, “top growth” reports that point to lines of code, and graph walks that reveal the owner keeping your objects alive—so you can fix leaks without guessing.
Leak diagnostics without guessing: snapshots, diffs, and object graphs
We’ll build a small, repeatable playbook that answers three questions: 1) Did Python-heap bytes grow? 2) Which lines (tracebacks) grew? 3) Which objects and owners are responsible?
Step 1 — Snapshot diffs with tracemalloc (per-line and per-traceback)
Start trace early, filter noise, and compare snapshots around a minimal reproducer. [R5]
```python
import tracemalloc, time

def run_once():
    # TODO: call the suspected code path with fixed inputs
    payload = [b"x" * 16384 for _ in range(200)]  # ~3.1 MiB
    return payload

tracemalloc.start()

# Warmup to avoid cold-start noise
run_once(); run_once()

snap_a = tracemalloc.take_snapshot()
run_once()
time.sleep(0.1)  # let finalizers/GC catch up
snap_b = tracemalloc.take_snapshot()

# Compare by line number
by_line = snap_b.compare_to(snap_a, "lineno")
print("\nTop growth by line:")
for stat in by_line[:10]:
    print(stat)

# Compare by traceback (group identical stacks)
by_tb = snap_b.compare_to(snap_a, "traceback")
print("\nTop growth by traceback:")
for stat in by_tb[:5]:
    print("\n", "-" * 60)
    print("\n".join(stat.traceback.format()))

tracemalloc.stop()
```
Tips that improve signal:
- Use `Snapshot.filter_traces` to exclude site-packages or known-noise paths when hunting an app leak. [R11]
- Prefer `key_type="traceback"` to group equivalent call stacks; this often surfaces a single hot allocation site. [R12]
- Pause background churn: turn off periodic tasks and disable GC briefly around the measured region if you need crisp diffs (a small helper for this follows the filter example below). [R6]
```python
# Example filter: only keep your project’s sources
snap_b = snap_b.filter_traces((
    tracemalloc.Filter(True, "*/your_project/*"),
))
```
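For the third tip, a small helper keeps the disable/enable pairing honest around the measured region. A minimal sketch; `gc_paused` is a name invented here:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Pause the cyclic GC for a measured region, restoring its previous state afterwards.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
```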
Step 2 — From lines to owners: which objects are alive and why
Counts and bytes tell you “where.” Now find “who” (types) and “why” (referrers).
Option A: objgraph (quick, visual). [R9]
```python
import gc, objgraph

# After the reproducer runs and a leak is suspected:
gc.collect()
objgraph.show_growth(limit=15)  # which types grew

# Pick a suspicious type
victims = objgraph.by_type("MyWidget")[:3]
objgraph.show_backrefs(victims, max_depth=5, filename="backrefs.png")
```
Option B: pympler (tabular deltas). [R9]
```python
from pympler import tracker

tr = tracker.SummaryTracker()
# ... run workload round 1 ...
tr.print_diff()  # shows growth by type since the previous call
# ... run workload round 2 ...
tr.print_diff()
```
Reading the graphs/diffs:
- Look for unexpected roots: module globals, singletons, caches, registries.
- Inspect container owners: dicts keyed by ids/strings that only ever grow.
- Async leaks: tasks/futures kept alive by callbacks/closures referencing the loop or parent. [R13]
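For the async case, it often helps to enumerate which tasks are still alive at a checkpoint. A minimal sketch using `asyncio.all_tasks()`; the task name `lingering-sleep` and the demo workload are illustrative:

```python
import asyncio

async def audit_tasks() -> None:
    # Tasks that never finish keep their frames (and everything they capture) alive.
    for task in asyncio.all_tasks():
        if not task.done():
            print("live task:", task.get_name(), task.get_coro().__qualname__)

async def main() -> None:
    t = asyncio.create_task(asyncio.sleep(60), name="lingering-sleep")
    await asyncio.sleep(0)   # let the task start
    await audit_tasks()      # the current task and "lingering-sleep" both show up
    t.cancel()

asyncio.run(main())
```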
Step 3 — Prove the fix with a tight loop and a ceiling
Turn your reproducer into a guard that fails when growth exceeds a small, agreed envelope. [R5]
```python
import gc, tracemalloc

def assert_no_growth(fn, iters=5, kib=64):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
        gc.collect()
    cur, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    grew = (cur - base) // 1024
    if grew > kib:
        raise AssertionError(f"heap grew {grew} KiB (> {kib} KiB)")

# assert_no_growth(run_once)
```
Practical patterns that prevent leaks (and how to retrofit)
Prefer weak references for non-owning links
Use `weakref.ref` or `weakref.WeakValueDictionary` to avoid turning lookups/registries into owners. [R8]
```python
import weakref

class Registry:
    def __init__(self) -> None:
        self._by_id = weakref.WeakValueDictionary()

    def add(self, ident: str, obj: object) -> None:
        self._by_id[ident] = obj

    def get(self, ident: str) -> object | None:
        return self._by_id.get(ident)
```
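A quick behavioral check of the registry sketch above (`Widget` is a stand-in class): once the last strong reference disappears, the entry vanishes on its own.

```python
import gc

class Widget:
    pass

reg = Registry()
w = Widget()
reg.add("w1", w)
print(reg.get("w1") is w)   # True while a strong reference exists elsewhere
del w                       # drop the last owning reference
gc.collect()                # not strictly required in CPython; included for clarity
print(reg.get("w1"))        # None: the registry never kept the Widget alive
```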
Replace `__del__` with `weakref.finalize`
Finalizers in cycles complicate GC. Use `weakref.finalize` so cleanups don’t keep objects alive. [R7][R8]
```python
import weakref

class Session:
    def __init__(self, resource):
        self.resource = resource
        self._finalizer = weakref.finalize(self, type(self)._cleanup, resource)

    @staticmethod
    def _cleanup(res):
        res.close()
```
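A small usage check for the sketch above (`Resource` is a stand-in with a `close()` method): the cleanup runs when the `Session` becomes unreachable, without `Session` needing a `__del__`.

```python
class Resource:
    def close(self) -> None:
        print("resource closed")

s = Session(Resource())
del s   # last reference gone; the finalize callback calls Resource.close()
```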
Bound caches and clear deliberately
Unbounded `lru_cache` or dicts are growth by design. Always bound them and clear on lifecycle events. [R14]
```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def parse(schema: str) -> tuple: ...

# later, when schemas rotate
parse.cache_clear()
```
Async hygiene: cancel, await, and drop callbacks
Leaked tasks keep frames alive. Ensure tasks are awaited or cancelled; remove callbacks on shutdown. [R13]
```python
import asyncio

async def worker(q: asyncio.Queue):
    try:
        while True:
            item = await q.get()
            q.task_done()
    except asyncio.CancelledError:
        pass

async def main():
    q = asyncio.Queue()
    t = asyncio.create_task(worker(q))
    # ... use q ...
    t.cancel()
    try:
        await t
    except asyncio.CancelledError:
        pass
```
Sanity checks that catch footguns fast
- Use `gc.DEBUG_SAVEALL` during tests to surface uncollectable objects; inspect `gc.garbage`. [R6]
- Export `sys.getallocatedblocks()` and `gc.get_stats()` to see allocator/GC pressure over time. [R15][R16]
- Observe OS metrics wisely: prefer USS/PSS where available; RSS alone confounds allocator caching. [R17][R18]
```python
import gc, sys

print("allocated CPython blocks:", getattr(sys, "getallocatedblocks", lambda: -1)())
print("gc stats:", gc.get_stats())
```
Up next: finalize-safe patterns (`weakref.finalize`, context management), container slimming (`__slots__`, compact types), and a short field guide to RSS vs Python heap vs native buffers, so production graphs tell a coherent story.
Make objects smaller: slots, compact containers, and zero-copy bytes
Memory bugs are one side of the coin; the other is footprint discipline. This section distills choices that routinely cut per-object overhead and avoid unnecessary copies in hot paths.
Shrink Python objects with `__slots__` (and slotted dataclasses)
Instances backed by a dynamic `__dict__` carry a hash table and per-key overhead. Slots replace the dict with a fixed layout of descriptors, which is dramatically smaller and faster to access. [R19]
```python
from dataclasses import dataclass

@dataclass(slots=True)  # Python 3.10+
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
```
Guidance:
- Prefer `@dataclass(slots=True)` for plain-data instances. Combine with `frozen=True` for hashability and safer sharing. [R19]
- If you need dynamic attributes or multiple inheritance, slots may not fit; consider the shared-key `__dict__` pattern (see below). [R20]
Shared-key dicts: a pragmatic middle ground
CPython uses key-sharing dictionaries for many instances of the same class: instances share a single keys table and store only values per object. You keep flexibility with less memory than fully separate dicts. This is automatic for the normal instance `__dict__` on new-style classes. [R20]
Rule of thumb: many instances with the same attributes can be “cheap enough” without slots thanks to key‑sharing; measure before committing to slots where reflection/pickling ergonomics matter.
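To follow that rule of thumb, measure both layouts on your own Python version before deciding. A minimal sketch; `PlainPoint`/`SlotPoint` and the per-instance arithmetic are illustrative, and absolute numbers vary by version:

```python
import tracemalloc

class PlainPoint:                 # shared-key __dict__ instances
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint:                  # fixed slot layout, no per-instance dict
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

def measure(cls, n=100_000):
    tracemalloc.start()
    objs = [cls(i, i) for i in range(n)]
    size, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return size // n   # rough bytes per instance, including list overhead

print("plain bytes/instance:", measure(PlainPoint))
print("slots bytes/instance:", measure(SlotPoint))
```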
Prefer tuples and arrays over lists of boxes
- Tuples are smaller than lists for fixed data; they don’t over‑allocate. [R21]
- Numeric blobs belong in dense buffers: `array('I')`, `bytearray`, `memoryview` over `bytes`/`bytearray`/`mmap`, or NumPy for large vectors. [R22]
```python
from array import array

xs = array('I', range(1_000_000))       # ~4 MB for 1e6 uint32
buf = bytearray(1_000_000)
mv = memoryview(buf)[100:200_100]       # slice view, no copy
```
Zero-copy with `memoryview`; avoid accidental byte copies
Slicing `bytes` creates a copy; slicing a `memoryview` does not. Many IO and crypto libraries accept objects supporting the buffer protocol; send views instead of cloning payloads. [R22]
```python
import struct

data = bytearray(b"\x00" * 4096)
hdr = memoryview(data)[:64]    # no copy
body = memoryview(data)[64:]   # no copy

# Writing without copies
struct.pack_into('!I', hdr, 0, 0xDEADBEEF)
```
Build strings/bytes with builders, not `+` in loops
Use `io.StringIO`/`io.BytesIO` or `bytearray.extend`. Repeated concatenation allocates and copies; builders amortize. [R23]
```python
from io import BytesIO

buf = BytesIO()
for chunk in chunks():   # chunks() is a placeholder yielding bytes objects
    buf.write(chunk)
payload = buf.getvalue()
```
Intern repeated keys to reduce duplicate strings
Repeated, equal strings (e.g., keys, symbols) can be interned to a single object, shrinking memory and speeding dict lookups. Use judiciously at load/parse boundaries. [R24]
```python
import sys

key = sys.intern(raw_key)   # raw_key: a repeated string from your parse boundary
```
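A quick check that interning actually deduplicates (a sketch; the key names are made up): strings built at runtime become identical, not merely equal, after `sys.intern`.

```python
import sys

k1 = sys.intern("customer_id")
k2 = sys.intern("_".join(["customer", "id"]))   # constructed at runtime
print(k1 == k2, k1 is k2)                       # True True: one shared object
```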
Compact structs instead of dicts-of-scalars
When the schema is known, pack into a `dataclass(slots=True)`, a namedtuple, or a `struct`/`array` layout. This trades flexibility for large per-row savings. [R21]
```python
import struct

REC = struct.Struct('!I d H')  # id, value, count
buf = bytearray(REC.size * 1000)
for i in range(1000):
    REC.pack_into(buf, i * REC.size, i, i * 0.5, i % 65535)
```
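Reading a record back is symmetric; continuing the buffer above (row index 3 is chosen arbitrarily):

```python
rec_id, value, count = REC.unpack_from(buf, 3 * REC.size)
print(rec_id, value, count)   # 3 1.5 3
```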
Startup and long‑lived processes: make GC cheaper
Freeze immortal objects at startup
If your service imports lots of code/data once, `gc.freeze()` moves long-lived objects out of the cyclic GC’s tracking, shrinking future collection work. Call it after app initialization. [R25]
```python
import gc

def init_app():
    # import modules, build caches, read configs
    ...

init_app()
gc.freeze()  # reduce GC scanning of permanent objects
```
Tune GC with measurements (and keep it on)
Raising gen0 threshold can reduce GC churn under bursty allocation; lowering it can catch cycles earlier during tests. Always A/B with throughput and tail latency metrics; don’t rely on folklore. [R6]
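A minimal sketch of such an experiment (the 50_000 figure is an arbitrary starting point, not a recommendation); pair any change with before/after throughput and tail-latency numbers:

```python
import gc

g0, g1, g2 = gc.get_threshold()
gc.set_threshold(max(g0, 50_000), g1, g2)   # fewer, larger gen0 collections during bursts
print("thresholds now:", gc.get_threshold())
```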
Field guide: Python heap vs RSS vs native buffers
Your graphs should tell a consistent story. Align metrics and collectors to the memory you care about.
- Python heap (objects): use `tracemalloc` snapshots, `objgraph`, `pympler`. [R5][R9]
- GC pressure: `gc.get_stats()`, collection counts, and pause histograms. [R16]
- Allocator view: `sys.getallocatedblocks()` and optional `PYTHONMALLOC=debug` in lower envs. [R15][R26]
- Process memory: RSS/USS/PSS via `psutil.Process().memory_full_info()`; prefer USS/PSS to isolate unique footprint. [R17]
- Native buffers: NumPy/C/CFFI/mmap allocate outside Python’s small-object pools; track via library-specific counters and PSS. [R10][R17]
```python
import psutil, os

p = psutil.Process(os.getpid())
mi = p.memory_full_info()
print({
    "rss": mi.rss,
    "uss": getattr(mi, "uss", None),
    "pss": getattr(mi, "pss", None),
})
```
Production corollary: a flat `tracemalloc` trend with a rising RSS means allocator retention or native buffers, not a Python object leak. Explain it accordingly. [R4][R17]
Production playbook: find it, fix it, prove it
1) Capture the symptom with the right signals
- Export the Python heap: `tracemalloc` current/peak, plus snapshot diffs on demand. [R5]
- Track GC: collection and uncollectable counters from `gc.get_stats()`; measure pause durations separately (e.g., via `gc.callbacks`). [R16]
- Track process memory: USS/PSS in addition to RSS; log allocator block counts. [R15][R17]
2) Isolate a minimal reproducer
- Disable background churn; narrow to one endpoint or job.
- Record deterministic inputs; guard with timeouts.
3) Pinpoint growth
- `tracemalloc` snapshot → diff (traceback key) → top stacks. [R5][R12]
- Confirm the type growth with `objgraph.show_growth()` or `pympler.SummaryTracker`. [R9]
4) Prove ownership
- Walk backrefs to a root owner (global/cache/closure/task). [R9]
- If in cycles, inspect for
__del__
; preferweakref.finalize
. [R7][R8]
5) Fix with lifetime and structure changes
- Remove the last owner when work completes; bound caches; cancel/await tasks. [R13][R14]
- Replace owning refs with
weakref
where appropriate. [R8] - Slim heavy objects with slots/compact buffers if footprint is the issue. [R19][R22]
6) Verify “no growth” and regressions
- Add a quiescent leak guard to tests/CI (tunable KiB ceiling). [R5]
- Repeat under load; confirm PSS/USS and GC stats settle.
```python
# ci/leak_guard.py
import gc, tracemalloc

def guard(fn, iters=10, kib=128):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
        gc.collect()
    cur, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    grew = (cur - base) // 1024
    assert grew <= kib, (base, cur, grew)
```
7) Roll out safely
- Stage/canary with alerts on heap deltas, USS/PSS drift, and GC stalls.
- Document overload behavior and cleanup paths.
FAQ and footguns
- “RSS won’t drop after fixes.” Allocator retention or native buffers—check PSS/USS and library counters. [R4][R17]
- “`gc.disable()` made it faster; can we keep it off?” Only temporarily around bursts; re-enable and collect explicitly. [R6]
- “A finalizer caused uncollectable objects.” Replace `__del__` with `weakref.finalize` or redesign ownership. [R7][R8]
- “`lru_cache` keeps growing.” Always set `maxsize` and call `.cache_clear()` on lifecycle events. [R14]
Closing thoughts
Treat memory as a contract you can test: clear ownership, bounded caches, explicit cancellation, and repeatable measurements. With refcounting as the primary lifetime tool and the GC as a safety net for cycles, you can keep Python processes small, predictable, and boring—in a good way.
References
- [R1] Python Developer Guide — Memory Management Overview (CPython Devguide): “Memory management”
- [R2] Python Docs — `sys.getrefcount` and reference counting notes: sys module
- [R3] Python Docs — Closures and scopes (cycles via cells): “Nested scopes”
- [R4] CPython small-object allocator (obmalloc) internals: “Memory management in Python” (pymalloc)
- [R5] Python Docs — `tracemalloc` usage and API: tracemalloc
- [R6] Python Docs — `gc` generational collector, thresholds, debug flags: gc
- [R7] PEP 442 — Safe object finalization (impact on cycles and `__del__`): PEP 442
- [R8] Python Docs — `weakref` and `weakref.finalize`: weakref
- [R9] objgraph — Object graph inspection and leak hunting: objgraph docs
- [R10] NumPy — memory and `nbytes`/buffer semantics (native allocations): NumPy arrays
- [R11] Python Docs — `tracemalloc.Filter` and snapshot filtering: Snapshot filters
- [R12] Python Docs — Snapshot `compare_to` and `statistics` by traceback: Snapshot comparison
- [R13] Python Docs — `asyncio` tasks, cancellation, and best practices: asyncio tasks
- [R14] Python Docs — `functools.lru_cache` and cache management: lru_cache
- [R15] Python Docs — `sys.getallocatedblocks` and allocator insight: sys
- [R16] Python Docs — `gc.get_stats` output and meaning: gc stats
- [R17] psutil — Process memory (RSS/USS/PSS) fields: psutil docs
- [R18] Python Docs — `resource.getrusage` and `ru_maxrss` semantics: resource
- [R19] Python Docs — Dataclasses and `slots=True`: dataclasses
- [R20] PEP 412 — Key-sharing dictionaries for memory efficiency: PEP 412
- [R21] Python Docs — `tuple` vs `list` basics and memory notes: Built-in types
- [R22] Python Docs — Buffer protocol, `memoryview`, `array`, `struct`: memoryview, array, struct
- [R23] Python Docs — `io.StringIO`/`io.BytesIO` for builders: io
- [R24] Python Docs — `sys.intern` behavior: sys.intern
- [R25] Python Docs — `gc.freeze()` and `gc.unfreeze()`: gc freeze
- [R26] PEP 445 — Exposing a New API for Memory Allocators (`PYTHONMALLOC`): PEP 445