Python’s memory behavior is predictable when you adopt the right mental model. You don’t need folklore or guesswork—you need a few invariants that always hold, a map of where the GC helps (and where it doesn’t), and habits that keep leaks from sneaking in.
Important mindset:
- Refcounting is the primary lifetime mechanism; the cyclic GC is a safety net for cycles, not the main collector. [R1]
- CPython destroys most objects deterministically when their reference count reaches zero—often immediately. [R2]
- Cycles are real and common in modern code (closures, graphs, tasks); the GC finds and breaks them, but finalizers can complicate the story. [R3]
- Process RSS does not equal “live Python objects.” The small-object allocator (obmalloc) and arenas/pools/blocks create fragmentation and retention. RSS may stay high even when refcounts drop. [R4]
What refcounting actually promises
Every CPython object has a reference count. When it reaches zero, the object’s deallocator (`tp_dealloc`) runs and its memory is returned to CPython’s allocator. In CPython this is immediate and deterministic, in contrast with other Python implementations. [R1][R2]
When does the count change?
- Name binding and rebinding: `a = obj` increments; `a = None` decrements. [R2]
- Container membership: inserting into a `list`/`dict`/`set` increments; removing decrements. [R2]
- Attribute assignment: `x.attr = obj` increments; deleting decrements. [R2]
- Argument passing and return values: call frames hold temporary references. [R2]
- Temporaries you don’t see: the interpreter and C API create short-lived references (e.g., `sys.getrefcount` itself). [R2]
Two useful truths:
1) If you can make the last owning reference go away, the object goes away (modulo cycles and finalizers).
2) If RSS does not drop after that, it’s allocator behavior or other owners, not “mystery leaks.” [R4]
A careful look at counts
```python
import sys

obj = object()
print("baseline:", sys.getrefcount(obj))    # +1 from the call site
alias = obj
print("after alias:", sys.getrefcount(obj))
del alias
print("after del alias:", sys.getrefcount(obj))
```
Caution: `sys.getrefcount(x)` reports “real count + 1” because it creates a temporary reference while inspecting. Use it for deltas, not absolutes. [R2]
Why RSS may not go down when refcounts do
CPython’s small-object allocator (obmalloc) manages memory in arenas (~256 KiB) subdivided into pools (size classes) and blocks. Freed blocks usually return to their pool or arena rather than being `free()`d to the OS. That’s great for latency, but it confuses dashboards because RSS can plateau after pressure spikes. Long-lived processes can also fragment arenas across size classes. [R4]
Implication: the right metric for “do we leak Python objects?” is object counts and bytes inside the interpreter (e.g., `tracemalloc`, `gc.get_objects`, domain counters), not process RSS alone. We’ll use these in later sections. [R5]
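To see the gap between the two signals, compare interpreter-level block counts with RSS across an allocate-and-free burst. A minimal sketch, assuming `psutil` is installed; the workload and sizes are arbitrary:

```python
import os
import sys

import psutil  # assumption: third-party psutil is available

proc = psutil.Process(os.getpid())

def report(label: str) -> None:
    print(label, "blocks:", sys.getallocatedblocks(), "rss:", proc.memory_info().rss)

report("before")
junk = [{"i": i} for i in range(500_000)]   # many small dicts -> many pools
report("allocated")
del junk                                    # refcounts hit zero, blocks are freed...
report("freed")                             # ...while RSS typically stays near its peak
```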
Generational GC: a cycle collector, not your primary collector
CPython’s cyclic GC supplements refcounting to reclaim unreachable cycles. It tracks container objects and organizes them into three generations. Collections run more frequently on younger generations; survivors get promoted. Defaults are typically `(700, 10, 10)`, meaning: when gen0 allocations minus deallocations exceed 700, collect gen0; after 10 gen0 collections, collect gen1; after 10 gen1 collections, collect gen2. [R1][R6]
Key points you can rely on:
- Only container-like, GC-tracked objects participate (e.g., list, dict, set, many user classes); pure scalars may be untracked (see the check after this list). [R6]
- The collector identifies unreachable groups; if any object in the cycle had a finalizer (`__del__`), older Pythons used to “leak” them into `gc.garbage`. Modern finalization (PEP 442) makes this saner; details still matter. [R7]
- You can tune thresholds or temporarily disable GC for latency-sensitive bursts; don’t leave it off. [R6]
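The first point is easy to check directly with `gc.is_tracked`. A quick sketch; results follow the documented behavior, and dict tracking can vary with contents and version:

```python
import gc

print(gc.is_tracked(42))          # False: plain ints cannot form cycles
print(gc.is_tracked([]))          # True: lists are containers
print(gc.is_tracked({"a": 1}))    # False: dicts of atomic values may be untracked
print(gc.is_tracked({"a": []}))   # True: once it can hold a cycle, it is tracked
```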
Minimal knobs you should know (we’ll benchmark later):
```python
import gc

print("thresholds:", gc.get_threshold())  # e.g., (700, 10, 10)
print("enabled?", gc.isenabled())

gc.disable()  # do a latency-critical burst
try:
    # perform work that allocates many short-lived objects
    pass
finally:
    gc.enable()
    gc.collect()  # explicit sync cycle when safe
```
Cycles in practice (and why they’re easy to create)
- Object graphs: parents back-reference children, children back-reference parents.
- Closures: inner functions capture outer variables; registry globals keep references to closures; voila, a cycle. [R3]
- Async tasks/futures: task holds callback; callback closes over task or loop. [R3]
- Observers and caches: subscriber lists, `lru_cache` without size limits, or accidental globals.
The point is not to avoid cycles; it’s to make them either a) short-lived and collectible, or b) non-owning by using `weakref` where appropriate. [R8]
A tiny cycle you can see
```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None  # will point to another Node

def make_cycle():
    a = Node("a"); b = Node("b")
    a.peer = b; b.peer = a
    return a, b

gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage so we can inspect them
a, b = make_cycle()
del a, b
unreached = gc.collect()
print("unreachable found:", unreached)
print("garbage list size:", len(gc.garbage))
gc.garbage.clear()
```
Notes:
- Because the snippet sets `gc.DEBUG_SAVEALL`, everything found unreachable is appended to `gc.garbage` so you can inspect it; without that debug flag (and with no `__del__`), the GC simply reclaims the cycle and `gc.garbage` stays empty. [R6][R7]
- Add a `__del__` to `Node` and behavior changes; modern CPython (PEP 442) tries to finalize safely, but it’s still a sharp edge. We’ll cover finalization rules later. [R7]
What actually changes a refcount (practical checklist)
You can reason about lifetimes by listing the owners:
- Module globals; caches; singletons
- Containers: lists/sets/dicts/tuples; also composite attrs on instances
- Stack frames: parameters, locals; comprehensions and generators capture variables
- Live tasks/threads: pending callbacks, futures, loggers with closures
If an object isn’t going away, one of these still owns it. Find that owner; don’t guess. We’ll use `tracemalloc`, `objgraph`, and targeted heap diffs to locate owners later. [R5][R9]
Common growth patterns that aren’t “leaks”
- Free lists and interning: ints, small tuples, and some objects use free lists; memory returns to pools, not OS. [R4]
- Fragmentation after spikes: RSS stays high because arenas remain mapped for reuse. [R4]
- C extensions and buffers: memory may live outside Python’s GC view (NumPy arrays, memory-mapped files). [R10]
A quick sanity loop you can paste today
Use this to assert “no net growth” over a steady window in lower environments. It’s a canary, not a full diagnosis.
```python
import gc, time, tracemalloc

def assert_quiescent_growth(seconds: float = 30.0, max_kib: int = 256):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < seconds:
        time.sleep(1.0)
        gc.collect()
    cur, peak = tracemalloc.get_traced_memory()
    growth_kib = (cur - base) // 1024
    tracemalloc.stop()
    if growth_kib > max_kib:
        raise AssertionError(f"heap grew by {growth_kib} KiB (> {max_kib} KiB)")

# Run during a quiescent phase (no steady new allocations)
# assert_quiescent_growth()
```
Why this works: `tracemalloc` tracks Python-level allocations by traceback; it ignores obmalloc caching costs and native buffers, which is exactly what you want when judging “Python object ownership” first. We’ll refine this into snapshot diffs and owner attribution in later sections. [R5]
Recap you can apply immediately
- Think “owners,” not “mystery leaks.” List owners (globals, containers, frames, tasks); remove the last owner → object is freed (unless in a cycle). [R2][R6]
- Don’t use RSS to prove Python leaks; use interpreter-aware tools (`tracemalloc`, `gc`), then explain RSS via allocator behavior. [R4][R5]
- Cycles are fine; `__del__` is the sharp edge. Prefer `weakref` for non-owning relationships and keep finalizers tiny and rare. [R7][R8]
- GC is a cycle collector. Refcounting is your primary lifetime tool. Tune GC only with measurements. [R1][R6]
In the next section, we’ll turn these invariants into actionable diagnostics: targeted snapshot diffs, “top growth” reports that point to lines of code, and graph walks that reveal the owner keeping your objects alive—so you can fix leaks without guessing.
Leak diagnostics without guessing: snapshots, diffs, and object graphs
We’ll build a small, repeatable playbook that answers three questions: 1) Did Python-heap bytes grow? 2) Which lines (tracebacks) grew? 3) Which objects and owners are responsible?
Step 1 — Snapshot diffs with tracemalloc (per-line and per-traceback)
Start trace early, filter noise, and compare snapshots around a minimal reproducer. [R5]
```python
import tracemalloc, time

def run_once():
    # TODO: call the suspected code path with fixed inputs
    payload = [b"x" * 16384 for _ in range(200)]  # ~3.1 MiB
    return payload

tracemalloc.start()

# Warmup to avoid cold-start noise
run_once(); run_once()

snap_a = tracemalloc.take_snapshot()
run_once()
time.sleep(0.1)  # let finalizers/GC catch up
snap_b = tracemalloc.take_snapshot()

# Compare by line number
by_line = snap_b.compare_to(snap_a, "lineno")
print("\nTop growth by line:")
for stat in by_line[:10]:
    print(stat)

# Compare by traceback (group identical stacks)
by_tb = snap_b.compare_to(snap_a, "traceback")
print("\nTop growth by traceback:")
for stat in by_tb[:5]:
    print("\n", "-" * 60)
    print("\n".join(stat.traceback.format()))

tracemalloc.stop()
```
Tips that improve signal:
- Use `Snapshot.filter_traces` to exclude site-packages or known-noise paths when hunting an app leak. [R11]
- Prefer `key_type="traceback"` to group equivalent call stacks; this often surfaces a single hot allocation site. [R12]
- Pause background churn: turn off periodic tasks and disable GC briefly around the measured region if you need crisp diffs (a small helper for this follows the filter example below). [R6]
```python
# Example filter: only keep your project’s sources
snap_b = snap_b.filter_traces((
    tracemalloc.Filter(True, "*/your_project/*"),
))
```
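For the third tip, a small helper keeps the disable/enable pairing honest around the measured region. A minimal sketch; `gc_paused` is a name invented here:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Pause the cyclic GC for a measured region, restoring its previous state afterwards.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
```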
Step 2 — From lines to owners: which objects are alive and why
Counts and bytes tell you “where.” Now find “who” (types) and “why” (referrers).
Option A: objgraph (quick, visual). [R9]
```python
import gc, objgraph

# After the reproducer runs and a leak is suspected:
gc.collect()
objgraph.show_growth(limit=15)  # which types grew

# Pick a suspicious type
victims = objgraph.by_type("MyWidget")[:3]
objgraph.show_backrefs(victims, max_depth=5, filename="backrefs.png")
```
Option B: pympler (tabular deltas). [R9]
```python
from pympler import tracker

tr = tracker.SummaryTracker()
# ... run workload round 1 ...
tr.print_diff()  # shows growth by type since the previous call
# ... run workload round 2 ...
tr.print_diff()
```
Reading the graphs/diffs:
- Look for unexpected roots: module globals, singletons, caches, registries.
- Inspect container owners: dicts keyed by ids/strings that only ever grow.
- Async leaks: tasks/futures kept alive by callbacks/closures referencing the loop or parent. [R13]
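For the async case, it often helps to enumerate which tasks are still alive at a checkpoint. A minimal sketch using `asyncio.all_tasks()`; the task name `lingering-sleep` and the demo workload are illustrative:

```python
import asyncio

async def audit_tasks() -> None:
    # Tasks that never finish keep their frames (and everything they capture) alive.
    for task in asyncio.all_tasks():
        if not task.done():
            print("live task:", task.get_name(), task.get_coro().__qualname__)

async def main() -> None:
    t = asyncio.create_task(asyncio.sleep(60), name="lingering-sleep")
    await asyncio.sleep(0)   # let the task start
    await audit_tasks()      # the current task and "lingering-sleep" both show up
    t.cancel()

asyncio.run(main())
```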
Step 3 — Prove the fix with a tight loop and a ceiling
Turn your reproducer into a guard that fails when growth exceeds a small, agreed envelope. [R5]
```python
import gc, tracemalloc

def assert_no_growth(fn, iters=5, kib=64):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
        gc.collect()
    cur, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    grew = (cur - base) // 1024
    if grew > kib:
        raise AssertionError(f"heap grew {grew} KiB (> {kib} KiB)")

# assert_no_growth(run_once)
```
Practical patterns that prevent leaks (and how to retrofit)
Prefer weak references for non-owning links
Use `weakref.ref` or `weakref.WeakValueDictionary` to avoid turning lookups/registries into owners. [R8]
```python
import weakref

class Registry:
    def __init__(self) -> None:
        self._by_id = weakref.WeakValueDictionary()

    def add(self, ident: str, obj: object) -> None:
        self._by_id[ident] = obj

    def get(self, ident: str) -> object | None:
        return self._by_id.get(ident)
```
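A quick behavioral check of the registry sketch above (`Widget` is a stand-in class): once the last strong reference disappears, the entry vanishes on its own.

```python
import gc

class Widget:
    pass

reg = Registry()
w = Widget()
reg.add("w1", w)
print(reg.get("w1") is w)   # True while a strong reference exists elsewhere
del w                       # drop the last owning reference
gc.collect()                # not strictly required in CPython; included for clarity
print(reg.get("w1"))        # None: the registry never kept the Widget alive
```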
Replace `__del__` with `weakref.finalize`
Finalizers in cycles complicate GC. Use `weakref.finalize` so cleanups don’t keep objects alive. [R7][R8]
```python
import weakref

class Session:
    def __init__(self, resource):
        self.resource = resource
        self._finalizer = weakref.finalize(self, type(self)._cleanup, resource)

    @staticmethod
    def _cleanup(res):
        res.close()
```
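A small usage check for the sketch above (`Resource` is a stand-in with a `close()` method): the cleanup runs when the `Session` becomes unreachable, without `Session` needing a `__del__`.

```python
class Resource:
    def close(self) -> None:
        print("resource closed")

s = Session(Resource())
del s   # last reference gone; the finalize callback calls Resource.close()
```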
Bound caches and clear deliberately
Unbounded `lru_cache` or dicts are growth by design. Always bound them and clear on lifecycle events. [R14]
```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def parse(schema: str) -> tuple: ...

# later, when schemas rotate
parse.cache_clear()
```
Async hygiene: cancel, await, and drop callbacks
Leaked tasks keep frames alive. Ensure tasks are awaited or cancelled; remove callbacks on shutdown. [R13]
```python
import asyncio

async def worker(q: asyncio.Queue):
    try:
        while True:
            item = await q.get()
            q.task_done()
    except asyncio.CancelledError:
        pass

async def main():
    q = asyncio.Queue()
    t = asyncio.create_task(worker(q))
    # ... use q ...
    t.cancel()
    try:
        await t
    except asyncio.CancelledError:
        pass
```
Sanity checks that catch footguns fast
- Use `gc.DEBUG_SAVEALL` during tests to surface uncollectable objects; inspect `gc.garbage`. [R6]
- Export `sys.getallocatedblocks()` and `gc.get_stats()` to see allocator/GC pressure over time. [R15][R16]
- Observe OS metrics wisely: prefer USS/PSS where available; RSS alone confounds allocator caching. [R17][R18]
```python
import gc, sys

print("allocated CPython blocks:", getattr(sys, "getallocatedblocks", lambda: -1)())
print("gc stats:", gc.get_stats())
```
Up next: finalize-safe patterns (`weakref.finalize`, context management), container slimming (`__slots__`, compact types), and a short field guide to RSS vs Python heap vs native buffers, so production graphs tell a coherent story.
Make objects smaller: slots, compact containers, and zero-copy bytes
Memory bugs are one side of the coin; the other is footprint discipline. This section distills choices that routinely cut per-object overhead and avoid unnecessary copies in hot paths.
Shrink Python objects with `__slots__` (and slotted dataclasses)
Instances backed by a dynamic `__dict__` carry a hash table and per-key overhead. Slots replace the dict with a fixed layout of descriptors, which is dramatically smaller and faster to access. [R19]
```python
from dataclasses import dataclass

@dataclass(slots=True)  # Python 3.10+
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
```
Guidance:
- Prefer `@dataclass(slots=True)` for plain-data instances. Combine with `frozen=True` for hashability and safer sharing. [R19]
- If you need dynamic attributes or multiple inheritance, slots may not fit; consider the shared-key `__dict__` pattern (see below). [R20]
Shared-key dicts: a pragmatic middle ground
CPython uses key-sharing dictionaries for many instances of the same class: instances share a single keys table and store only values per object. You keep flexibility with less memory than fully separate dicts. This is automatic for the normal instance `__dict__` on new-style classes. [R20]
Rule of thumb: many instances with the same attributes can be “cheap enough” without slots thanks to key‑sharing; measure before committing to slots where reflection/pickling ergonomics matter.
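To follow that rule of thumb, measure both layouts on your own Python version before deciding. A minimal sketch; `PlainPoint`/`SlotPoint` and the per-instance arithmetic are illustrative, and absolute numbers vary by version:

```python
import tracemalloc

class PlainPoint:                 # shared-key __dict__ instances
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint:                  # fixed slot layout, no per-instance dict
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

def measure(cls, n=100_000):
    tracemalloc.start()
    objs = [cls(i, i) for i in range(n)]
    size, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return size // n   # rough bytes per instance, including list overhead

print("plain bytes/instance:", measure(PlainPoint))
print("slots bytes/instance:", measure(SlotPoint))
```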
Prefer tuples and arrays over lists of boxes
- Tuples are smaller than lists for fixed data; they don’t over‑allocate. [R21]
- Numeric blobs belong in dense buffers: `array('I')`, `bytearray`, `memoryview` over `bytes`/`bytearray`/`mmap`, or NumPy for large vectors. [R22]
```python
from array import array

xs = array('I', range(1_000_000))       # ~4 MB for 1e6 uint32
buf = bytearray(1_000_000)
mv = memoryview(buf)[100:200_100]       # slice view, no copy
```
Zero-copy with `memoryview`; avoid accidental byte copies
Slicing `bytes` creates a copy; slicing a `memoryview` does not. Many IO and crypto libraries accept objects supporting the buffer protocol; send views instead of cloning payloads. [R22]
```python
import struct

data = bytearray(b"\x00" * 4096)
hdr = memoryview(data)[:64]    # no copy
body = memoryview(data)[64:]   # no copy

# Writing without copies
struct.pack_into('!I', hdr, 0, 0xDEADBEEF)
```
Build strings/bytes with builders, not `+` in loops
Use `io.StringIO`/`io.BytesIO` or `bytearray.extend`. Repeated concatenation allocates and copies; builders amortize. [R23]
```python
from io import BytesIO

buf = BytesIO()
for chunk in chunks():   # chunks() is a placeholder yielding bytes objects
    buf.write(chunk)
payload = buf.getvalue()
```
Intern repeated keys to reduce duplicate strings
Repeated, equal strings (e.g., keys, symbols) can be interned to a single object, shrinking memory and speeding dict lookups. Use judiciously at load/parse boundaries. [R24]
```python
import sys

key = sys.intern(raw_key)   # raw_key: a repeated string from your parse boundary
```
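A quick check that interning actually deduplicates (a sketch; the key names are made up): strings built at runtime become identical, not merely equal, after `sys.intern`.

```python
import sys

k1 = sys.intern("customer_id")
k2 = sys.intern("_".join(["customer", "id"]))   # constructed at runtime
print(k1 == k2, k1 is k2)                       # True True: one shared object
```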
Compact structs instead of dicts-of-scalars
When the schema is known, pack into a `dataclass(slots=True)`, a namedtuple, or a `struct`/`array` layout. This trades flexibility for large per-row savings. [R21]
```python
import struct

REC = struct.Struct('!I d H')  # id, value, count
buf = bytearray(REC.size * 1000)
for i in range(1000):
    REC.pack_into(buf, i * REC.size, i, i * 0.5, i % 65535)
```
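Reading a record back is symmetric; continuing the buffer above (row index 3 is chosen arbitrarily):

```python
rec_id, value, count = REC.unpack_from(buf, 3 * REC.size)
print(rec_id, value, count)   # 3 1.5 3
```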
Startup and long‑lived processes: make GC cheaper
Freeze immortal objects at startup
If your service imports lots of code/data once, `gc.freeze()` moves long-lived objects out of the cyclic GC’s tracking, shrinking future collection work. Call it after app initialization. [R25]
```python
import gc

def init_app():
    # import modules, build caches, read configs
    ...

init_app()
gc.freeze()  # reduce GC scanning of permanent objects
```
Tune GC with measurements (and keep it on)
Raising gen0 threshold can reduce GC churn under bursty allocation; lowering it can catch cycles earlier during tests. Always A/B with throughput and tail latency metrics; don’t rely on folklore. [R6]
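A minimal sketch of such an experiment (the 50_000 figure is an arbitrary starting point, not a recommendation); pair any change with before/after throughput and tail-latency numbers:

```python
import gc

g0, g1, g2 = gc.get_threshold()
gc.set_threshold(max(g0, 50_000), g1, g2)   # fewer, larger gen0 collections during bursts
print("thresholds now:", gc.get_threshold())
```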
Field guide: Python heap vs RSS vs native buffers
Your graphs should tell a consistent story. Align metrics and collectors to the memory you care about.
- Python heap (objects): use `tracemalloc` snapshots, `objgraph`, `pympler`. [R5][R9]
- GC pressure: `gc.get_stats()`, collection counts, and pause histograms. [R16]
- Allocator view: `sys.getallocatedblocks()` and optional `PYTHONMALLOC=debug` in lower envs. [R15][R26]
- Process memory: RSS/USS/PSS via `psutil.Process().memory_full_info()`; prefer USS/PSS to isolate unique footprint. [R17]
- Native buffers: NumPy/C/CFFI/mmap allocate outside Python’s small-object pools; track via library-specific counters and PSS. [R10][R17]
```python
import psutil, os

p = psutil.Process(os.getpid())
mi = p.memory_full_info()
print({
    "rss": mi.rss,
    "uss": getattr(mi, "uss", None),
    "pss": getattr(mi, "pss", None),
})
```
Production corollary: a flat `tracemalloc` trend with a rising RSS means allocator retention or native buffers, not a Python object leak. Explain it accordingly. [R4][R17]
Production playbook: find it, fix it, prove it
1) Capture the symptom with the right signals
- Export the Python heap: `tracemalloc` current/peak, plus snapshot diffs on demand. [R5]
- Track GC: collection and uncollectable counters from `gc.get_stats()`; measure pause durations separately (e.g., via `gc.callbacks`). [R16]
- Track process memory: USS/PSS in addition to RSS; log allocator block counts. [R15][R17]
2) Isolate a minimal reproducer
- Disable background churn; narrow to one endpoint or job.
- Record deterministic inputs; guard with timeouts.
3) Pinpoint growth
- `tracemalloc` snapshot → diff (traceback key) → top stacks. [R5][R12]
- Confirm the type growth with `objgraph.show_growth()` or `pympler.SummaryTracker`. [R9]
4) Prove ownership
- Walk backrefs to a root owner (global/cache/closure/task). [R9]
- If in cycles, inspect for
__del__
; preferweakref.finalize
. [R7][R8]
5) Fix with lifetime and structure changes
- Remove the last owner when work completes; bound caches; cancel/await tasks. [R13][R14]
- Replace owning refs with
weakref
where appropriate. [R8] - Slim heavy objects with slots/compact buffers if footprint is the issue. [R19][R22]
6) Verify “no growth” and regressions
- Add a quiescent leak guard to tests/CI (tunable KiB ceiling). [R5]
- Repeat under load; confirm PSS/USS and GC stats settle.
```python
# ci/leak_guard.py
import gc, tracemalloc

def guard(fn, iters=10, kib=128):
    tracemalloc.start()
    gc.collect()
    base, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
        gc.collect()
    cur, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    grew = (cur - base) // 1024
    assert grew <= kib, (base, cur, grew)
```
7) Roll out safely
- Stage/canary with alerts on heap deltas, USS/PSS drift, and GC stalls.
- Document overload behavior and cleanup paths.
FAQ and footguns
- “RSS won’t drop after fixes.” Allocator retention or native buffers—check PSS/USS and library counters. [R4][R17]
- “`gc.disable()` made it faster; can we keep it off?” Only temporarily around bursts; re-enable and collect explicitly. [R6]
- “A finalizer caused uncollectable objects.” Replace `__del__` with `weakref.finalize` or redesign ownership. [R7][R8]
- “`lru_cache` keeps growing.” Always set `maxsize` and call `.cache_clear()` on lifecycle events. [R14]
Closing thoughts
Treat memory as a contract you can test: clear ownership, bounded caches, explicit cancellation, and repeatable measurements. With refcounting as the primary lifetime tool and the GC as a safety net for cycles, you can keep Python processes small, predictable, and boring—in a good way.
References
- [R1] Python Developer Guide — Memory Management Overview (CPython Devguide): “Memory management”
- [R2] Python Docs — `sys.getrefcount` and reference counting notes: sys module
- [R3] Python Docs — Closures and scopes (cycles via cells): “Nested scopes”
- [R4] CPython small-object allocator (obmalloc) internals: “Memory management in Python” (pymalloc)
- [R5] Python Docs — `tracemalloc` usage and API: tracemalloc
- [R6] Python Docs — `gc` generational collector, thresholds, debug flags: gc
- [R7] PEP 442 — Safe object finalization (impact on cycles and `__del__`): PEP 442
- [R8] Python Docs — `weakref` and `weakref.finalize`: weakref
- [R9] objgraph — Object graph inspection and leak hunting: objgraph docs
- [R10] NumPy — memory and `nbytes`/buffer semantics (native allocations): NumPy arrays
- [R11] Python Docs — `tracemalloc.Filter` and snapshot filtering: Snapshot filters
- [R12] Python Docs — Snapshot `compare_to` and `statistics` by traceback: Snapshot comparison
- [R13] Python Docs — `asyncio` tasks, cancellation, and best practices: asyncio tasks
- [R14] Python Docs — `functools.lru_cache` and cache management: lru_cache
- [R15] Python Docs — `sys.getallocatedblocks` and allocator insight: sys
- [R16] Python Docs — `gc.get_stats` output and meaning: gc stats
- [R17] psutil — Process memory (RSS/USS/PSS) fields: psutil docs
- [R18] Python Docs — `resource.getrusage` and `ru_maxrss` semantics: resource
- [R19] Python Docs — Dataclasses and `slots=True`: dataclasses
- [R20] PEP 412 — Key-sharing dictionaries for memory efficiency: PEP 412
- [R21] Python Docs — `tuple` vs `list` basics and memory notes: Built-in types
- [R22] Python Docs — Buffer protocol, `memoryview`, `array`, `struct`: memoryview, array, struct
- [R23] Python Docs — `io.StringIO`/`io.BytesIO` for builders: io
- [R24] Python Docs — `sys.intern` behavior: sys.intern
- [R25] Python Docs — `gc.freeze()` and `gc.unfreeze()`: gc freeze
- [R26] PEP 445 — Exposing a New API for Memory Allocators (`PYTHONMALLOC`): PEP 445