Guardrails
Safety mechanisms that prevent transient provider issues from becoming destructive writes.
Guardrails are CrossWatch's safety systems: they prevent bad snapshots and flaky APIs from causing destructive writes.
The ones you’ll notice most
Auth gating: bad auth skips the pair.
Provider down: writes are skipped.
Drop guard: tiny snapshots don’t trigger mass deletes.
Mass delete protection: large removal waves get blocked.
If CrossWatch “does nothing”
That is often a guardrail doing its job.
Check the run log for events like:
pair:skip (auth)
writes:skipped (provider down)
snapshot:suspect (drop guard)
mass_delete:blocked
If you really want big deletes
Only do this if you have backups and a restore plan.
Then explicitly enable mass deletes and removals in your pair settings.
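As a rough sketch only (the real pair settings layout may differ; everything except the allow_mass_delete and include_observed_deletes keys documented later on this page is illustrative):

```python
# Hypothetical pair settings sketch -- the surrounding structure and provider
# names are illustrative, not CrossWatch's actual schema.
pair_cfg = {
    "source": "PLEX",
    "target": "TRAKT",
    "sync": {
        "allow_mass_delete": True,          # permit removal waves larger than the guard ratio
        "include_observed_deletes": True,   # two-way: propagate deletions seen in snapshots
    },
}
```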
Related:
Two-way delete propagation: Two-way sync
Delete memory: Tombstones
Guardrails are the orchestrator's safety mechanisms: they prevent bad snapshots and flaky writes from becoming bad deletes.
This page documents what is actually wired today. It ignores “future knobs”.
Core modules:
Drop guard: _snapshots.py
Mass delete protection: _pairs_massdelete.py
Tombstones: _tombstones.py
Blackbox: _blackbox.py
Unresolved: _unresolved.py
PhantomGuard: _phantoms.py
Health gating: _pairs.py + provider health()
Overview
Guardrails run before and around writes. They prefer to block or skip instead of guessing.
Key outcomes:
skip pairs when auth is broken
skip writes when a provider is down
suppress deletes when snapshots look wrong
quarantine items that keep failing
Guardrails (what they do)
1) Health gating
Goal: don’t write when auth is broken or a provider is down.
Auth failures
If a provider's health reports auth_failed:
the orchestrator skips the pair entirely
emits pair:skip reason=auth_failed
Provider down
If a provider's health reports down:
one-way: writes to that destination are skipped; items become unresolved
two-way: writes to that side are skipped; observed deletions are disabled
Observed deletions are disabled when a provider is down, because "down" often looks like an empty snapshot, which would otherwise read as "everything was deleted".
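A minimal sketch of the gating decision described above, assuming a health status string per provider (the helper name and return shape are illustrative, not the orchestrator's API):

```python
def gate_pair(src_health: str, dst_health: str) -> dict:
    """Illustrative only: mirrors the rules above (auth_failed skips the pair,
    'down' skips writes to that side and disables observed deletions)."""
    if "auth_failed" in (src_health, dst_health):
        return {"skip_pair": True, "reason": "auth_failed"}
    down = "down" in (src_health, dst_health)
    return {
        "skip_pair": False,
        "allow_writes": not down,            # writes to a down side are skipped
        "allow_observed_deletes": not down,  # an empty-looking snapshot must not imply deletes
    }
```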
2) Drop guard (suspect snapshot coercion)
Goal: if a snapshot shrinks massively without checkpoint movement, treat it as bad and reuse the baseline.
Code
coerce_suspect_snapshot(...) in _snapshots.py
Triggers only when:
sync.drop_guard == True
provider capabilities.index_semantics == "present" (default)
previous baseline is sufficiently large
current snapshot is tiny relative to baseline
checkpoint did not advance
Defaults (runtime):
runtime.suspect_min_prev = 20
runtime.suspect_shrink_ratio = 0.10
Effect:
replaces cur_idx with prev_idx for planning
emits snapshot:suspect / snapshot.guard
This prevents a transient 0-item response from turning into “remove 900 items”.
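A compact sketch of the trigger condition, assuming prev_idx/cur_idx are key-to-item mappings and the defaults above; coerce_suspect_snapshot in _snapshots.py is the real implementation, and this helper name is illustrative:

```python
def looks_suspect(prev_idx: dict, cur_idx: dict,
                  checkpoint_advanced: bool,
                  min_prev: int = 20, shrink_ratio: float = 0.10) -> bool:
    """Illustrative drop-guard condition: a large baseline collapsing to a tiny
    snapshot without checkpoint movement is treated as a bad read."""
    if checkpoint_advanced:
        return False
    if len(prev_idx) < min_prev:       # baseline too small to judge shrinkage
        return False
    return len(cur_idx) <= len(prev_idx) * shrink_ratio

# When suspect, planning reuses prev_idx instead of cur_idx.
```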
3) Mass delete protection
Goal: block huge removal plans unless explicitly allowed.
Code
_pairs_massdelete.py
Input:
removes list
baseline_size (destination effective index size)
sync.allow_mass_delete (default False)
runtime.suspect_shrink_ratio (same ratio, default 0.10)
If mass deletes are not allowed and:
len(removes) > baseline_size * ratio
then it drops the entire removal list and emits mass_delete:blocked
This is intentionally blunt. If you really want large deletions, you must opt in.
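The check reduces to a single threshold. A sketch, assuming removes is a list of keys and baseline_size is the destination's effective index size (the function name is illustrative):

```python
def filter_removes(removes: list, baseline_size: int,
                   allow_mass_delete: bool = False,
                   ratio: float = 0.10) -> list:
    """Illustrative: drop the whole removal list when it exceeds the ratio of
    the baseline and mass deletes were not explicitly allowed."""
    if not allow_mass_delete and baseline_size > 0 and len(removes) > baseline_size * ratio:
        # the real code emits mass_delete:blocked here
        return []
    return removes
```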
4) Tombstones (pair-scoped deletion memory)
Goal: avoid ping-pong deletes and re-add loops, especially in two-way.
Code
_tombstones.py
Where stored:
/config/.cw_state/tombstones.json
Two-way loads tombstones scoped to:
feature + pair (PAIR = "-".join(sorted([A,B])))
Tokens recorded on successful removals:
canonical key (e.g., imdb:tt...)
all ID tokens in item["ids"] (imdb/tmdb/tvdb/simkl/mal/anilist/etc.)
Key format in file:
{feature}:{PAIR}|{token} → { "at": epoch, "why": "remove" }
TTL pruning:
sync.tombstone_ttl_days (default 30)
older tombstone entries are ignored (and can be cleaned up later)
How tombstones affect planning:
prevent re-adding an item that was intentionally removed
allow two-way to decide “this is a real delete, propagate it”
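A sketch of the key scheme and TTL pruning described above (the helper names, provider names, and example ID are illustrative; the on-disk file is the path given earlier):

```python
import time

def tombstone_key(feature: str, a: str, b: str, token: str) -> str:
    """Illustrative: pair scope is order-independent, so A->B and B->A share keys."""
    pair = "-".join(sorted([a, b]))
    return f"{feature}:{pair}|{token}"

def is_live(entry: dict, ttl_days: int = 30) -> bool:
    """Illustrative TTL check: entries older than sync.tombstone_ttl_days are ignored."""
    return (time.time() - entry.get("at", 0)) <= ttl_days * 86400

# Example value stored per key: {"at": 1712345678, "why": "remove"}
key = tombstone_key("watchlist", "PLEX", "TRAKT", "imdb:tt0111161")
```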
5) Observed deletions (two-way only)
Goal: infer “real deletes” from baseline vs live snapshot.
In two-way, observed deletions are:
prev.keys - cur.keys
But they are only trusted when:
not bootstrapping
provider not down
snapshot not suspect (drop guard didn’t trigger)
sync.include_observed_deletes == True
Observed deletions are immediately written into tombstones (with ID tokens) and removed from effective indices so they don’t get re-added.
This is the main delete-propagation mechanism. It is also the biggest “oops” risk. That’s why it is guarded heavily.
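Putting the trust conditions together, a sketch assuming prev/cur are key-to-item mappings for one side of a two-way pair (the function name is illustrative):

```python
def observed_deletions(prev: dict, cur: dict, *,
                       bootstrapping: bool, provider_down: bool,
                       snapshot_suspect: bool,
                       include_observed_deletes: bool) -> set:
    """Illustrative: keys present in the baseline but missing from the live
    snapshot, trusted only when every guard condition passes."""
    if bootstrapping or provider_down or snapshot_suspect or not include_observed_deletes:
        return set()
    return set(prev.keys()) - set(cur.keys())
```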
6) Blackbox (failure cooldown)
Goal: stop re-attempting flapping items after repeated failures.
Code
_blackbox.py and apply_blocklist usage in _pairs_blocklist.py
Blackbox maintains:
*.flap.json counters (consecutive failures)
*.blackbox.json blocked keys (cooldown)
Promotion:
after sync.blackbox.promote_after consecutive failures
Applied today:
blocks adds only
and only for non-watchlist features (watchlist uses PhantomGuard instead)
Pruning:
sync.blackbox.cooldown_days controls how long entries stay blocked
Some config flags exist (enabled, block_adds, block_removes) but aren’t fully enforced in wiring.
Treat blackbox as “always active if you call record_attempts / record_success”.
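A sketch of the promote/cooldown bookkeeping, assuming the flap counters and blocked keys are simple dicts (the real files are *.flap.json and *.blackbox.json; the function names and default values here are illustrative):

```python
import time

def record_failure(flap: dict, blackbox: dict, key: str,
                   promote_after: int = 3) -> None:
    """Illustrative: count another consecutive failure; promote to the blackbox
    once the counter reaches sync.blackbox.promote_after."""
    flap[key] = flap.get(key, 0) + 1
    if flap[key] >= promote_after:
        blackbox[key] = {"at": time.time()}

def record_success(flap: dict, key: str) -> None:
    """Illustrative: a success resets the consecutive-failure counter."""
    flap.pop(key, None)

def is_blocked(blackbox: dict, key: str, cooldown_days: int = 7) -> bool:
    """Illustrative cooldown: entries expire after sync.blackbox.cooldown_days."""
    entry = blackbox.get(key)
    return bool(entry) and (time.time() - entry["at"]) <= cooldown_days * 86400
```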
7) Unresolved (don’t hammer what keeps failing)
Goal: when a key repeatedly fails to apply, stop retrying it every run.
Code
_unresolved.py
How it’s recorded:
when apply returns failures, those keys are written to:
*.unresolved.pending.json
How it’s used:
the orchestrator loads unresolved keys (depending on which file names are present)
then removes them from planned adds
The loader looks for *.unresolved.json. The recorder writes *.unresolved.pending.json.
If you don’t have a promotion step (pending → active), unresolved blocking is weaker.
Unresolved is still useful. It may need a small wiring pass to fully “bite”.
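If a promotion step were added, a minimal sketch might look like this (purely hypothetical: it assumes both files hold flat JSON lists of keys and that writing *.unresolved.json is enough for the existing loader to pick them up):

```python
import json
import pathlib

def promote_unresolved(state_dir: str, stem: str) -> set:
    """Hypothetical promotion: fold *.unresolved.pending.json into
    *.unresolved.json so the loader actually sees the keys."""
    base = pathlib.Path(state_dir)
    pending_path = base / f"{stem}.unresolved.pending.json"
    active_path = base / f"{stem}.unresolved.json"
    pending = json.loads(pending_path.read_text()) if pending_path.exists() else []
    active = set(json.loads(active_path.read_text())) if active_path.exists() else set()
    active.update(pending)
    active_path.write_text(json.dumps(sorted(active)))
    return active
```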
8) PhantomGuard (watchlist anti-flap)
Goal: stop re-adding watchlist items that “succeed” but don’t stick.
Code
_phantoms.py
PhantomGuard is applied in one-way and two-way for feature == "watchlist" (and partially for ratings).
It tracks:
keys that were attempted as adds
whether they were later observed in the destination snapshot (“actually stuck”)
If an item repeatedly fails to “stick”, it becomes a phantom and further adds are blocked for ttl_days.
Files (scoped):
{feature}.{src}-{dst}.{scope}.phantoms.json
{feature}.{src}-{dst}.{scope}.last_success.json
Wiring quirk:
PhantomGuard config is read from root-level cfg["blackbox"], not sync.blackbox.
So yes: there are two “blackbox-ish” systems:
_blackbox.py (failure cooldown for non-watchlist adds)
_phantoms.py (watchlist stickiness guard)
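A sketch of the stickiness check, assuming the attempted adds and the destination's next snapshot keys are both available (the function name, miss threshold, and TTL default are illustrative):

```python
import time

def update_phantoms(phantoms: dict, attempted_adds: set, dest_keys: set,
                    max_misses: int = 2, ttl_days: int = 14) -> set:
    """Illustrative: an add that never shows up in the destination snapshot
    accumulates misses; after enough misses it is treated as a phantom and
    further adds are blocked until the TTL expires."""
    now = time.time()
    for key in attempted_adds:
        if key in dest_keys:
            phantoms.pop(key, None)  # the add stuck; clear any record
            continue
        rec = phantoms.setdefault(key, {"misses": 0, "at": now})
        rec["misses"] += 1
        rec["at"] = now
    # Keys currently blocked from further add attempts.
    return {
        k for k, rec in phantoms.items()
        if rec["misses"] >= max_misses and (now - rec["at"]) <= ttl_days * 86400
    }
```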
9) Manual policy (human override)
Goal: let the user pin adds or block items regardless of snapshots.
Stored in state.manual.json (merged with state.json at load time).
Manual adds:
merged into the source index before diffing
Manual blocks:
applied to both adds and removes after planning
match by canonical key, id tokens, or title-year token
This is the “I know better, do it anyway / don’t ever sync this title” escape hatch.
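A sketch of how a manual block might match a planned item, using the three match paths listed above (canonical key, ID tokens, title-year); the token formats and function name are illustrative:

```python
def manual_block_matches(blocked: set, item: dict) -> bool:
    """Illustrative: a block entry can hit the canonical key, any ID token,
    or a title-year token."""
    tokens = {item.get("key", "")}
    tokens.update(f"{k}:{v}" for k, v in item.get("ids", {}).items() if v)
    if item.get("title") and item.get("year"):
        tokens.add(f"{item['title'].lower()}|{item['year']}")
    return bool(tokens & blocked)
```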
Guardrail ordering (why it matters)
In practice, the orchestrator relies on this order to stay safe:
Health gating (skip if auth/down)
Drop guard (fix suspect snapshots before planning)
Index semantic merge (delta providers get baseline merge)
Plan diffs
Mass delete protection (drop oversized removal plans)
Blocklists (tombstones/unresolved/blackbox)
PhantomGuard (watchlist)
Apply with confirmations + unresolved bookkeeping
Persist baselines/checkpoints
If you change ordering, you can accidentally disable a guardrail.
Related pages
Snapshot coercion details: Snapshots
Two-way delete propagation: Two-way sync
Delete memory: Tombstones
Failure suppression: Blackbox, Phantom Guard, Unresolved