Guardrails
Safety mechanisms that prevent transient provider issues from becoming destructive writes.
Guardrails are CrossWatch's safety systems: they prevent bad snapshots and flaky APIs from causing destructive writes.
The ones you’ll notice most
Auth gating: bad auth skips the pair.
Provider down: writes are skipped.
Drop guard: tiny snapshots don’t trigger mass deletes.
Mass delete protection: large removal waves get blocked.
If CrossWatch “does nothing”
That is often a guardrail doing its job.
Check the run log for events like:
pair:skip (auth)
writes:skipped (provider down)
snapshot:suspect (drop guard)
mass_delete:blocked
If you really want big deletes
Only do this if you have backups and a restore plan.
Then explicitly enable mass deletes and removals in your pair settings.
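As a rough sketch only (the real pair settings layout may differ; everything except the allow_mass_delete and include_observed_deletes keys documented later on this page is illustrative):

```python
# Hypothetical pair settings sketch -- the surrounding structure and provider
# names are illustrative, not CrossWatch's actual schema.
pair_cfg = {
    "source": "PLEX",
    "target": "TRAKT",
    "sync": {
        "allow_mass_delete": True,          # permit removal waves larger than the guard ratio
        "include_observed_deletes": True,   # two-way: propagate deletions seen in snapshots
    },
}
```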
Related:
Two-way delete propagation: Two-way sync
Delete memory: Tombstones
Guardrails are the orchestrator's safety mechanisms: they prevent bad snapshots and flaky writes from becoming bad deletes.
This page documents what is actually wired today. It ignores “future knobs”.
Core modules:
Drop guard: _snapshots.py
Mass delete protection: _pairs_massdelete.py
Tombstones: _tombstones.py
Blackbox: _blackbox.py
Unresolved: _unresolved.py
PhantomGuard: _phantoms.py
Health gating: _pairs.py + provider health()
Overview
Guardrails run before and around writes. They prefer to block or skip instead of guessing.
Key outcomes:
skip pairs when auth is broken
skip writes when a provider is down
suppress deletes when snapshots look wrong
quarantine items that keep failing
Guardrails (what they do)
1) Health gating
Goal: don’t write when auth is broken or a provider is down.
Auth failures
If a provider's health reports auth_failed:
the orchestrator skips the pair entirely
emits pair:skip reason=auth_failed
Provider down
If a provider's health reports down:
one-way: writes to that destination are skipped; items become unresolved
two-way: writes to that side are skipped; observed deletions are disabled
Observed deletions are disabled when a provider is down, because "down" often looks like an empty snapshot, which would otherwise read as "everything was deleted".
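A minimal sketch of the gating decision described above, assuming a health status string per provider (the helper name and return shape are illustrative, not the orchestrator's API):

```python
def gate_pair(src_health: str, dst_health: str) -> dict:
    """Illustrative only: mirrors the rules above (auth_failed skips the pair,
    'down' skips writes to that side and disables observed deletions)."""
    if "auth_failed" in (src_health, dst_health):
        return {"skip_pair": True, "reason": "auth_failed"}
    down = "down" in (src_health, dst_health)
    return {
        "skip_pair": False,
        "allow_writes": not down,            # writes to a down side are skipped
        "allow_observed_deletes": not down,  # an empty-looking snapshot must not imply deletes
    }
```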
2) Drop guard (suspect snapshot coercion)
Goal: if a snapshot shrinks massively without checkpoint movement, treat it as bad and reuse the baseline.
Code
coerce_suspect_snapshot(...) in _snapshots.py
Triggers only when:
sync.drop_guard == True
provider capabilities.index_semantics == "present" (default)
previous baseline is sufficiently large
current snapshot is tiny relative to baseline
checkpoint did not advance
Defaults (runtime):
runtime.suspect_min_prev = 20
runtime.suspect_shrink_ratio = 0.10
Effect:
replaces cur_idx with prev_idx for planning
emits snapshot:suspect / snapshot.guard
This prevents a transient 0-item response from turning into “remove 900 items”.
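A compact sketch of the trigger condition, assuming prev_idx/cur_idx are key-to-item mappings and the defaults above; coerce_suspect_snapshot in _snapshots.py is the real implementation, and this helper name is illustrative:

```python
def looks_suspect(prev_idx: dict, cur_idx: dict,
                  checkpoint_advanced: bool,
                  min_prev: int = 20, shrink_ratio: float = 0.10) -> bool:
    """Illustrative drop-guard condition: a large baseline collapsing to a tiny
    snapshot without checkpoint movement is treated as a bad read."""
    if checkpoint_advanced:
        return False
    if len(prev_idx) < min_prev:       # baseline too small to judge shrinkage
        return False
    return len(cur_idx) <= len(prev_idx) * shrink_ratio

# When suspect, planning reuses prev_idx instead of cur_idx.
```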
3) Mass delete protection
Goal: block huge removal plans unless explicitly allowed.
Code
_pairs_massdelete.py
Input:
removes list
baseline_size (destination effective index size)
sync.allow_mass_delete (default False)
runtime.suspect_shrink_ratio (same ratio, default 0.10)
If mass deletes are not allowed and:
len(removes) > baseline_size * ratio
then it drops the entire removal list and emits mass_delete:blocked
This is intentionally blunt. If you really want large deletions, you must opt in.
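The check reduces to a single threshold. A sketch, assuming removes is a list of keys and baseline_size is the destination's effective index size (the function name is illustrative):

```python
def filter_removes(removes: list, baseline_size: int,
                   allow_mass_delete: bool = False,
                   ratio: float = 0.10) -> list:
    """Illustrative: drop the whole removal list when it exceeds the ratio of
    the baseline and mass deletes were not explicitly allowed."""
    if not allow_mass_delete and baseline_size > 0 and len(removes) > baseline_size * ratio:
        # the real code emits mass_delete:blocked here
        return []
    return removes
```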
4) Tombstones (pair-scoped deletion memory)
Goal: avoid ping-pong deletes and re-add loops, especially in two-way.
Code
_tombstones.py
Where stored:
/config/.cw_state/tombstones.json
Two-way loads tombstones scoped to:
feature + pair (PAIR = "-".join(sorted([A,B])))
Tokens recorded on successful removals:
canonical key (e.g., imdb:tt...)
all ID tokens in item["ids"] (imdb/tmdb/tvdb/simkl/mal/anilist/etc.)
Key format in file:
{feature}:{PAIR}|{token} → { "at": epoch, "why": "remove" }
TTL pruning:
sync.tombstone_ttl_days (default 30)
older tombstone entries are ignored (and can be cleaned up later)
How tombstones affect planning:
prevent re-adding an item that was intentionally removed
allow two-way to decide “this is a real delete, propagate it”
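A sketch of the key scheme and TTL pruning described above (the helper names, provider names, and example ID are illustrative; the on-disk file is the path given earlier):

```python
import time

def tombstone_key(feature: str, a: str, b: str, token: str) -> str:
    """Illustrative: pair scope is order-independent, so A->B and B->A share keys."""
    pair = "-".join(sorted([a, b]))
    return f"{feature}:{pair}|{token}"

def is_live(entry: dict, ttl_days: int = 30) -> bool:
    """Illustrative TTL check: entries older than sync.tombstone_ttl_days are ignored."""
    return (time.time() - entry.get("at", 0)) <= ttl_days * 86400

# Example value stored per key: {"at": 1712345678, "why": "remove"}
key = tombstone_key("watchlist", "PLEX", "TRAKT", "imdb:tt0111161")
```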
5) Observed deletions (two-way only)
Goal: infer “real deletes” from baseline vs live snapshot.
In two-way, observed deletions are:
prev.keys - cur.keys
But they are only trusted when:
not bootstrapping
provider not down
snapshot not suspect (drop guard didn’t trigger)
sync.include_observed_deletes == True
Observed deletions are immediately written into tombstones (with ID tokens) and removed from effective indices so they don’t get re-added.
This is the main delete-propagation mechanism. It is also the biggest “oops” risk. That’s why it is guarded heavily.
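Putting the trust conditions together, a sketch assuming prev/cur are key-to-item mappings for one side of a two-way pair (the function name is illustrative):

```python
def observed_deletions(prev: dict, cur: dict, *,
                       bootstrapping: bool, provider_down: bool,
                       snapshot_suspect: bool,
                       include_observed_deletes: bool) -> set:
    """Illustrative: keys present in the baseline but missing from the live
    snapshot, trusted only when every guard condition passes."""
    if bootstrapping or provider_down or snapshot_suspect or not include_observed_deletes:
        return set()
    return set(prev.keys()) - set(cur.keys())
```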
6) Blackbox (failure cooldown)
Goal: stop re-attempting flapping items after repeated failures.
Code
_blackbox.py and apply_blocklist usage in _pairs_blocklist.py
Blackbox maintains:
*.flap.json counters (consecutive failures)
*.blackbox.json blocked keys (cooldown)
Promotion:
after sync.blackbox.promote_after consecutive failures
Applied today:
blocks adds only
and only for non-watchlist features (watchlist uses PhantomGuard instead)
Pruning:
sync.blackbox.cooldown_days controls how long entries stay blocked
Some config flags exist (enabled, block_adds, block_removes) but aren’t fully enforced in wiring.
Treat blackbox as “always active if you call record_attempts / record_success”.
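A sketch of the promote/cooldown bookkeeping, assuming the flap counters and blocked keys are simple dicts (the real files are *.flap.json and *.blackbox.json; the function names and default values here are illustrative):

```python
import time

def record_failure(flap: dict, blackbox: dict, key: str,
                   promote_after: int = 3) -> None:
    """Illustrative: count another consecutive failure; promote to the blackbox
    once the counter reaches sync.blackbox.promote_after."""
    flap[key] = flap.get(key, 0) + 1
    if flap[key] >= promote_after:
        blackbox[key] = {"at": time.time()}

def record_success(flap: dict, key: str) -> None:
    """Illustrative: a success resets the consecutive-failure counter."""
    flap.pop(key, None)

def is_blocked(blackbox: dict, key: str, cooldown_days: int = 7) -> bool:
    """Illustrative cooldown: entries expire after sync.blackbox.cooldown_days."""
    entry = blackbox.get(key)
    return bool(entry) and (time.time() - entry["at"]) <= cooldown_days * 86400
```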
7) Unresolved (don’t hammer what keeps failing)
Goal: when a key repeatedly fails to apply, stop retrying it every run.
Code
_unresolved.py
How it’s recorded:
when apply returns failures, those keys are written to:
*.unresolved.pending.json
How it’s used:
the orchestrator loads unresolved keys (depending on which file names are present)
then removes them from planned adds
The loader looks for *.unresolved.json. The recorder writes *.unresolved.pending.json.
If you don’t have a promotion step (pending → active), unresolved blocking is weaker.
Unresolved is still useful. It may need a small wiring pass to fully “bite”.
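If a promotion step were added, a minimal sketch might look like this (purely hypothetical: it assumes both files hold flat JSON lists of keys and that writing *.unresolved.json is enough for the existing loader to pick them up):

```python
import json
import pathlib

def promote_unresolved(state_dir: str, stem: str) -> set:
    """Hypothetical promotion: fold *.unresolved.pending.json into
    *.unresolved.json so the loader actually sees the keys."""
    base = pathlib.Path(state_dir)
    pending_path = base / f"{stem}.unresolved.pending.json"
    active_path = base / f"{stem}.unresolved.json"
    pending = json.loads(pending_path.read_text()) if pending_path.exists() else []
    active = set(json.loads(active_path.read_text())) if active_path.exists() else set()
    active.update(pending)
    active_path.write_text(json.dumps(sorted(active)))
    return active
```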
8) PhantomGuard (watchlist anti-flap)
Goal: stop re-adding watchlist items that “succeed” but don’t stick.
Code
_phantoms.py
PhantomGuard is applied in one-way and two-way for feature == "watchlist" (and partially for ratings).
It tracks:
keys that were attempted as adds
whether they were later observed in the destination snapshot (“actually stuck”)
If an item repeatedly fails to “stick”, it becomes a phantom and further adds are blocked for ttl_days.
Files (scoped):
{feature}.{src}-{dst}.{scope}.phantoms.json
{feature}.{src}-{dst}.{scope}.last_success.json
Wiring quirk:
PhantomGuard config is read from root-level cfg["blackbox"], not sync.blackbox.
So yes: there are two “blackbox-ish” systems:
_blackbox.py (failure cooldown for non-watchlist adds)
_phantoms.py (watchlist stickiness guard)
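A sketch of the stickiness check, assuming the attempted adds and the destination's next snapshot keys are both available (the function name, miss threshold, and TTL default are illustrative):

```python
import time

def update_phantoms(phantoms: dict, attempted_adds: set, dest_keys: set,
                    max_misses: int = 2, ttl_days: int = 14) -> set:
    """Illustrative: an add that never shows up in the destination snapshot
    accumulates misses; after enough misses it is treated as a phantom and
    further adds are blocked until the TTL expires."""
    now = time.time()
    for key in attempted_adds:
        if key in dest_keys:
            phantoms.pop(key, None)  # the add stuck; clear any record
            continue
        rec = phantoms.setdefault(key, {"misses": 0, "at": now})
        rec["misses"] += 1
        rec["at"] = now
    # Keys currently blocked from further add attempts.
    return {
        k for k, rec in phantoms.items()
        if rec["misses"] >= max_misses and (now - rec["at"]) <= ttl_days * 86400
    }
```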
9) Manual policy (human override)
Goal: let the user pin adds or block items regardless of snapshots.
Stored in state.manual.json (merged with state.json at load time).
Manual adds:
merged into the source index before diffing
Manual blocks:
applied to both adds and removes after planning
match by canonical key, id tokens, or title-year token
This is the “I know better, do it anyway / don’t ever sync this title” escape hatch.
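A sketch of how a manual block might match a planned item, using the three match paths listed above (canonical key, ID tokens, title-year); the token formats and function name are illustrative:

```python
def manual_block_matches(blocked: set, item: dict) -> bool:
    """Illustrative: a block entry can hit the canonical key, any ID token,
    or a title-year token."""
    tokens = {item.get("key", "")}
    tokens.update(f"{k}:{v}" for k, v in item.get("ids", {}).items() if v)
    if item.get("title") and item.get("year"):
        tokens.add(f"{item['title'].lower()}|{item['year']}")
    return bool(tokens & blocked)
```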
Guardrail ordering (why it matters)
In practice, the orchestrator relies on this order to stay safe:
Health gating (skip if auth/down)
Drop guard (fix suspect snapshots before planning)
Index semantic merge (delta providers get baseline merge)
Plan diffs
Mass delete protection (drop oversized removal plans)
Blocklists (tombstones/unresolved/blackbox)
PhantomGuard (watchlist)
Apply with confirmations + unresolved bookkeeping
Persist baselines/checkpoints
If you change ordering, you can accidentally disable a guardrail.
Related pages
Snapshot coercion details: Snapshots
Two-way delete propagation: Two-way sync
Delete memory: Tombstones
Failure suppression: Blackbox, Phantom Guard, Unresolved