# Two-way sync

{% tabs %}
{% tab title="End users" %}
Two-way sync keeps both sides aligned.

It is powerful. It is also where deletes can propagate.

{% hint style="warning" %}
CrossWatch supports real bidirectional synchronization. Changes on either platform can sync in both directions.

That adds flexibility. It also adds risk.

Because both sides can introduce changes, you are more likely to see:

* conflicts
* duplicate items
* unintended overwrites
* propagated deletions

This gets worse when a provider returns delayed, partial, or inconsistent data.

Use bidirectional sync carefully.

It may look like the better option. For most users, **one-way** sync is still the safer and more predictable setup.
{% endhint %}

#### When to use it

Use two-way only when:

* both providers match items reliably (stable IDs)
* you have already had a few clean one-way runs
* you understand what “remove” will do

{% hint style="warning" %}
Start two-way with **Watchlist only**. Keep **Remove** off at first.
{% endhint %}

#### What to expect

* First run is safe. It avoids delete propagation.
* Later runs can propagate deletes when evidence is strong.
* If a provider is down, CrossWatch avoids “everything deleted” behavior.

#### Safety mechanisms you’ll see

* **Drop guard**: blocks destructive plans from tiny snapshots.
* **Mass delete protection**: blocks large removal waves unless you opt in.
* **Tombstones**: remember deletions to avoid ping-pong.

#### Recommended setup

1. Run **Dry run** first.
2. Enable **Add**. Keep **Remove** off.
3. Watch plans for a few runs.
4. Only then consider enabling removes.

Related:

* Safer defaults: [Best practice](/getting-started/best-practices.md)
* How deletes are remembered: [Tombstones](/blueprint-architecture/orchestrator/tombstones.md)
* Safety model: [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
  {% endtab %}

{% tab title="Power users" %}
Two-way means: **A ↔ B**.

It tries to keep both sides aligned. It also tries very hard not to turn an API hiccup into mass deletes.

<details>

<summary>Implementation notes</summary>

Primary implementation: `cw_platform/orchestrator/_pairs_twoway.py`

Key helpers: `_snapshots.py`, `_planner.py`, `_tombstones.py`, `_phantoms.py`, `_pairs_blocklist.py`, `_pairs_massdelete.py`, `_unresolved.py`, `_blackbox.py`

</details>

Related runtime docs:

* [Orchestrator](/blueprint-architecture/orchestrator.md)
* [One-way sync](/blueprint-architecture/orchestrator/one-way-sync.md)
* [Snapshots (indices)](/blueprint-architecture/orchestrator/snapshots.md)
* [Blackbox](/blueprint-architecture/orchestrator/blackbox.md)

***

### Overview

Two-way is not “run one-way twice”. It adds extra logic for safe delete propagation:

* **Observed deletions**: “it used to exist, now it’s gone”
* **Tombstones**: “we saw a delete, remember it for a while”
* **Drop guard**: “a tiny snapshot might be wrong”

Per feature, two-way does:

1. Gate by provider availability and health.
2. Build snapshots for A and B.
3. Load baselines, checkpoints, and manual policy for both sides.
4. Apply drop-guard and index semantics (present vs delta).
5. Compute observed deletions and write tombstones (when safe).
6. Plan adds and optional removals in both directions.
7. Apply removals first, then adds.
8. Persist baselines and checkpoints.

***

### Technical reference

#### Entry point

`run_two_way_feature(ctx, src, dst, feature, fcfg, health_map)`

* Emits `feature:start` / `feature:done`
* Calls `_two_way_sync(ctx, a, b, ...)`

***

#### Step 0 — Health gating

* If either side reports `auth_failed` → emits `pair:skip` and returns `ok:false`.
* If either side is `down`:
  * writes to that side are skipped (items become unresolved)
  * **observed deletions are disabled** for both sides (to avoid “down = everything deleted”)

Feature support is also checked:

* provider `capabilities.features[feature]` (best effort)
* plus health `features[feature]` if reported

If either side doesn’t support the feature → `feature:unsupported` and no-op.
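The gating rules above can be sketched as a small decision function. This is only an illustration: the `Gate` dataclass, the `gate_pair` name, and the `"ok"` status string are assumptions, not the orchestrator's actual types.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Outcome of health gating for one pair (hypothetical shape)."""
    run: bool               # False means the pair is skipped entirely
    writable_a: bool        # False means writes to A become unresolved
    writable_b: bool        # False means writes to B become unresolved
    observed_deletes: bool  # False disables observed deletions on both sides


def gate_pair(status_a: str, status_b: str) -> Gate:
    """Apply the health rules described above to two provider statuses."""
    if "auth_failed" in (status_a, status_b):
        # Corresponds to emitting pair:skip and returning ok:false.
        return Gate(run=False, writable_a=False, writable_b=False,
                    observed_deletes=False)
    a_down = status_a == "down"
    b_down = status_b == "down"
    return Gate(
        run=True,
        writable_a=not a_down,
        writable_b=not b_down,
        # One side being down disables observed deletions for both sides.
        observed_deletes=not (a_down or b_down),
    )
```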

***

#### Step 1 — Build snapshots

`build_snapshots_for_feature(feature, providers={A,B}, ...)` returns:

* `A_cur`: current snapshot from A
* `B_cur`: current snapshot from B

Snapshots may be memoized per run using `ctx.snap_cache` and `ctx.snap_ttl_sec`.

***

#### Step 2 — Baselines, checkpoints, manual policy

Loads the stable “previous state” once per run:

* `prev_state = ctx._stable_prev_state or state_store.load_state()`

From `prev_state`:

* `prevA` baseline items: `state.providers[A][feature].baseline.items`
* `prevB` baseline items: `state.providers[B][feature].baseline.items`
* `prev_cp_A / prev_cp_B`: stored checkpoints

Current checkpoints:

* `now_cp_A = module_checkpoint(aops, cfg, feature)`
* `now_cp_B = module_checkpoint(bops, cfg, feature)`

Manual policy is loaded for *each* side:

* `manual_adds_A, manual_blocks_A`
* `manual_adds_B, manual_blocks_B`

Manual adds are merged into `A_eff`/`B_eff`; manual blocks filter adds and removes.

***

#### Step 3 — Drop guard (optional)

Controlled by:

* `sync.drop_guard` (bool)
* runtime knobs:
  * `runtime.suspect_min_prev` (default 20)
  * `runtime.suspect_shrink_ratio` (default 0.10)
  * `runtime.suspect_debug` (default True)

If enabled, each side’s “present snapshot” can be coerced back to baseline when:

* previous baseline is sufficiently large
* current snapshot shrank too much
* checkpoint did not advance

Flags produced:

* `A_suspect`, `B_suspect`

These flags later suppress observed deletions for that side.
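A minimal sketch of the three conditions, under stated assumptions: "shrank too much" is read here as "the current snapshot holds at most `shrink_ratio` of the baseline", and "did not advance" as "the checkpoint is unchanged". The function names are hypothetical, and the real comparison may differ.

```python
def is_suspect(prev_count, cur_count, prev_cp, now_cp,
               min_prev=20, shrink_ratio=0.10):
    """Return True when a snapshot looks too small to be trusted."""
    if prev_count < min_prev:
        return False  # baseline too small to judge
    if cur_count > prev_count * shrink_ratio:
        return False  # snapshot did not shrink enough to be suspicious
    # Assumption: an unchanged checkpoint means the provider reported no change.
    return prev_cp == now_cp


def guard_snapshot(prev, cur, prev_cp, now_cp, **knobs):
    """Coerce a suspect snapshot back to the baseline; return (index, suspect)."""
    suspect = is_suspect(len(prev), len(cur), prev_cp, now_cp, **knobs)
    return (dict(prev) if suspect else dict(cur)), suspect
```

With a 50-item baseline, a 1-item snapshot, and an unchanged checkpoint, this sketch returns the baseline and sets the suspect flag. If the checkpoint advanced, the small snapshot is accepted as real.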

***

#### Step 4 — Index semantics: present vs delta

Per provider capability:

* `capabilities.index_semantics` defaults to `"present"`

Then:

* if `"delta"`: `eff = prev | cur`
* else: `eff = guarded_cur` (or raw cur if drop guard off)

Result:

* `A_eff`, `B_eff`
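As a rough illustration of the two semantics, the sketch below treats each index as a dict keyed by canonical key. The `effective_index` name is hypothetical. Letting current values win on key collisions in the delta case is an assumption that follows Python's dict-union behavior.

```python
def effective_index(prev, cur, guarded_cur=None, semantics="present"):
    """Compute the effective index for one side."""
    if semantics == "delta":
        # A delta index only reports changes, so the baseline is merged in.
        return {**prev, **cur}
    # A present index is a full listing: use the guarded snapshot when the
    # drop guard produced one, else the raw current snapshot.
    return dict(guarded_cur if guarded_cur is not None else cur)
```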

Library whitelists (history/ratings) can filter:

* `prevA`, `A_cur`, `A_eff`
* `prevB`, `B_cur`, `B_eff`

***

#### Step 5 — Tombstones (pair-scoped) + TTL

Two-way uses pair-scoped tombstones only.

Pair key:

* `pair_key = "-".join(sorted([A, B]))` (e.g., `PLEX-SIMKL`)

Load:

* `tomb_map = keys_for_feature(state_store, feature, pair=pair_key)`
* Filter by TTL:
  * `sync.tombstone_ttl_days` (default 30)

Result:

* `tomb` is the set of tokens still “alive” (canonical keys and id tokens)

Tokens look like:

* `imdb:tt...`, `tmdb:123`, etc.
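The pair key and TTL filter can be sketched as follows. The `tomb_map` shape (token mapped to an epoch timestamp in seconds) and the `alive_tombstones` name are assumptions made for illustration.

```python
import time


def pair_key(a, b):
    """Order-independent key for a provider pair."""
    return "-".join(sorted([a, b]))


def alive_tombstones(tomb_map, ttl_days=30, now=None):
    """Return the tokens whose tombstone is still within the TTL."""
    now = time.time() if now is None else now
    cutoff = now - ttl_days * 86400
    return {token for token, ts in tomb_map.items() if ts >= cutoff}
```

Sorting the names means `pair_key("SIMKL", "PLEX")` and `pair_key("PLEX", "SIMKL")` resolve to the same tombstone scope.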

***

#### Step 6 — Observed deletions

This is how two-way decides “an item disappeared on one side, so we should delete it on the other side too”.

#### When it runs

Observed deletions run only if:

* not bootstrapping (see below)
* `sync.include_observed_deletes` is true (default True)
* neither provider is down
* the relevant side’s snapshot is **not suspect** (`A_suspect/B_suspect == False`)

#### How it’s computed

For each side:

* `obsA = prevA.keys - A_cur.keys`
* `obsB = prevB.keys - B_cur.keys`

Interpretation: “it used to exist (baseline), now the live snapshot no longer contains it.”

#### What happens next

* `newly = (obsA | obsB) - tomb`
* Every token in `newly` is written into tombstones with timestamp `now`.

The orchestrator also tries to include **ID tokens** for the item by looking it up in `prevA` or `prevB`:

* it writes `imdb:...`, `tmdb:...`, etc. for stronger matching across providers.

Finally:

* observed-deleted keys are removed from the effective indices:
  * `A_eff.pop(k)` for `k in obsA`
  * `B_eff.pop(k)` for `k in obsB`

This prevents immediately re-adding an observed-deleted item back to the same side.
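Put together, the computation might look like the sketch below. It assumes each index is a dict of canonical key to item, and that items carry an `ids` dict. The function name and return shape are hypothetical.

```python
def apply_observed_deletions(prev_a, cur_a, eff_a, prev_b, cur_b, eff_b,
                             tomb, now, a_suspect=False, b_suspect=False):
    """Compute observed deletions, new tombstones, and prune effective indices."""
    # A suspect snapshot contributes no observed deletions for that side.
    obs_a = set() if a_suspect else set(prev_a) - set(cur_a)
    obs_b = set() if b_suspect else set(prev_b) - set(cur_b)

    new_tombs = {}
    for key in (obs_a | obs_b) - set(tomb):
        new_tombs[key] = now
        # Also record ID tokens for stronger cross-provider matching.
        item = prev_a.get(key) or prev_b.get(key) or {}
        for name, value in (item.get("ids") or {}).items():
            new_tombs[f"{name}:{value}"] = now

    # Drop observed-deleted keys so they are not re-added to the same side.
    for key in obs_a:
        eff_a.pop(key, None)
    for key in obs_b:
        eff_b.pop(key, None)
    return obs_a, obs_b, new_tombs
```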

#### Bootstrap mode

If this is the first time syncing this pair+feature:

* `bootstrap = (not prevA) and (not prevB) and not tomb`

Then:

* observed deletions are skipped
* and later removals are forced off, even if `allow_removals` is true

Translation: first run won’t delete things. Good.
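The bootstrap check and its effect on removals fit in a few lines. The helper names below are hypothetical; only the `bootstrap` expression itself comes from the text above.

```python
def is_bootstrap(prev_a, prev_b, tomb):
    """True when there is no baseline on either side and no live tombstones."""
    return (not prev_a) and (not prev_b) and not tomb


def removals_allowed(allow_removals, bootstrap):
    """Bootstrap forces removals off, regardless of configuration."""
    return bool(allow_removals) and not bootstrap
```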

***

#### Step 7 — Tombstone expansion (`tombX`)

Tombstone tokens can be either:

* canonical keys, or
* ID tokens, or
* episode/season typed tokens.

Two-way expands tombstones into `tombX` by resolving tokens through alias maps:

* Builds alias maps from:
  * `A_eff`, `B_eff`, `prevA`, `prevB`
* If a tomb token matches a typed token, it maps back to a canonical key.
* Adds those canonical keys to `tombX`

Net effect: tombstones match more reliably even if providers disagree on the primary key.
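One way to picture the expansion, assuming items carry an `ids` dict and that `tombX` keeps the original tokens alongside the resolved canonical keys. Both function names are hypothetical, and typed episode/season tokens are left out for brevity.

```python
def build_alias_map(*indices):
    """Map ID tokens such as 'imdb:tt1' to the canonical key that carries them."""
    alias = {}
    for index in indices:
        for key, item in index.items():
            for name, value in (item.get("ids") or {}).items():
                alias.setdefault(f"{name}:{value}", key)
    return alias


def expand_tombstones(tomb, *indices):
    """Return tomb plus every canonical key reachable through an alias."""
    alias = build_alias_map(*indices)
    tomb_x = set(tomb)
    for token in tomb:
        if token in alias:
            tomb_x.add(alias[token])
    return tomb_x
```

In this sketch a tombstone written as `tmdb:11` on one provider still blocks an item that the other provider keys by a different canonical key, as long as the item lists `tmdb: 11` among its IDs.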

***

#### Step 8 — Planning

Two-way produces four lists:

* `add_to_A`
* `add_to_B`
* `rem_from_A`
* `rem_from_B`

#### Presence features (watchlist/history/playlists)

For each item present in `A_eff` but not “present” in `B_eff`:

* normally → `add_to_B`
* but if removals are enabled *and* deletion is strongly indicated → `rem_from_A`

Deletion “strong signals” (any of):

* item tokens hit `tombX`
* canonical key in `tombX`
* key is in `obsB` (B observed deletion)
* key is in `shrinkB` (B baseline key missing from current snapshot)

A strong signal alone is not enough. **The item must have existed on the opposite side before**, which means one of these must also hold:

* `_prev_had(prevB, prevB_alias, item)` is true, or
* the item hits tombstones (`tombX`), which counts as “strong enough evidence” on its own.

Same logic is repeated in reverse for items present in `B_eff` but not in `A_eff`.
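The A → B direction can be sketched like this. It is a simplification: presence is tested by canonical key only, while the real planner also matches through aliases (`_prev_had` with an alias map). The `plan_direction` name and its parameters are hypothetical.

```python
def plan_direction(src_eff, dst_eff, prev_dst, tomb_x, obs_dst, shrink_dst,
                   allow_removals):
    """Plan adds to dst and removals from src for items missing on dst."""
    adds_to_dst, rem_from_src = [], []
    for key, item in src_eff.items():
        if key in dst_eff:
            continue  # already present on both sides
        tokens = {f"{n}:{v}" for n, v in (item.get("ids") or {}).items()}
        tomb_hit = key in tomb_x or bool(tokens & tomb_x)
        strong = tomb_hit or key in obs_dst or key in shrink_dst
        # A tombstone hit counts as evidence that the item existed before.
        existed_before = key in prev_dst or tomb_hit
        if allow_removals and strong and existed_before:
            rem_from_src.append(key)
        else:
            adds_to_dst.append(key)
    return adds_to_dst, rem_from_src
```

Calling it once with `(A_eff, B_eff, prevB, ...)` and once with the sides swapped yields the four planning lists.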

#### Ratings (special)

For ratings:

* compute diffs both directions:
  * `upserts_to_B, unrates_on_B = diff_ratings(A, B)`
  * `upserts_to_A, unrates_on_A = diff_ratings(B, A)`

Conflict resolution:

* if both sides want to upsert, pick a winner:
  * newer `rated_at` wins (if both parse), else `sync.bidirectional.source_of_truth` wins (default: A)
* if one side wants upsert and the other wants unrate:
  * if removals enabled *and* tombstones match → unrate
  * else propagate the upsert (keep the rating)

Important: unrates/removals are only planned if `allow_removals` is enabled.
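The conflict rules can be illustrated with two small helpers. The names are hypothetical. The sketch uses side labels `"A"`/`"B"` where the config takes a provider name, and it falls back to `source_of_truth` on equal timestamps, which is an assumption.

```python
from datetime import datetime, timezone


def _parse(ts):
    """Parse an ISO-8601 timestamp; return None if it does not parse."""
    try:
        dt = datetime.fromisoformat(str(ts).replace("Z", "+00:00"))
    except (TypeError, ValueError):
        return None
    # Treat naive timestamps as UTC so both sides stay comparable.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def pick_winner(item_a, item_b, source_of_truth="A"):
    """Decide which side's rating wins when both want to upsert."""
    ta, tb = _parse(item_a.get("rated_at")), _parse(item_b.get("rated_at"))
    if ta and tb and ta != tb:
        return "A" if ta > tb else "B"
    return source_of_truth


def resolve_upsert_vs_unrate(allow_removals, tomb_hit):
    """Unrate only when removals are enabled and tombstones match."""
    return "unrate" if (allow_removals and tomb_hit) else "upsert"
```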

***

#### Step 9 — Blocklists, manual blocks, phantoms

* Unresolved keys block **adds** on each destination:
  * `load_unresolved_keys(dst, feature, cross_features=True)`
* For non-watchlist features, `apply_blocklist(...)` also blocks adds using:
  * tombstones + unresolved + blackbox

Manual blocks apply to both adds and removes.

Watchlist uses PhantomGuard (not blackbox) to avoid add flapping:

* `PhantomGuard(src=X, dst=Y, feature, ttl_days=cooldown_days)`
* ratings upserts are split:
  * updates are not phantom-filtered
  * fresh adds are phantom-filtered

***

#### Step 10 — Mass delete protection

Before applying removals:

* `maybe_block_mass_delete(...)` can drop the entire removal list if it exceeds a ratio of baseline size.

Controlled by:

* `sync.allow_mass_delete`
* runtime ratio: `runtime.suspect_shrink_ratio` (default 0.10)

This is the “no, you’re not deleting 900 items because the API had a nap” guard.
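A rough sketch of the guard, reading "exceeds a ratio of baseline size" literally. The function name, its parameters, and the exact threshold comparison are assumptions. The real `maybe_block_mass_delete(...)` may compute the limit differently.

```python
def block_mass_delete_sketch(removals, baseline_size,
                             allow_mass_delete=False, ratio=0.10):
    """Return the removal list, or an empty list if the wave looks too large."""
    if allow_mass_delete or baseline_size <= 0:
        return list(removals)
    if len(removals) > baseline_size * ratio:
        return []  # drop the entire removal list, not just the excess
    return list(removals)
```

The all-or-nothing behavior is deliberate: a partially applied mass delete would still be a mass delete.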

***

#### Step 11 — Apply order

Two-way applies in this order:

1. removals from A
2. removals from B
3. adds to A
4. adds to B

Why removals first?

* Tombstones get recorded before any adds are planned on later passes in the same run. It also keeps indices tidy.

#### Tombstone marking on successful removals

After confirmed removals (and not dry-run), `_mark_tombs(rem_items)` writes:

* canonical key
* every `ids.*` token in the removed items

into pair-scoped tombstones:

* key format: `{feature}:{PAIR}|{token}`
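For illustration, the keys written for one removed item could be built like this. The `tomb_keys` helper is hypothetical and assumes the item carries an `ids` dict.

```python
def tomb_keys(feature, pair, canonical_key, item):
    """Build pair-scoped tombstone keys for one removed item."""
    tokens = [canonical_key]
    tokens += [f"{name}:{value}" for name, value in (item.get("ids") or {}).items()]
    return [f"{feature}:{pair}|{token}" for token in tokens]
```

For a watchlist item with canonical key `movie:1` and `ids = {"imdb": "tt1"}` in the `PLEX-SIMKL` pair, this produces `watchlist:PLEX-SIMKL|movie:1` and `watchlist:PLEX-SIMKL|imdb:tt1`.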

***

#### Step 12 — Apply results, confirmations, blackbox + unresolved

Adds use the same “effective confirmation” rules as one-way:

* if provider returns `confirmed_keys`, those are used
* else: success is approximated as `attempted - newly_unresolved`

If `sync.verify_after_write` is enabled and the provider supports it:

* successes are further reduced by re-checking unresolved after write
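The confirmation rules reduce to simple set arithmetic. The function and parameter names below are hypothetical, and `None` is used to mean "the provider did not return `confirmed_keys`".

```python
def effective_successes(attempted, confirmed_keys=None,
                        newly_unresolved=(), unresolved_after_verify=()):
    """Estimate which attempted keys were actually written."""
    if confirmed_keys is not None:
        ok = set(confirmed_keys)  # provider told us explicitly
    else:
        ok = set(attempted) - set(newly_unresolved)  # approximation
    # With verify_after_write, anything still unresolved is not a success.
    return ok - set(unresolved_after_verify)
```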

Blackbox integration:

* failed attempted keys (excluding skipped) call:
  * `record_attempts(dst, feature, failed_keys, reason="two:apply:add:failed", op="add", pair=pair_key)`
* successes call:
  * `record_success(dst, feature, success_keys, pair=pair_key)`

Unresolved:

* when a provider is down, everything becomes unresolved with hints `provider_down:add/remove`
* when apply failures occur, those items are recorded unresolved with hint `apply:add:failed`

***

#### Step 13 — Persist baselines + checkpoints

At the end, two-way persists:

* baseline items for both providers (filtered)
* checkpoints (if present)

Filter rules match one-way:

* skip items marked transient (`_cw_transient`, `_cw_skip_persist`, etc.)
* skip items with provider subobject `ignored == True`
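A sketch of the persistence filter. Only the two flags named above are checked (the "etc." is left as-is), and the location of the provider subobject (`item[provider_key]`) is an assumption made for illustration.

```python
# Only the flags named in the text; the real list may be longer.
TRANSIENT_FLAGS = ("_cw_transient", "_cw_skip_persist")


def should_persist(item, provider_key):
    """Return False for items that must not be written into the baseline."""
    if any(item.get(flag) for flag in TRANSIENT_FLAGS):
        return False
    sub = item.get(provider_key)  # hypothetical provider subobject location
    if isinstance(sub, dict) and sub.get("ignored") is True:
        return False
    return True


def filter_baseline(items, provider_key):
    """Keep only the items that are safe to persist."""
    return {k: v for k, v in items.items() if should_persist(v, provider_key)}
```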

Writes `state.json` with updated `last_sync_epoch`.

***

### Config knobs that matter

Under `sync`:

* `enable_add` / `enable_remove` (or per-feature overrides)
* `include_observed_deletes`
* `tombstone_ttl_days`
* `drop_guard`
* `allow_mass_delete`
* `verify_after_write`
* `dry_run`

Under `sync.bidirectional` (ratings conflict preference):

* `source_of_truth` (A or B provider name)

Root `blackbox` (PhantomGuard):

* `enabled`, `block_adds`, `cooldown_days`

***

### Notes

* First run is bootstrap mode. It avoids observed deletes and removals.
* If either side is `down`, observed deletions are disabled for both sides.
* Drop-guard suspect snapshots suppress observed deletions for that side.
  {% endtab %}
  {% endtabs %}

