# State

{% tabs %}
{% tab title="End users" %}
State files are CrossWatch’s on-disk memory.

They let CrossWatch plan safely across runs.

#### Files you should care about

* `/config/state.json`: baselines and checkpoints.
* `/config/state.manual.json`: your manual adds/blocks.

If you back up CrossWatch, back up `/config/`.

#### Why state matters

* Baselines stop CrossWatch from deleting “unknown” items.
* Checkpoints help detect stale snapshots.
* Guardrail files reduce flapping and retries.

#### When to reset state

Resetting can be useful when you change:

* pair direction or mode
* provider credentials
* whitelists / library filters

Related:

* Pair scoping: [Scope](/blueprint-architecture/orchestrator/scope.md)
* Safety model: [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
  {% endtab %}

{% tab title="Power users" %}
State files are the orchestrator’s on-disk memory. They store baselines, checkpoints, and guardrail data.

This page lists what gets written, where it lives, and who uses it.

The base directory is `CONFIG_BASE()` (usually `/config` inside the container).

Code references:

* `cw_platform/orchestrator/_state_store.py`
* `cw_platform/orchestrator/_scope.py`
* plus each guardrail module (`_tombstones.py`, `_blackbox.py`, `_phantoms.py`, `_unresolved.py`)

***

### File location rules

#### Root vs `.cw_state/`

* Core, user-facing state lives in `/config/`
* Most “guardrail internals” live in `/config/.cw_state/`

The orchestrator creates `.cw_state` automatically if missing.

#### Scope suffixes

Many files are **scoped**, meaning the filename includes a sanitized pair scope:

* derived from env vars set by `_pairs._pair_env()`
* created via `scoped_file(prefix, ext)`

Typical scope strings:

* `one-way_plex-simkl_0`
* `two-way_plex-trakt_1`
* `health` (temporary)

Scope length is capped (96 chars). Invalid characters are replaced with `_`.
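The sanitizing rule above can be sketched as follows. This is a minimal illustration, not the code in `_scope.py`: the allowed character set (letters, digits, `.`, `_`, `-`) is an assumption, and `sanitize_scope` and `example_scoped_name` are hypothetical helper names.

```python
import re

MAX_SCOPE_LEN = 96  # cap stated on this page


def sanitize_scope(raw: str) -> str:
    """Replace disallowed characters with '_' and cap the length."""
    # Assumption: letters, digits, '.', '_' and '-' are the allowed set.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", raw)
    return cleaned[:MAX_SCOPE_LEN]


def example_scoped_name(prefix: str, ext: str, scope: str) -> str:
    """Hypothetical helper: build '<prefix>.<scope>.<ext>' from a raw scope."""
    return f"{prefix}.{sanitize_scope(scope)}.{ext}"


if __name__ == "__main__":
    print(example_scoped_name("simkl_watchlist", "flap.json", "one-way_plex-simkl_0"))
```

The typical scope strings listed above pass through unchanged under this assumption; only unusual characters such as `/` or spaces would be rewritten.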

***

### Core files in `/config/`

#### `/config/state.json`

Purpose: canonical orchestrator state.

Written by: `_state_store.save_state()` (called at end of a feature run)

Read by: most runtime logic (baselines, checkpoints, policy merge, summaries)

Top-level shape (abridged):

```json
{
  "schema": 1,
  "last_sync_epoch": 1738100000,
  "providers": {
    "PLEX": {
      "watchlist": {
        "baseline": { "items": { "imdb:tt...": { ...minimal... } } },
        "checkpoint": "2026-01-28T21:00:00Z"
      },
      "ratings": { ... },
      "history": { ... },
      "playlists": { ... }
    },
    "SIMKL": { ... }
  },
  "metrics": { ... },
  "wall": { ... }
}
```

#### Baselines

* `baseline.items` is a dict `canonical_key -> minimal item`
* “minimal item” is usually:
  * `type/title/year/ids`
  * plus provider subobject (plex/jellyfin/trakt/simkl…) when needed
  * excludes transient fields

Baseline persistence filters out items if:

* `_cw_persist == false`
* `_cw_transient == true`
* `_cw_skip_persist == true`
* provider subobject `ignored == true` (provider-specific)
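The filter rules above could be expressed roughly like this. It is a sketch under stated assumptions: `should_persist` and `filter_baseline` are hypothetical names, and the list of provider subobject keys is guessed from the examples on this page rather than taken from `_state_store.py`.

```python
from typing import Any, Dict

# Assumption: provider subobjects live under these lowercase keys.
PROVIDER_KEYS = ("plex", "jellyfin", "trakt", "simkl")


def should_persist(item: Dict[str, Any]) -> bool:
    """Return False when any of the documented exclusion flags is set."""
    if item.get("_cw_persist") is False:
        return False
    if item.get("_cw_transient") is True:
        return False
    if item.get("_cw_skip_persist") is True:
        return False
    for key in PROVIDER_KEYS:
        sub = item.get(key)
        # Provider-specific opt-out: the subobject is flagged as ignored.
        if isinstance(sub, dict) and sub.get("ignored") is True:
            return False
    return True


def filter_baseline(items: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only the items that are allowed into the baseline."""
    return {key: item for key, item in items.items() if should_persist(item)}
```

An item without any of these flags is kept, which matches the idea that persistence is the default and the flags are opt-outs.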

#### Checkpoints

* stored at `providers.{PROVIDER}.{feature}.checkpoint`
* checkpoint is whatever `module_checkpoint()` returned (string-ish)

***

#### `/config/state.manual.json`

Purpose: manual overrides. Merged into `state.json` at load time.

Read by: `_state_store.load_state()`

Written by: UI and API endpoints (not orchestrator core)

Contains:

* manual adds: “always include”
* manual blocks: “never sync”

Stored under the relevant provider+feature node (same `providers` nesting as `state.json`).

Manual blocks can be:

* canonical keys (`imdb:tt...`)
* ID tokens (`tmdb:123`)
* title-year tokens (normalized)

***

#### `/config/last_sync.json`

Purpose: last run summary for UI.

Written by: `_pairs.py` after a run

Read by: UI endpoints and logs views

Typical contents:

* when the last run started/ended
* per pair:
  * per feature counts (adds/removes/unresolved)
* API totals (if recorded)

Not used for correctness; purely observability.

***

#### `/config/watchlist_hide.json`

Purpose: UI helper file to hide watchlist items.

Written by: UI

Cleared by: `_pairs.py` at end of a run

This is intentionally *not* a durable guardrail. It’s a UI affordance.

***

#### `/config/ratings_changes.json`

Purpose: optional sink for rating-change traces.

Writer: some provider modules may append to it

Reader: UI/debug tools

Not required for orchestrator logic.

***

### Guardrail internals in `/config/.cw_state/`

***

#### `/config/.cw_state/tombstones.json`

Purpose: deletion memory. Used mainly by two-way sync.

Written by:

* one-way: when removals succeed on destination
* two-way: when removals succeed on either side
* two-way: also when “observed deletions” are detected

Read by:

* `_tombstones.keys_for_feature(...)`
* `_pairs_blocklist.apply_blocklist(...)` (indirectly; tombstoned keys are added to the blocklist)

Key format:

* `{feature}:{PAIR}|{token}`

Value includes:

* `at` epoch
* optional `why` string

TTL:

* `sync.tombstone_ttl_days` determines how long a tombstone is treated as active

Legacy migration:

* `/config/tombstones.json` may be migrated into `.cw_state/tombstones.json`
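To illustrate how the key format and TTL interact, here is a minimal sketch. It assumes the file is a flat JSON object mapping `{feature}:{PAIR}|{token}` keys to values with an `at` epoch; the real file may be nested differently, and `active_tombstone_tokens` is a hypothetical name, not the `_tombstones.py` API.

```python
import time
from typing import Any, Dict, Optional, Set

SECONDS_PER_DAY = 86400


def active_tombstone_tokens(
    tombstones: Dict[str, Any],
    feature: str,
    pair: str,
    ttl_days: float,
    now: Optional[float] = None,
) -> Set[str]:
    """Return tokens for one feature/pair whose tombstone is inside the TTL."""
    now = time.time() if now is None else now
    cutoff = now - ttl_days * SECONDS_PER_DAY
    prefix = f"{feature}:{pair}|"
    active: Set[str] = set()
    for key, value in tombstones.items():
        if not key.startswith(prefix):
            continue  # different feature or pair
        at = value.get("at") if isinstance(value, dict) else None
        if isinstance(at, (int, float)) and at >= cutoff:
            active.add(key[len(prefix):])
    return active
```

Entries older than the cutoff are simply ignored here; whether the real module also deletes them from disk is not covered by this sketch.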

***

#### Blackbox files

**Flap counters (`*.flap.json`)**

Purpose: consecutive failure counters.

Filename pattern:

* `/config/.cw_state/{dst}_{feature}.{SCOPE}.flap.json`

Written by: `_blackbox.inc_flap()`, `record_attempts()`

Read by: `_blackbox.load_flap_map()`

**Blocked keys (`*.blackbox.json`)**

Purpose: blocked keys with cooldown timestamps.

Filename pattern:

* `/config/.cw_state/{dst}_{feature}.{PAIR or SCOPE}.blackbox.json`

Written by: `_blackbox._promote()`

Read by: `_blackbox.load_blackbox_keys()`

Pruned by:

* `_blackbox.prune_blackbox(cooldown_days=...)`
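The relationship between flap counters, promotion, and pruning can be pictured with the sketch below. Everything specific in it is an assumption for illustration: the `promote_after` threshold, the `since` field name, and the function names are hypothetical and do not describe the actual `_blackbox.py` file formats.

```python
from typing import Dict

SECONDS_PER_DAY = 86400


def record_failure(
    flap: Dict[str, int],
    blackbox: Dict[str, Dict[str, float]],
    key: str,
    now: float,
    promote_after: int = 3,  # hypothetical threshold
) -> int:
    """Increment the consecutive-failure counter and promote when it is reached."""
    flap[key] = flap.get(key, 0) + 1
    if flap[key] >= promote_after:
        # Hypothetical value shape: remember when the key was blocked.
        blackbox[key] = {"since": now}
    return flap[key]


def prune(
    blackbox: Dict[str, Dict[str, float]], now: float, cooldown_days: float
) -> Dict[str, Dict[str, float]]:
    """Drop blocked keys whose cooldown window has elapsed."""
    cutoff = now - cooldown_days * SECONDS_PER_DAY
    return {k: v for k, v in blackbox.items() if v["since"] >= cutoff}
```

The point of the two-file split is visible here: counters change on every attempt, while the blocked-key map only changes on promotion or pruning.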

***

#### Unresolved files

**Pending unresolved (`*.unresolved.pending.json`)**

Purpose: keys that failed apply this run.

Filename pattern (scoped):

* `/config/.cw_state/{dst}_{feature}.{SCOPE}.unresolved.pending.json`

Written by: `_unresolved.record_unresolved(...)`

Contents:

```json
{
  "imdb:tt...": { "at": 1738100000, "hint": "apply:add:failed" }
}
```

**Active unresolved (`*.unresolved.json`)**

Purpose: active unresolved blocklist.

Filename pattern (scoped):

* `/config/.cw_state/{dst}_{feature}.{SCOPE}.unresolved.json`

Read by: `_unresolved.load_unresolved_keys(...)`

{% hint style="warning" %}
The orchestrator writes `*.unresolved.pending.json`. The loader reads `*.unresolved.json`.

If pending entries are never promoted to the active file, unresolved keys won’t fully block retries.
{% endhint %}
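One way to spot the mismatch described in the warning is to list pending files that have no active counterpart. This is a small diagnostic sketch, not part of CrossWatch; `unpromoted_pending` is a hypothetical name and it only relies on the two filename suffixes documented above.

```python
from pathlib import Path
from typing import List

PENDING_SUFFIX = ".unresolved.pending.json"
ACTIVE_SUFFIX = ".unresolved.json"


def unpromoted_pending(state_dir: Path) -> List[Path]:
    """Return pending files whose matching active file does not exist."""
    missing: List[Path] = []
    for pending in sorted(state_dir.glob(f"*{PENDING_SUFFIX}")):
        # Strip the pending suffix to recover '{dst}_{feature}.{SCOPE}'.
        stem = pending.name[: -len(PENDING_SUFFIX)]
        active = state_dir / f"{stem}{ACTIVE_SUFFIX}"
        if not active.exists():
            missing.append(pending)
    return missing


if __name__ == "__main__":
    for path in unpromoted_pending(Path("/config/.cw_state")):
        print(path)
```

The script only reads the directory listing, so it should be safe to run while CrossWatch is idle.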

***

#### PhantomGuard files (watchlist)

Filename patterns (scoped):

* `/config/.cw_state/{feature}.{src}-{dst}.{SCOPE}.phantoms.json`
* `/config/.cw_state/{feature}.{src}-{dst}.{SCOPE}.last_success.json`

Written by: `_phantoms.PhantomGuard.record_attempt/record_success`

Read by: `_phantoms.PhantomGuard.load()`

Purpose:

* block repeated adds that don’t “stick” on destination

TTL:

* `cooldown_days` read from root config `cfg["blackbox"]["cooldown_days"]` (yes, this is confusing)

***

### Misc scoped helpers

#### `anilist_watchlist_shadow.*.json`

* `/config/.cw_state/anilist_watchlist_shadow.{SCOPE}.json`

Written by:

* `_snapshots._maybe_backfill_anilist_shadow()`

Purpose:

* helps ANILIST entries map to better canonical keys by storing ANILIST IDs and matched source IDs.

***

### Debugging cheat sheet

If a run seems “stuck” on the same items:

1. Check unresolved:
   * `ls /config/.cw_state/*unresolved*`
   * see if pending is piling up without a promoted `*.unresolved.json`
2. Check blackbox:
   * `ls /config/.cw_state/*blackbox.json`
   * remove keys manually if needed
3. Check tombstones:
   * `grep -n "watchlist:PLEX-SIMKL" /config/.cw_state/tombstones.json | head`
4. Check phantom guard:
   * `ls /config/.cw_state/watchlist.*.phantoms.json`
5. Check baselines/checkpoints:
   * open `/config/state.json`
   * look under `providers.{PROVIDER}.{feature}`
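For the last step, a short read-only script can summarize what is stored per provider and feature instead of opening the file by hand. This sketch assumes the `providers.{PROVIDER}.{feature}` layout shown for `/config/state.json` on this page; `summarize_state` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def summarize_state(state: Dict[str, Any]) -> List[Tuple[str, str, int, Any]]:
    """Return (provider, feature, baseline item count, checkpoint) rows."""
    rows: List[Tuple[str, str, int, Any]] = []
    providers = state.get("providers") or {}
    for provider, features in sorted(providers.items()):
        if not isinstance(features, dict):
            continue
        for feature, node in sorted(features.items()):
            if not isinstance(node, dict):
                continue
            items = (node.get("baseline") or {}).get("items") or {}
            rows.append((provider, feature, len(items), node.get("checkpoint")))
    return rows


if __name__ == "__main__":
    path = Path("/config/state.json")
    if path.exists():
        state = json.loads(path.read_text(encoding="utf-8"))
        for provider, feature, count, checkpoint in summarize_state(state):
            print(f"{provider}.{feature}: {count} baseline items, checkpoint={checkpoint}")
```

A feature with zero baseline items or a missing checkpoint is a reasonable first place to look when a run keeps replanning the same changes.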

***

### Related docs

* [Orchestrator](/blueprint-architecture/orchestrator.md)
* [Snapshots (indices)](/blueprint-architecture/orchestrator/snapshots.md)
* [Blackbox](/blueprint-architecture/orchestrator/blackbox.md)
* [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
* [One-way sync](/blueprint-architecture/orchestrator/one-way-sync.md)
* [Two-way sync](/blueprint-architecture/orchestrator/two-way-sync.md)
  {% endtab %}
  {% endtabs %}


