# State.json

{% tabs %}
{% tab title="End users" %}
`state.json` is CrossWatch’s local “memory”.

It stores what CrossWatch last saw, so future runs stay safe.

#### What it’s used for

* remembering baselines (prevents unsafe deletes)
* tracking freshness markers (helps detect stale snapshots)
* powering parts of the UI (run summaries, some rollups)

#### Where it lives

* `/config/state.json`

If you run in Docker, back up your mounted `/config` folder.

#### Should I edit it?

Usually no.

Edit it only if you’re debugging and you know the consequences.

#### When to clear it

Clear state if you:

* changed a pair’s direction or mode repeatedly
* changed provider auth and the planner stays “stuck”
* see obvious planning loops that persist across runs

Clearing state forces a rebuild next run. It does not directly edit providers.

Related:

* File locations and other state files: [State](/blueprint-architecture/orchestrator/state.md)
* Safety model: [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
* Maintenance UI actions: [Maintenance](/crosswatch/maintenance.md)
  {% endtab %}

{% tab title="Power users" %}
This is the orchestrator’s canonical on-disk state. It’s used for:

* cross-run baselines (what we saw last time)
* checkpoints (what changed since last time)
* manual policy overlay (blocks/adds)
* lightweight metrics

<details>

<summary>Implementation notes</summary>

**File:** `/config/state.json`\
**Loader/writer:** `cw_platform/orchestrator/_state_store.py`\
**Baseline writers:** one-way/two-way runners and facade helpers.

</details>

***

### Top-level shape

`state.json` is a JSON object (dict). Keys you’ll see:

```json
{
  "providers": { ... },
  "wall": [...],
  "last_sync_epoch": 1738100000,
  "metrics": { ... }
}
```

#### 1) `providers` (dict)

Map: `PROVIDER_NAME -> provider_state`

Provider names are uppercased: `PLEX`, `SIMKL`, `TRAKT`, `JELLYFIN`, `EMBY`, `ANILIST`, etc.

#### 2) `wall` (list)

At the end of a normal run, `wall` is a **list of minimal watchlist items**, aggregated across providers (see “Wall semantics”).

⚠️ Note: during `run_pairs()` the code briefly writes `wall` as a dict (stats overview), but `Orchestrator.run()` overwrites it with the watchlist wall list right after. So on disk, **expect a list** in steady-state.

#### 3) `last_sync_epoch` (int | null)

Unix epoch time (in seconds) at which the most recent run was persisted.

#### 4) `metrics` (dict, optional)

Currently used for API totals:

* `metrics.api.last`

***

### `providers` schema

Each provider has a node like:

```json
"PLEX": {
  "watchlist": { ...feature_state... },
  "ratings":  { ...feature_state... },
  "history":  { ...feature_state... },
  "playlists":{ ...feature_state... },

  "manual": { ...manual_overlay... }
}
```

Not every provider has every feature node; missing features simply mean “no baseline persisted”.

#### `feature_state`

For each feature, the orchestrator persists:

```json
{
  "baseline": {
    "items": {
      "imdb:tt0111161": { ...minimal_item... },
      "tmdb:550": { ...minimal_item... }
    }
  },
  "checkpoint": "2026-01-28T21:00:00Z"
}
```

**`baseline.items` (dict)**

Map: `canonical_key -> minimal_item`

* Keys are produced by `id_map.canonical_key(item)` at snapshot time.
* Values are “minimalized” via `id_map.minimal(item)` to keep state small and stable.
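
To check how much baseline data is actually persisted, a read-only sketch like this one walks the layout described above. `baseline_counts` and `read_state` are hypothetical helpers (not part of CrossWatch); they assume only the documented structure and skip the `manual` node.

```python
import json
from pathlib import Path


def baseline_counts(state: dict) -> dict:
    """Return {PROVIDER: {feature: baseline item count}} for a loaded state dict."""
    counts = {}
    providers = state.get("providers")
    if not isinstance(providers, dict):
        return counts
    for name, node in providers.items():
        if not isinstance(node, dict):
            continue
        for feature, feature_state in node.items():
            # "manual" is the policy overlay, not a feature baseline.
            if feature == "manual" or not isinstance(feature_state, dict):
                continue
            baseline = feature_state.get("baseline")
            items = baseline.get("items") if isinstance(baseline, dict) else None
            if isinstance(items, dict):
                counts.setdefault(name, {})[feature] = len(items)
    return counts


def read_state(path: str = "/config/state.json") -> dict:
    """Read-only load; the default path is the one documented on this page."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# In-memory example so the sketch runs without a real file.
sample_state = {
    "providers": {
        "PLEX": {
            "watchlist": {
                "baseline": {"items": {"imdb:tt0111161": {}, "tmdb:550": {}}},
                "checkpoint": "2026-01-28T21:00:00Z",
            },
            "manual": {"watchlist": {"blocks": []}},
        }
    }
}
print(baseline_counts(sample_state))
```

Against a real install you would call `baseline_counts(read_state())` instead of using the sample dict.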

**Baseline persistence filters:** An item is skipped (not written into baseline) if any of these are true:

* `item["_cw_persist"] == false`
* `item["_cw_transient"] == true`
* `item["_cw_skip_persist"] == true`

An item is also skipped if its provider sub-object marks it as ignored:

* `item[provider_key]["ignored"] == true`

Where `provider_key` is a normalized mapping (e.g. `PLEX -> "plex"`, `JELLYFIN -> "jellyfin"`).
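
The filter rules above can be read as a single predicate. This is a minimal sketch, not the orchestrator’s code: `should_persist` is a hypothetical name, the strict `is True` / `is False` checks are an assumption, and so is deriving `provider_key` by lowercasing the provider name (the page only gives `PLEX -> "plex"` and `JELLYFIN -> "jellyfin"` as examples).

```python
def should_persist(item: dict, provider_name: str) -> bool:
    """Return False if any documented skip rule matches, True otherwise."""
    # Explicit opt-out flags on the item itself.
    if item.get("_cw_persist") is False:
        return False
    if item.get("_cw_transient") is True:
        return False
    if item.get("_cw_skip_persist") is True:
        return False

    # ASSUMPTION: the normalized provider key is the lowercased provider name.
    provider_key = provider_name.lower()
    sub = item.get(provider_key)
    if isinstance(sub, dict) and sub.get("ignored") is True:
        return False

    return True


items = [
    {"type": "movie", "ids": {"imdb": "tt0111161"}},
    {"type": "movie", "ids": {"tmdb": "550"}, "_cw_transient": True},
    {"type": "show", "ids": {"tvdb": "12345"}, "plex": {"ignored": True}},
]
kept = [it for it in items if should_persist(it, "PLEX")]
print(len(kept))
```

Note that the `ignored` flag is checked per provider: the third item is skipped for `PLEX` but would still be persisted for a provider that has no `ignored` marker on it.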

**`checkpoint` (string | null)**

A per-provider, per-feature marker representing “how fresh the snapshot was”.

* Derived by `snapshots.module_checkpoint(...)` from provider `activities()` or other provider hints.
* Stored by one-way/two-way runners when a checkpoint value is available.
* Used by drop-guard logic (suspect snapshot detection) to decide whether an empty/shrunken snapshot is likely “real” or “stale/broken”.

Checkpoint is opaque to the orchestrator: it’s “whatever the provider returns”.

***

### Minimal item schema (what ends up in baselines)

There is no strict schema enforced, but stable items normally have:

```json
{
  "type": "movie|show|season|episode",
  "title": "string",
  "year": 1999,
  "ids": {
    "imdb": "tt0137523",
    "tmdb": "550",
    "tvdb": "12345",
    "...": "..."
  },

  "plex": { ...optional provider payload... },
  "simkl": { ... },
  "trakt": { ... },
  "jellyfin": { ... }
}
```

The orchestrator mostly cares about:

* `type`
* `ids` (because canonical keys and token matching come from IDs)

Everything else is provider-specific.

***

### Manual policy overlay (`providers.<PROV>.manual`)

The `manual` node is not generated by the orchestrator during a run. It’s injected from:

* `/config/state.manual.json` (policy file)

at **load time** (and then persisted back into `state.json` on save).

#### Policy merge rules

Loader:

* `StateStore.load_state()` reads `state.json` + reads `state.manual.json`
* merges policy into `state.providers.<PROV>.manual.<feature>`

Supported policy locations (both work):

1. Recommended:

```json
{
  "providers": {
    "PLEX": {
      "manual": {
        "watchlist": { ... },
        "ratings": { ... }
      }
    }
  }
}
```

2. Legacy-ish convenience:

```json
{
  "providers": {
    "PLEX": {
      "watchlist": { ... },
      "ratings": { ... }
    }
  }
}
```

#### Manual feature schema

For each feature:

```json
"manual": {
  "watchlist": {
    "blocks": ["imdb:tt...", "tmdb:123", "movie|title:foo|year:2024"],
    "adds": {
      "items": {
        "imdb:tt0111161": { ...minimal_item... }
      }
    }
  }
}
```

* `blocks` is a list of tokens; the orchestrator treats them like tombstone-style tokens:
  * canonical key, ID tokens, or normalized title/year tokens
* `adds.items` is a dict keyed by canonical key. Merge behavior is “first-wins”:
  * policy items are added only if missing in the current state node (no overwrite).
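
A rough sketch of the merge described above, under stated assumptions: `merge_policy` is a hypothetical function (the real logic lives behind `StateStore.load_state()`), it accepts both policy layouts, and it applies “first-wins” to `adds.items`. How `blocks` are combined is not spelled out on this page; the de-duplicating union used here is an assumption.

```python
import copy


def merge_policy(state: dict, policy: dict) -> dict:
    """Return a copy of `state` with `policy` merged into providers.<PROV>.manual."""
    merged = copy.deepcopy(state)
    pol_providers = policy.get("providers")
    if not isinstance(pol_providers, dict):
        return merged  # best-effort: nothing usable in the policy file

    providers = merged.setdefault("providers", {})
    for prov, pol_node in pol_providers.items():
        if not isinstance(pol_node, dict):
            continue  # tolerant merge: skip non-dicts
        # Layout 1: providers.<PROV>.manual.<feature>
        manual_src = pol_node.get("manual")
        if not isinstance(manual_src, dict):
            # Layout 2: providers.<PROV>.<feature>
            manual_src = {k: v for k, v in pol_node.items() if k != "manual"}

        manual_dst = providers.setdefault(prov, {}).setdefault("manual", {})
        for feature, pol_feat in manual_src.items():
            if not isinstance(pol_feat, dict):
                continue
            dst = manual_dst.setdefault(feature, {})

            # ASSUMPTION: blocks are unioned, keeping existing order.
            blocks = dst.setdefault("blocks", [])
            for token in pol_feat.get("blocks") or []:
                if token not in blocks:
                    blocks.append(token)

            # First-wins: only add policy items that are missing.
            dst_items = dst.setdefault("adds", {}).setdefault("items", {})
            pol_adds = pol_feat.get("adds")
            pol_items = pol_adds.get("items") if isinstance(pol_adds, dict) else None
            if isinstance(pol_items, dict):
                for key, item in pol_items.items():
                    dst_items.setdefault(key, copy.deepcopy(item))
    return merged


state = {"providers": {"PLEX": {"manual": {"watchlist": {
    "adds": {"items": {"imdb:tt0111161": {"title": "kept"}}}}}}}}
policy = {"providers": {"PLEX": {"manual": {"watchlist": {
    "blocks": ["tmdb:123"],
    "adds": {"items": {"imdb:tt0111161": {"title": "ignored"},
                       "tmdb:550": {"title": "new"}}}}}}}}
result = merge_policy(state, policy)
print(result["providers"]["PLEX"]["manual"]["watchlist"]["adds"]["items"])
```

The existing `imdb:tt0111161` entry survives untouched, and only `tmdb:550` is added from the policy.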

***

### `metrics` schema

`metrics` is optional. If present, it currently contains:

```json
"metrics": {
  "api": {
    "last": {
      "ts": 1738100000,
      "total": 1234,
      "providers": {
        "SIMKL": {
          "total": 300,
          "by_endpoint": { ... },
          "by_feature": { ... },
          "by_method": { ... },
          "by_status": { ... },
          "latency_ms_avg": 123,
          "latency_ms_samples": 300
        }
      }
    }
  }
}
```

Written by:

* `cw_platform/orchestrator/_pairs_metrics.py::persist_api_totals(...)`

This is “nice to have”; nothing in planning relies on it.
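
Because `metrics` is optional, any reader should tolerate missing nodes. A small sketch of that, with `api_totals` as a hypothetical helper name; it relies only on the `metrics.api.last.providers.<PROV>.total` path shown above.

```python
def api_totals(state: dict) -> dict:
    """Return {PROVIDER: total API calls} from metrics.api.last, or {} if absent."""
    node = state
    for key in ("metrics", "api", "last", "providers"):
        # Walk down one level at a time; bail out quietly on anything unexpected.
        node = node.get(key) if isinstance(node, dict) else None
    if not isinstance(node, dict):
        return {}
    return {
        name: stats.get("total", 0)
        for name, stats in node.items()
        if isinstance(stats, dict)
    }


sample = {
    "metrics": {
        "api": {
            "last": {
                "ts": 1738100000,
                "total": 1234,
                "providers": {"SIMKL": {"total": 300, "latency_ms_avg": 123}},
            }
        }
    }
}
print(api_totals(sample))
print(api_totals({}))
```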

***

### Wall semantics (watchlist wall)

At the end of `Orchestrator.run()`, the facade writes a **watchlist wall**:

* It collects `providers[*].watchlist.baseline.items[*]` across providers
* Minimalizes each item
* De-duplicates by canonical key
* Writes it into `state["wall"]` as a list

So `wall` becomes:

* “the union of all watchlist baseline items across providers”

This is mostly for UI convenience.
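
The steps above can be sketched as follows. This is an illustration, not the facade’s code: `build_watchlist_wall` is a hypothetical name, the `minimalize` parameter stands in for `id_map.minimal` (here it defaults to a shallow copy), and keeping the first provider’s copy of a duplicate key is an assumption.

```python
def _as_dict(value) -> dict:
    """Return value if it is a dict, else an empty dict (tolerant traversal)."""
    return value if isinstance(value, dict) else {}


def build_watchlist_wall(state: dict, minimalize=dict) -> list:
    """Union of all watchlist baseline items across providers, as a list."""
    seen = {}
    for node in _as_dict(state.get("providers")).values():
        watchlist = _as_dict(_as_dict(node).get("watchlist"))
        items = _as_dict(_as_dict(watchlist.get("baseline")).get("items"))
        for key, item in items.items():
            if key in seen or not isinstance(item, dict):
                continue  # de-duplicate by canonical key; first copy wins (assumption)
            seen[key] = minimalize(item)
    return list(seen.values())


state = {
    "providers": {
        "PLEX": {"watchlist": {"baseline": {"items": {
            "imdb:tt0111161": {"type": "movie", "title": "The Shawshank Redemption"},
        }}}},
        "SIMKL": {"watchlist": {"baseline": {"items": {
            "imdb:tt0111161": {"type": "movie", "title": "The Shawshank Redemption"},
            "tmdb:550": {"type": "movie", "title": "Fight Club"},
        }}}},
    }
}
wall = build_watchlist_wall(state)
print([item["title"] for item in wall])
```

Only `watchlist` baselines feed the wall; `ratings`, `history`, and `playlists` nodes are ignored by this sketch, matching the description above.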

***

### Migrations and compatibility

There is no explicit `schema_version` for `state.json` today.

Instead, compatibility is handled by:

* defaulting missing keys (`providers`, `wall`, `last_sync_epoch`)
* tolerant merges (skip non-dicts, keep what exists)
* policy merge always being best-effort
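
A minimal sketch of the “default missing keys” idea, assuming the three keys listed above and the default values `{}`, `[]`, and `None` (the exact defaults used by the real loader are not stated on this page). `normalize_state` is a hypothetical name.

```python
import copy

# ASSUMPTION: these defaults mirror the documented types of each key.
DEFAULTS = {"providers": {}, "wall": [], "last_sync_epoch": None}


def normalize_state(raw) -> dict:
    """Return a state dict with missing top-level keys filled in; keep what exists."""
    state = dict(raw) if isinstance(raw, dict) else {}
    for key, default in DEFAULTS.items():
        if key not in state:
            # Deep-copy so callers never share the module-level default objects.
            state[key] = copy.deepcopy(default)
    return state


print(normalize_state({"providers": {"PLEX": {}}}))
```

Existing keys (including optional ones like `metrics`) pass through unchanged; only absent keys are filled.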

#### What counts as a “migration” here

1. **Policy merge injection**

* `state.manual.json` is merged into loaded state and persisted into `state.json` on save.

2. **Wall overwrite behavior**

* `run_pairs()` briefly writes a dict-shaped `wall`; `Orchestrator.run()` then overwrites it with the list-form wall.

3. **Baseline content changes**

* Because baselines store `minimal(item)`, if you change `id_map.minimal()` output, you’ve effectively “migrated” baseline shape.
* Code is tolerant because consumers treat baseline items as opaque mappings.

If you want real migrations (field-by-field schema guarantees), you’ll want to add:

* `state["schema"] = <int>`
* a loader migration step per schema
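
One possible shape for that, as a sketch only: nothing below exists in CrossWatch today. `CURRENT_SCHEMA`, the two example steps, and the choice to reject files newer than the code are all hypothetical. A file without `schema` is treated as version 0.

```python
CURRENT_SCHEMA = 2  # hypothetical target version


def _v0_to_v1(state: dict) -> dict:
    # Example step: make sure the documented top-level keys exist.
    state.setdefault("providers", {})
    state.setdefault("wall", [])
    state.setdefault("last_sync_epoch", None)
    return state


def _v1_to_v2(state: dict) -> dict:
    # Example step: guarantee the steady-state list form of `wall`.
    if not isinstance(state.get("wall"), list):
        state["wall"] = []
    return state


# One migration function per source version.
MIGRATIONS = {0: _v0_to_v1, 1: _v1_to_v2}


def migrate(state: dict) -> dict:
    """Apply migration steps in order until state["schema"] == CURRENT_SCHEMA."""
    version = state.get("schema", 0)
    if not isinstance(version, int) or version < 0:
        version = 0
    if version > CURRENT_SCHEMA:
        raise ValueError(f"state schema {version} is newer than {CURRENT_SCHEMA}")
    while version < CURRENT_SCHEMA:
        state = MIGRATIONS[version](state)
        version += 1
        state["schema"] = version
    return state


migrated = migrate({"wall": {"stats": 1}})
print(migrated)
```

Running each step in sequence keeps every migration small and testable, and re-running `migrate` on an already current file is a no-op.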

***

### Related files

* `/config/state.manual.json` (policy overlay)
* `/config/last_sync.json` (run summary)
* `/config/.cw_state/*` (guardrail internals; not part of state.json)
  {% endtab %}
  {% endtabs %}


