State.json

On-disk schema and semantics of /config/state.json (baselines, checkpoints, manual policy, metrics).

state.json is CrossWatch’s local “memory”.

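It records what CrossWatch saw on previous runs. Later runs compare against it, which lets them avoid unsafe actions such as deleting items because of a broken snapshot.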

What it’s used for

remembering baselines (prevents unsafe deletes)
tracking freshness markers (helps detect stale snapshots)
powering parts of the UI (run summaries, some rollups)

Where it lives

/config/state.json

If you run in Docker, back up your mounted /config folder.

Should I edit it?

Usually no.

Edit it only if you’re debugging and you know the consequences.

When to clear it

Clear state if you:

changed pair direction/mode a lot
changed provider auth and the planner stays “stuck”
see obvious planning loops that persist across runs

Clearing state forces a rebuild next run. It does not directly edit providers.

File locations and other state files: State
Safety model: Guardrails
Maintenance UI actions: Maintenance

This is the orchestrator’s canonical on-disk state. It’s used for:

cross-run baselines (what we saw last time)
checkpoints (what changed since last time)
manual policy overlay (blocks/adds)
lightweight metrics

Implementation notes

File: /config/state.json
Loader/writer: cw_platform/orchestrator/_state_store.py
Baseline writers: one-way/two-way runners and facade helpers.

Top-level shape

state.json is a JSON object (dict). Keys you’ll see:

{
  "providers": { ... },
  "wall": [...],
  "last_sync_epoch": 1738100000,
  "metrics": { ... }
}
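For readers working in Python, the same top-level shape can be written as a TypedDict. This is an illustrative sketch and not a type that ships with CrossWatch. The name StateFile is hypothetical. total=False reflects that keys may be missing; they are defaulted on load.

```python
from typing import Any, Optional, TypedDict


class StateFile(TypedDict, total=False):
    """Hypothetical typing of state.json's top level (illustration only)."""

    providers: dict[str, dict[str, Any]]  # "PLEX" -> provider_state
    wall: list[dict[str, Any]]  # steady state; briefly a dict during run_pairs()
    last_sync_epoch: Optional[int]  # Unix seconds, or None
    metrics: dict[str, Any]  # optional API totals


example: StateFile = {
    "providers": {"PLEX": {}},
    "wall": [],
    "last_sync_epoch": 1738100000,
}
```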

1) `providers` (dict)

Map: PROVIDER_NAME -> provider_state

Provider names are uppercased: PLEX, SIMKL, TRAKT, JELLYFIN, EMBY, ANILIST, etc.

2) `wall` (list)

At the end of a normal run, wall holds a list of minimal watchlist items aggregated across providers (see “Wall semantics”).

⚠️ Note: during run_pairs() the code briefly writes wall as a dict (stats overview), but Orchestrator.run() overwrites it with the watchlist wall list right after. So on disk, expect a list in steady state.
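Code that reads state.json outside a normal run may want to tolerate the transient dict form. One example is a script that inspects the file after a run was interrupted. Below is a minimal sketch; read_wall is a hypothetical helper and is not part of CrossWatch.

```python
from typing import Any


def read_wall(state: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the watchlist wall as a list (hypothetical helper).

    A dict-shaped wall (the transient stats overview written by run_pairs())
    or a missing key yields an empty list instead of an error.
    """
    wall = state.get("wall")
    if isinstance(wall, list):
        # Keep only dict entries; anything else is not a minimal item.
        return [item for item in wall if isinstance(item, dict)]
    return []
```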

3) `last_sync_epoch` (int | null)

Unix epoch seconds of the most recent time a run persisted state.

4) `metrics` (dict, optional)

Currently used for API totals:

metrics.api.last

`providers` schema

Each provider has a node like:

"PLEX": {
  "watchlist": { ...feature_state... },
  "ratings":  { ...feature_state... },
  "history":  { ...feature_state... },
  "playlists":{ ...feature_state... },

  "manual": { ...manual_overlay... }
}

Not every provider has every feature node; missing features simply mean “no baseline persisted”.

`feature_state`

For each feature, the orchestrator persists:

{
  "baseline": {
    "items": {
      "imdb:tt0111161": { ...minimal_item... },
      "tmdb:550": { ...minimal_item... }
    }
  },
  "checkpoint": "2026-01-28T21:00:00Z"
}
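A missing feature node just means “no baseline persisted”, so readers should treat absent nodes as empty rather than as errors. Below is a small sketch of defensive accessors. The helper names are hypothetical. Uppercasing the provider argument simply follows the naming rule for `providers`.

```python
from typing import Any


def feature_node(state: dict[str, Any], provider: str, feature: str) -> dict[str, Any]:
    """providers.<PROVIDER>.<feature>, or {} when nothing was persisted."""
    providers = state.get("providers")
    prov = providers.get(provider.upper()) if isinstance(providers, dict) else None
    node = prov.get(feature) if isinstance(prov, dict) else None
    return node if isinstance(node, dict) else {}


def baseline_items(state: dict[str, Any], provider: str, feature: str) -> dict[str, Any]:
    """Canonical key -> minimal item; empty if no baseline exists."""
    baseline = feature_node(state, provider, feature).get("baseline")
    items = baseline.get("items") if isinstance(baseline, dict) else None
    return items if isinstance(items, dict) else {}


def feature_checkpoint(state: dict[str, Any], provider: str, feature: str) -> Any:
    """The opaque checkpoint value, or None if absent."""
    return feature_node(state, provider, feature).get("checkpoint")
```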

baseline.items (dict)

Map: canonical_key -> minimal_item

Keys are produced by id_map.canonical_key(item) at snapshot time.
Values are “minimalized” via id_map.minimal(item) to keep state small and stable.

Baseline persistence filters: An item is skipped (not written into baseline) if any of these are true:

item["_cw_persist"] == false
item["_cw_transient"] == true
item["_cw_skip_persist"] == true

An item is also skipped if its provider subobject marks it as ignored:

item[provider_key]["ignored"] == true

Here, provider_key is the provider name normalized to its subobject key (e.g. PLEX -> "plex", JELLYFIN -> "jellyfin").
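The filters above can be expressed as a single predicate. This sketch makes two assumptions:

- provider_key is the lowercased provider name. That matches the PLEX and JELLYFIN examples but is not guaranteed for every provider.
- Comparisons are strict boolean checks. The real writer may test truthiness instead.

```python
from typing import Any

Item = dict[str, Any]


def should_persist(item: Item, provider: str) -> bool:
    """Return False if any baseline persistence filter applies (sketch)."""
    # Strict comparisons mirror the `== false` / `== true` wording; an assumption.
    if item.get("_cw_persist") is False:
        return False
    if item.get("_cw_transient") is True or item.get("_cw_skip_persist") is True:
        return False
    # Assumed: provider_key is the lowercased provider name (PLEX -> "plex").
    sub = item.get(provider.lower())
    if isinstance(sub, dict) and sub.get("ignored") is True:
        return False
    return True


def filter_for_baseline(items: dict[str, Item], provider: str) -> dict[str, Item]:
    """Keep only items that pass the filters, preserving their canonical keys."""
    return {key: item for key, item in items.items() if should_persist(item, provider)}
```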

checkpoint (string | null)

A per-provider, per-feature marker representing “how fresh the snapshot was”.

Derived by snapshots.module_checkpoint(...) from provider activities() or other provider hints.
Stored by one-way/two-way runners when a checkpoint value is available.
Used by drop-guard logic (suspect snapshot detection) to decide whether an empty/shrunken snapshot is likely “real” or “stale/broken”.

Checkpoint is opaque to the orchestrator: it’s “whatever the provider returns”.
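To make the drop-guard idea concrete, here is a hypothetical heuristic. It is not CrossWatch's actual drop-guard. Checkpoints are treated as opaque values that are only compared for equality. The 50% shrink threshold and the cautious handling of missing checkpoints are both invented for illustration.

```python
from typing import Any, Optional


def looks_suspect(
    baseline_count: int,
    snapshot_count: int,
    old_checkpoint: Optional[Any],
    new_checkpoint: Optional[Any],
    shrink_ratio: float = 0.5,  # hypothetical threshold
) -> bool:
    """Flag a snapshot that shrank sharply while the provider reported no activity."""
    if baseline_count == 0:
        return False  # nothing persisted, so nothing can be wrongly deleted
    shrunk = snapshot_count == 0 or snapshot_count < baseline_count * shrink_ratio
    if not shrunk:
        return False
    if old_checkpoint is None or new_checkpoint is None:
        return True  # no freshness signal: err on the side of caution
    # Unchanged checkpoint + big drop: more likely a stale/broken read than real deletes.
    return new_checkpoint == old_checkpoint
```

An empty snapshot alone is ambiguous. Pairing it with an unchanged checkpoint is what makes it look like a stale read rather than real deletions.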

Minimal item schema (what ends up in baselines)

There is no strict schema enforced, but stable items normally have:

{
  "type": "movie|show|season|episode",
  "title": "string",
  "year": 1999,
  "ids": {
    "imdb": "tt0137523",
    "tmdb": "550",
    "tvdb": "12345",
    "...": "..."
  },

  "plex": { ...optional provider payload... },
  "simkl": { ... },
  "trakt": { ... },
  "jellyfin": { ... }
}

The orchestrator mostly cares about:

type
ids (because canonical keys and token matching come from IDs)

Everything else is provider-specific.
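To show why IDs matter, here is a sketch of how keys and tokens could be derived from a minimal item. It is not id_map.canonical_key(). The ID priority order (imdb, then tmdb, then tvdb) and the title normalization are assumptions. They were chosen only to reproduce the key formats shown on this page: imdb:tt..., tmdb:550, and movie|title:foo|year:2024.

```python
from typing import Any

# Assumed priority order; the real id_map.canonical_key() may differ.
ID_PRIORITY = ("imdb", "tmdb", "tvdb")


def sketch_canonical_key(item: dict[str, Any]) -> str:
    """Prefer a strong ID; fall back to a type/title/year token."""
    ids = item.get("ids") or {}
    for name in ID_PRIORITY:
        value = ids.get(name)
        if value:
            return f"{name}:{value}"
    kind = str(item.get("type") or "").lower()
    title = str(item.get("title") or "").strip().lower()  # assumed normalization
    year = item.get("year")
    return f"{kind}|title:{title}|year:{'' if year is None else year}"


def sketch_id_tokens(item: dict[str, Any]) -> set[str]:
    """Every non-empty ID as a "name:value" token, for overlap matching."""
    ids = item.get("ids") or {}
    return {f"{name}:{value}" for name, value in ids.items() if value}
```

Two providers may expose different IDs for the same title, so a single canonical key is not always enough to match them. Comparing ID token sets catches overlaps such as a shared tmdb ID.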

Manual policy overlay (`providers.<PROV>.manual`)

The orchestrator does not generate the “manual” node itself. It is injected from:

/config/state.manual.json (policy file)

at load time, and then persisted back into state.json on save.

Policy merge rules

Loader:

StateStore.load_state() reads state.json + reads state.manual.json
merges policy into state.providers.<PROV>.manual.<feature>

Supported policy locations (both work):

Recommended:

{
  "providers": {
    "PLEX": {
      "manual": {
        "watchlist": { ... },
        "ratings": { ... }
      }
    }
  }
}

Legacy convenience form (features directly under the provider, without the manual wrapper):

{
  "providers": {
    "PLEX": {
      "watchlist": { ... },
      "ratings": { ... }
    }
  }
}
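A loader that accepts both layouts has to normalize them to one feature map before merging. Below is a minimal sketch with two assumptions:

- When both layouts define the same feature, the manual wrapper takes precedence.
- The legacy form is limited to the four feature names listed under `providers` schema.

```python
from typing import Any

# Assumed: only these keys are treated as features in the legacy layout.
FEATURES = ("watchlist", "ratings", "history", "playlists")


def policy_features(provider_policy: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return {feature: manual_node} from either supported policy layout."""
    out: dict[str, dict[str, Any]] = {}
    manual = provider_policy.get("manual")
    if isinstance(manual, dict):  # recommended layout
        for feature, node in manual.items():
            if isinstance(node, dict):
                out[feature] = node
    for feature in FEATURES:  # legacy layout fills in anything still missing
        node = provider_policy.get(feature)
        if isinstance(node, dict) and feature not in out:
            out[feature] = node
    return out
```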

Manual feature schema

For each feature:

"manual": {
  "watchlist": {
    "blocks": ["imdb:tt...", "tmdb:123", "movie|title:foo|year:2024"],
    "adds": {
      "items": {
        "imdb:tt0111161": { ...minimal_item... }
      }
    }
  }
}

blocks is a list of tokens that the orchestrator treats like tombstones. A token can be:
- a canonical key, an ID token, or a normalized title/year token
adds.items is a dict keyed by canonical key. Merging is “first-wins”:
- a policy item is added only if its key is missing from the current state node; existing entries are never overwritten.
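Both rules can be sketched in a few lines. merge_adds applies the first-wins merge into a providers.<PROV>.manual.<feature> node. is_blocked checks an item's tokens against blocks. Both functions are hypothetical illustrations, and how the real code builds an item's token set is not shown here.

```python
from collections.abc import Iterable
from typing import Any


def merge_adds(state_node: dict[str, Any], policy_node: dict[str, Any]) -> None:
    """First-wins merge of policy adds.items into a state manual/<feature> node."""
    policy_items = (policy_node.get("adds") or {}).get("items") or {}
    items = state_node.setdefault("adds", {}).setdefault("items", {})
    for key, item in policy_items.items():
        items.setdefault(key, item)  # existing entries are never overwritten


def is_blocked(item_tokens: set[str], blocks: Iterable[str]) -> bool:
    """True if any of the item's tokens (canonical key, ID, title/year) is blocked."""
    return not item_tokens.isdisjoint(blocks)
```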

`metrics` schema

metrics is optional. If present, it currently contains:

"metrics": {
  "api": {
    "last": {
      "ts": 1738100000,
      "total": 1234,
      "providers": {
        "SIMKL": {
          "total": 300,
          "by_endpoint": { ... },
          "by_feature": { ... },
          "by_method": { ... },
          "by_status": { ... },
          "latency_ms_avg": 123,
          "latency_ms_samples": 300
        }
      }
    }
  }
}

Written by:

cw_platform/orchestrator/_pairs_metrics.py::persist_api_totals(...)

This is “nice to have”; nothing in planning relies on it.
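Because metrics is optional, any reader should walk it defensively. Below is a sketch that summarizes the last run per provider. api_summary is a hypothetical helper.

```python
from typing import Any


def api_summary(state: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {provider: (total_calls, latency_ms_avg)} from metrics.api.last.

    Every level may be missing, so each lookup falls back to an empty dict.
    """
    metrics = state.get("metrics") or {}
    last = (metrics.get("api") or {}).get("last") or {}
    providers = last.get("providers") or {}
    return {
        name: (node.get("total", 0), node.get("latency_ms_avg"))
        for name, node in providers.items()
        if isinstance(node, dict)
    }
```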

Wall semantics (watchlist wall)

At the end of Orchestrator.run(), the facade writes a watchlist wall:

It collects providers[*].watchlist.baseline.items[*] across providers
Minimalizes each item
De-duplicates by canonical key
Writes it into state["wall"] as a list

So wall becomes:

“the union of all watchlist baseline items across providers”

This is mostly for UI convenience.
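The wall steps above can be sketched as follows. baseline.items is already keyed by canonical key, so the sketch reuses those keys for de-duplication. The minimal parameter stands in for id_map.minimal(). On duplicate keys, the first provider seen wins; that rule is an assumption.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]


def build_wall(state: dict[str, Any], minimal: Optional[Callable[[Item], Item]] = None) -> list[Item]:
    """Union of every provider's watchlist baseline, de-duplicated by canonical key."""
    # Stand-in for id_map.minimal(); defaults to a shallow copy.
    minimize = minimal if minimal is not None else dict
    seen: dict[str, Item] = {}
    for prov_node in (state.get("providers") or {}).values():
        if not isinstance(prov_node, dict):
            continue
        watchlist = prov_node.get("watchlist") or {}
        items = (watchlist.get("baseline") or {}).get("items") or {}
        for key, item in items.items():
            if key not in seen and isinstance(item, dict):  # first provider wins
                seen[key] = minimize(item)
    return list(seen.values())
```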

Migrations and compatibility

There is no explicit schema_version for state.json today.

Instead, compatibility is handled by:

defaulting missing keys (providers, wall, last_sync_epoch)
tolerant merges (skip non-dicts, keep what exists)
policy merge always being best-effort
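Below is a minimal sketch of the defaulting behavior. It was written for illustration and is not taken from StateStore.load_state(). One deliberate choice: a missing file yields empty state, but a corrupt file raises. Silently treating a corrupt file as empty would discard every baseline, with the same effect as clearing state.

```python
import json
from pathlib import Path
from typing import Any


def normalize_state(data: Any) -> dict[str, Any]:
    """Default missing top-level keys and skip non-dict provider nodes (sketch)."""
    state = dict(data) if isinstance(data, dict) else {}
    providers = state.get("providers")
    if not isinstance(providers, dict):
        providers = {}
    # Tolerant merge: keep what exists, drop provider nodes that are not dicts.
    state["providers"] = {k: v for k, v in providers.items() if isinstance(v, dict)}
    state.setdefault("wall", [])
    state.setdefault("last_sync_epoch", None)
    return state


def load_state(path: Path) -> dict[str, Any]:
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return normalize_state({})  # first run: nothing persisted yet
    # json.JSONDecodeError propagates on purpose so baselines are not silently lost.
    return normalize_state(json.loads(raw))
```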

What counts as a “migration” here

Policy merge injection

state.manual.json is merged into loaded state and persisted into state.json on save.

Wall overwrite behavior

run_pairs() briefly writes a dict-shaped wall; Orchestrator.run() then overwrites it with the list wall.

Baseline content changes

Because baselines store minimal(item), if you change id_map.minimal() output, you’ve effectively “migrated” baseline shape.
Code is tolerant because consumers treat baseline items as opaque mappings.

If you want real migrations (field-by-field schema guarantees), you’ll want to add:

state["schema"] = <int>
a loader migration step per schema
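One way to add that is sketched below. Everything in it is hypothetical: the schema key, the version numbers, and both migration steps. Files written today carry no schema key and are treated as version 0.

```python
from typing import Any, Callable

State = dict[str, Any]

CURRENT_SCHEMA = 2  # hypothetical target version


def _v0_to_v1(state: State) -> State:
    # Hypothetical step: guarantee the providers map exists.
    state.setdefault("providers", {})
    return state


def _v1_to_v2(state: State) -> State:
    # Hypothetical step: replace a dict-shaped (transient) wall with an empty list.
    if not isinstance(state.get("wall"), list):
        state["wall"] = []
    return state


# Maps "from version" -> the step that upgrades to from_version + 1.
MIGRATIONS: dict[int, Callable[[State], State]] = {0: _v0_to_v1, 1: _v1_to_v2}


def migrate(state: State) -> State:
    version = int(state.get("schema", 0))  # no key => pre-versioning file
    if version > CURRENT_SCHEMA:
        raise ValueError(f"state schema {version} is newer than supported {CURRENT_SCHEMA}")
    while version < CURRENT_SCHEMA:
        state = MIGRATIONS[version](state)
        version += 1
        state["schema"] = version  # record progress after each step
    return state
```

Running one step per version keeps each migration small and testable. Refusing newer schemas stops an older build from silently rewriting a file it does not understand.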

Related files

/config/state.manual.json (policy overlay)
/config/last_sync.json (run summary)
/config/.cw_state/* (guardrail internals; not part of state.json)
