State.json
On-disk schema and semantics of /config/state.json (baselines, checkpoints, manual policy, metrics).
state.json is CrossWatch’s local “memory”.
It stores what CrossWatch last saw, so future runs stay safe.
What it’s used for
remembering baselines (prevents unsafe deletes)
tracking freshness markers (helps detect stale snapshots)
powering parts of the UI (run summaries, some rollups)
Where it lives
/config/state.json
If you run in Docker, back up your mounted /config folder.
Should I edit it?
Usually no.
Edit it only if you’re debugging and you know the consequences.
When to clear it
Clear state if you:
changed pair direction/mode a lot
changed provider auth and the planner stays “stuck”
see obvious planning loops that persist across runs
Clearing state forces a rebuild next run. It does not directly edit providers.
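If you clear state by hand while debugging, keeping a copy first makes the step reversible. The sketch below is only an illustration: the function name and the backup file naming are assumptions, not part of CrossWatch, and it assumes CrossWatch is not running while the file is moved.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_and_clear_state(config_dir: str) -> Optional[Path]:
    """Copy state.json to a timestamped backup, then delete the original.

    Returns the backup path, or None if there was no state file to clear.
    Hypothetical helper; the backup naming scheme is an assumption.
    """
    state_path = Path(config_dir) / "state.json"
    if not state_path.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = state_path.with_name(f"state.{stamp}.bak.json")
    shutil.copy2(state_path, backup_path)  # keep a restorable copy first
    state_path.unlink()                    # next run rebuilds from scratch
    return backup_path
```

Calling `backup_and_clear_state("/config")` would leave the backup next to the original location, so restoring is a plain file copy.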
Related:
File locations and other state files: State
Safety model: Guardrails
Maintenance UI actions: Maintenance
This is the orchestrator’s canonical on-disk state. It’s used for:
cross-run baselines (what we saw last time)
checkpoints (what changed since last time)
manual policy overlay (blocks/adds)
lightweight metrics
Implementation notes
File: /config/state.json
Loader/writer: cw_platform/orchestrator/_state_store.py
Baseline writers: one-way/two-way runners and facade helpers.
Top-level shape
state.json is a JSON object (dict). Keys you’ll see:
1) providers (dict)
Map: PROVIDER_NAME -> provider_state
Provider names are uppercased: PLEX, SIMKL, TRAKT, JELLYFIN, EMBY, ANILIST, etc.
2) wall (list)
Final value at the end of a normal run is a list of minimal watchlist items, aggregated across providers (see “Wall semantics”).
⚠️ Note: during run_pairs() the code briefly writes wall as a dict (stats overview), but Orchestrator.run() overwrites it with the watchlist wall list right after. So on disk, expect a list in steady-state.
3) last_sync_epoch (int | null)
Unix epoch seconds of the most recent run persistence.
4) metrics (dict, optional)
Currently used for API totals:
metrics.api.last
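The four keys above can be checked with a few type tests. This is a minimal sketch for inspecting a loaded file while debugging; `check_top_level` is a hypothetical helper, not something CrossWatch ships, and missing keys are treated as acceptable because the loader defaults them.

```python
from typing import Any


def check_top_level(state: Any) -> list[str]:
    """Return human-readable problems; an empty list means the shape matches."""
    if not isinstance(state, dict):
        return ["state is not a JSON object"]
    problems = []
    if not isinstance(state.get("providers", {}), dict):
        problems.append("providers should be a dict")
    if not isinstance(state.get("wall", []), list):
        problems.append("wall should be a list in steady state")
    epoch = state.get("last_sync_epoch")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if epoch is not None and (isinstance(epoch, bool) or not isinstance(epoch, int)):
        problems.append("last_sync_epoch should be an int or null")
    if "metrics" in state and not isinstance(state["metrics"], dict):
        problems.append("metrics, when present, should be a dict")
    return problems
```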
providers schema
Each provider has a node like:
Not every provider has every feature node; missing features simply mean “no baseline persisted”.
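To see which feature nodes a given state file actually contains, you can walk the providers map. The sketch below assumes feature nodes sit directly under each provider (as in `providers[*].watchlist.baseline.items`) and that `manual` is the only non-feature key; `iter_feature_nodes` is a hypothetical name.

```python
from typing import Any, Iterator


def iter_feature_nodes(state: dict[str, Any]) -> Iterator[tuple[str, str, int, Any]]:
    """Yield (provider, feature, baseline_item_count, checkpoint) per feature node."""
    providers = state.get("providers")
    if not isinstance(providers, dict):
        return
    for prov_name, prov_state in providers.items():
        if not isinstance(prov_state, dict):
            continue
        for feature, node in prov_state.items():
            # "manual" is the policy overlay, not a feature node.
            if feature == "manual" or not isinstance(node, dict):
                continue
            baseline = node.get("baseline")
            items = baseline.get("items") if isinstance(baseline, dict) else None
            count = len(items) if isinstance(items, dict) else 0
            yield prov_name, feature, count, node.get("checkpoint")
```

A feature that never appears in the output has no persisted baseline for that provider.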
feature_state
For each feature, the orchestrator persists:
baseline.items (dict)
Map: canonical_key -> minimal_item
Keys are produced by id_map.canonical_key(item) at snapshot time.
Values are “minimalized” via id_map.minimal(item) to keep state small and stable.
Baseline persistence filters: An item is skipped (not written into baseline) if any of these are true:
item["_cw_persist"] == false
item["_cw_transient"] == true
item["_cw_skip_persist"] == true
Also skipped if provider subobject marks it ignored:
item[provider_key]["ignored"] == true
Where provider_key is a normalized mapping (e.g. PLEX -> "plex", JELLYFIN -> "jellyfin").
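The filter rules above amount to a single predicate. This is a sketch under stated assumptions, not the orchestrator's code: `should_persist` is a hypothetical name, the flags are compared strictly against booleans, and the provider key is assumed to be the lowercased provider name (the page only gives PLEX -> "plex" and JELLYFIN -> "jellyfin" as examples).

```python
from typing import Any, Mapping


def should_persist(item: Mapping[str, Any], provider_name: str) -> bool:
    """Return False when any of the documented skip conditions applies."""
    if item.get("_cw_persist") is False:
        return False
    if item.get("_cw_transient") is True:
        return False
    if item.get("_cw_skip_persist") is True:
        return False
    # Assumption: provider names normalize by lowercasing (PLEX -> "plex").
    provider_key = provider_name.lower()
    sub = item.get(provider_key)
    if isinstance(sub, Mapping) and sub.get("ignored") is True:
        return False
    return True
```

An item with none of these markers is written to the baseline as usual.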
checkpoint (string | null)
A per-provider, per-feature marker representing “how fresh the snapshot was”.
Derived by snapshots.module_checkpoint(...) from provider activities() or other provider hints.
Stored by one-way/two-way runners when a checkpoint value is available.
Used by drop-guard logic (suspect snapshot detection) to decide whether an empty/shrunken snapshot is likely “real” or “stale/broken”.
Checkpoint is opaque to the orchestrator: it’s “whatever the provider returns”.
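This page does not spell out the drop-guard rule, so the following is only one plausible heuristic, shown to illustrate how an opaque checkpoint can still be useful: it is compared for equality, never parsed. The function name, the `shrink_ratio` parameter, and its default are all assumptions.

```python
from typing import Optional


def snapshot_looks_suspect(
    baseline_count: int,
    snapshot_count: int,
    prev_checkpoint: Optional[str],
    new_checkpoint: Optional[str],
    shrink_ratio: float = 0.5,
) -> bool:
    """Hypothetical heuristic: distrust a shrunken snapshot if the checkpoint did not move."""
    if baseline_count == 0:
        return False  # nothing was known before, so nothing can be lost
    shrunk = snapshot_count < baseline_count * shrink_ratio
    if not shrunk:
        return False
    # A checkpoint that changed suggests the provider really has new data.
    advanced = new_checkpoint is not None and new_checkpoint != prev_checkpoint
    return not advanced
```

With this rule an empty snapshot paired with an unchanged (or missing) checkpoint is treated as stale, while the same snapshot with a new checkpoint is accepted as real.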
Minimal item schema (what ends up in baselines)
There is no strict schema enforced, but stable items normally have:
The orchestrator mostly cares about:
type
ids (because canonical keys and token matching come from IDs)
Everything else is provider-specific.
Manual policy overlay (providers.<PROV>.manual)
The “manual” node is not written by the orchestrator. It’s injected from /config/state.manual.json (policy file) at load time (and then persisted back into state.json on save).
Policy merge rules
Loader: StateStore.load_state()
reads state.json + reads state.manual.json
merges policy into state.providers.<PROV>.manual.<feature>
Supported policy locations (both work):
Recommended:
Legacy-ish convenience:
Manual feature schema
For each feature:
blocks is a list of tokens; the orchestrator treats them like tombstone-style tokens: canonical key, ID tokens, or normalized title/year tokens
adds.items is a dict keyed by canonical key. Merge behavior is “first-wins”: policy items are added only if missing in the current state node (no overwrite).
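The “first-wins” rule for adds.items is easy to get backwards, so here is a minimal sketch of merging one feature's policy into the existing manual node. `merge_manual_feature` is a hypothetical name, and appending only block tokens that are not already present is an assumption; this page specifies the merge rule for adds.items only.

```python
from typing import Any


def merge_manual_feature(node: dict[str, Any], policy: dict[str, Any]) -> dict[str, Any]:
    """Merge one feature's policy into the existing manual node, in place."""
    blocks = node.get("blocks")
    if not isinstance(blocks, list):
        blocks = node["blocks"] = []
    policy_blocks = policy.get("blocks")
    if isinstance(policy_blocks, list):
        for token in policy_blocks:
            if token not in blocks:  # assumption: block tokens are de-duplicated
                blocks.append(token)

    adds = node.get("adds")
    if not isinstance(adds, dict):
        adds = node["adds"] = {}
    items = adds.get("items")
    if not isinstance(items, dict):
        items = adds["items"] = {}

    policy_adds = policy.get("adds")
    policy_items = policy_adds.get("items") if isinstance(policy_adds, dict) else None
    if isinstance(policy_items, dict):
        for key, item in policy_items.items():
            items.setdefault(key, item)  # first-wins: never overwrite existing state
    return node
```

Because existing entries win, editing an item in the policy file has no effect once that key is already present in state.json.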
metrics schema
metrics is optional. If present, it currently contains:
Written by:
cw_platform/orchestrator/_pairs_metrics.py::persist_api_totals(...)
This is “nice to have”; nothing in planning relies on it.
Wall semantics (watchlist wall)
At the end of Orchestrator.run(), the facade writes a watchlist wall:
It collects providers[*].watchlist.baseline.items[*] across providers
Minimalizes each item
De-duplicates by canonical key
Writes it into state["wall"] as a list
So wall becomes:
“the union of all watchlist baseline items across providers”
This is mostly for UI convenience.
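The wall computation can be sketched in a few lines. This is an illustration rather than the facade's code: `build_watchlist_wall` is a hypothetical name, the `minimal` parameter stands in for id_map.minimal (defaulting to a shallow copy), and keeping the first item seen for a duplicate key is an assumption.

```python
from typing import Any, Callable


def build_watchlist_wall(
    state: dict[str, Any],
    minimal: Callable[[dict], dict] = dict,
) -> list[dict]:
    """Union of all watchlist baseline items, de-duplicated by canonical key."""
    seen: dict[str, dict] = {}
    providers = state.get("providers")
    if not isinstance(providers, dict):
        return []
    for prov_state in providers.values():
        # Walk providers.<PROV>.watchlist.baseline.items, tolerating gaps.
        node = prov_state
        for part in ("watchlist", "baseline", "items"):
            node = node.get(part) if isinstance(node, dict) else None
        if not isinstance(node, dict):
            continue
        for key, item in node.items():
            # Baseline keys are already canonical keys; first occurrence wins.
            if key not in seen and isinstance(item, dict):
                seen[key] = minimal(item)
    return list(seen.values())
```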
Migrations and compatibility
There is no explicit schema_version for state.json today.
Instead, compatibility is handled by:
defaulting missing keys (providers, wall, last_sync_epoch)
tolerant merges (skip non-dicts, keep what exists)
policy merge always being best-effort
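A loader in this tolerant style might look like the sketch below. It is not StateStore.load_state(): the function name is hypothetical, and falling back to an empty state when the file is missing or unreadable is an assumption.

```python
import json
from pathlib import Path
from typing import Any


def load_state_tolerant(path: str) -> dict[str, Any]:
    """Load state.json, defaulting missing keys instead of failing."""
    try:
        raw = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or invalid JSON
        raw = {}
    state = raw if isinstance(raw, dict) else {}
    # Skip non-dicts: a malformed providers value is replaced, not merged.
    if not isinstance(state.get("providers"), dict):
        state["providers"] = {}
    # Keep what exists: only fill in keys that are absent.
    state.setdefault("wall", [])
    state.setdefault("last_sync_epoch", None)
    return state
```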
What counts as a “migration” here
Policy merge injection
state.manual.json is merged into loaded state and persisted into state.json on save.
Wall overwrite behavior
run_pairs() writes a dict-ish wall briefly; Orchestrator.run() overwrites it with the list wall.
Baseline content changes
Because baselines store minimal(item), if you change id_map.minimal() output, you’ve effectively “migrated” baseline shape.
Code is tolerant because consumers treat baseline items as opaque mappings.
If you want real migrations (field-by-field schema guarantees), you’ll want to add:
state["schema"] = <int>
a loader migration step per schema
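As a rough sketch of what such a step-per-schema loader could look like: none of this exists in CrossWatch today, and `CURRENT_SCHEMA`, `MIGRATIONS`, `migrate`, and the example v0 -> v1 step are all hypothetical. A state file without a "schema" key is treated as version 0.

```python
from typing import Any, Callable

State = dict[str, Any]

CURRENT_SCHEMA = 1  # hypothetical target version


def _v0_to_v1(state: State) -> State:
    """Example step: make the defaulted top-level keys explicit."""
    state.setdefault("providers", {})
    state.setdefault("wall", [])
    state.setdefault("last_sync_epoch", None)
    return state


# One migration function per source version.
MIGRATIONS: dict[int, Callable[[State], State]] = {0: _v0_to_v1}


def migrate(state: State) -> State:
    """Apply each pending migration in order and stamp the resulting version."""
    version = state.get("schema", 0)
    if isinstance(version, bool) or not isinstance(version, int):
        version = 0  # unreadable version marker: treat as unversioned
    while version < CURRENT_SCHEMA:
        state = MIGRATIONS[version](state)
        version += 1
        state["schema"] = version
    return state
```

Each future shape change would add one entry to `MIGRATIONS` and bump `CURRENT_SCHEMA`, so old files are upgraded one step at a time.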
Related files
/config/state.manual.json (policy overlay)
/config/last_sync.json (run summary)
/config/.cw_state/* (guardrail internals; not part of state.json)