# Health

{% tabs %}
{% tab title="End users" %}
Health checks decide what CrossWatch is allowed to do.

If health is bad, CrossWatch skips risky work.

#### What you’ll see

* **auth\_failed**: the pair is skipped.
* **down**: writes to that provider are skipped.
* **feature unsupported**: a feature becomes a no-op.

#### If your run is skipping everything

1. Re-auth the provider.
2. Check server URL and tokens (media servers).
3. Retry the run after fixing auth.

Related:

* Safety model: [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
* Provider expectations: [Provider contract](/blueprint-architecture/orchestrator/provider-contract.md)
  {% endtab %}

{% tab title="Power users" %}
This doc describes how provider health is gathered and how it gates orchestrator behavior.

**Core code:** `cw_platform/orchestrator/_pairs.py` (`_collect_health_for_run`)\
**Provider contract:** `InventoryOps.health(cfg, emit=...)`

***

### Why health exists

Health is the orchestrator’s “don’t do something dumb” preflight.

It prevents:

* running sync with invalid auth (guaranteed failure + noise)
* treating a down provider as “everything removed”
* making delete propagation decisions on missing data

Health is best-effort:

* if a provider doesn’t implement `health()`, the orchestrator assumes it’s OK (less safe).

***

### When health is collected

At the start of `_run_pairs(ctx)`:

1. The orchestrator determines which providers are used by enabled pairs.
2. For each provider, it calls `ops.health(cfg, emit=ctx.emit)`, falling back to `ops.health(cfg)`.
3. It stores the result in `ctx.health_map`.

This happens once per run, not per feature.
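The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `_pairs.py`: the function name `collect_health`, the signature check used to choose the fallback call, and the default result for providers without `health()` are all assumptions.

```python
import inspect
from typing import Any, Callable, Dict, Mapping, Optional

Emit = Callable[..., None]  # hypothetical emitter signature


def _accepts_emit(fn: Callable[..., Any]) -> bool:
    """Return True if fn takes an `emit` keyword (or arbitrary **kwargs)."""
    try:
        params = inspect.signature(fn).parameters
    except (TypeError, ValueError):
        return False
    if "emit" in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def collect_health(
    providers: Mapping[str, Any],
    cfg: Mapping[str, Any],
    emit: Optional[Emit] = None,
) -> Dict[str, dict]:
    """Call health() once per provider and return a name -> result map."""
    health_map: Dict[str, dict] = {}
    for name, ops in providers.items():
        health_fn = getattr(ops, "health", None)
        if not callable(health_fn):
            # Best-effort: a provider without health() is assumed OK.
            health_map[name] = {"ok": True, "status": "ok"}
            continue
        if _accepts_emit(health_fn):
            result = health_fn(cfg, emit=emit)
        else:
            result = health_fn(cfg)  # fallback for providers without `emit`
        health_map[name] = dict(result or {})
    return health_map
```

The returned map plays the role of `ctx.health_map`: it is built once, then read by every feature in the run.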

***

### Scope during health

During health, the orchestrator sets a temporary scope:

* `CW_PAIR_MODE=health`
* `CW_PAIR_SCOPE=health`
* `CW_PAIR_FEATURE=health`

This ensures any provider-level state files created during health don’t collide with real pair scopes.
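A temporary scope like this is commonly implemented as a context manager that restores the previous values on exit. The sketch below assumes the three `CW_PAIR_*` names are environment variables; `health_scope` is a hypothetical helper name, not a function from the codebase.

```python
import os
from contextlib import contextmanager
from typing import Iterator

_SCOPE_KEYS = ("CW_PAIR_MODE", "CW_PAIR_SCOPE", "CW_PAIR_FEATURE")


@contextmanager
def health_scope() -> Iterator[None]:
    """Set all scope variables to "health", then restore the old values."""
    previous = {key: os.environ.get(key) for key in _SCOPE_KEYS}
    try:
        for key in _SCOPE_KEYS:
            os.environ[key] = "health"
        yield
    finally:
        for key, value in previous.items():
            if value is None:
                os.environ.pop(key, None)  # variable was unset before
            else:
                os.environ[key] = value
```

Wrapping the health calls in `with health_scope(): ...` keeps the scope change from leaking into the pair runs that follow, even if a provider raises.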

***

### Expected health response shape

There is no strict schema, but the orchestrator expects these fields if present:

* `ok` (bool)
* `status` (string):
  * `ok`
  * `down`
  * `auth_failed`
  * `degraded` (optional)
* `features` (dict\[str,bool]) (optional)
* `api` (dict) (optional)
* `latency_ms` (int) (optional)

Example:

```json
{
  "ok": true,
  "status": "ok",
  "features": { "watchlist": true, "ratings": true, "history": true, "playlists": false },
  "latency_ms": 120,
  "api": { "calls": 6, "errors": 0 }
}
```

If fields are missing, the orchestrator falls back conservatively.
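Because the shape is loose, consumers have to read it defensively. The helpers below are one possible reading, not the orchestrator's actual logic: `status_of` and `feature_disabled` are hypothetical names, and deriving a status from `ok` when `status` is absent is an assumption.

```python
from typing import Any, Mapping


def status_of(health: Mapping[str, Any]) -> str:
    """Return the status string, deriving one from `ok` if it is missing."""
    status = health.get("status")
    if isinstance(status, str) and status:
        return status
    # ASSUMPTION: with no `status`, an explicit ok=False is treated as "down".
    return "down" if health.get("ok") is False else "ok"


def feature_disabled(health: Mapping[str, Any], feature: str) -> bool:
    """Only an explicit False in `features` disables a feature."""
    features = health.get("features")
    if not isinstance(features, Mapping):
        return False
    return features.get(feature) is False
```

With the example response above, `status_of` returns `"ok"` and `feature_disabled(..., "playlists")` is true, while a feature that is absent from the map is not treated as disabled.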

***

### How health gates runs

#### Auth failed

If either side of a pair has:

* `status == "auth_failed"`

Then:

* the pair is skipped entirely
* features do not run
* emits: `pair:skip reason=auth_failed`

#### Down

If a provider is:

* `status == "down"`

Then behavior depends on mode:

**One-way**

* If source is down:
  * feature run is skipped (no meaningful plan)
* If destination is down:
  * plan is computed, but writes are skipped and items are recorded as unresolved

**Two-way**

* Writes to the down side are skipped (items are recorded as unresolved)
* **Observed deletions are disabled** for both sides
  * because missing snapshots can look like deletes

#### Feature support

If health contains `features[feature] == False`:

* feature run becomes a no-op and emits `feature:unsupported`

This is in addition to provider-declared `ops.features()`.
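The gating rules in this section can be condensed into a single decision function. This is an illustrative sketch only: the `Gate` dataclass, the `gate` function, the `"src"`/`"dst"` side labels, the reason strings other than `auth_failed`, and the ordering of the checks are assumptions, not the orchestrator's real structures.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Gate:
    run: bool                                 # False: nothing runs
    skip_writes_to: Tuple[str, ...] = ()      # sides whose writes are skipped
    allow_observed_deletions: Optional[bool] = None  # two-way only, else None
    reason: str = ""


def gate(mode: str, feature: str,
         src: Mapping[str, Any], dst: Mapping[str, Any]) -> Gate:
    """Decide what may run for one feature of one pair.

    In two-way mode "src" and "dst" are just labels for the two sides.
    """
    statuses = {"src": src.get("status"), "dst": dst.get("status")}

    # Auth failure on either side skips the whole pair.
    if "auth_failed" in statuses.values():
        return Gate(run=False, reason="auth_failed")

    # An explicit features[feature] == False turns the feature into a no-op.
    for health in (src, dst):
        if (health.get("features") or {}).get(feature) is False:
            return Gate(run=False, reason="feature:unsupported")

    down = tuple(side for side, status in statuses.items() if status == "down")

    if mode == "one-way":
        if "src" in down:
            return Gate(run=False, reason="source_down")  # no meaningful plan
        # Destination down: plan is computed, writes are left unresolved.
        return Gate(run=True, skip_writes_to=down)

    # Two-way: skip writes to down sides, and disable observed deletions
    # for both sides as soon as either side is down.
    return Gate(run=True, skip_writes_to=down,
                allow_observed_deletions=not down)
```

For example, `gate("two-way", "watchlist", {"status": "down"}, {"status": "ok"})` still runs, but skips writes to `"src"` and turns observed deletions off.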

***

### Emitting health events

After calling `health`, the orchestrator emits a structured event:

* `health` with provider name + status + basic details

This lets the UI show:

* “SIMKL auth failed”
* “TRAKT rate limited”
* “PLEX ok”

Providers can also emit their own sub-events during health, especially:

* `api:hit` samples (requests made during health)

***

### Practical tips for provider authors

A good health implementation should:

* verify credentials (fast, one request if possible)
* detect rate-limits separately from auth errors
* return per-feature capability flags if certain features are unavailable
* emit `api:hit` samples for every network request

Also: don’t do heavy snapshot calls in health. That belongs in `build_index`.
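As a rough illustration of these tips, a provider `health()` might look like the sketch below. Everything provider-specific here is hypothetical: `ExampleOps`, the injected `ping` callable (returns an HTTP status code, raises `OSError` on network failure), the `emit(event, **fields)` call shape, the mapping of HTTP 429 to `degraded`, and the choice to report `ok` as false for any status other than `ok`.

```python
import time
from typing import Any, Callable, Mapping, Optional

Emit = Callable[..., None]  # hypothetical emitter signature


class ExampleOps:
    """Hypothetical provider with a single cheap credential check."""

    def __init__(self, ping: Callable[[Mapping[str, Any]], int]) -> None:
        self._ping = ping  # one lightweight authenticated request

    def health(self, cfg: Mapping[str, Any],
               emit: Optional[Emit] = None) -> dict:
        started = time.monotonic()
        try:
            code: Optional[int] = self._ping(cfg)
        except OSError:
            code = None  # network failure: provider unreachable
        latency_ms = int((time.monotonic() - started) * 1000)

        # Report every network request made during health.
        if emit is not None:
            emit("api:hit", endpoint="ping", status=code, latency_ms=latency_ms)

        if code is None or code >= 500:
            status = "down"
        elif code in (401, 403):
            status = "auth_failed"
        elif code == 429:
            status = "degraded"  # ASSUMPTION: rate limits are not auth errors
        else:
            status = "ok"

        return {
            "ok": status == "ok",  # ASSUMPTION: degraded is reported as not ok
            "status": status,
            "latency_ms": latency_ms,
            "api": {"calls": 1, "errors": 0 if status == "ok" else 1},
        }
```

Injecting `ping` keeps the check to a single request and makes the status mapping easy to test without a live server.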

***

### Troubleshooting patterns

#### “Everything skipped with auth\_failed”

* provider token invalid or expired
* connected app revoked
* wrong base URL (for Jellyfin/Emby)
* an HTTP/HTTPS protocol switch that invalidates cookies in the UI, and similar issues

#### “Two-way suddenly wants to delete everything”

* health didn’t mark the provider as down, but the snapshot was empty
* the drop guard should catch it if checkpoints exist
* if checkpoints are missing, implement provider `activities()` and return an updated timestamp

#### “Feature says unsupported but provider supports it”

* health `features` map is returning false accidentally
* or `ops.features()` says false

***

### Related pages

* Provider health contract: [Provider contract](/blueprint-architecture/orchestrator/provider-contract.md)
* Drop guard and delete suppression: [Guardrails](/blueprint-architecture/orchestrator/guardrails.md)
* Observed deletions gating: [Two-way sync](/blueprint-architecture/orchestrator/two-way-sync.md)
* Health event format: [Eventing](/blueprint-architecture/orchestrator/eventing.md)
  {% endtab %}
  {% endtabs %}


