# Post-processing and Analysis Tools

This page summarizes the SPIDER post-processing utilities for reading samples, building event summaries, running diagnostics, and plotting.

## 1) Load samples

Primary reader:

- `spider.io.samples.read_all_samples(...)`

Example:

```python
from spider.io.samples import read_all_samples

samples = read_all_samples(
    {"samples_outfile": "SPIDER_samples_final.h5"},
    backend="numpy",
    thin=5,
)
```

Returned core arrays:

- `event_ids`
- `longitude`, `latitude`, `depth`
- `X`, `Y`, `Z`, `delta_t`

Useful metadata keys for chain-aware analysis:

- `_batch_names`, `_batch_boundaries`, `_batch_slices`
- `_sample_chain_idx`
- `_chain_segments`, `_chain_slices`, `_chain_indices`
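
As an illustration of chain-aware analysis, the sketch below splits a parameter array by chain before computing per-chain statistics. It runs on synthetic data and assumes, without confirmation from the SPIDER source, that `_sample_chain_idx` holds one integer chain label per sample and that `X` is shaped `(n_samples, n_events)`. The helper `split_by_chain` is hypothetical and not part of SPIDER.

```python
import numpy as np

def split_by_chain(values, chain_idx):
    """Group rows of `values` by chain label (assumed: one label per sample)."""
    values = np.asarray(values)
    chain_idx = np.asarray(chain_idx)
    return {int(c): values[chain_idx == c] for c in np.unique(chain_idx)}

# Synthetic stand-ins for samples["X"] and samples["_sample_chain_idx"].
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))          # 12 samples, 3 events (assumed layout)
chain_idx = np.repeat([0, 1, 2], 4)   # 3 chains, 4 samples each

per_chain = split_by_chain(X, chain_idx)
# Per-chain, per-event means; large disagreement between chains hints at poor mixing.
chain_means = {c: x.mean(axis=0) for c, x in per_chain.items()}
```

If the real arrays use a different axis order, only the indexing inside `split_by_chain` needs to change.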

Map-only behavior:

- If no `batch_*` groups exist but root `map_*` datasets exist, `read_all_samples` returns a single-sample map-only structure.
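
The rule above amounts to a precedence check on the top-level names in the HDF5 file. The following sketch expresses that check on a plain list of names; the function `classify_layout` and the example key names (`batch_0000`, `map_X`) are hypothetical and only illustrate the decision, not how `read_all_samples` is implemented.

```python
def classify_layout(root_keys):
    """Classify a sample file from its top-level key names.

    Returns 'batched' if any batch_* name exists, 'map_only' if only
    map_* names exist, and 'empty' otherwise.
    """
    keys = list(root_keys)
    if any(k.startswith("batch_") for k in keys):
        return "batched"      # batch groups take precedence
    if any(k.startswith("map_") for k in keys):
        return "map_only"     # single-sample, map-only structure
    return "empty"

# Hypothetical key names, for illustration only.
layout_a = classify_layout(["batch_0000", "batch_0001", "map_X"])
layout_b = classify_layout(["map_X", "map_Y", "map_Z"])
print(layout_a, layout_b)
```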

Related I/O helpers:

- `merge_samples_hdf5(...)` (merge multi-chain sample files)
- `read_growclust_bootstrap(...)` (convert GrowClust bootstrap outputs into a SPIDER-like sample dict)

## 2) Build event summaries

High-level entrypoint:

- `spider.analysis.compute_cat_dd_and_xyz(...)`

This returns an `EventSamplesSummary` containing centered sample arrays and, optionally, a catalog summary table.

Example:

```python
from spider.analysis import compute_cat_dd_and_xyz

summary = compute_cat_dd_and_xyz(
    samples,
    burn_in=1000,
    include=["X", "Y", "Z", "T", "cat_dd"],
    uncertainty_metrics=["sigma", "std", "mad", "qhw_0.95"],
)
```

Key options:

- `include`: choose outputs (`X`, `Y`, `Z`, `T`, `lats`, `lons`, `deps`, `cat_dd`)
- `burn_in`, `thin`
- `compute_map` / `map_bins` for histogram-based mode estimates
- `add_wasserstein` to append per-event prior-vs-posterior Wasserstein diagnostics
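
To make these options concrete, here is a minimal NumPy sketch of what burn-in, thinning, spread metrics, and a histogram mode estimate could look like for a single event's 1-D trace. It rests on assumptions that the SPIDER source may not share: `burn_in` is read as a count of leading samples to drop, `thin` as a stride, `qhw_0.95` as the half-width of the central 95% quantile interval, and the mode as the center of the fullest histogram bin. The helpers `spread_metrics` and `histogram_mode` are hypothetical.

```python
import numpy as np

def spread_metrics(trace, burn_in=0, thin=1, level=0.95):
    """Summarize one 1-D sample trace after burn-in removal and thinning."""
    kept = np.asarray(trace, dtype=float)[burn_in::thin]
    lo, hi = np.quantile(kept, [(1 - level) / 2, (1 + level) / 2])
    med = np.median(kept)
    return {
        "n_kept": kept.size,
        "std": kept.std(ddof=1),
        "mad": np.median(np.abs(kept - med)),  # unscaled median absolute deviation
        "qhw": 0.5 * (hi - lo),                # assumed meaning of qhw_<level>
    }

def histogram_mode(values, bins=30):
    """Center of the most populated histogram bin (assumed reading of map_bins)."""
    counts, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

# Synthetic trace standing in for one event's X samples.
rng = np.random.default_rng(1)
trace = rng.normal(loc=0.0, scale=2.0, size=6000)

metrics = spread_metrics(trace, burn_in=1000, thin=5)
mode = histogram_mode(trace[1000::5], bins=30)
```

For a Gaussian trace, `qhw` at the 0.95 level is close to 1.96 times the standard deviation, which gives a quick sanity check when comparing metrics.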

## 3) ESS diagnostics

ESS utilities in `spider.analysis.results`:

- `compute_effective_sample_size(summary, ...)`
- `compute_ess_summary(summary, ...)`

`compute_ess_summary` returns per-event and aggregate ESS metrics, including:

- `ess_per_event_x`, `ess_per_event_y`, `ess_per_event_z`, `ess_per_event_t`
- conservative `ess_per_event_xyzt_min`

These metrics help identify under-mixed events (low ESS) and uneven exploration (a wide spread of ESS across events).
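
For intuition about what an ESS value measures, the sketch below implements a simple textbook estimator, n / (1 + 2 Σ ρ_k), with the sum truncated at the first non-positive autocorrelation. This is not necessarily the estimator used by `compute_effective_sample_size`; the function `ess_1d` and the synthetic traces are illustrative assumptions.

```python
import numpy as np

def ess_1d(x):
    """Effective sample size of a 1-D trace from its autocorrelation.

    Uses n / (1 + 2 * sum(rho_k)), stopping at the first non-positive rho_k.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = np.dot(xc, xc) / n
    if var == 0.0:
        return float(n)  # constant trace: no autocorrelation to estimate
    # Biased autocovariance at lags 0..n-1.
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n
    rho = acov / var
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau

rng = np.random.default_rng(2)
n = 4000
white = rng.normal(size=n)       # well-mixed trace
ar = np.empty(n)                 # strongly autocorrelated AR(1) trace
ar[0] = white[0]
for i in range(1, n):
    ar[i] = 0.9 * ar[i - 1] + rng.normal()

ess_white = ess_1d(white)
ess_ar = ess_1d(ar)
# Taking the minimum over coordinates mirrors the conservative *_min idea.
ess_min = min(ess_white, ess_ar)
```

The well-mixed trace yields an ESS near the number of samples, while the AR(1) trace with coefficient 0.9 yields a small fraction of it, which is the pattern to look for when screening events.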

## 4) Wasserstein prior-vs-posterior diagnostics

Module:

- `spider.analysis.prior_posterior_wasserstein`

Main routines:

- `compute_event_wasserstein(...)` (from HDF5 samples + config)
- `compute_event_wasserstein_from_samples(...)` (from in-memory sample dict)

CLI-style usage:

```bash
python -m spider.analysis.prior_posterior_wasserstein \
  --samples SPIDER_samples_final.h5 \
  --config SPIDER.json \
  --burn 0.2 \
  --thin 5 \
  --dims 0,1,2
```
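
As background for interpreting the output, the sketch below computes an approximate 1-D Wasserstein-1 distance between prior and posterior draws by comparing their quantile functions on a common grid. It is a generic NumPy illustration with synthetic distributions, not the routine in `spider.analysis.prior_posterior_wasserstein`; the function name `wasserstein_1d` and the prior/posterior parameters are assumptions.

```python
import numpy as np

def wasserstein_1d(a, b, n_quantiles=512):
    """Approximate 1-D Wasserstein-1 distance via matched quantiles."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))

rng = np.random.default_rng(3)
prior = rng.normal(0.0, 5.0, size=5000)        # hypothetical broad prior draws
posterior = rng.normal(1.0, 0.5, size=5000)    # hypothetical tight posterior draws
prior_again = rng.normal(0.0, 5.0, size=5000)  # independent draws from the same prior

w_informed = wasserstein_1d(prior, posterior)  # data moved the distribution
w_same = wasserstein_1d(prior, prior_again)    # posterior indistinguishable from prior
```

A distance near zero suggests the data added little information for that event and coordinate, whereas a large distance indicates the posterior departs clearly from the prior.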

## 5) Plotting tools

Main plotting API (`spider.plotting`):

- `plot_event_distributions(...)`
- `plot_event_chains(...)`
- `plot_uncertainty_histograms(...)`
- `plot_event_marginal_hist2d(...)`
- `plot_noise_scale_posterior_vs_prior(...)`

Additional 2D KDE marginal helper (currently defined in `spider.plotting.events`):

- `plot_event_marginal_kde2d(...)`

Example:

```python
from spider.plotting import (
    plot_event_distributions,
    plot_event_chains,
    plot_uncertainty_histograms,
)
from spider.plotting.events import plot_event_marginal_kde2d

fig, ax = plot_event_distributions(summary, coords=("X", "Y", "Z"))
fig, ax = plot_event_chains(summary, coords=("X", "Y", "Z", "T"))
fig, ax = plot_uncertainty_histograms(summary, coords=("X", "Y", "Z", "T"))
fig, axes = plot_event_marginal_kde2d(samples, event_index=0, coords=("X", "Y", "Z"))
```

## 6) Typical post-processing workflow

1. Read samples with thinning (`read_all_samples`).
2. Build the summary (`compute_cat_dd_and_xyz`), including `X/Y/Z/T` and `cat_dd`.
3. Compute the ESS summary (`compute_ess_summary`) and inspect events in the low-ESS tail.
4. Generate chain and marginal plots.
5. (Optional) run calibration and Wasserstein diagnostics for deeper quality checks.
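
Step 3 can be sketched as a quantile cut on a per-event ESS array. The example below uses synthetic values; in practice the array might come from a field such as `ess_per_event_xyzt_min`, though how that field is accessed on the returned object is not specified here. The helper `low_ess_events` is hypothetical.

```python
import numpy as np

def low_ess_events(event_ids, ess, quantile=0.05):
    """Return (event_id, ess) pairs at or below an ESS quantile, lowest first."""
    event_ids = np.asarray(event_ids)
    ess = np.asarray(ess, dtype=float)
    cutoff = np.quantile(ess, quantile)
    idx = np.flatnonzero(ess <= cutoff)
    idx = idx[np.argsort(ess[idx])]   # sort flagged events by ascending ESS
    return [(int(event_ids[i]), float(ess[i])) for i in idx]

# Synthetic per-event ESS values with one clearly under-mixed event.
event_ids = np.arange(100, 120)
ess = np.linspace(50.0, 1000.0, 20)
ess[7] = 5.0

flagged = low_ess_events(event_ids, ess, quantile=0.1)
```

The flagged events are natural candidates for the chain and marginal plots in step 4.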