Post-processing and Analysis Tools

This page summarizes the SPIDER post-processing utilities for reading samples, building event summaries, running diagnostics, and plotting.

1) Load samples

Primary reader:

spider.io.samples.read_all_samples(...)

Example:

from spider.io.samples import read_all_samples

samples = read_all_samples(
    {"samples_outfile": "SPIDER_samples_final.h5"},
    backend="numpy",
    thin=5,
)

Returned core arrays:

event_ids
longitude, latitude, depth
X, Y, Z, delta_t

Useful metadata keys for chain-aware analysis:

_batch_names, _batch_boundaries, _batch_slices
_sample_chain_idx
_chain_segments, _chain_slices, _chain_indices
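As an illustration of chain-aware analysis, the sketch below groups samples by chain. It runs on a toy dict rather than real SPIDER output, and it assumes that coordinate arrays are shaped (n_samples, n_events) and that _sample_chain_idx holds one integer chain label per retained sample. Neither assumption is confirmed on this page, and split_by_chain is a hypothetical helper, not part of SPIDER.

```python
import numpy as np

# Toy stand-in for the dict returned by read_all_samples (shapes are assumed).
rng = np.random.default_rng(0)
n_samples, n_events = 12, 3
samples = {
    "X": rng.normal(size=(n_samples, n_events)),
    # Assumed layout: one integer chain label per retained sample.
    "_sample_chain_idx": np.repeat(np.arange(3), 4),
}


def split_by_chain(samples, key):
    """Return {chain_id: rows of samples[key] that belong to that chain}."""
    chain_idx = np.asarray(samples["_sample_chain_idx"])
    values = np.asarray(samples[key])
    return {int(c): values[chain_idx == c] for c in np.unique(chain_idx)}


per_chain = split_by_chain(samples, "X")
# Per-chain, per-event means; large disagreement between chains hints at poor mixing.
chain_means = {c: v.mean(axis=0) for c, v in per_chain.items()}
```

Comparing chain_means across chains is a quick sanity check before computing formal diagnostics.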

Map-only behavior:

If no batch_* groups exist but root map_* datasets exist, read_all_samples returns a single-sample map-only structure.
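The rule above can be sketched as a small check on the top-level names of a samples file. This is not the logic inside read_all_samples, only an illustration of the stated rule; classify_sample_layout and the example names are hypothetical.

```python
def classify_sample_layout(root_names):
    """Classify a samples file from its top-level HDF5 names.

    Follows the rule described above: any batch_* group means batched
    samples; otherwise root-level map_* datasets mean a map-only file.
    """
    names = list(root_names)
    if any(name.startswith("batch_") for name in names):
        return "batched"
    if any(name.startswith("map_") for name in names):
        return "map_only"
    return "unknown"


# Hypothetical top-level names, for illustration only.
layout_a = classify_sample_layout(["batch_000", "batch_001", "map_X"])
layout_b = classify_sample_layout(["map_X", "map_Y", "map_Z"])
```

With h5py installed, the names could be obtained with list(h5py.File(path, "r").keys()).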

Related I/O helpers:

merge_samples_hdf5(...) (merge multi-chain sample files)
read_growclust_bootstrap(...) (convert GrowClust bootstrap outputs into a SPIDER-like sample dict)

2) Build event summaries

High-level entrypoint:

spider.analysis.compute_cat_dd_and_xyz(...)

This returns an EventSamplesSummary with centered sample arrays and an optional catalog summary table.

Example:

from spider.analysis import compute_cat_dd_and_xyz

summary = compute_cat_dd_and_xyz(
    samples,
    burn_in=1000,
    include=["X", "Y", "Z", "T", "cat_dd"],
    uncertainty_metrics=["sigma", "std", "mad", "qhw_0.95"],
)
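To make the uncertainty metrics more concrete, here is a NumPy-only sketch of how such per-event spread measures can be computed after burn-in and thinning. It is not SPIDER's implementation. It assumes arrays shaped (n_samples, n_events), takes "mad" to be the unscaled median absolute deviation, and takes "qhw_0.95" to be the half-width of the central 95% quantile interval. The "sigma" metric is left out because its definition is not given here. summarize_uncertainty is a hypothetical helper.

```python
import numpy as np


def summarize_uncertainty(values, burn_in=0, thin=1, q=0.95):
    """Per-event spread metrics from an array shaped (n_samples, n_events)."""
    kept = np.asarray(values, dtype=float)[burn_in::thin]
    median = np.median(kept, axis=0)
    centered = kept - kept.mean(axis=0)  # remove each event's mean position
    lo, hi = np.quantile(kept, [(1 - q) / 2, (1 + q) / 2], axis=0)
    return {
        "std": centered.std(axis=0, ddof=1),
        # Median absolute deviation about the median (unscaled; assumed meaning).
        "mad": np.median(np.abs(kept - median), axis=0),
        # Half-width of the central q interval (assumed meaning of qhw_0.95).
        "qhw": 0.5 * (hi - lo),
    }


rng = np.random.default_rng(1)
# Two toy events with different scatter (standard deviations 1 and 3).
x = rng.normal(loc=5.0, scale=[1.0, 3.0], size=(6000, 2))
metrics = summarize_uncertainty(x, burn_in=1000, thin=5)
```

For roughly Gaussian posteriors the quantile half-width is close to twice the standard deviation, while heavy tails pull the two apart, which is one reason to report more than one metric.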

Key options:

include: choose outputs (X, Y, Z, T, lats, lons, deps, cat_dd)
burn_in, thin: discard initial samples and keep every thin-th remaining sample
compute_map / map_bins: histogram-based mode estimates
add_wasserstein: append per-event prior-vs-posterior Wasserstein diagnostics
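For intuition about a histogram mode estimate, the sketch below bins one-dimensional draws and returns the center of the fullest bin. This is a generic technique shown under the assumption that compute_map and map_bins work along similar lines; the actual SPIDER behavior may differ, and histogram_mode is a hypothetical helper.

```python
import numpy as np


def histogram_mode(values, bins=50):
    """Return the center of the most populated histogram bin."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])


rng = np.random.default_rng(2)
# Skewed toy posterior (gamma with shape 2): the mode sits below the mean.
draws = rng.gamma(shape=2.0, scale=1.0, size=20000)
mode_estimate = histogram_mode(draws, bins=60)
```

The bin count trades resolution against noise: too few bins blur the peak, too many make the fullest bin a matter of chance.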

3) ESS diagnostics

Effective sample size (ESS) utilities in spider.analysis.results:

compute_effective_sample_size(summary, ...)
compute_ess_summary(summary, ...)

compute_ess_summary returns per-event and aggregate ESS metrics, including:

ess_per_event_x, ess_per_event_y, ess_per_event_z, ess_per_event_t
ess_per_event_xyzt_min (a conservative per-event value; going by its name, the minimum over x, y, z, and t)

These are useful for identifying under-mixed events and uneven exploration.
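The idea behind ESS can be sketched with a simple autocorrelation-based estimator. This page does not say which estimator SPIDER uses, so the version below (sum the autocorrelations up to the first non-positive lag) is only one common choice. effective_sample_size and ar1 are hypothetical helpers, and the toy chains stand in for one event's x, y, z, and t traces.

```python
import numpy as np


def effective_sample_size(chain):
    """ESS of a 1-D chain: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float("nan")  # constant chain: autocorrelation is undefined
    # Autocovariance at lags 0..n-1 (lag 0 sits at index n-1 of the full output).
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / var
    tau = 1.0
    for lag in range(1, n):
        if rho[lag] <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        tau += 2.0 * rho[lag]
    return n / tau


def ar1(phi, n, rng):
    """Toy AR(1) chain; larger phi means stronger autocorrelation."""
    out = np.empty(n)
    out[0] = rng.normal()
    for i in range(1, n):
        out[i] = phi * out[i - 1] + rng.normal()
    return out


rng = np.random.default_rng(3)
n = 2000
chains = {"x": rng.normal(size=n), "y": ar1(0.5, n, rng),
          "z": ar1(0.9, n, rng), "t": rng.normal(size=n)}
ess = {name: effective_sample_size(c) for name, c in chains.items()}
ess_min = min(ess.values())  # conservative value across the four coordinates
```

Taking the minimum across coordinates means a single slowly mixing coordinate is enough to flag the event.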

4) Wasserstein prior-vs-posterior diagnostics

Module:

spider.analysis.prior_posterior_wasserstein

Main routines:

compute_event_wasserstein(...) (from HDF5 samples + config)
compute_event_wasserstein_from_samples(...) (from in-memory sample dict)

CLI-style usage:

python -m spider.analysis.prior_posterior_wasserstein \
  --samples SPIDER_samples_final.h5 \
  --config SPIDER.json \
  --burn 0.2 \
  --thin 5 \
  --dims 0,1,2
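For intuition about what such a diagnostic measures, the sketch below computes a one-dimensional Wasserstein-1 distance between prior and posterior draws, one column at a time. It is a generic empirical-CDF formulation, not the code in spider.analysis.prior_posterior_wasserstein. It also assumes that a burn value of 0.2 means "discard the first 20% of samples", which this page does not confirm; wasserstein_1d is a hypothetical helper and all data are synthetic.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1-D samples via their empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    # Empirical CDF values on each interval between consecutive grid points.
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


rng = np.random.default_rng(4)
n = 5000
prior = rng.normal(0.0, 5.0, size=(n, 3))
# Columns 0 and 1 are tightly constrained; column 2 simply reproduces the prior.
posterior = rng.normal([1.0, -2.0, 0.0], [0.5, 0.5, 5.0], size=(n, 3))

burn, thin = 0.2, 5          # assumed: burn is a fraction of the chain length
start = int(burn * n)
kept = posterior[start::thin]

w_per_dim = [wasserstein_1d(prior[:, d], kept[:, d]) for d in (0, 1, 2)]
```

A distance near zero (column 2 here) suggests the data added little beyond the prior for that coordinate, whereas a large distance indicates a posterior that moved or tightened substantially.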

5) Plotting tools

Main plotting API (spider.plotting):

plot_event_distributions(...)
plot_event_chains(...)
plot_uncertainty_histograms(...)
plot_event_marginal_hist2d(...)
plot_noise_scale_posterior_vs_prior(...)

Additional 2D KDE marginal helper (currently defined in spider.plotting.events):

plot_event_marginal_kde2d(...)

Example:

from spider.plotting import (
    plot_event_distributions,
    plot_event_chains,
    plot_uncertainty_histograms,
)
from spider.plotting.events import plot_event_marginal_kde2d

fig, ax = plot_event_distributions(summary, coords=("X", "Y", "Z"))
fig, ax = plot_event_chains(summary, coords=("X", "Y", "Z", "T"))
fig, ax = plot_uncertainty_histograms(summary, coords=("X", "Y", "Z", "T"))
fig, axes = plot_event_marginal_kde2d(samples, event_index=0, coords=("X", "Y", "Z"))
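As a rough picture of what a 2D marginal histogram contains, the following NumPy-only sketch bins toy X/Y draws for a single event. It makes no claim about how plot_event_marginal_hist2d bins or renders its data; the samples are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy samples for one event: correlated X/Y draws.
cov = [[1.0, 0.8], [0.8, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Joint histogram of the two coordinates.
counts, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1], bins=40)
density = counts / counts.sum()  # fraction of samples per cell

# Center of the fullest cell, a crude joint mode estimate.
ix, iy = np.unravel_index(np.argmax(counts), counts.shape)
peak = (0.5 * (x_edges[ix] + x_edges[ix + 1]),
        0.5 * (y_edges[iy] + y_edges[iy + 1]))
```

To draw the result with Matplotlib, pcolormesh(x_edges, y_edges, counts.T) is one option; the transpose is needed because histogram2d puts the first coordinate along rows.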

6) Typical post-processing workflow

Read samples with thinning (read_all_samples).
Build the summary (compute_cat_dd_and_xyz), including X/Y/Z/T and cat_dd.
Compute the ESS summary (compute_ess_summary) and inspect events in the low-ESS tail.
Generate chain and marginal plots.
(Optional) Run calibration and Wasserstein diagnostics for deeper quality checks.
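The "inspect the low tail" step can be sketched as a quantile cut on per-event ESS values. The example uses synthetic numbers in place of a real ess_per_event_xyzt_min array. flag_low_ess is a hypothetical helper, and the 10% cutoff is an arbitrary choice for illustration.

```python
import numpy as np


def flag_low_ess(event_ids, ess_min, quantile=0.05):
    """Return event IDs at or below the given ESS quantile, worst first."""
    event_ids = np.asarray(event_ids)
    ess_min = np.asarray(ess_min, dtype=float)
    cutoff = np.quantile(ess_min, quantile)
    mask = ess_min <= cutoff
    order = np.argsort(ess_min[mask])  # smallest ESS first
    return event_ids[mask][order], float(cutoff)


rng = np.random.default_rng(6)
event_ids = np.arange(100, 120)
ess_values = rng.uniform(50.0, 2000.0, size=event_ids.size)
ess_values[3] = 5.0  # one deliberately under-mixed event (ID 103)

flagged, cutoff = flag_low_ess(event_ids, ess_values, quantile=0.1)
```

The flagged events are natural candidates for the chain and marginal plots in the next step.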