Sampler Options and Mathematical Updates
This page documents sampler backends available in SPIDER and the update equations used in practice.
Available sampler backends
Configured at:
inference.sampler.backend
Supported values:
psgld
sghmc
Backend selection is handled in spider.optim.backends.create_sampler_backend.
Common runtime conventions
For both backends in SPIDER:
Drift uses minibatch mean gradient scaled by total observations:
\(g_{\text{drift}} = N \,\bar g\)
User-configured sampler learning rate is internally scaled per observation at backend creation:
\(\lambda_{\text{effective}} = \lambda_{\text{config}} / N\)
Noise is off in Phase 2, ramped in Phase 3, and active in Phase 4.
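The two scaling conventions above cancel in the drift term: multiplying the effective learning rate by the scaled gradient gives back the configured learning rate times the minibatch mean gradient. The sketch below illustrates this with NumPy. The function names are hypothetical, and the linear Phase 3 ramp is an assumption, since this page does not state the ramp shape.

```python
import numpy as np


def scale_for_dataset(minibatch_grads, n_total, lr_config):
    """Apply the two conventions: g_drift = N * g_bar, lr_eff = lr_config / N."""
    g_bar = minibatch_grads.mean(axis=0)   # minibatch mean gradient
    g_drift = n_total * g_bar              # scaled to the full dataset
    lr_effective = lr_config / n_total     # per-observation learning rate
    return g_drift, lr_effective


def noise_scale(phase, progress=0.0):
    """Noise multiplier per phase (hypothetical helper).

    Phase 2: off, Phase 3: ramped, Phase 4: fully active.
    ASSUMPTION: the ramp is linear in `progress` (0..1).
    """
    if phase == 2:
        return 0.0
    if phase == 3:
        return float(np.clip(progress, 0.0, 1.0))
    if phase == 4:
        return 1.0
    raise ValueError("this sketch only covers phases 2, 3 and 4")


grads = np.array([[1.0, 2.0], [3.0, 4.0]])  # two per-example gradients
g_drift, lr_eff = scale_for_dataset(grads, n_total=1000, lr_config=0.01)
drift_step = lr_eff * g_drift               # equals lr_config * g_bar
```

Because of this cancellation, the configured learning rate can be read as a step size on the mean gradient, while the noise term sees the per-observation rate.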
Common controls:
temperature
beta, eps
freeze_preconditioner_sampling
grad_clip_norm
Preconditioner options
Configured at:
inference.sampler.preconditioning.enabled
inference.sampler.preconditioning.type
Supported preconditioner types:
rmsprop (diagonal)
lrd (low-rank plus diagonal)
When preconditioning is enabled, both drift and injected noise are scaled by the same metric.
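As an illustration of how the dotted configuration paths on this page nest, the sketch below writes them as a plain Python dictionary and resolves a path with a small helper. The dictionary layout and the `get_path` helper are assumptions made for illustration; SPIDER's actual configuration file format is not described here.

```python
# Hypothetical nested layout mirroring the dotted keys documented on this page.
config = {
    "inference": {
        "sampler": {
            "backend": "psgld",          # or "sghmc"
            "preconditioning": {
                "enabled": True,
                "type": "lrd",           # or "rmsprop"
                "lrd": {"mode": "oja"},  # or "svd"
            },
        }
    }
}


def get_path(cfg, dotted):
    """Resolve a dotted key such as 'inference.sampler.backend' (illustrative helper)."""
    node = cfg
    for key in dotted.split("."):
        node = node[key]
    return node


backend = get_path(config, "inference.sampler.backend")
precond_type = get_path(config, "inference.sampler.preconditioning.type")
```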
pSGLD backend
Class:
spider.optim.sgld.pSGLD
With diagonal preconditioner \(G\), the implemented step is:
where:
\(\xi_t \sim \mathcal N(0, I)\)
\(\Gamma_t\) is an optional diagonal approximation to the pSGLD correction term (enabled by include_gamma)
\(G_t = (\epsilon + \sqrt{v_t})^{-1}\) for RMSprop mode
RMSprop second-moment update:
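To make the quantities named in this section concrete, here is a minimal NumPy sketch of a diagonal pSGLD step. It is not taken from spider.optim.sgld.pSGLD. It assumes the standard RMSprop moving average \(v_t = \beta v_{t-1} + (1-\beta)\, g_t \odot g_t\), a gradient of the negative log posterior (so the drift descends), and a noise scale of \(\sqrt{2 \lambda T G_t}\). The \(\Gamma_t\) correction is omitted. Constants and conventions in the real class may differ.

```python
import numpy as np


def rmsprop_second_moment(v, grad, beta):
    """Standard RMSprop moving average of squared gradients (assumed form)."""
    return beta * v + (1.0 - beta) * grad * grad


def psgld_step(theta, grad, v, lr, beta=0.99, eps=1e-8, temperature=1.0, rng=None):
    """One illustrative pSGLD step with a diagonal RMSprop metric.

    ASSUMPTIONS: `grad` is the gradient of the negative log posterior,
    the noise variance is 2 * lr * temperature * G, and Gamma_t is dropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rmsprop_second_moment(v, grad, beta)
    G = 1.0 / (eps + np.sqrt(v))                    # G_t = (eps + sqrt(v_t))^-1
    xi = rng.standard_normal(theta.shape)           # xi_t ~ N(0, I)
    noise = np.sqrt(2.0 * lr * temperature * G) * xi
    theta_new = theta - lr * G * grad + noise       # same metric on drift and noise
    return theta_new, v


theta0 = np.zeros(3)
g = np.array([1.0, 2.0, 3.0])
theta1, v1 = psgld_step(theta0, g, v=np.zeros(3), lr=0.1, beta=0.9,
                        temperature=0.0, rng=np.random.default_rng(0))
```

Setting `temperature=0.0` turns the step into plain preconditioned gradient descent, which is a convenient way to check the drift term in isolation.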
SGHMC backend
Class:
spider.optim.sghmc.SGHMC
With momentum \(p\), friction \(\alpha\), and diagonal \(G\):
RMSprop metric in SGHMC uses bias-corrected second moment:
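The sketch below illustrates a bias-corrected RMSprop metric and a generic SGHMC momentum update with friction. It is not taken from spider.optim.sghmc.SGHMC. It assumes the Adam-style correction \(\hat v_t = v_t / (1 - \beta^t)\), a momentum update of the form \(p \leftarrow (1-\alpha)\,p - \lambda G g + \text{noise}\), and a noise variance of \(2 \alpha \lambda T G\). The real class may use different constants or ordering.

```python
import numpy as np


def bias_corrected_metric(v, t, beta, eps):
    """Diagonal metric from a bias-corrected second moment (assumed Adam-style)."""
    v_hat = v / (1.0 - beta ** t)      # t is the 1-based step count
    return 1.0 / (eps + np.sqrt(v_hat))


def sghmc_step(theta, p, grad, v, t, lr, alpha=0.1, beta=0.99, eps=1e-8,
               temperature=1.0, rng=None):
    """One illustrative SGHMC step with momentum p and friction alpha.

    ASSUMPTIONS: `grad` is the gradient of the negative log posterior and
    the injected noise has variance 2 * alpha * lr * temperature * G.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = beta * v + (1.0 - beta) * grad * grad
    G = bias_corrected_metric(v, t, beta, eps)
    xi = rng.standard_normal(theta.shape)
    noise = np.sqrt(2.0 * alpha * lr * temperature * G) * xi
    p_new = (1.0 - alpha) * p - lr * G * grad + noise   # friction damps old momentum
    theta_new = theta + p_new
    return theta_new, p_new, v


g = np.array([2.0, -4.0])
theta1, p1, v1 = sghmc_step(np.zeros(2), np.ones(2), g, v=np.zeros(2), t=1,
                            lr=0.1, alpha=1.0, temperature=0.0)
```

With `alpha=1.0` the previous momentum is fully damped, and at `t=1` the bias correction makes the metric equal to roughly \(1/|g|\), so the step has magnitude close to `lr` in every coordinate.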
LRD preconditioner math
Implemented in both samplers via _build_lrd_metric(...).
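Metric form:
\(P_t = \operatorname{diag}(d_t) + U_t \operatorname{diag}(\lambda_t)\, U_t^\top\)
Drift preconditioning:
\(P_t \, g_{\text{drift}}\)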
Noise is drawn with covariance proportional to \(P_t\):
diagonal part via \(\sqrt{d_t}\odot z_1\)
low-rank part via \(U_t(\sqrt{\lambda_t}\odot z_2)\)
with independent standard-normal \(z_1, z_2\).
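The noise construction above implies a covariance of \(\operatorname{diag}(d_t) + U_t \operatorname{diag}(\lambda_t) U_t^\top\), since the two terms are independent. The NumPy sketch below applies a metric of that form to a gradient without building the dense matrix and draws noise in the same two parts. The function names are hypothetical and this is not the code of _build_lrd_metric(...).

```python
import numpy as np


def lrd_apply(d, U, lam, g):
    """Compute (diag(d) + U diag(lam) U^T) @ g without forming the dense matrix."""
    return d * g + U @ (lam * (U.T @ g))


def lrd_noise(d, U, lam, rng):
    """Draw noise with covariance diag(d) + U diag(lam) U^T."""
    z1 = rng.standard_normal(d.shape[0])      # drives the diagonal part
    z2 = rng.standard_normal(lam.shape[0])    # drives the low-rank part
    return np.sqrt(d) * z1 + U @ (np.sqrt(lam) * z2)


rng = np.random.default_rng(0)
D, rank = 4, 2
d = np.array([1.0, 0.5, 2.0, 1.0])                    # diagonal entries d_t
U, _ = np.linalg.qr(rng.standard_normal((D, rank)))   # orthonormal columns U_t
lam = np.array([3.0, 1.0])                            # low-rank weights lambda_t

g = rng.standard_normal(D)
preconditioned = lrd_apply(d, U, lam, g)
sample = lrd_noise(d, U, lam, rng)
```

The structured product costs O(D * rank) per step instead of O(D^2), which is the main reason to keep the metric in factored form.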
LRD subspace update modes
Configured under:
inference.sampler.preconditioning.lrd.mode
Modes:
svd: maintain a gradient buffer and periodically update \(U,\lambda\) from batched SVD.
oja: online Oja-style subspace updates with learning rate eta.
Useful LRD knobs:
rank
mode (svd or oja)
update_every (svd mode)
buffer_size (svd mode)
eta / oja_eta (oja mode)
diag_floor
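The two subspace update modes can be sketched generically as follows. This is a textbook-style illustration, not SPIDER's implementation. The function names, the normalization of \(\lambda\) by buffer length, the QR re-orthonormalization, and the moving-average update of \(\lambda\) in the Oja variant are all assumptions.

```python
import numpy as np


def svd_subspace(buffer, rank):
    """svd mode sketch: top-`rank` directions from a buffer of gradients.

    `buffer` has shape (buffer_size, D), one gradient per row.
    """
    _, s, vt = np.linalg.svd(buffer, full_matrices=False)
    U = vt[:rank].T                            # (D, rank), orthonormal columns
    lam = (s[:rank] ** 2) / buffer.shape[0]    # mean squared projection
    return U, lam


def oja_update(U, lam, g, eta):
    """oja mode sketch: one online Oja-style update from a single gradient."""
    proj = U.T @ g                             # coordinates of g in the subspace
    U = U + eta * np.outer(g, proj)            # Oja direction update
    U, _ = np.linalg.qr(U)                     # re-orthonormalize the columns
    lam = (1.0 - eta) * lam + eta * (U.T @ g) ** 2
    return U, lam


rng = np.random.default_rng(0)
D = 5
e1 = np.eye(D)[0]
# Synthetic gradients dominated by the first coordinate direction.
grads = 3.0 * rng.standard_normal((200, 1)) * e1 + 0.01 * rng.standard_normal((200, D))

U_svd, lam_svd = svd_subspace(grads, rank=1)

U_oja, _ = np.linalg.qr(rng.standard_normal((D, 1)))
lam_oja = np.ones(1)
for g in grads:
    U_oja, lam_oja = oja_update(U_oja, lam_oja, g, eta=0.05)
```

On this synthetic data both modes recover the dominant direction. The svd variant pays for a periodic factorization of the whole buffer, while the Oja variant spreads a small cost over every step and needs no buffer.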
What is currently exposed in config
Production path (via backend factory) currently exposes:
psgld
sghmc
rmsprop or lrd preconditioning
There is an additional optimizer class in code (AdaptiveDriftSGLDAdam), but it is not currently selected by inference.sampler.backend.
Practical tuning interpretation
Increase eps to reduce extreme preconditioner amplification.
Increase beta for smoother/slower preconditioner adaptation.
In SGHMC, lower sghmc_alpha to reduce damping; raise it to damp oscillations.
Use freeze_preconditioner_sampling=true for time-homogeneous Phase 4 kernels after adaptation is mature.
Use grad_clip_norm as a safety guardrail when exploring higher learning rates.
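Two of these knobs are easy to check numerically. The sketch below shows how eps caps the RMSprop metric \(G = (\epsilon + \sqrt{v})^{-1}\) when the second moment is near zero, and how a norm-based gradient clip behaves. The helper names are hypothetical, and clipping by global L2 norm is an assumption about what grad_clip_norm does.

```python
import numpy as np


def rmsprop_metric(v, eps):
    """Diagonal metric G = 1 / (eps + sqrt(v)); bounded above by 1 / eps."""
    return 1.0 / (eps + np.sqrt(v))


def clip_by_norm(g, max_norm):
    """Rescale g so its L2 norm is at most max_norm (assumed clipping rule)."""
    norm = np.linalg.norm(g)
    if norm > max_norm:
        return g * (max_norm / norm)
    return g


v_tiny = np.array([0.0])                     # coordinate with vanishing gradient
g_small_eps = rmsprop_metric(v_tiny, 1e-8)   # amplification of about 1e8
g_large_eps = rmsprop_metric(v_tiny, 1e-3)   # amplification of about 1e3

clipped = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)
```

The amplification in a flat coordinate is bounded by 1/eps, so raising eps from 1e-8 to 1e-3 lowers the worst-case scaling by five orders of magnitude.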