# Sampler Options and Mathematical Updates

This page documents sampler backends available in SPIDER and the update equations used in practice.

## Available sampler backends

Configured at:

- `inference.sampler.backend`

Supported values:

- `psgld`
- `sghmc`

Backend selection is handled in `spider.optim.backends.create_sampler_backend`.

## Common runtime conventions

For both backends in SPIDER:

- Drift uses minibatch mean gradient scaled by total observations:
  - $g_{\text{drift}} = N \,\bar g$
- User-configured sampler learning rate is internally scaled per observation at backend creation:
  - $\lambda_{\text{effective}} = \lambda_{\text{config}} / N$
- Noise is off in Phase 2, ramped in Phase 3, and active in Phase 4.

Common controls:

- `temperature`
- `beta`, `eps`
- `freeze_preconditioner_sampling`
- `grad_clip_norm`

## Preconditioner options

Configured at:

- `inference.sampler.preconditioning.enabled`
- `inference.sampler.preconditioning.type`

Supported preconditioner types:

- `rmsprop` (diagonal)
- `lrd` (low-rank plus diagonal)

When preconditioning is enabled, both drift and injected noise are scaled by the same metric.

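
The scaling conventions listed under "Common runtime conventions" interact in a simple way: the drift is multiplied by $N$ and the configured learning rate is divided by $N$, so in the unpreconditioned case their product reduces to the configured rate times the minibatch mean gradient. The following is a minimal sketch of that arithmetic, not SPIDER's implementation. The function names `drift_gradient` and `effective_learning_rate` and all numbers are hypothetical, chosen only for illustration.

```python
def drift_gradient(per_example_grads, n_total):
    """Return g_drift = N * g_bar, where g_bar is the minibatch mean gradient."""
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    g_bar = [sum(g[i] for g in per_example_grads) / batch_size for i in range(dim)]
    return [n_total * gi for gi in g_bar]


def effective_learning_rate(lr_config, n_total):
    """Return lambda_effective = lambda_config / N."""
    return lr_config / n_total


# Hypothetical values, for illustration only.
n_total = 1000                            # total number of observations N
lr_config = 0.5                           # user-configured learning rate
minibatch = [[0.25, -1.0], [0.75, 3.0]]   # two per-example gradients, dimension 2

g_drift = drift_gradient(minibatch, n_total)           # N * mean gradient
lr_eff = effective_learning_rate(lr_config, n_total)   # lambda_config / N

# Unpreconditioned drift step: lambda_effective * g_drift.
step = [lr_eff * g for g in g_drift]
```

Up to floating-point rounding, `step` equals `lr_config` times the minibatch mean gradient, which suggests the configured rate can be read as a step size on the mean-gradient scale regardless of dataset size.
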
## pSGLD backend

Class:

- `spider.optim.sgld.pSGLD`

With diagonal preconditioner $G$, the implemented step is:

$$
\theta_{t+1} = \theta_t - \Big(\lambda\, G_t\, g_{\text{drift}} + \lambda\,\Gamma_t\Big) + \sqrt{2\,\lambda\,T}\,\sqrt{G_t}\,\xi_t
$$

where:

- $\xi_t \sim \mathcal N(0, I)$
- $\Gamma_t$ is an optional diagonal approximation to the pSGLD correction term (enabled by `include_gamma`)
- $G_t = (\epsilon + \sqrt{v_t})^{-1}$ for RMSprop mode

RMSprop second-moment update:

$$
v_t = \beta v_{t-1} + (1-\beta)\,\bar g_t^{\,2}
$$

## SGHMC backend

Class:

- `spider.optim.sghmc.SGHMC`

With momentum $p$, friction $\alpha$, and diagonal $G$:

$$
p_{t+1} = (1-\alpha)\,p_t - \lambda\,G_t\,g_{\text{drift}} + \sqrt{2\alpha\,\lambda}\,\,\sqrt{T}\,\sqrt{G_t}\,\xi_t
$$

$$
\theta_{t+1} = \theta_t + p_{t+1}
$$

RMSprop metric in SGHMC uses bias-corrected second moment:

$$
v_t = \beta v_{t-1} + (1-\beta)\,\bar g_t^{\,2},\qquad \hat v_t = \frac{v_t}{1-\beta^t},\qquad G_t = (\epsilon + \sqrt{\hat v_t})^{-1}
$$

## LRD preconditioner math

Implemented in both samplers via `_build_lrd_metric(...)`.

Metric form:

$$
P_t = \operatorname{diag}(d_t) + U_t\,\operatorname{diag}(\lambda_t)\,U_t^\top
$$

Drift preconditioning:

$$
P_t g = d_t \odot g + U_t\Big(\lambda_t \odot (U_t^\top g)\Big)
$$

Noise is drawn with covariance proportional to $P_t$:

- diagonal part via $\sqrt{d_t}\odot z_1$
- low-rank part via $U_t(\sqrt{\lambda_t}\odot z_2)$

with independent standard-normal $z_1, z_2$.

### LRD subspace update modes

Configured under:

- `inference.sampler.preconditioning.lrd.mode`

Modes:

- `svd`: maintain a gradient buffer and periodically update $U,\lambda$ from batched SVD.
- `oja`: online Oja-style subspace updates with learning rate `eta`.

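
The LRD drift and noise formulas in "LRD preconditioner math" can be evaluated without ever forming the dense matrix $P_t$. The sketch below illustrates this in plain Python and is not the code behind `_build_lrd_metric(...)`. The helper names `lrd_apply` and `lrd_noise` and the example values are hypothetical, and the noise is drawn with covariance exactly $P_t$, leaving out the step-size and temperature factor that the samplers apply on top.

```python
import math
import random


def lrd_apply(d, U, lam, g):
    """Compute P g = d * g + U (lam * (U^T g)) without forming P.

    d: length-n diagonal; U: n x r matrix as a list of rows;
    lam: length-r low-rank weights; g: length-n vector.
    """
    n, r = len(d), len(lam)
    # Project g onto the low-rank basis: U^T g (length r).
    proj = [sum(U[i][k] * g[i] for i in range(n)) for k in range(r)]
    # Scale each projected coordinate by its low-rank weight.
    scaled = [lam[k] * proj[k] for k in range(r)]
    # Map back to parameter space and add the diagonal part.
    return [d[i] * g[i] + sum(U[i][k] * scaled[k] for k in range(r)) for i in range(n)]


def lrd_noise(d, U, lam, rng):
    """Draw one sample with covariance diag(d) + U diag(lam) U^T."""
    n, r = len(d), len(lam)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # drives the diagonal part
    z2 = [rng.gauss(0.0, 1.0) for _ in range(r)]  # drives the low-rank part
    return [
        math.sqrt(d[i]) * z1[i]
        + sum(U[i][k] * math.sqrt(lam[k]) * z2[k] for k in range(r))
        for i in range(n)
    ]


# Hypothetical rank-1 example in three dimensions.
d = [1.0, 2.0, 0.5]
U = [[1.0], [0.0], [0.0]]
lam = [3.0]
g = [1.0, 1.0, 1.0]

pg = lrd_apply(d, U, lam, g)                 # preconditioned drift direction
xi = lrd_noise(d, U, lam, random.Random(0))  # one noise draw
```

Because $z_1$ and $z_2$ are independent, the covariances of the two parts add, giving $\operatorname{diag}(d) + U\operatorname{diag}(\lambda)U^\top$. Both helpers cost $O(nr)$ per call in this form, compared with $O(n^2)$ for a dense matrix-vector product.
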
Useful LRD knobs:

- `rank`
- `mode` (`svd` or `oja`)
- `update_every` (svd mode)
- `buffer_size` (svd mode)
- `eta`/`oja_eta` (oja mode)
- `diag_floor`

## What is currently exposed in config

Production path (via backend factory) currently exposes:

- `psgld`
- `sghmc`
- `rmsprop` or `lrd` preconditioning

There is an additional optimizer class in code (`AdaptiveDriftSGLDAdam`), but it is not currently selected by `inference.sampler.backend`.

## Practical tuning interpretation

- Increase `eps` to reduce extreme preconditioner amplification.
- Increase `beta` for smoother/slower preconditioner adaptation.
- In SGHMC, lower `sghmc_alpha` to reduce damping; raise it to damp oscillations.
- Use `freeze_preconditioner_sampling=true` for time-homogeneous Phase 4 kernels after adaptation is mature.
- Use `grad_clip_norm` as a safety guardrail when exploring higher learning rates.

See also:

- {doc}`sampler-health`
- {doc}`pcg-whitening-convergence`