Grok Challenge #1, Paper 2 - Intrafere Research Group

← PAPER DIRECTORY

This is the 2nd paper and rough draft answer generated for @Grok’s 2/28/2026 challenge to MOTO ASI by Intrafere Research Group. A fully autonomous super intelligence AI harness. Company Website: https://intrafere.com/ Software GitHub that produced this paper: https://github.com/Intrafere/MOTO-Autonomous-ASI ================================================================================ AUTONOMOUS AI SOLUTION Disclaimer: This is an autonomous AI solution generated with the MOTO harness. This paper was not peer reviewed and was autonomously generated without user oversight or interaction beyond the original user prompt, therefore, this text may contain errors. These papers often contain ambitious content and/or extraordinary claims, all content should be viewed with extreme scrutiny. (EDITOR NOTE: This single paper does not attempt to solve the user’s prompt entirely, it is meant to be one piece toward the complex solution required for the users prompt – total solutions typically are achieved in later papers) User’s Research Prompt: Deliver a complete, engineering-ready blueprint for a compact stellarator fusion reactor achieving sustained Q>15 net gain by 2030—using only near-term materials, full MHD/plasma stability models, tritium breeding cycle, and <$5B build cost. Include all equations, sim code, and falsifiable tests. AI Model Authors: z-ai/glm-5, openai/gpt-5.2, x-ai/grok-4, moonshotai/kimi-k2.5 Possible Models Used for Additional Reference: - moonshotai/kimi-k2.5 - openai/gpt-5.2 - x-ai/grok-4 - z-ai/glm-5 Generated: 2026-02-28 ================================================================================ Paper Title: An Implicit-Adjoint and Certified-Constraint Pipeline for Compact Stellarator Optimization Abstract We present an optimization-ready mathematical specification for compact stellarator design in which equilibrium, Boozer-coordinate transforms, stability proxies, bootstrap-current closure, and coil engineering are treated as a single implicitly defined map. The full pipeline state U is defined by one coupled finite-dimensional residual system R(U,p)=0 in the design parameters p, including linear Boozer solves, generalized eigenproblems with normalization constraints, bootstrap fixed-point residuals, and convex coil synthesis represented by KKT conditions. Under standard local well-posedness and nondegeneracy hypotheses, an implicit-adjoint formulation yields end-to-end gradients of any scalar objective or constraint margin using transpose-Jacobian solves and matrix-free JVP/VJP interfaces. To make inequality constraints safe numerical oracles inside gradient-based optimization, we develop modular certification wrappers that return conservative bounds (or explicit “inconclusive” statuses) rather than unaudited point estimates. These include: certified bounds for Boozer-derived quasi-symmetry/ripple metrics from residual norms, lower singular-value estimates, and truncation terms; one-sided stability certification for smallest-eigenvalue margins using Rayleigh-quotient instability detection and residual-based lower bounds requiring separation evidence; certified continuous min/max constraints from discrete samples via Lipschitz/\delta-net envelopes; and coil realizability posed as a convex program with auditable feasibility and outer sensitivities carried by dual variables. Additional layers treat bootstrap consistency, island/stochasticity risk, and prompt-loss proxies within the same diagnostic-and-rejection contract, and combine numerical error bars with linearized robust worst-case inflation under declared parameter uncertainty. The framework provides numerical soundness for the stated discrete problems while explicitly flagging required external validation of the underlying physics proxies. I. Introduction Compact stellarators are commonly designed by iterating a chain of numerical modules: a three-dimensional equilibrium solve determines magnetic geometry; coordinate transforms (often in Boozer angles) provide spectral representations used for quasi-symmetry and ripple metrics; ideal-MHD (or proxy) stability calculations assess margins; bootstrap-current closures couple profiles back into equilibrium; and separate coil-synthesis stages impose engineering and manufacturability constraints. When such a pipeline is embedded inside gradient-based constrained optimization, two mathematical issues become central. First, the end-to-end map from design parameters to objectives and constraint margins is typically defined only implicitly, through nested nonlinear solves, linear systems, eigenproblems, and inner optimization subproblems. Second, many constraint-relevant outputs (e.g., extrema on surfaces, smallest eigenvalues, or spectral norms) are vulnerable to discretization, sampling, and conditioning artifacts that can yield false feasibility or spurious “improvements” if treated as exact. This paper develops an optimization-ready mathematical specification for an end-to-end stellarator design loop that addresses both issues in a single framework. The first component is an implicit-adjoint formulation: we represent the entire pipeline state $\mathbf{U}$ (equilibrium variables, Boozer-transform unknowns, stability eigenpairs, bootstrap current/profile variables, and coil-subproblem primal/dual variables) as the solution of one coupled residual system $\mathcal{R}(\mathbf{U},\mathbf{p})=\mathbf{0}$ for design parameters $\mathbf{p}$. Under standard local well-posedness hypotheses (invertibility of the state Jacobian and suitable nondegeneracy conditions for eigenpairs and KKT systems), the implicit function theorem yields differentiability of $\mathbf{U}(\mathbf{p})$, and adjoint identities provide end-to-end gradients of any scalar objective or constraint margin at the cost of a transpose-Jacobian solve, without forming dense Jacobians. The second component is a “certified constraint” philosophy intended to make inequality constraints safe to use as numerical oracles inside optimization. Rather than treating computed quantities as exact, modules are required (when used for pass/fail decisions) to return conservative bounds or error bars derived from auditable diagnostics such as residual norms, conditioning proxies, separation evidence for eigenvalues, and sampling resolution metadata. Constraints are then evaluated against the conservative bound in the correct inequality direction (e.g., upper bounds for quantities constrained above, lower bounds for stability margins constrained below). When required certificate inputs are unavailable or indicate non-credibility, the pipeline must return an explicit inconclusive status so that the optimizer cannot silently accept numerically fragile evaluations. Within this scope, the paper’s main deliverables are: 1. A unified residual formulation of a coupled equilibrium $\to$ Boozer $\to$ stability $\to$ bootstrap $\to$ coil pipeline, including the representation of generalized eigenproblems via residuals with normalization constraints and the representation of convex coil synthesis via KKT residuals. 2. A block-structured adjoint system for end-to-end sensitivities that mirrors the module couplings and supports matrix-free implementations through JVP/VJP interfaces and bilinear-form contractions (notably for eigen-operator derivatives). 3. Modular certification wrappers for numerically fragile constraints, including: (i) Boozer-derived quasi-symmetry/ripple metrics certified by combining linear residual norms with lower singular-value bounds and declared truncation error terms; (ii) ideal-MHD stability proxy margins certified one-sidedly by Rayleigh-quotient instability detection and, conditionally, by residual-based lower bounds that require explicit separation evidence; (iii) generic min/max constraints certified from discrete samples using Lipschitz/$\delta$-net bounds and adaptive refinement; and (iv) coil realizability posed as a convex program providing auditable feasibility and outer sensitivities via dual variables under KKT credibility checks. 4. Additional physics-coupling layers expressed in the same contract: bootstrap-current self-consistency treated as a fixed-point residual with contraction/conditioning diagnostics that gate sensitivity credibility; a near-integrable island/stochasticity risk certificate to gate the applicability of nested-surface-dependent metrics; and a prompt-loss certificate formulated as a conservative worst-case bound over sampled initial conditions using Lipschitz propagation (Gr\"onwall-type) and explicit integration-error bookkeeping. 5. A robust-optimization wrapper that composes numerical certification error bars with linearized worst-case inflation over declared parameter uncertainty sets, producing auditable decompositions of each robust constraint margin. The paper is explicit about scope and non-claims. All results are stated for finite-dimensional discretizations and provide guarantees only for the declared discrete problems and the declared certificate prerequisites (e.g., singular-value bounds, Lipschitz constants, eigenvalue separation evidence, and integration-error bounds). The certification layers are designed to prevent numerical false positives; they do not validate that any chosen proxy model (e.g., a particular stability eigenproblem, bootstrap closure, near-integrable island estimate, or reduced orbit model) is physically adequate for a targeted device regime. Where such physical adequacy matters, the framework emphasizes “gap flags” and requires intermediate diagnostics and falsifiers that can be compared to external benchmarks or experimental measurements. Roadmap. Section II fixes the optimization variables, state variables, and the evaluation/diagnostics contract. Section III collects the analytic tools used throughout (implicit differentiation, adjoints, eigen-sensitivity basics, residual-based eigenvalue enclosures, Lipschitz sampling bounds, KKT/envelope identities, and linearized robust constraints). Section IV formulates the coupled residual system that defines the end-to-end state. Section V derives the block implicit-adjoint sensitivities, including correctness tests and credibility flags. Sections VI–XII specify certification layers for Boozer metrics, stability margins, bootstrap fixed points, sampled constraints, convex coil realizability, island/stochasticity risk, and prompt-loss proxies. Sections XIII–XV connect boundary shape calculus to discrete gradients, unify robust worst-case margins across modules, and consolidate an implementation contract with mandatory rejection criteria. Section XVI records limitations and external verification requirements, and Section XVII summarizes the resulting optimization-safe, auditable pipeline. II. Problem Setting and High-Level Pipeline Specification We model end-to-end stellarator design as a constrained optimization problem in which a vector of design parameters determines (implicitly, through a chain of numerical physics/engineering solves) a collection of state variables and derived scalar metrics. The purpose of this section is to fix notation, declare the optimization interfaces, and specify what an evaluation of the pipeline must return. II.A. Design variables and parameterization We split design parameters into plasma-geometry/profile variables and coil/engineering variables. Let \[ \mathbf{p} := (\mathbf{p}_{\mathrm{geom}},\mathbf{p}_{\mathrm{prof}},\mathbf{p}_{\mathrm{coil}}) \in \mathcal{P} \subset \mathbb{R}^{n_p}, \] where $\mathcal{P}$ is a feasible set encoding basic bounds (e.g., Fourier truncation ranges, current limits). II.A.1. Plasma boundary geometry and profiles A plasma boundary surface is represented as an embedded torus $\Gamma(\mathbf{p}_{\mathrm{geom}})\subset \mathbb{R}^3$ parameterized by angles $(\theta,\zeta)\in \mathbb{T}^2$ via cylindrical coordinates \[ R(\theta,\zeta) = \sum_{m=0}^{M}\sum_{n=-N}^{N} R_{mn}\cos(m\theta-n\zeta) + \widetilde R_{mn}\sin(m\theta-n\zeta), \] \[ Z(\theta,\zeta) = \sum_{m=0}^{M}\sum_{n=-N}^{N} Z_{mn}\cos(m\theta-n\zeta) + \widetilde Z_{mn}\sin(m\theta-n\zeta), \] with $\mathbf{p}_{\mathrm{geom}}$ collecting the retained coefficients. Profiles (pressure, toroidal current, or related closure parameters) are encoded by a finite parameter vector $\mathbf{p}_{\mathrm{prof}}$. A generic example is a basis expansion on normalized toroidal flux $\psi\in[0,1]$: \[ p(\psi;\mathbf{p}_{\mathrm{prof}})=\sum_{k=1}^{K_p} a_k\varphi_k(\psi),\qquad I(\psi;\mathbf{p}_{\mathrm{prof}})=\sum_{k=1}^{K_I} b_k\chi_k(\psi), \] where $\{\varphi_k\},\{\chi_k\}$ are fixed smooth basis functions and $\{a_k\},\{b_k\}\subset \mathbf{p}_{\mathrm{prof}}$. (The specific profile choices are not assumed physically complete; they are simply a declared parametrization for optimization.) II.A.2. Coil/winding-surface parameterization and currents We allow two equivalent outer-loop representations for external conductors. (1) Winding-surface/current-potential representation. A winding surface $\Gamma_c(\mathbf{p}_{\mathrm{coil}})$ is parameterized similarly to $\Gamma$ (e.g., Fourier coefficients for $R_c,Z_c$), and a scalar current potential $\Phi$ on $\Gamma_c$ is expanded in a finite basis \[ \Phi(\theta,\zeta) = \sum_{j=1}^{n_\Phi} x_j\,\psi_j(\theta,\zeta). \] The outer design vector $\mathbf{p}_{\mathrm{coil}}$ includes the winding-surface geometry and any prescribed net currents; the coefficients $\mathbf{x}:=(x_j)$ are treated as *inner* variables determined by a coil-synthesis subproblem (see II.B and II.C). (2) Discrete coil-curve representation. A fixed number $N_c$ of filamentary coils are given as closed curves $\gamma_k\colon[0,2\pi]\to\mathbb{R}^3$ with finite Fourier descriptions; the design variables include the curve coefficients and currents $I_k$. This representation is compatible with the same constraint types below, with derived fields computed by Biot–Savart evaluation. II.B. State variables across modules Let $\mathbf{U}$ denote the full collection of state variables produced by the pipeline for a given $\mathbf{p}$: \[ \mathbf{U}:=(\mathbf{u}_{\mathrm{eq}},\,\mathbf{y}_{\mathrm{B}},\,\lambda,\,\mathbf{x},\,\mathbf{j}_{\mathrm{bs}},\,\mathbf{z}_{\mathrm{coil}},\,\boldsymbol{\mu}_{\mathrm{coil}}) \in \mathcal{U}. \] The components are declared as follows. 1. Equilibrium degrees of freedom $\mathbf{u}_{\mathrm{eq}}$. These are the unknowns of a 3D equilibrium discretization (e.g., surface shape coefficients over flux, Lagrange multipliers, field representations), determined implicitly by an equilibrium residual. 2. Boozer-transform unknowns $\mathbf{y}_{\mathrm{B}}$. These are the unknowns defining the coordinate transform or its discretized representation (e.g., Fourier coefficients of transformation functions on selected flux surfaces), determined by a linear or nonlinear residual at fixed equilibrium. 3. Stability eigenpair $(\lambda,\mathbf{x})$. Here $\lambda$ is a scalar stability proxy (e.g., smallest generalized eigenvalue of a discretized operator) and $\mathbf{x}$ is the associated eigenvector, subject to a normalization convention. 4. Bootstrap-current profile degrees of freedom $\mathbf{j}_{\mathrm{bs}}$. These encode the bootstrap current (or related parallel current) profile, typically coupled to equilibrium and rotational transform. 5. Coil-certificate variables $(\mathbf{z}_{\mathrm{coil}},\boldsymbol{\mu}_{\mathrm{coil}})$. When the coil stage is posed as a convex optimization subproblem, $\mathbf{z}_{\mathrm{coil}}$ denotes its primal optimizer (e.g., current-potential coefficients $\mathbf{x}$ and slack variables) and $\boldsymbol{\mu}_{\mathrm{coil}}$ denotes the dual variables associated with constraints. These are treated as part of the state because they enable auditable feasibility and provide sensitivities of optimal values. II.C. Scalar objectives and constraints The pipeline exposes scalar observables used by the optimizer. We separate (i) raw computed metrics and (ii) constraints expressed as inequalities in these metrics. II.C.1. Objectives We allow either a single composite objective or a multi-objective formulation. A generic composite objective has the form \[ \min_{\mathbf{p}\in\mathcal{P}}\; \mathcal{J}(\mathbf{p}) := \Phi_0(\mathbf{p}) + \sum_{k=1}^{K} w_k\,\Phi_k\bigl(\mathcal{O}_k(\mathbf{U}(\mathbf{p}),\mathbf{p})\bigr), \] where $\mathcal{O}_k$ are scalar observables derived from $\mathbf{U}$ (examples below) and $\Phi_k$ are smooth penalty or utility functions. Typical objective ingredients include: (a) Quasi-symmetry / ripple measures. Let $\mathbf{b}$ denote a vector of Boozer Fourier coefficients of $B$ (or related spectra). A representative scalar is a weighted $\ell_2$ norm of non-target modes, \[ \mathcal{M}_{\mathrm{QS}} := \|W_{\mathrm{QS}}\,P\mathbf{b}\|_2, \] where $P$ is a fixed mask/projection selecting symmetry-breaking modes and $W_{\mathrm{QS}}$ encodes weighting/units. (b) Stability-margin objectives. If $\lambda$ is a stability proxy with larger values indicating greater margin, then a common choice is $-\lambda$ or a smoothed hinge $\Phi(\lambda)=\max(0,\lambda_\star-\lambda)^2$ with target $\lambda_\star$. (c) Coil complexity / regularity proxies. For current-potential coefficients $\mathbf{x}$, a quadratic regularizer $\|L\mathbf{x}\|_2^2$ (surface gradient/Laplacian-like) is typical, where $L$ is a fixed discrete operator. II.C.2. Constraints We write inequality constraints in the standard form \[ \mathbf{g}(\mathbf{p},\mathbf{U}) \le \mathbf{0}, \] with components representing both physics and engineering constraints. Representative constraints include: (a) Quasi-symmetry / ripple thresholds: \[ \mathcal{M}_{\mathrm{QS}}(\mathbf{U},\mathbf{p}) - \varepsilon_{\mathrm{QS}} \le 0. \] (b) Stability margin thresholds (written as $g\le 0$): \[ \lambda_{\min} - \lambda(\mathbf{U},\mathbf{p}) \le 0. \] (c) Bootstrap self-consistency constraints. If the bootstrap profile is defined by a fixed-point or residual relation $\mathcal{R}_{\mathrm{bs}}(\mathbf{j}_{\mathrm{bs}},\mathbf{u}_{\mathrm{eq}},\mathbf{p})=0$, then enforce either equality (as part of the coupled residual) or a tolerance inequality \[ \|\mathcal{R}_{\mathrm{bs}}(\cdot)\| \le \tau_{\mathrm{bs}}. \] (d) Coil engineering constraints, examples: - Minimum coil–plasma clearance $d_{\mathrm{cp}}$: \[ d_{\min} - d_{\mathrm{cp}}(\mathbf{p},\mathbf{U}) \le 0. \] - Maximum curvature $\kappa_{\max}$ of discrete coil curves: \[ \sup_s \kappa_k(s) - \kappa_{\max} \le 0\quad \text{for each }k. \] - Peak field on coils $B_{\max}$: \[ \sup_{x\in\Gamma_c}\|\mathbf{B}(x)\| - B_{\max} \le 0. \] (e) Manufacturability/geometry feasibility proxies. Examples include bounds on high-order boundary Fourier content \[ \sum_{m,n} w_{mn}\,(R_{mn}^2+\widetilde R_{mn}^2+Z_{mn}^2+\widetilde Z_{mn}^2) - K_{\max} \le 0, \] or additional inequalities that penalize extreme nonconvexity or loss of embedding (treated here as declared constraints to be checked by the implementation). II.D. Optimization statement We express the end-to-end design problem as a constrained nonlinear program with implicitly defined states: \[ \min_{\mathbf{p}\in\mathcal{P}}\; \mathcal{J}(\mathbf{p},\mathbf{U}) \quad\text{subject to}\quad \mathcal{R}(\mathbf{U},\mathbf{p})=\mathbf{0},\qquad \mathbf{g}(\mathbf{p},\mathbf{U})\le\mathbf{0}. \] Here $\mathcal{R}(\mathbf{U},\mathbf{p})=\mathbf{0}$ is an aggregate residual capturing the coupled state definition (equilibrium, transforms, eigenproblems, fixed points, and inner coil synthesis when present). The admissible use of penalty or barrier surrogates is encoded by allowing $\mathcal{J}$ to include smooth terms $\Phi_k$ that penalize constraint violation or proximity to a feasibility boundary, provided the implementation reports the underlying constraint margins $g_i$ alongside the surrogate value. Multi-objective variants are represented either by weights $w_k$ (scalarization) or by seeking Pareto-stationary points; the mathematical interfaces below depend only on having scalar outputs and their derivatives. II.E. Required intermediate observables and a falsifiability/diagnostics contract An evaluation of the pipeline at $\mathbf{p}$ must return not only nominal objective/constraint values but also sufficient diagnostics to detect numerical non-credibility of those values. Concretely, the pipeline is required to output a tuple \[ \mathcal{E}(\mathbf{p}) = \bigl(\widehat{\mathcal{J}},\,\widehat{\mathbf{g}},\,\mathsf{Diag}\bigr), \] where $\widehat{\mathcal{J}}$ and $\widehat{\mathbf{g}}$ are computed values and $\mathsf{Diag}$ is a structured diagnostic record containing (at minimum) the following categories. 1. Residual and convergence norms. For each implicit solve in $\mathcal{R}$, report a norm of the final residual and a termination flag, e.g. \[ \|\mathcal{R}_{\mathrm{eq}}\|,\;\|\mathcal{R}_{\mathrm{B}}\|,\;\|\mathcal{R}_{\mathrm{bs}}\|, \] and solver tolerances used. 2. Discretization metadata. For each computed metric depending on discretization choices, report resolution parameters (e.g., grid sizes, Fourier truncation $M,N$, quadrature order) so that changes in $\widehat{\mathcal{J}},\widehat{\mathbf{g}}$ can be audited against resolution. 3. Conditioning / sensitivity indicators. For linear solves, report a conditioning proxy sufficient to detect ill-posed evaluation (e.g., a lower bound on the smallest singular value estimate or an iterative-solver stagnation indicator). 4. Certified bounds when available. If a module returns an interval-valued guarantee \[ \mathcal{O}(\mathbf{p}) \in [\mathcal{O}^{\mathrm{lo}}(\mathbf{p}),\,\mathcal{O}^{\mathrm{hi}}(\mathbf{p})], \] then constraints must be evaluated against the conservative bound that preserves the intended inequality direction. For example, if the intended constraint is $\mathcal{M}_{\mathrm{QS}}\le \varepsilon_{\mathrm{QS}}$, then the reported conservative constraint value is \[ \mathcal{M}_{\mathrm{QS}}^{\mathrm{hi}} - \varepsilon_{\mathrm{QS}} \le 0, \] while for a stability margin constraint $\lambda\ge\lambda_{\min}$, the conservative constraint value is \[ \lambda_{\min} - \lambda^{\mathrm{lo}} \le 0. \] 5. Mandatory "inconclusive" semantics. If a required diagnostic indicates non-credibility (e.g., solver failure, undefined coordinate transform, missing certificate), then the pipeline must return an explicit status indicating that $\widehat{\mathcal{J}}$ and/or selected components of $\widehat{\mathbf{g}}$ are not to be interpreted as valid pass/fail constraints at $\mathbf{p}$. This requirement prevents silent propagation of numerically invalid metrics into the optimizer. This completes the high-level specification of (i) design inputs $\mathbf{p}$, (ii) state outputs $\mathbf{U}$, (iii) objectives/constraints $(\mathcal{J},\mathbf{g})$, and (iv) the diagnostic record required for auditable evaluation. III. Mathematical Preliminaries This section collects standard analytic tools used to (i) differentiate through implicitly defined numerical solves, and (ii) wrap computed quantities in conservative bounds suitable for inequality constraints. All results are stated for finite-dimensional discretizations; extensions to Banach-space settings follow the same patterns under Fr\'echet differentiability and appropriate invertibility assumptions. III.A. Implicitly defined maps and the implicit function theorem for residual systems Let $\mathcal{R}\colon \mathbb{R}^{n_U}\times \mathbb{R}^{n_p}\to \mathbb{R}^{n_U}$ be continuously differentiable. We consider states $\mathbf{U}\in\mathbb{R}^{n_U}$ implicitly defined by \[ \mathcal{R}(\mathbf{U},\mathbf{p})=\mathbf{0}. \] \textbf{Theorem III.1 (Implicit function theorem; finite-dimensional form).} Suppose $\mathcal{R}\in C^1$ in a neighborhood of $(\mathbf{U}_0,\mathbf{p}_0)$ and $\mathcal{R}(\mathbf{U}_0,\mathbf{p}_0)=\mathbf{0}$. If the Jacobian $\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U}_0,\mathbf{p}_0)\in\mathbb{R}^{n_U\times n_U}$ is invertible, then there exist neighborhoods $\mathcal{N}_p\ni \mathbf{p}_0$ and $\mathcal{N}_U\ni\mathbf{U}_0$ and a unique $C^1$ map $\mathbf{U}(\cdot)\colon \mathcal{N}_p\to \mathcal{N}_U$ such that $\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})=\mathbf{0}$ for all $\mathbf{p}\in\mathcal{N}_p$. Moreover, $\mathbf{U}$ is differentiable and satisfies, for each direction $\mathbf{v}\in\mathbb{R}^{n_p}$, \[ \partial_{\mathbf{U}}\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})\,\mathbf{U}'(\mathbf{p})[\mathbf{v}] + \partial_{\mathbf{p}}\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})\,\mathbf{v} = \mathbf{0}. \] Equivalently, \[ \mathbf{U}'(\mathbf{p})[\mathbf{v}] = -\bigl(\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})\bigr)^{-1}\,\partial_{\mathbf{p}}\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})\,\mathbf{v}. \] The last identity is the basis for directional sensitivities (Jacobian-vector products) via a linear solve with the state Jacobian $\partial_{\mathbf{U}}\mathcal{R}$. III.B. Adjoint sensitivity for scalar objectives (Jacobian transpose solve; JVP/VJP interfaces) Let $\mathcal{J}\colon \mathbb{R}^{n_U}\times \mathbb{R}^{n_p}\to \mathbb{R}$ be differentiable and define the reduced objective $\widehat{\mathcal{J}}(\mathbf{p}) := \mathcal{J}(\mathbf{U}(\mathbf{p}),\mathbf{p})$. Differentiating and inserting Theorem III.1 gives \[ \nabla \widehat{\mathcal{J}}(\mathbf{p}) = \partial_{\mathbf{p}}\mathcal{J}(\mathbf{U},\mathbf{p}) + \bigl(\partial_{\mathbf{U}}\mathcal{J}(\mathbf{U},\mathbf{p})\bigr)\,\mathbf{U}'(\mathbf{p}), \] where $\mathbf{U}=\mathbf{U}(\mathbf{p})$. Directly forming $\mathbf{U}'$ is expensive when $n_p$ is large. The adjoint method removes this dependence. \textbf{Proposition III.2 (Adjoint gradient identity).} Assume $\partial_{\mathbf{U}}\mathcal{R}$ is invertible at $(\mathbf{U}(\mathbf{p}),\mathbf{p})$. Let the adjoint vector $\boldsymbol{\lambda}\in\mathbb{R}^{n_U}$ solve \[ \bigl(\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})\bigr)^T \boldsymbol{\lambda} = \bigl(\partial_{\mathbf{U}}\mathcal{J}(\mathbf{U},\mathbf{p})\bigr)^T. \] Then the total derivative of the reduced objective is \[ \nabla \widehat{\mathcal{J}}(\mathbf{p}) = \partial_{\mathbf{p}}\mathcal{J}(\mathbf{U},\mathbf{p}) - \bigl(\partial_{\mathbf{p}}\mathcal{R}(\mathbf{U},\mathbf{p})\bigr)^T\boldsymbol{\lambda}. \] \textbf{Proof.} By Theorem III.1, for any direction $\mathbf{v}$, $\mathbf{U}'[\mathbf{v}]$ solves $\partial_{\mathbf{U}}\mathcal{R}\,\mathbf{U}'[\mathbf{v}] = -\partial_{\mathbf{p}}\mathcal{R}\,\mathbf{v}$. Hence \[ \frac{d}{d\epsilon}\Big|_{0}\widehat{\mathcal{J}}(\mathbf{p}+\epsilon\mathbf{v}) = \partial_{\mathbf{p}}\mathcal{J}\,\mathbf{v} + \partial_{\mathbf{U}}\mathcal{J}\,\mathbf{U}'[\mathbf{v}]. \] Choose $\boldsymbol{\lambda}$ so that $(\partial_{\mathbf{U}}\mathcal{J})\,\mathbf{U}'[\mathbf{v}] = \boldsymbol{\lambda}^T\partial_{\mathbf{U}}\mathcal{R}\,\mathbf{U}'[\mathbf{v}]$. Then \[ \partial_{\mathbf{U}}\mathcal{J}\,\mathbf{U}'[\mathbf{v}] = -\boldsymbol{\lambda}^T \partial_{\mathbf{p}}\mathcal{R}\,\mathbf{v}, \] which yields the stated identity since $\mathbf{v}$ is arbitrary. $\square$ \textbf{JVP/VJP interfaces.} For large-scale implementations one typically avoids explicit Jacobians. The primal sensitivity uses the \emph{Jacobian-vector product} (JVP) $\mathbf{w}\mapsto \partial_{\mathbf{U}}\mathcal{R}\,\mathbf{w}$, while the adjoint uses the \emph{vector-Jacobian product} (VJP) $\boldsymbol{\lambda}\mapsto (\partial_{\mathbf{U}}\mathcal{R})^T\boldsymbol{\lambda}$. Either can be provided matrix-free (e.g., by algorithmic differentiation or hand-coded linearizations). III.C. Block-structured residuals and adjoint back-substitution Let $\mathbf{U}=(\mathbf{u}_1,\dots,\mathbf{u}_K)$ and suppose the aggregate residual has the block form \[ \mathcal{R}(\mathbf{U},\mathbf{p}) = \begin{pmatrix} \mathcal{R}_1(\mathbf{u}_1,\mathbf{p})\\ \mathcal{R}_2(\mathbf{u}_1,\mathbf{u}_2,\mathbf{p})\\ \vdots\\ \mathcal{R}_K(\mathbf{u}_1,\dots,\mathbf{u}_K,\mathbf{p}) \end{pmatrix}, \] so that each block $\mathcal{R}_k$ depends only on $\mathbf{u}_1,\dots,\mathbf{u}_k$ (a block-lower-triangular dependence in $\mathbf{U}$). Then $\partial_{\mathbf{U}}\mathcal{R}$ is block lower triangular, hence $(\partial_{\mathbf{U}}\mathcal{R})^T$ is block upper triangular. \textbf{Lemma III.3 (Adjoint back-substitution for block-triangular residuals).} Assume each diagonal block $\partial_{\mathbf{u}_k}\mathcal{R}_k$ is invertible. For a scalar objective $\mathcal{J}(\mathbf{U},\mathbf{p})$, the adjoint $\boldsymbol{\lambda}=(\boldsymbol{\lambda}_1,\dots,\boldsymbol{\lambda}_K)$ solving \[ (\partial_{\mathbf{U}}\mathcal{R})^T\boldsymbol{\lambda} = (\partial_{\mathbf{U}}\mathcal{J})^T \] can be computed by back-substitution from $k=K$ down to $k=1$: \[ \bigl(\partial_{\mathbf{u}_k}\mathcal{R}_k\bigr)^T\boldsymbol{\lambda}_k = \bigl(\partial_{\mathbf{u}_k}\mathcal{J}\bigr)^T - \sum_{j=k+1}^{K}\bigl(\partial_{\mathbf{u}_k}\mathcal{R}_j\bigr)^T\boldsymbol{\lambda}_j. \] This abstract statement captures the common situation of nested or sequential solves: the primal variables $\mathbf{u}_k$ are often computed in the forward order $k=1$ to $K$, while the adjoint information propagates in the reverse order $K$ down to $1$. III.D. Eigenvalue problems and sensitivity basics (generalized eigenproblems; normalization constraints) Consider a generalized eigenproblem depending smoothly on a parameter $t$: \[ A(t)\,\mathbf{x}(t) = \lambda(t)\,M(t)\,\mathbf{x}(t), \] where $A(t),M(t)\in\mathbb{R}^{n\times n}$, and assume $M(t)$ is symmetric positive definite. For sensitivity calculations, one must fix an eigenvector normalization; a convenient choice is \[ \mathbf{x}(t)^T M(t)\,\mathbf{x}(t) = 1. \] \textbf{Proposition III.4 (Derivative of a simple \emph{symmetric} generalized eigenvalue).} Assume $A(t),M(t)\in\mathbb{R}^{n\times n}$ are differentiable at $t$, $A(t)=A(t)^T$, $M(t)=M(t)^T$, and $M(t)$ is symmetric positive definite. Let $\lambda(t)$ be a \emph{simple} eigenvalue with eigenvector $\mathbf{x}(t)$ normalized by $\mathbf{x}(t)^T M(t)\,\mathbf{x}(t)=1$. Then \[ \lambda'(t) = \mathbf{x}(t)^T\bigl(A'(t) - \lambda(t) M'(t)\bigr)\mathbf{x}(t). \] \textbf{Proof.} Differentiate $A\mathbf{x}=\lambda M\mathbf{x}$: $A'\mathbf{x}+A\mathbf{x}' = \lambda' M\mathbf{x} + \lambda M'\mathbf{x} + \lambda M\mathbf{x}'$. Left-multiply by $\mathbf{x}^T$. Using symmetry and $A\mathbf{x}=\lambda M\mathbf{x}$ gives $\mathbf{x}^T A\mathbf{x}' = (A\mathbf{x})^T\mathbf{x}' = (\lambda M\mathbf{x})^T\mathbf{x}' = \lambda\,\mathbf{x}^T M\mathbf{x}'$, so the $\mathbf{x}'$-terms cancel, leaving $\mathbf{x}^T A'\mathbf{x} = \lambda'\,\mathbf{x}^T M\mathbf{x} + \lambda\,\mathbf{x}^T M'\mathbf{x}$. With $\mathbf{x}^T M\mathbf{x}=1$, this yields the stated formula. $\square$ \textbf{Remark (nonsymmetric case).} If $A$ and/or $M$ are not symmetric, the formula generally requires a left eigenvector $\mathbf{y}$ satisfying $\mathbf{y}^T A = \lambda\,\mathbf{y}^T M$ and normalization $\mathbf{y}^T M\mathbf{x}=1$, in which case $\lambda'(t)=\mathbf{y}^T\bigl(A'(t)-\lambda(t)M'(t)\bigr)\mathbf{x}$. When eigenvalues are multiple or nearly multiple, differentiability can fail and one must work with invariant subspaces; any implementation relying on Proposition III.4 must therefore detect near-degeneracy (e.g., small spectral gaps) and treat such points as non-generic. III.E. A posteriori eigenvalue bounds as certification primitives Certification of a stability margin often requires a \emph{lower bound} on a minimal eigenvalue that is robust to discretization and solver tolerances. Full rigor depends on the exact operator class; here we record standard residual-based bounds for symmetric problems. Let $B\in\mathbb{R}^{n\times n}$ be symmetric with ordered eigenvalues $\lambda_1\le\cdots\le\lambda_n$. Given a unit vector $\mathbf{v}$ ($\|\mathbf{v}\|_2=1$), define the Rayleigh quotient $\mu:=\mathbf{v}^T B\mathbf{v}$ and residual $\mathbf{r}:=B\mathbf{v}-\mu\mathbf{v}$. \textbf{Lemma III.5 (Residual-to-spectrum distance).} There exists an eigenvalue $\lambda_j$ of $B$ such that \[ |\lambda_j-\mu|\le \|\mathbf{r}\|_2. \] \textbf{Proof.} Expand $\mathbf{v}$ in an orthonormal eigenbasis of $B$. One computes $\|\mathbf{r}\|_2^2=\sum_i (\lambda_i-\mu)^2 |\alpha_i|^2$, where $\alpha_i$ are the expansion coefficients with $\sum_i|\alpha_i|^2=1$. Hence $\|\mathbf{r}\|_2^2\ge \min_i (\lambda_i-\mu)^2$, proving the claim. $\square$ Lemma III.5 yields an \emph{enclosure} $[\mu-\|\mathbf{r}\|,\,\mu+\|\mathbf{r}\|]$ containing at least one eigenvalue. To turn this into a certified \emph{lower bound} on $\lambda_1$, one additionally needs gap information (an upper bound on the next eigenvalue) or a verified spectral separation. A common refinement (Temple-type) is as follows. \textbf{Lemma III.6 (Temple-type lower bound; requires an upper spectral bound).} Let $B$ be symmetric. Suppose one has a number $\beta$ such that $\beta>\mu$ and $\beta\le \lambda_2$ (i.e., $\beta$ is a certified lower bound on the \emph{second} eigenvalue). Then \[ \lambda_1 \ge \mu – \frac{\|\mathbf{r}\|_2^2}{\beta-\mu}. \] \textbf{Remark.} The nontrivial requirement is the verified inequality $\beta\le \lambda_2$. In practice this may be obtained from a separate coarse upper bound on $\lambda_1$ combined with a certified separation, or from a verified interval arithmetic spectral routine. The present paper uses such bounds only as \emph{interfaces}: the numerical method must report enough diagnostics (residual norms, gap estimates, and the claimed enclosure) that downstream constraints can conservatively accept/reject without relying on unverified optimism. For generalized symmetric eigenproblems $A\mathbf{x}=\lambda M\mathbf{x}$ with $M$ SPD, one may reduce to a standard symmetric problem via $B=M^{-1/2}AM^{-1/2}$, applying the above bounds with $\mathbf{v}=M^{1/2}\mathbf{x}/\|M^{1/2}\mathbf{x}\|$, at the cost of requiring norms/residuals in the transformed coordinates. III.F. Lipschitz functions on compact sets, $\delta$-nets, and certified continuous bounds from discrete samples Let $(\Omega,\|\cdot\|)$ be a compact metric space (for instance, a torus parameter domain or a closed curve). A set $G\subset \Omega$ is a $\delta$-net if for every $x\in\Omega$ there exists $g\in G$ with $\|x-g\|\le \delta$. \textbf{Definition III.7 (Lipschitz constant).} A function $f\colon\Omega\to\mathbb{R}$ is $L$-Lipschitz if $|f(x)-f(y)|\le L\|x-y\|$ for all $x,y\in\Omega$. \textbf{Theorem III.8 (Certified min/max bounds from a $\delta$-net).} Let $f$ be $L$-Lipschitz on compact $\Omega$, and let $G\subset\Omega$ be a $\delta$-net. Then \[ \min_{x\in\Omega} f(x) \ge \min_{g\in G} f(g) – L\delta, \qquad \max_{x\in\Omega} f(x) \le \max_{g\in G} f(g) + L\delta. \] \textbf{Proof.} For any $x\in\Omega$ choose $g\in G$ with $\|x-g\|\le\delta$. Then $f(x)\ge f(g)-L\delta$ and $f(x)\le f(g)+L\delta$. Taking min/max over $x$ yields the bounds. $\square$ These inequalities are the basic mechanism for turning sampled checks (e.g., a minimum distance or a maximum field) into conservative constraints that remain valid between sample points, provided $L$ and $\delta$ are declared and auditable. III.G. Convex optimization/KKT conditions and envelope-theorem gradients via dual variables We consider a parameterized convex optimization problem \[ V(\mathbf{p}) := \min_{\mathbf{z}\in\mathbb{R}^{n_z}}\; \phi(\mathbf{z},\mathbf{p}) \quad \text{s.t.}\quad \mathbf{c}(\mathbf{z},\mathbf{p})\le \mathbf{0},\; A\mathbf{z}=\mathbf{b}, \] where for each $\mathbf{p}$, $\phi(\cdot,\mathbf{p})$ is convex, and each component of $\mathbf{c}(\cdot,\mathbf{p})$ is convex. Let the Lagrangian be \[ \mathcal{L}(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu};\mathbf{p}) := \phi(\mathbf{z},\mathbf{p}) + \boldsymbol{\mu}^T\mathbf{c}(\mathbf{z},\mathbf{p}) + \boldsymbol{\nu}^T(A\mathbf{z}-\mathbf{b}),\qquad \boldsymbol{\mu}\ge \mathbf{0}. \] \textbf{Proposition III.9 (Envelope formula under KKT regularity).} Assume strong duality holds at $\mathbf{p}$, and there exist primal-dual optimizers $(\mathbf{z}^*,\boldsymbol{\mu}^*,\boldsymbol{\nu}^*)$ satisfying KKT conditions. If $\phi$ and $\mathbf{c}$ are differentiable in $\mathbf{p}$ at $(\mathbf{z}^*,\mathbf{p})$ and the usual regularity conditions ensuring directional differentiability of $V$ hold, then for any direction $\mathbf{v}\in\mathbb{R}^{n_p}$, \[ V'(\mathbf{p})[\mathbf{v}] = \partial_{\mathbf{p}}\mathcal{L}(\mathbf{z}^*,\boldsymbol{\mu}^*,\boldsymbol{\nu}^*;\mathbf{p})\,\mathbf{v} = \partial_{\mathbf{p}}\phi(\mathbf{z}^*,\mathbf{p})\,\mathbf{v} + (\boldsymbol{\mu}^*)^T\partial_{\mathbf{p}}\mathbf{c}(\mathbf{z}^*,\mathbf{p})\,\mathbf{v}. \] \textbf{Remark.} The key point is that $d\mathbf{z}^*/d\mathbf{p}$ does not appear explicitly; sensitivities are carried by the dual variables $\boldsymbol{\mu}^*$ when the inner problem is convex and solved to KKT optimality. When constraint qualifications fail or solutions are non-unique, the formula may produce a subgradient rather than a classical gradient, and implementations must report diagnostics (duality gap, complementarity, active-set stability) to justify differentiation. III.H. Robust constraints via linearized uncertainty sets and norm bounds Optimization constraints must often be enforced not only at a nominal $\mathbf{p}$ but under small perturbations $\mathbf{p}+\delta\mathbf{p}$ representing manufacturing or diagnostic uncertainty. Let $g\colon\mathbb{R}^{n_p}\to\mathbb{R}$ be differentiable, with intended constraint $g(\mathbf{p})\le 0$. Consider an ellipsoidal uncertainty set \[ \mathcal{U} := \{\delta\mathbf{p}\in\mathbb{R}^{n_p}: \|W\,\delta\mathbf{p}\|_2\le \rho\}, \] where $W$ is invertible and $\rho>0$ is a declared magnitude. \textbf{Proposition III.10 (Linearized robust upper bound).} For any $\mathbf{p}$ and any $\delta\mathbf{p}\in\mathcal{U}$, the first-order upper model satisfies \[ \sup_{\delta\mathbf{p}\in\mathcal{U}} \bigl(g(\mathbf{p}) + \nabla g(\mathbf{p})^T\delta\mathbf{p}\bigr) = g(\mathbf{p}) + \rho\,\|W^{-T}\nabla g(\mathbf{p})\|_2. \] Consequently, the conservative (linearized) robust constraint \[ g(\mathbf{p}) + \rho\,\|W^{-T}\nabla g(\mathbf{p})\|_2 \le 0 \] implies $g(\mathbf{p}) + \nabla g(\mathbf{p})^T\delta\mathbf{p}\le 0$ for all $\delta\mathbf{p}\in\mathcal{U}$. \textbf{Proof.} The support function of $\mathcal{U}$ is computed by the change of variables $\mathbf{q}=W\delta\mathbf{p}$: $ \sup_{\|\mathbf{q}\|\le\rho} \nabla g^T W^{-1}\mathbf{q} = \rho\,\|W^{-T}\nabla g\|_2 $ by Cauchy\u2013Schwarz, with equality at $\mathbf{q}\propto W^{-T}\nabla g$. $\square$ \textbf{Remark (other uncertainty sets).} If $\mathcal{U}=\{\delta\mathbf{p}: \|D\delta\mathbf{p}\|_\infty\le \rho\}$ is a (scaled) box, then $\sup \nabla g^T\delta\mathbf{p} = \rho\,\|D^{-T}\nabla g\|_1$. Such norm expressions are compatible with conic formulations when $g$ is affine (or suitably linearized), and they yield auditable worst-case margins as long as the uncertainty model $(W,\rho)$ (or $(D,\rho)$) is declared. IV. Unified Coupled Implicit System for End-to-End Stellarator Design This section formulates the end-to-end stellarator evaluation as a single coupled residual system whose solution defines the pipeline state $\mathbf{U}(\mathbf{p})$. The aim is to make the dependence of all downstream metrics on upstream numerical solves explicit, so that differentiation and conservative constraint evaluation can be stated at the level of the coupled system. IV.A. Product-manifold view of the coupled design/state space Let $\mathcal{P}\subset\mathbb{R}^{n_p}$ denote the admissible design-parameter set from Section II, and let $\mathcal{U}\subset\mathbb{R}^{n_U}$ denote a finite-dimensional space of discretized state variables. For clarity we decompose \[ \mathbf{U}=(\mathbf{u},\mathbf{y},\mathbf{x},\lambda,\mathbf{j},\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu}), \] where – $\mathbf{u}$ are equilibrium DOFs, – $\mathbf{y}$ are Boozer-transform DOFs, – $(\lambda,\mathbf{x})$ is a generalized eigenpair used as a stability proxy, – $\mathbf{j}$ is a bootstrap-current/profile variable, – $(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu})$ are primal/dual variables for a convex coil-synthesis (or related engineering) subproblem. The coupled design/state space is the Cartesian product $\mathcal{P}\times\mathcal{U}$. The pipeline evaluation at $\mathbf{p}$ is defined as solving a residual equation \[ \mathcal{R}(\mathbf{U},\mathbf{p})=\mathbf{0}, \] and returning derived observables $\mathcal{O}(\mathbf{U},\mathbf{p})$ together with diagnostics (residual norms, conditioning proxies, and any certificates). IV.B. Coupled residual formulation $\mathcal{R}(\mathbf{U},\mathbf{p})=0$ We define the aggregate residual as a block concatenation \[ \mathcal{R}(\mathbf{U},\mathbf{p}) := \begin{pmatrix} \mathcal{R}_{\mathrm{eq}}(\mathbf{u},\mathbf{j};\mathbf{p})\\ \mathcal{R}_{\mathrm{B}}(\mathbf{y};\mathbf{u},\mathbf{p})\\ \mathcal{R}_{\mathrm{stab}}(\lambda,\mathbf{x};\mathbf{u},\mathbf{y},\mathbf{p})\\ \mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p})\\ \mathcal{R}_{\mathrm{coil}}(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu};\mathbf{u},\mathbf{p}) \end{pmatrix} \in \mathbb{R}^{n_U}. \] The individual blocks are defined abstractly as follows. IV.B.1. 3D equilibrium residual Let $\mathcal{R}_{\mathrm{eq}}\colon \mathbb{R}^{n_u}\times\mathbb{R}^{n_j}\times\mathbb{R}^{n_p}\to\mathbb{R}^{n_u}$ denote the discretized equilibrium equations together with any closure equations coupling bootstrap current $\mathbf{j}$ to equilibrium unknowns: \[ \mathcal{R}_{\mathrm{eq}}(\mathbf{u},\mathbf{j};\mathbf{p})=\mathbf{0}. \] This residual is an interface: the only mathematical requirement used later is differentiability (or at least consistent linearization/JVP and VJP access) and a declared notion of convergence $\|\mathcal{R}_{\mathrm{eq}}\|$ used in diagnostics. IV.B.2. Boozer coordinate transform as a linear (or linearized) solve On a chosen set of flux surfaces $\{\psi_\ell\}_{\ell=1}^{n_s}$, define Boozer transform unknowns $\mathbf{y} = (\mathbf{y}_\ell)_\ell$. We encode the transformation by residual equations of the form \[ \mathcal{R}_{\mathrm{B}}(\mathbf{y};\mathbf{u},\mathbf{p}) := \begin{pmatrix} N_1(\mathbf{u},\mathbf{p})\,\mathbf{y}_1 – \mathbf{r}_1(\mathbf{u},\mathbf{p})\\ \vdots\\ N_{n_s}(\mathbf{u},\mathbf{p})\,\mathbf{y}_{n_s} – \mathbf{r}_{n_s}(\mathbf{u},\mathbf{p}) \end{pmatrix} = \mathbf{0}. \] Here each $N_\ell$ is a matrix (or linear operator) produced by the discretization of the Boozer-transform equations at surface $\psi_\ell$, and $\mathbf{r}_\ell$ is the corresponding right-hand side. The linear structure is convenient but not essential; one may replace each row by a nonlinear residual $F_\ell(\mathbf{y}_\ell;\mathbf{u},\mathbf{p})=0$, provided the implementation supplies consistent linearizations. IV.B.3. Stability proxy as a generalized eigenproblem residual Let $A(\mathbf{u},\mathbf{y},\mathbf{p})$ and $M(\mathbf{u},\mathbf{y},\mathbf{p})$ be matrices defining a discretized generalized eigenproblem \[ A\mathbf{x}=\lambda M\mathbf{x}. \] We encode this as a residual with an explicit normalization constraint. One convenient choice is $\mathbf{x}^T M\mathbf{x}=1$, yielding \[ \mathcal{R}_{\mathrm{stab}}(\lambda,\mathbf{x};\mathbf{u},\mathbf{y},\mathbf{p}) := \begin{pmatrix} A(\mathbf{u},\mathbf{y},\mathbf{p})\,\mathbf{x} – \lambda\,M(\mathbf{u},\mathbf{y},\mathbf{p})\,\mathbf{x}\\ \mathbf{x}^T M(\mathbf{u},\mathbf{y},\mathbf{p})\,\mathbf{x} – 1 \end{pmatrix} = \mathbf{0}. \] This representation makes two facts explicit: (i) differentiability of $\lambda$ with respect to inputs is only classical when the eigenvalue is simple (cf. Proposition III.4), and (ii) the normalization equation is part of the coupled system and must be respected by the solver and the adjoint. IV.B.4. Bootstrap-current consistency as a fixed-point (or residual) constraint Let $\mathcal{T}$ denote a mapping that predicts a bootstrap current/profile from equilibrium quantities and parameters. We represent bootstrap self-consistency as a fixed-point residual \[ \mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p}) := \mathbf{j} – \mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p}) = \mathbf{0}. \] This form covers both explicit fixed-point iterations and more general coupled closures, as long as the implementation reports $\|\mathcal{R}_{\mathrm{bs}}\|$ and provides a consistent linearization of $\mathcal{T}$ with respect to $\mathbf{j},\mathbf{u},\mathbf{p}$. IV.B.5. Coil certification subproblem as a differentiable KKT residual When coils are computed by solving a convex optimization subproblem parameterized by upstream quantities, we treat its KKT conditions as part of the coupled residual. Consider the parameterized convex program \[ \min_{\mathbf{z}}\; \phi(\mathbf{z};\mathbf{u},\mathbf{p}) \quad \text{s.t.}\quad \mathbf{c}(\mathbf{z};\mathbf{u},\mathbf{p})\le \mathbf{0},\quad A\mathbf{z}=\mathbf{b}, \] with Lagrange multipliers $\boldsymbol{\mu}\ge \mathbf{0}$ (inequalities) and $\boldsymbol{\nu}$ (equalities). The KKT system can be written as a residual \[ \mathcal{R}_{\mathrm{coil}}(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu};\mathbf{u},\mathbf{p}) := \begin{pmatrix} \nabla_{\mathbf{z}}\phi(\mathbf{z};\mathbf{u},\mathbf{p}) + \nabla_{\mathbf{z}}\mathbf{c}(\mathbf{z};\mathbf{u},\mathbf{p})^T\boldsymbol{\mu} + A^T\boldsymbol{\nu}\\ \operatorname{diag}(\boldsymbol{\mu})\,\mathbf{c}(\mathbf{z};\mathbf{u},\mathbf{p})\\ A\mathbf{z}-\mathbf{b} \end{pmatrix} = \mathbf{0}, \] interpreting the complementarity row in a standard smooth KKT residual form when a differentiable interior-point or smoothing scheme is used (the smoothing parameter, if any, must be declared and included in diagnostics). This representation is consistent with Proposition III.9: outer sensitivities of coil-derived objective/constraint values can be computed from $(\boldsymbol{\mu},\boldsymbol{\nu})$ when duality and regularity are numerically credible. IV.C. Coupling assumptions and failure modes The usefulness of the coupled residual formulation depends on assumptions that may fail for specific parameter values. We record these assumptions explicitly because they determine whether the pipeline evaluation is to be considered conclusive. 1. Invertibility / well-conditioning of linearized blocks. Differentiability via the implicit function theorem requires that $\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})$ be invertible at the solved state. Practically, this can fail through near-singularity of equilibrium linearizations, ill-conditioning of the Boozer linear systems $N_\ell$, or degeneracy in the KKT system (loss of constraint qualification or active-set instability). 2. Existence of nested flux surfaces. The Boozer stage presumes that the equilibrium representation admits a meaningful transform on the chosen surfaces. If islands or stochastic regions are present, the very definition of $\mathbf{y}$ (and thus of Boozer-derived metrics) can be ill-posed. In such cases, the appropriate behavior is to return an explicit inconclusive status rather than a nominal metric value. 3. Eigenvalue non-degeneracy. The stability proxy $\lambda$ is classically differentiable only when it is simple and separated by a nontrivial gap. If spectral gaps collapse, then the mapping $(\mathbf{u},\mathbf{y},\mathbf{p})\mapsto \lambda$ may be nonsmooth; solvers must report gap/conditioning diagnostics so that derivatives and certificate statements are not over-interpreted. 4. Bootstrap fixed-point regularity. The bootstrap residual $\mathbf{j}-\mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p})$ may admit multiple solutions (bifurcation) or be poorly contractive; a small change in $\mathbf{p}$ can then induce a large change in $\mathbf{j}$. In such regimes, both optimization and certification must treat the evaluation as fragile and require additional diagnostics (e.g., a contraction estimate). These are not merely numerical details: they govern whether the end-to-end map $\mathbf{p}\mapsto \mathcal{O}(\mathbf{U}(\mathbf{p}),\mathbf{p})$ is well-defined and differentiable. IV.D. VMEC$\to$SPEC handoff trigger as an explicit decision rule In applications one may have two alternative equilibrium representations (for example, one optimized for nested surfaces and one intended to accommodate islands). We model this as a \,\emph{decision rule} that selects which equilibrium residual is used to define $\mathbf{u}$. Let $\mathcal{R}_{\mathrm{eq}}^{(1)}$ and $\mathcal{R}_{\mathrm{eq}}^{(2)}$ denote two equilibrium residuals (e.g., two discretization/physics models). Define a diagnostic scalar $\eta(\mathbf{u},\mathbf{p})\in\mathbb{R}$ intended to indicate when the nested-surface assumptions required by downstream modules are at risk. A mathematically explicit handoff rule is then a mapping \[ \mathsf{Model}(\mathbf{p}) \in \{1,2\},\qquad \mathsf{Model}(\mathbf{p}) := \begin{cases} 1, & \eta(\mathbf{u}^{(1)}(\mathbf{p}),\mathbf{p}) \le \eta_{\max},\\ 2, & \eta(\mathbf{u}^{(1)}(\mathbf{p}),\mathbf{p}) > \eta_{\max}, \end{cases} \] where $\mathbf{u}^{(1)}(\mathbf{p})$ is the state obtained by solving $\mathcal{R}_{\mathrm{eq}}^{(1)}(\mathbf{u},\mathbf{j};\mathbf{p})=0$ (with a declared termination criterion), and $\eta_{\max}$ is a declared threshold. \textbf{Gap flag.} The validity of any particular choice of $\eta$ and $\eta_{\max}$ as a physics-relevant trigger requires external validation. Within this paper, $\eta$ is treated as an auditable diagnostic: it is recorded and may be used to declare the evaluation inconclusive or to switch models, but no physical claim is inferred from it without an independently justified calibration. With these definitions, the end-to-end state $\mathbf{U}(\mathbf{p})$ is the solution of $\mathcal{R}(\mathbf{U},\mathbf{p})=0$ formed from the chosen equilibrium block and the fixed downstream blocks. This unified residual formulation is the object differentiated by the implicit-adjoint identities of Section III. V. Implicit-Adjoint Sensitivity Derivations for the Full Pipeline We derive end-to-end first-order sensitivities for scalar objectives and constraint margins whose evaluation depends on the coupled residual system of Section IV. Throughout, all statements are for the finite-dimensional discretized pipeline, and differentiability is asserted only at parameter points where (i) the coupled residual admits a locally unique solution and (ii) the relevant linearizations are invertible (or, in nonsmooth cases, replaced by a declared smooth conservative surrogate). V.A. Baseline adjoint identity for a scalar reduced objective Let $\Psi\colon \mathbb{R}^{n_U}\times\mathbb{R}^{n_p}\to\mathbb{R}$ be a scalar quantity computed from the solved state (objective value, a single constraint margin, or any differentiable scalar diagnostic), and define the reduced map \[ \widehat\Psi(\mathbf{p}) := \Psi(\mathbf{U}(\mathbf{p}),\mathbf{p}),\qquad \text{where }\mathcal{R}(\mathbf{U}(\mathbf{p}),\mathbf{p})=\mathbf{0}. \] Assume $\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})$ is invertible at the solution. Then, with the adjoint variable $\boldsymbol{\lambda}\in\mathbb{R}^{n_U}$ defined by \[ \bigl(\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})\bigr)^T\boldsymbol{\lambda} = \bigl(\partial_{\mathbf{U}}\Psi(\mathbf{U},\mathbf{p})\bigr)^T, \] we have the standard reduced-gradient identity \[ \nabla \widehat\Psi(\mathbf{p}) = \partial_{\mathbf{p}}\Psi(\mathbf{U},\mathbf{p}) – \bigl(\partial_{\mathbf{p}}\mathcal{R}(\mathbf{U},\mathbf{p})\bigr)^T\boldsymbol{\lambda}. \] This formula is valid for any scalar $\Psi$ that is differentiable in $(\mathbf{U},\mathbf{p})$ at the solved state and for which the adjoint solve is well-posed. V.B. Block adjoint system for (equilibrium, Boozer, stability, bootstrap, coil) coupling We use the state ordering \[ \mathbf{U}=(\mathbf{u},\mathbf{y},\lambda,\mathbf{x},\mathbf{j},\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu}), \] and the residual blocks from Section IV: \[ \mathcal{R}(\mathbf{U},\mathbf{p}) = \begin{pmatrix} \mathcal{R}_{\mathrm{eq}}(\mathbf{u},\mathbf{j};\mathbf{p})\\ \mathcal{R}_{\mathrm{B}}(\mathbf{y};\mathbf{u},\mathbf{p})\\ \mathcal{R}_{\mathrm{stab}}(\lambda,\mathbf{x};\mathbf{u},\mathbf{y},\mathbf{p})\\ \mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p})\\ \mathcal{R}_{\mathrm{coil}}(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu};\mathbf{u},\mathbf{p}) \end{pmatrix}. \] Let the adjoint split accordingly: \[ \boldsymbol{\lambda}=(\boldsymbol{\lambda}_{\mathrm{eq}},\boldsymbol{\lambda}_{\mathrm{B}},\boldsymbol{\lambda}_{\mathrm{stab}},\boldsymbol{\lambda}_{\mathrm{bs}},\boldsymbol{\lambda}_{\mathrm{coil}}), \] where each component matches the dimension of the corresponding residual block. Writing the adjoint equation $(\partial_{\mathbf{U}}\mathcal{R})^T\boldsymbol{\lambda}=(\partial_{\mathbf{U}}\Psi)^T$ in block form yields one linear system whose structure mirrors the couplings. To make the propagation of adjoint information explicit, we display the nonzero couplings implied by Section IV: – $\mathcal{R}_{\mathrm{coil}}$ depends on $(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu})$ and on $\mathbf{u}$, but not on $\mathbf{y},\lambda,\mathbf{x},\mathbf{j}$. – $\mathcal{R}_{\mathrm{stab}}$ depends on $(\lambda,\mathbf{x})$ and on $(\mathbf{u},\mathbf{y})$. – $\mathcal{R}_{\mathrm{B}}$ depends on $\mathbf{y}$ and on $\mathbf{u}$. – $\mathcal{R}_{\mathrm{eq}}$ and $\mathcal{R}_{\mathrm{bs}}$ couple $\mathbf{u}$ and $\mathbf{j}$ (and $\mathbf{p}$). Consequently, the adjoint system can be solved efficiently by exploiting partial block-triangularity: one may (i) solve the coil adjoint in the coil KKT block, (ii) solve the stability adjoint in the eigen residual block, (iii) solve the Boozer adjoint in the Boozer linear-solve block, and (iv) solve a coupled $(\mathbf{u},\mathbf{j})$ adjoint system that receives right-hand-side contributions from all downstream blocks. V.B.1. Coil adjoint component (KKT block) Let $\mathbf{w}:=(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu})$ denote the coil subproblem primal-dual variables and write $\mathcal{R}_{\mathrm{coil}}(\mathbf{w};\mathbf{u},\mathbf{p})=\mathbf{0}$. The coil portion of the adjoint equation is \[ \bigl(\partial_{\mathbf{w}}\mathcal{R}_{\mathrm{coil}}(\mathbf{w};\mathbf{u},\mathbf{p})\bigr)^T\,\boldsymbol{\lambda}_{\mathrm{coil}} = \bigl(\partial_{\mathbf{w}}\Psi(\mathbf{U},\mathbf{p})\bigr)^T. \] This is precisely the transpose of the KKT Jacobian (or of its smoothed/interior-point residual). If the coil stage is solved by a primal-dual method, the linear system above is typically available (or can be applied) as part of the solver infrastructure. The influence of the coil stage on upstream sensitivities enters through the coupling derivative $\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{coil}}$: once $\boldsymbol{\lambda}_{\mathrm{coil}}$ is known, its contribution to the equilibrium right-hand side is the vector \[ \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{coil}}(\mathbf{w};\mathbf{u},\mathbf{p})\bigr)^T\boldsymbol{\lambda}_{\mathrm{coil}}, \] which must be accumulated into the $\mathbf{u}$-adjoint equation. V.B.2. Stability eigenproblem adjoint component ($\lambda$-row and normalization handling) Recall the stability residual with normalization \[ \mathcal{R}_{\mathrm{stab}}(\lambda,\mathbf{x};\mathbf{u},\mathbf{y},\mathbf{p})= \begin{pmatrix} \mathbf{r}_x\\ r_n \end{pmatrix} := \begin{pmatrix} A(\mathbf{u},\mathbf{y},\mathbf{p})\mathbf{x}-\lambda M(\mathbf{u},\mathbf{y},\mathbf{p})\mathbf{x}\\ \mathbf{x}^T M(\mathbf{u},\mathbf{y},\mathbf{p})\mathbf{x}-1 \end{pmatrix}=\mathbf{0}. \] Let $\boldsymbol{\lambda}_{\mathrm{stab}}=(\boldsymbol{\lambda}_x,\lambda_n)$ where $\boldsymbol{\lambda}_x\in\mathbb{R}^{n_x}$ multiplies $\mathbf{r}_x$ and $\lambda_n\in\mathbb{R}$ multiplies the normalization residual $r_n$. The stability-block adjoint equations are obtained from \[ \bigl(\partial_{(\lambda,\mathbf{x})}\mathcal{R}_{\mathrm{stab}}\bigr)^T \boldsymbol{\lambda}_{\mathrm{stab}} = \bigl(\partial_{(\lambda,\mathbf{x})}\Psi\bigr)^T. \] Writing derivatives explicitly gives \[ \partial_{\mathbf{x}}\mathbf{r}_x = A-\lambda M,\qquad \partial_{\lambda}\mathbf{r}_x = -M\mathbf{x}, \] \[ \partial_{\mathbf{x}} r_n = 2M\mathbf{x},\qquad \partial_{\lambda} r_n = 0. \] Hence the adjoint equations in $(\boldsymbol{\lambda}_x,\lambda_n)$ read \[ (A-\lambda M)^T\boldsymbol{\lambda}_x + (2M\mathbf{x})\lambda_n = \bigl(\partial_{\mathbf{x}}\Psi\bigr)^T, \] \[ -(M\mathbf{x})^T\boldsymbol{\lambda}_x = \partial_{\lambda}\Psi. \] These equations remain meaningful even when $A$ and $M$ are nonsymmetric, provided the residual uses the actual matrices (in which case $(A-\lambda M)^T$ appears). However, solvability can fail when $A-\lambda M$ is singular (as it is at an exact eigenpair). The presence of the normalization row supplies an additional constraint that regularizes the augmented system under the standard nondegeneracy conditions for simple eigenpairs. Once $\boldsymbol{\lambda}_{\mathrm{stab}}$ is computed, its contributions to upstream adjoints are the contractions \[ \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{stab}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{stab}},\qquad \bigl(\partial_{\mathbf{y}}\mathcal{R}_{\mathrm{stab}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{stab}}, \] which involve terms such as $\boldsymbol{\lambda}_x^T(\partial A)\mathbf{x}$, $\boldsymbol{\lambda}_x^T(\partial M)\mathbf{x}$, and $\lambda_n\,\mathbf{x}^T(\partial M)\mathbf{x}$. These can be evaluated matrix-free via user-supplied routines for bilinear forms $(\mathbf{a},\mathbf{b})\mapsto \mathbf{a}^T(\partial A)\mathbf{b}$ and $(\mathbf{a},\mathbf{b})\mapsto \mathbf{a}^T(\partial M)\mathbf{b}$, avoiding explicit formation of $\partial A$ or $\partial M$. V.B.3. Boozer linear-solve adjoint component For each surface $\psi_\ell$, the Boozer residual has the form \[ \mathcal{R}_{\mathrm{B},\ell}(\mathbf{y}_\ell;\mathbf{u},\mathbf{p}) = N_\ell(\mathbf{u},\mathbf{p})\mathbf{y}_\ell-\mathbf{r}_\ell(\mathbf{u},\mathbf{p})=\mathbf{0}. \] Let $\boldsymbol{\lambda}_{\mathrm{B}}=(\boldsymbol{\lambda}_{\mathrm{B},\ell})_\ell$. The Boozer adjoint equation is block diagonal over $\ell$: \[ N_\ell(\mathbf{u},\mathbf{p})^T\boldsymbol{\lambda}_{\mathrm{B},\ell} = \bigl(\partial_{\mathbf{y}_\ell}\Psi(\mathbf{U},\mathbf{p})\bigr)^T. \] Given $\boldsymbol{\lambda}_{\mathrm{B},\ell}$, the contribution of Boozer to the $\mathbf{u}$-adjoint equation is \[ \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{B}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{B}} = \sum_{\ell=1}^{n_s} \Bigl(\bigl(\partial_{\mathbf{u}}N_\ell\bigr)^T\!(\boldsymbol{\lambda}_{\mathrm{B},\ell}\mathbf{y}_\ell^T) – \bigl(\partial_{\mathbf{u}}\mathbf{r}_\ell\bigr)^T\boldsymbol{\lambda}_{\mathrm{B},\ell}\Bigr), \] where the first term denotes the contraction of the third-order tensor $\partial_{\mathbf{u}}N_\ell$ against $\boldsymbol{\lambda}_{\mathrm{B},\ell}$ and $\mathbf{y}_\ell$. In practice, this contraction is supplied as a directional or adjoint-directional derivative evaluation. V.B.4. Equilibrium adjoint component and the coupled ($\mathbf{u},\mathbf{j}$) solve The equilibrium and bootstrap residuals form a coupled subsystem in $(\mathbf{u},\mathbf{j})$: \[ \mathcal{R}_{\mathrm{eq}}(\mathbf{u},\mathbf{j};\mathbf{p})=\mathbf{0},\qquad \mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p})=\mathbf{0}. \] Let $\boldsymbol{\lambda}_{\mathrm{eq}}$ and $\boldsymbol{\lambda}_{\mathrm{bs}}$ be the corresponding adjoint variables. The $(\mathbf{u},\mathbf{j})$-rows of $(\partial_{\mathbf{U}}\mathcal{R})^T\boldsymbol{\lambda}=(\partial_{\mathbf{U}}\Psi)^T$ read \[ \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{eq}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{eq}} + \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{B}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{B}} + \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{stab}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{stab}} + \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{bs}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{bs}} + \bigl(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{coil}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{coil}} = \bigl(\partial_{\mathbf{u}}\Psi\bigr)^T, \] \[ \bigl(\partial_{\mathbf{j}}\mathcal{R}_{\mathrm{eq}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{eq}} + \bigl(\partial_{\mathbf{j}}\mathcal{R}_{\mathrm{bs}}\bigr)^T\boldsymbol{\lambda}_{\mathrm{bs}} = \bigl(\partial_{\mathbf{j}}\Psi\bigr)^T. \] These two equations define a coupled linear system in $(\boldsymbol{\lambda}_{\mathrm{eq}},\boldsymbol{\lambda}_{\mathrm{bs}})$, where the right-hand side of the $\mathbf{u}$-equation contains accumulated contributions from all downstream blocks. A computationally useful decomposition is obtained by writing the $(\mathbf{u},\mathbf{j})$ Jacobian of the primal residual subsystem \[ \mathcal{R}_{\mathrm{uj}}(\mathbf{u},\mathbf{j};\mathbf{p}):=\begin{pmatrix}\mathcal{R}_{\mathrm{eq}}(\mathbf{u},\mathbf{j};\mathbf{p})\\\mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p})\end{pmatrix}, \qquad \partial_{(\mathbf{u},\mathbf{j})}\mathcal{R}_{\mathrm{uj}}= \begin{pmatrix} \partial_{\mathbf{u}}\mathcal{R}_{\mathrm{eq}} & \partial_{\mathbf{j}}\mathcal{R}_{\mathrm{eq}}\\ \partial_{\mathbf{u}}\mathcal{R}_{\mathrm{bs}} & \partial_{\mathbf{j}}\mathcal{R}_{\mathrm{bs}} \end{pmatrix}. \] Then $(\boldsymbol{\lambda}_{\mathrm{eq}},\boldsymbol{\lambda}_{\mathrm{bs}})$ solves \[ \bigl(\partial_{(\mathbf{u},\mathbf{j})}\mathcal{R}_{\mathrm{uj}}\bigr)^T \begin{pmatrix}\boldsymbol{\lambda}_{\mathrm{eq}}\\\boldsymbol{\lambda}_{\mathrm{bs}}\end{pmatrix} = \begin{pmatrix} (\partial_{\mathbf{u}}\Psi)^T – \bigl[(\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{B}})^T\boldsymbol{\lambda}_{\mathrm{B}} + (\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{stab}})^T\boldsymbol{\lambda}_{\mathrm{stab}} + (\partial_{\mathbf{u}}\mathcal{R}_{\mathrm{coil}})^T\boldsymbol{\lambda}_{\mathrm{coil}}\bigr]\\ (\partial_{\mathbf{j}}\Psi)^T \end{pmatrix}. \] This highlights an important implementation point: even if the primal pipeline computes $\mathbf{u}$ and $\mathbf{j}$ by alternating fixed-point iterations, the adjoint must be consistent with the *coupled* linearization at the converged state. V.C. Chain rules through nested solves and matrix-free derivative interfaces The reduced-gradient identity of Section V.A requires two types of quantities: 1. adjoint solves with transposed Jacobians of residual blocks (e.g., $N_\ell^T\boldsymbol{\lambda}_{\mathrm{B},\ell}=\cdot$), and 2. contractions of parameter derivatives of residuals against adjoints (e.g., $(\partial_{\mathbf{p}}\mathcal{R})^T\boldsymbol{\lambda}$). To keep the interface matrix-free, each module is required to supply the following primitives at the solved state (and with declared discretization/tolerance metadata): (a) Residual-JVP: $\mathbf{w}\mapsto \partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})\,\mathbf{w}$, to support Newton-like linearizations and directional checks. (b) Residual-VJP: $\boldsymbol{\lambda}\mapsto (\partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p}))^T\boldsymbol{\lambda}$, to support adjoint solves. (c) Parameter-VJP: $\boldsymbol{\lambda}\mapsto (\partial_{\mathbf{p}}\mathcal{R}(\mathbf{U},\mathbf{p}))^T\boldsymbol{\lambda}$, to assemble $\nabla \widehat\Psi$ without forming $\partial_{\mathbf{p}}\mathcal{R}$. (d) Objective partial derivatives: $\partial_{\mathbf{U}}\Psi$ and $\partial_{\mathbf{p}}\Psi$, again preferably as JVP/VJP actions. For eigenvalue-based quantities, a particularly important contraction is the bilinear form evaluation \[ (\mathbf{a},\mathbf{b})\longmapsto \mathbf{a}^T\bigl(\partial_{\xi}A\bigr)\mathbf{b},\qquad (\mathbf{a},\mathbf{b})\longmapsto \mathbf{a}^T\bigl(\partial_{\xi}M\bigr)\mathbf{b}, \] for $\xi\in\{\mathbf{u},\mathbf{y},\mathbf{p}\}$, since this supports both eigenvalue sensitivities (Proposition III.4) and the adjoint residual contractions needed in Section V.B.2. V.D. Constraint handling in the adjoint: penalties vs. barriers and auditable margins Constraints are expressed as $g_i(\mathbf{U},\mathbf{p})\le 0$. Optimization algorithms may act on an augmented scalar objective built from $\mathcal{J}$ and $\{g_i\}$. Two common smooth choices are: (1) Quadratic (or smooth-hinge) penalty: \[ \Psi_{\rho}(\mathbf{U},\mathbf{p}) := \mathcal{J}(\mathbf{U},\mathbf{p}) + \sum_{i=1}^m \rho_i\,\phi\bigl(g_i(\mathbf{U},\mathbf{p})\bigr), \] where $\phi$ is differentiable (e.g., $\phi(s)=\tfrac12\,\max(0,s)^2$ replaced by a declared smooth approximation). Then \[ \partial_{\mathbf{U}}\Psi_{\rho} = \partial_{\mathbf{U}}\mathcal{J} + \sum_i \rho_i\,\phi'(g_i)\,\partial_{\mathbf{U}}g_i,\qquad \partial_{\mathbf{p}}\Psi_{\rho} = \partial_{\mathbf{p}}\mathcal{J} + \sum_i \rho_i\,\phi'(g_i)\,\partial_{\mathbf{p}}g_i. \] The adjoint method applies with $\Psi$ replaced by $\Psi_{\rho}$. (2) Log barrier (feasible-side only): for $g_i<0$ one may define \[ \Psi_{\mu}(\mathbf{U},\mathbf{p}) := \mathcal{J}(\mathbf{U},\mathbf{p}) - \mu \sum_{i=1}^m \log\bigl(-g_i(\mathbf{U},\mathbf{p})\bigr),\qquad \mu>0. \] Then \[ \partial_{\mathbf{U}}\Psi_{\mu} = \partial_{\mathbf{U}}\mathcal{J} + \mu\sum_i \frac{1}{-g_i}\,\partial_{\mathbf{U}}g_i, \qquad \partial_{\mathbf{p}}\Psi_{\mu} = \partial_{\mathbf{p}}\mathcal{J} + \mu\sum_i \frac{1}{-g_i}\,\partial_{\mathbf{p}}g_i. \] This formula is well-defined only if all $g_i(\mathbf{U},\mathbf{p})<0$. Therefore any barrier-based run must report the minimum margin $\min_i(-g_i)$ and treat attempts to evaluate $\Psi_{\mu}$ with $g_i\ge 0$ as invalid. In both cases, the reduced-gradient identity produces a gradient of the *augmented* scalar. Separately, for auditability, the implementation must report the underlying constraint values $g_i$ (or their conservative certified versions when applicable) so that feasibility is never inferred solely from a surrogate. V.E. Second-order information and Hessian-vector products Second-order information can improve robustness of line searches, trust regions, and uncertainty-aware steps, but full Hessians are typically intractable. A practical interface is the Hessian-vector product $\nabla^2\widehat\Psi(\mathbf{p})\,\mathbf{v}$ for a given direction $\mathbf{v}\in\mathbb{R}^{n_p}$. Assume sufficient smoothness for second derivatives at a point where the coupled solve is well-posed. Let $\mathbf{U}'[\mathbf{v}]$ denote the directional state sensitivity, obtained by solving the linearized primal system \[ \partial_{\mathbf{U}}\mathcal{R}(\mathbf{U},\mathbf{p})\,\mathbf{U}'[\mathbf{v}] = -\partial_{\mathbf{p}}\mathcal{R}(\mathbf{U},\mathbf{p})\,\mathbf{v}. \] Then one may compute $\nabla^2\widehat\Psi\,\mathbf{v}$ without forming $\nabla^2\widehat\Psi$ by differentiating the reduced-gradient expression directionally. One convenient representation is \[ \nabla^2\widehat\Psi(\mathbf{p})\,\mathbf{v} = \partial_{\mathbf{p}\mathbf{p}}\Psi\,\mathbf{v} + \partial_{\mathbf{p}\mathbf{U}}\Psi\,\mathbf{U}'[\mathbf{v}] - \Bigl(\partial_{\mathbf{p}\mathbf{p}}\mathcal{R}\,\mathbf{v} + \partial_{\mathbf{p}\mathbf{U}}\mathcal{R}\,\mathbf{U}'[\mathbf{v}]\Bigr)^T\boldsymbol{\lambda} - \bigl(\partial_{\mathbf{p}}\mathcal{R}\bigr)^T\boldsymbol{\lambda}'[\mathbf{v}], \] where $\boldsymbol{\lambda}$ solves the first-order adjoint equation and $\boldsymbol{\lambda}'[\mathbf{v}]$ is the directional derivative of the adjoint, obtained from the differentiated adjoint system \[ \bigl(\partial_{\mathbf{U}}\mathcal{R}\bigr)^T\boldsymbol{\lambda}'[\mathbf{v}] = \bigl(\partial_{\mathbf{U}\mathbf{U}}\Psi\,\mathbf{U}'[\mathbf{v}] + \partial_{\mathbf{U}\mathbf{p}}\Psi\,\mathbf{v}\bigr)^T - \Bigl(\partial_{\mathbf{U}\mathbf{U}}\mathcal{R}\,\mathbf{U}'[\mathbf{v}] + \partial_{\mathbf{U}\mathbf{p}}\mathcal{R}\,\mathbf{v}\Bigr)^T\boldsymbol{\lambda}. \] This is the standard second-order adjoint framework specialized to a residual-defined state. In many large-scale settings, the principal cost is one additional linear solve with $(\partial_{\mathbf{U}}\mathcal{R})^T$ per Hessian-vector product, plus evaluations of second-derivative contractions. If second derivatives are unavailable, quasi-Newton updates can be used, but then the implementation must treat curvature information as heuristic rather than exact. V.F. Numerical correctness tests for adjoints Because the pipeline couples heterogeneous solvers, gradient correctness must be tested routinely. We record two minimal tests that are module-agnostic. V.F.1. Directional finite-difference check for reduced gradients Fix a parameter point $\mathbf{p}$ where the evaluation is declared conclusive, and pick a random direction $\mathbf{v}$ with $\|\mathbf{v}\|_2=1$. For a small step $h>0$, compute \[ D_{\mathrm{FD}} := \frac{\widehat\Psi(\mathbf{p}+h\mathbf{v})-\widehat\Psi(\mathbf{p}-h\mathbf{v})}{2h}, \qquad D_{\mathrm{ADJ}} := \mathbf{v}^T\nabla\widehat\Psi(\mathbf{p}), \] and form the relative discrepancy \[ \mathrm{err} := \frac{|D_{\mathrm{FD}}-D_{\mathrm{ADJ}}|}{\max(1,|D_{\mathrm{FD}}|,|D_{\mathrm{ADJ}}|)}. \] A run-time acceptance criterion must be declared (e.g., $\mathrm{err}\le \tau$ for a tolerance $\tau$ that accounts for solver tolerances and $h$-sensitivity). If the check fails persistently over multiple random $\mathbf{v}$, gradients are to be treated as non-credible at $\mathbf{p}$, and any optimizer step predicated on them must be rejected. V.F.2. Consistency checks for linearizations (JVP/VJP identity) For any residual block $\mathcal{R}_k$, linearization consistency can be tested by verifying the adjoint identity \[ \boldsymbol{\lambda}^T\bigl(\partial_{\mathbf{U}}\mathcal{R}_k\,\mathbf{w}\bigr) \approx \mathbf{w}^T\bigl((\partial_{\mathbf{U}}\mathcal{R}_k)^T\boldsymbol{\lambda}\bigr) \] for random $\mathbf{w}$ and $\boldsymbol{\lambda}$ at the solved state, using the module-provided JVP and VJP actions. This check isolates transposition/sign errors and mismatched discretization assumptions between primal and adjoint implementations. If either the reduced-gradient check or the JVP/VJP consistency check fails, the pipeline must emit an explicit diagnostic flag and return an inconclusive status for derivative-based decision-making at that parameter point. VI. Certified Numerical Evaluation of Boozer-Derived Metrics (QS/Ripple) as Optimization-Safe Oracles This section specifies a conservative certification layer for scalar metrics computed from a Boozer-coordinate solve. The goal is to prevent an optimization loop from accepting an apparent improvement in quasi-symmetry (QS) or ripple when that improvement may be an artifact of (i) an ill-conditioned Boozer linear system, (ii) an under-resolved discretization, or (iii) solver tolerances. The statements below are purely numerical: they certify what can be concluded about a finite-dimensional discretized Boozer stage given residual norms and conditioning proxies. Any physical interpretation of the certified margins must be validated externally. VI.A. Problem: truncation, grid resolution, and ill-conditioning can create false QS/ripple improvements Let $\mathbf{u}$ be the equilibrium state and consider a discretized Boozer transform on selected flux surfaces. Downstream metrics are often computed from a truncated Fourier representation of $B$ in Boozer angles. Even when the equilibrium solve is converged, the Boozer map can be (nearly) singular or poorly conditioned, so that a small residual $\|N\tilde{\mathbf{y}}-\mathbf{r}\|$ may still correspond to a large solution error $\|\tilde{\mathbf{y}}-\mathbf{y}\|$ when $\sigma_{\min}(N)$ is small. Moreover, additional truncation occurs when mapping $\mathbf{y}$ to a finite coefficient vector $\mathbf{b}$ and then to a scalar metric. To make constraint evaluation safe, we seek a computable bound $\Delta \mathcal{M}\ge 0$ such that the true (discretized) metric value $\mathcal{M}(\mathbf{u})$ satisfies \[ \mathcal{M}(\mathbf{u}) \le \widehat{\mathcal{M}}(\mathbf{u}) + \Delta \mathcal{M}(\mathbf{u}), \] where $\widehat{\mathcal{M}}$ is the nominal value produced by the pipeline. When a constraint is of the form $\mathcal{M}\le \varepsilon$, the optimizer must enforce $\widehat{\mathcal{M}}+\Delta\mathcal{M}\le \varepsilon$, and the evaluation is *inconclusive* if the certificate inputs (conditioning lower bound, truncation bound, etc.) are missing. VI.B. Certification inputs/outputs We fix one surface $\psi$ for notation; multiple surfaces are handled by stacking blocks. VI.B.1. Required operators and maps Assume the Boozer stage is represented (after discretization and possible linearization) by a linear system \[ N(\mathbf{u},\mathbf{p})\,\mathbf{y} = \mathbf{r}(\mathbf{u},\mathbf{p}), \] with computed approximation $\tilde{\mathbf{y}}$. The certification layer requires the following declared ingredients: 1. The linear operator/matrix $N$ and right-hand side $\mathbf{r}$, or the ability to compute products $\mathbf{w}\mapsto N\mathbf{w}$ and $\mathbf{v}\mapsto N^T\mathbf{v}$. 2. A map from Boozer unknowns to a finite vector of Boozer Fourier coefficients $\mathbf{b}$: \[ \mathbf{b} = S(\mathbf{u},\mathbf{p})\,\mathbf{y} + \mathbf{b}_0(\mathbf{u},\mathbf{p}), \] where $S$ is linear in $\mathbf{y}$ (the common case when coefficients are extracted by linear functionals of the solved transform). 3. A fixed projection/mask $P$ selecting undesired modes and an optional additional linear map $Q$ for other quadratic metrics. 4. Declared weights $W$ used in the metric definition. VI.B.2. Residual norms, conditioning proxies, and $\|\Delta\mathbf{b}\|$ bounds Define the Boozer linear residual \[ \mathbf{e} := \mathbf{r} – N\tilde{\mathbf{y}}. \] The certification layer must output: (a) A residual norm $\|\mathbf{e}\|_2$ (or a declared equivalent norm). (b) A *lower bound* $\\underline{\sigma}$ on the smallest singular value $\sigma_{\min}(N)$, so that \[ \\underline{\sigma} \le \sigma_{\min}(N). \] This can be produced, for example, by a dedicated numerical procedure (e.g., Lanczos/Arnoldi estimates with a declared stopping tolerance) or by problem-specific analytic bounds; the method is an implementation choice, but the returned value must be auditable (iteration counts, residuals, and failure flags). (c) A bound $\Delta_{\mathrm{trunc}}$ controlling additional error not represented by the linear residual model (e.g., Fourier truncation, quadrature error). Since such bounds are application-dependent, we treat $\Delta_{\mathrm{trunc}}\ge 0$ as a declared module output; if it is not available, the corresponding metric certificates are inconclusive. VI.B.3. Resolution metadata and solver tolerances The diagnostic record must include resolution parameters (grid sizes, Fourier cutoffs, number of collocation points, selected surfaces $\psi$), and solver tolerances/termination flags for both the Boozer solve and any auxiliary conditioning-estimate procedure. VI.C. Computable error bars for QS/ripple-type metrics and conservative constraint tightening We first bound the Boozer-solve error and then propagate it to coefficient and scalar-metric bounds. VI.C.1. Solution error bound from a singular-value lower bound \textbf{Lemma VI.1 (Linear solve error bound).} Assume $N\in\mathbb{R}^{m\times m}$ is invertible and $\\underline{\sigma}\le \sigma_{\min}(N)$. Then \[ \|\tilde{\mathbf{y}}-\mathbf{y}\|_2 \le \frac{\|\mathbf{e}\|_2}{\\underline{\sigma}}, \] where $\mathbf{y}=N^{-1}\mathbf{r}$ is the exact solution of the discretized linear system. \textbf{Proof.} $\tilde{\mathbf{y}}-\mathbf{y}=N^{-1}(N\tilde{\mathbf{y}}-\mathbf{r})=-N^{-1}\mathbf{e}$. Hence $\|\tilde{\mathbf{y}}-\mathbf{y}\|_2\le \|N^{-1}\|_2\,\|\mathbf{e}\|_2 = \|\mathbf{e}\|_2/\sigma_{\min}(N)\le \|\mathbf{e}\|_2/\\underline{\sigma}$. $\square$ \textbf{Inconclusive cases.} If $\\underline{\sigma}\le 0$ or the conditioning estimate fails, then the bound is undefined and the Boozer-derived metrics must be flagged inconclusive. VI.C.2. Coefficient error bound Assume $\mathbf{b}=S\mathbf{y}+\mathbf{b}_0$ and define the nominal coefficient vector $\tilde{\mathbf{b}}:=S\tilde{\mathbf{y}}+\mathbf{b}_0$. Then \[ \|\mathbf{b}-\tilde{\mathbf{b}}\|_2 \le \|S\|_2\,\|\mathbf{y}-\tilde{\mathbf{y}}\|_2 + \Delta_{\mathrm{trunc}}. \] Using Lemma VI.1 yields the computable certificate \[ \Delta_{\mathbf{b}} := \|S\|_2\,\frac{\|\mathbf{e}\|_2}{\\underline\sigma} + \Delta_{\mathrm{trunc}}, \qquad \text{so that }\|\mathbf{b}-\tilde{\mathbf{b}}\|_2 \le \Delta_{\mathbf{b}}. \] Here $\|S\|_2$ may be computed directly if $S$ is explicit and moderate-sized, or bounded by a declared estimate if only operator actions are available. VI.C.3. QS/ripple metric bound for weighted $\ell_2$ norms A common QS metric is \[ \mathcal{M}_{\mathrm{QS}}(\mathbf{b}) := \|W\,P\mathbf{b}\|_2. \] Define the nominal metric $\widehat{\mathcal{M}}_{\mathrm{QS}}:=\|W\,P\tilde{\mathbf{b}}\|_2$. Then by the triangle inequality, \[ \mathcal{M}_{\mathrm{QS}}(\mathbf{b}) \le \widehat{\mathcal{M}}_{\mathrm{QS}} + \|W P\|_2\,\|\mathbf{b}-\tilde{\mathbf{b}}\|_2 \le \widehat{\mathcal{M}}_{\mathrm{QS}} + \\underbrace{\|W P\|_2\,\Delta_{\mathbf{b}}}_{=:\Delta\mathcal{M}_{\mathrm{QS}}}. \] Thus the certified upper bound is \[ \mathcal{M}_{\mathrm{QS}}^{\mathrm{hi}} := \widehat{\mathcal{M}}_{\mathrm{QS}} + \Delta\mathcal{M}_{\mathrm{QS}}. \] For a constraint $\mathcal{M}_{\mathrm{QS}}\le \varepsilon_{\mathrm{QS}}$, the conservative constraint value to be enforced is \[ \mathcal{M}_{\mathrm{QS}}^{\mathrm{hi}} – \varepsilon_{\mathrm{QS}} \le 0. \] VI.C.4. Other quadratic-form ripple surrogates If a ripple surrogate is a quadratic form $\mathcal{R}(\mathbf{b})=\|Q\mathbf{b}\|_2$ for a declared linear map $Q$, the same argument yields \[ \mathcal{R}(\mathbf{b}) \le \|Q\tilde{\mathbf{b}}\|_2 + \|Q\|_2\,\Delta_{\mathbf{b}}. \] For scalar quadratic forms $\mathbf{b}^T H\mathbf{b}$ with symmetric $H$, one may instead use $|\mathbf{b}^T H\mathbf{b}-\tilde{\mathbf{b}}^T H\tilde{\mathbf{b}}|\le \|H\|_2\,(\|\mathbf{b}\|_2+\|\tilde{\mathbf{b}}\|_2)\,\|\mathbf{b}-\tilde{\mathbf{b}}\|_2$, but this requires an additional bound on $\|\mathbf{b}\|_2$, which can be conservatively taken as $\|\tilde{\mathbf{b}}\|_2+\Delta_{\mathbf{b}}$. VI.D. Refinement triggers and inexact-oracle rules for optimization The certificate $\Delta\mathcal{M}_{\mathrm{QS}}$ quantifies numerical uncertainty in a metric. Two algorithm-agnostic rules make this usable inside a gradient-based constrained optimizer. VI.D.1. Refinement triggers based on margin-to-uncertainty ratio For a constraint $\mathcal{M}\le \varepsilon$, define the certified margin \[ \mathrm{margin}_{\mathrm{cert}} := \varepsilon – (\widehat{\mathcal{M}}+\Delta\mathcal{M}). \] A minimal refinement trigger is: – If $\mathrm{margin}_{\mathrm{cert}}\ge 0$, the constraint is certified feasible at the declared discretization. – If $\mathrm{margin}_{\mathrm{cert}}<0$ but $\varepsilon-\widehat{\mathcal{M}}\ge 0$, feasibility is *uncertain* (nominally feasible but not certified); refinement is required before declaring feasibility. - If $\varepsilon-\widehat{\mathcal{M}}<0$, the constraint is violated even nominally. A common additional trigger is to require a relative tightness condition such as \[ \Delta\mathcal{M} \le \gamma\,\max(1,\widehat{\mathcal{M}}) \] for a declared $\gamma\in(0,1)$, before treating the metric as sufficiently resolved for design decisions. VI.D.2. Inexact-oracle acceptance logic (monotone case) Consider a scalar merit function $\varphi(\mathbf{p})$ used by an optimizer (objective, penalty merit, or barrier merit). Suppose the pipeline can produce a certified bound \[ \varphi(\mathbf{p}) \le \widehat\varphi(\mathbf{p}) + \Delta\varphi(\mathbf{p}), \] where $\Delta\varphi$ is computed from the same certificate primitives (residual norms, $\\underline\sigma$, truncation bounds). Then a conservative sufficient condition to accept a trial step $\mathbf{p}_{\mathrm{new}}$ as improving (in the sense of upper bounds) is \[ \widehat\varphi(\mathbf{p}_{\mathrm{new}}) + \Delta\varphi(\mathbf{p}_{\mathrm{new}}) \le \widehat\varphi(\mathbf{p}_{\mathrm{old}}) - c\,\mathrm{pred} + \Delta\varphi(\mathbf{p}_{\mathrm{old}}), \] where $\mathrm{pred}\ge 0$ is the optimizer's predicted decrease and $c\in(0,1)$ is a declared acceptance constant. If the certificate is missing or $\Delta\varphi$ is too large to verify inequality with practical steps, then the evaluation is inconclusive and the correct response is refinement (or, if refinement is unavailable, rejection of the trial step). VI.E. Smooth conservative approximations for nonsmooth metric ingredients Some metrics include nonsmooth operations, most commonly $|\cdot|$, $\max$, or $\sup$ over a discrete set. Since the adjoint framework in Section V assumes differentiability, any nonsmooth ingredient used inside a gradient-based optimizer must be replaced by a declared smooth surrogate. A minimal example is the absolute value. Replace $|x|$ by \[ |x|_{\varepsilon} := \sqrt{x^2+\varepsilon^2},\qquad \varepsilon>0\ \text{declared}, \] which satisfies $|x|\le |x|_{\varepsilon}\le |x|+\varepsilon$. Thus, if a metric $\mathcal{M}$ is built by composing $|\cdot|$ with sums/norms, one obtains an explicit additional conservative inflation term controlled by $\varepsilon$. The value of $\varepsilon$ must be recorded in diagnostics because it changes both the objective and its gradients. Similarly, $\max_i a_i$ may be approximated by $\mathrm{softmax}_\tau(a):=\tau\log\sum_i \exp(a_i/\tau)$, for a declared temperature $\tau>0$, with the inequality $\max_i a_i\le \mathrm{softmax}_\tau(a)\le \max_i a_i + \tau\log n$. This provides a controlled conservative relaxation when $\max$ enters a constraint. VI.F. Gap flag: mapping from numerical certificate margins to physical accept/reject decisions The certificate bounds above are statements about the *discretized Boozer stage and the induced discretized metric*. Whether a certified margin $\varepsilon-(\widehat{\mathcal{M}}+\Delta\mathcal{M})$ is sufficient to claim physically adequate quasi-symmetry/ripple performance requires external validation (e.g., by comparison to trusted higher-fidelity calculations and/or experimental observables). Within this paper, the certification layer only guarantees that the optimizer does not exploit numerical non-credibility: any feasibility/stationarity claim is conditional on the declared certificate primitives and on the interpretation of the underlying metric. VII. Certified Ideal-MHD Stability Margin Layer This section specifies a numerical certification layer for a discretized ideal-MHD stability proxy formulated as a generalized eigenvalue problem. The role of the layer is to output a *conservative* lower bound on a stability margin (when possible), together with diagnostics and refinement triggers, so that an optimization loop cannot silently treat an under-resolved or ill-conditioned eigen computation as a verified stability improvement. All certification statements below are about the chosen *discrete* generalized eigenproblem. Whether the proxy is physically adequate for a target regime is a separate question. VII.A. Stability proxies and metrics We assume stability is summarized by a scalar proxy $\lambda$ derived from a generalized eigenproblem. The sign convention is chosen so that larger $\lambda$ indicates greater stability margin, and the feasibility constraint is \[ \lambda(\mathbf{u},\mathbf{y},\mathbf{p}) \ge \lambda_{\min}. \] In the most common situation one takes $\lambda$ to be the smallest eigenvalue $\lambda_1$ of a symmetric generalized eigenproblem (Section VII.B). In that case, instability may be certified one-sidedly by exhibiting any admissible trial vector with negative Rayleigh quotient, while stability ($\lambda_1>0$) requires a *lower bound* on $\lambda_1$. The module interface therefore returns both nominal and certified quantities: – nominal smallest-eigenpair estimate $(\tilde\lambda,\tilde{\mathbf{x}})$, – certificate outputs $\lambda^{\mathrm{lo}}\le \lambda_1 \le \lambda^{\mathrm{hi}}$ (when available), – diagnostics: residual norms, normalization error, gap/conditioning estimates, and discretization metadata. VII.B. Discretized generalized eigenproblem and normalization conventions Let $A=A(\mathbf{u},\mathbf{y},\mathbf{p})\in\mathbb{R}^{n\times n}$ and $M=M(\mathbf{u},\mathbf{y},\mathbf{p})\in\mathbb{R}^{n\times n}$ satisfy \[ A=A^T,\qquad M=M^T,\qquad M \succ 0. \] We define eigenpairs $(\lambda,\mathbf{x})$ by \[ A\mathbf{x} = \lambda M\mathbf{x}, \] together with the normalization \[ \mathbf{x}^T M\mathbf{x} = 1. \] Given a computed approximate eigenvector $\tilde{\mathbf{x}}$, we will always form a normalized vector \[ \mathbf{v} := \frac{\tilde{\mathbf{x}}}{\sqrt{\tilde{\mathbf{x}}^T M\tilde{\mathbf{x}}}}, \] provided $\tilde{\mathbf{x}}^T M\tilde{\mathbf{x}}>0$. If this quantity is nonpositive (numerical failure or non-SPD $M$), the eigen stage is declared inconclusive. Define the generalized Rayleigh quotient \[ \mu := \frac{\mathbf{v}^T A\mathbf{v}}{\mathbf{v}^T M\mathbf{v}} = \mathbf{v}^T A\mathbf{v}\quad\text{(since }\mathbf{v}^T M\mathbf{v}=1\text{)}, \] and the associated residual \[ \mathbf{r} := A\mathbf{v} – \mu M\mathbf{v}. \] The pair $(\mu,\mathbf{r})$ is computable from operator actions $A\mathbf{v}$ and $M\mathbf{v}$, and it is the primitive from which several certificates follow. VII.C. A posteriori eigenvalue certification We state two complementary certification outputs: (i) a robust one-sided *instability* detection, and (ii) a conditional *lower bound* on $\lambda_1$ when additional separation information is available. VII.C.1. One-sided instability proof (negative Rayleigh quotient) \textbf{Proposition VII.1 (Rayleigh quotient upper bound on $\lambda_1$).} For symmetric $(A,M\succ 0)$, the smallest generalized eigenvalue satisfies \[ \lambda_1 \le \mu(\mathbf{v}) := \frac{\mathbf{v}^T A\mathbf{v}}{\mathbf{v}^T M\mathbf{v}} \] for every nonzero $\mathbf{v}$. \textbf{Proof.} By the variational characterization of the smallest generalized eigenvalue, $\lambda_1 = \min_{\mathbf{w}\ne 0} \frac{\mathbf{w}^T A\mathbf{w}}{\mathbf{w}^T M\mathbf{w}}$. Evaluating at $\mathbf{v}$ yields the claim. $\square$ \textbf{Corollary VII.2 (Certified instability).} If there exists $\mathbf{v}$ with $\mu(\mathbf{v})<0$, then $\lambda_1<0$. In particular, returning any $\mathbf{v}$ with negative Rayleigh quotient is a one-sided certificate of instability for the discrete proxy. This corollary does not require residual control: it uses only the computed quotient. The tradeoff is that it certifies only the existence of an unstable direction, not a stability margin. VII.C.2. Residual-based enclosures and a Temple-type lower bound To obtain a certified \emph{lower bound} on $\lambda_1$, one needs information beyond $\mu$ alone. We use a residual-based enclosure coupled with a separation hypothesis, expressed here as a certified lower bound on $\lambda_2$. Let $B := M^{-1/2} A M^{-1/2}$ (symmetric) and define $\mathbf{w} := M^{1/2}\mathbf{v}$, so $\|\mathbf{w}\|_2^2 = \mathbf{v}^T M\mathbf{v} = 1$. Then $\mu=\mathbf{w}^T B\mathbf{w}$, and the standard (non-generalized) residual is \[ \mathbf{r}_B := B\mathbf{w} - \mu\mathbf{w}. \] A computable surrogate for $\|\mathbf{r}_B\|_2$ is obtained without explicitly forming $B$: since $ \mathbf{r}_B = M^{-1/2}(A\mathbf{v} - \mu M\mathbf{v}) = M^{-1/2}\mathbf{r}, $ we have \[ \|\mathbf{r}_B\|_2^2 = \mathbf{r}^T M^{-1}\mathbf{r}. \] Thus the stability module must either (i) provide the exact value $\mathbf{r}^T M^{-1}\mathbf{r}$ by solving $M\mathbf{q}=\mathbf{r}$ (SPD solve) and forming $\mathbf{r}^T\mathbf{q}$, or (ii) provide an auditable bound $\mathbf{r}^T M^{-1}\mathbf{r} \le \eta_r$ computed by an iterative method with declared tolerance. \textbf{Lemma VII.3 (Eigenvalue enclosure from residual).} Let $B$ be symmetric with eigenvalues $\theta_1\le\cdots\le\theta_n$. For $\|\mathbf{w}\|_2=1$, define $\mu=\mathbf{w}^T B\mathbf{w}$ and $\mathbf{r}_B=B\mathbf{w}-\mu\mathbf{w}$. Then there exists $j$ such that \[ |\theta_j-\mu|\le \|\mathbf{r}_B\|_2. \] Consequently, the generalized eigenproblem has an eigenvalue $\lambda_j$ with \[ |\lambda_j-\mu|\le \|\mathbf{r}_B\|_2 = \sqrt{\mathbf{r}^T M^{-1}\mathbf{r}}. \] \textbf{Temple-type certified lower bound.} Assume additionally that a number $\beta$ is available such that \[ \beta > \mu \quad\text{and}\quad \beta \le \lambda_2. \] Then applying Lemma III.6 to $B$ yields \[ \lambda_1 \ge \mu – \frac{\|\mathbf{r}_B\|_2^2}{\beta-\mu} = \mu – \frac{\mathbf{r}^T M^{-1}\mathbf{r}}{\beta-\mu}. \] We define the certified lower bound \[ \lambda^{\mathrm{lo}} := \mu – \frac{\eta_r}{\beta-\mu}, \] where $\eta_r$ is a certified upper bound on $\mathbf{r}^T M^{-1}\mathbf{r}$. If either $\beta$ is unavailable, $\beta\le\mu$, or $\eta_r$ is not trustworthy (solver failure), then $\lambda^{\mathrm{lo}}$ is not defined and the stability certificate is inconclusive. \textbf{Interface requirement (gap evidence).} The nontrivial input is the statement $\beta\le\lambda_2$. The method by which $\beta$ is produced is implementation-dependent (e.g., verified gap estimates, complementary subspace bounds, or validated enclosures), but the stability module must report: – the numerical procedure used to obtain $\beta$, – its termination metrics, – a failure flag when the evidence is insufficient. The outer optimization loop is permitted to treat $\lambda^{\mathrm{lo}}$ as a certified bound only when these diagnostics indicate success. VII.C.3. Certified constraint value and rejection rule Given a certified lower bound $\lambda^{\mathrm{lo}}$, the conservative stability constraint value (written as $g\le 0$) is \[ g_{\mathrm{stab,cert}} := \lambda_{\min} – \lambda^{\mathrm{lo}}. \] The stability layer enforces the rule: \textbf{No stability claim without positive certificate margin.} If $\lambda^{\mathrm{lo}}\le \lambda_{\min}$ (or $\lambda^{\mathrm{lo}}$ is undefined), then the evaluation is either (i) certified infeasible (when $\lambda^{\mathrm{lo}}$ is defined and violates) or (ii) inconclusive (when the certificate cannot be formed). In neither case may the optimizer record the design point as feasible for stability. VII.C.4. Adaptive refinement driven by certificate tightness Let $\tilde\lambda$ denote the nominal smallest eigenvalue returned by an eigensolver. Define the (certificate) uncertainty width \[ \Delta\lambda := \tilde\lambda – \lambda^{\mathrm{lo}} \quad (\text{when }\lambda^{\mathrm{lo}}\text{ defined}). \] A minimal refinement trigger is: – if $\Delta\lambda$ exceeds a declared tolerance (absolute or relative), refine the eigen discretization/solver tolerances until either (i) $\Delta\lambda$ shrinks sufficiently or (ii) the module declares it cannot certify with available resources. When $\lambda^{\mathrm{lo}}$ is undefined, refinement is mandatory if stability feasibility is needed for decision-making. VII.D. Sensitivity of certified margins For optimization, one seeks derivatives of a stability constraint margin. The nominal eigenvalue derivative $d\tilde\lambda/d\mathbf{p}$ is obtained through the coupled adjoint framework of Section V (or by Proposition III.4 for symmetric problems when the smallest eigenvalue is simple and the eigenpair is well-conditioned). For a certified margin, a conservative differentiable surrogate can be formed by differentiating the certificate expression where it is smooth. Assume: – $\beta$ is treated as constant with respect to $\mathbf{p}$ over a local region where the same certification routine and separation evidence applies, or the routine provides a differentiable $\beta(\mathbf{p})$; – $\eta_r$ is defined as $\eta_r = \mathbf{r}^T M^{-1}\mathbf{r}$ computed to a tolerance tight enough that derivative noise is dominated by outer optimization tolerances. Define \[ \lambda_{\mathrm{cert}}(\mathbf{p}) := \mu(\mathbf{p}) – \frac{\eta_r(\mathbf{p})}{\beta(\mathbf{p})-\mu(\mathbf{p})}. \] Whenever $\beta-\mu>0$ and all ingredients are differentiable, the directional derivative in direction $\mathbf{v}$ is \[ \lambda_{\mathrm{cert}}'[\mathbf{v}] = \mu'[\mathbf{v}] – \frac{\eta_r'[\mathbf{v}](\beta-\mu) – \eta_r(\beta’-\mu’)}{(\beta-\mu)^2}. \] Here $\mu'[\mathbf{v}]$ can be evaluated by differentiating the Rayleigh quotient (using the adjoint/JVP interfaces for $A$ and $M$), and $\eta_r'[\mathbf{v}]$ can be evaluated by differentiating $\mathbf{r}^T M^{-1}\mathbf{r}$ via the implicit differentiation of the SPD solve $M\mathbf{q}=\mathbf{r}$ (treating $\mathbf{q}=M^{-1}\mathbf{r}$ as an implicitly defined state). If these derivatives are not credible (e.g., near eigenvalue multiplicity or certification routine discontinuity), the stability module must flag derivative outputs as inconclusive. Because $\lambda^{\mathrm{lo}}$ is intended to be conservative, any smooth surrogate used for gradient-based steps must be reported alongside the exact conservative bound used for pass/fail decisions. VII.E. Minimal reproducible stability report per sample When stability metrics are evaluated over a family of samples (e.g., a grid in $(\psi,\alpha)$ or other parameters entering the discretization), the stability layer must output per sample a record containing at minimum: 1. Discretization metadata: basis size/mode truncations, quadrature, and any sample coordinates. 2. Nominal outputs: $\tilde\lambda$, normalized $\mathbf{v}$, and the Rayleigh quotient $\mu$ used for certification. 3. Residual diagnostics: $\|\mathbf{r}\|_2$, the normalization defect $|\mathbf{v}^T M\mathbf{v}-1|$, and the value (or bound) $\eta_r\approx \mathbf{r}^T M^{-1}\mathbf{r}$. 4. Separation diagnostics: the reported $\beta$ and the procedure status that supports interpreting $\beta\le\lambda_2$. 5. Certificate: $\lambda^{\mathrm{lo}}$ (and, if computed, $\lambda^{\mathrm{hi}}$ or an enclosure width), together with a boolean flag indicating whether the certificate is valid. This record is designed so that (i) an optimizer can make conservative feasibility decisions, and (ii) an auditor can trace feasibility claims to numerical evidence. VII.F. Gap flag: physical adequacy of proxies must be externally validated The certification layer above can prevent numerical false positives for the *chosen discretized proxy problem*. It does not establish that the proxy faithfully predicts physical ideal-MHD stability limits in the intended operational regime. Any mapping from $\lambda^{\mathrm{lo}}>0$ (for the proxy) to a physical accept/reject decision requires external validation against higher-fidelity calculations and/or experimental stability boundaries. The stability module must therefore treat physical adequacy as an explicit gap: it can certify what the discrete proxy implies, but not that the proxy is the correct physics model. VIII. Bootstrap-Current Self-Consistency as a Differentiable Constraint This section treats bootstrap-current self-consistency as an explicit component of the end-to-end implicit system. The mathematical purpose is twofold: (i) to identify conditions under which the bootstrap fixed point is well-posed (existence/uniqueness and controlled sensitivity), and (ii) to expose a conservative constraint layer that prevents optimization from operating near bifurcation/ill-conditioning where the coupled map $\mathbf{p}\mapsto \mathbf{j}_{\mathrm{bs}}$ can become non-robust. All statements are about the chosen finite-dimensional discretization of the bootstrap closure. Any claim that the closure accurately predicts experimental bootstrap currents requires external calibration/verification. VIII.A. Bootstrap-equilibrium coupling as a fixed-point problem VIII.A.1. Map formulation $\mathbf{j}=\mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p})$ and residual form We adopt the fixed-point representation already introduced in Section IV.B.4: \[ \mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p}) := \mathbf{j}-\mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p}) = \mathbf{0}, \] where $\mathbf{u}$ denotes equilibrium degrees of freedom and $\mathbf{p}$ are design parameters (geometry, profiles, coil parameters). The mapping $\mathcal{T}$ is an interface for a chosen bootstrap model coupled to the equilibrium representation (e.g., via neoclassical closures, profile reconstructions, or reduced surrogate fits). For the sensitivity and certification logic below, the only assumed structure is differentiability in the discretized variables at points where the pipeline declares the model applicable. VIII.A.2. Contraction condition $L(\mathbf{p})<1$ for uniqueness and rapid convergence Fix a norm $\|\cdot\|$ on $\mathbb{R}^{n_j}$. For fixed $(\mathbf{u},\mathbf{p})$, suppose there is a closed set $\mathcal{J}\subset\mathbb{R}^{n_j}$ such that $\mathcal{T}(\cdot;\mathbf{u},\mathbf{p})$ maps $\mathcal{J}$ into itself and is a contraction on $\mathcal{J}$: \[ \|\mathcal{T}(\mathbf{j}_1;\mathbf{u},\mathbf{p})-\mathcal{T}(\mathbf{j}_2;\mathbf{u},\mathbf{p})\| \le L(\mathbf{u},\mathbf{p})\,\|\mathbf{j}_1-\mathbf{j}_2\|\quad\forall \mathbf{j}_1,\mathbf{j}_2\in\mathcal{J}, \] for some $L(\mathbf{u},\mathbf{p})<1$. \textbf{Theorem VIII.1 (Banach fixed point; discrete bootstrap closure).} If $\mathcal{T}(\cdot;\mathbf{u},\mathbf{p})$ is a contraction on a complete metric space $(\mathcal{J},\|\cdot\|)$ with constant $L<1$, then there exists a unique fixed point $\mathbf{j}^*\in\mathcal{J}$ satisfying $\mathbf{j}^*=\mathcal{T}(\mathbf{j}^*;\mathbf{u},\mathbf{p})$. Moreover, the Picard iterates $\mathbf{j}^{k+1}=\mathcal{T}(\mathbf{j}^k;\mathbf{u},\mathbf{p})$ satisfy \[ \|\mathbf{j}^k-\mathbf{j}^*\| \le \frac{L^k}{1-L}\,\|\mathbf{j}^1-\mathbf{j}^0\|. \] This theorem motivates using a contraction estimate both as (i) a solver-health diagnostic and (ii) an optimization-safe feasibility constraint: if $L$ approaches 1, then uniqueness and sensitivity deteriorate. VIII.B. Certified feasibility constraint and robustness penalties VIII.B.1. Enforce $L(\mathbf{p})\le L_{\max}<1$ In a coupled pipeline, $L$ is evaluated at the solved state $(\mathbf{u}(\mathbf{p}),\mathbf{j}(\mathbf{p}))$. A natural local contraction proxy is the operator norm \[ L(\mathbf{p}) := \bigl\|\partial_{\mathbf{j}}\mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p})\bigr\|,\qquad (\mathbf{u},\mathbf{j})=(\mathbf{u}(\mathbf{p}),\mathbf{j}(\mathbf{p})), \] where the norm is induced by the chosen vector norm on $\mathbb{R}^{n_j}$. Since $L$ is rarely computed exactly, we treat contraction certification analogously to other certificates: the bootstrap module returns \[ \widehat L \approx L,\qquad \Delta L\ge 0\ \text{such that}\ L\le \widehat L + \Delta L =: L^{\mathrm{hi}}. \] The quantity $L^{\mathrm{hi}}$ is the conservative value used for pass/fail decisions. \textbf{Certified contraction constraint.} Declare a design-independent threshold $L_{\max}<1$ and enforce \[ L^{\mathrm{hi}}(\mathbf{p}) \le L_{\max}. \] If $L^{\mathrm{hi}}$ cannot be produced (missing derivative/conditioning evidence), the bootstrap evaluation is inconclusive and must not be treated as feasible. \textbf{Implementation note (auditable norm estimation).} For Euclidean norms, $\|\partial_{\mathbf{j}}\mathcal{T}\|_2$ can be estimated by power iteration using Jacobian-vector products $\mathbf{v}\mapsto \partial_{\mathbf{j}}\mathcal{T}\,\mathbf{v}$, with reported iteration residuals and a stopping rule. The resulting $\widehat L$ is a lower bound on the true $\|\cdot\|_2$ only in special cases; therefore, for certification, the module must either (i) provide an explicit upper bound $L^{\mathrm{hi}}$ (e.g., via a conservative bound on $\|\cdot\|$), or (ii) mark the estimate as non-certified. VIII.B.2. Penalize proximity to bifurcation Even when $L^{\mathrm{hi}}<1$ holds, optimization may be drawn toward regions where $L$ is close to 1, amplifying both numerical and physical uncertainty. A smooth penalty that discourages this is \[ \Pi(L) := \frac{1}{1-L},\qquad L<1. \] To retain differentiability and avoid singularity at $L=1$, one may use a declared smooth conservative surrogate, e.g. \[ \Pi_{\varepsilon}(L) := \frac{1}{\max(\varepsilon,1-L)},\qquad \varepsilon>0\ \text{declared}, \] or an equivalent smooth approximation consistent with the optimizer’s requirements. Any such $\varepsilon$ must be recorded in diagnostics because it changes the objective landscape. VIII.C. Bifurcation/degeneracy conditions and operational implications The fixed-point residual can lose well-posedness when the Jacobian of $\mathcal{R}_{\mathrm{bs}}$ in $\mathbf{j}$ becomes singular: \[ \partial_{\mathbf{j}}\mathcal{R}_{\mathrm{bs}} = I – \partial_{\mathbf{j}}\mathcal{T}. \] Two distinct but related failure modes are important for an optimization-safe workflow: 1. \textbf{Non-uniqueness / branching.} If $I-\partial_{\mathbf{j}}\mathcal{T}$ becomes singular, multiple nearby solutions may coexist, and the mapping $\mathbf{p}\mapsto \mathbf{j}(\mathbf{p})$ may be set-valued or discontinuous. 2. \textbf{Sensitivity blow-up.} Even before singularity, $\|(I-\partial_{\mathbf{j}}\mathcal{T})^{-1}\|$ can grow large, magnifying perturbations in $\mathbf{u}$ or $\mathbf{p}$ into large changes in $\mathbf{j}$. This degrades robustness to manufacturing errors, reconstruction uncertainty, and numerical tolerances. For these reasons, the contraction (or resolvent) information is treated as part of feasibility, not merely as solver metadata. VIII.D. Differentiation through the fixed point At parameter points where $\mathcal{R}_{\mathrm{bs}}(\mathbf{j};\mathbf{u},\mathbf{p})=0$ and $\partial_{\mathbf{j}}\mathcal{R}_{\mathrm{bs}}$ is invertible, $\mathbf{j}$ is a differentiable function of $(\mathbf{u},\mathbf{p})$ by Theorem III.1. For a directional perturbation $(\delta\mathbf{u},\delta\mathbf{p})$, differentiation of $\mathbf{j}-\mathcal{T}(\mathbf{j};\mathbf{u},\mathbf{p})=0$ gives \[ \bigl(I-\partial_{\mathbf{j}}\mathcal{T}\bigr)\,\delta\mathbf{j} = \partial_{\mathbf{u}}\mathcal{T}\,\delta\mathbf{u} + \partial_{\mathbf{p}}\mathcal{T}\,\delta\mathbf{p}, \] so that \[ \delta\mathbf{j} = \bigl(I-\partial_{\mathbf{j}}\mathcal{T}\bigr)^{-1}\Bigl(\partial_{\mathbf{u}}\mathcal{T}\,\delta\mathbf{u} + \partial_{\mathbf{p}}\mathcal{T}\,\delta\mathbf{p}\Bigr). \] This linear solve is exactly the primal linearization associated with the bootstrap residual block in the coupled system of Section IV, and the corresponding transpose solve is the bootstrap adjoint component in Section V.B.4. \textbf{Diagnostic requirement.} Any evaluation that uses $\delta\mathbf{j}$ (directly or via adjoints) must report a conditioning proxy for $I-\partial_{\mathbf{j}}\mathcal{T}$ (e.g., an estimate of $\sigma_{\min}(I-\partial_{\mathbf{j}}\mathcal{T})$ or an iterative-solver growth/stagnation indicator). When this proxy indicates near-singularity, derivative outputs are to be flagged as non-credible. VIII.E. Coupling to stability margins and coil constraints; sensitivity upper bounds for manufacturing errors The bootstrap variable $\mathbf{j}$ couples into the equilibrium and therefore affects any downstream metric $\Psi(\mathbf{U}(\mathbf{p}),\mathbf{p})$. When robustness to perturbations $\delta\mathbf{p}$ is required, a basic bound follows from the fixed-point sensitivity identity: \[ \|\delta\mathbf{j}\| \le \bigl\|\bigl(I-\partial_{\mathbf{j}}\mathcal{T}\bigr)^{-1}\bigr\|\,\Bigl(\|\partial_{\mathbf{u}}\mathcal{T}\|\,\|\delta\mathbf{u}\| + \|\partial_{\mathbf{p}}\mathcal{T}\|\,\|\delta\mathbf{p}\|\Bigr). \] Under the contraction hypothesis $\|\partial_{\mathbf{j}}\mathcal{T}\|\le L<1$, one has the resolvent bound \[ \bigl\|\bigl(I-\partial_{\mathbf{j}}\mathcal{T}\bigr)^{-1}\bigr\| \le \frac{1}{1-L}. \] Thus, if conservative upper bounds $L^{\mathrm{hi}}\ge L$, $\|\partial_{\mathbf{u}}\mathcal{T}\|\le C_u$, and $\|\partial_{\mathbf{p}}\mathcal{T}\|\le C_p$ are available (each with auditable provenance), then \[ \|\delta\mathbf{j}\| \le \frac{1}{1-L^{\mathrm{hi}}}\,(C_u\,\|\delta\mathbf{u}\| + C_p\,\|\delta\mathbf{p}\|). \] This type of estimate is useful as a conservative screening rule: even if nominal constraints pass, a large factor $1/(1-L^{\mathrm{hi}})$ indicates that small uncertainties could invalidate stability or coil feasibility. When combined with linearized robust constraints (Section III.H) applied to a scalar constraint $g(\mathbf{p})\le 0$, one may treat the bootstrap sensitivity amplification as contributing to $\|\nabla g\|$ via the end-to-end adjoint gradient. The key point is interpretive: the robust penalty $\|W^{-T}\nabla g\|$ becomes large precisely in regions where $\mathbf{j}$ is ill-conditioned, so bootstrap ill-posedness is naturally reflected in robust optimization metrics—provided gradients are credible and the bootstrap block is not near singular. VIII.F. Proposed experimental falsifiers The bootstrap fixed-point structure yields intermediate observables that are, in principle, testable without relying on reactor-regime conditions. A minimal falsifier tied directly to the present mathematics is: - \textbf{Pressure-perturbation $\iota$-response test.} If the closure implies a predicted change $\delta\mathbf{j}$ (and therefore a predicted change in rotational transform $\delta\iota(\psi)$ through the equilibrium response) under a controlled pressure-profile perturbation, then measured $\delta\iota$ can be compared to the model-predicted response. Disagreement beyond declared uncertainty bands is evidence that either (i) the map $\mathcal{T}$ is miscalibrated, (ii) the operating point is outside the model's domain of validity, or (iii) the coupled equilibrium response is inconsistent with the assumed discretized interface. This paper does not claim that such comparisons will succeed; it records the required observable pathway so that incorrect bootstrap closures can be rejected or localized. VIII.G. Gap flag: contraction constants and the bootstrap model require external calibration/verification The contraction constant $L$, the derivative operators $\partial\mathcal{T}$, and any inferred robustness bounds are properties of the chosen bootstrap closure and discretization. Whether (and where) the closure is physically adequate, and whether the computed contraction/conditioning diagnostics meaningfully correlate with experimental behavior, require external validation. Within this pipeline, the role of the bootstrap layer is therefore conditional: it can enforce numerical well-posedness and expose sensitivity amplification, but it cannot by itself establish physical correctness. IX. Certified Constraint Evaluation from Discrete Samples via Lipschitz Bounds Many engineering and geometric constraints in stellarator design are naturally expressed as extrema of a scalar field evaluated over a compact domain (a torus parameter space, a coil centerline, or a winding surface). In practice these extrema are computed on a discrete grid. This section provides a conservative wrapper that turns discrete samples into certified continuous bounds, provided a Lipschitz constant is supplied (or conservatively bounded) in the sampling coordinates. The wrapper is generic and can be applied to coil clearance, curvature, peak field, and analogous gridded constraints. IX.A. Generic theorem: discrete sampling $\to$ certified continuous min/max bounds on compact domains Let $(\Omega,\|\cdot\|)$ be a compact metric space and let $f\colon \Omega\to\mathbb{R}$ be $L$-Lipschitz. Let $G\subset \Omega$ be a $\delta$-net, i.e., for every $x\in\Omega$ there exists $g\in G$ with $\|x-g\|\le \delta$. Define the sampled extrema \[ \\underline f_G := \min_{g\in G} f(g),\qquad \overline f_G := \max_{g\in G} f(g). \] Then the continuous extrema obey the certified bounds \[ \min_{x\in\Omega} f(x) \ge \\underline f_G - L\delta,\qquad \max_{x\in\Omega} f(x) \le \overline f_G + L\delta. \] (These inequalities are immediate from the Lipschitz property and the $\delta$-net definition; cf. Theorem III.8.) \textbf{Certified envelopes.} For downstream use we define \[ \min_{\Omega}^{\mathrm{lo}} f := \\underline f_G - L\delta,\qquad \max_{\Omega}^{\mathrm{hi}} f := \overline f_G + L\delta. \] Any feasibility decision involving $\min_{\Omega} f$ or $\max_{\Omega} f$ must be based on these conservative quantities (unless a stronger certificate is available). IX.B. Adaptive refinement strategy to reduce $\delta$ (and tighten certificates) The certificate tightness is controlled by the product $L\delta$. Since $L$ is often set by geometry and physics, refinement typically aims to reduce $\delta$. A minimal adaptive strategy is: 1. Choose an initial sample set $G_0\subset\Omega$ with associated covering radius $\delta_0$. 2. Evaluate $f$ on $G_k$, compute $\\underline f_{G_k}$ and $\overline f_{G_k}$, and form the certified bounds $\min_{\Omega}^{\mathrm{lo}} f$, $\max_{\Omega}^{\mathrm{hi}} f$. 3. If the certificate is inconclusive for decision-making (e.g., a constraint depends on $\max_{\Omega} f$ and $\max_{\Omega}^{\mathrm{hi}} f$ is too close to the threshold, or the width $2L\delta_k$ is too large relative to the desired margin), refine to $G_{k+1}$ such that $\delta_{k+1}\le \eta\,\delta_k$ for a declared $\eta\in(0,1)$, and repeat. \textbf{Stopping conditions.} Refinement ends when either (i) the required feasibility/violation decision is certified with a declared margin, or (ii) a declared resource limit is reached (maximum samples, wall-clock budget), in which case the constraint evaluation must be returned as \emph{inconclusive}. \textbf{Metadata requirement.} Each certificate must report $\delta$ (or enough information to reconstruct it), the norm used in $\Omega$, the Lipschitz bound $L$ and its provenance, and the sampled extrema $(\\underline f_G,\overline f_G)$. IX.C. Embedding/non-self-intersection checks as feasibility constraints Geometric feasibility of parameterized surfaces and curves is often required as a prerequisite for interpreting downstream metrics. True embeddedness is global and difficult to certify generically; here we record two conservative, differentiable proxy constraints that can be certified via sampling. IX.C.1. Immersion (non-degenerate surface metric) proxy Let a surface be parameterized by $X\colon \mathbb{T}^2\to\mathbb{R}^3$, $X=X(\theta,\zeta)$, and define the (unsigned) area element \[ J(\theta,\zeta) := \|\partial_\theta X(\theta,\zeta)\times \partial_\zeta X(\theta,\zeta)\|_2. \] A necessary condition for immersion is $J(\theta,\zeta)>0$ for all $(\theta,\zeta)$. Given a grid $G\subset\mathbb{T}^2$ with covering radius $\delta$, suppose $J$ is Lipschitz with constant $L_J$ in the chosen torus metric. Then \[ \min_{\mathbb{T}^2} J \ge \min_{g\in G} J(g) – L_J\delta. \] Thus a conservative immersion constraint is \[ J_{\min}^{\mathrm{lo}} := \min_{g\in G} J(g) – L_J\delta \ge J_{\star} > 0, \] where $J_{\star}$ is a declared safety threshold. If $L_J$ is not available, the check is inconclusive and must not be treated as passed. IX.C.2. Separation (self-approach) proxy To detect near self-intersection, define a sampled separation quantity on a set of parameter pairs. Let $G\subset\mathbb{T}^2$ and let $\mathcal{P}\subset G\times G$ exclude near-diagonal pairs (e.g., enforce $\|(\theta_1,\zeta_1)-(\theta_2,\zeta_2)\|\ge \delta_{\mathrm{diag}}$ in the torus metric). Define \[ S(\xi_1,\xi_2) := \|X(\xi_1)-X(\xi_2)\|_2,\qquad \xi_i\in\mathbb{T}^2. \] If $S$ is Lipschitz on the restricted pair-domain with constant $L_S$, then \[ \min S \ge \min_{(g_1,g_2)\in \mathcal{P}} S(g_1,g_2) – L_S\,\delta_{\mathrm{pair}}, \] where $\delta_{\mathrm{pair}}$ is the covering radius in the pair-domain metric. Enforcing a positive certified lower bound provides an auditable proxy for avoiding self-approach at the sampled resolution. \textbf{Remark.} These checks are proxies: immersion and pairwise separation bounds do not prove global embeddedness in full generality. They are nevertheless useful as optimization-safe feasibility filters because they prevent the optimizer from exploiting grossly degenerate parameterizations. IX.D. How to wrap any gridded constraint $c(x)\le 0$ into a certified constraint $c_{\mathrm{cert}}$ IX.D.1. Correct sign handling for max vs. min constraints Constraints commonly take one of the following forms, with $\Omega$ compact: (1) \textbf{Max-type constraint} (peak value must be below a limit): \[ \max_{x\in\Omega} f(x) \le b. \] A conservative sufficient condition is \[ \max_{\Omega}^{\mathrm{hi}} f \le b, \] where $\max_{\Omega}^{\mathrm{hi}} f$ is computed from samples via $\overline f_G + L\delta$. (2) \textbf{Min-type constraint} (clearance must exceed a limit): \[ \min_{x\in\Omega} f(x) \ge a. \] A conservative sufficient condition is \[ \min_{\Omega}^{\mathrm{lo}} f \ge a, \] where $\min_{\Omega}^{\mathrm{lo}} f=\\underline f_G-L\delta$. In the standard NLP form $g(\mathbf{p})\le 0$, these become \[ \text{Max-type:}\quad g_{\mathrm{cert}} := \bigl(\max_{\Omega}^{\mathrm{hi}} f\bigr) – b \le 0, \qquad \text{Min-type:}\quad g_{\mathrm{cert}} := a – \bigl(\min_{\Omega}^{\mathrm{lo}} f\bigr) \le 0. \] IX.D.2. Composite constraints with multiple sampled fields If a constraint depends on multiple fields, e.g. \[ \max_{x\in\Omega} \bigl(f_1(x)+f_2(x)\bigr) \le b, \] one may either (i) certify the sum directly by applying a Lipschitz bound for $f_1+f_2$ with constant $L_1+L_2$, or (ii) certify each $f_i$ and combine conservatively: $ \max(f_1+f_2) \le \max f_1 + \max f_2 $ with each max replaced by its certified upper bound. IX.D.3. Obtaining Lipschitz constants The wrapper requires an auditable bound $L$. Two common sources are: (a) \textbf{Analytic bounds from derivatives.} If $f$ is differentiable and $\Omega$ is parameterized by coordinates $\xi$, then \[ L \ge \sup_{\xi\in\Omega} \|\nabla_{\xi} f(\xi)\|_* , \] with $\|\cdot\|_*$ the dual norm. The supremum itself can be certified using the same sampling method applied to $\|\nabla f\|_*$, if a Lipschitz bound for $\|\nabla f\|_*$ is also available, yielding a two-level certificate. (b) \textbf{Problem-specific global Lipschitz facts.} Some functions admit global Lipschitz bounds independent of design parameters (e.g., distance-to-a-fixed-set is 1-Lipschitz in ambient Euclidean coordinates). When such a bound is used, the metric and norm with respect to which the bound holds must be stated explicitly; a bound in ambient coordinates does not automatically yield a bound in parameter space unless the parameterization map has a certified Jacobian bound. If no certified $L$ is available, the constraint evaluation must be marked inconclusive. IX.D.4. Mandatory reporting of sampled vs. certified values (unit/consistency checks) For each wrapped constraint, the pipeline must report at least: \[ \\underline f_G,\ \overline f_G,\ \delta,\ L,\ \min_{\Omega}^{\mathrm{lo}} f,\ \max_{\Omega}^{\mathrm{hi}} f, \] and the resulting conservative constraint value $g_{\mathrm{cert}}$. This ensures that (i) the magnitude of the certificate correction $L\delta$ is visible, and (ii) unit mismatches (e.g., $f$ in Tesla but $L$ computed in Tesla/radian with the wrong coordinate scaling) can be detected. IX.E. Gap flag: availability/rigor of Lipschitz constants for complex derived fields may be limited in practice The sampling certificate is only as rigorous as the Lipschitz bound $L$ and the covering-radius estimate $\delta$. For complex derived quantities (e.g., peak $|\mathbf{B}|$ on a deformed winding surface computed by Biot–Savart plus numerical quadrature), obtaining a tight, auditable $L$ can be difficult. The appropriate pipeline behavior in such cases is not to ignore the certificate but to (i) report that $L$ is heuristic/non-certified and therefore the continuous constraint is inconclusive, or (ii) substitute a more conservative overestimate of $L$ and accept the resulting (possibly loose) certificate. Any feasibility claim made using these bounds is conditional on the correctness of $L$ and on the declared coordinate metric. X. Coil Realizability as a Convex Certified Subproblem with Differentiable Outer Sensitivities This section specifies a coil-realizability layer posed as a parameterized convex optimization problem whose solution provides (i) an auditable feasibility certificate for engineering constraints, and (ii) outer-loop sensitivities compatible with the implicit-adjoint framework. The mathematical claims are conditional on standard convex-optimization regularity (strong duality, constraint qualification, and numerical KKT credibility). The layer certifies feasibility for the chosen discretization of fields and constraints; any mapping to as-built coil performance requires separate validation. X.A. Surface-current-potential ($\Phi$-based) representation and why convexity matters Let $\Gamma_c=\Gamma_c(\mathbf{p}_{\mathrm{coil}})\subset\mathbb{R}^3$ be a winding surface parameterized by $(\theta,\zeta)\in\mathbb{T}^2\mapsto X_c(\theta,\zeta)$. A surface current density $\mathbf{K}$ on $\Gamma_c$ can be represented (in many coil-synthesis formulations) via a scalar potential $\Phi$ so that $\mathbf{K}$ is linear in surface derivatives of $\Phi$. We treat the discretization abstractly and assume a finite basis expansion \[ \Phi(\theta,\zeta)=\sum_{j=1}^{n_\Phi} x_j\,\psi_j(\theta,\zeta),\qquad \mathbf{x}=(x_j)\in\mathbb{R}^{n_\Phi}. \] Downstream quantities used in engineering constraints (e.g., normal-field error on a target surface, peak current density on $\Gamma_c$, peak magnetic field magnitude on $\Gamma_c$) are modeled as affine or convex functions of $\mathbf{x}$ once geometry and upstream equilibrium are fixed. Convexity is critical: if the inner coil stage is convex and solved to KKT optimality, then (i) infeasibility is a definitive statement for the discretized model, and (ii) outer sensitivities can be obtained from dual variables (Section III.G) without differentiating through the primal optimizer iterates. X.B. Convex objective and regularization Fix upstream inputs $(\mathbf{u},\mathbf{p})$ that determine geometry-dependent operators. We write a generic convex quadratic objective \[ \min_{\mathbf{x}\in\mathbb{R}^{n_\Phi}}\; \frac12\|W_0\,(A(\mathbf{u},\mathbf{p})\mathbf{x}-\mathbf{b}(\mathbf{u},\mathbf{p}))\|_2^2 + \frac{\alpha}{2}\|L(\mathbf{p})\mathbf{x}\|_2^2. \] Here: – $A\mathbf{x}-\mathbf{b}$ represents a linearized normal-field error (or related linear fitting residual) sampled on a target surface; $W_0$ is a declared weighting (units and sampling weights). – $L$ is a discrete regularization operator (e.g., surface gradient/Laplacian) promoting smooth/low-complexity $\Phi$ and thus $\mathbf{K}$. – $\alpha\ge 0$ is a declared regularization weight. Other convex objectives are admissible (e.g., $\ell_1$ or Huber penalties for robustness to localized residuals), provided the inner problem remains convex and the chosen smoothing (if any) is declared and included in diagnostics. X.C. Hard engineering constraints as convex constraints We collect hard engineering constraints in the form \[ \mathbf{c}(\mathbf{x};\mathbf{u},\mathbf{p})\le \mathbf{0},\qquad A_{\mathrm{eq}}\mathbf{x}=\mathbf{b}_{\mathrm{eq}}, \] where each component of $\mathbf{c}$ is convex in $\mathbf{x}$. Three representative constraint families are: X.C.1. Peak surface current density bounds Assume a linear map from coefficients to a discretized current-density representation, $\mathbf{k}=K(\mathbf{u},\mathbf{p})\mathbf{x}+\mathbf{k}_0(\mathbf{u},\mathbf{p})$. A simple convex bound is an $\ell_\infty$ or weighted $\ell_2$ constraint, e.g. \[ \|D_k\,\mathbf{k}\|_\infty \le K_{\max} \quad\text{or}\quad \|D_k\,\mathbf{k}\|_2\le K_{\max}. \] If $\mathbf{k}$ is stacked over sample points on $\Gamma_c$, the $\ell_\infty$ version is a set of linear inequalities; the $\ell_2$ version is a second-order cone (SOC) constraint. X.C.2. Peak field bounds (SOC constraints) Suppose the magnetic field at sampled points $x_i\in\Gamma_c$ can be represented as \[ \mathbf{B}_i(\mathbf{x}) = \mathbf{B}_{i,0}(\mathbf{u},\mathbf{p}) + H_i(\mathbf{u},\mathbf{p})\mathbf{x}, \] with $H_i\in\mathbb{R}^{3\times n_\Phi}$. Enforcing a peak bound $\|\mathbf{B}_i\|_2\le B_{\max}$ at each sample is convex: \[ \|\mathbf{B}_{i,0}+H_i\mathbf{x}\|_2 \le B_{\max},\qquad i=1,\dots,N_B. \] This is an SOC representable constraint. X.C.3. Manufacturability/complexity spectral penalties If $\mathbf{x}$ corresponds to Fourier (or other spectral) coefficients of $\Phi$, convex complexity control can be imposed by weighted quadratic or norm constraints \[ \|W_x\mathbf{x}\|_2 \le X_{\max}, \] or by including $\|W_x\mathbf{x}\|_2^2$ in the objective. The weights $W_x$ must be declared so that high-mode suppression is auditable. X.D. Robust feasibility against manufacturing tolerances Hard constraints based on sampled values are vulnerable to both (i) manufacturing perturbations (in coil placement, currents, or geometry), and (ii) discretization/sampling error. Here we address manufacturing uncertainty via norm-bounded robust counterparts; sampling/discretization is addressed by the Lipschitz certificates of Section IX and must be applied separately when constraints represent extrema over continuous domains. Consider a constraint family indexed by $i$ of the form \[ \|\mathbf{a}_i(\mathbf{u},\mathbf{p}) + G_i(\mathbf{u},\mathbf{p})\mathbf{x}\|_2 \le b_i, \] and suppose the data $\mathbf{a}_i$ are uncertain due to manufacturing perturbations modeled as $\mathbf{a}_i\mapsto \mathbf{a}_i+U_i\,\boldsymbol{\xi}$ with $\|\boldsymbol{\xi}\|_2\le \rho$. Then a conservative robust sufficient condition is \[ \|\mathbf{a}_i+G_i\mathbf{x}\|_2 + \rho\,\|U_i\|_2 \le b_i. \] This follows from $\sup_{\|\boldsymbol{\xi}\|\le\rho}\|\mathbf{v}+U_i\boldsymbol{\xi}\|_2\le \|\mathbf{v}\|_2+\rho\|U_i\|_2$ and preserves SOC representability. More generally, for affine inequality constraints $h(\mathbf{x})=\mathbf{d}^T\mathbf{x}-e\le 0$ under coefficient perturbations $\mathbf{d}\mapsto \mathbf{d}+\Delta\mathbf{d}$ with $\|W\Delta\mathbf{d}\|_2\le \rho$, the linearized robustification produces a norm inflation of the form $\rho\|W^{-T}\mathbf{x}\|_2$, consistent with Section III.H. \textbf{Diagnostic requirement.} Any robust constraint must report the uncertainty model (norm, $\rho$, and the operators $U_i$ or $W$) with declared units. X.E. Differentiation through the convex program We now treat the coil stage as the parameterized convex program \[ V(\mathbf{u},\mathbf{p}) := \min_{\mathbf{x}}\; \phi(\mathbf{x};\mathbf{u},\mathbf{p})\quad\text{s.t.}\quad \mathbf{c}(\mathbf{x};\mathbf{u},\mathbf{p})\le \mathbf{0},\; A_{\mathrm{eq}}\mathbf{x}=\mathbf{b}_{\mathrm{eq}}. \] Let $(\mathbf{x}^*,\boldsymbol{\mu}^*,\boldsymbol{\nu}^*)$ satisfy KKT conditions with $\boldsymbol{\mu}^*\ge 0$. Under the regularity assumptions stated in Proposition III.9, the directional derivative of the optimal value is \[ V'(\mathbf{u},\mathbf{p})[\delta\mathbf{u},\delta\mathbf{p}] = \partial_{(\mathbf{u},\mathbf{p})}\phi(\mathbf{x}^*;\mathbf{u},\mathbf{p})[\delta\mathbf{u},\delta\mathbf{p}] + (\boldsymbol{\mu}^*)^T\,\partial_{(\mathbf{u},\mathbf{p})}\mathbf{c}(\mathbf{x}^*;\mathbf{u},\mathbf{p})[\delta\mathbf{u},\delta\mathbf{p}], \] with no explicit $d\mathbf{x}^*/d(\mathbf{u},\mathbf{p})$ term. For use in the end-to-end adjoint system (Section V), the coil block may also be inserted as a KKT residual (Section IV.B.5), yielding the coil adjoint component (Section V.B.1). Both viewpoints are compatible; the envelope formula is often preferable when only outer derivatives of coil-derived scalar quantities are needed. \textbf{Numerical credibility checks.} The coil module must report: – primal feasibility $\|[\mathbf{c}(\mathbf{x}^*)]_+\|$ and equality residual $\|A_{\mathrm{eq}}\mathbf{x}^*-\mathbf{b}_{\mathrm{eq}}\|$, – dual feasibility violation $\|[\boldsymbol{\mu}^*]_-\|$, – complementarity measure (or barrier parameter if interior-point is used), – duality gap (when available). If these indicate failure, derivatives carried by $\boldsymbol{\mu}^*,\boldsymbol{\nu}^*$ are not to be treated as credible. X.F. Differentiable surrogates for min-distance/peak constraints (softmin/softmax) Some coil-relevant constraints are naturally expressed as extrema over a continuum (e.g., peak field over $\Gamma_c$, minimum coil-plasma clearance over a curve/surface). Two layers are then needed: 1. \textbf{Continuous-to-sampled reduction:} choose a grid $G$ and compute sampled values. 2. \textbf{Certified continuous envelope:} apply Section IX to bound the true extrema between samples, provided a Lipschitz constant in sampling coordinates is available. When a gradient-based optimizer requires differentiability with respect to $\mathbf{x}$, one may replace sampled max/min operators by smooth surrogates. For sampled values $a_1(\mathbf{x}),\dots,a_N(\mathbf{x})$, define \[ \mathrm{softmax}_\tau(a(\mathbf{x})) := \tau\log\sum_{i=1}^N \exp(a_i(\mathbf{x})/\tau),\qquad \tau>0\ \text{declared}, \] which satisfies $\max_i a_i\le \mathrm{softmax}_\tau(a)\le \max_i a_i+\tau\log N$. A conservative differentiable constraint can be formed by enforcing \[ \mathrm{softmax}_\tau(a(\mathbf{x})) + \tau\log N \le b, \] or equivalently by using $\mathrm{softmax}_\tau$ as an objective penalty with explicit inflation recorded. Analogous $\mathrm{softmin}$ definitions apply for minima. Any such $\tau$ (and the resulting conservative inflation term) must be recorded in diagnostics because it affects both feasibility and derivatives. X.G. From continuous $\Phi$ certificate to discrete coils (filament extraction) and post-checks preserving guarantees The convex coil stage produces a continuous surface-current representation (through $\Phi$ or $\mathbf{K}$). Practical implementations often extract a finite set of filamentary coil curves $\{\gamma_k\}$ from level sets or streamlines of $\Phi$, followed by geometric smoothing. From a certification perspective, filament extraction is a nontrivial transformation that may break previously certified constraints if not re-checked. Therefore, after extraction, all engineering constraints that are ultimately required of discrete coils (clearance, curvature, peak field, inter-coil spacing) must be re-evaluated on the discrete curves and wrapped in certificates. Concretely, for each discrete coil curve $\gamma_k(t)$, define constraint-relevant scalar fields along the curve, e.g. – clearance to plasma $d_{\mathrm{cp}}(t)$, – curvature $\kappa_k(t)$, – coil field magnitude $\|\mathbf{B}(\gamma_k(t))\|$. Each must be certified over $t\in[0,2\pi]$ using Section IX with declared Lipschitz bounds in the curve parameter (or a parameterization-invariant arc-length bound when available). Without these post-checks, the extraction stage cannot inherit the convex-stage guarantees. X.H. Coil/topology constraints X.H.1. Minimum clearance, curvature, and peak field constraints For discrete coils, constraints typically take the min/max forms discussed in Section IX. For example: \[ \min_t d_{\mathrm{cp}}(t) \ge d_{\min},\qquad \max_t \kappa_k(t) \le \kappa_{\max},\qquad \max_t \|\mathbf{B}(\gamma_k(t))\|\le B_{\max}. \] Each is certified from samples $G$ and a Lipschitz bound in the curve parameterization. The conservative constraints enforced by the optimizer must use $\min^{\mathrm{lo}}$ or $\max^{\mathrm{hi}}$ values (Section IX.D.1). X.H.2. Topological degree / coil-count constraints (gap flag) Constraints on the number of coils, linkage, or topological class of curves are inherently discrete/nonconvex. While some aspects can be encoded via parametrization choices (fixing $N_c$) or via post-processed checks (e.g., verifying curve closure and non-self-intersection with sampled separation certificates), a fully rigorous, optimization-compatible enforcement of topological constraints in a discretized setting is nontrivial. \textbf{Gap flag.} Any claim that a particular discretized constraint set enforces a desired topological class requires careful verification specific to the discretization and extraction procedure. Within this framework, topology checks are treated as auditable post-conditions (certified when possible via sampling bounds) rather than as guaranteed consequences of convexity. XI. Hamiltonian Field-Line Island/Stochasticity Certificate (VMEC/SPEC $\to$ Boozer Compatible) This section specifies an optimization-safe numerical certificate intended to detect when nested-flux-surface assumptions are at risk due to magnetic islands or stochasticity. The certificate is designed to be (i) conservative (one-sided) where possible, (ii) differentiable with respect to pipeline parameters away from nonsmooth events (mode switching, resonance crossings), and (iii) compatible with the coupled implicit-adjoint viewpoint by being representable as a deterministic map of upstream quantities. The content is necessarily interface-level: rigorous island-width bounds for a fully general 3D field require careful operator definitions and, in some regimes, long-time orbit integration. Here we record a standard Hamiltonian perturbation certificate for near-integrable fields. Any mapping from the certificate outputs to a definitive physical accept/reject decision requires external validation. XI.A. Motivation: nested-surface assumptions fail near low-order rationals Several downstream quantities in an end-to-end pipeline (e.g., Boozer-coordinate metrics and derived spectra) presuppose the existence of well-defined nested surfaces on which coordinates are meaningful. When resonant perturbations create islands or stochastic layers, coordinate transforms can become ill-posed or lose continuity with respect to design parameters. An optimization loop that ignores this can accept spurious improvements in downstream metrics that are purely consequences of applying those metrics outside their domain of validity. Accordingly, the role of the present layer is not to “fix” island physics, but to provide a conservative, auditable flag that downstream surface-based metrics are not to be treated as conclusive at a design point. XI.B. Inputs: a Hamiltonian representation and perturbation spectra We adopt a near-integrable field-line Hamiltonian model with $\zeta$ as the independent variable (“time”) and canonical variables $(\psi,\theta)$, where $\psi\in[0,1]$ is a normalized toroidal-flux coordinate and $\theta\in\mathbb{T}$ is a poloidal-like angle. The field-line flow is represented by a $2\pi$-periodic-in-$\zeta$ Hamiltonian \[ \mathcal{H}(\psi,\theta,\zeta) = \mathcal{H}_0(\psi) + \epsilon\,\mathcal{H}_1(\psi,\theta,\zeta), \qquad \mathcal{H}_1(\psi,\theta,\zeta)=\sum_{(m,n)\in\mathcal{K}} h_{mn}(\psi)\cos(m\theta-n\zeta+\varphi_{mn}). \] The unperturbed frequency is \[ \iota(\psi) := \partial_{\psi}\mathcal{H}_0(\psi), \] which plays the role of rotational transform in the reduced field-line model. The certificate consumes the following numerical inputs (all as functions of $\psi$ on a declared grid with resolution metadata): 1. A transform/geometry-derived estimate $\widehat\iota(\psi)$ and, when used for certification, an auditable bound on its derivative $\iota'(\psi)$. In particular, the module must provide either: – a certified lower bound $\\underline\iota'(\psi)$ on $|\iota'(\psi)|$ over each interval on which a resonance is analyzed, or – an explicit failure/inconclusive flag when such a bound cannot be justified. 2. A finite set of perturbation Fourier modes $\{(m,n)\}$ of interest, together with mode-amplitude estimates $\widehat h_{mn}(\psi)$ and nonnegative uncertainty bounds $\Delta h_{mn}(\psi)$ such that \[ |h_{mn}(\psi) – \widehat h_{mn}(\psi)| \le \Delta h_{mn}(\psi) \] in the discretized model. If $\Delta h_{mn}$ is unavailable (e.g., truncation uncertainty not quantified), then the corresponding island-width outputs are non-certified. 3. A declared mode-truncation set $\mathcal{K}$ and any remainder bound $\Delta_{\mathrm{tail}}(\psi)$ intended to upper bound the combined influence of omitted modes on the scalar perturbation magnitude. This tail bound is optional but required if the certificate is to be interpreted as conservative beyond the explicitly included $(m,n)$. XI.C. Outputs: differentiable island-width and Chirikov-overlap risk metrics The certificate outputs two families of scalars: (a) A per-resonance island half-width upper bound $\Delta\psi_{mn}^{\mathrm{hi}}\ge 0$ (and optionally a full width $w_{mn}^{\mathrm{hi}}:=2\Delta\psi_{mn}^{\mathrm{hi}}$). (b) An overlap/stochasticity risk scalar $S^{\mathrm{hi}}$ based on a Chirikov-type overlap indicator computed from the collection $\{\Delta\psi_{mn}^{\mathrm{hi}}\}$ and certified resonance-separation bounds. The required property is one-sidedness in the discretized model: if the module returns $S^{\mathrm{hi}}<1$ under declared success flags, then the computed near-integrable model predicts no overlap among the tested resonances; if $S^{\mathrm{hi}}\ge 1$ or certification fails, the correct interpretation is "risk detected or inconclusive" (not "nested surfaces confirmed"). XI.D. Certification philosophy and differentiability The certificate is constructed as a composition of (i) root-finding for resonance locations, (ii) local Taylor approximations of $\iota$ at resonances, and (iii) a pendulum-model island width formula. The basic nonsmooth events are: 1. resonance creation/annihilation when $m\iota(\psi)-n$ loses a root in $[0,1]$, 2. changes in which mode dominates an aggregate bound (max over modes), 3. $\iota'(\psi_*)\to 0$ (shearless or near-shearless points) where the standard width formula becomes singular. To maintain differentiability for use in gradient-based algorithms, max/min operations may be replaced by declared softmax/softmin surrogates (Section VI.E) at the level of the risk scalar $S$; the conservative inflation implied by the surrogate must be recorded in diagnostics. XI.E. A near-integrable island-width upper bound from a single resonant mode Fix a mode $(m,n)$ with $m\ge 1$. Define the resonance function \[ F_{mn}(\psi) := m\iota(\psi)-n. \] A resonant surface $\psi_*$ (for this mode) is a root $F_{mn}(\psi_*)=0$. Suppose $F_{mn}$ is differentiable and $|F_{mn}'(\psi)|=m|\iota'(\psi)|$ admits a certified lower bound $\\underline\alpha_{mn}>0$ on an interval $I$ containing $\psi_*$. Then $F_{mn}$ is strictly monotone on $I$, so the root is unique on $I$. XI.E.1. Resonance location enclosure from a root residual Let $\widehat\psi_*\in I$ be a computed approximate root and define the residual $\rho:=|F_{mn}(\widehat\psi_*)|$. By the mean value theorem, there exists $\xi$ between $\widehat\psi_*$ and the true root $\psi_*$ such that \[ \rho = |F_{mn}(\widehat\psi_*)-F_{mn}(\psi_*)| = |F’_{mn}(\xi)|\,|\widehat\psi_* – \psi_*| \ge \\underline\alpha_{mn}\,|\widehat\psi_* – \psi_*|. \] Hence the root location is enclosed by \[ |\widehat\psi_* – \psi_*| \le \frac{\rho}{\\underline\alpha_{mn}}. \] This bound is used to conservatively inflate uncertainties that depend on evaluating $h_{mn}(\psi)$ and $\iota'(\psi)$ at $\psi_*$. XI.E.2. Pendulum-model island width (discrete certificate form) Under the standard single-resonance approximation in canonical form, the effective Hamiltonian near $\psi_*$ reduces (after a near-identity transformation) to a pendulum Hamiltonian \[ \mathcal{H}_{\mathrm{eff}}(I,\phi) \approx \frac{a}{2}I^2 – \epsilon\,|h_{mn}(\psi_*)|\cos\phi, \qquad a := m\iota'(\psi_*), \] where $I$ is an action-like coordinate proportional to $\psi-\psi_*$ and $\phi=m\theta-n\zeta+\varphi_{mn}$. For this model, the separatrix half-width in $I$ is $2\sqrt{\epsilon|h_{mn}(\psi_*)|/|a|}$, and the full width is $4\sqrt{\epsilon|h_{mn}(\psi_*)|/|a|}$. Translating back to $\psi$ yields the standard width scaling. For certification we define a conservative upper bound (in the discretized model) \[ \Delta\psi_{mn}^{\mathrm{hi}} := 2\sqrt{\frac{\epsilon\,H_{mn}^{\mathrm{hi}}}{m\,\\underline s_{mn}}}, \qquad H_{mn}^{\mathrm{hi}} := \max\bigl(0,\ |\widehat h_{mn}(\widehat\psi_*)| + \Delta h_{mn}(\widehat\psi_*) + \Delta_{\mathrm{tail}}(\widehat\psi_*)\bigr), \] where $\\underline s_{mn}>0$ is a certified lower bound on $|\iota'(\psi)|$ on the resonance enclosure interval around $\widehat\psi_*$. If $\\underline s_{mn}\le 0$ or cannot be produced, the output for that $(m,n)$ is marked inconclusive. The quantity $H_{mn}^{\mathrm{hi}}$ is written to allow both per-mode uncertainty $\Delta h_{mn}$ and a tail bound $\Delta_{\mathrm{tail}}$. If only $\Delta h_{mn}$ is available, set $\Delta_{\mathrm{tail}}\equiv 0$ and treat the certificate as conditional on the declared truncation set $\mathcal{K}$. XI.F. Chirikov-type overlap risk metric with certified upper bounds Consider two resonances $(m_1,n_1)$ and $(m_2,n_2)$ with enclosed resonance locations $\psi_{*,1}\in[\widehat\psi_{*,1}-\delta_1,\widehat\psi_{*,1}+\delta_1]$ and $\psi_{*,2}\in[\widehat\psi_{*,2}-\delta_2,\widehat\psi_{*,2}+\delta_2]$, where $\delta_i=\rho_i/\\underline\alpha_i$ from XI.E.1. A conservative lower bound on their separation is \[ \Delta\psi_{12}^{\mathrm{lo}} := \max\bigl(0,\ |\widehat\psi_{*,1}-\widehat\psi_{*,2}| – (\delta_1+\delta_2)\bigr). \] Define the pairwise overlap indicator upper bound \[ S_{12}^{\mathrm{hi}} := \begin{cases} \displaystyle \frac{\Delta\psi_{m_1n_1}^{\mathrm{hi}}+\Delta\psi_{m_2n_2}^{\mathrm{hi}}}{\Delta\psi_{12}^{\mathrm{lo}}}, & \Delta\psi_{12}^{\mathrm{lo}}>0,\\ +\infty, & \Delta\psi_{12}^{\mathrm{lo}}=0. \end{cases} \] A global risk scalar can be taken as \[ S^{\mathrm{hi}} := \max_{(1,2)\in\mathcal{P}_{\mathrm{res}}} S_{12}^{\mathrm{hi}}, \] where $\mathcal{P}_{\mathrm{res}}$ is a declared set of resonance pairs deemed relevant (e.g., neighboring low-order rationals). For differentiability, the max may be replaced by $\mathrm{softmax}_\tau$ with declared $\tau$, together with the corresponding conservative inflation. \textbf{Interpretation rule.} – If $S^{\mathrm{hi}}<1$ under successful certification flags for all contributing modes/pairs, the near-integrable model predicts non-overlapping islands for the tested resonances. - If $S^{\mathrm{hi}}\ge 1$, overlap risk is detected in the certificate model. - If any required inputs (resonance enclosures, $|\iota'|$ lower bounds, amplitude bounds) are missing or inconclusive, then $S^{\mathrm{hi}}$ is marked inconclusive and must not be treated as evidence of nested surfaces. XI.G. Integration: explicit trigger criteria for metric rejection Downstream surface-based metrics that assume nested flux surfaces may be treated as conclusive only when the island/stochasticity certificate is successful and indicates low risk. A minimal auditable trigger is: \[ \text{if } S^{\mathrm{hi}}\ge S_{\max}\ \text{or certification fails, then mark Boozer-derived metrics as inconclusive}, \] where $S_{\max}\in(0,1)$ is a declared threshold providing a safety margin below the overlap condition. The threshold is an engineering decision and must be recorded in diagnostics. XI.H. Experimental falsifier: square-root island-width scaling under controlled resonant perturbations Within the pendulum model, the island width scales as the square root of the resonant perturbation amplitude. A falsifiable experimental signature is therefore: - Apply a controlled resonant perturbation that predominantly modifies a single $(m,n)$ harmonic of the relevant normal-field/perturbation spectrum over a limited range of $\psi$. If the near-integrable model is applicable, the measured island width (in an appropriate diagnostic proxy) should scale approximately as $w_{mn}\propto \sqrt{|h_{mn}|}$ over a range of perturbation amplitudes. Systematic deviations (after accounting for diagnostic resolution and mode mixing) indicate that either (i) the near-integrable Hamiltonian reduction is not valid in the tested regime, or (ii) the pipeline's mapping from upstream equilibrium outputs to $h_{mn}$ is inconsistent. XI.I. Gap flag: certificate tightness and mapping from spectra to island metrics need external validation The certificate above is a conservative statement about a specific reduced Hamiltonian model whose inputs are spectra $h_{mn}(\psi)$ and transform profiles $\iota(\psi)$ produced by upstream computations. Whether (and for which configurations) these inputs are sufficiently accurate, whether omitted higher-order terms materially change island widths, and whether a threshold on $S^{\mathrm{hi}}$ reliably predicts downstream failure modes must be established by external validation against trusted field-line tracing and/or experimental diagnostics. Within this paper, the island/stochasticity layer is therefore an auditable *rejection filter* for nested-surface-dependent metrics, not a physical proof of integrability. XII. Energetic-Particle / Alpha Prompt-Loss Certificate from Boozer Data This section specifies a conservative, optimization-compatible wrapper for a fast-ion (or alpha) prompt-loss proxy driven by Boozer-coordinate data. The goal is not to predict detailed loss physics, but to prevent an optimizer from exploiting numerically fragile trajectory indicators or poorly resolved phase-space sampling. The wrapper is stated for a finite-dimensional discretization of the magnetic field in Boozer form and a chosen reduced guiding-center model. XII.A. Motivation and requirements A prompt-loss indicator is typically computed by integrating a reduced particle model over a finite time horizon and declaring loss if the trajectory exits a confinement region. Such an indicator can be brittle: (i) it depends on time stepping, stopping tolerances, and event detection; (ii) it is non-smooth at the loss boundary; and (iii) it can be falsely optimistic if phase-space sampling misses the worst-case initial conditions. The certification objective in this paper is therefore numerical and algorithmic: 1. Make the oracle one-sided and conservative: if a design point is declared safe by the certificate, then the chosen discrete model does not predict loss for any initial condition in a declared set, up to a declared discretization/time-integration uncertainty. 2. Provide explicit inconclusive semantics when the certificate prerequisites fail (missing Lipschitz bounds, integrator failure, non-differentiable event handling not smoothed). 3. Permit differentiation through a smooth conservative surrogate of the loss functional (when used as an objective/constraint ingredient), while still using the conservative certificate for pass/fail. XII.B. Inputs from the existing pipeline Fix design parameters $\mathbf{p}$ and a solved state $\mathbf{U}(\mathbf{p})$. The prompt-loss layer consumes: 1. A Boozer-based discrete representation of $\mathbf{B}(\psi,\theta,\zeta)$ and any metric coefficients required by a chosen reduced guiding-center model, packaged as a finite set of coefficients $\mathbf{b}$ and a reconstruction operator $\mathcal{B}(\mathbf{b};\xi)$ that yields field values and required derivatives at phase-space points $\xi$. 2. A declared confinement region $\mathcal{D}\subset\mathbb{R}^{n_s}$ in the reduced phase-space variables $\xi$ (e.g., $\xi=(\psi,\theta,\zeta,v_\parallel,\mu)$ or a further reduction), together with a subset $\Xi_0\subset\mathcal{D}$ of admissible initial conditions. 3. A time horizon $T>0$ and an ODE (or differential-algebraic) model \[ \dot\xi = f(\xi,t;\mathbf{b}),\qquad t\in[0,T], \] with a declared numerical integrator used to produce approximate trajectories $\tilde\xi(t;\xi_0)$. 4. A sampling set $G\subset\Xi_0$ intended to be a $\delta$-net in a declared norm $\|\cdot\|$ on $\Xi_0$ (Section III.F), together with the computed covering radius $\delta$. XII.C. Construction of one-sided prompt-loss bounds XII.C.1. A loss functional with conservative smoothing Define a scalar function $d\colon \mathcal{D}\to\mathbb{R}$ representing signed distance (or a surrogate) to the boundary of $\mathcal{D}$ such that \[ \xi\in\mathcal{D}\iff d(\xi)\ge 0. \] Loss corresponds to $d(\xi(t))<0$ for some $t\in[0,T]$. The hard indicator $ \mathbb{I}_{\mathrm{loss}}(\xi_0)=\mathbf{1}\{\min_{t\in[0,T]} d(\xi(t;\xi_0))<0\} $ is non-smooth. For optimization, we introduce a smooth conservative surrogate of the ";minimum distance to loss" functional. Define the minimum-distance functional \[ \mathcal{D}_{\min}(\xi_0) := \min_{t\in[0,T]} d(\xi(t;\xi_0)). \] For differentiability, replace the minimum by a soft-min with temperature $\tau>0$ and a discrete time grid $0=t_0<\cdots<t_M=T$: \[ \widehat{\mathcal{D}}_{\min,\tau}(\xi_0) := -\tau\log\sum_{k=0}^{M} \exp\bigl(-d(\tilde\xi(t_k;\xi_0))/\tau\bigr). \] This satisfies the standard bounds \[ \min_k d(\tilde\xi(t_k;\xi_0)) - \tau\log(M+1) \le \widehat{\mathcal{D}}_{\min,\tau}(\xi_0) \le \min_k d(\tilde\xi(t_k;\xi_0)). \] Thus, if the certificate ultimately uses $\min_k d(\tilde\xi(t_k;\xi_0))$ for conservatism, $\widehat{\mathcal{D}}_{\min,\tau}$ can be used only as a differentiable auxiliary quantity; the parameters $(\tau,M)$ must be included in diagnostics. XII.C.2. Lipschitz propagation through ODE flow (Gr\"onwall certificate) We require a bound on how perturbations in initial conditions affect trajectories. \textbf{Assumption XII.1 (Uniform Jacobian bound for the reduced model).} Assume $f$ is continuously differentiable in $\xi$ and there exists $L_f\ge 0$ such that \[ \|\partial_{\xi} f(\xi,t;\mathbf{b})\| \le L_f\qquad\text{for all }(\xi,t)\in\mathcal{D}\times[0,T], \] where $\|\cdot\|$ is the induced operator norm for the chosen phase-space norm. \textbf{Lemma XII.2 (Flow Lipschitz constant).} Let $\xi(t;\xi_0)$ and $\xi(t;\eta_0)$ be two solutions that remain in $\mathcal{D}$ on $[0,T]$. Under Assumption XII.1, \[ \|\xi(t;\xi_0)-\xi(t;\eta_0)\| \le e^{L_f t}\,\|\xi_0-\eta_0\|,\qquad t\in[0,T]. \] \textbf{Proof.} Let $\Delta(t)=\xi(t;\xi_0)-\xi(t;\eta_0)$. Then $\dot\Delta=f(\xi,t;\mathbf{b})-f(\eta,t;\mathbf{b})$. By the mean value theorem and the Jacobian bound, $\|\dot\Delta\|\le L_f\|\Delta\|$. Gr\"onwall's inequality yields $\|\Delta(t)\|\le e^{L_f t}\|\Delta(0)\|$. $\square$ Assume additionally that $d$ is Lipschitz on $\mathcal{D}$: $|d(\xi)-d(\eta)|\le L_d\|\xi-\eta\|$. Then for any fixed time $t$, the composition $\xi_0\mapsto d(\xi(t;\xi_0))$ is Lipschitz with constant $L_d e^{L_f t}$, and hence the minimum over $t\in[0,T]$ is Lipschitz with constant $L_d e^{L_f T}$: \[ |\mathcal{D}_{\min}(\xi_0)-\mathcal{D}_{\min}(\eta_0)|\le L_d e^{L_f T}\,\|\xi_0-\eta_0\|. \] XII.C.3. Certified worst-case loss margin from discrete samples Define the worst-case (most negative) minimum-distance margin \[ \mathcal{D}_{\mathrm{wc}} := \min_{\xi_0\in\Xi_0} \mathcal{D}_{\min}(\xi_0). \] Let $G\subset\Xi_0$ be a $\delta$-net. Using Theorem III.8 with Lipschitz constant $L_{\min}:=L_d e^{L_f T}$, we obtain the certified lower bound \[ \mathcal{D}_{\mathrm{wc}} \ge \min_{g\in G} \mathcal{D}_{\min}(g) - L_{\min}\,\delta. \] In practice $\mathcal{D}_{\min}(g)$ is approximated numerically by a time-discretized minimum $\min_k d(\tilde\xi(t_k;g))$. To maintain one-sidedness, the certificate must include an additional nonnegative integration/time-discretization uncertainty $\Delta_{\mathrm{int}}$ such that \[ \mathcal{D}_{\min}(g) \ge \min_k d(\tilde\xi(t_k;g)) - \Delta_{\mathrm{int}}. \] (If $\Delta_{\mathrm{int}}$ is not available, the certificate is inconclusive.) Thus the computable certified lower bound is \[ \mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}} := \min_{g\in G}\Bigl(\min_k d(\tilde\xi(t_k;g))\Bigr) - \Delta_{\mathrm{int}} - L_d e^{L_f T}\,\delta. \] \textbf{Certified prompt-loss rule.} If $\mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}}\ge 0$, then for the declared reduced model, initial-condition set $\Xi_0$, and time horizon $T$, the computation provides a conservative certificate of no prompt loss (no trajectory reaches $d<0$) within $[0,T]$. If $\mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}}<0$, the result indicates loss risk or inconclusive safety. XII.D. Differentiation through the certificate and integration into constraints/objectives The certified bound $\mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}}$ involves minima over samples and time indices and is generally non-smooth. For gradient-based optimization one may: 1. Use a smooth surrogate (softmin over $g\in G$ and $k$) as an objective/penalty ingredient, while preserving conservatism by adding the corresponding inflation terms (softmin error bounds) and the explicit certificate corrections $\Delta_{\mathrm{int}}$ and $L_d e^{L_f T}\delta$. 2. Differentiate the sampled surrogate by differentiating through the time integrator as an implicit system (Section V), i.e., treat the discrete time-stepping map as residual equations and apply an adjoint through time. In either case, the pass/fail constraint must be evaluated using $\mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}}$ (or a more conservative variant), not the smooth surrogate. XII.E. Configuration outputs for validation targeting To make the layer falsifiable and diagnosable, the module must output: 1. The worst-case sampled initial condition $g^*\in G$ attaining (or nearly attaining) the sampled minimum, and the corresponding time index $k^*$ at which $d(\tilde\xi(t_{k^*};g^*))$ is smallest. 2. The certificate scalars $L_f$, $L_d$, $\delta$, and $\Delta_{\mathrm{int}}$, with metadata describing how each bound was obtained. 3. The certified margin $\mathcal{D}_{\mathrm{wc}}^{\mathrm{lo}}$ and the raw sampled minimum $\min_{g\in G,k} d(\tilde\xi(t_k;g))$, so the magnitude of conservatism is auditable. XII.F. Gap flag: physical adequacy and conservatism level The certificate above is a rigorous statement only for the declared reduced guiding-center model, the declared set $\Xi_0$, and the declared time horizon $T$, subject to the correctness of the Lipschitz and integration-error bounds. Whether the chosen reduced model, initial-condition set, and time horizon are physically adequate surrogates for alpha prompt loss in the targeted regime requires external validation (benchmarking against trusted orbit-following tools and/or experimental fast-ion diagnostics). This section therefore provides a numerically conservative oracle, not a verified physical loss prediction. XIII. Shape Calculus for Boundary Optimization and Global/Topological Constraints This section records the shape-derivative structure that underlies boundary optimization of embedded tori and clarifies how continuous (mesh-independent) shape gradients relate to finite Fourier parameterizations used in Section II. The results are standard in shape calculus and are stated here as an interface between continuous geometry variation and discrete adjoint gradients. XIII.A. Shape space as a manifold of embedded tori modulo reparameterization Let $\Gamma\subset\mathbb{R}^3$ be a smooth embedded torus representing the plasma boundary. A (small) deformation of $\Gamma$ is described by a smooth vector field $V\colon\mathbb{R}^3\to\mathbb{R}^3$, generating a perturbed surface \[ \Gamma_\epsilon := (\mathrm{Id}+\epsilon V)(\Gamma) \] for sufficiently small $|\epsilon|$. Variations tangential to $\Gamma$ correspond to reparameterizations and do not change geometric functionals that depend only on the embedded surface; this is reflected in the classical result that shape derivatives depend only on $V\cdot n$ on $\Gamma$, where $n$ is the unit normal. XIII.B. Hadamard structure theorem for shape derivatives Let $J(\Gamma)\in\mathbb{R}$ be a shape functional. The (Eulerian) shape derivative in direction $V$ is \[ DJ(\Gamma)[V] := \frac{d}{d\epsilon}\Big|_{0} J(\Gamma_\epsilon), \] when the limit exists. \textbf{Theorem XIII.1 (Hadamard structure; interface form).} Assume $J$ is shape differentiable at $\Gamma$ and is invariant under reparameterization of $\Gamma$. Then there exists a scalar density $G\in L^2(\Gamma)$ (a shape gradient) such that \[ DJ(\Gamma)[V] = \int_{\Gamma} G(x)\,(V(x)\cdot n(x))\,dS(x) \] for all sufficiently smooth vector fields $V$. The density $G$ is not unique if additional constraints (e.g., volume preservation) are imposed; however, any implementation claiming a shape gradient must output a $G$ that reproduces directional derivatives $DJ(\Gamma)[V]$ for normal variations. XIII.C. Discretization into Fourier boundary parameters and consistency with discrete adjoints Suppose the boundary is parameterized by a smooth map $X(\theta,\zeta;\mathbf{p}_{\mathrm{geom}})$ as in Section II.A.1. A parameter perturbation $\delta\mathbf{p}$ induces a deformation field on the surface \[ V_{\delta\mathbf{p}}(X(\theta,\zeta)) := \partial_{\mathbf{p}} X(\theta,\zeta;\mathbf{p})\,\delta\mathbf{p}. \] Assuming $J$ is shape differentiable and using Theorem XIII.1, \[ \frac{d}{d\epsilon}\Big|_{0} J\bigl(\Gamma(\mathbf{p}+\epsilon\delta\mathbf{p})\bigr) = \int_{\Gamma(\mathbf{p})} G(x)\,\bigl(V_{\delta\mathbf{p}}(x)\cdot n(x)\bigr)\,dS(x). \] If $\partial_{\mathbf{p}}X$ is expressed in the Fourier basis used in Section II, then the gradient with respect to $\mathbf{p}$ is obtained by pulling back to parameter space: \[ \nabla_{\mathbf{p}} J(\mathbf{p})\cdot\delta\mathbf{p} = \int_{\mathbb{T}^2} G\bigl(X(\theta,\zeta)\bigr)\,\Bigl(\partial_{\mathbf{p}}X(\theta,\zeta;\mathbf{p})\,\delta\mathbf{p}\cdot n(\theta,\zeta)\Bigr)\,J(\theta,\zeta)\,d\theta\,d\zeta, \] where $J(\theta,\zeta)=\|\partial_\theta X\times\partial_\zeta X\|$ is the surface Jacobian. \textbf{Consistency requirement (continuous vs. discrete gradients).} If a discrete pipeline provides gradients $\nabla_{\mathbf{p}}\widehat\Psi$ via the coupled adjoint system (Section V), then for shape-dominated quantities $\Psi$ one can check consistency by comparing directional derivatives from the discrete adjoint against the continuous shape-gradient identity above using normal-direction test variations. Disagreement beyond declared discretization and solver tolerances indicates either (i) an inconsistency in how geometry enters the discrete residuals $\mathcal{R}$, or (ii) insufficient geometric resolution. XIII.D. Topological degree constraints tied to winding-surface requirements Some design requirements are global and topological (e.g., the boundary remains an embedded torus, the winding surface remains in the same isotopy class, or a map preserves orientation). In continuous settings, such constraints are not expressible as simple pointwise inequalities. A common practical proxy is to enforce an orientation-preserving immersion constraint via a certified lower bound on the surface Jacobian $J$ (Section IX.C.1). If $J_{\min}^{\mathrm{lo}}>0$, the parameterization is an immersion with consistent orientation. This does not, by itself, prove embeddedness; nevertheless it excludes fold singularities that invalidate many downstream numerical operators. XIII.E. Integration with the end-to-end pipeline The shape-calculus identities are used only as an interpretive layer: they explain why gradients should be mesh-independent (depend on normal variations) and provide a path to compare and debug discrete adjoints. The end-to-end sensitivity pipeline remains the coupled implicit-adjoint system of Section V applied to the fully discretized residual (Section IV). XIII.F. Gap flag: rigorous enforcement of topological constraints Rigorous enforcement of embeddedness, self-avoidance, and topological constraints for discretized tori and coil curves typically requires specialized geometric algorithms and, in some cases, exact arithmetic or certified interval methods. The present framework therefore treats topology constraints as (i) fixed-by-parameterization choices where possible, and (ii) auditable post-checks with conservative sampling certificates when available. Any stronger claim requires external verification specific to the discretization. XIV. Robust Optimization and Certified Worst-Case Margins Across Modules This section unifies (i) numerical certification error bars (Sections VI, VII, IX, X, XII) with (ii) manufacturing/parameter uncertainty (Section III.H) to produce optimization-safe worst-case constraint values. The focus is on conservative bounds that are computable from outputs already required by the pipeline: certified intervals, adjoint gradients, and declared uncertainty models. XIV.A. Unified uncertainty model on parameters with declared units Let $\mathbf{p}\in\mathbb{R}^{n_p}$ be the nominal design parameters and let $\delta\mathbf{p}$ represent uncertainty. We assume an ellipsoidal uncertainty set \[ \mathcal{U} := \{\delta\mathbf{p}: \|W\,\delta\mathbf{p}\|_2\le \rho\}, \] with invertible $W$ and magnitude $\rho>0$. The choice of coordinates and units is part of the model and must be reported. XIV.B. Linearized worst-case bounds for generic certified constraints Let $g(\mathbf{p})\le 0$ be an intended constraint. In the present pipeline, the quantity available to the optimizer may already be conservative due to numerical certification: \[ \text{(numerical certificate)}\qquad g_{\mathrm{cert}}(\mathbf{p}) := \widehat g(\mathbf{p}) + \Delta_{\mathrm{num}}(\mathbf{p}), \] where $\Delta_{\mathrm{num}}\ge 0$ is assembled from module-level certificates (e.g., Boozer metric bound, eigenvalue lower bound converted to a conservative $g$, Lipschitz sampling envelopes, convex-coil feasibility margins). Assume $g_{\mathrm{cert}}$ is differentiable at $\mathbf{p}$ (or a differentiable surrogate is used for gradient computation while the conservative value is used for pass/fail). Then the linearized robust upper bound over $\mathcal{U}$ is \[ \sup_{\delta\mathbf{p}\in\mathcal{U}} \bigl(g_{\mathrm{cert}}(\mathbf{p}) + \nabla g_{\mathrm{cert}}(\mathbf{p})^T\delta\mathbf{p}\bigr) = g_{\mathrm{cert}}(\mathbf{p}) + \rho\,\|W^{-T}\nabla g_{\mathrm{cert}}(\mathbf{p})\|_2 \] by Proposition III.10. \textbf{Certified robust constraint value.} We define \[ g_{\mathrm{rob}}(\mathbf{p}) := g_{\mathrm{cert}}(\mathbf{p}) + \rho\,\|W^{-T}\nabla g_{\mathrm{cert}}(\mathbf{p})\|_2. \] Then enforcing $g_{\mathrm{rob}}(\mathbf{p})\le 0$ guarantees the linearized constraint is satisfied for all $\delta\mathbf{p}\in\mathcal{U}$ and includes numerical uncertainty through $\Delta_{\mathrm{num}}$. \textbf{Reporting requirement.} The pipeline must report the decomposition $ g_{\mathrm{rob}} = \widehat g + \Delta_{\mathrm{num}} + \rho\|W^{-T}\nabla g_{\mathrm{cert}}\|_2 $ so that robustness inflation is auditable. XIV.C. Robust stability margins Let the stability feasibility condition be $\lambda\ge \lambda_{\min}$, encoded as $g_{\mathrm{stab}}=\lambda_{\min}-\lambda\le 0$. The stability certificate provides $\lambda^{\mathrm{lo}}\le \lambda$, hence the conservative constraint value is \[ g_{\mathrm{stab,cert}} = \lambda_{\min}-\lambda^{\mathrm{lo}}. \] When $\nabla \lambda^{\mathrm{lo}}$ is available and credible, a linearized robust stability constraint is \[ g_{\mathrm{stab,rob}} := (\lambda_{\min}-\lambda^{\mathrm{lo}}) + \rho\,\|W^{-T}\nabla(\lambda_{\min}-\lambda^{\mathrm{lo}})\|_2 = (\lambda_{\min}-\lambda^{\mathrm{lo}}) + \rho\,\|W^{-T}\nabla\lambda^{\mathrm{lo}}\|_2. \] If $\nabla\lambda^{\mathrm{lo}}$ is flagged non-credible (e.g., near eigenvalue multiplicity or certification discontinuity), then the robustification must be treated as inconclusive and the correct response is refinement or rejection for robustness-based decisions. XIV.D. Composition of numerical certificates with manufacturing uncertainty Many constraints have the structure \[ \text{true quantity }\mathcal{Q}(\mathbf{p})\ \le\ \widehat{\mathcal{Q}}(\mathbf{p}) + \Delta_{\mathrm{num}}(\mathbf{p}). \] A robust bound for $\mathcal{Q}(\mathbf{p}+\delta\mathbf{p})$ is obtained by adding the manufacturing uncertainty bound to the certified upper value: \[ \mathcal{Q}(\mathbf{p}+\delta\mathbf{p}) \le \widehat{\mathcal{Q}}(\mathbf{p}) + \Delta_{\mathrm{num}}(\mathbf{p}) + \rho\,\|W^{-T}\nabla \widehat{\mathcal{Q}}_{\mathrm{cert}}(\mathbf{p})\|_2 \]\nunder the same linearization assumptions. This explicitly separates numerical vs. physical uncertainty contributions and avoids double counting. XIV.E. Optimization algorithm compatibility: inexact oracle and rejection triggers When objective/constraint evaluation is certified only up to a known error bar, optimization steps should be accepted only if predicted improvements dominate uncertainty. A minimal conservative acceptance condition for a scalar merit function $\varphi$ is: \[ \widehat\varphi(\mathbf{p}_{\mathrm{new}})+\Delta\varphi(\mathbf{p}_{\mathrm{new}}) \le \widehat\varphi(\mathbf{p}_{\mathrm{old}})+\Delta\varphi(\mathbf{p}_{\mathrm{old}}) – c\,\mathrm{pred}, \] with $c\in(0,1)$ and $\mathrm{pred}\ge 0$ provided by the algorithm model (Section VI.D.2). \textbf{Rejection trigger.} If certificates imply that $\Delta\varphi$ (or any constraint certificate) is too large to decide improvement/feasibility at the step scale, the correct response is certificate-driven refinement or step rejection; the pipeline must not silently accept the nominal values. XIV.F. Minimal I/O hooks for robust certificates For each constraint $g_i$, the robust wrapper requires: 1. The conservative certified value $g_{i,\mathrm{cert}}$ (already in the pipeline contract for each module). 2. A credible gradient $\nabla g_{i,\mathrm{cert}}$ from the coupled adjoint system (Section V), with the derivative-credibility flags. 3. The uncertainty model $(W,\rho)$ in declared units. The output is $g_{i,\mathrm{rob}}$ and its decomposition into nominal, numerical-certificate, and robustness inflation terms. XV. Implementation Contract, Diagnostics, and Rejection Protocol This section consolidates the pipeline’s minimal interface requirements into an auditable contract: each module must provide sufficient information to (i) evaluate objectives/constraints conservatively, (ii) differentiate end-to-end via implicit adjoints, and (iii) reject evaluations that are numerically non-credible. XV.A. Module I/O specification for reproducible end-to-end differentiation For each module $\mathcal{M}$ that contributes a residual block $\mathcal{R}_k$ to the coupled system (Section IV), the implementation must provide, at the converged state: 1. \textbf{Primal outputs:} the state variables for the block (e.g., $\mathbf{u}$, $\mathbf{y}$, $(\lambda,\mathbf{x})$, $\mathbf{j}$, $(\mathbf{z},\boldsymbol{\mu},\boldsymbol{\nu})$). 2. \textbf{Residual and convergence diagnostics:} $\|\mathcal{R}_k\|$ (in a declared norm), iteration counts, termination flags, and solver tolerances. 3. \textbf{Linearization actions:} (a) JVP $\mathbf{w}\mapsto \partial_{\mathbf{U}}\mathcal{R}_k\,\mathbf{w}$, (b) VJP $\boldsymbol{\lambda}\mapsto (\partial_{\mathbf{U}}\mathcal{R}_k)^T\boldsymbol{\lambda}$, (c) parameter VJP $\boldsymbol{\lambda}\mapsto (\partial_{\mathbf{p}}\mathcal{R}_k)^T\boldsymbol{\lambda}$. 4. \textbf{Conditioning / credibility indicators:} for linear solves, a lower singular-value bound or a declared conditioning proxy; for eigenproblems, residual norms and any gap/separation evidence used for certification; for KKT systems, primal/dual feasibility and complementarity/duality gap. 5. \textbf{Certificates:} if a block is used in constraints, it must return conservative bounds $[\cdot^{\mathrm{lo}},\cdot^{\mathrm{hi}}]$ or equivalent scalars $\Delta$ sufficient to compute a conservative constraint value (Sections VI, VII, IX, X, XII). XV.B. Per-iteration report schema A pipeline evaluation at $\mathbf{p}$ must output \[ \mathcal{E}(\mathbf{p})=(\widehat{\mathcal{J}},\widehat{\mathbf{g}},\mathsf{Diag}), \] where $\mathsf{Diag}$ contains, for each reported scalar observable $\mathcal{O}$, the triple: 1. $\widehat{\mathcal{O}}$: nominal computed value. 2. $\Delta\mathcal{O}$ or interval $[\mathcal{O}^{\mathrm{lo}},\mathcal{O}^{\mathrm{hi}}]$: conservative numerical certificate (or an explicit marker that no certificate is available). 3. Metadata: discretization parameters, solver tolerances, residual norms, and any smoothing parameters ($\varepsilon,\tau$, etc.). Constraints $g_i\le 0$ must be reported both as nominal $\widehat g_i$ and as conservative certified $g_{i,\mathrm{cert}}$ when applicable. XV.C. Mandatory rejection criteria The pipeline must return an explicit status $\mathsf{Status}\in\{\mathrm{conclusive},\mathrm{inconclusive},\mathrm{failed}\}$ and must set $\mathsf{Status}=\mathrm{inconclusive}$ (or $\mathrm{failed}$) whenever any of the following occurs for any constraint-relevant module: 1. \textbf{Boozer certification insufficient:} missing or nonpositive $\\underline\sigma$ bound, missing truncation bound $\Delta_{\mathrm{trunc}}$, or inability to form $\Delta\mathcal{M}$ (Section VI). 2. \textbf{Stability certificate missing or nonpositive margin:} inability to form $\lambda^{\mathrm{lo}}$ with declared evidence, or $\lambda^{\mathrm{lo}}\le \lambda_{\min}$ when feasibility is required (Section VII). 3. \textbf{Coil certificate violated or KKT non-credible:} primal/dual feasibility not achieved to declared tolerances, complementarity/duality gap too large, or certified engineering constraints violated (Section X). 4. \textbf{Bootstrap contraction/conditioning violated:} inability to provide a certified contraction upper bound $L^{\mathrm{hi}}$ or detection of near-singularity of $I-\partial_{\mathbf{j}}\mathcal{T}$ when derivatives are used (Section VIII). 5. \textbf{Island/stochasticity risk or failure:} certification failure or $S^{\mathrm{hi}}$ exceeding a declared threshold used to gate nested-surface-dependent metrics (Section XI). 6. \textbf{Adjoint correctness failure:} persistent failure of finite-difference directional checks or JVP/VJP consistency checks (Section V.F). If $\mathsf{Status}\neq\mathrm{conclusive}$, the optimizer must treat the affected objective/constraint evaluations as invalid for accept/reject decisions. XV.D. Falsifiable experiments and diagnostic comparisons on existing devices This paper’s implementation contract is designed to expose intermediate observables that can be compared against measurements (Sections VIII.F, XI.H, XII.E). The existence of such observables is part of the auditability requirement: a module that cannot produce diagnostics that could, in principle, be checked against data should be treated as a heuristic surrogate and not used for hard constraints without additional external justification. XV.E. Gap flag: mapping from diagnostics to definitive physical conclusions The diagnostic and certification outputs provide numerical trustworthiness statements for the discrete pipeline components. Converting those into physical accept/reject decisions (e.g., mapping $\lambda^{\mathrm{lo}}>0$ to operational stability) requires device- and regime-specific validation. This remains an explicit external gap. XVI. Limitations, Open Gaps, and Required External Verification This section records limitations that follow from the mathematical structure of the pipeline and from the fact that several modules are treated as interfaces rather than physically validated models. XVI.A. Unverified assumptions embedded in closures/surrogates and proxy models 1. \textbf{Proxy adequacy.} Stability and loss metrics are defined via discretized proxy problems (generalized eigenvalues, reduced ODEs). The certification layers guarantee numerical soundness for those proxies, not their physical completeness. 2. \textbf{Certificate prerequisites.} Several one-sided bounds require additional information (singular-value lower bounds, separation bounds for eigenvalues, Lipschitz constants, integration-error bounds). If these are not supplied, the correct output is inconclusive rather than an optimistic feasibility claim. 3. \textbf{Declared smoothing.} Differentiability is often obtained by smoothing nonsmooth primitives (softmax/softmin, $\sqrt{x^2+\varepsilon^2}$). These change the optimization problem; the parameters must be declared, and feasibility decisions must be made against conservative bounds that account for smoothing error. XVI.B. Conditions under which differentiability/certification claims may fail 1. \textbf{Nonsmooth events.} Model switching (e.g., equilibrium representation selection), resonance creation/annihilation, and active-set changes in KKT systems can introduce nonsmooth dependence on $\mathbf{p}$. The pipeline must either (i) smooth these decisions with declared conservative surrogates, or (ii) treat derivatives as non-credible at switching points. 2. \textbf{Degenerate eigenvalues.} When stability eigenvalues are multiple or nearly multiple, eigenvalue derivatives may be undefined or unstable, and certificate expressions requiring separation (Temple-type bounds) may be unavailable. 3. \textbf{KKT degeneracy.} Envelope-theorem gradients rely on constraint qualification and strong duality. Near degeneracy (loss of strict complementarity, near-infeasibility) can lead to unstable dual variables and unreliable sensitivities. 4. \textbf{Ill-conditioning in implicit blocks.} When $\partial_{\mathbf{U}}\mathcal{R}$ is ill-conditioned, implicit differentiation amplifies numerical error. Conditioning proxies and rejection rules are therefore part of the mathematical interface, not implementation details. XVI.C. Calibration-or-reject requirements for reduced/surrogate components Any reduced or surrogate component inserted into $\mathcal{R}$ or used to define certificates must be accompanied by: 1. A declared domain of applicability (parameter regimes, expected smoothness, and required diagnostics). 2. Intermediate observables that can falsify the surrogate on existing devices. 3. A rejection rule when those observables disagree beyond declared uncertainty bands. Without these items, the pipeline may still compute gradients and nominal metrics, but those outputs should be treated as heuristic rather than optimization-safe constraints. XVI.D. Computational cost considerations and matrix-free requirements The coupled adjoint framework requires transpose-Jacobian solves for each scalar objective/constraint. For large discretizations, explicit Jacobians are typically infeasible. Therefore: 1. Matrix-free JVP/VJP actions are essential for equilibrium, Boozer, and stability blocks. 2. Certification primitives that require additional linear solves (e.g., $M^{-1}$ in eigen residual bounds, singular-value estimates for Boozer) must be accounted for explicitly in the evaluation budget and may require adaptive invocation (only when a constraint is near active). 3. When resource limits prevent refinement required by certificates, the pipeline must return inconclusive rather than silently degrading certification rigor. Conclusion We have formulated compact stellarator design as a single constrained nonlinear program in which all downstream physics and engineering outputs are defined implicitly by a coupled residual system $\mathcal{R}(\mathbf{U},\mathbf{p})=0$. This unified view makes the end-to-end map $\mathbf{p}\mapsto \mathbf{U}(\mathbf{p})$ explicit at the level needed for rigorous differentiation and for auditable feasibility decisions. On the sensitivity side, the paper establishes an implicit-adjoint calculus for the full pipeline in the finite-dimensional discretized setting. By expressing equilibrium, Boozer solves, generalized eigenproblems, bootstrap fixed points, and convex coil subproblems as residual blocks, we obtain a single adjoint system whose block structure permits modular back-substitution: downstream adjoint information propagates upstream through transpose-Jacobian actions, and parameter gradients are assembled from VJP contractions without forming dense Jacobians. The treatment makes explicit the non-generic points at which differentiability and adjoint credibility can fail (ill-conditioning of linearized blocks, eigenvalue degeneracy, KKT degeneracy, and model-switching events), and it couples these mathematical conditions to mandatory diagnostic reporting and rejection semantics. A second principal contribution is an optimization-safe certification philosophy for inequality constraints, designed to prevent false feasibility/stability claims caused by discretization, undersampling, and solver tolerances. Concretely: 1. For Boozer-derived quasi-symmetry/ripple metrics, we provide computable upper bounds on metric error bars by combining a residual norm with a lower singular-value bound and declared truncation terms, yielding conservative constraint tightening of the form $\widehat{\mathcal{M}}+\Delta\mathcal{M}\le \varepsilon$. 2. For stability proxies represented by symmetric generalized eigenproblems, we describe auditable one-sided instability detection via Rayleigh quotients and conditional certified lower bounds on the smallest eigenvalue using residual-based enclosures supplemented by explicit separation evidence. Stability feasibility is therefore gated by the rule that no stability claim is made without a positive certificate margin. 3. For any constraint expressed as a min/max over a compact domain but evaluated on a grid (clearance, curvature, peak field, and related geometric checks), we supply a generic Lipschitz/$\delta$-net wrapper that converts sampled extrema into conservative continuous bounds, with adaptive refinement driven by certificate tightness. 4. For coil realizability, we cast coil synthesis as a parameterized convex program and treat its KKT system as part of the coupled state. This yields auditable engineering feasibility for the discretized model and provides outer sensitivities through dual variables under standard convex-optimization regularity and explicit numerical credibility checks. The paper also integrates additional physics-coupling layers into the same contract: bootstrap-current self-consistency is treated as a fixed-point residual with contraction/conditioning diagnostics that bound sensitivity amplification; an island/stochasticity risk certificate is proposed to gate the applicability of nested-surface-dependent metrics; and a prompt-loss certificate is constructed as a one-sided, Lipschitz-sampled bound for a declared reduced guiding-center model, with explicit smoothing separated from pass/fail certification. Finally, we unify numerical certification and manufacturing/parameter uncertainty into certified worst-case constraint values by combining (i) conservative numerical bounds with (ii) linearized robust inflation terms $\rho\,\|W^{-T}\nabla g\|_2$. This produces an auditable decomposition of each robust margin into nominal value, numerical error bar, and uncertainty inflation, and it leads to explicit inexact-oracle step-acceptance and refinement/rejection logic. The limitations are intrinsic to the paper’s scope: all certificates are guarantees for the declared finite-dimensional discretizations and for explicitly stated bounds (singular-value estimates, Lipschitz constants, eigenvalue separation evidence, and integration-error bounds). Where such prerequisites cannot be supplied, the correct output is inconclusive rather than optimistic. Moreover, the certification layers do not validate the physical adequacy of any proxy model (Boozer-based metrics, ideal-MHD eigenproblems, bootstrap closures, near-integrable island estimates, or reduced prompt-loss dynamics); those mappings to physical accept/reject decisions remain external verification tasks. Taken together, the implicit-adjoint formulation and the certified-constraint interfaces provide a mathematically explicit template for end-to-end, gradient-based stellarator optimization in which feasibility and improvement claims are made only when they are numerically defensible and reproducible from module-level diagnostics. The framework is therefore best viewed as an optimization-safety and auditability layer: it standardizes what must be computed, what must be certified, and when a design point must be rejected as non-credible, thereby enabling systematic external benchmarking and falsification against higher-fidelity tools and experimental data. [HARD CODED END-OF-PAPER MARK — ALL CONTENT SHOULD BE ABOVE THIS LINE] ================================================================================ MODEL CREDITS This autonomous solution attempt was generated with the Intrafere LLC AI Harness, MOTO, and the following model(s): – x-ai/grok-4 (23 API calls) – openai/gpt-5.2 (19 API calls) – z-ai/glm-5 (8 API calls) – moonshotai/kimi-k2.5 (6 API calls) Total AI Model API Calls: 56 ================================================================================

More Info