Company Website: https://intrafere.com/
Software GitHub that produced this paper: https://github.com/Intrafere/MOTO-Autonomous-ASI
Grok Global Freshwater Crisis Challenge Link: https://x.com/grok/status/2028278338381316587
Disclaimer: This is an autonomous AI solution generated with the MOTO harness. This paper was not peer reviewed and was autonomously generated without user oversight or interaction beyond the original user prompt; therefore, this text may contain errors. These papers often contain ambitious content and/or extraordinary claims, and all content should be viewed with extreme scrutiny.
(EDITOR NOTE: This single paper does not attempt to solve the user's prompt entirely; it is meant to be one piece of the complex solution the prompt requires. This paper is the first in the series; complete solutions are typically achieved in later papers.) Metadata and the original autonomous prompt are available below.
OUTLINE
================================================================================
Abstract
I. Introduction
A. Renewable-powered integrated water systems (IWS): coupled mass, energy, pressure, and quality constraints
B. Motivation: nonstationary uncertainty (demand/inflows/renewables), reliability mandates, and multi-objective cost–reliability–emissions tradeoffs
C. Problem statement: multi-objective Wasserstein distributionally robust MPC (DRO-MPC) for IWS with tractable reformulations and closed-loop guarantees
D. Claimed contributions to be verified (from internal database):
1. Wasserstein chance/CVaR DRO in receding-horizon control for IWS
2. Recursive feasibility and ISS via tube-MPC and terminal robust control invariant (RCI) sets
3. Nonstationarity handling via conformal-calibrated drifting Wasserstein radii \(\kappa_t\)
4. Tractable conic (SOCP/power-cone/exponential-cone/SDP) reformulations for hydraulics, quality, and risk
5. Scalable/high-dimensional alternatives (MMD-DRO, sliced/MaxSW-DRO, quantized coresets)
6. Extensions: KL-DRO, scenario optimization, DRMDP/DRDP, bilevel/coalitional/decentralized, privacy, sensor placement
E. Roadmap of the paper and explicit note that all “evidence” modules are unverified and must be independently checked
II. Integrated Water System Modeling for MPC
A. Networked state-space model interface (storages + inventories)
1. State vector: reservoirs/tanks/aquifer memory/head proxies; optional battery SOC coupling
2. Control vector: desal/reuse/recharge/extraction/pumping/grid import; possible on/off binaries
3. Disturbances: demand, inflow/climate, renewable availability; optional prices/carbon intensity
B. Constraints typical for IWS
1. Mass balance and storage bounds
2. Energy balance and renewable coupling to pumping/desalination
3. Hydraulic pressure feasibility constraints (nodal head lower bounds)
4. Ecosystem flow minimum-release constraints
5. Water-quality constraints (e.g., salinity/TDS) and blending/mixing relations
C. Cost and performance objectives for multi-objective control
1. Opex/energy cost and procurement
2. Emissions/carbon-weighted objective and emissions budgets
3. Reliability and service metrics (shortage, pressure violations, ecosystem deficits)
4. Tail-risk objectives (CVaR-type and “catastrophic” risk metrics)
III. Uncertainty Models and Ambiguity Sets
A. Empirical distribution(s) from data streams and ensembles
1. Single empirical law \(\hat{\mathbb{P}}_N\) for trajectory or stagewise samples
2. Multiple ensemble sources \(\{\hat{\mathbb{P}}_N^k\}_{k=1}^K\) (e.g., forecast/GCM ensembles)
B. Wasserstein ambiguity sets
1. Definition of Wasserstein ball and choice of ground metric/norm
2. Stagewise vs trajectory-wise ambiguity (flag: time-consistency claims must be checked)
3. Conditional/nested ambiguity (nodewise kernels) and causal constraints (flag: requires careful formalism)
C. Alternative ambiguity/risk models (included for comparison and extensions)
1. KL/relative-entropy (entropic) ambiguity and exponential-cone representability
2. Moment-based (mean–covariance) uncertainty with data-driven confidence sets (SOC/SDP tractable)
3. Stable/heavy-tailed ambiguity via characteristic-function constraints (flag: verify tractable embeddings)
4. Scenario optimization / scenario MPC for distribution-free guarantees
5. Martingale-based safety calibration / anytime-valid bounds (flag: verify required assumptions)
IV. Multi-Objective Formulations and Pareto Set Construction
A. Multi-objective problem statement in receding horizon
1. Weighted-sum scalarization and supported Pareto points
2. \(\varepsilon\)-constraint method (emissions caps, reliability budgets, quality bounds)
B. Pareto-front computation with certificates (as claimed)
1. Accuracy notions for Pareto approximation (gap/coverage metrics)
2. Continuation/parametric solves and warm-start strategies
3. Differentiable/implicit-gradient conic programming approach for fast Pareto tracing (flag: regularity conditions must be verified)
V. Wasserstein DRO Building Blocks for MPC
A. Distributionally robust objectives
1. Worst-case expected stage cost over Wasserstein ball
2. Worst-case CVaR / tail-risk penalties; “Sinkhorn-CVaR” style regularizations (flag: must define precisely)
B. Distributionally robust constraints
1. Wasserstein DRO chance constraints for shortages/pressure/ecosystem constraints
2. Joint chance constraints and risk allocation (Boole-type decompositions)
3. Reformulations for linear/convex constraint functions (Esfahani–Kuhn-type duality; flag: ensure assumptions hold)
C. Dependence and correlation handling
1. Joint uncertainties \((Dem_t, Ren_t)\) and claimed comonotonic worst-case dependence reformulation
2. Sensitivity to correlation structure; rearrangement inequality arguments (flag: verify applicability)
D. Ensemble-aware Wasserstein DRO via barycenters
1. Wasserstein barycenter ambiguity built from multiple ensembles
2. Claimed tractable 1D OT barycenter/SOCP reductions (flag: specify when dimension reduction is valid)
VI. Tractable Physical Modeling Layers Compatible with DRO-MPC
A. Convex hydraulics and pressure feasibility
1. Power-cone / SOCP relaxations of head-loss and pump power relations
2. Exactness theorem claims for radial (tree) or fixed-direction operation regimes (flag: provide precise conditions)
3. Integration of conic hydraulics into DRO chance constraints and MPC
B. Forchheimer aquifer head-loss modeling
1. Quadratic head-loss \(h=a q + b q^2\) and power-cone reformulation
2. Exactness under energy-monotone objectives (flag: define monotonicity and prove)
3. Embedding into MI-SOCP when discrete decisions are present
C. Water-quality constraints
1. Auditable “worst-case blend” certificates (LP/SOCP forms) for mixing constraints
2. Optional distributional reliability for uncertain concentrations (violation functions and DRO wrapping)
D. Nonlinear water-quality transport (ADR) and polynomial approximations
1. Polynomial ADR dynamics and sum-of-squares (SOS) viability/RCI certificates
2. SDP hierarchy, convergence discussions, and computational scaling
E. Desalination and treatment process models (RO) compatible with conic DRO
1. Reverse osmosis (RO) specific energy and throughput constraints as convex surrogates in flow, recovery ratio, and operating pressure
2. Osmotic-pressure and temperature/salinity dependence: exponential-cone representable log/exp surrogates and conservative piecewise-convex envelopes
3. Coupling to renewable power limits, battery buffering, and grid import; treatment/desalination on/off logic (MI-conic interface)
4. Uncertain feed salinity/temperature and source-water quality: affine-in-uncertainty templates for Wasserstein DRO wrapping of energy and product-quality constraints
VII. DRO-MPC Architectures and Closed-Loop Guarantees
A. Tube-based distributionally robust MPC (DR-TBMPC)
1. Nominal + error dynamics, disturbance sets, and tube tightening
2. Probabilistic recursive feasibility claims under Wasserstein DRO chance constraints
3. Input-to-state stability (ISS) claims and required Lyapunov arguments (flag: specify exact assumptions)
B. Terminal ingredients: robust control invariant sets under ambiguity
1. Maximal RCI set definitions and Pre-operator under uncertainty
2. Zonotopic fixed-point iterations for RCI under Wasserstein-inflated disturbances
3. Volume/generator certificates and out-of-sample inclusion claims (flag: define the statistical guarantee carefully)
C. Robust viability + risk constraints
1. Viability kernel viewpoint for constraint satisfaction under uncertainty
2. Integration with terminal sets for recursive feasibility
D. Conformal-calibrated drifting Wasserstein DRO-MPC (CC-DRO-MPC)
1. Online residuals and calibration; shrinking/adjusting \(\kappa_t\)
2. Marginal coverage guarantees vs joint/horizon-wise guarantees
3. Recursive feasibility with time-varying ambiguity (flag: must reconcile calibration with MPC dependence)
VIII. Time Consistency and Multistage Extensions Beyond Standard MPC
A. Distributionally robust MDP (DRMDP) formulation
1. Ambiguity on transition kernels; state augmentation with regimes
2. Constrained DRMDPs and Pareto front via sweeping budgets (emissions/reliability)
3. Implementation recipe: discretized regimes + local Wasserstein radii \(\varepsilon(s,a)\)
B. Nested entropic distributionally robust dynamic programming (NE-DRDP)
1. Entropic/nested risk measures and exponential-cone Bellman backups
2. Claimed contraction/value iteration convergence (flag: verify conditions)
C. Martingale optimal transport (MOT) / martingale-calibration DRO
1. Forecast-consistent constraints and temporal dependence modeling
2. Integration hooks with MPC/SDDP (flag: formalize feasibility and complexity)
D. Distributionally robust SDDP / multistage convex programming interface
1. Stagewise DRO inner sup problems and dualization leading to LP/conic subproblems
2. Multi-objective studies via weight sweeps in multistage setting
IX. Scalability and High-Dimensional Uncertainty
A. Kernel-based MMD-DRO
1. RKHS ambiguity and kernel ridge regression (KRR) dual SDP approximations
2. Out-of-sample violation bounds with approximation error terms (flag: check rate derivations)
3. Distributed MPC via kernel locality (flag: specify decomposition assumptions)
B. Sliced/Max-Sliced Wasserstein DRO (MaxSW-DRO)
1. Projection to 1D transports and semidefinite relaxations
2. Claimed complexity and approximation guarantees vs full Wasserstein (flag: define the precise guarantee)
C. Optimal-transport quantization / coreset construction
1. Replace \(N\) samples by \(M\ll N\) support points with radius adjustment via quantization error
2. Inner/outer feasibility bracketing with \(\kappa'\), \(\kappa''\)
3. Nodewise/stagewise extensions for scenario trees
X. Alternative Reliability Certificates (Comparative/Hybrid Layer)
A. Scenario optimization and scenario MPC
1. Finite-sample distribution-free guarantees and union-bound accounting over time
2. Risk allocation across multiple constraint classes (service, ecosystem, pressure, energy)
B. Moment-based DRO chance constraints with unimodality enhancement
1. Mean–covariance sets and unimodality assumptions yielding tighter bounds (flag: verify unimodality requirement)
C. Extreme-value / catastrophic tail-risk module
1. Peaks-over-threshold model for compound deficits and tail-index guarantees (flag: ensure integration with MPC is coherent)
2. Comparison against Wasserstein and conformal guarantees for tail events
D. KL-DRO / entropic-risk alternative robustness layer
1. Relative-entropy DRO reformulations and stress reweighting interpretation
2. Time-consistent multistage control implications (flag: clarify relationship to NE-DRDP)
XI. Decentralization, Strategic Interaction, and Data Governance (Extensions)
A. Multi-district decentralized operation under shared aquifers
1. Distributionally robust generalized Nash equilibrium (DR-GNE) sketch and local Wasserstein balls
2. ADMM-DRO style decomposition (flag: convergence under ambiguity must be verified)
B. Coalition formation and distributionally robust core allocations
1. Hedonic coalition formation game, stability notions, Shapley-value fairness claims
2. VI reformulations for equilibrium computation
C. Distributionally robust bilevel optimization (DRBO) for planning + operations
1. Upper-level capacity decisions anticipating lower-level DRO operations
2. Tractability considerations and KKT/dual-based single-level reductions (flag: nonconvexities)
D. Adjustable robust / two-stage DRO-ARO with column-and-constraint generation
1. Anticipative capacity sizing with nonanticipative operational recourse
2. VSS-type gap metrics between DRO-ARO and nominal/RO baselines (flag: define precisely)
E. Differential privacy in federated WDRO
1. DP-WDRO mechanism sketch and privacy–robustness tradeoffs
2. SOCP reformulations with DP-calibrated noise (flag: ensure statistical meaning of ambiguity under DP)
XII. Sensor Placement and Estimation as Robustness Enablers (Extensions)
A. Fisher-information-aware sensor placement
1. Claimed link between Fisher information, uncertainty reduction, and Wasserstein conservatism
2. Submodular maximization formulation and (1−1/e) approximation guarantee (flag: verify submodularity proof)
B. Distributionally robust moving-horizon estimation (DR-MHE) interface (if used)
1. Role of state estimation (aquifer head/storage, pressures) in closed-loop guarantees
2. Coupling DR-MHE coverage to MPC tightening (flag: quantify rigorously)
XIII. Theorems, Proof Roadmap, and Verification Checklist
A. Precise theorem statements to be established in the paper (subject to verification)
1. Strong duality and tractable reformulation for Wasserstein DRO chance constraints in the IWS model class
2. Recursive feasibility of DRO-MPC with terminal RCI set under specified disturbance/ambiguity assumptions
3. ISS (or alternative stability concept) under tube-based policy and Lipschitz dynamics assumptions
4. Exactness conditions for conic hydraulics and Forchheimer power-cone reformulations
5. Validity of conformal-calibrated \(\kappa_t\) update for marginal (and limits for joint) coverage
6. Complexity/scaling statements for MMD-DRO, MaxSW-DRO, and quantized coresets
B. Proof strategy map
1. Dualization steps and interchange of \(\sup\)/\(\inf\) (flag: justify measurability/compactness)
2. Set invariance arguments (RCI, viability) and tube tightening derivations
3. Conic modeling equivalences and exactness proofs
4. Statistical generalization bounds: what is actually provable vs heuristic
C. Explicit gaps/risks requiring independent validation
1. Time-consistency claims for trajectory-wise Wasserstein vs kernel-wise/nested formulations
2. Coverage/violation guarantees under dependence across MPC steps
3. Heavy-tail characteristic-function DRO tractability and correctness
4. Unimodality assumptions realism and how to test/relax them
XIV. Algorithmic Summary, Tuning, and Implementation Details
A. End-to-end CC-WDRO-MPC loop
1. Data ingestion and scenario/trajectory construction
2. Ambiguity-set selection (stagewise vs trajectory-wise) and metric scaling
3. Solving the multi-objective scalarization (weighted sum or \(\varepsilon\)-constraints)
4. Applying \(u_{t|t}\), updating state estimates, and rolling the horizon
B. Practical tuning knobs with explicit mathematical meaning
1. Choosing \(p\), norm \(\|\cdot\|\), and feature scaling for \(W_p\)
2. Risk parameters: CVaR level \(\alpha\), chance targets \(\varepsilon\), and risk-allocation \(\varepsilon_j\)
3. Horizon length \(H\), terminal-set selection, and tube-tightening margins
4. Updating and calibrating drifting radii \(\kappa_t\): operational rules and failure modes
C. Solver and modeling stack
1. Conic representations used (LP/SOCP/power cone/exponential cone/SDP) and when MI-conic arises
2. Warm-starting across MPC steps and across Pareto-parameter sweeps
3. Numerical conditioning issues (unit scaling, degeneracy, and dual infeasibility detection)
XV. Numerical Experiments and Case Studies (for falsification and stress-testing)
A. Benchmark setups for renewable-powered IWS
1. Test networks (small radial, medium weakly meshed, and aggregated multi-district)
2. Disturbance models: demand/inflow/renewables with nonstationarity and correlation
3. Quality and hydraulics toggles: mixing-only vs ADR surrogate; conic hydraulics vs simplified flow limits
B. Evaluation protocol
1. Out-of-sample reliability metrics: violation probability, tail metrics, and worst-case stress tests
2. Closed-loop performance: cost, emissions, and service quality under realized trajectories
3. Pareto-front reporting: supported tradeoffs, \(\varepsilon\)-budget tradeoffs, and sensitivity to weights
C. Ablations comparing robustness layers
1. Wasserstein DRO vs scenario MPC vs moment-based vs KL/entropic alternatives
2. Effect of drifting \(\kappa_t\) calibration vs fixed-radius baselines
3. High-dimensional approximations: MMD-DRO, MaxSW-DRO, and quantized coresets
D. Computational scaling and runtimes
1. Scaling in \(N\) (samples), \(d\) (uncertainty dimension), \(H\) (horizon), and network size
2. Impact of mixed-integer decisions and conic hydraulics on solve time
XVI. Optional Alternative Modeling and Stability Perspectives (non-essential extensions)
A. Koopman lifting for nonlinear water-quality/hydraulics surrogates in DRO-MPC
1. Lifted linear predictor construction and interface with conic DRO constraints
2. Comparison to polynomial/SOS surrogates for invariance and feasibility certification
B. Port-Hamiltonian/passivity viewpoint for water-energy coupling
1. Energy-based storage functions as stability certificates
2. Relationship to terminal-cost/terminal-set MPC conditions
C. Online convex optimization (OCO) viewpoint for time-varying multi-objective objectives
1. Regret-style performance metrics and their relation to receding-horizon control
2. When OCO-style guarantees complement (but do not replace) DRO feasibility certificates
Conclusion
A. Summary of the proposed multi-objective Wasserstein DRO-MPC framework for renewable-powered IWS
B. Key mathematical takeaways: tractability, closed-loop feasibility/stability, and Pareto tradeoff computation
C. Practical implications and limitations (nonstationarity, dependence, high-dimensionality, mixed-integer physics)
D. Future directions: verified benchmarks, tighter time-consistent guarantees, scalable distributed implementations, and formal integration of privacy/sensing with DRO-MPC
[HARD CODED BRACKETED DESIGNATION THAT SHOWS END-OF-PAPER DESIGNATION MARK]
[HARD CODED END-OF-OUTLINE MARK — ALL OUTLINE CONTENT SHOULD BE ABOVE THIS LINE]
================================================================================
================================================================================
AUTONOMOUS AI SOLUTION
Disclaimer: This is an autonomous AI solution generated with the MOTO harness. This paper was not peer reviewed and was autonomously generated without user oversight or interaction beyond the original user prompt; therefore, this text may contain errors. These papers often contain ambitious content and/or extraordinary claims, and all content should be viewed with extreme scrutiny.
User’s Research Prompt: Solve the global freshwater scarcity crisis entirely by pioneering breakthrough STEM innovations that deliver clean abundant water sustainably to all humans and ecosystems.
Paper Title: Multi-Objective Wasserstein Distributionally Robust MPC for Renewable-Powered Integrated Water Systems
AI Model Authors: x-ai/grok-4.1-fast, openai/gpt-5.2, moonshotai/kimi-k2.5
Generated: 2026-03-01
================================================================================
Abstract
Renewable-powered integrated water systems couple storage and conveyance dynamics with hydraulic pressure feasibility, treatment/desalination, and power balance constraints, while operating under nonstationary uncertainty in demands, inflows, and renewable supply and under competing cost–emissions–reliability objectives. This paper develops a modular mathematical blueprint for multi-objective model predictive control under distributional ambiguity, using Wasserstein distributionally robust optimization (WDRO) as the primary uncertainty model and conic reformulations to retain computational tractability.
The framework (i) specifies a control-oriented IWS state/control/disturbance interface and multi-objective scalarizations that yield Pareto-efficient solutions under standard convexity assumptions; (ii) uses dual representations of worst-case expectations over Wasserstein balls (including a Lipschitz simplification for \(W_1\)) to robustify costs and convex risk surrogates; and (iii) enforces reliability conservatively by replacing chance constraints with distribution-free CVaR/hinge-loss constraints and then applying WDRO to the resulting convex expectations. To align optimization with physics, we provide conic-compatible layers for head-loss and pump envelopes, Forchheimer-type groundwater losses, and mixing-based water-quality constraints, including conditional tightness results when slack-generating variables are strictly penalized.
For closed-loop operation, we present a tube-based DR-MPC architecture that couples WDRO planning with deterministic tightening and terminal invariance, yielding conditional recursive feasibility and an ISS-style stability template under standard invariance and Lyapunov assumptions. Extensions address nonstationarity via conformal-calibrated drifting robustness parameters, time-consistent multistage control via rectangular DRMDPs, and scalability via MMD ambiguity, sliced transport surrogates, and OT-coreset radius bracketing. Throughout, the paper emphasizes auditability by separating proved statements from assumption-dependent interfaces and providing an explicit verification checklist for deployment.
I. Introduction
Renewable-powered integrated water systems (IWS) couple water inventories (reservoirs, tanks, aquifers), hydraulic feasibility (pressures/heads), treatment and desalination processes, and electrical power balances. In such systems, uncertainty is both unavoidable and operationally consequential: short-term demand and renewable variability drive real-time feasibility (e.g., shortages, pressure violations), while longer-term inflow and climate uncertainty drives storage and aquifer banking decisions. At the same time, operators face competing objectives—operating cost, emissions, and reliability—whose tradeoffs must be made explicit and auditable.
This paper develops a modular mathematical framework for multi-objective model predictive control (MPC) of IWS under distributional ambiguity, with Wasserstein distributionally robust optimization (WDRO) as the primary uncertainty model. The emphasis is not on a single monolithic theorem, but on assembling a rigorous, checkable pipeline in which (i) the IWS physics are expressed in convex (often conic) form, (ii) uncertainty enters through ambiguity sets around data-driven empirical distributions, and (iii) reliability requirements are enforced through tractable robust risk surrogates. Because the broader research context for this project is AI-generated and unverified, the paper is written with explicit conditional statements, caveats, and a verification checklist: results are stated with their assumptions exposed, and no statistical or closed-loop guarantee is asserted without identifying what must be checked in an instantiated deployment.
Main ingredients and results. The core mathematical components established in the body are as follows.
1) Control-oriented IWS model and objectives (Section II). We define a discrete-time state/control/disturbance interface for storages, optional quality and aquifer memory states, and optional energy storage. We formalize constraint classes that represent mass balance, energy balance with renewable coupling, pressure/head feasibility, ecosystem releases, and mixing-based water-quality constraints, together with multi-objective performance channels (cost, emissions, and reliability/tail-risk penalties).
2) Data-driven ambiguity sets (Section III). We define stagewise and trajectory-wise empirical distributions for disturbances and build Wasserstein balls around them, clarifying the modeling tradeoffs between per-stage and horizon-vector ambiguity. We also record alternative ambiguity models (KL, moment-based, scenario methods) as comparative modules rather than assumed substitutes.
3) Multi-objective MPC and Pareto tradeoffs (Section IV). We state weighted-sum and \(\varepsilon\)-constraint scalarizations for receding-horizon optimization and provide standard convexity-based sufficient conditions under which solutions are Pareto efficient. We describe practical Pareto-front approximation procedures compatible with conic reformulations.
4) WDRO building blocks for tractable MPC (Section V). We use a dual representation of worst-case expectations over Wasserstein balls centered at discrete empirical measures (Theorem 5.1) and record the Lipschitz simplification for \(W_1\) (Corollary 5.2). For reliability constraints, we do not claim exact WDRO chance-constraint reformulations; instead, we adopt a conservative and distribution-free route from chance constraints to CVaR constraints and hinge-loss expectations, which can then be robustified via Wasserstein duality.
5) Conic-compatible physical layers (Section VI). To keep the WDRO reformulations solver-ready, we provide convex (conic-representable) modeling layers for hydraulics and groundwater head loss (SOCP/power-cone epigraphs and envelopes) and for water quality (linear mixing constraints and a box-robust “worst-case blend” reduction). We also prove conditional tightness statements for certain inequality relaxations when the relevant “slack-generating” variables are strictly penalized and otherwise separable (e.g., Propositions 6.1 and 6.3).
6) Closed-loop architectures and conditional guarantees (Section VII). We present a tube-based DR-MPC template that separates (a) distributional robustness for service constraints and costs (handled via WDRO/CVaR on nominal plans) from (b) deterministic robustness for bounded modeling/estimation error (handled via tube tightening and terminal sets). Under standard tube invariance and terminal invariance assumptions, we state a recursive feasibility result (Theorem 7.4) and an ISS-type stability template (Proposition 7.8), explicitly noting what must be verified when robustified costs are nonsmooth. We also describe a conformal-calibrated mechanism for drifting ambiguity/tightening parameters, while carefully distinguishing marginal (per-step) coverage from joint (horizon-wise) guarantees.
7) Time consistency, scalability, and comparative certificates (Sections VIII–X). We describe time-consistent multistage extensions via rectangular ambiguity in DRMDPs (with a contraction property in the discounted case), and scalable alternatives for high-dimensional uncertainty (MMD ambiguity, sliced approximations, and OT coresets with radius inflation/deflation inclusions). We also present scenario optimization, moment-based bounds, and KL-DRO as alternative reliability layers, with assumptions and limitations stated explicitly.
8) Extensions and auditability (Sections XI, XII, XVI, and XIII). We record modular interfaces for decentralization and data governance (Section XI), sensing and estimation as robustness enablers (Section XII), and optional modeling/stability perspectives (Section XVI). Section XIII consolidates what is proved versus what is only an interface, and provides a checklist of assumptions that must be validated before interpreting the framework as a real-world reliability guarantee.
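As orientation for the conservative route described in item 4, recall the standard Rockafellar–Uryasev relations that underlie the chance-to-CVaR replacement; they are stated here in textbook form for the reader's convenience, not as a contribution of this paper. For a measurable constraint function \(g(x,w)\) and violation budget \(\varepsilon \in (0,1)\),

```latex
\mathrm{CVaR}_{\varepsilon}^{\mathbb{P}}\!\big(g(x,w)\big)
  = \inf_{\tau \in \mathbb{R}} \Big\{ \tau + \tfrac{1}{\varepsilon}\,
    \mathbb{E}_{\mathbb{P}}\big[(g(x,w) - \tau)_{+}\big] \Big\},
\qquad
\mathrm{CVaR}_{\varepsilon}^{\mathbb{P}}\!\big(g(x,w)\big) \le 0
  \;\Longrightarrow\;
  \mathbb{P}\big(g(x,w) \le 0\big) \ge 1 - \varepsilon .
```

The hinge-loss expectation in the variational form is jointly convex in \((x,\tau)\) whenever \(g\) is convex in \(x\), and it is exactly this expectation that Section V robustifies over a Wasserstein ball.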
Roadmap. Section II introduces the control-oriented IWS model and objective channels. Sections III and V define ambiguity sets and WDRO primitives; Section IV formalizes multi-objective Pareto constructions. Section VI provides conic physical modeling layers that make the WDRO-MPC subproblems tractable. Section VII presents the closed-loop DR-MPC architecture and conditional feasibility/stability statements, including a conservative treatment of nonstationarity via drifting parameters. Sections VIII–X extend the framework to time-consistent multistage formulations, scalable approximations, and comparative reliability certificates. Sections XI, XII, and XVI record extensions (decentralization, privacy, sensing/estimation, and alternative viewpoints), and Section XIV collects algorithmic and tuning details. Section XIII serves as a proof ledger and verification checklist, and Section XV specifies a falsifiable numerical evaluation protocol.
Scope note. The paper is intended as a mathematical blueprint that is explicit about what is guaranteed, what is conditional, and what remains a modeling choice. In particular, (i) Wasserstein/CVaR constraints are used as convex surrogates for chance constraints rather than claimed to be exact, and (ii) any statistical calibration of ambiguity radii or conformal sets is treated as assumption-dependent and must be validated for the intended data-generating conditions and dependence structure.
II. Integrated Water System Modeling for MPC
This section specifies a discrete-time control-oriented model of an integrated water system (IWS) suitable for receding-horizon optimization under uncertainty. The purpose is to define (i) a state-control-disturbance interface, (ii) constraint classes that capture mass, energy, hydraulics, ecosystem flows, and quality, and (iii) a menu of stage costs for multi-objective control. The modeling choices here are intentionally modular: later sections will wrap these constraints and objectives with distributional robustness and tractable convex relaxations.
II.A. Networked state-space model interface (storages + inventories)
We work on a discrete time grid t = 0,1,2,\ldots with sampling period \Delta t > 0. Let the water network be a directed graph \mathcal{G} = (\mathcal{N},\mathcal{E}), where \mathcal{N} indexes junctions, tanks, reservoirs, demand nodes, and treatment/desalination facilities, and \mathcal{E} indexes conveyance links (pipes, canals) and controlled links (pumps, valves). Direction is chosen for modeling convenience; actual flow direction may be enforced (e.g., by sign constraints) or handled by later convexification/outer iterations.
State. Let x_t \in \mathbb{R}^{n_x} collect inventory-like states. A typical decomposition is
x_t := \begin{bmatrix} s_t \\ a_t \\ b_t \end{bmatrix},
where:
1) s_t \in \mathbb{R}^{n_s} are water storages (surface reservoirs, tanks, aquifer storage proxies). The components are interpreted as volumes (or equivalent heads/levels) subject to bounds.
2) a_t \in \mathbb{R}^{n_a} are auxiliary memory states that represent slow hydrologic/quality dynamics (e.g., aquifer head proxy, delayed recharge response, or linear surrogate states used to approximate transport).
3) b_t \in \mathbb{R}^{n_b} is an optional electrical storage state (e.g., battery state-of-charge) when co-optimizing water and energy buffers.
Control. Let u_t \in \mathbb{R}^{n_u} denote manipulated decisions. For concreteness, we stack representative water-flow and power variables as
\nu_t := \begin{bmatrix} q_t^{\mathrm{prod}} \\ q_t^{\mathrm{pump}} \\ q_t^{\mathrm{treat}} \\ q_t^{\mathrm{recharge}} \\ q_t^{\mathrm{extract}} \\ p_t^{\mathrm{grid}} \\ p_t^{\mathrm{ch}} \\ p_t^{\mathrm{dis}} \end{bmatrix},
so that u_t \equiv \nu_t in this notation.
Here q-variables are water flows (m\(^3\)/s) aggregated by asset class, and p-variables are power/energy decisions (kW or kWh per period). Some assets may have discrete on/off modes; those are represented by additional binary variables \delta_t \in \{0,1\}^{n_\delta} when needed, yielding a mixed-integer model.
Disturbance / uncertainty. Let w_t \in \mathbb{R}^{n_w} collect exogenous quantities such as demands, inflows, renewable availability, electricity prices, and carbon intensity. A generic decomposition is
w_t := \begin{bmatrix} d_t \\ i_t \\ r_t \\ \pi_t \\ \chi_t \\ c_t^{\mathrm{src}} \end{bmatrix},
where d_t are nodal demands, i_t inflows (catchment/runoff/import), r_t renewable power availability, \pi_t electricity prices, \chi_t carbon intensity (kgCO\(_2\)e/kWh), and c_t^{\mathrm{src}} source water quality parameters (e.g., salinity at intakes).
Dynamics. We assume the system admits a (possibly nonlinear) controlled evolution of the form
x_{t+1} = f(x_t,u_t,w_t), \qquad x_t \in \mathcal{X},\; u_t \in \mathcal{U}(x_t),\; w_t \in \mathcal{W}.
For many MPC constructions it is useful to isolate the storage-like components and write affine conservation with additive disturbances:
s_{t+1} = s_t + \Delta t\, \big( M u_t + N w_t + b \big),
where M and N are incidence/aggregation matrices consistent with mass balance conventions and the constant vector b encodes known baseline fluxes (e.g., mandated releases treated as negative supply); this b is distinct from the battery state b_t. Auxiliary states a_t and the battery state b_t can be appended similarly:
a_{t+1} = f_a(a_t, s_t, u_t, w_t), \qquad b_{t+1} = b_t + \eta^{\mathrm{ch}} p_t^{\mathrm{ch}} - \frac{1}{\eta^{\mathrm{dis}}} p_t^{\mathrm{dis}},
with efficiencies \eta^{\mathrm{ch}},\eta^{\mathrm{dis}} \in (0,1]. When dynamics are nonlinear, later tractable layers may rely on convex restrictions/relaxations or local linearizations; the present section only declares the interface.
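As a minimal numeric sketch of this interface (pure Python; the matrices, dimensions, and efficiency values below are illustrative placeholders, not a calibrated network):

```python
# Illustrative one-step simulation of the affine storage update
# s_{t+1} = s_t + dt*(M u + N w + b) plus the battery SOC recursion.

def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def step_storage(s, u, w, M, N, b, dt=1.0):
    Mu, Nw = mat_vec(M, u), mat_vec(N, w)
    return [si + dt * (mu + nw + bi) for si, mu, nw, bi in zip(s, Mu, Nw, b)]

def step_battery(soc, p_ch, p_dis, eta_ch=0.95, eta_dis=0.95, dt=1.0):
    return soc + dt * (eta_ch * p_ch - p_dis / eta_dis)

# Two tanks; u = (production into tank 1, transfer tank 1 -> tank 2);
# w = (demand drawn from tank 2,)
M = [[1.0, -1.0],
     [0.0,  1.0]]
N = [[0.0],
     [-1.0]]
b = [0.0, 0.0]
s_next = step_storage([10.0, 5.0], u=[2.0, 0.5], w=[1.0], M=M, N=N, b=b)
soc_next = step_battery(0.5, p_ch=0.1, p_dis=0.0)
print(s_next, soc_next)  # storages [11.5, 4.5], SOC approximately 0.595
```

The same update pattern extends to any number of storages once M and N are built from the network incidence structure.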
Remark (trajectory variables). Over an MPC horizon H, define predicted sequences
\mathbf{x} := (x_{t|t},\ldots,x_{t+H|t}),\quad \mathbf{u} := (u_{t|t},\ldots,u_{t+H-1|t}),\quad \mathbf{w} := (w_{t|t},\ldots,w_{t+H-1|t}),
with x_{t|t} = x_t measured/estimated.
II.B. Constraints typical for IWS
Constraints fall into (i) deterministic physical constraints and (ii) service/reliability constraints that will later be enforced robustly or distributionally robustly.
II.B.1. Mass balance and storage bounds
For storage components s_t (volumes), impose bounds
\underline{s} \le s_t \le \overline{s} \qquad \text{(componentwise)}.
For each node n \in \mathcal{N}, a nodal conservation constraint can be written using an incidence matrix B \in \mathbb{R}^{|\mathcal{N}|\times |\mathcal{E}|} and edge flows q_t \in \mathbb{R}^{|\mathcal{E}|}:
B q_t + e_t^{\mathrm{inj}}(u_t,w_t) - e_t^{\mathrm{dem}}(w_t) + e_t^{\mathrm{stor}}(x_t,x_{t+1}) = 0.
Here e^{\mathrm{inj}} models controllable injections (production, transfers), e^{\mathrm{dem}} demands, and e^{\mathrm{stor}} the net withdrawal/deposit due to storage changes. In aggregated models, this reduces to the affine storage dynamics given in II.A.
Shortage and spillage slack variables. Reliability constraints are often expressed through nonnegative slacks:
\mathrm{short}_t \ge 0,\quad \mathrm{spill}_t \ge 0,
and modified balance equations, for example at a demand aggregate:
\text{delivered}_t + \mathrm{short}_t = \text{demand}_t.
These slacks will later be penalized or constrained via chance/CVaR-type conditions.
II.B.2. Energy balance and renewable coupling to pumping/desalination
Let P_t^{\mathrm{ren}}(w_t) denote available renewable power at time t (possibly curtailable) and let P_t^{\mathrm{grid}} \ge 0 denote grid import. Let P_t^{\mathrm{load}} be total electrical load from pumps, desalination, treatment, and auxiliaries. A typical per-period power balance is
P_t^{\mathrm{ren}}(w_t) + P_t^{\mathrm{grid}} + P_t^{\mathrm{dis}} = P_t^{\mathrm{load}}(x_t,u_t,w_t) + P_t^{\mathrm{ch}},
with bounds
0 \le P_t^{\mathrm{grid}} \le \overline{P}^{\mathrm{grid}},\quad 0 \le P_t^{\mathrm{ch}} \le \overline{P}^{\mathrm{ch}},\quad 0 \le P_t^{\mathrm{dis}} \le \overline{P}^{\mathrm{dis}}.
The mapping P_t^{\mathrm{load}} depends on hydraulic and process physics. For instance, pumping load may be modeled (at varying fidelity) as proportional to flow times head increase:
P_t^{\mathrm{pump}} \approx \sum_{e\in\mathcal{E}_{\mathrm{pump}}} \frac{\rho g}{\eta_e}\, q_{e,t} \Delta h_{e,t},
with density \rho, gravity g, and pump efficiency \eta_e. Desalination and treatment loads can be included as convex functions of throughput or as piecewise-linear envelopes.
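As a quick sanity check of this power model (the flow, lift, and efficiency values are hypothetical):

```python
# Pumping power P ~ (rho*g/eta) * q * dh, reported in kW.
RHO, G = 1000.0, 9.81  # water density (kg/m^3), gravity (m/s^2)

def pump_power_kw(q_m3s, dh_m, eta):
    """Electrical pumping power in kW for flow q (m^3/s) lifted dh meters."""
    return RHO * G * q_m3s * dh_m / (eta * 1000.0)

p = pump_power_kw(0.2, 30.0, 0.75)  # 0.2 m^3/s lifted 30 m at 75% efficiency
print(round(p, 2))  # 78.48 kW
```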
II.B.3. Hydraulic pressure feasibility constraints (nodal head lower bounds)
Hydraulic feasibility is captured by nodal head (pressure) variables h_{n,t} (meters of head) and link head-loss constraints. A common feasibility requirement is a minimum head at critical nodes:
h_{n,t} \ge \underline{h}_n \qquad \forall n \in \mathcal{N}_{\mathrm{crit}}.
Link relations depend on pipe/friction models (e.g., Darcy-Weisbach, Hazen-Williams) and are generally nonconvex if written exactly. We record a generic form for each edge e=(i\to j):
h_{i,t} - h_{j,t} = \ell_e(q_{e,t}) - \Delta h^{\mathrm{pump}}_{e,t},
where \ell_e(\cdot) is a monotone increasing head-loss function (often approximately polynomial in |q|), and \Delta h^{\mathrm{pump}}_{e,t} \ge 0 is controllable head gain for pumps. Section VI will introduce conic relaxations/approximations under which these constraints become compatible with conic DRO reformulations.
II.B.4. Ecosystem flow minimum-release constraints
Environmental regulations often impose minimum releases from reservoirs or minimum flows in specific reaches. Let \mathcal{E}_{\mathrm{eco}} \subseteq \mathcal{E} index regulated release links. A typical constraint is
q_{e,t} \ge \underline{q}^{\mathrm{eco}}_{e,t}(w_t) \qquad \forall e \in \mathcal{E}_{\mathrm{eco}},
where \underline{q}^{\mathrm{eco}}_{e,t} may be deterministic (seasonal schedule) or uncertain (e.g., tied to ecological indicators).
II.B.5. Water-quality constraints (e.g., salinity/TDS) and blending/mixing relations
A control-relevant quality layer can be formulated via conservative mixing constraints at blending nodes. Consider a blending node k receiving inflows from sources \mathcal{I}_k with flows f_{i,t} \ge 0 and source concentrations c_{i,t} (mg/L). The delivered concentration is
c_{k,t}^{\mathrm{out}} = \frac{\sum_{i\in\mathcal{I}_k} c_{i,t} f_{i,t}}{\sum_{i\in\mathcal{I}_k} f_{i,t}} \quad \text{when } \sum_i f_{i,t} > 0.
A standard quality requirement is c_{k,t}^{\mathrm{out}} \le \overline{c}_{k,t}. Cross-multiplication yields the bilinear inequality
\sum_{i\in\mathcal{I}_k} (c_{i,t} - \overline{c}_{k,t}) f_{i,t} \le 0.
When c_{i,t} are known parameters, this is linear in f. When c_{i,t} are uncertain (part of w_t), it becomes an uncertain inequality suitable for robust/DRO wrapping. More detailed transport dynamics (advection-diffusion-reaction) can be appended to x_t via auxiliary states a_t; Section VI.D outlines polynomial/SOS surrogates for such cases.
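The equivalence between the ratio constraint and its cross-multiplied form is easy to check numerically; a small sketch with hypothetical source concentrations and flows:

```python
# Blending at a node: the ratio constraint c_out <= cbar holds exactly when
# the cross-multiplied bilinear left-hand side is nonpositive.

def c_out(c, f):
    """Flow-weighted blended concentration (requires positive total flow)."""
    return sum(ci * fi for ci, fi in zip(c, f)) / sum(f)

def bilinear_lhs(c, f, cbar):
    """Left-hand side of sum_i (c_i - cbar) f_i <= 0."""
    return sum((ci - cbar) * fi for ci, fi in zip(c, f))

c = [1200.0, 150.0]  # mg/L TDS: brackish and fresh sources
f = [0.3, 1.0]       # m^3/s blended at the node
cbar = 500.0         # delivery limit, mg/L
print(c_out(c, f), bilinear_lhs(c, f, cbar))  # blend ~392 mg/L, LHS = -140
```

Both tests agree: the blend meets the limit exactly when the bilinear left-hand side is nonpositive.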
II.C. Cost and performance objectives for multi-objective control
We define multiple performance channels that can be scalarized or constrained to form a multi-objective MPC problem.
II.C.1. Opex/energy cost and procurement
Let \ell^{\mathrm{cost}}(x_t,u_t,w_t) denote operational cost (e.g., electricity procurement, chemicals, maintenance proxies). A common form is
\ell^{\mathrm{cost}}(x_t,u_t,w_t) = \pi_t\, P_t^{\mathrm{grid}} \Delta t + c^{\mathrm{chem}}\, q_t^{\mathrm{treat}}\Delta t + c^{\mathrm{start}}\,\|\delta_t-\delta_{t-1}\|_1,
where \pi_t is the electricity price in w_t.
II.C.2. Emissions / carbon-weighted objective and emissions budgets
If \chi_t (kgCO\(_2\)e/kWh) is the grid carbon intensity, a direct emissions proxy is
\ell^{\mathrm{CO2}}(x_t,u_t,w_t) = \chi_t\, P_t^{\mathrm{grid}} \Delta t.
One may either penalize emissions via a weight \lambda_{\mathrm{CO2}} in the objective, or impose a budget constraint over the horizon.
II.C.3. Reliability and service metrics (shortage, pressure violations, ecosystem deficits)
Service reliability is captured through nonnegative violation variables, for example
\mathrm{short}_t \ge 0,\qquad \mathrm{pressviol}_t := \sum_{n\in\mathcal{N}_{\mathrm{crit}}} (\underline{h}_n - h_{n,t})_+,
\mathrm{ecodef}_t := \sum_{e\in\mathcal{E}_{\mathrm{eco}}} (\underline{q}^{\mathrm{eco}}_{e,t} - q_{e,t})_+.
These can be penalized (soft constraints) or controlled through risk constraints (chance constraints, CVaR bounds) in later sections.
II.C.4. Tail-risk objectives (CVaR-type and catastrophic risk metrics)
Let g(x_t,u_t,w_t) denote a scalar “violation cost” (e.g., total shortage or max of several deficits). Tail risk can be expressed via conditional value-at-risk (CVaR). For a random variable Z, CVaR at level \alpha \in (0,1) is
\mathrm{CVaR}_\alpha(Z) := \inf_{\eta\in\mathbb{R}} \left\{ \eta + \frac{1}{1-\alpha}\, \mathbb{E}\big[(Z-\eta)_+\big] \right\}.
In an MPC setting, Z may represent a stagewise violation g(x_{t+\tau|t},u_{t+\tau|t},w_{t+\tau|t}) or an aggregate (e.g., maximum shortage over the horizon). Section V will specify distributionally robust versions of these quantities under Wasserstein ambiguity.
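For a discrete (empirical) law, the infimum in the CVaR definition is attained at a sample point, so CVaR can be computed by a direct scan; a minimal sketch with illustrative samples:

```python
# Empirical CVaR via the variational formula
# CVaR_alpha(Z) = inf_eta { eta + E[(Z-eta)_+]/(1-alpha) }:
# the objective is piecewise linear with kinks at sample values, so a scan
# over the samples finds the minimum.

def cvar(samples, alpha):
    n = len(samples)
    def obj(eta):
        return eta + sum(max(z - eta, 0.0) for z in samples) / ((1.0 - alpha) * n)
    return min(obj(eta) for eta in samples)

Z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(cvar(Z, 0.8))  # mean of the worst 20% of outcomes: (8 + 9) / 2 = 8.5
```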
Summary of the control layer. Over horizon H, an MPC subproblem will minimize a scalarization of the multi-objective performance (cost, emissions, and risk) subject to dynamics and constraints above, with uncertainty entering through w. The remainder of the paper focuses on (i) data-driven ambiguity sets for w, (ii) tractable distributionally robust reformulations of objectives and constraints, and (iii) closed-loop architectures that maintain feasibility and stability properties under uncertainty.
III. Uncertainty Models and Ambiguity Sets
This section formalizes the uncertainty descriptions used in the distributionally robust MPC (DRO-MPC) layer. The central object is an ambiguity set \(\mathcal{P}\) of probability laws for the disturbance process \(w\). The controller then optimizes against a worst-case law in \(\mathcal{P}\). We emphasize definitions and modeling interfaces; any statistical guarantees for selecting radii or calibrating \(\mathcal{P}\) are deferred and must be justified separately.
III.A. Empirical distribution(s) from data streams and ensembles
The disturbance \(w_t\in\mathbb{R}^{n_w}\) may include demand, inflow/climate forcings, renewable availability, and possibly market/CO\(_2\) signals. In data-driven MPC, the ambiguity set is commonly centered at an empirical distribution built from a rolling data window and/or ensemble forecasts.
III.A.1. Single empirical law \(\hat{\mathbb{P}}_N\)
Stagewise samples. Fix a time \(t\) and consider a dataset of \(N\) disturbance samples \(\{\hat w_t^i\}_{i=1}^N\subset\mathbb{R}^{n_w}\). The empirical distribution is
\[
\hat{\mathbb{P}}_{t,N} := \frac{1}{N}\sum_{i=1}^N \delta_{\hat w_t^i},
\]
where \(\delta_z\) is the Dirac measure at \(z\). This model is appropriate when one enforces robustified constraints at a single step (or assumes i.i.d./exchangeable disturbances across time, which is typically not literally true for water/renewables but may be used as an approximation).
Trajectory samples. For horizon-\(H\) MPC it is often more natural to treat the disturbance on the entire horizon as a random vector
\[
\mathbf{w}_{t} := (w_{t|t},\ldots,w_{t+H-1|t}) \in \mathbb{R}^{H n_w}.
\]
Given \(N\) scenario trajectories \(\{\hat{\mathbf{w}}_t^i\}_{i=1}^N\), define
\[
\hat{\mathbb{P}}^{\mathrm{traj}}_{t,N} := \frac{1}{N}\sum_{i=1}^N \delta_{\hat{\mathbf{w}}_t^i}.
\]
This representation captures temporal correlation across the horizon but leads to ambiguity sets on a higher-dimensional space and raises time-consistency questions for multistage policies (discussed later in Section VIII).
Remark (data windows and nonstationarity). In practice one may form \(\{\hat w_t^i\}\) from (i) recent residuals \(w_t – \hat w_t\) of a forecasting model, (ii) climatological analogs restricted to similar seasons/regimes, or (iii) ensemble members from a forecasting pipeline. The choice determines what distribution is being robustified against: unconditional marginals, forecast-conditional errors, or scenario-generator distributions.
III.A.2. Multiple ensemble sources \(\{\hat{\mathbb{P}}_N^k\}_{k=1}^K\)
If multiple probabilistic forecasters or climate ensembles are available, we may obtain empirical laws \(\hat{\mathbb{P}}_{t,N}^k\) for \(k=1,\ldots,K\), each supported on \(N_k\) samples (the sample counts need not coincide across ensembles). This motivates ambiguity sets that aggregate information across ensembles, e.g., by taking intersections/unions of balls around each center, or by building a barycenter-type center distribution. Since tractability can depend strongly on how such aggregation is defined, we only fix notation here:
\[
\hat{\mathbb{P}}_{t,N}^k := \frac{1}{N_k}\sum_{i=1}^{N_k} \delta_{\hat w_{t}^{k,i}}\quad \text{or}\quad
\hat{\mathbb{P}}^{\mathrm{traj},k}_{t,N} := \frac{1}{N_k}\sum_{i=1}^{N_k} \delta_{\hat{\mathbf{w}}_{t}^{k,i}}.
\]
III.B. Wasserstein ambiguity sets
We next define Wasserstein balls around empirical distributions. Throughout, \((\Xi,\mathrm{d})\) denotes a Polish metric space (in applications \(\Xi=\mathbb{R}^d\) with a norm-induced metric), and \(\mathcal{P}(\Xi)\) the set of Borel probability measures on \(\Xi\).
III.B.1. Wasserstein distance and Wasserstein ball
For \(p\in[1,\infty)\), the \(p\)-Wasserstein distance between \(\mathbb{P},\mathbb{Q}\in\mathcal{P}(\Xi)\) with finite \(p\)-th moments is
\[
W_p(\mathbb{P},\mathbb{Q}) := \left(\inf_{\pi\in\Pi(\mathbb{P},\mathbb{Q})} \int_{\Xi\times\Xi} \mathrm{d}(\xi,\zeta)^p\,\pi(d\xi,d\zeta)\right)^{1/p},
\]
where \(\Pi(\mathbb{P},\mathbb{Q})\) is the set of couplings with marginals \(\mathbb{P}\) and \(\mathbb{Q}\).
Given a center law \(\hat{\mathbb{P}}\) and radius \(\kappa\ge 0\), the (closed) Wasserstein ball is
\[
\mathcal{B}_{W_p}(\hat{\mathbb{P}},\kappa) := \{\mathbb{P}\in\mathcal{P}(\Xi):\; W_p(\mathbb{P},\hat{\mathbb{P}}) \le \kappa\}.
\]
Choice of ground metric. If \(\Xi=\mathbb{R}^d\), a common choice is \(\mathrm{d}(\xi,\zeta)=\|\xi-\zeta\|\) for a norm \(\|\cdot\|\) (e.g., \(\ell_1\), \(\ell_2\), or weighted norms). The norm should be scaled to reflect engineering units (e.g., m\(^3\)/s for demands versus kW for renewables); otherwise the ball geometry can be dominated by whichever components have largest numerical magnitudes.
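On the real line with equal weights and equal atom counts, the definition above has a closed form: the optimal coupling matches order statistics. A minimal sketch (sample values are illustrative):

```python
# W_p between two equal-weight N-atom empirical laws on R: sort both sample
# sets and average the p-th powers of the matched gaps.

def wasserstein_1d(xs, ys, p=1):
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)

demand_a = [3.0, 5.0, 4.0]
demand_b = [4.0, 6.0, 5.0]  # demand_a shifted up by one unit
print(wasserstein_1d(demand_a, demand_b, p=2))  # a pure shift gives 1.0
```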
III.B.2. Stagewise vs trajectory-wise ambiguity
Stagewise ambiguity (open-loop, per-time). For a horizon problem with constraint functions that separate by time, one can impose \(H\) ambiguity sets
\[
\mathbb{P}_{t+\tau}\in \mathcal{B}_{W_p}(\hat{\mathbb{P}}_{t+\tau,N},\kappa_{t+\tau}),\qquad \tau=0,\ldots,H-1.
\]
This approach is computationally attractive because it reduces the dimension of the transport problem, but it neglects cross-time dependence unless it is rebuilt into \(\hat{\mathbb{P}}_{t+\tau,N}\) through augmented features.
Trajectory-wise ambiguity (open-loop, horizon vector). Alternatively, one can place a single ball on \(\mathbf{w}_t\in\mathbb{R}^{H n_w}\):
\[
\mathbb{P}^{\mathrm{traj}}_t \in \mathcal{B}_{W_p}\big(\hat{\mathbb{P}}^{\mathrm{traj}}_{t,N},\kappa_t\big).
\]
This captures temporal correlation within the horizon but can be conservative and may lead to time-inconsistent multistage interpretations if one later desires policies that react to partial observations. In this paper we primarily use the trajectory-wise ball as a modeling option for receding-horizon open-loop plans; multistage/time-consistent extensions are discussed separately in Section VIII.
III.B.3. Conditional / nested ambiguity (cautionary interface)
For multistage decision rules (policies that adapt within the horizon), a natural formalism is to put ambiguity on conditional distributions (transition kernels) rather than on the unconditional joint law. Doing so requires precise definitions of (i) information patterns, (ii) regular conditional distributions, and (iii) a distance between conditional laws. We record an abstract interface but do not assume it is automatically well-posed:
\[
\mathcal{P}_{\mathrm{cond}} := \{\mathbb{P}: \mathbb{P}(w_{t+1}\in\cdot\mid \mathcal{I}_t) \in \mathcal{B}(\hat{\mathbb{P}}(\cdot\mid\mathcal{I}_t),\kappa(\mathcal{I}_t)) \text{ a.s.}\},
\]
where \(\mathcal{I}_t\) denotes the sigma-algebra of information available at time \(t\). Any use of such sets must verify measurability and dynamic programming compatibility; we therefore treat conditional/nested Wasserstein sets as an extension rather than a baseline assumption.
III.C. Alternative ambiguity/risk models (comparative layer)
Although Wasserstein balls are the main ambiguity model used later for tractable dual reformulations, other ambiguity models can be useful depending on available statistics, desired conservatism, and computational constraints.
III.C.1. KL (relative-entropy) ambiguity
Let \(\hat{\mathbb{P}}\) be a reference measure supported on \(\Xi\). The KL divergence of \(\mathbb{P}\) from \(\hat{\mathbb{P}}\) (when \(\mathbb{P}\ll\hat{\mathbb{P}}\)) is
\[
D_{\mathrm{KL}}(\mathbb{P}\|\hat{\mathbb{P}}) := \int \log\left(\frac{d\mathbb{P}}{d\hat{\mathbb{P}}}\right) d\mathbb{P}.
\]
A KL ball is \(\{\mathbb{P}: D_{\mathrm{KL}}(\mathbb{P}\|\hat{\mathbb{P}})\le \rho\}\). KL-DRO often yields exponentially-tilted worst-case distributions and can lead to smooth dual formulations (typically involving log-sum-exp terms when \(\hat{\mathbb{P}}\) is discrete). Whether such reformulations remain tractable jointly with the IWS physics constraints depends on the exact constraint functions and is not assumed a priori.
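For discrete laws on a common finite support, the KL divergence is a one-line computation; a small numeric illustration (the probability values are arbitrary):

```python
import math

# D_KL(P || Phat) for discrete laws on a shared support; requires absolute
# continuity (qi > 0 wherever pi > 0).
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

phat = [0.5, 0.3, 0.2]      # reference (empirical) law
p_tilt = [0.6, 0.25, 0.15]  # a nearby candidate law
print(kl(p_tilt, phat))     # small positive number; kl(phat, phat) is 0
```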
III.C.2. Moment-based ambiguity (mean–covariance)
Suppose \(\xi\in\mathbb{R}^d\) is an uncertainty vector (e.g., a subset or affine transformation of \(w\)) and we estimate mean \(\mu\) and covariance \(\Sigma\) with confidence sets
\[
\mu\in \mathcal{M},\qquad \Sigma\in\mathcal{S} \subseteq \mathbb{S}^d_+.
\]
Define the ambiguity set
\[
\mathcal{P}_{\mathrm{mom}} := \{\mathbb{P}: \mathbb{E}_{\mathbb{P}}[\xi] \in \mathcal{M},\; \mathrm{Cov}_{\mathbb{P}}(\xi) \in \mathcal{S}\}.
\]
For affine-in-\(\xi\) constraints and quadratic norms, conservative chance-constraint approximations can often be expressed as second-order cone constraints (details depend on the chosen inequality and are treated later in the comparative Section X).
III.C.3. Scenario optimization / scenario MPC
Scenario methods do not posit an ambiguity set of distributions; instead they enforce constraints for a finite set of sampled disturbance scenarios. For a constraint family \(g(z,\xi)\le 0\), a scenario program enforces \(g(z,\xi^i)\le 0\) for \(i=1,\ldots,N\) with \(\xi^i\) drawn i.i.d. from the (unknown) data-generating process. Under conditions on sampling and convexity, one can obtain distribution-free bounds on violation probability that depend on the number of samples and an effective dimension of the decision problem. Because such guarantees are delicate in receding-horizon settings (dependence across time, re-sampling, and warm-starting), we treat scenario MPC as a separate reliability certificate layer in Section X.
III.C.4. Heavy-tailed / stable-law ambiguity (caution)
For renewable power fluctuations or extreme demand spikes, heavy-tailed models are sometimes proposed. Any claim that a heavy-tail ambiguity set leads to tractable conic constraints must be verified carefully: tractability depends on the exact definition of the ambiguity set (e.g., characteristic function constraints) and on how the chance constraint is approximated. We therefore do not assume such tractability in the main development; heavy-tail ideas are discussed only as an optional extension.
Summary. The remainder of the paper will primarily use Wasserstein balls around empirical distributions (stagewise or trajectory-wise) as the core ambiguity model. These definitions are sufficient to state distributionally robust objectives and constraints in Section V, where we will explicitly specify the assumptions under which tractable dual reformulations are valid.
IV. Multi-Objective Formulations and Pareto Set Construction
This section specifies how the multiple performance channels introduced in Section II.C are combined into a well-defined receding-horizon optimization problem, and how one may compute Pareto tradeoffs between (at least) operational cost, emissions, and reliability risk. The discussion is split into (i) mathematically standard scalarizations and (ii) practical Pareto-set approximation procedures compatible with convex and conic reformulations developed later.
IV.A. Multi-objective problem statement in receding horizon
Fix a horizon length \(H\in\mathbb{N}\). For a given state estimate \(x_t\), choose decision variables \(\mathbf{u}=(u_{t|t},\ldots,u_{t+H-1|t})\) and (if used) auxiliary variables such as slacks \(\mathbf{s}\) or risk epigraph variables. Let \(\mathbf{w}_t\) denote the (random) disturbance vector over the horizon.
We assume \(m\ge 2\) objective channels of the form
\[
J_i(\mathbf{u};x_t) := \Phi_i\big(x_t,\mathbf{u}\big) + \sup_{\mathbb{P}\in\mathcal{P}_t}\; \mathbb{E}_{\mathbb{P}}\big[\Psi_i(x_t,\mathbf{u},\mathbf{w}_t)\big],\qquad i=1,\ldots,m,
\]
where \(\Phi_i\) collects deterministic terms (e.g., switching costs) and \(\Psi_i\) is a stagewise-aggregated random performance (e.g., total grid energy, total emissions, total shortage penalty, or a tail-risk functional after epigraphization). The ambiguity set \(\mathcal{P}_t\) may be stagewise or trajectory-wise (Section III).
A feasible receding-horizon plan satisfies dynamics and constraints for all stages \(\tau=0,\ldots,H-1\) in the sense adopted later (deterministic, robust, or distributionally robust). Denote the resulting feasible set by \(\mathcal{F}_t(x_t)\).
Definition 4.1 (Pareto dominance and efficiency). A feasible plan \(\mathbf{u}\in\mathcal{F}_t(x_t)\) dominates another feasible plan \(\mathbf{v}\in\mathcal{F}_t(x_t)\) if \(J_i(\mathbf{u};x_t)\le J_i(\mathbf{v};x_t)\) for all \(i\) and strict inequality holds for at least one \(i\). A plan is Pareto efficient if it is not dominated by any other feasible plan.
The Pareto set at time \(t\) is the set of efficient objective vectors \(\{(J_1,\ldots,J_m):\mathbf{u}\in\mathcal{F}_t(x_t)\text{ is efficient}\}\). In MPC, one implements the first control action \(u_{t|t}\) from a chosen Pareto-efficient (or near-efficient) plan, then repeats at \(t+1\).
IV.A.1. Weighted-sum scalarization and supported Pareto points
Given weights \(\lambda\in\mathbb{R}^m_+\setminus\{0\}\), solve the scalarized problem
\[
\min_{\mathbf{u}\in\mathcal{F}_t(x_t)}\; J_\lambda(\mathbf{u};x_t) := \sum_{i=1}^m \lambda_i J_i(\mathbf{u};x_t).
\]
Proposition 4.2 (Weighted sums yield Pareto-efficient points under standard convexity). Suppose \(\mathcal{F}_t(x_t)\) is convex and each \(J_i(\cdot;x_t)\) is convex on \(\mathcal{F}_t(x_t)\). If \(\lambda\in\mathbb{R}^m_{++}\) (all components strictly positive), then any minimizer \(\mathbf{u}^\star\) of \(J_\lambda\) is Pareto efficient.
Proof. If \(\mathbf{u}^\star\) were not efficient, there would exist \(\mathbf{v}\in\mathcal{F}_t(x_t)\) with \(J_i(\mathbf{v})\le J_i(\mathbf{u}^\star)\) for all \(i\) and strict inequality for some \(j\). Multiplying by \(\lambda\in\mathbb{R}^m_{++}\) and summing gives \(J_\lambda(\mathbf{v})<J_\lambda(\mathbf{u}^\star)\), contradicting optimality of \(\mathbf{u}^\star\). \(\square\)
For reliability requirements of the form \(\mathbb{P}(g(z,\xi)\le 0)\ge 1-\varepsilon\), a standard conservative surrogate replaces the chance constraint with a CVaR bound:
\[
\mathrm{CVaR}_{1-\varepsilon}(g(z,\xi)) \le 0 \quad\Longrightarrow\quad \mathbb{P}(g(z,\xi)\le 0)\ge 1-\varepsilon,
\]
valid under any probability law (it follows because \(\mathrm{VaR}_{1-\varepsilon}(g)\le \mathrm{CVaR}_{1-\varepsilon}(g)\)). Consequently, we impose the distributionally robust CVaR constraint
\[
\sup_{\mathbb{P}\in\mathcal{B}} \mathrm{CVaR}^{\mathbb{P}}_{1-\varepsilon}(g(z,\xi)) \le 0,
\]
which is a conservative sufficient condition for the original Wasserstein DRO chance constraint.
Epigraph form. Using the hinge representation of CVaR, a sufficient implementation is: introduce \(\eta\in\mathbb{R}\) and require
\[
\eta + \frac{1}{\varepsilon}\, \sup_{\mathbb{P}\in\mathcal{B}}\mathbb{E}_{\mathbb{P}}[(g(z,\xi)-\eta)_+] \le 0.
\]
The only inner object is a worst-case expectation of the convex loss \((g-\eta)_+\), which fits Theorem 5.1.
V.B.2. Joint chance constraints and risk allocation
Suppose we require simultaneous satisfaction of constraint families \(g_j(z,\xi)\le 0\) for \(j=1,\ldots,J\) (e.g., multiple critical pressure nodes and multiple ecosystem reaches). A joint chance condition
\[
\inf_{\mathbb{P}\in\mathcal{B}} \mathbb{P}\Big(\bigcap_{j=1}^J \{g_j(z,\xi)\le 0\}\Big) \ge 1-\varepsilon
\]
is again challenging. A common (conservative) approach is risk allocation via the union bound:
\[
\mathbb{P}\Big(\bigcup_{j=1}^J \{g_j(z,\xi)>0\}\Big) \le \sum_{j=1}^J \mathbb{P}(g_j(z,\xi)>0).
\]
Thus it suffices to enforce, for allocations \(\varepsilon_j\ge 0\) with \(\sum_j \varepsilon_j\le \varepsilon\), the individual constraints
\[
\inf_{\mathbb{P}\in\mathcal{B}}\mathbb{P}(g_j(z,\xi)\le 0)\ge 1-\varepsilon_j,\qquad j=1,\ldots,J.
\]
In our tractable pipeline we implement each of these via robust CVaR constraints at level \(1-\varepsilon_j\), which preserves convexity when each \(g_j\) is convex in \(z\) and affine (or otherwise tractably convex) in \(\xi\).
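The union bound behind this allocation can be illustrated by Monte Carlo (the Gaussian margins and thresholds below are arbitrary stand-ins for the constraint functions \(g_j\)):

```python
import random

# Monte Carlo illustration of risk allocation: the joint violation rate is
# never larger than the sum of the individual violation rates (union bound).
random.seed(0)
J, N = 3, 50000
thresholds = [1.6, 1.7, 1.8]
joint_viol, indiv_viol = 0, [0] * J
for _ in range(N):
    xi = [random.gauss(0.0, 1.0) for _ in range(J)]
    viol = [x > th for x, th in zip(xi, thresholds)]
    joint_viol += any(viol)
    for j in range(J):
        indiv_viol[j] += viol[j]
print(joint_viol / N, sum(indiv_viol) / N)  # joint rate <= summed rate
```

The inequality holds sample by sample (any violation implies at least one individual violation), which is exactly why allocating \(\varepsilon_j\) with \(\sum_j \varepsilon_j \le \varepsilon\) is sufficient.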
V.B.3. Conic representability for affine-in-uncertainty constraint functions
This subsection records a modeling pattern sufficient for conic MPC: constraint functions that are affine in \(\xi\) and convex in decision variables.
Assumption 5.6 (Affinely-parameterized convex constraints).
Let
\[
g(z,\xi)=g_0(z) + G(z)^\top \xi,
\]
where \(g_0\) is convex and \(G(z)\) is an affine mapping of \(z\) (or constant). This covers many IWS reliability constraints after linearization/epigraphization: for example, shortage constraints depend affinely on demand \(d\), renewable availability \(r\), and inflow \(i\) through the mass-energy balance.
Under Assumption 5.6, for any fixed \(z,\eta\), the hinge \((g(z,\xi)-\eta)_+\) is convex in \(\xi\), and the inner supremum in Theorem 5.1 becomes a convex maximization of the form
\[
\sup_{\xi\in\Xi}\Big(\max\{g_0(z)+G(z)^\top\xi-\eta,\,0\}-\lambda\|\xi-\hat\xi^i\|^p\Big).
\]
When \(p=1\) and \(\Xi=\mathbb{R}^d\), one can upper bound this inner supremum by splitting the max and using the conjugate of the norm:
\[
\sup_{\xi}\big(g_0(z)+G(z)^\top\xi-\eta-\lambda\|\xi-\hat\xi^i\|\big)
= g_0(z)+G(z)^\top\hat\xi^i-\eta + \sup_{\Delta}\big(G(z)^\top\Delta-\lambda\|\Delta\|\big)
\]
which equals \(+\infty\) unless \(\|G(z)\|_*\le \lambda\), in which case it equals \(g_0(z)+G(z)^\top\hat\xi^i-\eta\). Consequently, Wasserstein-1 DRO hinge expectations often reduce to constraints that combine (i) sample-wise hinge terms and (ii) a dual-norm bound on the sensitivity \(G(z)\).
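The \(p=1\) reduction can be checked numerically for a purely affine loss (dropping the hinge): with an \(\ell_2\) ground metric, the worst-case mean of \(a^\top\xi\) over a radius-\(\kappa\) ball is the empirical mean plus \(\kappa\|a\|_2\), attained by translating every sample by \(\kappa\,a/\|a\|_2\). The sensitivity vector, samples, and radius below are illustrative:

```python
import math

# Wasserstein-1 dual bound for an affine loss a.xi with l2 ground metric:
# worst case = empirical mean + kappa*||a||_2, attained by a uniform shift
# of all samples in the direction a/||a||_2 (each atom moves distance kappa).

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

a = [2.0, -1.0]
samples = [[1.0, 0.5], [0.2, -0.3], [0.7, 1.1]]
kappa = 0.4
norm_a = math.sqrt(sum(ai * ai for ai in a))

emp_mean = sum(dot(a, s) for s in samples) / len(samples)
dual_bound = emp_mean + kappa * norm_a

shift = [kappa * ai / norm_a for ai in a]
worst = [[si + di for si, di in zip(s, shift)] for s in samples]
attained = sum(dot(a, s) for s in worst) / len(samples)
print(dual_bound, attained)  # the shifted samples attain the dual bound
```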
For \(p=2\) and \(\|\cdot\|=\|\cdot\|_2\), the penalty \(\lambda\|\xi-\hat\xi^i\|_2^2\) leads to quadratic conjugates, and the inner maximization is typically second-order-cone or rotated-cone representable after standard epigraph transformations (the precise representation depends on whether \(g\) is scalar and on how \(G(z)\) enters).
Implementation note.
In this paper we do not fix \(p\) globally; rather, we assume the instantiated choice of \((p,\|\cdot\|)\) and the constraint/cost templates are selected so that each occurrence of Theorem 5.1 yields an LP/SOCP/exponential-cone/SDP program compatible with the physical modeling layer (Section VI). When mixed-integer decisions are present, these become MI-conic programs.
V.C. Dependence and correlation handling
Uncertainty in IWS is inherently coupled (e.g., demand and renewable availability can be correlated through weather). Wasserstein ambiguity sets can be placed either on the joint vector \(\xi=(d,i,r,\ldots)\) or on separate marginals.
1) Joint Wasserstein ball. If \(\Xi\subseteq\mathbb{R}^d\) contains the full joint vector, \(\mathcal{B}_{W_p}(\hat{\mathbb{P}}_N,\kappa)\) already allows adversarial changes in both marginals and dependence structure, limited only by the transport budget. This can be conservative but is conceptually clean.
2) Stagewise/marginal balls. If balls are imposed separately on components (or per-time marginals), dependence is not controlled unless one explicitly includes lagged features in \(\xi\). Any claimed “exact” dependence worst-cases (e.g., comonotonicity) must be restricted to settings where the relevant functionals are monotone and effectively one-dimensional; such reductions do not hold in full generality for multivariate constraints.
V.D. Ensemble-aware Wasserstein DRO via barycentric centers (optional)
When multiple empirical centers \(\hat{\mathbb{P}}_N^k\) are available (Section III.A.2), an alternative to choosing the “worst” ensemble is to form a single center distribution \(\bar{\mathbb{P}}\) as a Wasserstein barycenter:
\[
\bar{\mathbb{P}} \in \arg\min_{\mathbb{P}\in\mathcal{P}(\Xi)} \sum_{k=1}^K \omega_k W_p^p(\mathbb{P},\hat{\mathbb{P}}_N^k),\qquad \omega_k\ge 0,\ \sum_k\omega_k=1.
\]
One may then build \(\mathcal{B}_{W_p}(\bar{\mathbb{P}},\kappa)\). Computing barycenters is tractable in special cases (notably one-dimensional supports, where optimal transport reduces to quantile operations), but becomes computationally intensive in high dimensions. Accordingly, we treat barycentric centers as an optional preprocessing step; the DRO reformulations in this paper rely only on having a discrete reference distribution, regardless of how it is obtained.
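In the one-dimensional equal-weight case with matching sample counts, the \(W_2\) barycenter reduces to averaging order statistics (quantile averaging); a minimal sketch with made-up ensembles:

```python
# W2 barycenter of 1-D empirical laws with equal sample counts and given
# weights: the i-th barycenter atom is the weighted average of the i-th
# order statistics across ensembles.

def barycenter_1d(ensembles, weights):
    sorted_sets = [sorted(e) for e in ensembles]
    n = len(sorted_sets[0])
    return [sum(w * s[i] for w, s in zip(weights, sorted_sets)) for i in range(n)]

ens1 = [2.0, 1.0, 3.0]
ens2 = [5.0, 3.0, 7.0]
print(barycenter_1d([ens1, ens2], [0.5, 0.5]))  # [2.0, 3.5, 5.0]
```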
Summary. Section V introduced the basic DRO objects used later in the paper: Wasserstein-robust expectations (Theorem 5.1), robust hinge/CVaR constructions for tail-risk objectives, and a convex surrogate pathway from chance constraints to robust CVaR constraints. The next section develops physical modeling layers (hydraulics, aquifer head loss, and quality constraints) in conic form so that these DRO blocks yield tractable (MI-)conic MPC subproblems.
VI. Tractable Physical Modeling Layers Compatible with DRO-MPC
This section collects convex modeling layers for hydraulics, groundwater head loss, and water quality that are compatible with the conic DRO blocks in Section V. The guiding principle is modularity:
1) the physical layer should produce a convex (preferably conic-representable) feasible set in the decision variables; and
2) the dependence of constraint functions on uncertainty should be affine (or otherwise tractably convex) so that the Wasserstein duality templates apply.
We emphasize that the conic formulations below are relaxations or conservative envelopes of the underlying nonlinear physics unless additional operating assumptions are imposed. Where exactness is asserted, it is stated as a conditional statement with an explicit proof under the stated assumptions.
VI.A. Convex hydraulics and pressure feasibility
VI.A.1. Directed flow model and convex head-loss epigraphs
Fix a directed graph \(\mathcal{G}=(\mathcal{N},\mathcal{E})\). For each edge \(e=(i\to j)\in\mathcal{E}\), let \(q_e\ge 0\) denote the (directed) flow and \(h_i\), \(h_j\) denote nodal heads. Let \(u_e\ge 0\) denote controllable head gain for pump edges (set \(u_e\equiv 0\) for passive pipes). A common steady-state modeling pattern is
\[
h_i - h_j + u_e \ge \ell_e(q_e),\qquad q_e\ge 0,\quad u_e\ge 0,
\]
where \(\ell_e(\cdot)\) is a convex, increasing head-loss function on \([0,\infty)\). This inequality is a convex relaxation of the equality \(h_i-h_j+u_e=\ell_e(q_e)\).
Typical choices include:
– Quadratic loss (a surrogate for Darcy–Weisbach in certain regimes): \(\ell_e(q)=a_e q + b_e q^2\) with \(a_e,b_e\ge 0\).
– Power-law loss (a surrogate for Hazen–Williams): \(\ell_e(q)=k_e q^\gamma\) with \(\gamma>1\), often approximated by a rational exponent \(\gamma=m/n\) or by piecewise power envelopes.
Conic epigraphs. Two cases are immediately conic representable:
(a) Quadratic: represent \(t_e\ge q_e^2\) via a rotated second-order cone (RSOC). One standard 3D RSOC is
\[
\mathcal{Q}_r := \{(x,y,z): x\ge 0,\ y\ge 0,\ 2xy\ge \|z\|_2^2\},
\]
so that the scalar inequality \(t_e\ge q_e^2\) can be encoded, for example, as
\[
(t_e,\tfrac{1}{2},q_e)\in \mathcal{Q}_r.
\]
Equivalently (without using the rotated form explicitly), one may impose the standard SOC constraint
\(\|[2q_e,\, t_e-1]\|_2\le t_e+1\), which is exactly equivalent to \(t_e\ge q_e^2\) together with \(t_e\ge 0\).
Then impose
\[
h_i - h_j + u_e \ge a_e q_e + b_e t_e.
\]
(b) Rational power \(q^{m/n}\): for integers \(m\ge n\ge 1\), the epigraph \(t\ge q^{m/n}\) on \(q\ge 0\) is representable using power-cone constraints by introducing auxiliary variables and chaining monomial relations. We will use the standard 3D power cone
\[
\mathcal{K}_{\mathrm{pow}}(\alpha) := \{(x,y,z): x\ge 0,\ y\ge 0,\ |z|\le x^\alpha y^{1-\alpha}\},\qquad \alpha\in(0,1).
\]
In practice, a modeling system (or a disciplined convex programming transformation) can encode \(t\ge k q^{m/n}\) using a small number of power-cone constraints when \(m/n\) has a short binary expansion. We treat this as an implementation detail: the only mathematical requirement used later is that the head-loss epigraph is a closed convex cone-representable set.
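The scalar SOC encoding in case (a) can be sanity-checked numerically without a solver. The sketch below (pure Python; function names are illustrative) verifies that the standard SOC constraint \(\|[2q,\,t-1]\|_2\le t+1\) describes exactly the epigraph \(t\ge q^2\) on a grid of test points:

```python
import math

def in_soc(q, t, tol=1e-9):
    # Standard SOC encoding: ||[2q, t-1]||_2 <= t + 1.
    return math.hypot(2.0 * q, t - 1.0) <= t + 1.0 + tol

def in_epigraph(q, t, tol=1e-9):
    # Direct epigraph: t >= q^2 (which in particular forces t >= 0).
    return t >= q * q - tol

# The two descriptions agree on a grid of test points.
pts = [(q / 4.0, t / 4.0) for q in range(-12, 13) for t in range(-4, 25)]
assert all(in_soc(q, t) == in_epigraph(q, t) for q, t in pts)
```

Checking a conic encoding against the set it claims to represent on sampled points is a cheap guard against sign and scaling errors before handing the model to a conic solver.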
VI.A.2. Pump head gain and pumping energy as convex envelopes
Let \(P_e\ge 0\) denote electrical power for pump edge \(e\). A physically motivated model is \(P_e \propto q_e u_e\) (flow times head gain), which is bilinear and not convex.
Two convex modeling options that remain compatible with conic DRO are:
1) Fixed-head or piecewise-fixed head: if \(u_e\) is fixed by operating policy (e.g., a pump enforces a roughly constant head gain when on), then \(P_e\) is affine in \(q_e\), hence convex.
2) Convex outer envelope with bounded variables: if bounds \(0\le q_e\le \bar q_e\) and \(0\le u_e\le \bar u_e\) hold, then the bilinear term admits a convex relaxation via its McCormick envelope. Introduce \(y_e\) to represent \(q_e u_e\) and impose
\[
\begin{aligned}
&y_e \ge 0,\quad y_e \ge \bar q_e u_e + \bar u_e q_e - \bar q_e\bar u_e,\\
&y_e \le \bar q_e u_e,\quad y_e \le \bar u_e q_e,
\end{aligned}
\]
then set \(P_e \ge c_e y_e\) for a proportionality constant \(c_e>0\). This relaxation is exact when either \(q_e\) or \(u_e\) is forced to an extreme point by other constraints (e.g., binary on/off with fixed \(u_e\)), but is otherwise conservative.
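The McCormick inequalities above can be exercised directly. The sketch below (the bounds \(\bar q_e=4\), \(\bar u_e=2\) are purely illustrative) checks that the envelope always contains the true bilinear term and collapses to it whenever one variable sits at a bound, matching the exactness remark:

```python
def mccormick_bounds(q, u, qbar, ubar):
    """Return (lower, upper) McCormick bounds on the bilinear term q*u
    for 0 <= q <= qbar, 0 <= u <= ubar: lower bounds y >= 0 and
    y >= qbar*u + ubar*q - qbar*ubar; upper bounds y <= qbar*u, y <= ubar*q."""
    lo = max(0.0, qbar * u + ubar * q - qbar * ubar)
    hi = min(qbar * u, ubar * q)
    return lo, hi

qbar, ubar = 4.0, 2.0
for q in [0.0, 1.0, 2.5, 4.0]:
    for u in [0.0, 0.7, 2.0]:
        lo, hi = mccormick_bounds(q, u, qbar, ubar)
        assert lo - 1e-9 <= q * u <= hi + 1e-9  # envelope contains the product
        if q in (0.0, qbar) or u in (0.0, ubar):
            assert abs(hi - lo) < 1e-9          # exact at a variable bound
```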
Because Section V requires only convexity/conic representability (not exact physical equality), these envelopes are sufficient for embedding hydraulics into conic DRO-MPC.
VI.A.3. Tightness of inequality-relaxed head-loss constraints (what can and cannot be guaranteed)
The inequality form \(h_i-h_j+u_e\ge \ell_e(q_e)\) permits “extra head,” i.e., solutions in which the head drop plus pump gain strictly exceeds the modeled loss. In general, a head-penalizing objective alone does not imply that every edge inequality is tight: decreasing one nodal head can tighten multiple outgoing constraints simultaneously, so feasibility may prevent removing slack on a particular edge. Accordingly, tightness should only be asserted under assumptions that allow a local improvement argument.
Proposition 6.1 (Tightness on pump edges when pump head gain is penalized).
Consider a convex optimization problem with decision variables \((h,q,u,\ldots)\) that includes, for each pump edge \(e=(i\to j)\in\mathcal{E}_{\mathrm{pump}}\), the constraints
\[
h_i - h_j + u_e \ge \ell_e(q_e),\qquad u_e\ge 0,
\]
where \(\ell_e\) is convex and nondecreasing on \([0,\infty)\). Assume that, for each such pump edge \(e\), the variable \(u_e\) appears in the objective through a term \(\psi_e(u_e)\) that is strictly increasing in \(u_e\), and that \(u_e\) appears nowhere else in the constraints except through the two inequalities above. Then, at any optimal solution,
\[
h_i - h_j + u_e = \ell_e(q_e)\qquad \forall e\in\mathcal{E}_{\mathrm{pump}}.
\]
Proof. Fix a pump edge \(e\) and suppose \(h_i-h_j+u_e>\ell_e(q_e)\) at an optimal solution. Then \(u_e\) can be reduced by a small \(\varepsilon>0\) (up to the available slack, and while maintaining \(u_e\ge 0\)) without affecting any other constraint, because \(u_e\) enters only this edge inequality. This preserves feasibility and strictly decreases the objective (since \(\psi_e\) is strictly increasing), contradicting optimality.
Remark 6.2 (Passive pipes and modeling choices).
1) For passive edges with \(u_e\equiv 0\), slack \(h_i-h_j>\ell_e(q_e)\) cannot in general be removed by a local argument unless additional structure is imposed (e.g., fixing source heads and enforcing equalities in a tree network, or introducing explicit slack variables and penalizing them).
2) Therefore, when using inequality-relaxed head-loss models, it is good practice to (i) make pump gains \(u_e\) decision variables that are penalized (so Proposition 6.1 applies), and (ii) be explicit about whether passive-pipe inequalities are intended as conservative feasibility relaxations or as approximations expected to be tight in the operating regime.
VI.B. Forchheimer aquifer head-loss modeling
In groundwater/aquifer conveyance and well hydraulics, head loss is often modeled as a sum of linear and quadratic terms in flow (a Forchheimer-type relation). For a link with flow \(q\ge 0\), consider
\[
\Delta h \ge a q + b q^2,\qquad a\ge 0,\ b\ge 0.
\]
This is convex in \((\Delta h,q)\) and can be embedded directly into the conic hydraulic layer.
VI.B.1. SOCP reformulation
Introduce an auxiliary variable \(t\ge 0\) and impose
\[
\Delta h \ge a q + b t,\qquad t\ge q^2.
\]
The constraint \(t\ge q^2\) is representable by a rotated SOC as in Section VI.A.1(a). Therefore, Forchheimer head loss yields an SOCP-representable epigraph.
VI.B.2. Conditional tightness under energy-monotone pumping models
When \(\Delta h\) enters only through (i) lower bounds of the form “available head must exceed required head” and (ii) an objective term that is increasing in \(\Delta h\) (a penalized pumping effort), the inequality is tight at every optimum.
Proposition 6.3 (Tightness of Forchheimer inequality when \(\Delta h\) is penalized).
Consider a convex optimization problem in variables \((q,\Delta h,\ldots)\) that includes
\[
\Delta h \ge a q + b q^2,\quad q\ge 0,
\]
and whose objective is strictly increasing in \(\Delta h\) holding other variables fixed. If \(\Delta h\) appears nowhere else except through this inequality and through constraints that are nondecreasing in \(\Delta h\) (e.g., \(\Delta h\ge 0\)), then every optimal solution satisfies \(\Delta h = a q + b q^2\).
Proof. If strict inequality held at an optimum, decreasing \(\Delta h\) slightly while keeping the other variables fixed would preserve feasibility and strictly decrease the objective, contradicting optimality.
Remark 6.4 (Discrete well on/off).
If a well has on/off decisions \(\delta\in\{0,1\}\) and flow bounds \(0\le q\le \delta\bar q\), the SOCP epigraph above combines naturally with mixed-integer constraints, yielding an MI-SOCP layer compatible with the DRO blocks in Section V.
VI.C. Water-quality constraints
VI.C.1. Deterministic mixing constraints as linear inequalities
At a blending node \(k\) with inflows \(f_i\ge 0\) and concentrations \(c_i\), the output concentration constraint \(c_k^{\mathrm{out}}\le \bar c_k\) can be written (when total inflow is positive) as
\[
\sum_{i\in\mathcal{I}_k} (c_i-\bar c_k) f_i \le 0.
\]
If \(c_i\) are known parameters, this is linear in the flow variables \(f_i\).
VI.C.2. Robust “worst-case blend” under interval uncertainty
Suppose the concentrations are uncertain but bounded componentwise:
\[
c_i \in [\underline c_i,\overline c_i],\qquad f_i\ge 0.
\]
Define the violation function
\[
\varphi(f,c) := \sum_{i\in\mathcal{I}_k} (c_i-\bar c_k) f_i.
\]
Because \(f_i\ge 0\), \(\varphi\) is nondecreasing in each \(c_i\). Hence the worst-case value over a box is attained at \(c_i=\overline c_i\).
Proposition 6.5 (Box-robust blending reduces to a linear constraint).
Under \(f\ge 0\) and \(c_i\in[\underline c_i,\overline c_i]\), the robust constraint
\[
\sup_{c\in\prod_i[\underline c_i,\overline c_i]} \sum_i (c_i-\bar c_k) f_i \le 0
\]
is equivalent to
\[
\sum_i (\overline c_i-\bar c_k) f_i \le 0.
\]
Proof. By monotonicity in each \(c_i\), the supremum is attained at the upper corner \(\overline c\).
This yields an auditable certificate: if the upper bounds \(\overline c_i\) are justified (e.g., via conservative sensing or regulatory limits), then any flow allocation satisfying the above linear inequality guarantees the output concentration bound.
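Proposition 6.5 is easy to stress-test by Monte Carlo sampling inside the box. In the sketch below (the random flows and concentration bounds are purely illustrative), no sampled concentration vector produces a larger violation than the upper corner:

```python
import random

def blend_violation(f, c, cbar):
    # phi(f, c) = sum_i (c_i - cbar) * f_i ; cbar is the output limit.
    return sum((ci - cbar) * fi for fi, ci in zip(f, c))

random.seed(0)
cbar = 5.0
f = [random.uniform(0.0, 3.0) for _ in range(4)]    # nonnegative flows
lo = [random.uniform(2.0, 4.0) for _ in range(4)]   # lower concentration bounds
hi = [l + random.uniform(0.5, 3.0) for l in lo]     # upper concentration bounds

corner = blend_violation(f, hi, cbar)  # worst case at the upper corner
for _ in range(2000):
    c = [random.uniform(l, h) for l, h in zip(lo, hi)]
    assert blend_violation(f, c, cbar) <= corner + 1e-9
```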
VI.C.3. Distributional reliability for uncertain concentrations (DRO wrapping)
If concentrations \(c\) are modeled as random and we want probabilistic reliability, the blending constraint fits the affine-in-uncertainty template of Section V.B.3. Let \(\xi:=c\in\mathbb{R}^{|\mathcal{I}_k|}\) be the uncertainty vector and define
\[
g(f,\xi) := \sum_i (\xi_i-\bar c_k) f_i.
\]
Then \(g\) is affine in \(\xi\) and linear in \(f\). A conservative distributionally robust chance constraint
\(
\inf_{\mathbb{P}\in\mathcal{B}}\mathbb{P}(g(f,\xi)\le 0)\ge 1-\varepsilon
\)
can be enforced by the robust CVaR surrogate
\[
\sup_{\mathbb{P}\in\mathcal{B}}\mathrm{CVaR}^{\mathbb{P}}_{1-\varepsilon}(g(f,\xi)) \le 0,
\]
which reduces to robust hinge expectations as in Section V.A.2 and V.B.1. This pathway preserves convexity in \(f\) and integrates directly into the multi-objective MPC formulations of Section IV.
VI.D. Nonlinear water-quality transport (ADR) and polynomial/SOS surrogates
When quality cannot be captured by instantaneous mixing alone (e.g., salinity transport in long pipes/channels, reactive constituents), a reduced-order dynamic model can be appended to the MPC state \(x_t\) via auxiliary states \(a_t\). A common structure is a discretized advection–diffusion–reaction (ADR) system
\[
a_{t+1} = F(a_t,u_t,w_t),
\]
which is generally nonlinear.
VI.D.1. Polynomial approximation template
Assume that on an operating region \(\Omega\subset\mathbb{R}^{n_a+n_u+n_w}\), the map \(F\) is approximated by a polynomial \(\tilde F\) (e.g., via Taylor expansion around an operating point, regression on simulation data, or collocation with basis functions):
\[
a_{t+1} \approx \tilde F(a_t,u_t,w_t),\qquad \tilde F\text{ polynomial in }(a_t,u_t,w_t).
\]
Similarly, inequality constraints can be written as polynomial inequalities \(\tilde g(a_t,u_t,w_t)\le 0\) over \(\Omega\).
VI.D.2. SOS certificates for invariant/viability sets (optional)
If one seeks deterministic robust invariance/viability certificates for the polynomial surrogate, one can search for a polynomial Lyapunov/barrier function or an invariant-set certificate using sum-of-squares (SOS) constraints, which reduce to semidefinite programs (SDPs).
Interface (no blanket guarantee claimed). Let \(\mathcal{W}\) be a bounded disturbance set and \(\mathcal{U}\) a compact convex input set. A set \(\mathcal{X}\subset\mathbb{R}^{n_a}\) is (robust) forward invariant for \(a_{t+1}=\tilde F(a_t,u_t,w_t)\) if for all \(a\in\mathcal{X}\) there exists \(u\in\mathcal{U}\) such that \(\tilde F(a,u,w)\in\mathcal{X}\) for all \(w\in\mathcal{W}\). Verifying such a property can be cast as a polynomial nonnegativity condition and relaxed via SOS. The resulting SDP can serve as a design-time certificate to construct terminal sets for MPC, but its computational feasibility depends strongly on dimension and polynomial degree.
Summary. Section VI provided conic-compatible physical modeling layers: (i) convex epigraphs for head-loss relations and pump envelopes, together with a conditional tightness result when heads are penalized; (ii) an SOCP representation of Forchheimer quadratic head loss with a tightness condition under monotone penalties; and (iii) linear robust blending constraints and an affine-in-uncertainty interface for wrapping quality constraints with Wasserstein DRO. The next section builds on these layers to define closed-loop DRO-MPC architectures (tube-based and terminal-set based) and to state conditions for recursive feasibility and stability-type properties.
VII. DRO-MPC Architectures and Closed-Loop Guarantees
This section defines closed-loop MPC architectures that combine (i) the data-driven ambiguity sets of Sections III–V with (ii) tractable conic physical constraints from Section VI. The goal is to make precise what can be guaranteed at the closed-loop level (recursive feasibility and stability-type properties) under explicit assumptions.
A recurring theme is the separation between:
1) a tractable *planning* problem solved at each time step using a Wasserstein ambiguity set (distributional robustness), and
2) a *feedback* mechanism (tube, terminal set, or local policy) that prevents constraint violation from accumulating under bounded modeling/prediction errors.
Throughout, we focus on constraints expressed as convex inequalities in the predicted states and controls. Any probabilistic statements are made conditional on the chosen ambiguity model and on what is (and is not) controlled in time.
VII.A. Tube-based distributionally robust MPC (DR-TBMPC)
VII.A.1. Nominal + error dynamics and tube tightening
We present a tube MPC template for systems that admit an additive disturbance decomposition around a nominal model. For clarity, assume a (possibly linearized) discrete-time model
\[
x_{t+1} = f(x_t,u_t) + w_t,
\]
with state constraint set \(\mathcal{X}\subset\mathbb{R}^{n_x}\) and input constraint set \(\mathcal{U}\subset\mathbb{R}^{n_u}\) closed and convex. (The mapping \(w_t\) may represent net forecast error after incorporating known forecast means; if dynamics are nonlinear, one may interpret the decomposition as local, with \(w_t\) absorbing linearization/model mismatch.)
Let \(\hat x_{t+\tau|t}\) and \(\hat u_{t+\tau|t}\) denote nominal predicted state and control, and let the implemented control be
\[
u_t = \hat u_{t|t} + K e_t,\qquad e_t:=x_t-\hat x_{t|t},
\]
where \(K\) is a fixed feedback gain. Define the nominal dynamics
\[
\hat x_{t+\tau+1|t} = f(\hat x_{t+\tau|t},\hat u_{t+\tau|t})
\]
and the error dynamics (exact if \(f\) is affine; otherwise a bound is required)
\[
e_{t+1} \in \mathcal{A} e_t \oplus \mathcal{D}_t,
\]
where \(\mathcal{A}\) is a matrix (e.g., \(A+BK\) for linear \(f\)), and \(\mathcal{D}_t\) is a bounded set capturing additive disturbances and model mismatch. Tube MPC enforces nominal constraints tightened by an error tube set \(\mathcal{E}_\tau\):
\[
\hat x_{t+\tau|t}\in \mathcal{X}\ominus \mathcal{E}_\tau,\qquad \hat u_{t+\tau|t}\in \mathcal{U}\ominus K\mathcal{E}_\tau.
\]
The tightening sets are constructed to satisfy a forward invariance recursion
\[
\mathcal{E}_{\tau+1} \supseteq \mathcal{A}\mathcal{E}_\tau \oplus \mathcal{D}_t \quad (\text{or }\mathcal{D}_{t+\tau}),
\]
so that if \(e_t\in \mathcal{E}_0\) then \(e_{t+\tau}\in \mathcal{E}_\tau\) for all \(\tau\) (by induction).
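For a scalar error channel \(e^+ = a e + d\) with \(|d|\le \bar d\), the recursion specializes to interval radii \(r_{\tau+1} = |a| r_\tau + \bar d\), which converge to the minimal robust invariant radius \(\bar d/(1-|a|)\) when \(|a|<1\). A minimal sketch (all values are illustrative):

```python
def tube_radii(a, dbar, r0, H):
    """Propagate interval tube radii r_{tau+1} = |a| * r_tau + dbar
    for scalar error dynamics e+ = a*e + d with |d| <= dbar."""
    radii = [r0]
    for _ in range(H):
        radii.append(abs(a) * radii[-1] + dbar)
    return radii

a, dbar = 0.6, 0.2
radii = tube_radii(a, dbar, r0=0.0, H=30)
# Radii are monotone and converge to the limit dbar / (1 - |a|) = 0.5.
assert all(r1 <= r2 + 1e-12 for r1, r2 in zip(radii, radii[1:]))
assert abs(radii[-1] - dbar / (1 - abs(a))) < 1e-3
```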
Connection to Wasserstein DRO. The set \(\mathcal{D}_t\) is not a distribution; it is a deterministic bound on the disturbance entering the error channel. In a WDRO-MPC pipeline, a common separation is:
– Use Wasserstein DRO to model stochasticity in *service constraints* and *costs* (e.g., demand uncertainty in shortages, renewable uncertainty in grid import, uncertain quality parameters), via the robust expectations and robust CVaR constraints in Section V.
– Use tube tightening to handle *modeling error* and *state-estimation error* deterministically, since these errors are typically not i.i.d. and may be adversarial within bounds.
In other words, the tube does not replace WDRO; it complements it by closing the loop and preventing small prediction errors from causing constraint violations.
VII.A.2. Distributionally robust constraint enforcement inside the tube
Let \(z\) stack the nominal trajectory variables \((\hat x_{t|t},\ldots,\hat x_{t+H|t},\hat u_{t|t},\ldots,\hat u_{t+H-1|t})\). Suppose a reliability constraint at stage \(\tau\) has the form
\[
g_{\tau}(z,\xi) \le 0,
\]
where \(\xi\) is the uncertainty vector (stagewise or trajectory-wise as in Section III). To ensure closed-loop satisfaction when the implemented state differs from the nominal state, we enforce a tightened constraint
\[
g_{\tau}(z,\xi) \le -\rho_{\tau},
\]
where \(\rho_{\tau}\ge 0\) is a deterministic margin chosen so that the effect of tube deviations is covered.
A sufficient condition is available when \(g_{\tau}\) is Lipschitz in the state argument.
Assumption 7.1 (Tube-Lipschitz constraint). For each \(\tau\), there exists \(L_{\tau}\ge 0\) such that for all feasible \((\hat x,\hat u)\) and all deviations \(e\),
\[
|g_{\tau}(\hat x+e,\hat u,\xi)-g_{\tau}(\hat x,\hat u,\xi)| \le L_{\tau}\|e\|
\]
for all \(\xi\) in the uncertainty support \(\Xi\).
If \(e_{t+\tau}\in \mathcal{E}_\tau\), then a valid tightening is
\[
\rho_{\tau} := \sup_{e\in\mathcal{E}_\tau} L_{\tau}\|e\|.
\]
In many conic models (affine constraints, norms, and hinge penalties), \(L_{\tau}\) can be computed from dual norms of the affine sensitivity coefficients.
Now enforce WDRO reliability on the tightened constraint using robust CVaR (Section V.B.1): for a target \(\varepsilon\in(0,1)\), impose
\[
\sup_{\mathbb{P}\in\mathcal{B}_{W_p}(\hat{\mathbb{P}}_N,\kappa)}\mathrm{CVaR}^{\mathbb{P}}_{1-\varepsilon}\big(g_{\tau}(z,\xi)+\rho_{\tau}\big)\le 0.
\]
This is conservative but convex whenever \(g_{\tau}\) is convex in decision variables and affine (or conically tractable) in \(\xi\), as discussed in Section V.
VII.A.3. A basic recursive feasibility result (deterministic tube + WDRO constraints)
We state a standard recursive feasibility theorem for tube MPC with terminal ingredients. The statement is conditional and separates deterministic tube invariance from WDRO feasibility of the nominal plan.
Assumption 7.2 (Error tube invariance). There exist sets \(\mathcal{E}_\tau\) for \(\tau=0,\ldots,H\) and a disturbance bound set \(\mathcal{D}_t\) such that if \(e_t\in \mathcal{E}_0\), then under the implemented feedback \(u_t=\hat u_{t|t}+Ke_t\), the resulting error satisfies \(e_{t+\tau}\in \mathcal{E}_\tau\) for all \(\tau\in\{0,\ldots,H\}\).
Assumption 7.3 (Terminal controller and terminal set). There exist a terminal set \(\mathcal{X}_f\) and a terminal feedback \(\kappa_f(\cdot)\) such that:
(i) (Constraint admissibility) \(x\in\mathcal{X}_f\Rightarrow \kappa_f(x)\in\mathcal{U}\) and \(f(x,\kappa_f(x))\in\mathcal{X}_f\) for all disturbances in the error channel covered by the tube.
(ii) (Nominal feasibility) The nominal terminal condition \(\hat x_{t+H|t}\in \mathcal{X}_f\ominus \mathcal{E}_H\) is enforced in the MPC problem.
Theorem 7.4 (Recursive feasibility of DR-TBMPC; conditional statement). Fix \(H\). Suppose that at time \(t\) the DR-TBMPC optimization problem is feasible, producing a nominal sequence \(\hat u_{t|t},\ldots,\hat u_{t+H-1|t}\) and a terminal nominal state \(\hat x_{t+H|t}\in \mathcal{X}_f\ominus \mathcal{E}_H\). Suppose also that the tube invariance (Assumption 7.2) holds, and that the terminal set/controller satisfy Assumption 7.3. Then, after applying \(u_t=\hat u_{t|t}+K e_t\) and shifting the horizon, there exists a feasible candidate solution for the optimization at time \(t+1\). In particular, feasibility is preserved recursively as long as the disturbance remains within the tube disturbance bounds used to construct \(\mathcal{E}_\tau\).
Proof (standard shift argument). Use the optimal nominal plan at time \(t\) to construct a candidate nominal plan at time \(t+1\) by discarding \(\hat u_{t|t}\), shifting \(\hat u_{t+1|t},\ldots,\hat u_{t+H-1|t}\), and appending the terminal law \(\kappa_f\) at the end. Tube invariance ensures the tightened constraints remain satisfied for the implemented trajectory, and terminal invariance ensures the appended control keeps the terminal condition feasible.
Remark 7.5 (What this does not guarantee). Theorem 7.4 is a *deterministic* recursive feasibility statement with respect to the tube disturbance bound. It does not by itself provide a probability statement about WDRO chance constraints; those remain modeled via robust CVaR constraints on the nominal plan. Interpreting the combination as an overall probabilistic guarantee requires careful accounting of (i) how the ambiguity set is selected, (ii) how violations across time aggregate, and (iii) whether the uncertainty samples are independent across MPC steps.
VII.A.4. ISS-type stability for tube MPC with robustified stage costs (conditional)
For stability, we present an input-to-state stability (ISS) type statement in the classical tube MPC sense, again conditional on standard assumptions.
Assumption 7.6 (Nominal stabilizability and terminal Lyapunov decrease). There exists a continuous, positive definite function \(V_f\) and a terminal controller \(\kappa_f\) such that for the nominal closed-loop (no disturbance)
\[
V_f(f(x,\kappa_f(x))) - V_f(x) \le -\ell(x,\kappa_f(x))
\]
for all \(x\in\mathcal{X}_f\), where \(\ell\) is the (deterministic) nominal stage cost used in the MPC objective.
Assumption 7.7 (Bounded disturbance in the error channel). The tube disturbance bound set \(\mathcal{D}_t\) is uniformly bounded by \(\|d\|\le \bar d\) for all \(d\in\mathcal{D}_t\).
Proposition 7.8 (ISS bound; informal but standard). Under Assumptions 7.2, 7.6, and 7.7, and assuming the tightening makes the optimization feasible for all times, the closed-loop state satisfies an ISS-type bound
\[
\|x_t\| \le \beta(\|x_0\|,t) + \gamma(\bar d)
\]
for some class-\(\mathcal{KL}\) function \(\beta\) and class-\(\mathcal{K}\) function \(\gamma\), with \(\gamma(\bar d)\to 0\) as \(\bar d\to 0\).
Comment. A rigorous proof follows standard robust MPC arguments: show the optimal value function is a Lyapunov function for the nominal dynamics, then bound the effect of bounded disturbances through Lipschitz continuity of the cost and constraints plus tube invariance. The WDRO parts of the objective (worst-case expectations, robust hinge terms) are convex but may be nonsmooth; one must ensure that the cost is still proper and that the Lyapunov decrease inequality can be stated in terms of the optimal value (often using a relaxed decrease with bounded error terms).
VII.B. Terminal ingredients: robust control invariant (RCI) sets under ambiguity
Tube MPC requires a terminal set that is invariant under a local control law while respecting constraints under uncertainty. In the present modular setting, it is useful to separate uncertainty into:
– a bounded disturbance affecting the state evolution (handled by RCI sets), and
– distributional uncertainty affecting service constraints (handled by WDRO constraints on the nominal plan).
VII.B.1. Robust control invariant sets and the Pre operator
Consider a controlled difference inclusion
\[
x_{t+1} \in F(x_t,u_t) := f(x_t,u_t) \oplus \mathcal{D},
\]
where \(\mathcal{D}\) is a bounded disturbance set.
Definition 7.9 (RCI set). A set \(\mathcal{C}\subseteq\mathcal{X}\) is robust control invariant (RCI) for \(x_{t+1}\in F(x_t,u_t)\) if for every \(x\in\mathcal{C}\) there exists \(u\in\mathcal{U}\) such that \(f(x,u)+d\in\mathcal{C}\) for all \(d\in\mathcal{D}\).
Define the predecessor operator
\[
\mathrm{Pre}(\mathcal{C}) := \{x\in\mathcal{X}: \exists u\in\mathcal{U}\ \text{s.t.}\ f(x,u)\oplus \mathcal{D}\subseteq \mathcal{C}\}.
\]
Then \(\mathcal{C}\) is RCI iff \(\mathcal{C}\subseteq \mathrm{Pre}(\mathcal{C})\).
For affine dynamics \(x^+=Ax+Bu+d\) with polytopic \(\mathcal{X},\mathcal{U},\mathcal{D}\), \(\mathrm{Pre}(\mathcal{C})\) preserves polytopicity (via Minkowski differences), but exact computation can be expensive in high dimension.
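In the scalar case with interval sets, the RCI condition \(\mathcal{C}\subseteq\mathrm{Pre}(\mathcal{C})\) reduces to a finite check at the extreme states. The sketch below (a toy 1D system; all numbers are illustrative) tests whether \(\mathcal{C}=[-c,c]\) is RCI for \(x^+=ax+bu+d\):

```python
def is_rci_interval(a, b, c, ubar, dbar):
    """Check that C = [-c, c] is robust control invariant for
    x+ = a*x + b*u + d with |u| <= ubar, |d| <= dbar (scalar case).
    By linearity it suffices to check the extreme states x = +/- c."""
    target = c - dbar            # x+ must land in [-(c - dbar), c - dbar]
    if target < 0:
        return False
    for x in (-c, c):
        # Reachable nominal successors a*x + b*u form an interval.
        lo, hi = a * x - abs(b) * ubar, a * x + abs(b) * ubar
        if hi < -target or lo > target:
            return False
    return True

# An unstable system (a = 1.1) is invariant with enough control authority,
# but not with too little.
assert is_rci_interval(a=1.1, b=1.0, c=1.0, ubar=0.5, dbar=0.2)
assert not is_rci_interval(a=1.1, b=1.0, c=1.0, ubar=0.05, dbar=0.2)
```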
VII.B.2. Zonotopic inner approximations (computational interface)
To obtain scalable terminal sets, one can compute inner invariant sets in a restricted family, such as zonotopes
\[
\mathcal{Z}(c,G) := \{c + G\theta: \|\theta\|_\infty\le 1\},
\]
where \(G\in\mathbb{R}^{n_x\times m}\) is a generator matrix. One seeks a fixed point in this family by enforcing
\[
\mathcal{Z}(c,G) \subseteq \mathrm{Pre}(\mathcal{Z}(c,G)).
\]
A typical sufficient condition (for linear systems) is obtained by choosing a linear feedback \(u=Kx\) and seeking a robust positively invariant (RPI) set for the closed-loop error \(e^+=(A+BK)e + d\). For zonotopes, generator propagation can be implemented by setting
\[
G^+ \approx (A+BK)G \quad \text{and adding generators for }\mathcal{D},
\]
followed by order reduction. Because these are algorithmic approximations, they provide *inner* invariant sets; the invariance is certified by a containment check that is exact (e.g., via linear programming) once a candidate set is proposed.
Connection to Wasserstein ambiguity. If one wishes to incorporate ambiguity in the disturbance magnitude itself (e.g., using a data-driven bound), one must be careful: a Wasserstein ball is a set of distributions, not a deterministic set \(\mathcal{D}\). A conservative bridge is to map the ambiguity set to a deterministic disturbance envelope (e.g., a high-probability set or a conformal prediction set) and use that envelope as \(\mathcal{D}\). This is the approach used in Section VII.D.
VII.B.3. Out-of-sample inclusion claims: what is safe to assert
If a deterministic envelope \(\mathcal{D}\) is constructed from data (e.g., by conformal prediction), one may obtain a finite-sample *marginal* coverage statement of the form
\[
\mathbb{P}(w_{t+1}\in \mathcal{D}_t) \ge 1-\alpha.
\]
Such a statement does not imply that \(w_{t+k}\in\mathcal{D}_{t+k-1}\) for all \(k\) simultaneously with probability \(\ge 1-\alpha\) without additional dependence assumptions or anytime-valid calibration. Therefore, any out-of-sample statement about invariant-set inclusion must specify whether it is marginal-in-time, per-step, or uniform-over-time. Section VII.D makes this distinction explicit.
VII.C. Robust viability viewpoint and integration with risk constraints
RCI sets address invariance under disturbances, but in IWS operations one often cares about *viability*: maintaining satisfaction of state and input constraints while allowing objectives (cost/emissions) to trade off.
Definition 7.10 (Robust viability kernel; finite horizon). Given a constraint set \(\mathcal{K}\subseteq \mathcal{X}\times\mathcal{U}\) and disturbance set \(\mathcal{D}\), define the robust viability kernel over horizon \(H\) as the set of states \(x\) from which there exists an admissible control sequence keeping the state in \(\mathcal{X}\) and inputs in \(\mathcal{U}\) for all disturbance realizations in \(\mathcal{D}\) for \(H\) steps.
In MPC, one typically approximates viability by terminal constraints and tube tightening:
– Tube tightening yields a tractable sufficient condition for robust constraint satisfaction.
– Terminal sets yield a sufficient condition for *infinite-horizon* feasibility.
Risk constraints (chance/CVaR) enter as additional constraints on service variables (shortage, pressure violations, quality exceedances). Enforcing these risk constraints on the nominal plan does not automatically imply the same risk level for the implemented closed-loop trajectory. However, if the constraint functions are Lipschitz in the state and the tube bound is deterministic, one can conservatively shift the risk constraint by a worst-case margin as in Section VII.A.2.
VII.D. Conformal-calibrated drifting Wasserstein DRO-MPC (CC-DRO-MPC)
This subsection provides a concrete mechanism for time-varying ambiguity radii \(\kappa_t\) using conformal prediction on streaming residuals. The purpose is to adapt to nonstationarity while retaining an auditable, finite-sample *marginal* coverage guarantee.
VII.D.1. Online residuals and conformal calibration
Let \(\hat w_{t|t-1}\) be a one-step-ahead point forecast of \(w_t\) produced by any forecasting model (physical, statistical, or machine learning). Define the residual
\[
r_t := w_t – \hat w_{t|t-1}.
\]
Conformal calibration constructs a prediction set \(\mathcal{R}_t\) for \(r_t\) using past residuals. A simple choice is a norm ball
\[
\mathcal{R}_t := \{r:\ \|r\| \le \rho_t\},
\]
where \(\rho_t\) is chosen as a quantile of \(\{\|r_s\|\}_{s\in\mathcal{I}_t}\) over a calibration index set \(\mathcal{I}_t\) (e.g., a rolling window).
Proposition 7.11 (Marginal conformal coverage; standard statement). Under exchangeability of the residual sequence \(\{r_s\}_{s\in\mathcal{I}_t\cup\{t\}}\), if \(\rho_t\) is chosen as the appropriate empirical quantile of the calibration scores \(\|r_s\|\), then
\[
\mathbb{P}(r_t\in \mathcal{R}_t) \ge 1-\alpha.
\]
Caution. Exchangeability can fail under concept drift; rolling-window conformal methods are often used heuristically in such cases but require additional assumptions for formal validity. In this paper we therefore use conformal calibration primarily as an *auditable tuning rule* and interpret the resulting guarantee as marginal-in-window.
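The quantile rule for \(\rho_t\) can be made concrete with the standard split-conformal choice \(k=\lceil (n+1)(1-\alpha)\rceil\), which yields the marginal coverage of Proposition 7.11 under exchangeability. A minimal sketch (the residual scores are synthetic):

```python
import math
import random

def conformal_radius(scores, alpha):
    """Finite-sample conformal quantile: the k-th smallest calibration
    score with k = ceil((n + 1) * (1 - alpha)), the standard split-conformal
    choice giving >= 1 - alpha marginal coverage under exchangeability."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few scores for the requested level
    return sorted(scores)[k - 1]

random.seed(1)
scores = [abs(random.gauss(0.0, 1.0)) for _ in range(199)]
rho = conformal_radius(scores, alpha=0.10)
# With n = 199 and alpha = 0.10, k = 180, so exactly 180 of the 199
# calibration scores fall at or below rho (no ties, almost surely).
assert sum(s <= rho for s in scores) == 180
```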
VII.D.2. From conformal sets to Wasserstein radii
To connect conformal envelopes to Wasserstein ambiguity, we distinguish two roles:
1) *Deterministic tightening/tubes*: use \(\mathcal{R}_t\) (or its projection) to define a bounded disturbance set \(\mathcal{D}_t\) used in tube construction.
2) *Distributional ambiguity*: define a time-varying Wasserstein radius \(\kappa_t\) for the WDRO constraints, based on a scale estimate from residuals.
A simple, unit-consistent rule is
\[
\kappa_t := c_\kappa\, \rho_t,
\]
where \(c_\kappa\) is a dimensionless factor accounting for the difference between the geometry of the conformal norm and the ground metric used in the Wasserstein distance (and for stagewise vs trajectory-wise scaling). This is not a theorem: it is a modeling choice that must be validated empirically.
Alternatively, one may define \(\hat{\mathbb{P}}_{t,N}\) as the empirical distribution of residuals and place a Wasserstein ball around it. The “radius” in that case becomes a hyperparameter that trades off conservatism and feasibility; conformal calibration can still be used to detect drift (large residuals) and trigger radius increases.
VII.D.3. Marginal coverage vs joint/horizon-wise coverage
CC-DRO-MPC can provide strong *per-step* interpretability but limited *joint* guarantees.
– A conformal set \(\mathcal{R}_t\) calibrated at level \(1-\alpha\) yields \(\mathbb{P}(r_t\in\mathcal{R}_t)\ge 1-\alpha\) marginally.
– Over an MPC horizon \(H\), joint inclusion \(\mathbb{P}(r_{t+1}\in\mathcal{R}_{t+1},\ldots,r_{t+H}\in\mathcal{R}_{t+H})\) can be much smaller than \(1-\alpha\) without further assumptions.
Therefore, when CC-DRO-MPC is used to build deterministic tubes or tighten constraints across the horizon, the resulting design is conservative if one uses union bounds, and potentially optimistic if one ignores joint dependence. In this paper we adopt the conservative interpretation: conformal calibration is used primarily for per-step disturbance bounds and for adapting \(\kappa_t\), not as a claim of horizon-wise coverage.
VII.D.4. Recursive feasibility with time-varying ambiguity
Time-varying \(\kappa_t\) changes the feasible set of the WDRO constraints from one MPC step to the next. Recursive feasibility thus requires either:
(i) a *monotone* radius update (e.g., \(\kappa_{t+1}\ge \kappa_t\)), which preserves feasibility but may become overly conservative; or
(ii) a design ensuring feasibility for a worst-case radius \(\bar\kappa\) and using smaller radii only to reduce objective conservatism.
Proposition 7.12 (Feasibility under decreasing radii). If the WDRO constraints are of the form
\(
\sup_{\mathbb{P}:W_p(\mathbb{P},\hat{\mathbb{P}})\le \kappa} \phi(z;\mathbb{P}) \le 0
\)
with \(\phi\) monotone nondecreasing in \(\kappa\) (which holds for robust expectations and robust CVaR bounds), then feasibility at radius \(\bar\kappa\) implies feasibility for any smaller \(\kappa\le \bar\kappa\). Consequently, designing terminal/tube ingredients for \(\bar\kappa\) and only *shrinking* the operational radius preserves recursive feasibility (subject to the deterministic tube assumptions).
This is the recommended safe mode for CC-WDRO-MPC: choose conservative terminal/tube ingredients and allow \(\kappa_t\) to shrink to gain performance without risking loss of feasibility.
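The monotonicity invoked in Proposition 7.12 can be illustrated with the known closed form for type-1 Wasserstein robust expectations of Lipschitz losses on unrestricted support: empirical mean plus radius times Lipschitz constant. The sketch below (a 1-Lipschitz hinge; function name and data are illustrative) shows the bound is nondecreasing in \(\kappa\):

```python
def robust_hinge_expectation(samples, b, kappa):
    """Worst-case expectation of the 1-Lipschitz hinge loss max(0, xi - b)
    over a type-1 Wasserstein ball of radius kappa around the empirical
    distribution (unbounded support): empirical mean plus kappa times the
    Lipschitz constant (= 1 here)."""
    emp = sum(max(0.0, x - b) for x in samples) / len(samples)
    return emp + kappa

samples = [0.2, 0.5, 0.9, 1.4, 2.0]
vals = [robust_hinge_expectation(samples, b=1.0, kappa=k / 10.0)
        for k in range(6)]
# The robust bound is nondecreasing in the radius, as Proposition 7.12 uses.
assert all(v1 <= v2 for v1, v2 in zip(vals, vals[1:]))
```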
Summary. Section VII specified a closed-loop architecture for Wasserstein DRO-MPC in which WDRO constraints are enforced on nominal trajectories (possibly with Lipschitz tube margins) and deterministic tube/terminal ingredients ensure recursive feasibility under bounded model/forecast errors. We also outlined a conformal calibration mechanism to adapt disturbance envelopes and Wasserstein radii over time, emphasizing that conformal guarantees are typically marginal rather than joint over a horizon. Section VIII turns to genuinely multistage and time-consistent formulations beyond standard (open-loop) MPC.
VIII. Time Consistency and Multistage Extensions Beyond Standard MPC
Sections IV–VII primarily treat MPC subproblems as (possibly distributionally robust) open-loop optimizations over a finite horizon, followed by receding-horizon re-optimization. This is often appropriate for real-time operation, but it is not a time-consistent multistage stochastic control model: open-loop plans ignore within-horizon recourse, and trajectory-wise ambiguity sets can lead to preference reversals when information is revealed.
This section provides a mathematically explicit interface for *time-consistent* extensions based on (i) distributionally robust Markov decision processes (DRMDPs), (ii) nested (compositional) risk measures that admit Bellman recursions, (iii) martingale/forecast-calibration constraints that enforce temporal coherence, and (iv) multistage decomposition via (distributionally robust) SDDP. The emphasis is on formulations that are well-posed and whose dynamic programming operators are time-consistent by construction. Claims about tractability depend on discretization choices and on how ambiguity sets are parameterized; we state these as conditional interfaces.
VIII.A. Distributionally robust MDP (DRMDP) formulation
VIII.A.1. Markov state, action, and Wasserstein ambiguity on transition kernels
Let \(\mathsf{S}\) be a (measurable) state space and \(\mathsf{A}(s)\) be a nonempty action set at state \(s\in\mathsf{S}\). In the IWS context, \(s\) can include storage/quality/energy buffer states and (optionally) a regime index encoding seasonal/climate/forecast regimes. Let the controlled state evolution be Markov:
\[
s_{t+1} \sim P(\cdot\mid s_t,a_t),\qquad a_t\in\mathsf{A}(s_t),
\]
where \(P(\cdot\mid s,a)\) is an unknown transition kernel.
A DRMDP posits an ambiguity set \(\mathcal{P}(s,a)\) of possible kernels at each \((s,a)\), and evaluates a policy against the *worst* kernel consistent with the ambiguity model. A common time-consistent structure is **rectangularity**: ambiguity sets factor across states and actions, so Nature can choose a worst-case kernel at each \((s,a)\) independently of other states.
Definition 8.1 (Rectangular Wasserstein kernel ambiguity; interface).
Fix \(p\in[1,\infty)\) and a ground metric \(\mathrm{d}\) on \(\mathsf{S}\). Suppose that for each \((s,a)\) we have a reference conditional law \(\hat P(\cdot\mid s,a)\) (e.g., an empirical or fitted model) and a radius \(\varepsilon(s,a)\ge 0\). Define
\[
\mathcal{P}(s,a) := \big\{Q(\cdot\mid s,a)\in\mathcal{P}(\mathsf{S}):\; W_p\big(Q(\cdot\mid s,a),\hat P(\cdot\mid s,a)\big)\le \varepsilon(s,a)\big\}.
\]
The global ambiguity model is the product family \(\{\mathcal{P}(s,a)\}_{(s,a)}\), which is rectangular by construction.
Remark 8.2 (Why this enforces time consistency).
For discounted infinite-horizon control (and other settings admitting a Bellman recursion), rectangularity ensures that the value function satisfies a dynamic programming equation with an *inner* \(\sup\) taken stagewise at \((s,a)\). This avoids inconsistencies that can occur when ambiguity is placed only on the unconditional joint trajectory law.
VIII.A.2. Robust Bellman operator and contraction (discounted case)
Let \(c(s,a)\) be a bounded one-step cost and \(\gamma\in(0,1)\) a discount factor. For a bounded function \(V\in\mathcal{B}(\mathsf{S})\) define the distributionally robust Bellman operator
\[
(TV)(s) := \inf_{a\in\mathsf{A}(s)} \left\{ c(s,a) + \gamma\, \sup_{Q\in\mathcal{P}(s,a)} \mathbb{E}_{s'\sim Q}[V(s')]\right\}.
\]
The map \(V\mapsto \sup_{Q\in\mathcal{P}(s,a)} \mathbb{E}_Q[V]\) is monotone and 1-Lipschitz in the sup norm:
\[
\Big|\sup_{Q\in\mathcal{P}(s,a)}\mathbb{E}_Q[V] - \sup_{Q\in\mathcal{P}(s,a)}\mathbb{E}_Q[U]\Big|
\le \sup_{Q\in\mathcal{P}(s,a)} \mathbb{E}_Q\big[|V-U|\big]
\le \|V-U\|_\infty.
\]
Therefore the robust Bellman operator is a \(\gamma\)-contraction.
Proposition 8.3 (Discounted DRMDP fixed point; conditional statement).
Assume \(c\) is bounded and \(\gamma\in(0,1)\). Then \(T\) is a contraction on \((\mathcal{B}(\mathsf{S}),\|\cdot\|_\infty)\), hence has a unique fixed point \(V^\star\) satisfying \(V^\star = T V^\star\). Moreover, value iteration \(V^{k+1}=TV^k\) converges to \(V^\star\) in \(\|\cdot\|_\infty\).
Proof.
The contraction bound follows from the 1-Lipschitz property above and the fact that the infimum over actions preserves Lipschitz constants. The Banach fixed point theorem yields existence, uniqueness, and convergence.
Remark 8.4 (Tractability caveat).
The DRMDP recursion is computationally meaningful only if one can evaluate/approximate
\(\sup_{Q\in\mathcal{P}(s,a)} \mathbb{E}_Q[V]\). For Wasserstein sets, tractability depends on (i) the state representation (continuous vs discretized), (ii) the choice of \(p\) and the metric, and (iii) whether \(V\) has a structure enabling a dual formula (e.g., Lipschitz or convex). In practice one typically discretizes \(\mathsf{S}\) (regime/state aggregation) or uses function approximation; this paper does not claim a universal tractable method.
VIII.A.3. Constrained DRMDPs and Pareto fronts via budget sweeping
To encode multi-objective requirements, introduce multiple stage-cost components, e.g.
\(c^{\mathrm{cost}}(s,a)\), \(c^{\mathrm{CO2}}(s,a)\), \(c^{\mathrm{risk}}(s,a)\) (where the last may represent a convex surrogate such as a hinge penalty for service deficits). A constrained DRMDP can impose discounted expected budgets under worst-case transitions. One common time-consistent route is Lagrangian scalarization.
Given multipliers \(\lambda\ge 0\), define a scalar stage cost
\[
c_\lambda(s,a) := c^{\mathrm{cost}}(s,a) + \lambda_1 c^{\mathrm{CO2}}(s,a) + \lambda_2 c^{\mathrm{risk}}(s,a),
\]
and solve the scalar DRMDP for each \(\lambda\). Sweeping \(\lambda\) generates supported Pareto points (cf. Section IV) at the policy level.
Caution.
If one enforces hard constraints (e.g., chance/CVaR constraints) in a DRMDP, one must ensure the constraint set is itself time-consistent (typically by augmenting the state with remaining budget or by using a nested risk measure). Simply constraining an unconditional horizon-wide probability in a multistage setting can break dynamic programming.
VIII.B. Nested entropic distributionally robust dynamic programming (NE-DRDP)
This subsection records a time-consistent risk-sensitive alternative in which the performance criterion is a *nested* risk measure rather than an open-loop horizon functional.
VIII.B.1. Entropic risk and a Bellman-type recursion
For a random variable \(Z\) and parameter \(\theta>0\), the entropic risk functional is
\[
\rho_\theta(Z) := \frac{1}{\theta}\log \mathbb{E}[e^{\theta Z}].
\]
In a Markov setting with discount \(\gamma\in(0,1)\), a canonical risk-sensitive recursion is
\[
V(s) = \inf_{a\in\mathsf{A}(s)} \left\{ c(s,a) + \gamma\, \rho_\theta\big(V(s')\big)\right\},\qquad s'\sim P(\cdot\mid s,a).
\]
This criterion is time-consistent because it is built from one-step conditional risk mappings composed over time.
VIII.B.2. Entropic DRO via KL ambiguity (exponential-cone interface)
Entropic risk is closely related to relative-entropy (KL) robustness. Suppose that for each \((s,a)\) there is a nominal transition \(\hat P(\cdot\mid s,a)\) and we consider a KL ball
\[
\mathcal{P}_{\mathrm{KL}}(s,a):=\{Q(\cdot\mid s,a): D_{\mathrm{KL}}(Q\|\hat P(\cdot\mid s,a))\le \rho(s,a)\}.
\]
Then an entropic-type robust expectation of a bounded function \(V\) has a variational (dual) form of the template
\[
\sup_{Q\in\mathcal{P}_{\mathrm{KL}}(s,a)} \mathbb{E}_Q[V(s')]
= \inf_{\lambda>0}\left\{ \lambda\,\rho(s,a) + \lambda\log \mathbb{E}_{\hat P}[\exp(V(s')/\lambda)]\right\},
\]
whenever the log-moment generating function is finite. For discrete \(\hat P\) (scenario trees), the term \(\log\sum_i \hat p_i \exp(V_i/\lambda)\) can be modeled with exponential-cone constraints after epigraphization.
Caution.
1) The displayed dual identity is a known variational template; applying it requires verifying measurability and finiteness conditions.
2) While KL-based DRDPs are often computationally attractive on discrete scenario trees, they are *not* Wasserstein models; their statistical calibration and robustness interpretation differ from Section III.
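For a discrete nominal \(\hat P\), the dual identity can be evaluated by a one-dimensional convex minimization over \(\lambda\); the sketch below stands in for the exponential-cone model and assumes bounded \(V\). The helper name `kl_robust_expectation` is illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_robust_expectation(p_hat, V, rho):
    """Dual of the KL-ball worst-case expectation for discrete p_hat:
    inf_{lam > 0}  lam*rho + lam * log E_{p_hat}[exp(V/lam)].
    The dual objective is convex in lam, so a bounded 1D search suffices."""
    p_hat, V = np.asarray(p_hat, float), np.asarray(V, float)
    def dual(lam):
        # lam * log sum_i p_i exp(V_i / lam), computed stably via logsumexp
        return lam * rho + lam * logsumexp(np.log(p_hat) + V / lam)
    res = minimize_scalar(dual, bounds=(1e-8, 1e4), method="bounded")
    return res.fun
```

As \(\rho\to 0\) the value approaches \(\mathbb{E}_{\hat P}[V]\), and as \(\rho\) grows it saturates at \(\max_i V_i\), matching the interpretation of the KL ball shrinking to the nominal law or expanding toward point masses on the worst outcome.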
VIII.B.3. Convergence/contraction claims (what is safe to state)
If \(\gamma\in(0,1)\) and costs/value functions are bounded, Bellman-type operators built from one-step conditional risk mappings that are monotone and translation-invariant (which includes entropic risk) are nonexpansive in \(\|\cdot\|_\infty\), and hence \(\gamma\)-contractions after discounting. For entropic mappings, additional regularity is still needed (e.g., boundedness to keep the log-moment terms finite). Therefore, in this paper we treat convergence of value iteration for NE-DRDP as conditional on boundedness/discretization assumptions satisfied in the instantiated numerical model.
VIII.C. Martingale optimal transport (MOT) / martingale-calibration DRO
Forecast-driven operation often yields conditional moment information: given the information at time \(t\), a forecasting system produces a conditional mean (and sometimes quantiles) for next disturbances or net injections. A challenge with trajectory-wise ambiguity sets is that the worst-case distribution may violate such forecast consistency unless explicitly constrained.
VIII.C.1. Forecast-consistent constraints via martingale conditions
Let \(\mathbf{w}=(w_0,\ldots,w_{T-1})\) denote a disturbance process on a finite horizon, with filtration \(\mathcal{F}_t:=\sigma(w_0,\ldots,w_t)\). A martingale-calibration constraint fixes conditional means:
\[
\mathbb{E}[w_{t+1}\mid \mathcal{F}_t] = m_t \quad\text{a.s.},
\]
where \(m_t\) is a forecast (possibly itself random but \(\mathcal{F}_t\)-measurable). When imposed on an ambiguity set, such constraints limit the adversary to distributions that respect the operator’s conditional forecasts.
VIII.C.2. MOT ambiguity set on path space (interface)
Let \(\hat{\mathbb{P}}\) be an empirical (scenario) distribution on paths \(\mathbf{w}\). A generic MOT-style ambiguity set can be defined as
\[
\mathcal{P}_{\mathrm{MOT}} := \Big\{\mathbb{P}:\; W_p(\mathbb{P},\hat{\mathbb{P}})\le \kappa,\; \mathbb{E}_{\mathbb{P}}[w_{t+1}\mid \mathcal{F}_t]=m_t\ \text{a.s. for all }t\Big\},
\]
with an additional *causality* condition if one is optimizing over policies that adapt to \(\mathcal{F}_t\). In a scenario-tree discretization, martingale constraints become linear equalities relating parent-node and child-node probabilities and disturbance values.
Computational implication (discrete trees).
If \(\hat{\mathbb{P}}\) is supported on finitely many tree leaves and the decision rules are nonanticipative with respect to the tree, then inner maximization problems of the form \(\sup_{\mathbb{P}\in\mathcal{P}_{\mathrm{MOT}}} \mathbb{E}_{\mathbb{P}}[\cdot]\) can reduce to linear programs in the node/leaf probabilities (plus transport/coupling variables when a Wasserstein constraint is present). This provides a plausible interface with multistage decomposition (next subsection), but the exact formulation must be specified carefully to ensure the causality and transport constraints are compatible.
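A minimal single-stage instance of this claim: on a fixed finite support, the worst-case expectation under a \(W_1\) budget and a forecast-consistency equality is the LP below, with variables the reweighted law \(q\) plus a coupling \(\pi\) to the nominal weights. The function `mot_worst_case` and the data are illustrative only; a full tree formulation would repeat this nodewise with nonanticipativity.

```python
import numpy as np
from scipy.optimize import linprog

def mot_worst_case(w, p_hat, f, m, kappa):
    """sup_Q E_Q[f(w)] over laws Q on the support points w such that
    W1(Q, p_hat) <= kappa (via a coupling pi) and the forecast-
    consistency (martingale-type) constraint E_Q[w] = m holds.
    Variables: [q (n), pi (n*n, row-major)]."""
    n = len(w)
    D = np.abs(np.subtract.outer(w, w))          # ground cost |w_i - w_j|
    c = np.concatenate([-np.asarray(f, float), np.zeros(n * n)])
    A_eq, b_eq = [], []
    for j in range(n):                           # column marginals: sum_i pi_ij = q_j
        row = np.zeros(n + n * n)
        row[j] = -1.0
        row[n + j::n] = 1.0                      # entries pi_{0j}, pi_{1j}, ...
        A_eq.append(row); b_eq.append(0.0)
    for i in range(n):                           # row marginals: sum_j pi_ij = p_hat_i
        row = np.zeros(n + n * n)
        row[n + i * n:n + (i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(p_hat[i])
    row = np.zeros(n + n * n); row[:n] = w       # forecast consistency: E_Q[w] = m
    A_eq.append(row); b_eq.append(m)
    A_ub = np.concatenate([np.zeros(n), D.ravel()]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=[kappa], A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return -res.fun
```

With \(\kappa=0\) the coupling is forced onto the diagonal and the value reduces to the nominal expectation; increasing \(\kappa\) lets the adversary spread mass subject to the forecast equality.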
Caution.
Martingale/forecast-consistency constraints are powerful but model-specific. They require a precise definition of what is being forecast (raw disturbances vs residuals vs conditional distributions) and can be invalid if the forecast is biased; thus they should be used as a tunable module rather than an assumed property.
VIII.D. Distributionally robust SDDP / multistage convex programming interface
Long-horizon IWS operation (seasonal storage management, aquifer banking, coordinated desal/reuse scheduling) is naturally multistage. When the per-stage physics and costs admit convex representations (LP/SOCP/power cone/exponential cone), stochastic dual dynamic programming (SDDP) is a standard decomposition tool. This subsection states a DR-SDDP interface at a high level.
VIII.D.1. Multistage convex program template
Consider a finite-horizon multistage problem with state \(x_t\) (storages, quality proxies, battery SOC) and decisions \(u_t\), with convex stage constraints
\[
(x_{t+1},y_t) \in \mathcal{C}_t(x_t,u_t,w_t),
\]
where \(y_t\) are auxiliary variables (e.g., hydraulic heads/flows in conic form). Let \(\ell_t(x_t,u_t,y_t,w_t)\) be a convex stage cost.
A distributionally robust multistage objective under *stagewise/conditional* ambiguity takes the form
\[
\inf_{\pi}\ \sup_{\{\mathbb{P}_t(\cdot\mid\mathcal{F}_t)\in\mathcal{P}_t\}}\ \mathbb{E}\Big[\sum_{t=0}^{T-1} \ell_t(\cdot)\Big],
\]
where \(\pi\) is a nonanticipative policy and \(\mathcal{P}_t\) constrains the conditional law of \(w_{t+1}\) given \(\mathcal{F}_t\). Under rectangularity assumptions (ambiguity decomposes by stage and node), dynamic programming recursions exist and are time-consistent.
VIII.D.2. Stage subproblems and dualization (why DRO can remain tractable)
In SDDP, one approximates the cost-to-go \(V_{t+1}(x_{t+1})\) by a maximum of affine cuts. At stage \(t\), given \(x_t\) and a scenario/node, one solves a convex subproblem of the form
\[
\min_{u_t,y_t,x_{t+1}}\ \ell_t(x_t,u_t,y_t,w_t) + \sup_{\mathbb{P}\in\mathcal{P}_t} \mathbb{E}_{\mathbb{P}}[\widehat V_{t+1}(x_{t+1})]
\]
subject to \((x_{t+1},y_t)\in\mathcal{C}_t(x_t,u_t,w_t)\).
If \(\widehat V_{t+1}\) is piecewise affine (a max of affine cuts) and the dependence on \(w_t\) is affine inside \(\mathcal{C}_t\), then the inner worst-case expectation often becomes a convex program that can be dualized (similarly in spirit to Section V), yielding an LP/SOCP/exponential-cone representable stage problem. The exact cone depends on the ambiguity model (Wasserstein vs KL vs moment-based) and on how the cuts are represented.
VIII.D.3. Multi-objective studies in multistage settings
The multi-objective constructions of Section IV extend naturally to DR-SDDP:
1) Weighted sum: define a scalar stage cost \(\ell_{t,\lambda} = \sum_i \lambda_i \ell_t^{(i)}\) and run DR-SDDP for each \(\lambda\) to obtain supported multistage Pareto policies.
2) Budget sweep: introduce state augmentation for remaining budget (e.g., remaining emissions allowance) and treat feasibility of the augmented state as a hard constraint; alternatively, solve a Lagrangian family and post-process to recover tradeoffs.
Caution.
Any claim of time consistency hinges on using conditional/rectangular ambiguity (or nested risk). Trajectory-wise Wasserstein balls on the full path law, when used with adaptive decisions, generally require a causal/nested OT formalism; otherwise dynamic programming does not apply.
Summary.
Section VIII described time-consistent extensions beyond standard open-loop MPC: rectangular DRMDPs with Wasserstein ambiguity on transition kernels, nested entropic/KL-based dynamic programming interfaces, martingale/forecast-calibrated ambiguity for temporal coherence, and a DR-SDDP template for long-horizon multistage convex IWS models. The next sections turn to scalability and high-dimensional uncertainty approximations that are particularly relevant when \(w_t\) collects many spatially distributed demands and renewables.
IX. Scalability and High-Dimensional Uncertainty
Wasserstein DRO formulations based on an empirical distribution with \(N\) support points often scale at least linearly in \(N\) per robustified constraint, and trajectory-wise ambiguity increases the dimension from \(n_w\) to \(H n_w\). In renewable-powered integrated water systems, \(w_t\) may include spatial demand fields, distributed renewable availability, and multi-site inflows, so \(n_w\) can be large. This section collects three scalability layers that can be used without changing the physical modeling layer (Sections II and VI): (i) kernel-based mean-embedding ambiguity (MMD-DRO), (ii) sliced/Max-sliced Wasserstein approximations, and (iii) optimal-transport quantization (coresets) to reduce \(N\).
The purpose of this section is *not* to claim uniform superiority of these alternatives; rather, it provides mathematically explicit ambiguity definitions and conservative inclusion relations so that any numerical comparison can be interpreted rigorously.
IX.A. Kernel-based MMD-DRO (mean-embedding ambiguity)
IX.A.1. RKHS mean embedding and MMD balls
Let \((\Xi,\mathcal{B})\) be a measurable space (typically \(\Xi\subseteq\mathbb{R}^d\)). Let \(k:\Xi\times\Xi\to\mathbb{R}\) be a positive definite kernel with associated reproducing kernel Hilbert space (RKHS) \(\mathcal{H}_k\) and feature map \(\varphi(\xi):=k(\xi,\cdot)\in\mathcal{H}_k\). For any probability measure \(\mathbb{P}\) such that \(\mathbb{E}_{\mathbb{P}}[\sqrt{k(\xi,\xi)}]<\infty\), define its kernel mean embedding
\[
\mu_{\mathbb{P}} := \mathbb{E}_{\xi\sim\mathbb{P}}[\varphi(\xi)]\in\mathcal{H}_k.
\]
Given \(\mathbb{P},\mathbb{Q}\), the maximum mean discrepancy (MMD) is
\[
\mathrm{MMD}_k(\mathbb{P},\mathbb{Q}) := \|\mu_{\mathbb{P}}-\mu_{\mathbb{Q}}\|_{\mathcal{H}_k}.
\]
With empirical center \(\hat{\mathbb{P}}_N=\tfrac{1}{N}\sum_{i=1}^N\delta_{\hat\xi^i}\), define the MMD ambiguity set
\[
\mathcal{B}_{\mathrm{MMD}}(\hat{\mathbb{P}}_N,\varepsilon) := \{\mathbb{P}:\ \mathrm{MMD}_k(\mathbb{P},\hat{\mathbb{P}}_N)\le \varepsilon\}.
\]
Remark 9.1 (Why this can help in high dimension).
The geometry of \(\mathrm{MMD}_k\) depends on the chosen kernel rather than on Euclidean transport in \(\mathbb{R}^d\). For some kernels, MMD can remain statistically meaningful in moderate-to-high dimension where Wasserstein distances are sample-hungry. However, MMD robustness is with respect to matching *features* induced by \(k\); it is not a transport-based robustness notion.
IX.A.2. Worst-case expectation for RKHS test functions
A key property is that robust expectations over MMD balls are explicit for functions that lie in the RKHS.
Proposition 9.2 (Closed form for RKHS objectives).
Let \(f\in\mathcal{H}_k\). Then
\[
\sup_{\mathbb{P}\in\mathcal{B}_{\mathrm{MMD}}(\hat{\mathbb{P}}_N,\varepsilon)} \mathbb{E}_{\mathbb{P}}[f(\xi)]
= \mathbb{E}_{\hat{\mathbb{P}}_N}[f(\xi)] + \varepsilon\,\|f\|_{\mathcal{H}_k}.
\]
Proof.
By the reproducing property, \(\mathbb{E}_{\mathbb{P}}[f]=\langle f,\mu_{\mathbb{P}}\rangle\). Hence
\(\mathbb{E}_{\mathbb{P}}[f]-\mathbb{E}_{\hat{\mathbb{P}}_N}[f]=\langle f,\mu_{\mathbb{P}}-\mu_{\hat{\mathbb{P}}_N}\rangle\le \|f\|_{\mathcal{H}_k}\,\|\mu_{\mathbb{P}}-\mu_{\hat{\mathbb{P}}_N}\|_{\mathcal{H}_k}\le \varepsilon\,\|f\|_{\mathcal{H}_k}\) by Cauchy–Schwarz. Equality requires a distribution whose mean embedding equals \(\mu_{\hat{\mathbb{P}}_N}+\varepsilon f/\|f\|_{\mathcal{H}_k}\); when no such distribution exists, the right-hand side remains a valid (conservative) upper bound, which suffices for robust feasibility.
Implementation implication.
If the uncertain part of a stage cost or constraint surrogate is approximated by an RKHS function (e.g., via kernel ridge regression), then the MMD-robustification reduces to a deterministic offset \(+\varepsilon\|f\|_{\mathcal{H}_k}\) and a sample-average term, avoiding an \(N\)-by-constraint transport dual.
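A compact sketch of this interface, assuming a Gaussian kernel and KRR fit on the same samples that define \(\hat{\mathbb{P}}_N\). The helper `mmd_robust_expectation` is a hypothetical name, and the resulting bound is only as good as the surrogate fit.

```python
import numpy as np

def gaussian_gram(X, Y, bw):
    """Gram matrix of the Gaussian kernel k(x,y) = exp(-|x-y|^2 / (2 bw^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def mmd_robust_expectation(X, g_vals, bw, eps, reg=1e-6):
    """Fit an RKHS surrogate f ~ g via kernel ridge regression on the
    samples X, then evaluate the deterministic robust offset
    E_{P_hat}[f] + eps * ||f||_H  (Proposition 9.2 applied to f)."""
    K = gaussian_gram(X, X, bw)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), g_vals)  # KRR weights
    emp_mean = (K @ alpha).mean()           # E_{P_hat}[f], f = sum_i alpha_i k(x_i, .)
    rkhs_norm = np.sqrt(alpha @ K @ alpha)  # ||f||_H^2 = alpha' K alpha
    return emp_mean + eps * rkhs_norm
```

The robustification cost is one linear solve plus two quadratic forms, independent of the number of robustified constraints, which is the scalability advantage claimed above.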
IX.A.3. Constraint surrogates and kernel ridge regression (KRR) interface
In DRO-MPC, constraint functions \(g(z,\xi)\) are usually not elements of a pre-chosen RKHS. A computational interface is:
1) Choose a scalar constraint surrogate \(\tilde g(z,\cdot)\in\mathcal{H}_k\) that approximates \(g(z,\cdot)\) on \(\Xi\). For fixed \(z\), one can fit \(\tilde g\) from samples \((\hat\xi^i,g(z,\hat\xi^i))\) via KRR.
2) Enforce MMD-robust safety on the surrogate, e.g., require
\[
\sup_{\mathbb{P}\in\mathcal{B}_{\mathrm{MMD}}} \mathbb{E}_{\mathbb{P}}[(\tilde g(z,\xi))_+] \le \tau,
\]
or analogously with a CVaR/hinge epigraph.
Caution.
Any guarantee then depends on an approximation bound between \(g\) and \(\tilde g\). Without an explicit, validated bound (which is application-specific), MMD-DRO should be interpreted as a scalable *heuristic robustness layer* rather than a replacement for Wasserstein certificates.
IX.B. Sliced and Max-sliced Wasserstein DRO
IX.B.1. Definitions
Let \(\Xi\subseteq\mathbb{R}^d\). For a unit vector \(\theta\in\mathbb{S}^{d-1}\), define the projection \(\Pi_\theta(\xi)=\theta^\top\xi\in\mathbb{R}\). Let \(\mathbb{P}_\theta\) denote the pushforward distribution of \(\mathbb{P}\) under \(\Pi_\theta\).
For \(p\in[1,\infty)\), the sliced Wasserstein distance is
\[
\mathrm{SW}_p(\mathbb{P},\mathbb{Q}) := \left(\int_{\mathbb{S}^{d-1}} W_p(\mathbb{P}_\theta,\mathbb{Q}_\theta)^p\,d\sigma(\theta)\right)^{1/p},
\]
where \(\sigma\) is the uniform measure on the sphere. A computable approximation samples finitely many directions \(\Theta=\{\theta_1,\ldots,\theta_L\}\) and uses
\[
\widehat{\mathrm{SW}}_{p,L}(\mathbb{P},\mathbb{Q}) := \left(\frac{1}{L}\sum_{\ell=1}^L W_p(\mathbb{P}_{\theta_\ell},\mathbb{Q}_{\theta_\ell})^p\right)^{1/p}.
\]
The Max-sliced Wasserstein distance (in sampled form) is
\[
\mathrm{MaxSW}_{p,L}(\mathbb{P},\mathbb{Q}) := \max_{\ell\in\{1,\ldots,L\}} W_p(\mathbb{P}_{\theta_\ell},\mathbb{Q}_{\theta_\ell}).
\]
Given an empirical center \(\hat{\mathbb{P}}_N\), one can define ambiguity sets \(\{\mathbb{P}:\widehat{\mathrm{SW}}_{p,L}(\mathbb{P},\hat{\mathbb{P}}_N)\le \kappa\}\) or \(\{\mathbb{P}:\mathrm{MaxSW}_{p,L}(\mathbb{P},\hat{\mathbb{P}}_N)\le \kappa\}\).
Remark 9.3 (Why slicing can reduce complexity).
In one dimension, Wasserstein distances between empirical distributions can be computed by sorting projected samples and comparing quantiles. Thus, for fixed directions, sliced-Wasserstein ambiguity leverages 1D OT structure and may reduce the cost of ambiguity computations in high \(d\). However, the induced ambiguity set is different from the full \(d\)-dimensional Wasserstein ball and will generally be neither a subset nor a superset without additional assumptions.
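The sorting-based computation can be sketched directly. The function `sliced_w1` below estimates \(\mathrm{SW}_1\) and \(\mathrm{MaxSW}_{1,L}\) by Monte Carlo over random directions, assuming equal-size, equally weighted samples; the names are illustrative.

```python
import numpy as np

def w1_1d(x, y):
    """W1 between two equal-size empirical samples in 1D:
    sort both and average the absolute quantile differences."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def sliced_w1(X, Y, L=64, rng=None):
    """Monte Carlo sliced and max-sliced W1 over L random unit directions."""
    rng = np.random.default_rng(rng)
    thetas = rng.normal(size=(L, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    vals = np.array([w1_1d(X @ th, Y @ th) for th in thetas])
    return vals.mean(), vals.max()      # (SW_1 estimate, MaxSW_{1,L} estimate)
```

Each direction costs one projection and one sort, so the total cost is \(O(L\,N(d+\log N))\), versus solving a \(d\)-dimensional OT problem for the full Wasserstein distance.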
IX.B.2. Conservative use in MPC: projection-wise robustification
A conservative pattern that preserves interpretability is to robustify only *selected linear functionals* of \(\xi\), corresponding to operationally meaningful projections.
Example 9.4 (Operational projections).
For \(\xi=(d,r,i,\ldots)\), projections can include: total demand \(\mathbf{1}^\top d\), approximations of the net renewable shortfall \(\mathbf{1}^\top(P^{\mathrm{load}}-r)_+\), or region-aggregated inflows. If service constraints depend approximately linearly on these aggregates, then enforcing WDRO on the 1D projections can be substantially cheaper than WDRO on the full vector.
Caution.
Projection-wise robustness does not protect against adversarial shifts in directions not represented in \(\Theta\). Therefore, any use of sliced or Max-sliced sets should justify why the chosen projections capture the dominant uncertainty directions for the constraint functions of interest.
IX.C. Optimal-transport quantization and coreset construction
When using Wasserstein balls \(\mathcal{B}_{W_p}(\hat{\mathbb{P}}_N,\kappa)\), the dominant scaling driver in many dual reformulations is the number \(N\) of support points in \(\hat{\mathbb{P}}_N\). A generic acceleration is to approximate \(\hat{\mathbb{P}}_N\) by a reduced-support empirical distribution \(\hat{\mathbb{P}}_M\) with \(M\ll N\) (a coreset), and adjust the ambiguity radius using a certified Wasserstein bound between \(\hat{\mathbb{P}}_N\) and \(\hat{\mathbb{P}}_M\).
IX.C.1. Quantized empirical measures
Let \(\hat\xi^1,\ldots,\hat\xi^N\) be samples. Choose representative points \(\tilde\xi^1,\ldots,\tilde\xi^M\) and weights \(\tilde p_j\ge 0\) with \(\sum_j \tilde p_j=1\), yielding
\[
\hat{\mathbb{P}}_M := \sum_{j=1}^M \tilde p_j\,\delta_{\tilde\xi^j}.
\]
A common construction is clustering (e.g., \(k\)-means for \(p=2\)) and setting \(\tilde\xi^j\) to centroids and \(\tilde p_j\) to cluster proportions. The relevant certificate is a bound
\[
W_p(\hat{\mathbb{P}}_N,\hat{\mathbb{P}}_M) \le \delta.
\]
Computing the exact \(W_p\) between two discrete measures is an optimal transport problem; in practice \(\delta\) may be estimated by solving this OT problem once offline, or upper bounded via a chosen assignment plan from samples to centroids.
IX.C.2. Radius adjustment via triangle inequality (outer and inner approximations)
The following inclusion relations are purely metric and require no stochastic assumptions.
Proposition 9.5 (Outer approximation by radius inflation).
If \(W_p(\hat{\mathbb{P}}_N,\hat{\mathbb{P}}_M)\le \delta\), then
\[
\mathcal{B}_{W_p}(\hat{\mathbb{P}}_N,\kappa) \subseteq \mathcal{B}_{W_p}(\hat{\mathbb{P}}_M,\kappa+\delta).
\]
Proof.
If \(W_p(\mathbb{P},\hat{\mathbb{P}}_N)\le\kappa\), then by the triangle inequality
\(W_p(\mathbb{P},\hat{\mathbb{P}}_M)\le W_p(\mathbb{P},\hat{\mathbb{P}}_N)+W_p(\hat{\mathbb{P}}_N,\hat{\mathbb{P}}_M)\le \kappa+\delta\).
Proposition 9.6 (Inner approximation by radius deflation).
If \(W_p(\hat{\mathbb{P}}_N,\hat{\mathbb{P}}_M)\le \delta\) and \(\kappa\ge \delta\), then
\[
\mathcal{B}_{W_p}(\hat{\mathbb{P}}_M,\kappa-\delta) \subseteq \mathcal{B}_{W_p}(\hat{\mathbb{P}}_N,\kappa).
\]
Proof.
If \(W_p(\mathbb{P},\hat{\mathbb{P}}_M)\le \kappa-\delta\), then
\(W_p(\mathbb{P},\hat{\mathbb{P}}_N)\le W_p(\mathbb{P},\hat{\mathbb{P}}_M)+W_p(\hat{\mathbb{P}}_M,\hat{\mathbb{P}}_N)\le (\kappa-\delta)+\delta=\kappa\).
Interpretation for MPC.
- Solving WDRO-MPC with center \(\hat{\mathbb{P}}_M\) and inflated radius \(\kappa'\!:=\kappa+\delta\) yields a conservative feasible set relative to using \((\hat{\mathbb{P}}_N,\kappa)\).
- Solving with deflated radius \(\kappa''\!:=\max\{\kappa-\delta,0\}\) yields an inner approximation: any solution feasible for \((\hat{\mathbb{P}}_M,\kappa'')\) is feasible for \((\hat{\mathbb{P}}_N,\kappa)\).
Thus the pair \((\kappa'',\kappa')\) provides a bracketing mechanism for feasibility and performance when using coresets.
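The construction and the bracketing radii can be sketched as follows, using plain Lloyd's iterations for \(p=2\) and the assignment-plan upper bound for \(\delta\); `kmeans_coreset` and `bracket_radii` are illustrative names.

```python
import numpy as np

def kmeans_coreset(X, M, iters=50, rng=0):
    """Lloyd's k-means coreset: centroids, cluster weights, and a
    certified W2 upper bound delta from the sample-to-centroid plan."""
    g = np.random.default_rng(rng)
    C = X[g.choice(len(X), M, replace=False)].copy()
    for _ in range(iters):
        lbl = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(M):
            if np.any(lbl == j):
                C[j] = X[lbl == j].mean(0)
    w = np.bincount(lbl, minlength=M) / len(X)
    # assignment-plan bound: W2(P_N, P_M)^2 <= (1/N) sum_i |x_i - c_{lbl(i)}|^2
    delta = np.sqrt(((X - C[lbl]) ** 2).sum(-1).mean())
    return C, w, delta

def bracket_radii(kappa, delta):
    """Inner/outer radii of Propositions 9.5-9.6 for the reduced center."""
    return max(kappa - delta, 0.0), kappa + delta   # (deflated, inflated)
```

Solving the WDRO problem twice, once at each bracketing radius, then sandwiches the feasibility and cost of the original \((\hat{\mathbb{P}}_N,\kappa)\) problem at the reduced support size \(M\).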
IX.C.3. Nodewise/stagewise coresets and scenario trees
In stagewise WDRO-MPC, one can quantize each stagewise empirical law \(\hat{\mathbb{P}}_{t+\tau,N}\) separately to \(\hat{\mathbb{P}}_{t+\tau,M}\), yielding stagewise bounds \(\delta_{t+\tau}\) and per-stage radius adjustments.
In multistage tree-based models (Section VIII.D), one may quantize nodewise conditional empirical distributions (at each node/\((s,a)\) bin) to keep the number of child scenarios per node bounded. The same triangle-inequality inclusions apply locally at each node, but any global guarantee then depends on how local ambiguity sets compose (rectangularity/time consistency).
Summary.
Section IX presented three scalability layers for WDRO-MPC in high-dimensional uncertainty settings: (i) MMD ambiguity sets, which yield closed-form robustification for RKHS functions; (ii) sliced/Max-sliced Wasserstein distances, which exploit 1D OT structure through projections; and (iii) OT-based coreset quantization, which reduces sample support size while providing rigorous Wasserstein-ball inclusions via radius inflation/deflation. The next section compares alternative reliability certification layers, including scenario optimization and moment/KL-based DRO, to clarify when Wasserstein-based robustness is preferable.
X. Alternative Reliability Certificates (Comparative/Hybrid Layer)
Wasserstein DRO provides a geometric, transport-based robustness model and (under structural assumptions) leads to tractable conic reformulations (Sections V--VI). However, it is not the only mathematically principled way to certify reliability. This section collects alternative and hybrid certification layers that can be swapped into the multi-objective MPC templates of Section IV.
The goal is comparative clarity:
1) What reliability notion is being certified (distribution-free vs ambiguity-set-based vs parametric tail modeling)?
2) What assumptions are required (i.i.d./exchangeability, boundedness, moment existence, etc.)?
3) What tractable optimization problem results (LP/SOCP/exponential-cone/SDP; convex vs mixed-integer)?
X.A. Scenario optimization and scenario MPC
X.A.1. Scenario programs and a standard finite-sample violation bound
Consider a convex constraint family parameterized by uncertainty \(\xi\in\Xi\):
\[
g(z,\xi) \le 0,
\]
where \(z\) collects decision variables (e.g., an open-loop MPC plan plus epigraph variables). In the scenario approach one draws \(N\) i.i.d. samples \(\xi^1,\ldots,\xi^N\) from the (unknown) data-generating distribution \(\mathbb{P}_\star\) and solves the *scenario program*
\[
\min_z\ \phi(z)\quad\text{s.t.}\quad g(z,\xi^i)\le 0,\ i=1,\ldots,N,
\]
where \(\phi\) is a convex objective.
Define the (true) violation probability of any candidate \(z\) as
\[
V(z) := \mathbb{P}_\star\big(g(z,\xi)>0\big).
\]
The scenario approach provides distribution-free bounds on \(V(z_N)\), where \(z_N\) is an optimizer of the scenario program, under assumptions that ensure the solution depends on the sample through a limited number of *support constraints*.
Theorem 10.1 (Classical scenario bound; conditional statement).
Assume:
(i) the scenario problem is convex and feasible for almost every sample draw;
(ii) an optimal solution \(z_N\) exists and is unique (or a measurable selection rule is used);
(iii) the number of support constraints of the optimizer is almost surely at most \(d\), where \(d\) is an appropriate notion of effective dimension (often bounded by the number of scalar decision variables).
Then for any target violation level \(\varepsilon\in(0,1)\),
\[
\mathbb{P}_\star^N\big( V(z_N) > \varepsilon \big)
\le \sum_{i=0}^{d-1} {N\choose i}\, \varepsilon^i(1-\varepsilon)^{N-i}.
\]
Here \(\mathbb{P}_\star^N\) denotes the probability over the sampling of \(\xi^1,\ldots,\xi^N\).
Discussion.
1) The bound is *distribution-free*: it does not assume bounded support, Lipschitzness, or a parametric family.
2) The assumptions are nontrivial in MPC: uniqueness/measurability and identifying the correct support-rank parameter \(d\) must be checked for the instantiated conic program (and can change under mixed-integer decisions).
3) The guarantee is *one-shot*: it applies to the single solution \(z_N\) computed from one dataset.
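Since the right-hand side of Theorem 10.1 is a binomial tail, it can be evaluated and inverted numerically; the sketch below uses the identity with the binomial CDF. The helper names `scenario_violation_confidence` and `min_samples` are illustrative.

```python
from scipy.stats import binom

def scenario_violation_confidence(N, d, eps):
    """Upper bound on P(V(z_N) > eps) from Theorem 10.1:
    sum_{i=0}^{d-1} C(N,i) eps^i (1-eps)^(N-i) = BinomCDF(d-1; N, eps)."""
    return binom.cdf(d - 1, N, eps)

def min_samples(d, eps, beta, n_max=200_000):
    """Smallest N whose confidence bound is <= beta (linear scan;
    the bound is decreasing in N for fixed d and eps)."""
    for N in range(d, n_max):
        if scenario_violation_confidence(N, d, eps) <= beta:
            return N
    raise ValueError("n_max too small")
```

For example, with a single support constraint (\(d=1\)) the bound reduces to \((1-\varepsilon)^N\), and the required \(N\) grows roughly like \(d/\varepsilon\) for fixed confidence.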
X.A.2. Scenario MPC and time aggregation of risk
In receding-horizon control, one repeatedly solves scenario programs at each time step. Even if each step has a certified bound \(V(z_t)\le \varepsilon\) with high confidence, an overall statement about constraint violation over \(T\) closed-loop steps requires additional accounting.
A conservative and auditable approach is a union bound across time:
\[
\mathbb{P}_\star\Big(\exists t\in\{0,\ldots,T-1\}: g(z_t,\xi_t)>0\Big)
\le \sum_{t=0}^{T-1} \mathbb{P}_\star\big(g(z_t,\xi_t)>0\big),
\]
so allocating per-step risk budgets (possibly time-varying) yields a transparent bound. This bound can be loose, but it makes explicit what is being certified.
Practical interface with Sections IV–VII.
Scenario constraints can be used either (i) as hard constraints enforced for sampled trajectories \(\hat{\mathbf{w}}_t^i\) in an MPC horizon, or (ii) as inner approximations inside a tube-MPC tightening, where \(\xi\) represents forecast errors entering service constraints.
X.B. Moment-based certificates and unimodality enhancement
Moment-based reliability certificates replace a distributional ambiguity set (Section III) by constraints on low-order moments. They are attractive when one trusts mean/covariance estimates but not the full empirical distribution, or when one needs a very compact representation of uncertainty.
X.B.1. One-sided Chebyshev/Cantelli bound and SOC margins
Let \(Y\) be a scalar random variable with mean \(\mu\) and variance \(\sigma^2\). Cantelli’s inequality states that for any \(t>0\),
\[
\mathbb{P}(Y-\mu \ge t) \le \frac{\sigma^2}{\sigma^2+t^2}.
\]
Thus to ensure \(\mathbb{P}(Y\le 0)\ge 1-\varepsilon\), a sufficient condition is
\[
\mu + \sqrt{\frac{1-\varepsilon}{\varepsilon}}\,\sigma \le 0.
\]
In control applications, one often has \(Y=a^\top\xi + b\) (affine in uncertainty). If \(\mathbb{E}[\xi]=m\) and \(\mathrm{Cov}(\xi)=\Sigma\), then
\[
\mu = a^\top m + b,\qquad \sigma = \sqrt{a^\top\Sigma a}.
\]
Consequently, a *moment-based chance constraint surrogate* becomes
\[
a^\top m + b + \sqrt{\frac{1-\varepsilon}{\varepsilon}}\,\sqrt{a^\top\Sigma a} \le 0,
\]
which is second-order-cone representable in \((a,b)\) when \(\Sigma\) is fixed, and remains conic-representable in many cases when \(a\) depends affinely on the decision vector \(z\).
Caution.
1) This certificate is conservative and can be very loose for heavy-tailed distributions.
2) It requires finite second moments.
3) If \(m\) and \(\Sigma\) are estimated, one should account for estimation uncertainty (e.g., confidence sets \(m\in\mathcal{M}\), \(\Sigma\in\mathcal{S}\)), which typically yields robust SOC/SDP constraints; the exact tractable form depends on how \(\mathcal{M},\mathcal{S}\) are defined.
X.B.2. Unimodality enhancement (optional and assumption-sensitive)
For scalar unimodal distributions, variance-based tail bounds can be tightened relative to Chebyshev. One classical example is the Vysochanskij–Petunin inequality: if \(Y\) is unimodal with mean \(\mu\) and standard deviation \(\sigma\), then for \(k>\sqrt{8/3}\),
\[
\mathbb{P}(|Y-\mu|\ge k\sigma) \le \frac{4}{9k^2}.
\]
This can, in principle, reduce the required safety margin for unimodal uncertainties.
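A small numerical comparison of the two bounds (the normal-distribution check uses SciPy and is illustrative only):

```python
import numpy as np
from scipy.stats import norm

# Two-sided tail bounds at k standard deviations:
#   Chebyshev:             P(|Y - mu| >= k sigma) <= 1/k^2      (any distribution)
#   Vysochanskij-Petunin:  P(|Y - mu| >= k sigma) <= 4/(9 k^2)  (unimodal, k > sqrt(8/3))
for k in (2.0, 3.0, 4.0):
    cheb, vp = 1.0 / k**2, 4.0 / (9.0 * k**2)
    assert k > np.sqrt(8.0 / 3.0) and vp < cheb
    print(f"k={k}: Chebyshev {cheb:.4f}  VP {vp:.4f}")

# Sanity check against one unimodal distribution (standard normal):
for k in (2.0, 3.0):
    assert 2 * norm.sf(k) <= 4.0 / (9.0 * k**2)
```

The VP bound improves Chebyshev by a constant factor \(4/9\), which directly shrinks the required \(k\) for a given \(\varepsilon\) when unimodality can be justified.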
Caution.
1) Unimodality is a strong structural assumption that may fail for multimodal weather-driven uncertainties.
2) Extending unimodality-based improvements to multivariate constraints (e.g., joint demand vectors) is nontrivial; a safe practice is to apply unimodality improvements only to carefully chosen scalar aggregates (such as total demand or net deficit indices).
X.C. Extreme-value / catastrophic tail-risk module (interface)
Wasserstein and moment-based models are often tuned to typical deviations; they may not represent rare compound extremes (e.g., multi-week inflow deficit coinciding with renewable drought) unless the ambiguity radius is made very large. Extreme value theory (EVT) provides a different modeling objective: characterize *tail quantiles* and exceedance probabilities.
X.C.1. Peaks-over-threshold modeling interface
Let \(Z\) be a scalar deficit metric (e.g., daily shortage, maximum pressure violation, or a weighted sum of service deficits). Fix a high threshold \(u\), and define exceedances \(X:=Z-u\) conditional on \(Z>u\). A common EVT approximation postulates that for sufficiently high \(u\), the conditional distribution of \(X\) is close to a generalized Pareto distribution (GPD) with scale \(\sigma\) and tail index \(\xi\); here \(\xi\) denotes the GPD shape parameter, not the uncertainty vector of earlier sections.
How this can enter MPC.
1) Choose a scalar tail metric \(Z_t\) at each stage (or over a horizon), derived from the physical variables of Section II (e.g., \(Z_t=\mathrm{short}_t\) or \(Z_t=\max_j g_j(z_t,\xi_t)\)).
2) Fit/track tail parameters online on exceedance data.
3) Impose design constraints on estimated high quantiles (e.g., a \(1\)-in-\(M\) event level) or penalize them in the objective.
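The fitting and quantile steps can be sketched with SciPy's `genpareto` on synthetic heavy-tailed data; the deficit metric, threshold choice, and sample size below are illustrative assumptions, not a recommended workflow.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)

# Synthetic heavy-tailed deficit metric Z (folded Student-t as a stand-in for
# shortage data); purely illustrative.
Z = np.abs(rng.standard_t(df=4, size=20_000))

u = np.quantile(Z, 0.95)             # high threshold (choice must be validated)
exceed = Z[Z > u] - u                # peaks over threshold

# Fit a GPD to exceedances with location fixed at 0, as in the POT model.
xi_hat, _, sigma_hat = genpareto.fit(exceed, floc=0.0)

# POT quantile estimator: with zeta_u = P(Z > u),
#   q_p = u + (sigma/xi) * ((zeta_u / p)^xi - 1)   solves  P(Z > q_p) = p.
zeta_u = (Z > u).mean()
p = 1e-3
q_hat = u + (sigma_hat / xi_hat) * ((zeta_u / p) ** xi_hat - 1.0)
q_emp = np.quantile(Z, 1 - p)        # rough empirical check (p not too extreme)
print(f"xi_hat={xi_hat:.3f}  sigma_hat={sigma_hat:.3f}  q_GPD={q_hat:.2f}  q_emp={q_emp:.2f}")
```

Constraints of the form \(q_{\hat\theta}(1/M)\le \bar Z\) can then be imposed at design time, with the caveats of the following paragraph.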
Caution.
This layer is inherently parametric and data-hungry in the tail. Any rigorous guarantee requires carefully specified statistical assumptions (mixing, stationarity or controlled nonstationarity, threshold selection) and must address estimation uncertainty; we therefore treat EVT as an exploratory, falsifiable module rather than a baseline certificate.
X.C.2. Comparison against WDRO and conformal calibration
– WDRO (Sections III–V) gives a worst-case expectation or CVaR over a Wasserstein ball centered at observed data; it can be conservative in tails unless the radius is large.
– Conformal prediction (Section VII.D) gives marginal coverage for prediction sets but does not directly provide extreme quantiles.
– EVT aims directly at tail quantiles but relies on stronger modeling assumptions.
A practical hybrid is to use WDRO/CVaR for routine reliability and an EVT-driven supplementary objective or constraint for catastrophic events, with clear reporting that the EVT component is model-based.
X.D. KL-DRO / entropic-risk alternative robustness layer
KL (relative-entropy) ambiguity provides a robustness notion different from Wasserstein: rather than transporting mass in \(\Xi\), it reweights likelihoods of observed samples. This yields smooth dual forms (log-sum-exp type) that are attractive computationally, especially on discrete scenario trees.
X.D.1. KL ball around a discrete empirical distribution
Let \(\hat{\mathbb{P}}_N\) be the empirical distribution on samples \(\hat\xi^i\). Consider distributions \(\mathbb{Q}\) supported on the same points with weights \(q\in\Delta_N\) (probability simplex). The KL divergence of \(q\) from the uniform empirical weights \(\hat p_i=1/N\) is
\[
D_{\mathrm{KL}}(q\|\hat p) = \sum_{i=1}^N q_i \log\Big(\frac{q_i}{1/N}\Big).
\]
A KL ambiguity set is \(\{q\in\Delta_N: D_{\mathrm{KL}}(q\|\hat p)\le \rho\}\).
X.D.2. Robust expectation dual (variational form)
Given losses \(\ell_i := \ell(z,\hat\xi^i)\), consider
\[
\sup_{q\in\Delta_N:\ D_{\mathrm{KL}}(q\|\hat p)\le \rho}\ \sum_{i=1}^N q_i\,\ell_i.
\]
A standard variational representation yields the convex dual form
\[
\sup_{q:\ D_{\mathrm{KL}}(q\|\hat p)\le \rho} \sum_i q_i\ell_i
= \inf_{\lambda>0}\left\{ \lambda\rho + \lambda\log\Big(\frac{1}{N}\sum_{i=1}^N \exp(\ell_i/\lambda)\Big) \right\},
\]
provided the exponential terms are finite.
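The dual can be evaluated and checked numerically. The sketch below (synthetic losses; all parameters illustrative) minimizes over \(\lambda\) and verifies that the exponentially tilted weights saturate the KL budget and attain the dual value:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

rng = np.random.default_rng(3)
N, rho = 50, 0.1
losses = rng.gamma(2.0, 1.0, size=N)          # sample losses ell_i

def dual(lam):
    # lam*rho + lam*log( (1/N) sum_i exp(ell_i/lam) ), computed stably.
    return lam * rho + lam * (logsumexp(losses / lam) - np.log(N))

res = minimize_scalar(dual, bounds=(1e-3, 1e3), method="bounded")
lam_star, val = res.x, res.fun

# Primal candidate: exponentially tilted weights q_i ∝ exp(ell_i / lam*).
w = np.exp((losses - losses.max()) / lam_star)
q = w / w.sum()
kl = np.sum(q * np.log(q * N))                # D_KL(q || uniform)
primal = q @ losses

assert abs(kl - rho) < 1e-3                   # the tilt saturates the KL budget
assert abs(primal - val) < 1e-3               # strong duality (numerically)
assert val >= losses.mean()                   # robust value >= empirical mean
print(f"lambda*={lam_star:.3f}  robust={val:.4f}  empirical mean={losses.mean():.4f}")
```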
Implementation notes.
1) For fixed \(\lambda\), the log-sum-exp term is convex in \(\ell\). Thus one can either (i) solve a 1D outer minimization over \(\lambda\) (e.g., bisection or line search) with an inner convex program, or (ii) model the joint minimization in a conic form if the modeling stack supports exponential cones and the chosen parameterization preserves convexity.
2) KL-DRO is particularly natural in scenario-tree DRDP/SDDP settings (Section VIII) because the ambiguity can be imposed nodewise as a constraint on child probabilities.
X.D.3. Interpretation as stress reweighting
In the discrete case, the adversary chooses weights \(q_i\) that emphasize samples with larger losses \(\ell_i\), subject to a KL budget \(\rho\). The dual variable \(\lambda\) controls the aggressiveness of reweighting: small \(\lambda\) produces heavier emphasis on worst samples, while large \(\lambda\) approaches the empirical mean.
Summary.
Section X compared several reliability certification layers and how they plug into multi-objective MPC: scenario optimization yields distribution-free finite-sample guarantees under convexity/support-rank assumptions; moment-based certificates provide compact SOC/SDP constraints when mean/covariance are trusted; EVT targets catastrophic tails but is model-based; and KL-DRO provides smooth log-sum-exp robustification with a stress-reweighting interpretation and strong compatibility with multistage scenario trees.
XI. Decentralization, Strategic Interaction, and Data Governance (Extensions)
Large integrated water systems are frequently operated by multiple agencies (districts, utilities, irrigation operators) that share physical couplings (pipes, pumps, reservoirs, aquifers) and informational couplings (shared uncertainty models and measurements). This section records modular optimization/game-theoretic interfaces that can be combined with the WDRO-MPC building blocks in Sections IV–V and the conic physics layers in Section VI. The aim is to define mathematically well-posed extension templates; no global convergence or out-of-sample claims are asserted without explicit assumptions.
XI.A. Multi-district decentralized operation under shared aquifers
XI.A.1. Coupled constraints and local decisions
Suppose there are \(M\) operators (agents) indexed by \(i\in\{1,\ldots,M\}\). Agent \(i\) controls local variables \(z_i\) (e.g., local pumping, treatment, storage actions over a horizon) and has local constraints \(z_i\in\mathcal{Z}_i\). Agents are coupled through shared physical constraints, for example a common aquifer head proxy, shared conveyance capacity, or a global power/renewable constraint. Abstractly write the coupling as
\[
\sum_{i=1}^M A_i z_i = b,\qquad z_i\in\mathcal{Z}_i.
\]
Each agent has a (possibly WDRO) objective \(J_i(z_i,z_{-i})\) that can include a local cost/emissions/risk scalarization; the dependence on \(z_{-i}\) can enter only via the coupling constraints or via shared prices (e.g., a locational marginal electricity price or a scarcity price on aquifer extraction).
XI.A.2. Distributionally robust generalized Nash equilibrium (DR-GNE) template
A generalized Nash equilibrium (GNE) is a joint decision \(z=(z_1,\ldots,z_M)\) such that each \(z_i\) solves agent \(i\)'s own optimization problem given the others' decisions, subject to shared feasibility.
Definition 11.1 (DR-GNE; high-level definition).
Let \(\mathcal{Z}_i(z_{-i})\) denote the feasible set for agent \(i\) given the other agents’ decisions (including shared constraints). A vector \(z^\star\) is a DR-GNE if for each \(i\),
\[
z_i^\star \in \arg\min_{z_i\in\mathcal{Z}_i(z_{-i}^\star)}\; J_i(z_i,z_{-i}^\star),
\]
where each \(J_i\) may contain WDRO terms such as \(\sup_{\mathbb{P}\in\mathcal{B}_{W_p}(\hat{\mathbb{P}},\kappa)}\mathbb{E}_{\mathbb{P}}[\ell_i(\cdot)]\) and/or robust CVaR constraints as in Section V.
Remarks.
1) Existence/uniqueness of a DR-GNE generally requires monotonicity/convexity assumptions (e.g., each agent objective convex in \(z_i\) and constraint sets convex with a suitable constraint qualification). In mixed-integer or nonconvex settings, DR-GNE computation becomes significantly more delicate.
2) If all agents share a single convex social objective (a potential game), then equilibria correspond to minimizers of a centralized convex program; otherwise a variational inequality (VI) characterization is typical.
XI.A.3. Variational inequality formulation (convex setting)
Assume each agent objective is differentiable in its own decision variable and convex in \(z_i\) for fixed \(z_{-i}\). Define the feasible set of the shared-coupling game
\[
\mathcal{K}:=\{z: z_i\in\mathcal{Z}_i\ \forall i,\ \sum_i A_i z_i=b\}.
\]
Define the pseudo-gradient mapping \(F(z):=\big(\nabla_{z_1}J_1(z),\ldots,\nabla_{z_M}J_M(z)\big)\). A standard characterization (under suitable constraint qualifications) is that \(z^\star\) is a (variational) GNE if it solves the VI
\[
\text{find }z^\star\in\mathcal{K}\text{ such that }\langle F(z^\star), z-z^\star\rangle \ge 0\quad \forall z\in\mathcal{K}.
\]
When WDRO terms are nonsmooth (hinge/CVaR), one replaces gradients by subgradients and interprets the VI in a monotone-operator sense.
XI.A.4. ADMM-style decomposition for a centralized WDRO program (cooperative case)
In a cooperative setting one may solve a centralized WDRO-MPC problem by decomposing over agents using consensus and ADMM. Consider a convex program
\[
\min_{z_1,\ldots,z_M}\ \sum_{i=1}^M \Phi_i(z_i)\quad\text{s.t.}\quad \sum_i A_i z_i=b,\ z_i\in\mathcal{Z}_i,
\]
where each \(\Phi_i\) may include conic WDRO epigraph constraints (Sections V–VI). Introducing a dual variable \(y\) for the coupling constraint yields the augmented Lagrangian
\[
\mathcal{L}_\rho(z,y)=\sum_i \Phi_i(z_i) + y^\top\left(\sum_i A_i z_i-b\right) + \frac{\rho}{2}\left\|\sum_i A_i z_i-b\right\|_2^2.
\]
Standard ADMM updates alternate between solving (parallel) proximal subproblems for each agent and updating \(y\). Any convergence claim requires verifying convexity, closedness, and that the subproblems are solved exactly (or inexactly under a summability condition); these requirements are model-specific once MI-conic components appear.
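As an illustration of such a decomposition, the sketch below applies the exchange form of ADMM (Boyd et al., Distributed Optimization via ADMM, Sec. 7.3) to a toy quadratic instance with \(A_i=I\), where the cooperative optimum is available in closed form; the quadratic \(\Phi_i\) and all data are assumptions for illustration only.

```python
import numpy as np

# Toy cooperative instance with A_i = I:
#   min sum_i ||z_i - c_i||^2   s.t.  sum_i z_i = b.
# Closed-form optimum: z_i* = c_i + (b - sum_j c_j)/M.
rng = np.random.default_rng(4)
M, n, rho = 3, 4, 1.0
c = rng.standard_normal((M, n))
b = rng.standard_normal(n)
z_star = c + (b - c.sum(axis=0)) / M

# Exchange ADMM, shifted so the coupling is sum_i z_i = b instead of 0.
z = np.zeros((M, n))
u = np.zeros(n)                                    # scaled dual for the coupling
for _ in range(500):
    resid = z.mean(axis=0) - b / M                 # average coupling residual
    # Parallel proximal updates: argmin ||z_i - c_i||^2 + (rho/2)||z_i - v_i||^2
    v = z - resid - u
    z = (2.0 * c + rho * v) / (2.0 + rho)
    u = u + (z.mean(axis=0) - b / M)               # dual (running-average) update

assert np.allclose(z.sum(axis=0), b, atol=1e-6)    # coupling feasibility
assert np.allclose(z, z_star, atol=1e-5)           # matches centralized optimum
print("exchange ADMM matched the closed-form cooperative optimum")
```

Each \(z_i\)-update depends only on agent-local data plus the shared residual, which is the communication pattern the cooperative template requires.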
XI.B. Coalition formation and distributionally robust core allocations
When agents can cooperate partially (e.g., sharing desalination capacity or storage), cooperative game models can represent the value of coalition formation.
XI.B.1. Cost game and the core (deterministic template)
Let \(\mathcal{M}=\{1,\ldots,M\}\). For each coalition \(S\subseteq\mathcal{M}\), define the coalition value (cost) as the optimal objective value of the coalition-operated WDRO-MPC (or planning) problem:
\[
v(S):=\inf_{\{z_i\}_{i\in S}}\ \sum_{i\in S}\Phi_i(z_i)\quad\text{s.t. shared constraints restricted to }S.
\]
The core is the set of allocations \(\alpha\in\mathbb{R}^M\) such that \(\sum_{i\in\mathcal{M}}\alpha_i=v(\mathcal{M})\) and \(\sum_{i\in S}\alpha_i\le v(S)\) for all \(S\). Interpreting \(v(S)\) as a WDRO objective is mathematically straightforward; interpreting \(v(S)\) as a real-world “guaranteed” cost requires that the coalition model and ambiguity calibration be justified externally.
XI.B.2. Shapley value as a descriptive allocation (not a guarantee)
One can compute the Shapley value associated with \(v(\cdot)\) to distribute the grand-coalition cost by marginal contributions. This is an equitable descriptive statistic but does not in general produce a core allocation; when it does, that must be checked for the instantiated \(v\).
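A minimal sketch of computing the Shapley value and checking core membership for a small illustrative cost game (the coalition values are invented, not derived from any WDRO model):

```python
from itertools import permutations, combinations

# Toy 3-agent cost game v(S); values are illustrative and subadditive.
v = {(): 0.0,
     (1,): 10.0, (2,): 12.0, (3,): 9.0,
     (1, 2): 18.0, (1, 3): 15.0, (2, 3): 17.0,
     (1, 2, 3): 24.0}
agents = (1, 2, 3)

def shapley(v, agents):
    # Average marginal contribution of each agent over all arrival orders.
    phi = {i: 0.0 for i in agents}
    perms = list(permutations(agents))
    for order in perms:
        seen = ()
        for i in order:
            before = v[tuple(sorted(seen))]
            seen = seen + (i,)
            phi[i] += v[tuple(sorted(seen))] - before
    return {i: phi[i] / len(perms) for i in agents}

phi = shapley(v, agents)
assert abs(sum(phi.values()) - v[agents]) < 1e-9        # efficiency

# Core membership must be checked instance by instance, as noted above.
in_core = all(sum(phi[i] for i in S) <= v[S] + 1e-9
              for r in range(1, 4) for S in combinations(agents, r))
print(f"Shapley allocation: {phi}  in core: {in_core}")
```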
XI.B.3. VI/dual interpretations
For convex cooperative problems with coupling constraints, dual variables associated with shared resources (e.g., aquifer extraction limits) can be interpreted as shadow prices. Such prices can serve as a mechanism for coalition settlement, but this is a modeling choice; there is no universal theorem that dual prices yield stable allocations unless additional structure (e.g., linearity and transferable utility) is present.
XI.C. Distributionally robust bilevel optimization (DRBO) for planning + operations
Capacity planning (pipes, storage, desalination capacity) is naturally upper-level, while operation is lower-level and can be WDRO-MPC.
XI.C.1. Bilevel template
Let \(y\) denote design variables (capacities) in a set \(\mathcal{Y}\) and \(z\) denote operational variables (possibly a receding-horizon policy proxy) in \(\mathcal{Z}(y)\). A bilevel model is
\[
\begin{aligned}
\min_{y\in\mathcal{Y},\ z}\quad & C_{\mathrm{cap}}(y) + C_{\mathrm{op}}(z)\\
\text{s.t.}\quad & z \in \arg\min_{\tilde z\in\mathcal{Z}(y)}\ \Big\{\text{WDRO-OpCost}(\tilde z;y)\Big\}.
\end{aligned}
\]
Here WDRO-OpCost may be a multi-objective scalarization (Section IV) with Wasserstein-robust expectations and robust CVaR constraints (Section V) combined with conic physics (Section VI).
XI.C.2. Single-level reduction caveat
A common computational approach replaces the lower-level problem by its KKT conditions (or by strong duality) to obtain a single-level mathematical program. This is valid only when the lower-level problem is convex, satisfies constraint qualifications, and is free of integer variables. If the operational problem is MI-conic (due to on/off treatment/desalination modes), then KKT-based reductions are not valid without additional disjunctive modeling; the resulting single-level problem is typically nonconvex.
XI.D. Adjustable robust / two-stage DRO-ARO with column-and-constraint generation
A middle ground between bilevel and MPC is a two-stage model: first-stage decisions (e.g., capacities or day-ahead schedules) are chosen before uncertainty is revealed; second-stage recourse decisions adapt after observing \(\xi\).
XI.D.1. Two-stage DRO-ARO template
Let \(y\) be the first-stage decision and let \(x(\xi)\) be a recourse decision rule. For a loss \(L(y,x(\xi),\xi)\), a two-stage WDRO problem is
\[
\min_{y\in\mathcal{Y}}\ \sup_{\mathbb{P}\in\mathcal{B}}\ \mathbb{E}_{\mathbb{P}}\Big[\inf_{x\in\mathcal{X}(y,\xi)} L(y,x,\xi)\Big],
\]
where \(\mathcal{X}(y,\xi)\) encodes conic feasibility (mass/energy/hydraulics) with possibly uncertain right-hand sides. Even when the second stage is convex, interchanging \(\inf\) and \(\sup\) requires justification; in practice, one often adopts tractable conservative approximations such as restricting \(x(\xi)\) to affine decision rules.
XI.D.2. Column-and-constraint generation (CCG) interface
When a tractable separation oracle exists for the worst-case distribution or scenario (e.g., for certain WDRO duals), one can iteratively add “most-violated” constraints/cuts. This is algorithmic: correctness depends on strong duality of the inner WDRO representation and finite termination is not guaranteed absent polyhedral structure. We therefore treat CCG as an implementation tool rather than a theorem.
XI.E. Differential privacy in federated WDRO (data governance)
When data are distributed across agencies, sharing raw samples \(\hat\xi^i\) can be infeasible. Differential privacy (DP) provides a formal mechanism for releasing statistics while limiting information leakage.
XI.E.1. DP interface for empirical centers and ambiguity radii
Suppose each agent \(i\) holds a dataset \(\mathcal{D}_i\) of residuals or disturbance samples. A centralized WDRO controller would build an empirical distribution \(\hat{\mathbb{P}}\) (or sufficient statistics) from the pooled data. Under DP, one instead releases a privatized statistic \(\tilde\theta\) (e.g., a privatized mean, covariance, or a privatized coreset) such that for neighboring datasets \(\mathcal{D},\mathcal{D}'\),
\[
\mathbb{P}(\mathcal{M}(\mathcal{D})\in S) \le e^{\varepsilon}\,\mathbb{P}(\mathcal{M}(\mathcal{D}')\in S) + \delta\quad\forall S,
\]
for a mechanism \(\mathcal{M}\). The WDRO layer then uses \(\tilde\theta\) to define \(\hat{\mathbb{P}}\) or a moment set.
XI.E.2. Robustness–privacy tradeoff (conceptual)
DP noise increases the discrepancy between the privatized center and the true empirical center. In Wasserstein terms, if one can certify
\(W_p(\hat{\mathbb{P}},\tilde{\mathbb{P}})\le \delta_{\mathrm{DP}}\), then the radius-inflation inclusions of Section IX.C imply that using \(\tilde{\mathbb{P}}\) with radius \(\kappa+\delta_{\mathrm{DP}}\) is conservative relative to using \((\hat{\mathbb{P}},\kappa)\). Establishing such a bound requires analyzing the specific DP mechanism and the geometry of the released statistic; we do not assert a general formula.
Summary.
Section XI defined extension templates for (i) decentralized multi-agent operation via DR-GNE and VI formulations, (ii) coalition and cost allocation concepts via core-type constraints, (iii) bilevel planning-operations models with WDRO at the operational level, (iv) two-stage adjustable robust/DRO interfaces and cut generation, and (v) differential-privacy-aware data sharing with a conservative radius-inflation interpretation. The next section turns to sensing and estimation modules that can reduce uncertainty and tighten WDRO-MPC conservatism.
XII. Sensor Placement and Estimation as Robustness Enablers (Extensions)
This section records two extension modules that influence robustness indirectly by improving information: (i) sensor placement, which changes what is measurable and how accurately key states/disturbances can be inferred; and (ii) moving-horizon estimation (MHE), which provides state estimates and uncertainty envelopes that feed into the tube-tightening and terminal ingredients of Section VII.
The main mathematical point is modular: better sensing reduces (a) the size of deterministic error tubes used for recursive feasibility and (b) the required conservatism in distributionally robust constraints when uncertainties are forecast-conditional on measured data.
XII.A. Fisher-information-aware sensor placement
XII.A.1. Measurement model and Fisher information surrogate
Consider a parametric measurement model at time t
\[
y_t = h_S(x_t) + v_t,
\]
where \(S\) is the set of deployed sensors, \(y_t\) is the sensor output, and \(v_t\) is measurement noise. To obtain a tractable sensor placement objective, we specialize to a local linearization around an operating point \(\bar x\):
\[
y_t \approx H_S x_t + v_t,
\]
where \(H_S\) stacks the measurement rows contributed by sensors in S.
Assume \(v_t\) is zero-mean with covariance \(R_S\succ 0\). A standard information surrogate for how informative \(S\) is about \(x\) is the (observed) Fisher information matrix
\[
\mathcal{I}(S) := H_S^\top R_S^{-1} H_S \in \mathbb{S}^{n_x}_+.
\]
To avoid rank issues and to model prior information (e.g., from a process model and past data), define the regularized information
\[
\mathcal{I}_0 + \mathcal{I}(S),\qquad \mathcal{I}_0\succeq 0.
\]
Common scalarizations include
\[
\text{D-optimality:}\quad f(S) := \log\det(\mathcal{I}_0 + \mathcal{I}(S)),
\]
\[
\text{A-optimality:}\quad g(S) := -\mathrm{tr}\big((\mathcal{I}_0 + \mathcal{I}(S))^{-1}\big),
\]
which correspond (heuristically) to reducing volume/average variance of an approximate Gaussian posterior.
XII.A.2. Submodularity of log-det information gain (a verifiable sufficient condition)
The statement below is a classical diminishing-returns property used to justify greedy sensor placement.
Proposition 12.1 (Submodularity of regularized D-optimality under additive information).
Assume each candidate sensor j contributes a positive semidefinite matrix \(M_j\succeq 0\) such that
\(
\mathcal{I}(S)=\sum_{j\in S} M_j
\)
and \(\mathcal{I}_0\succ 0\). Define
\(
f(S)=\log\det(\mathcal{I}_0+\sum_{j\in S} M_j).
\)
Then f is monotone (\(S\subseteq T\Rightarrow f(S)\le f(T)\)) and submodular:
\[
f(S\cup\{j\})-f(S) \ge f(T\cup\{j\})-f(T)\qquad \forall S\subseteq T,\ j\notin T.
\]
Proof.
Monotonicity follows because adding \(M_j\succeq 0\) increases the PSD order, and \(\log\det\) is monotone on \(\mathbb{S}_{++}^{n_x}\).
For submodularity, define \(A_S:=\mathcal{I}_0+\sum_{i\in S} M_i\succ 0\). Since \(\det(A_S+M_j)=\det(A_S)\,\det\big(I+M_j^{1/2}A_S^{-1}M_j^{1/2}\big)\) by Sylvester's determinant identity,
\[
f(S\cup\{j\})-f(S)=\log\det\big(I + M_j^{1/2} A_S^{-1} M_j^{1/2}\big).
\]
If \(S\subseteq T\), then \(A_S\preceq A_T\), so \(A_S^{-1}\succeq A_T^{-1}\), and congruence with \(M_j^{1/2}\) preserves the Loewner order:
\(
M_j^{1/2} A_S^{-1} M_j^{1/2} \succeq M_j^{1/2} A_T^{-1} M_j^{1/2}.
\)
Since \(X\mapsto\log\det(I+X)\) is monotone on \(\mathbb{S}_+\), the marginal gain at \(S\) is at least the marginal gain at \(T\).
Remark 12.2 (What must be checked in an IWS implementation).
To use Proposition 12.1 rigorously, one must justify the additive information model \(\mathcal{I}(S)=\sum_{j\in S} M_j\). This holds, for example, when sensor noises are independent and \(R_S\) is block-diagonal with fixed per-sensor blocks, and when the linearized measurement operator stacks per-sensor rows. If correlated noise or nonlinear/non-Gaussian sensing dominates, submodularity may fail and greedy selection becomes a heuristic.
XII.A.3. Linking sensor placement to WDRO-MPC conservatism
Sensor placement affects WDRO-MPC through the size of state-estimation error sets. In Section VII.A.2, constraint tightenings use margins
\(
\rho_\tau = \sup_{e\in\mathcal{E}_\tau} L_\tau\|e\|.
\)
If sensor deployment S yields an estimator with an error tube \(e_t\in\mathcal{E}_t(S)\), then the resulting MPC feasible set enlarges as \(\mathcal{E}_t(S)\) shrinks.
A design-time surrogate problem is therefore
\[
\max_{S\subseteq \mathcal{S}_{\mathrm{cand}}} \quad f(S) \quad \text{s.t.}\quad \mathrm{cost}(S)\le B,
\]
where \(f(S)\) is an information metric (e.g., log-det), and \(B\) is a sensor budget. Under Proposition 12.1, the classical greedy algorithm that iteratively adds the sensor with largest marginal gain achieves a \((1-1/e)\)-approximation for the above monotone submodular maximization under a cardinality constraint; for knapsack budgets, modified greedy or continuous relaxations are typically required and must be specified.
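The greedy scheme and its \((1-1/e)\) guarantee can be illustrated on a small synthetic instance with rank-one information contributions (independent scalar sensors), small enough that brute force is feasible; all matrices below are randomly generated assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
nx, n_cand, budget = 5, 8, 3

# Additive information model: each candidate sensor j contributes M_j >= 0
# (rank-one here, M_j = h_j h_j^T, as for independent unit-variance scalar sensors).
H = rng.standard_normal((n_cand, nx))
Ms = [np.outer(h, h) for h in H]
I0 = 0.1 * np.eye(nx)                      # positive definite prior information

def f(S):
    A = I0 + sum((Ms[j] for j in S), start=np.zeros((nx, nx)))
    return np.linalg.slogdet(A)[1]         # log det of regularized information

# Greedy selection for the monotone submodular objective f.
S, remaining = [], set(range(n_cand))
for _ in range(budget):
    j_best = max(remaining, key=lambda j: f(S + [j]))
    S.append(j_best); remaining.remove(j_best)

# Brute-force optimum for this small instance, to check the (1 - 1/e) bound
# on the normalized gain f(S) - f(empty set).
base = f([])
opt = max(f(list(T)) for T in combinations(range(n_cand), budget))
assert f(S) - base >= (1 - 1 / np.e) * (opt - base) - 1e-9
print(f"greedy f={f(S):.4f}  optimal f={opt:.4f}")
```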
XII.B. Distributionally robust moving-horizon estimation (DR-MHE) interface
XII.B.1. Deterministic MHE template
Let the process model be
\[
x_{k+1} = f(x_k,u_k) + w_k,
\]
with measurements
\[
y_k = h(x_k) + v_k.
\]
Given a window of length L at time t, MHE estimates \(x_{t-L}\) (or \(x_t\)) and the state trajectory \(x_{t-L:t}\) by solving
\[
\min_{x_{t-L:t}}\; \|x_{t-L}-\bar x_{t-L}\|_{P^{-1}}^2 + \sum_{k=t-L}^{t-1} \|x_{k+1}-f(x_k,u_k)\|_{Q^{-1}}^2 + \sum_{k=t-L}^{t} \|y_k-h(x_k)\|_{R^{-1}}^2,
\]
subject to state constraints \(x_k\in\mathcal{X}\) (and possibly other convex constraints). For nonlinear \(f,h\), one commonly solves successive convex approximations; for the present paper’s conic pipeline, the most compatible case is affine (or convex) surrogates of \(f,h\) so that the MHE problem is convex.
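For affine \(f,h\), the MHE program reduces to a weighted least-squares problem and can be sketched directly; the system matrices, noise levels, and window length below are illustrative assumptions, and the state constraints \(x_k\in\mathcal{X}\) are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(6)
nx, L = 2, 10
A = np.array([[0.9, 0.1], [0.0, 0.8]])    # illustrative observable pair (A, C)
C = np.array([[1.0, 0.0]])
q, r, p = 0.05, 0.1, 1.0                  # process/measurement/prior std devs

# Simulate a trajectory with process and measurement noise.
x = np.zeros((L + 1, nx)); x[0] = [1.0, -1.0]
for k in range(L):
    x[k + 1] = A @ x[k] + q * rng.standard_normal(nx)
y = x @ C.T + r * rng.standard_normal((L + 1, 1))
xbar0 = x[0] + p * rng.standard_normal(nx)     # prior guess for x_{t-L}

def block(k):                 # selector matrix for x_k inside X = (x_0,...,x_L)
    E = np.zeros((nx, (L + 1) * nx)); E[:, k * nx:(k + 1) * nx] = np.eye(nx)
    return E

# Stack prior, dynamics, and measurement residuals as one weighted LS problem.
rows = [block(0) / p]; rhs = [xbar0 / p]
for k in range(L):
    rows.append((block(k + 1) - A @ block(k)) / q); rhs.append(np.zeros(nx))
for k in range(L + 1):
    rows.append((C @ block(k)) / r); rhs.append(y[k] / r)
Phi, zvec = np.vstack(rows), np.concatenate(rhs)
X_hat = np.linalg.lstsq(Phi, zvec, rcond=None)[0].reshape(L + 1, nx)

err = np.linalg.norm(X_hat[-1] - x[-1])
print(f"terminal estimation error {err:.4f}")
```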
XII.B.2. A WDRO robustification for the measurement residual term (Wasserstein-1 Lipschitz case)
To interface with the Wasserstein DRO building blocks of Section V, consider a residual vector \(r\) (e.g., forecast error or measurement innovation) and a convex loss \(\ell(r)\) used in estimation. Suppose we have empirical residual samples \(\hat r^1,\ldots,\hat r^N\) and build a Wasserstein-1 ball \(\mathcal{B}_{W_1}(\hat{\mathbb{P}}_N,\kappa)\) on r.
If \(\ell\) is L-Lipschitz in the norm underlying \(W_1\), then Corollary 5.2 yields the exact robustification
\[
\sup_{\mathbb{P}:W_1(\mathbb{P},\hat{\mathbb{P}}_N)\le \kappa}\ \mathbb{E}_{\mathbb{P}}[\ell(r)]
= \frac{1}{N}\sum_{i=1}^N \ell(\hat r^i) + L\kappa.
\]
Thus, replacing the sample-average residual penalty in MHE by its \(W_1\)-robust counterpart adds a deterministic offset \(L\kappa\) and leaves the minimizer unchanged. This observation is useful in two ways:
1) It justifies interpreting \(\kappa\) and L as explicit knobs controlling estimator robustness without changing convexity.
2) It clarifies a limitation: if the robustification does not change the estimator minimizer (because the loss is globally Lipschitz and only the expectation is robustified), then robustness must enter elsewhere (e.g., via robust constraints on admissible residuals, or via uncertainty sets used to form \(\mathcal{E}_t\)).
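The offset-and-unchanged-minimizer observation can be verified numerically for a 1-Lipschitz absolute-deviation loss, including attainment of the worst case by transporting every sample a distance \(\kappa\) away from the minimizer (synthetic residuals; all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
res = rng.standard_normal(200)         # empirical residual samples r^i
kappa, Lip = 0.3, 1.0                  # W1 radius; Lipschitz constant of ell

# ell(r; theta) = |r - theta| is 1-Lipschitz in r for every theta, so the
# exact W1 robustification equals the sample average plus Lip * kappa.
thetas = np.linspace(-2, 2, 401)
saa = np.array([np.mean(np.abs(res - t)) for t in thetas])
robust = saa + Lip * kappa

assert np.allclose(robust - saa, Lip * kappa)       # constant offset ...
assert saa.argmin() == robust.argmin()              # ... so same minimizer

# Attainment: moving each sample kappa away from theta has W1 cost kappa
# and increases the empirical loss by exactly kappa.
t = thetas[saa.argmin()]
shifted = res + np.sign(res - t) * kappa
assert np.isclose(np.mean(np.abs(shifted - t)), saa.min() + kappa)
print(f"minimizer theta ~ {t:.3f}; robust value shifted by {Lip * kappa}")
```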
XII.B.3. DR-MHE as set-valued estimation for tube MPC
A more direct coupling to Section VII is to construct a *set-valued* estimate (a confidence set) for the current state. Let \(\hat x_t\) be a point estimate from MHE, and define an error set \(\mathcal{E}_t\) such that
\[
x_t \in \hat x_t \oplus \mathcal{E}_t.
\]
Two auditable constructions are:
(i) Residual-conformal sets: build a prediction set \(\mathcal{R}_t\) for measurement residuals (as in Section VII.D) and propagate it through a linearized measurement map to obtain \(\mathcal{E}_t\).
(ii) Robust MHE feasibility sets: impose hard constraints \(y_k-h(x_k)\in\mathcal{V}_k\) and \(x_{k+1}-f(x_k,u_k)\in\mathcal{W}_k\) on bounded noise sets \(\mathcal{V}_k,\mathcal{W}_k\). Then the set of feasible terminal states forms an \(\mathcal{E}_t\)-type set.
Once \(\mathcal{E}_t\) is available, the tube MPC tightening in Section VII uses \(\mathcal{E}_0\) as the initial error bound and constructs \(\mathcal{E}_\tau\) forward. This provides a clear information-to-robustness pathway: sensors and estimators reduce \(\mathcal{E}_t\), which reduces tightenings \(\rho_\tau\), which enlarges the feasible set and reduces conservatism in multi-objective tradeoffs.
XII.B.4. Caution on probabilistic interpretation
Even if \(\mathcal{E}_t\) is derived from a marginal coverage statement (e.g., conformal), the induced tube constraints provide *deterministic* safety only on the event \(x_t\in \hat x_t\oplus \mathcal{E}_t\). Joint-in-time statements require additional assumptions (exchangeability, mixing, or anytime-valid calibration) and are not implied automatically.
Summary.
Section XII provided two information-centric extension modules: (i) Fisher-information-aware sensor placement with a verifiable submodularity condition for log-det objectives under additive information contributions, and (ii) an estimation interface via (distributionally robust) moving-horizon estimation that produces state-estimation error sets \(\mathcal{E}_t\) feeding directly into tube tightening and terminal invariance constructions. The next section consolidates the main theorem statements proved (or conditionally proved) in this paper and provides a verification checklist for assumptions.
XIII. Theorems, Proof Roadmap, and Verification Checklist
This section consolidates the mathematically precise statements established in Sections II–XII and clarifies which claims are unconditional, which are conditional on standard assumptions, and which are only provided as modeling interfaces. The purpose is to make it auditable what has actually been proved.
XIII.A. Theorems and propositions established in the body
For ease of verification, we group results by module. All numbering below refers to statements already proved in earlier sections.
1) Wasserstein DRO duality and risk surrogates.
– Theorem 5.1 provides a dual representation of the worst-case expectation over a Wasserstein ball centered at a discrete empirical measure, under upper semicontinuity and \(p\)-growth conditions on the loss and closedness of the support set \(\Xi\).
– Corollary 5.2 is an exact simplification for \(W_1\) when the loss is globally Lipschitz, using Kantorovich–Rubinstein duality.
– Section V.B records the conservative implication
\[
\mathrm{CVaR}_{1-\varepsilon}(g(\cdot,\xi))\le 0 \implies \mathbb{P}(g(\cdot,\xi)\le 0)\ge 1-\varepsilon,
\]
which is distribution-free. The DRO layer then robustifies the CVaR surrogate. This direction is safe; the reverse implication is not used.
2) Multi-objective scalarizations.
– Proposition 4.2 (weighted sums) and Proposition 4.4 (\(\varepsilon\)-constraint) give standard sufficient conditions for Pareto efficiency under convexity/uniqueness assumptions.
3) Conic physical modeling layers and conditional tightness.
– Proposition 6.1: pump head-gain inequalities become tight at optimum when pump head-gain variables enter the objective through strictly increasing penalties and do not appear elsewhere.
– Proposition 6.3: Forchheimer head-loss inequality \(\Delta h\ge aq+bq^2\) becomes tight at optimum when \(\Delta h\) is strictly penalized and appears only monotonically elsewhere.
– Proposition 6.5: box-robust blending reduces to a linear constraint under nonnegative flows, via monotonicity.
These results certify tightness only under explicit objective/constraint-separability assumptions; they do not assert global exactness of any hydraulic relaxation without those assumptions.
4) Closed-loop feasibility/stability templates.
– Theorem 7.4: recursive feasibility of a tube-based MPC construction, conditional on tube invariance (Assumption 7.2) and terminal invariance ingredients (Assumption 7.3).
– Proposition 7.8: an ISS-type bound is stated as a standard consequence of robust MPC Lyapunov arguments, conditional on stabilizability and a terminal decrease condition (Assumption 7.6), plus bounded disturbance in the error channel (Assumption 7.7). This is a template statement: to be fully rigorous for a specific instantiation, one must verify the relevant Lipschitz and decrease inequalities for the chosen (possibly nonsmooth) robustified stage cost.
– Proposition 7.12: monotonicity of WDRO feasibility in the Wasserstein radius (smaller radius implies weaker robustification), used to justify a safe mode for time-varying radii \(\kappa_t\).
5) Time-consistent multistage extensions.
– Proposition 8.3: contraction and fixed point for a discounted DRMDP Bellman operator under rectangular ambiguity, conditional on bounded stage cost and discount \(\gamma\in(0,1)\). This result is independent of Wasserstein specifics; it only requires that the inner worst-case expectation is well-defined.
6) Scalability certificates.
– Proposition 9.2: closed-form robustification of an RKHS test function over an MMD ball.
– Propositions 9.5 and 9.6: outer/inner inclusions between Wasserstein balls after coreset quantization using the triangle inequality. These inclusions are unconditional metric statements.
7) Alternative reliability layers.
– Theorem 10.1: a classical scenario-optimization bound is stated as a conditional theorem, explicitly listing convexity and support-rank assumptions that must be verified for any instantiated problem.
8) Sensing/estimation.
– Proposition 12.1: submodularity of \(\log\det\) D-optimality under additive information contributions and a positive definite prior.
XIII.B. Proof roadmap (how the pieces compose)
The composition logic used throughout the paper is modular and can be checked independently by module.
1) Conic modeling \(\Rightarrow\) convex MPC subproblem.
– Sections II and VI specify constraints and objectives that can be represented as LP/SOCP/power-cone/exponential-cone constraints (with optional integer variables). This ensures each scalarized MPC step (Section IV) is a convex (or MI-conic) program once the uncertainty enters through affine templates.
2) Wasserstein DRO duality \(\Rightarrow\) tractable robustified objectives/constraints.
– For any robustified expectation or hinge term, Theorem 5.1 reduces \(\sup_{\mathbb{P}\in\mathcal{B}_{W_p}}\mathbb{E}_{\mathbb{P}}[\cdot]\) to a finite-dimensional convex program parameterized by \(\lambda\ge 0\) and sample-wise inner suprema.
– For chance-type reliability constraints, we only use the safe surrogate path
\[
\text{chance} \;\Longleftarrow\; \mathrm{CVaR} \;\Longleftarrow\; \text{hinge expectation},
\]
so tractability rests on hinge-loss robustification rather than on any claim of exact reformulation of indicators.
3) Receding horizon + tube/terminal ingredients \(\Rightarrow\) recursive feasibility.
– The WDRO layer is applied to the nominal plan (with optional Lipschitz margins for tube deviations). The tube/terminal layer is deterministic and yields Theorem 7.4: if the disturbance affecting the error channel stays within the set used to construct the tube, feasibility propagates.
4) Nonstationarity adaptation via \(\kappa_t\).
– Conformal calibration (Section VII.D) provides a marginal per-step coverage statement under exchangeability assumptions. Proposition 7.12 is then used to argue that designing for a conservative radius \(\bar\kappa\) and shrinking \(\kappa_t\le \bar\kappa\) preserves feasibility of WDRO constraints (subject to the deterministic tube assumptions).
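To make the composition in step 2 concrete, here is a minimal numeric sketch of the \(W_1\)/Lipschitz route (in the style of Corollary 5.2): for a globally \(L\)-Lipschitz hinge surrogate with unbounded support, the worst-case expectation over a \(W_1\) ball of radius \(\kappa\) is the empirical hinge mean plus \(L\kappa\). The affine loss and sample values below are hypothetical illustrations, not data from the paper.

```python
# Worst-case expectation of an L-Lipschitz loss over a W1 ball of radius kappa
# around the empirical measure equals the empirical mean plus L * kappa
# (W1 duality with a globally Lipschitz integrand, unbounded support).
# The loss here is a hinge of an affine function, hinge(a*xi + b),
# which is |a|-Lipschitz in xi.

def robust_hinge_w1(samples, a, b, kappa):
    hinge_mean = sum(max(a * xi + b, 0.0) for xi in samples) / len(samples)
    return hinge_mean + abs(a) * kappa  # W1-robust upper value

samples = [0.1, 0.4, -0.2, 0.3]         # hypothetical residual samples
a, b, kappa = 2.0, -0.1, 0.05
bound = robust_hinge_w1(samples, a, b, kappa)

# sanity check: shifting all samples by |c| <= kappa gives a distribution at
# W1 distance |c| from the empirical measure, so its hinge mean must not
# exceed the robust value
shifted = [xi + kappa for xi in samples]
shifted_mean = sum(max(a * xi + b, 0.0) for xi in shifted) / len(shifted)
assert shifted_mean <= bound + 1e-12
```

This is exactly the hinge-expectation robustification on which the chance \(\Longleftarrow\) CVaR \(\Longleftarrow\) hinge surrogate path rests; no indicator function is ever reformulated.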
XIII.C. Verification checklist for an instantiated IWS deployment
The following checklist enumerates assumptions that must be checked (or replaced by conservative alternatives) before any claim is interpreted as a real-world guarantee.
1) Duality and finiteness in WDRO blocks (Section V).
– Check that the loss/constraint surrogate \(\ell\) is upper semicontinuous and has at most \(p\)-growth on \(\Xi\), and that the inner suprema in Theorem 5.1 are finite.
– If using \(W_1\) Lipschitz simplifications (Corollary 5.2), verify global Lipschitzness in the chosen norm; local Lipschitzness is insufficient.
2) Convexity/conic representability of the physical layer (Section VI).
– Verify every relaxed physical relation used in the optimizer is convex and encoded in a closed cone/convex set.
– If interpreting any inequality relaxation as physically accurate (tight), verify the tightness conditions in Propositions 6.1 and 6.3 (objective monotonicity and separability).
3) Tube invariance and terminal invariance (Section VII).
– Construct and certify tube sets \(\mathcal{E}_\tau\) satisfying the forward recursion for the adopted disturbance envelope \(\mathcal{D}_t\) (or a uniform \(\mathcal{D}\)).
– Construct a terminal set \(\mathcal{X}_f\) and terminal controller \(\kappa_f\) satisfying Assumption 7.3.
4) Probabilistic interpretation across time.
– If using conformal or scenario certificates, decide whether the intended guarantee is per-step (marginal) or over a finite horizon \(T\) (joint). For joint statements, explicitly allocate risk via union bounds or adopt an anytime-valid method; do not implicitly treat marginal certificates as joint.
5) Mixed-integer components.
– If on/off decisions are present, Pareto convexity claims (Proposition 4.2) no longer apply globally, and scenario/DRO duality may require additional care (e.g., strong duality is not directly applicable to nonconvex feasible sets). Treat such problems as MI-conic heuristics unless a separate argument is supplied.
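The marginal-versus-joint distinction in item 4 is easy to quantify. A minimal computation (with illustrative numbers for \(\varepsilon\) and \(T\)) shows how quickly per-step certificates erode over a horizon, and how a Bonferroni split restores a joint budget:

```python
# Per-step (marginal) violation budget eps does NOT give a joint guarantee:
# even under independence, the horizon-wise violation probability compounds.
eps, T = 0.05, 24

joint_violation_if_indep = 1 - (1 - eps) ** T   # ~0.71 for eps=0.05, T=24

# Bonferroni / union-bound allocation: give each step eps / T so that the
# joint violation probability is at most eps regardless of dependence.
eps_per_step = eps / T
joint_union_bound = T * eps_per_step            # == eps (up to rounding)
```

The price of the union bound is conservatism per step (\(\varepsilon/T\) instead of \(\varepsilon\)), which is exactly the tradeoff the checklist asks deployments to make explicit.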
Summary.
Section XIII consolidated the exact mathematical statements established in the body and provided a verification checklist that separates unconditional metric/convex-analysis facts from conditional control-theoretic and statistical assumptions.
XIV. Algorithmic Summary, Tuning, and Implementation Details
This section summarizes an end-to-end implementation of the multi-objective WDRO-MPC pipeline described in Sections II–XII. The emphasis is on explicitly identifying (i) optimization variables, (ii) ambiguity/risk hyperparameters, and (iii) solver classes compatible with the conic modeling layers.
XIV.A. End-to-end CC-WDRO-MPC loop (receding-horizon controller)
At each time step \(t\):
1) State estimation and forecast ingestion.
– Obtain a state estimate \(\hat x_t\) (and, if using output feedback, an estimation-error set \(\mathcal{E}_t\)) using MHE/filters as in Section XII.B.
– Obtain a forecast object for disturbances, such as point forecasts \(\hat w_{t+\tau|t}\) and/or ensembles/scenarios \(\{\hat w_{t+\tau|t}^i\}_{i=1}^N\) for \(\tau=0,\ldots,H-1\).
2) Construct the empirical center distribution.
Two common, auditable choices are:
– Stagewise: for each \(\tau\), set \(\hat{\mathbb{P}}_{t+\tau,N}=\tfrac{1}{N}\sum_{i=1}^N\delta_{\hat w_{t+\tau|t}^i}\).
– Trajectory-wise: form \(\hat{\mathbf{w}}_t^i=(\hat w_{t|t}^i,\ldots,\hat w_{t+H-1|t}^i)\) and set \(\hat{\mathbb{P}}^{\mathrm{traj}}_{t,N}=\tfrac{1}{N}\sum_{i=1}^N\delta_{\hat{\mathbf{w}}_t^i}\).
3) Set the Wasserstein radius \(\kappa_t\) (fixed or drifting).
– Fixed-radius mode: choose \(\kappa\) offline (e.g., by backtesting reliability vs conservatism).
– Drifting-radius mode (Section VII.D): compute residuals \(r_t=w_t-\hat w_{t|t-1}\), update a conformal scale \(\rho_t\), and set \(\kappa_t=c_\kappa\rho_t\). Under drift this is an auditable tuning rule, not an unconditional statistical guarantee.
4) Choose a multi-objective scalarization.
– Weighted sum: choose weights \(\lambda\in\mathbb{R}^m_+\).
– \(\varepsilon\)-constraints: choose budgets (e.g., emissions cap, risk budgets).
5) Solve the WDRO-MPC subproblem.
– Decision variables: \(\mathbf{x},\mathbf{u}\) (and \(\delta\) if mixed-integer), plus conic auxiliaries for hydraulics/quality and risk epigraph variables \((\eta,\tau,\lambda)\) in dual forms.
– Constraints: dynamics (Section II.A), conic physics (Section VI), and WDRO risk constraints via robust CVaR/hinge surrogates (Section V).
– If using tube MPC, enforce tightened constraints and terminal conditions (Section VII.A–VII.B).
6) Apply control and roll horizon.
– Implement \(u_t=u_{t|t}^\star\) (or \(u_t=\hat u_{t|t}^\star + K e_t\) in a tube policy).
– Advance to \(t+1\) and repeat.
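The six steps above can be sketched as a compact loop on a toy scalar storage system. Everything below is a hypothetical stand-in: the certainty-equivalent refill rule replaces the conic WDRO solve of step 5, and the quantile update is a simple instance of the auditable drifting-radius rule of step 3.

```python
import random, statistics

random.seed(0)

def drifting_radius(residuals, c_kappa=1.0, warmup=0.1):
    # step 3 stand-in: conformal-style scale rho_t = ~90th percentile of
    # recent absolute residuals, kappa_t = c_kappa * rho_t
    if len(residuals) < 10:
        return warmup
    abs_res = sorted(abs(r) for r in residuals[-50:])
    rho = statistics.quantiles(abs_res, n=10)[8]
    return c_kappa * rho

def mpc_step(x, w_hat, kappa, target=0.5, u_max=0.2):
    # step 5 placeholder: certainty-equivalent refill with a robustness
    # margin kappa (NOT the paper's conic WDRO-MPC subproblem)
    u = target - x + w_hat + kappa
    return max(-u_max, min(u_max, u))

x, w_hat, residuals, traj = 0.5, 0.1, [], []
for t in range(50):
    kappa = drifting_radius(residuals)     # step 3
    u = mpc_step(x, w_hat, kappa)          # steps 4-5
    w = 0.1 + random.gauss(0.0, 0.02)      # realized disturbance
    residuals.append(w - w_hat)            # calibration data for step 3
    x = x + u - w                          # step 6: apply and roll horizon
    traj.append(x)
```

The skeleton makes the data flow auditable: forecasts and residuals feed the radius, the radius feeds the (here trivialized) subproblem, and only the first control is applied each step.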
XIV.B. Practical tuning knobs and their mathematical meaning
1) Wasserstein geometry.
– Choose \(p\in\{1,2\}\) and a ground norm \(\|\cdot\|\). The dual norm \(\|\cdot\|_*\) enters directly in sensitivity bounds (Section V.B.3) and in Lipschitz penalties (Corollary 5.2).
– Scale features (units) in \(w\) before computing \(W_p\) so that the ball does not overweight large-magnitude components.
2) Risk parameters.
– CVaR confidence level \(\alpha\) (tail level) and chance target \(\varepsilon\) (violation probability). In the conservative surrogate path, typical constraints take the form
\[
\sup_{\mathbb{P}\in\mathcal{B}}\mathrm{CVaR}^{\mathbb{P}}_{1-\varepsilon}(g(z,\xi))\le 0.
\]
– Risk allocation \(\varepsilon_j\) across multiple constraints using Boole’s inequality (Section V.B.2).
3) Horizon and terminal ingredients.
– Horizon \(H\): larger \(H\) can improve performance but increases decision dimension and (trajectory-wise) uncertainty dimension.
– Terminal set \(\mathcal{X}_f\) and terminal controller \(\kappa_f\): required for recursive feasibility under tube constructions.
4) Coresets and scalability.
– If reducing sample size from \(N\) to \(M\ll N\), compute/upper bound \(\delta=W_p(\hat{\mathbb{P}}_N,\hat{\mathbb{P}}_M)\) and inflate/deflate radii using Propositions 9.5–9.6 to obtain conservative/inner approximations.
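A numeric sketch of the bracketing in item 4, using the exact sorted-sample formula for 1-D \(W_1\) between equal-size uniform empirical measures (the data and coreset below are hypothetical; for \(M < N\) the coreset points are replicated to equalize sizes):

```python
def w1_uniform(xs, ys):
    # exact 1-D W1 between uniform empirical measures of equal size:
    # the optimal coupling matches sorted samples
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

full = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]        # N = 8 samples
coreset = [0.15, 1.15]                                  # M = 2 centroids
coreset_rep = [c for c in coreset for _ in range(4)]    # replicate to size N

delta = w1_uniform(full, coreset_rep)   # quantization error W1(P_N, P_M)
kappa_N = 0.5
kappa_outer = kappa_N + delta               # conservative radius (Prop. 9.5 style)
kappa_inner = max(kappa_N - delta, 0.0)     # inner radius (Prop. 9.6 style)

# triangle inequality: any probe distribution's distances to the two
# centers differ by at most delta
probe = [0.5] * 8
assert abs(w1_uniform(probe, full) - w1_uniform(probe, coreset_rep)) <= delta + 1e-12
```

The inflated radius \(\kappa_N + \delta\) around the coreset contains the original ball, and the deflated radius is contained in it, which is what produces the conservative/inner performance bracket.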
XIV.C. Solver and modeling stack (conic compatibility)
The instantiated optimization problems fall into the following solver classes.
– LP: linearized mass balance, deterministic blending constraints, piecewise-linear costs.
– SOCP / rotated SOCP: quadratic head-loss surrogates, Forchheimer terms, second-moment certificates (Section X.B), and many WDRO dual inner problems when \(p=2\) and the ground norm is \(\|\cdot\|_2\).
– Power cone: rational power head-loss epigraphs \(q^\gamma\) and certain convex monomial constraints.
– Exponential cone: KL-DRO / log-sum-exp robustification (Section X.D) and entropic-risk layers.
– SDP: SOS certificates (Section VI.D) and some kernelized approximations if lifted to finite-dimensional semidefinite constraints.
– MI-conic: on/off assets (desalination/treatment/pumps) and logical constraints.
Warm-starting.
– Across time: reuse the previous MPC solution as an initial point.
– Across Pareto sweeps: use continuation in \(\lambda\) or budgets \(\varepsilon\).
Numerical conditioning.
– Use physically meaningful scaling (e.g., normalize storages by capacities, powers by \(\overline P^{\mathrm{grid}}\)). Poor scaling can cause dual infeasibility artifacts in conic solvers.
Summary.
Section XIV provided an explicit end-to-end algorithmic loop and identified the principal tuning knobs and solver classes implied by the conic WDRO-MPC modeling pipeline.
XV. Numerical Experiments and Case Studies (protocol for falsification and stress-testing)
This paper has developed a mathematical framework and tractable reformulation templates; it has not established empirical superiority on any benchmark in the body. This section therefore specifies a falsifiable evaluation protocol: what should be simulated, what metrics should be reported, and what ablations separate the effects of WDRO, tube tightening, and alternative reliability layers.
XV.A. Benchmark setups for renewable-powered IWS
We recommend reporting at least three levels of network complexity.
1) Small radial system.
– A tree-structured distribution network with a small number of storage nodes, one desalination plant, one renewable source, and one critical pressure node.
– Purpose: isolate the hydraulics relaxation and quality mixing constraints in a setting where conic modeling is easiest to audit.
2) Medium weakly meshed system.
– Add a few loops (mesh) and multiple pumping stations, plus an aquifer banking module (recharge/extraction).
– Purpose: test whether inequality-relaxed hydraulics introduce unacceptable slack (Section VI.A.3), and assess computational scaling.
3) Aggregated multi-district system.
– Multiple demand districts with shared aquifer constraint(s) and shared renewable resource.
– Purpose: test decomposition/coordination interfaces (Section XI) and high-dimensional uncertainty components (Section IX).
Disturbance generation.
– Use historical or synthetic time series for demand, inflow, and renewables.
– Explicitly include nonstationarity (seasonal trend, regime switches) to test drifting \(\kappa_t\) vs fixed \(\kappa\).
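A minimal synthetic generator matching these recommendations (all constants hypothetical): a seasonal base, an abrupt regime switch midway, and Gaussian noise. This is the kind of series on which drifting \(\kappa_t\) should be stress-tested against a fixed \(\kappa\).

```python
import math, random

random.seed(1)

def demand(t, switch_at=180):
    # seasonal trend + abrupt regime switch + noise (hypothetical scales)
    seasonal = 1.0 + 0.2 * math.sin(2.0 * math.pi * t / 365.0)
    regime = 0.3 if t >= switch_at else 0.0
    return seasonal + regime + random.gauss(0.0, 0.05)

series = [demand(t) for t in range(365)]

# the post-switch segment has a shifted mean: a fixed kappa calibrated on the
# pre-switch segment would be miscalibrated here
pre = sum(series[:180]) / 180
post = sum(series[180:]) / 185
```

Holding the generator fixed across controllers makes the fixed-versus-drifting-radius ablation in Section XV.C reproducible.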
XV.B. Evaluation protocol and metrics
Report closed-loop results under out-of-sample disturbance trajectories (not used in fitting/tuning) using the following metrics.
1) Reliability / constraint violations.
– Empirical violation probability per constraint class (shortage, pressure, ecosystem, quality).
– Empirical CVaR of violation magnitudes, e.g.
\(\widehat{\mathrm{CVaR}}_{1-\varepsilon}(\mathrm{short}_t)\) and analogous metrics for pressure deficits.
– Joint-violation frequency over horizons \(H\) and over long runs \(T\), reported separately from marginal per-step frequencies.
2) Performance.
– Average operational cost and emissions.
– Pareto-front plots (cost vs emissions) at fixed risk budgets, and (cost vs risk) at fixed emissions caps.
3) Robustness and conservatism diagnostics.
– Feasibility rate of MPC subproblems over time.
– Shadow prices / dual variables on risk constraints, interpreted as marginal value of reliability.
– Slack/tightness statistics for inequality-relaxed physics constraints (e.g., how often head-loss inequalities bind).
4) Computational scaling.
– Runtime per MPC step, including breakdown by conic constraint class.
– Scaling in \(N\) (samples), \(H\) (horizon), and \(d\) (uncertainty dimension).
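The reliability metrics in item 1 reduce to simple plug-in estimators. A sketch with hypothetical shortage values:

```python
import math

def violation_rate(losses):
    # empirical violation probability: fraction of strictly positive losses
    return sum(1 for v in losses if v > 0) / len(losses)

def empirical_cvar(losses, eps):
    # plug-in CVaR_{1-eps}: mean of the worst ceil(eps * n) losses
    k = max(1, math.ceil(eps * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

shortages = [0.0, 0.0, 0.2, 0.0, 1.0, 0.0, 0.0, 0.4, 0.0, 0.0]
rate = violation_rate(shortages)        # 3 positive out of 10
cvar = empirical_cvar(shortages, 0.2)   # mean of worst 2 = (1.0 + 0.4) / 2
```

For the joint-versus-marginal reporting requirement, the same estimators should be applied both per step and to horizon-wise maxima, and reported separately.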
XV.C. Ablations comparing robustness layers
To isolate the impact of each robustness component, compare at least:
1) Nominal MPC (no robustness) vs tube MPC (deterministic robustness only) vs WDRO-MPC (distributional robustness only) vs combined tube + WDRO.
2) Wasserstein WDRO vs scenario MPC (Section X.A) vs moment-based SOC margins (Section X.B) vs KL-DRO (Section X.D), using identical physical constraints.
3) Fixed \(\kappa\) vs drifting \(\kappa_t\) with the same maximum radius \(\bar\kappa\), to distinguish performance improvement from feasibility risk.
4) Full-sample WDRO (\(N\)) vs coreset WDRO (\(M\ll N\)) with radius inflation/deflation (Section IX.C), reporting the empirical performance bracket.
XV.D. Reporting standards for auditable conclusions
Because the framework combines deterministic, distributionally robust, and data-driven calibration layers, it is important to report:
– the exact ambiguity definition (stagewise vs trajectory-wise),
– the chosen ground norm and scaling,
– the calibration/tuning procedure for \(\kappa\), \(\varepsilon\), and \(\alpha\), and
– whether any claimed reliability statement is marginal-in-time, horizon-wise, or long-run (and what accounting method was used).
Summary.
Section XV specified a falsifiable experimental protocol and ablation plan suitable for evaluating the proposed multi-objective WDRO-MPC framework without conflating marginal vs joint reliability interpretations.
XVI. Optional Alternative Modeling and Stability Perspectives (non-essential extensions)
This section records three optional perspectives that can be layered on top of the main WDRO-MPC pipeline. These are included as interfaces; they are not required for the correctness of Sections II–XV.
XVI.A. Koopman lifting for nonlinear surrogates in DRO-MPC (interface)
When the physical layer is strongly nonlinear (hydraulics with changing flow directions, nonlinear quality transport), one approach is to learn a lifted linear predictor.
Template.
– Choose observable functions \(\psi(x)\in\mathbb{R}^{n_\psi}\).
– Fit a linear predictor
\[
\psi(x_{t+1}) \approx A_K\psi(x_t) + B_K u_t + E_K w_t
\]
by regression on simulation/measurement data.
– Use \(\psi(x)\) as the MPC state (or augment \(x\) with \(\psi(x)\)) so that the resulting dynamics are affine in \((\psi,u,w)\), enabling the affine-in-uncertainty WDRO templates of Section V.B.3.
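A least-squares fit of \((A_K, B_K, E_K)\) on a toy scalar system illustrates the template. The system, observables, and data sizes below are hypothetical; the point is only the regression step.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(x):
    # hypothetical observables for a scalar state
    return np.array([x, np.sin(x)])

# toy nonlinear system x+ = 0.8*sin(x) + u + w (trajectories stay bounded)
X, U, W, Y = [], [], [], []
x = 0.3
for _ in range(200):
    u = rng.uniform(-0.1, 0.1)
    w = rng.normal(0.0, 0.01)
    x_next = 0.8 * np.sin(x) + u + w
    X.append(psi(x)); U.append([u]); W.append([w]); Y.append(psi(x_next))
    x = x_next

# regress psi(x+) on [psi(x), u, w] to recover [A_K | B_K | E_K]
Z = np.hstack([np.array(X), np.array(U), np.array(W)])     # (200, 4)
Theta, *_ = np.linalg.lstsq(Z, np.array(Y), rcond=None)    # (4, 2)
A_K, B_K, E_K = Theta[:2].T, Theta[2:3].T, Theta[3:4].T

# the first output coordinate is exactly linear in (sin x, u, w), so the
# fitted predictor should reproduce it almost perfectly on the data
pred = Z @ Theta
```

The fitted \((A_K, B_K, E_K)\) then play the role of the affine dynamics in the WDRO templates of Section V.B.3, with all guarantees conditional on the mismatch caveat below.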
Caution.
– Any guarantee becomes conditional on model mismatch bounds between the true nonlinear dynamics and the learned lifted dynamics. Tube MPC can be used to absorb mismatch into \(\mathcal{D}_t\) if a deterministic envelope is available.
XVI.B. Port-Hamiltonian / passivity viewpoint for water-energy coupling (interface)
For systems with explicit energy storage and dissipation (tanks as potential energy, batteries as electrical energy, pumps as actuators), one can attempt to construct a storage function \(S(x)\) and show a dissipation inequality of the form
\[
S(x_{t+1})-S(x_t) \le u_t^\top y_t - \sigma(x_t)
\]
for some supply rate \(u^\top y\) and dissipation \(\sigma\ge 0\). If such a structure can be established (even approximately), it provides an alternative route to terminal costs/terminal sets in MPC via energy-based Lyapunov functions.
Caution.
– Establishing a passivity inequality for an IWS requires careful definition of ports and may not be compatible with aggressive convex relaxations. This section therefore provides only an organizational viewpoint.
XVI.C. Online convex optimization (OCO) viewpoint for time-varying objectives (interface)
In some settings, objectives (prices, carbon intensity, penalties) vary online and are only revealed at each time. One may view the MPC decision \(u_t\) as an online decision and evaluate regret relative to a comparator policy.
A minimal interface is:
– represent the stage objective as a convex function \(\ell_t(u)\) (possibly including an expected/robustified term), and
– apply an OCO update (mirror descent, proximal gradient) to track minimizers.
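A minimal instance of this interface: projected online gradient descent on a box feasible set, with stage costs revealed one at a time. The quadratic-plus-price cost form and all constants are hypothetical.

```python
import math

def project_box(u, lo=0.0, hi=1.0):
    # Euclidean projection onto the feasible set U_t (a fixed box here;
    # in the hybrid scheme this set would come from the WDRO/tube layer)
    return max(lo, min(hi, u))

u = 0.5
prices = [1.0, -0.5, 0.8, -1.2, 0.3]    # revealed online, one per step
iterates = []
for t, p in enumerate(prices, start=1):
    # stage cost ell_t(u) = (u - 0.4)^2 + p * u, revealed at step t
    grad = 2.0 * (u - 0.4) + p
    u = project_box(u - (0.1 / math.sqrt(t)) * grad)
    iterates.append(u)
```

Because every iterate is projected onto the feasible set, feasibility is inherited from whatever layer defines the box, matching the safe hybrid described below the interface.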
Relationship to WDRO-MPC.
– OCO provides performance-style guarantees (regret), whereas WDRO/tube MPC targets feasibility/reliability.
– A safe hybrid is to use WDRO/tube MPC to define a time-varying feasible set \(\mathcal{U}_t\) and use OCO to choose among feasible actions to reduce cost/emissions.
Summary.
Section XVI recorded optional extension perspectives (Koopman lifting, passivity/energy-based certificates, and OCO-style performance analysis) that can complement, but do not replace, the feasibility and robustness certificates developed in the core WDRO-MPC framework.
Conclusion
This paper developed a modular mathematical framework for multi-objective distributionally robust model predictive control (MPC) of renewable-powered integrated water systems (IWS), with an emphasis on (i) explicit uncertainty modeling via Wasserstein ambiguity sets, (ii) tractable conic representations of key physical layers, and (iii) closed-loop architectures that separate distributional robustness for service constraints from deterministic robustness for model/estimation error.
On the optimization side, we formalized multi-objective receding-horizon control using standard scalarizations (weighted sums and \(\varepsilon\)-constraints) and stated verifiable convexity-based conditions under which solutions are Pareto efficient (Propositions 4.2 and 4.4). For Wasserstein DRO, we recorded the dual representation of worst-case expectations over Wasserstein balls centered at empirical measures (Theorem 5.1) and the exact simplification for \(W_1\) under global Lipschitz losses (Corollary 5.2). For reliability constraints, we consistently used a conservative but distribution-free path from chance constraints to CVaR constraints and hinge-loss expectations, emphasizing that this direction is safe while the reverse direction is not required. These ingredients provide a generic route to LP/SOCP/power-cone/exponential-cone formulations whenever the uncertain dependence enters through affine or conically representable templates.
On the modeling side, we provided conic-compatible physical layers for hydraulics, groundwater head loss, and water quality. The hydraulics layer uses convex epigraph/inequality forms for head-loss constraints and convex envelopes for pump power; we proved conditional tightness of pump head-gain inequalities when head-gain variables are strictly penalized and appear only through those inequalities (Proposition 6.1). For Forchheimer-type quadratic head loss, we gave an SOCP epigraph formulation and a corresponding conditional tightness result under monotone penalties (Proposition 6.3). For quality, we showed that robust blending with interval-bounded concentrations reduces to a single linear inequality under nonnegative flows (Proposition 6.5), and we described how the same affine structure admits Wasserstein DRO wrapping via the Section V building blocks.
On the closed-loop control side, we described a tube-based DR-MPC architecture in which WDRO constraints are enforced on the nominal plan, while deterministic tube tightening addresses bounded disturbances in an error channel. Under standard tube invariance and terminal invariance assumptions, the usual shift argument yields conditional recursive feasibility (Theorem 7.4). We also stated an ISS-type bound as a template consequence of robust MPC Lyapunov arguments under stabilizability, terminal decrease, and bounded error disturbances (Proposition 7.8), explicitly noting where nonsmooth robustified costs require additional instantiation-specific verification. For nonstationarity, we proposed a conformal-calibrated mechanism to adapt disturbance envelopes and a drifting Wasserstein radius \(\kappa_t\), while stressing that conformal coverage is typically marginal-in-time and that joint/horizon-wise guarantees require separate accounting. A particularly safe operational mode is to design feasibility for a conservative radius \(\bar\kappa\) and only shrink \(\kappa_t\), using monotonicity of WDRO feasibility in the radius (Proposition 7.12).
Beyond standard open-loop MPC, we clarified how time-consistent multistage robustness can be obtained through rectangular ambiguity in a DRMDP formulation. In the discounted case, the robust Bellman operator is a contraction and admits a unique fixed point with convergent value iteration (Proposition 8.3), while tractability remains conditional on the chosen state representation and on the ability to compute worst-case conditional expectations.
To address computational scaling in high-dimensional uncertainty and large-sample regimes, we recorded three complementary approximation layers: (i) MMD-based ambiguity, which yields a closed-form robust expectation offset for RKHS functions (Proposition 9.2); (ii) sliced/Max-sliced Wasserstein as projection-based approximations (presented as interfaces, with explicit cautions about what directions are protected); and (iii) OT quantization/coresets, together with rigorous Wasserstein-ball inclusion relations enabling radius inflation/deflation bracketing (Propositions 9.5 and 9.6). We also positioned scenario optimization and moment/KL-based alternatives as comparative reliability certificates, stating a classical scenario bound only in explicitly conditional form (Theorem 10.1).
The overarching mathematical takeaway is that a tractable multi-objective WDRO-MPC for IWS can be assembled by: (a) expressing the physical feasibility layer in convex conic form (possibly as a conservative relaxation), (b) wrapping uncertainty through Wasserstein-robust expectations and robust CVaR/hinge constraints with dual representations, and (c) enforcing closed-loop feasibility through deterministic tube/terminal ingredients whose assumptions are transparent and checkable. Throughout, we distinguished unconditional convex-analytic/metric statements from conditional control-theoretic and statistical statements, and we provided an explicit verification checklist (Section XIII) to prevent accidental over-interpretation of modeling interfaces as guarantees.
Several limitations and open directions follow directly from this separation of modules. First, any claim of physical exactness for relaxed hydraulics requires verifying the tightness conditions (or adopting tighter relaxations) in the operating regime. Second, translating per-step (marginal) reliability certificates into horizon-wise or long-run closed-loop statements demands explicit dependence modeling and risk accounting; neither conformal calibration nor per-step DRO constraints automatically provide such joint guarantees. Third, mixed-integer actuation breaks global convexity and can invalidate some Pareto-efficiency and duality-based arguments unless handled with separate nonconvex analysis. Finally, genuinely time-consistent multistage Wasserstein control requires conditional/rectangular or nested constructions; trajectory-wise Wasserstein balls are best interpreted as open-loop robustness models inside a receding-horizon loop unless additional causal formalism is supplied.
In summary, the paper provides a coherent, auditable mathematical blueprint for combining (i) multi-objective MPC, (ii) Wasserstein distributional robustness, and (iii) conic physical surrogates for renewable-powered IWS operation, together with clear conditions under which recursive feasibility and stability-style properties can be asserted and clear flags where further, application-specific verification is required.
[HARD CODED END-OF-PAPER MARK — ALL CONTENT SHOULD BE ABOVE THIS LINE]
================================================================================
MODEL CREDITS
This autonomous solution attempt was generated with the Intrafere LLC AI Harness,
MOTO, and the following model(s):
– x-ai/grok-4.1-fast (88 API calls)
– openai/gpt-5.2 (48 API calls)
– moonshotai/kimi-k2.5 (12 API calls)
Total AI Model API Calls: 148
================================================================================
MIT License
Copyright (c) 2026 Intrafere LLC
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the “Software”), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
—
DISCLAIMER FOR AI-GENERATED CONTENT:
The MOTO software generates autonomous AI solutions that have not been peer-reviewed; the harness and the user-selected AI are directed to “seek novelty”.
Papers and content generated by this system are created without direct human
oversight beyond initial prompts. All generated content should be viewed with
extreme scrutiny and independently verified before use in any critical context.
The authors and contributors make no warranties about the accuracy, completeness,
or validity of AI-generated mathematical content, proofs, or claims.