arXiv Cluster Highlights

• Momentum Further Constrains Sharpness at the Edge of Stochastic Stability: Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability.

• Multistage Conditional Compositional Optimization: We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions.

• Stochastic Trust-Region Methods for Over-parameterized Models: Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems.

• Understanding the Variance Dichotomy in Continuous Simulation Optimization: A Minimax Lower Bound Perspective: This paper studies the variance dichotomy in continuous simulation optimization (CSO). Existing literature shows a sharp contrast between deterministic CSO and stochastic CSO, with convergence rates in stochastic settings appearing insensitive to the magnitude of the noise variance.

• Gradient Descent's Last Iterate is Often (slightly) Suboptimal: We consider the well-studied setting of minimizing a convex Lipschitz function using either gradient descent (GD) or its stochastic variant (SGD), and examine the last iterate convergence. By now, it is known that standard stepsize choices lead to a last iterate convergence rate of $\log T/\sqrt{T}$ after $T$ steps.

• Broximal Alignment for Global Non-Convex Optimization: Most non-convex optimization theory is built around gradient dynamics, leaving global convergence largely unexplored. The dominant paradigm focuses on stationarity, certifying only that the gradient norm vanishes, which is often a weak proxy for actual optimization success.
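For readers unfamiliar with the last-iterate setting in the gradient descent highlight above, here is a minimal Python sketch. It is not taken from the paper: the objective f(x) = |x|, the function name `subgradient_descent`, its parameters, and the step size R / (G * sqrt(t)) are all assumptions chosen for illustration. It runs subgradient descent on a one-dimensional convex, 1-Lipschitz function and returns both the last iterate and the running average of the iterates, the two quantities that last-iterate analyses typically compare.

```python
import math


def f(x):
    """Toy convex, 1-Lipschitz objective (an assumption, not from the paper)."""
    return abs(x)


def subgradient_descent(x0, steps, lipschitz=1.0, radius=1.0):
    """Run subgradient descent on f with step size radius / (lipschitz * sqrt(t)).

    Returns the last iterate and the average of all iterates x_1..x_T.
    """
    x = x0
    running_sum = 0.0
    for t in range(1, steps + 1):
        # A subgradient of |x|: sign(x), with 0 chosen at x == 0.
        g = (x > 0) - (x < 0)
        x = x - radius / (lipschitz * math.sqrt(t)) * g
        running_sum += x
    return x, running_sum / steps


last, avg = subgradient_descent(x0=0.9, steps=10_000)
print("f(last iterate)    =", f(last))
print("f(average iterate) =", f(avg))
```

On this toy problem the last iterate keeps oscillating around the minimizer with an amplitude bounded by the current step size, while the averaged iterate smooths the oscillation out. The example only illustrates the setup; it does not reproduce the $\log T/\sqrt{T}$ rate quoted in the highlight.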

© 2026 AI News Online