arXiv Cluster Highlights

• Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
  Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability.

• Multistage Conditional Compositional Optimization
  We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions.

• Stochastic Trust-Region Methods for Over-parameterized Models
  Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems.

• Understanding the Variance Dichotomy in Continuous Simulation Optimization: A Minimax Lower Bound Perspective
  This paper studies the variance dichotomy in continuous simulation optimization (CSO). Existing literature shows a sharp contrast between deterministic CSO and stochastic CSO, with convergence rates in stochastic settings appearing insensitive to the magnitude of the noise variance.

• Gradient Descent's Last Iterate is Often (slightly) Suboptimal
  We consider the well-studied setting of minimizing a convex Lipschitz function using either gradient descent (GD) or its stochastic variant (SGD), and examine the last iterate convergence. By now, it is known that standard stepsize choices lead to a last iterate convergence rate of $\log T/\sqrt{T}$ after $T$ steps.

• Broximal Alignment for Global Non-Convex Optimization
  Most non-convex optimization theory is built around gradient dynamics, leaving global convergence largely unexplored. The dominant paradigm focuses on stationarity, certifying only that the gradient norm vanishes, which is often a weak proxy for actual optimization success.
