Research
I work on making second-order optimization practical for large-scale machine learning.
My research develops structured Gauss-Newton approximations, matrix-free solvers, and
JAX-native training methods that reduce the compute and memory costs of curvature-aware
optimization while preserving useful second-order information.
Second-Order, First-Class: A Composable Stack for Curvature-Aware Training
Mikalai Korbit, Mario Zanon
Under Review
code
/
arXiv
We introduce Somax, a systems framework for curvature-aware training that treats a
second-order update as a planned JIT-compiled step. It exposes curvature operators, solvers,
preconditioners, damping policies, and telemetry as composable modules, and shows that
planning and module choices materially affect overhead and time-to-accuracy.
Exact Gauss-Newton Optimization for Training Deep Neural Networks
Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon
Neurocomputing, 2025
code
/
arXiv
We present EGN, a stochastic second-order optimizer that computes the Gauss-Newton
direction by solving an exact low-rank system in mini-batch space, making second-order
updates practical when the parameter dimension is much larger than the batch size.
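The batch-space trick behind this kind of update can be illustrated with a standard linear-algebra identity. The sketch below (plain NumPy, not the EGN implementation; all variable names are illustrative) shows that for a damped least-squares Gauss-Newton step, solving a small system of batch size b is equivalent to solving the full system of parameter dimension n:

```python
import numpy as np

# Illustrative sketch of a Gauss-Newton direction computed in mini-batch
# ("dual") space. For a least-squares loss 0.5*||r(w)||^2 with residuals
# r in R^b and Jacobian J in R^{b x n}, b << n, the damped GN step
#   d = -(J^T J + lam*I_n)^{-1} J^T r
# equals, by the push-through identity,
#   d = -J^T (J J^T + lam*I_b)^{-1} r
rng = np.random.default_rng(0)
b, n, lam = 8, 1000, 1e-2            # batch size much smaller than params
J = rng.standard_normal((b, n))
r = rng.standard_normal(b)

# Naive primal solve: an n x n system.
d_primal = -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

# Batch-space solve: a b x b system plus matrix-vector products.
d_dual = -J.T @ np.linalg.solve(J @ J.T + lam * np.eye(b), r)

assert np.allclose(d_primal, d_dual, atol=1e-6)
```

The dual solve replaces an O(n^3) factorization with an O(b^3) one, which is what makes second-order updates tractable when n greatly exceeds b.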
Incremental Gauss-Newton Descent for Machine Learning
Mikalai Korbit, Mario Zanon
Under Review
code
/
arXiv
We propose IGND, a scale-invariant, easy-to-tune, fast-converging stochastic
optimization algorithm that uses approximate second-order information at nearly
the same per-iteration cost as stochastic gradient descent.
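The scale-invariance property claimed above can be checked directly for a simple Gauss-Newton step. The snippet below (a generic illustration, not the IGND algorithm itself) shows that rescaling a least-squares loss by a constant changes the gradient but leaves the Gauss-Newton direction untouched:

```python
import numpy as np

# For L(w) = 0.5 * c * ||r(w)||^2 the gradient c * J^T r scales with c,
# but the (undamped) Gauss-Newton direction (c*J^T J)^{-1} (c*J^T r)
# is independent of c: the curvature matrix absorbs the rescaling.
rng = np.random.default_rng(1)
b, n = 32, 5                         # overdetermined so J^T J is invertible
J = rng.standard_normal((b, n))
r = rng.standard_normal(b)

def gn_direction(c):
    g = c * J.T @ r                  # gradient of the scaled loss
    H = c * J.T @ J                  # Gauss-Newton matrix of the scaled loss
    return np.linalg.solve(H, g)

d1 = gn_direction(1.0)
d100 = gn_direction(100.0)
assert np.allclose(d1, d100)         # GN step unchanged by loss scaling
```

This invariance is one reason such methods need less learning-rate tuning than plain SGD, whose step scales linearly with the loss.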
Software
I develop and maintain Somax,
an open-source JAX library for curvature-aware training.
It provides a composable framework for implementing, studying, and benchmarking a wide range
of second-order methods.
Somax: Stochastic Second-Order Optimization in JAX
Mikalai Korbit
code
/
arXiv
Somax is an Optax-native JAX library for building and running curvature-aware optimizers. It
provides modular curvature operators, estimators, solvers, preconditioners, and damping
policies behind a common step interface, making second-order methods easier to compose,
benchmark, and extend.
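The composability idea can be conveyed with a toy chain-of-transformations pattern in plain Python. This is a hypothetical sketch of the design principle, not Somax's actual API: every function name below is invented for illustration.

```python
import numpy as np

# Toy version of the Optax-style pattern: each stage maps an update vector
# to a new update vector, so preconditioning, damping, and scaling can be
# stacked in any order behind one interface.

def precondition(diag):
    # Elementwise preconditioner, standing in for a curvature estimate.
    return lambda u: u / diag

def add_damping(lam):
    # Shrinks the update, standing in for a damping policy.
    return lambda u: u / (1.0 + lam)

def scale(lr):
    # Final learning-rate scaling.
    return lambda u: lr * u

def chain(*stages):
    # Compose stages left to right into a single update transformation.
    def apply(u):
        for stage in stages:
            u = stage(u)
        return u
    return apply

opt = chain(precondition(np.array([4.0, 1.0])), add_damping(1.0), scale(0.5))
step = opt(np.array([8.0, 2.0]))     # -> array([0.5, 0.5])
```

Because each module only sees an update vector, swapping one damping policy or preconditioner for another requires no changes elsewhere in the chain.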