For Physics-Informed Neural Networks (PINNs), switching from Adam-type optimizers to L-BFGS in the later stages of training often seems to help final convergence. I know implementing L-BFGS under `mlx.optimizers` is probably not in the MLX team's plans, but I would be interested in trying to code a basic version using the Python API.
The key challenge is maintaining compatibility with `mx.compile`.
Before getting started, I would be grateful if @awni could outline the high-level steps I should follow to make the implementation as efficient as possible. L-BFGS involves iterative, stateful updates (tracking curvature approximations and performing line searches) that don't always fit neatly into a compiled, pure-function framework. Or, if this is simply not a good idea, please let me know so I don't waste time.
You might want to check out some of the optimizers in mlx-optimizers. Several of them approximate the Hessian and may work as well as Adam + L-BFGS.