Fitting a keras model using scipy.optimize.minimize #3064
Closed
Many people (including myself) have asked about exposing a Keras model to an external optimizer. This requires getting the loss and gradients, which is complicated by the symbolic nature of Theano/TensorFlow. I wrote this code to fit an arbitrary Keras model with any method from scipy.optimize.minimize; it should work similarly with any other optimization code.
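The core idea can be sketched without Keras at all: flatten the model's weights into a single vector, write an objective that returns the (loss, gradient) pair for that vector, and hand it to scipy.optimize.minimize. Here is a minimal, hedged sketch using plain NumPy linear regression as a stand-in for the model; in the real code the (loss, gradient) evaluation would come from the compiled symbolic graph (e.g. via K.gradients and K.function) rather than the hand-written formulas below.

```python
import numpy as np
from scipy.optimize import minimize

# Toy "model": linear regression, standing in for a Keras model whose
# weights have been flattened into a single parameter vector.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

def loss_and_grad(w):
    # Mean-squared-error loss and its gradient w.r.t. the flat
    # parameter vector -- the same (loss, grad) pair the Keras version
    # would evaluate on the current weights via the backend.
    resid = X @ w - y
    loss = 0.5 * np.mean(resid ** 2)
    grad = X.T @ resid / len(y)
    return loss, grad

# jac=True tells scipy the objective returns (loss, gradient) together,
# so the gradient is not re-evaluated by finite differences.
res = minimize(loss_and_grad, x0=np.zeros(3), jac=True, method='L-BFGS-B')
print(res.x)  # should be close to true_w
```

For a real Keras model the only extra machinery is packing/unpacking the weight list to and from the flat vector before each evaluation; the minimize call itself is unchanged.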
While I understand that SGD and its variants are useful for computer vision problems, routines such as L-BFGS-B are essential for training neural networks aimed at modeling real biological neural networks (e.g., those in the visual system). My attached example code shows how a sparse autoencoder trained on natural images fails to learn the expected oriented edge filters using SGD (or any other Keras optimizer; trust me, I tried), but works perfectly when trained with the L-BFGS-B routine.
This example is taken from the Stanford UFLDL course: http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder
In short, this code is essential for researchers in visual neuroscience (such as myself) who want the flexible and intuitive model building of Keras but need other optimizers.
Any code review/suggestions are obviously welcome. If this idea doesn't fit the scope of the project, no worries. I know it would be best to turn this code into its own optimizer, but that would involve some changes to the Keras internals.
Also, it doesn't work on TensorFlow because of the K.learning_phase() issue; I figured others would have a good suggestion for that.
Thanks,
Nick
example.zip