Description
Is your feature request related to a problem? Please describe.
An `Evaluator` class, similar to the `Trainer`, would allow specifying more metrics to run and how to post-process the input batches for saving.
Describe the solution you'd like
An `Evaluator` class similar to `Trainer` that would have an `evaluator` key in the config.
The class config would have the following (a rough sketch follows the list):
- Metrics to call - Some metrics are too computationally intensive to be run during training, but there is currently no way to use them in `allennlp evaluate`.
- Source & target namespaces - Currently, when predictions are saved, there is no way to save the inputs and targets as well. This makes it difficult to compare multiple models on the same dataset without implementing some other alignment method.
Describe alternatives you've considered
Modifying the `evaluate` method required implementing more mixins to allow post-processing the batch so that the inputs can be saved.
For the metrics, I have yet to find a way (other than doing it inside the model) to have some metrics skipped during training but still run during evaluation.
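For reference, this is roughly what that in-model workaround looks like: the model gates an "expensive" metric on `self.training`, so it only updates in eval mode (validation or `allennlp evaluate`), never on training batches. The toy classifier and the choice of `CategoricalAccuracy` are placeholders; only the gating pattern is the point.

```python
from typing import Dict, Optional

import torch
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.training.metrics import CategoricalAccuracy


class ToyClassifier(Model):
    def __init__(self, vocab: Vocabulary, input_dim: int = 16) -> None:
        super().__init__(vocab)
        num_labels = vocab.get_vocab_size("labels")
        self._projection = torch.nn.Linear(input_dim, num_labels)
        # Pretend this metric is too slow to run on every training batch.
        self._expensive_metric = CategoricalAccuracy()

    def forward(
        self, features: torch.Tensor, label: Optional[torch.Tensor] = None
    ) -> Dict[str, torch.Tensor]:
        logits = self._projection(features)
        output = {"logits": logits}
        if label is not None:
            output["loss"] = torch.nn.functional.cross_entropy(logits, label)
            if not self.training:
                # Only update the expensive metric when the model is in eval mode.
                self._expensive_metric(logits, label)
        return output

    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {"expensive_metric": self._expensive_metric.get_metric(reset)}
```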
This is something I could work on but wanted to see first if there was interest in such a feature.