Description
Is your feature request related to a problem? Please describe.
An `Evaluator` class, similar to the `Trainer`, would allow specifying more metrics to run and how to post-process the input batches for saving.
Describe the solution you'd like
An `Evaluator` class similar to `Trainer` that would have an `evaluator` key in the config.
The class config would have the following (a rough sketch follows the list):
- Metrics to call - Some metrics are too computationally intensive to be run during training, but there is currently no way to use them in `allennlp evaluate`.
- Source & target namespaces - Currently, when predictions are saved, there is no way to save the inputs and targets as well. This makes it difficult to compare multiple models on the same dataset without implementing some other alignment method.
Describe alternatives you've considered
Modifying the `evaluate` method required implementing more mixins to allow post-processing the batch so that the inputs can be saved.
For the metrics, I have yet to find a way (other than doing it inside the model) to have some metrics skipped during training but still run during evaluation.
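For reference, this is roughly what that in-model workaround looks like: the model gates an "expensive" metric on `self.training`, so it only updates in eval mode (validation or `allennlp evaluate`), never on training batches. The toy classifier and the choice of `CategoricalAccuracy` are placeholders; only the gating pattern is the point.

```python
from typing import Dict, Optional

import torch
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.training.metrics import CategoricalAccuracy


class ToyClassifier(Model):
    def __init__(self, vocab: Vocabulary, input_dim: int = 16) -> None:
        super().__init__(vocab)
        num_labels = vocab.get_vocab_size("labels")
        self._projection = torch.nn.Linear(input_dim, num_labels)
        # Pretend this metric is too slow to run on every training batch.
        self._expensive_metric = CategoricalAccuracy()

    def forward(
        self, features: torch.Tensor, label: Optional[torch.Tensor] = None
    ) -> Dict[str, torch.Tensor]:
        logits = self._projection(features)
        output = {"logits": logits}
        if label is not None:
            output["loss"] = torch.nn.functional.cross_entropy(logits, label)
            if not self.training:
                # Only update the expensive metric when the model is in eval mode.
                self._expensive_metric(logits, label)
        return output

    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {"expensive_metric": self._expensive_metric.get_metric(reset)}
```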
This is something I could work on but wanted to see first if there was interest in such a feature.