Description
Right now there's no visibility into how the `pipeline.batch.size` and `pipeline.batch.interval` settings are influencing the performance of the pipeline.
It would be useful to understand whether, for example, with a batch size of 1000, the inputs are actually generating 1000 events before the interval triggers and pushes an incomplete batch into filters+outputs, or whether most batches end up with 200 events or fewer.
In other words, it would be nice to know how often batches are completely filled and, when they are not, what their typical size is.
So maybe Logstash should expose metrics such as, for each time interval (e.g. the one used for metric snapshots):
- the mean batch size
- the minimum and maximum batch sizes observed in that period
- the standard deviation of the batch size
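As a rough sketch of how this could surface in the node stats API, the per-pipeline section might grow a `batch` object; all field names below are hypothetical, not an existing API:

```json
{
  "pipelines": {
    "main": {
      "batch": {
        "count": 5231,
        "full_batches": 1287,
        "size": {
          "mean": 412.7,
          "min": 3,
          "max": 1000,
          "stddev": 287.1
        }
      }
    }
  }
}
```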
Most of these metrics would require keeping the size of every batch seen in that time period, which may be too heavy. Any suggestions on other ways of measuring this?
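One possible answer to the memory concern: all four metrics above can be computed in O(1) memory with a streaming approach (Welford's online algorithm for mean/variance, plus running min/max), so nothing per-batch needs to be retained. A minimal sketch; the class and method names are made up for illustration, not Logstash internals:

```java
// Streaming batch-size statistics: mean, min, max, and stddev
// per snapshot period without storing individual batch sizes.
public class BatchSizeStats {
    private long count = 0;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations (Welford)

    // Record one batch's size in O(1) time and O(1) memory.
    public synchronized void record(long size) {
        count++;
        min = Math.min(min, size);
        max = Math.max(max, size);
        double delta = size - mean;
        mean += delta / count;
        m2 += delta * (size - mean);
    }

    public synchronized long count() { return count; }
    public synchronized long min() { return min; }
    public synchronized long max() { return max; }
    public synchronized double mean() { return mean; }

    // Population standard deviation of the sizes seen so far.
    public synchronized double stddev() {
        return count > 1 ? Math.sqrt(m2 / count) : 0.0;
    }
}
```

A snapshot would read the four values and reset the accumulator for the next period. Welford's formulation is also numerically stable, unlike the naive sum/sum-of-squares approach.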