-
Notifications
You must be signed in to change notification settings - Fork 11
Add -p/--progress bar option to rdump #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #166 +/- ##
==========================================
- Coverage 82.86% 82.84% -0.03%
==========================================
Files 34 34
Lines 3561 3573 +12
==========================================
+ Hits 2951 2960 +9
- Misses 610 613 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I was thinking more of simply wrapping record_iterator = islice(record_stream(args.src, selector), args.skip, islice_stop)
if HAS_TQDM and args.progress:
record_iterator = tqdm.tqdm(record_iterator, unit=" records")
count = 0
record_writer = None
try:
record_writer = RecordWriter(uri, progress=args.progress)
for count, rec in enumerate(record_iterator, start=1): # noqa: B007 This does mean the progress is shown for reading records and not writing, but they are closely tied to each other anyways in rdump in the form of stdin and stdout. Note that for writing to elastic or any adapter that has an internal Queue it would mean the speed is not 100% accurate in the beginning but should normalise after the queue is full. |
That is a neat solution indeed. Implemented in 4a12985. |
Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, can you also apply the following patch to add a small unit test?
From 71601887035a300a0769c76a88469e320caf96c4 Mon Sep 17 00:00:00 2001
From: Yun Zheng Hu <hu@fox-it.com>
Date: Thu, 20 Feb 2025 09:46:38 +0000
Subject: [PATCH] Add a small unit test for progres bar
---
pyproject.toml | 1 +
tests/test_rdump.py | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+)
diff --git a/pyproject.toml b/pyproject.toml
index 39c2657..9776758 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -62,6 +62,7 @@ test = [
"flow.record[elastic]",
"duckdb; platform_python_implementation != 'PyPy' and python_version < '3.12'", # duckdb
"pytz; platform_python_implementation != 'PyPy' and python_version < '3.12'", # duckdb
+ "tqdm",
]
full = [
"flow.record[compression]",
diff --git a/tests/test_rdump.py b/tests/test_rdump.py
index 37a201f..783855a 100644
--- a/tests/test_rdump.py
+++ b/tests/test_rdump.py
@@ -696,3 +696,29 @@ def test_rdump_line_verbose(tmp_path: Path, capsys: pytest.CaptureFixture, rdump
assert "data (bytes) =" in captured.out
assert "counter (uint32) =" in captured.out
assert "foo (string) =" in captured.out
+
+
+def test_rdump_list_progress(tmp_path: Path, capsys: pytest.CaptureFixture) -> None:
+ TestRecord = RecordDescriptor(
+ "test/rdump/progress",
+ [
+ ("uint32", "counter"),
+ ],
+ )
+ record_path = tmp_path / "test.records"
+
+ with RecordWriter(record_path) as writer:
+ for i in range(100):
+ writer.write(TestRecord(counter=i))
+
+ rdump.main(["--list", "--progress", str(record_path)])
+ captured = capsys.readouterr()
+
+ # stderr should contain tqdm progress bar
+ # 100 records [00:00, 64987.67 records/s]
+ assert "\r100 records [" in captured.err
+ assert " records/s]" in captured.err
+
+ # stdout should contain the RecordDescriptor definition and count
+ assert "# <RecordDescriptor test/rdump/progress, hash=eeb21156>" in captured.out
+ assert "Processed 100 records" in captured.out
--
2.47.1
Fixes #165. This PR adds a generic process bar to
rdump
usingtqdm
.Adapters subclassed fromAbstractWriter
can declare themselves compatible with the progress bar using the following line in their__init__
call:super().__init__(progress=kwargs.get("progress"))
They can then use theself.__progress_count__()
method to add a processed record to the adapter.`This can be enabled by using the runtime rdump flag
--progress
or shorthand-p
.After review we can add the desired method toAbstractReader
too.