8000 Add -p/--progress bar option to rdump by JSCU-CNI · Pull Request #166 · fox-it/flow.record · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add -p/--progress bar option to rdump #166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 20, 2025

Conversation

JSCU-CNI
Copy link
Contributor
@JSCU-CNI JSCU-CNI commented Feb 19, 2025

Fixes #165. This PR adds a generic process bar to rdump using tqdm.

Adapters subclassed from AbstractWriter can declare themselves compatible with the progress bar using the following line in their __init__ call: super().__init__(progress=kwargs.get("progress"))

They can then use the self.__progress_count__() method to add a processed record to the adapter.`

This can be enabled by using the runtime rdump flag --progress or shorthand -p.

After review we can add the desired method to AbstractReader too.

Copy link
codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.84%. Comparing base (6033534) to head (76ee3bd).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
flow/record/tools/rdump.py 76.92% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
- Coverage   82.86%   82.84%   -0.03%     
==========================================
  Files          34       34              
  Lines        3561     3573      +12     
==========================================
+ Hits         2951     2960       +9     
- Misses        610      613       +3     
Flag Coverage Δ
unittests 82.84% <76.92%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yunzheng
Copy link
Member
yunzheng commented Feb 19, 2025

I was thinking more of simply wrapping tqdm around the record_stream generator, like this:

    record_iterator = islice(record_stream(args.src, selector), args.skip, islice_stop)
    if HAS_TQDM and args.progress:
        record_iterator = tqdm.tqdm(record_iterator, unit=" records")
    count = 0
    record_writer = None

    try:
        record_writer = RecordWriter(uri, progress=args.progress)
        for count, rec in enumerate(record_iterator, start=1):  # noqa: B007

This does mean the progress is shown for reading records and not writing, but they are closely tied to each other anyways in rdump in the form of stdin and stdout.

Note that for writing to elastic or any adapter that has an internal Queue it would mean the speed is not 100% accurate in the beginning but should normalise after the queue is full.

@JSCU-CNI
Copy link
Contributor Author

I was thinking more of simply wrapping tqdm around the record_stream generator, like this:

That is a neat solution indeed. Implemented in 4a12985.

@yunzheng yunzheng changed the title Add generic process bar to AbstractWriter Add -p/--progress bar option to rdump Feb 19, 2025
JSCU-CNI and others added 2 commits February 20, 2025 09:53
Co-authored-by: Yun Zheng Hu <hu@fox-it.com>
@JSCU-CNI JSCU-CNI requested a review from yunzheng February 20, 2025 08:55
Copy link
Member
@yunzheng yunzheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, can you also apply the following patch to add a small unit test?

From 71601887035a300a0769c76a88469e320caf96c4 Mon Sep 17 00:00:00 2001
From: Yun Zheng Hu <hu@fox-it.com>
Date: Thu, 20 Feb 2025 09:46:38 +0000
Subject: [PATCH] Add a small unit test for progres bar

---
 pyproject.toml      |  1 +
 tests/test_rdump.py | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/pyproject.toml b/pyproject.toml
index 39c2657..9776758 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -62,6 +62,7 @@ test = [
     "flow.record[elastic]",
     "duckdb; platform_python_implementation != 'PyPy' and python_version < '3.12'", # duckdb
     "pytz; platform_python_implementation != 'PyPy' and python_version < '3.12'", # duckdb
+    "tqdm",
 ]
 full = [
     "flow.record[compression]",
diff --git a/tests/test_rdump.py b/tests/test_rdump.py
index 37a201f..783855a 100644
--- a/tests/test_rdump.py
+++ b/tests/test_rdump.py
@@ -696,3 +696,29 @@ def test_rdump_line_verbose(tmp_path: Path, capsys: pytest.CaptureFixture, rdump
     assert "data (bytes) =" in captured.out
     assert "counter (uint32) =" in captured.out
     assert "foo (string) =" in captured.out
+
+
+def test_rdump_list_progress(tmp_path: Path, capsys: pytest.CaptureFixture) -> None:
+    TestRecord = RecordDescriptor(
+        "test/rdump/progress",
+        [
+            ("uint32", "counter"),
+        ],
+    )
+    record_path = tmp_path / "test.records"
+
+    with RecordWriter(record_path) as writer:
+        for i in range(100):
+            writer.write(TestRecord(counter=i))
+
+    rdump.main(["--list", "--progress", str(record_path)])
+    captured = capsys.readouterr()
+
+    # stderr should contain tqdm progress bar
+    #   100 records [00:00, 64987.67 records/s]
+    assert "\r100 records [" in captured.err
+    assert " records/s]" in captured.err
+
+    # stdout should contain the RecordDescriptor definition and count
+    assert "# <RecordDescriptor test/rdump/progress, hash=eeb21156>" in captured.out
+    assert "Processed 100 records" in captured.out
-- 
2.47.1

@JSCU-CNI JSCU-CNI requested a review from yunzheng February 20, 2025 10:06
@yunzheng yunzheng merged commit f4eba70 into fox-it:main Feb 20, 2025
18 of 22 checks passed
@JSCU-CNI JSCU-CNI deleted the feature/progress branch February 20, 2025 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add progres bar to rdump
2 participants
0