pmdastatsd new QA and small agent changes by Erbenos · Pull Request #836 · performancecopilot/pcp

Merged: 14 commits into performancecopilot:master on Feb 18, 2020

Conversation

@Erbenos (Collaborator) commented on Feb 6, 2020:

The new QA has better coverage than the old one and includes a Valgrind test.

pmdastatsd received small changes to the handling of double metric values and improvements to its cleanup procedures.

@Erbenos requested a review from natoscott on February 6, 2020, 22:38
# - statsd.pmda.dropped (before: 0, after: 40)
# - statsd.pmda.time_spent_parsing (before: 0, after: non-zero)
# - statsd.pmda.time_spent_aggregating (before: 0, after: non-zero)
# - statsd.pmda.metrics_tracked, with its counter, gauge, duration and total instances (before: 0, 0, 0, 0; after: 4, 4, 2, 10)
Collaborator:
I don't know how these tests work, but is this supposed to also check the before/after state?

expected_percentile90 = 18000000
expected_percentile95 = 19000000
expected_percentile99 = 19800000
expected_stddev = 5773500.278068
Collaborator:
Ah disregard, I see how this is done here.
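For readers following along, the before/after pattern resolved here is roughly the following (a sketch only: utils.print_metric and the metric name appear in the tests themselves, while the traffic-sending helper is hypothetical):

utils.print_metric("statsd.pmda.dropped")   # before: expected 0
send_unparseable_payloads()                 # hypothetical helper firing UDP datagrams at the agent
utils.print_metric("statsd.pmda.dropped")   # after: expected 40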

"cache_cleared:1|msa",
"session_started:|ms",
":20|ms"
]
Collaborator:
Nitpick: if these "bad" metric names are not needed for this test, I'd probably drop them, as they were already tested above.

Author (@Erbenos):

12.py checks that the specificity of the various verbose levels is ordered correctly. Payloads like those above would generate additional error logs.
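As a sketch of how such payloads reach the agent (the socket usage mirrors the sendto() calls elsewhere in these tests; the address and port are assumptions):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for payload in ["cache_cleared:1|msa", "session_started:|ms", ":20|ms"]:
    # invalid type suffix, missing value, missing name - each should
    # produce a parse-error line in the agent's verbose log
    sock.sendto(payload.encode("utf-8"), ("127.0.0.1", 8125))  # assumed agent address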


run_test()


Collaborator:
This test only checks whether there are any errors, I assume? Would it be feasible to also check the metrics that were created?

Author (@Erbenos):

08.py checks whether the labels are OK; the payload is nearly identical to the one in 13.py.

@lzap (Collaborator) commented on Feb 11, 2020:

Tests look good.

@natoscott (Member) left a review:

Looks good - small comments follow in-line.

qa/1599 Outdated
which ruby >/dev/null 2>&1 || _notrun "ruby not installed"
which python >/dev/null 2>&1 || _notrun "python not installed"

which valgrind >/dev/null 2>&1 || _notrun "valgrind not installed"
Member (natoscott):
Switch over to '_check_valgrind' here (from qa/common.check)
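The suggested form would then be a one-liner (sketch; _check_valgrind comes from qa/common.check and, per its purpose here, _notrun's when valgrind is unavailable):

# replaces the 'which valgrind' probe above
_check_valgrind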

Author (@Erbenos), Feb 17, 2020:

Addressed by 78b11bf

if utils.check_is_in_bounds(expected_average, number_value):
status = True
elif k == "/count":
# TODO: Ask Nathan, if this is OK
Member (natoscott):

Which part? The 0.5?

Author (@Erbenos), Feb 12, 2020:

Yes, that one. Basically I am not sure how to accurately test the datagrams lost/processed ratio, so I just check whether the value is within a somewhat expected margin.

Member (natoscott):

Yes, that's fine - it's a common pattern. When writing shell tests we even have a shared _within_tolerance() function available for this purpose.
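In a shell test that pattern looks roughly like this (a sketch only; check _within_tolerance in qa/common.check for the exact calling convention before copying):

# $value holds the observed metric value, 40 is the expectation,
# and the last argument expresses the tolerance as a percentage
_within_tolerance "dropped datagrams" $value 40 50% -v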

Author (@Erbenos):

Alright, I will leave it as is and remove the TODO comment, thanks.

Author (@Erbenos), Feb 13, 2020:

I had to set up a new VM to work on this, and on it I am encountering the following phenomenon: even a 0.5 tolerance margin is not enough, particularly for the count of received UDP payloads, even after setting sysctl net.core.rmem_max and net.core.rmem_default to 26214400. This makes me worry about the reliability of such a test - it is UDP, after all. What would your approach to this be?

Member (natoscott):

Set the acceptable boundaries sufficiently wide that any reasonable scenario is catered for, and add comments to the test discussing possible causes for values outside these boundaries, in case people have to come along later and widen them further. Even with wide boundaries, it's often still a worthwhile test.
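Applied to the Python helpers used in these tests, that advice might look like this (the helper name mirrors utils.check_is_in_bounds from the tests, but this standalone version and its margin are illustrative only):

def check_is_in_bounds(expected, actual, tolerance=0.5):
    """True when actual lands within +/- tolerance * expected of expected."""
    margin = expected * tolerance
    return expected - margin <= actual <= expected + margin

# UDP is best-effort: receive-buffer overruns (net.core.rmem_max /
# net.core.rmem_default) or a slow test VM can drop datagrams, so the
# accepted window is deliberately generous.
received = 612  # e.g. a value read back from a statsd.pmda counter
assert check_is_in_bounds(1000, received)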

Author (@Erbenos):

Will do, thanks.

sock.sendto("test_gauge2:+{}|g".format(overflow_payload).encode("utf-8"), (ip, port))
utils.print_metric("statsd.pmda.dropped")
utils.print_metric("statsd.test_gauge2")
# TODO: check if this is truly the desired behavior
Member (natoscott):

Now would be a good time to check, and figure out an answer (before this is merged) - otherwise, we're all likely to forget over time.

Author (@Erbenos), Feb 12, 2020:

I actually already did the other day (we had a small chat in Slack about it), confirmed over email with Lukas that it is OK, and we resolved it. I will remove the comment in the code.
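For context, the behavior under discussion is the signed-gauge syntax; a sketch of the three forms (the address and port are assumptions, as above):

import socket

ip, port = "127.0.0.1", 8125      # assumed agent address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"test_gauge2:100|g", (ip, port))  # plain value: set the gauge
sock.sendto(b"test_gauge2:+5|g", (ip, port))   # leading '+': increment
sock.sendto(b"test_gauge2:-3|g", (ip, port))   # leading '-': decrement

The test above then sends a "+{overflow_payload}" delta and checks how the agent reacts.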

$(INSTALL) -m 755 -d $(TESTDIR)
$(INSTALL) -m 644 $(RBFILES) $(TESTDIR)
# I wish I knew how to extract such listing from $(TESTFILES), not sure why this is needed yet
$(INSTALL) -m 644 src/cases/01.py $(TESTDIR)/src/cases/01.py
Member (natoscott):

In the same way TESTDIR is installed above, each new directory used also needs to be $(INSTALL)'d ... otherwise the packaging ends up slightly incorrect (nothing 'owns' a directory if it's not explicitly installed).
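Concretely, the pattern being asked for is (sketch):

# install the directory first, so the package records an owner for it ...
$(INSTALL) -m 755 -d $(TESTDIR)/src/cases
# ... and only then the files within it
$(INSTALL) -m 644 src/cases/01.py $(TESTDIR)/src/cases/01.py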

Author (@Erbenos):
The reason will be displayed to describe this comment to others. Learn more.

I added GNUmakefiles to each qa/statsd dir in 78b11bf (as inspired by qa/slurm). I guess GNUmakefile.install is for post-install steps once all test files are $(INSTALL)-ed? I will leave only the default setup/install/clean/check targets in them, since Python is interpreted.

Member (natoscott):

> I added GNUmakefiles to each qa/statsd dir in 78b11bf (as inspired by qa/slurm).

Perfect.

> I guess GNUmakefile.install is for post-install steps once all test files are $(INSTALL)-ed?

This is the makefile that will ship in the pcp-testsuite (rpm/deb) package and is used when the tests are run from /var/lib/pcp/testsuite.

$(INSTALL) -m 644 src/cases/13.py $(TESTDIR)/src/cases/13.py
$(INSTALL) -m 644 src/cases/14.py $(TESTDIR)/src/cases/14.py
$(INSTALL) -m 644 src/cases/15.py $(TESTDIR)/src/cases/15.py
$(INSTALL) -m 644 src/configs/complex/0/pmdastatsd.ini $(TESTDIR)/src/configs/complex/0/pmdastatsd.ini
Member (natoscott):

Likewise for each of these new ini file dirs :|

Since these configs are so small, I'd consider generating them at runtime instead of committing them as individual files in the repo.
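A minimal sketch of that approach - the option names here are placeholders, not the actual pmdastatsd.ini schema:

import os, tempfile

options = {"verbose": "2", "port": "8125"}        # illustrative keys only
path = os.path.join(tempfile.mkdtemp(), "pmdastatsd.ini")
with open(path, "w") as f:
    for key, value in options.items():
        f.write("{} = {}\n".format(key, value))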

@@ -15,8 +15,9 @@ echo "QA output created by $seq"

test -e $PCP_PMDAS_DIR/statsd/pmdastatsd || _notrun "statsd PMDA not installed"

Member (natoscott):

The comment at the start of this file referring to 'ruby' needs updating.

@@ -15,8 +15,9 @@ echo "QA output created by $seq"

Member (natoscott):

Consider replacing the above three lines with common.python now.

qa/1599 Outdated
@@ -15,8 +15,9 @@ echo "QA output created by $seq"

test -e $PCP_PMDAS_DIR/statsd/pmdastatsd || _notrun "statsd PMDA not installed"

# NOTE: Miroslav is planning to re-work this in python/shell
which ruby >/dev/null 2>&1 || _notrun "ruby not installed"
which python >/dev/null 2>&1 || _notrun "python not installed"
Member (natoscott):

common.python will have checked this (python not installed) for you already.

qa/1599 Outdated
for script in $scripts
do
ruby $script
python $script $here/statsd/output
Member (natoscott):

Using common.python, this should then be $python rather than just python.

This lets us test both python2 and python3 more easily (we prefer python3 - whenever it is available - over python2).
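Putting both suggestions together, the loop might end up as follows (a sketch; the ruby invocation is left alone since the note above says it is being reworked separately):

. ./common.python    # sets $python (python3 when available) and _notrun's otherwise
for script in $scripts
do
    ruby $script
    $python $script $here/statsd/output
done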

@Erbenos (author) commented on Feb 12, 2020:

Thank you for your review so far; I will address the issues you brought up.

…s, hardcoded various test configs to replace static files, added _check_valgrind check, removed some TODOs, updated the comment at the start of 1599, removed the python check as common.python checks it, cleaned up the debug_output_filename output file's output to be more syntactically consistent, disabled the 13.py test case
@Erbenos (author) commented on Feb 17, 2020:

The commit above addressed the issues mentioned, but it also disabled 13.py, as I noticed the test case is not deterministic and its output will need further filtering/processing in Python.

@natoscott merged commit 78b11bf into performancecopilot:master on Feb 18, 2020