pmdastatsd new QA and small agent changes #836
Conversation
…settings configuration via ini files
…ault values of hardcoded metrics, updates of non-config hardcoded metrics, counter metric type, gauge metric type
…ple stress test case, added instance processing abstractions to qa utils
… test for port to listen on setting
…ut_filename option
…oth inf/-inf and loss of precision
…after the pmlog is initialized
# - statsd.pmda.dropped (before: 0, after: 40)
# - statsd.pmda.time_spent_parsing (before: 0, after: non-zero)
# - statsd.pmda.time_spent_aggregating (before: 0, after: non-zero)
# - statsd.pmda.metrics_tracked, with its counter, gauge, duration and total instances (before: 0, 0, 0, 0; after: 4, 4, 2, 10)
I don't know how these tests work, but is this supposed to also check the before/after state?
expected_percentile90 = 18000000
expected_percentile95 = 19000000
expected_percentile99 = 19800000
expected_stddev = 5773500.278068
Ah disregard, I see how this is done here.
"cache_cleared:1|msa",
"session_started:|ms",
":20|ms"
]
Nitpick: if these "bad" metric names are not needed for this test, I'd probably drop them, as they were already tested above.
12.py checks whether the specificity of the various verbose levels is ordered correctly. Payloads like the ones above would generate additional error logs.
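For context, a minimal standalone sketch of sending such malformed payloads over UDP so the agent emits the extra error log lines (the host/port values and comments here are illustrative assumptions; the real QA cases live under qa/statsd/src/cases):

```python
import socket

# Hypothetical listener address; the QA harness supplies the real one.
ip, port = "127.0.0.1", 8125

# Malformed datagrams similar to the ones quoted above.
bad_payloads = [
    "cache_cleared:1|msa",  # "msa" is not a valid metric type
    "session_started:|ms",  # value is missing
    ":20|ms",               # metric name is missing
]

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for payload in bad_payloads:
    # Each datagram should be rejected by the parser and, at a
    # sufficiently verbose log level, produce an error log line.
    sock.sendto(payload.encode("utf-8"), (ip, port))
sock.close()
```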
run_test()
This test only checks whether there are any errors, I assume? Would it also be feasible to check the metrics created?
08.py checks whether the labels are OK; the payload is nearly identical to that of 13.py.
Tests look good.
Looks good - small comments follow in-line.
qa/1599
Outdated
which ruby >/dev/null 2>&1 || _notrun "ruby not installed"
which python >/dev/null 2>&1 || _notrun "python not installed"

which valgrind >/dev/null 2>&1 || _notrun "valgrind not installed"
Switch over to '_check_valgrind' here (from qa/common.check)
Addressed by 78b11bf
qa/statsd/src/cases/09.py
Outdated
if utils.check_is_in_bounds(expected_average, number_value):
    status = True
elif k == "/count":
    # TODO: Ask Nathan, if this is OK
Which part? The 0.5?
Yes, that one. Basically I am not sure how to accurately test for the datagrams lost/processed ratio, so I just check whether the value is within a somewhat expected margin.
Yes, that's fine - it's a common pattern - when writing shell tests we even have a shared _within_tolerance() function available for this purpose.
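The same idea expressed in Python terms (this is only a sketch of the ratio-with-tolerance pattern being discussed, not the actual QA helper; the function name and numbers are illustrative):

```python
def within_tolerance(expected, actual, tolerance=0.5):
    """Return True if actual falls within +/- tolerance (expressed as a
    fraction of expected) of the expected value. With tolerance=0.5,
    anything in [0.5 * expected, 1.5 * expected] is accepted."""
    lower = expected * (1.0 - tolerance)
    upper = expected * (1.0 + tolerance)
    return lower <= actual <= upper

# e.g. 40 datagrams sent; accept any processed count in [20, 60]
print(within_tolerance(40, 25))  # True: 25 is within [20, 60]
print(within_tolerance(40, 10))  # False: below the lower bound
```

Wide boundaries like this trade precision for robustness against UDP loss, which is usually the right call for a QA test.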
Alright, I will leave it as is and remove the TODO comment, thanks.
I had to set up a new VM to work on this, and on it I encounter the following phenomenon: even a 0.5 tolerance margin is not enough, particularly for the count of received UDP payloads, even after setting sysctl net.core.rmem_max and net.core.rmem_default to 26214400. This makes me worry about the reliability of such a test - though it is UDP, after all. What would your approach to this be?
Set the acceptable boundaries sufficiently wide that any reasonable scenario would be catered for. Add comments to the test discussing possible causes for values outside these boundaries, in case people have to come along later and widen them further. Even with wide boundaries, it's often still likely to be a worthwhile test.
Will do, thanks.
qa/statsd/src/cases/06.py
Outdated
sock.sendto("test_gauge2:+{}|g".format(overflow_payload).encode("utf-8"), (ip, port))
utils.print_metric("statsd.pmda.dropped")
utils.print_metric("statsd.test_gauge2")
# TODO: check if this is truly the desired behavior
Now would be a good time to check, and figure out an answer (before this is merged) - otherwise, we're all likely to forget over time.
I actually already did, the other day (we had a small chat in Slack about it), confirmed it over email with Lukas, and we resolved it. I will remove the comment in the code.
qa/statsd/GNUmakefile
Outdated
$(INSTALL) -m 755 -d $(TESTDIR)
$(INSTALL) -m 644 $(RBFILES) $(TESTDIR)
# I wish I knew how to extract such listing from $(TESTFILES), not sure why this is needed yet
$(INSTALL) -m 644 src/cases/01.py $(TESTDIR)/src/cases/01.py
In the same way TESTDIR is installed above, each new directory used also needs to be $(INSTALL)'d ... otherwise, the packaging ends up slightly incorrect (nothing 'owns' a directory that is not explicitly installed).
I added GNUmakefiles to each qa/statsd dir in 78b11bf (as inspired by qa/slurm). I guess GNUmakefile.install is for post-install steps once all test files are $(INSTALL)-ed? I will leave only the default setup/install/clean/check targets in them, since Python is interpreted.
I added GNUmakefile's to each qa/statsd dir in 78b11bf (as inspired by qa/slurm).
Perfect.
I guess GNUmakefile.install is for post-install steps once all test files are $(INSTALL)-ed?
This is the makefile that will be in the pcp-testsuite (rpm/deb) package, used when tests are run from /var/lib/pcp/testsuite
qa/statsd/GNUmakefile
Outdated
$(INSTALL) -m 644 src/cases/13.py $(TESTDIR)/src/cases/13.py
$(INSTALL) -m 644 src/cases/14.py $(TESTDIR)/src/cases/14.py
$(INSTALL) -m 644 src/cases/15.py $(TESTDIR)/src/cases/15.py
$(INSTALL) -m 644 src/configs/complex/0/pmdastatsd.ini $(TESTDIR)/src/configs/complex/0/pmdastatsd.ini
Likewise for each of these new ini file dirs :|
Since these configs are so small, I'd consider generating them at runtime instead of committing them as individual files in the repo.
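One way to generate such a small config at runtime instead of committing it (a sketch only; the section and option names below are illustrative assumptions, not the actual pmdastatsd.ini schema):

```python
import configparser
import os
import tempfile

# Illustrative settings only -- consult the pmdastatsd documentation
# for the real option names and file layout.
config = configparser.ConfigParser()
config["global"] = {
    "port": "8125",
    "verbose": "2",
}

# Write the generated config into a scratch directory for the test run.
path = os.path.join(tempfile.mkdtemp(), "pmdastatsd.ini")
with open(path, "w") as f:
    config.write(f)
```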
@@ -15,8 +15,9 @@ echo "QA output created by $seq"

test -e $PCP_PMDAS_DIR/statsd/pmdastatsd || _notrun "statsd PMDA not installed"
The comment at the start of this file referring to 'ruby' needs updating.
@@ -15,8 +15,9 @@ echo "QA output created by $seq"
Consider replacing the above three lines with common.python now.
qa/1599
Outdated
@@ -15,8 +15,9 @@ echo "QA output created by $seq"

test -e $PCP_PMDAS_DIR/statsd/pmdastatsd || _notrun "statsd PMDA not installed"

# NOTE: Miroslav is planning to re-work this in python/shell
which ruby >/dev/null 2>&1 || _notrun "ruby not installed"
which python >/dev/null 2>&1 || _notrun "python not installed"
common.python will have checked this (python not installed) for you already.
qa/1599
Outdated
for script in $scripts
do
    ruby $script
    python $script $here/statsd/output
Using common.python, this should then be $python rather than just python.
This lets us test both python2 and python3 more easily (we use python3 more aggressively - whenever available - over python2).
Thank you for your review so far; I will address the issues you brought up.
…s, hardcoded various test configs to replace static files, added _check_valgrind check, removed some TODOs, updated comment at the start of 1599, removed python check as common.python checks it, cleaned up debug_output_filename output file outputs to be more syntactically consistent, disabled 13.py test case
The commit above addressed the issues mentioned, but also disabled 13.py, as I noticed the test case is not deterministic and will need to be further filtered/processed in Python.
The new QA has better coverage than the old one and includes a Valgrind test.
pmdastatsd received small changes to double metric value handling and improvements to its clean-up procedures.