SEAB-6148: Log more diagnostic information #5821
Conversation
Conceptually, looks good.
Do you have some sample output, just to get a sense of what it might look like and how large it would be?
Basically, this PR introduces a new `DebugHelper` class which wires itself into the infrastructure and logs various information. "Global" info about threads, processes, the database, and memory is logged periodically via a `Timer`, and request-level info about the handling thread and Hibernate `Session` is logged at the end of each request.

Seems like a lot of data; not having looked at the code yet, I wonder if it should be logged to a separate appender/file (https://www.dropwizard.io/en/stable/manual/configuration.html#logging) that could eventually end up in its own log group, so we can more easily choose what to view and/or have different expiry/deletion rules.
On the one hand, yes, it could make the "main" log really noisy and might be worth separating. On the other hand, it would make it easier to correlate things if we're researching an issue; e.g., seeing the thread dump logged right before/after a request would be useful. I wonder if CloudWatch has a way to "merge" log groups when querying?
```java
// Request started.
case START:
    logStarted(request);
```
This will be very useful; right now the request is only logged when completed. If we die (rare, but it's happened) processing a request, it's hard to piece together what the request was.
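For context, here is a minimal sketch of how start/finish request logging could be hooked into Jersey's monitoring events. The listener class, method names, and the logged fields are illustrative assumptions, not the actual PR code:

```java
import org.glassfish.jersey.server.monitoring.ApplicationEvent;
import org.glassfish.jersey.server.monitoring.ApplicationEventListener;
import org.glassfish.jersey.server.monitoring.RequestEvent;
import org.glassfish.jersey.server.monitoring.RequestEventListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch: log at the start and finish of each Jersey request.
public class RequestDiagnosticsListener implements ApplicationEventListener, RequestEventListener {

    private static final Logger LOG = LoggerFactory.getLogger(RequestDiagnosticsListener.class);

    @Override
    public void onEvent(ApplicationEvent event) {
        // No application-level handling needed for this sketch.
    }

    @Override
    public RequestEventListener onRequest(RequestEvent requestEvent) {
        // Jersey delivers the START event here; subsequent events go to the returned listener.
        onEvent(requestEvent);
        return this;
    }

    @Override
    public void onEvent(RequestEvent event) {
        switch (event.getType()) {
            case START:
                LOG.info("request started: {}", event.getContainerRequest().getRequestUri());
                break;
            case FINISHED:
                LOG.info("request finished: {}", event.getContainerRequest().getRequestUri());
                break;
            default:
                break;
        }
    }
}
```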
Looks like, yes, via the Insights queries.
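For reference, a Logs Insights query can span multiple log groups, so the "merge" could happen at query time. A rough sketch using the AWS SDK v2; the log group names and query are placeholders, not our actual groups:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import software.amazon.awssdk.services.cloudwatchlogs.CloudWatchLogsClient;
import software.amazon.awssdk.services.cloudwatchlogs.model.StartQueryRequest;
import software.amazon.awssdk.services.cloudwatchlogs.model.StartQueryResponse;

// Illustrative sketch: run one Logs Insights query over a "main" and a "diagnostics" log group.
public class MergedLogQuery {
    public static void main(String[] args) {
        try (CloudWatchLogsClient logs = CloudWatchLogsClient.create()) {
            StartQueryResponse response = logs.startQuery(StartQueryRequest.builder()
                    // Placeholder log group names; the real groups would come from our deployment.
                    .logGroupNames("webservice-main", "webservice-diagnostics")
                    .queryString("fields @timestamp, @logStream, @message | sort @timestamp desc | limit 100")
                    .startTime(Instant.now().minus(1, ChronoUnit.HOURS).getEpochSecond())
                    .endTime(Instant.now().getEpochSecond())
                    .build());
            // Poll getQueryResults(queryId) until the query completes (omitted here).
            System.out.println("queryId: " + response.queryId());
        }
    }
}
```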
Some thoughts:
Regarding size, there are two groups of information being logged. The first, thread/request-level info, is pretty small.

The periodic "global" information is larger. I've elided most of the thread dump; in total, on my development box, it's around 10 lines per thread, and 34 threads are listed, totaling approximately 25kB. The memory pool dump is the next biggest entry, about 2.5kB total.

My feeling is that, although it's intriguing to ponder logging this stuff to a different group, it might be more trouble than it's worth, if we can format the debug info so it doesn't overwhelm the other log info.
I think this could work, in addition to swapping most of these outputs from
cleaning list
```java
public final class DebugHelper {

    private static final Logger LOG = LoggerFactory.getLogger(DebugHelper.class);
    private static final long LOG_PERIOD_MS = 10000L;
```
For reviewers: that's every ten seconds, if you're curious.
dockstore-webservice/src/main/java/io/dockstore/webservice/helpers/CensorHelper.java
Awesome to see new diagnostic information in the logs! It would be nice to have metrics defined in dockstore-deploy for the portions that are quantifiable, so we could track them over time/per container.

In my understanding, the logs have the same access restrictions as the container and DB, so censoring the logs doesn't seem critical. For maintainability I would consider not including that part; however, it is good to have.

It might be preferable to use an existing library (which I couldn't immediately find) or something like https://github.com/mazen160/secrets-patterns-db/blob/master/db/rules-stable.yml or https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml, as linked by Denis.
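To illustrate what the rule-based route might look like, here is a small sketch that redacts matches of a few regexes of the kind those rule files contain. The patterns and class name are simplified examples, not the actual rules:

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch: redact anything matching a list of secret-detection regexes
// (patterns of the kind found in secrets-patterns-db or the gitleaks config).
public final class RuleBasedCensor {

    // Simplified example patterns; a real implementation would load the full rule set.
    private static final List<Pattern> SECRET_PATTERNS = List.of(
            Pattern.compile("ghp_[A-Za-z0-9]{36}"),          // GitHub personal access token
            Pattern.compile("AKIA[0-9A-Z]{16}"),              // AWS access key id
            Pattern.compile("(?i)bearer\\s+[a-z0-9._-]+"));   // bearer tokens in headers

    private RuleBasedCensor() {
    }

    public static String censor(String message) {
        String censored = message;
        for (Pattern pattern : SECRET_PATTERNS) {
            censored = pattern.matcher(censored).replaceAll("XXXX");
        }
        return censored;
    }
}
```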
Hmmm, I think that puts a finger on two things that have been bugging me about this. I think I'll have to type up something longer in #dockstore-devs
Ok, I made a few changes; the current state is the proposed final version.

Part of my original "move it forward" plan was to remove the custom censoring code and replace it with the AWS CloudWatch "data protection" feature. However, upon deeper inspection, I learned that the AWS feature probably wouldn't catch the types of secrets that might leak, either because it's not looking for certain types of secrets (GitHub tokens, for example), or because it requires that they appear in certain contexts to detect them (for example, an AWS key needs to appear in proximity to a particular keyword).

By far, the most likely place for secrets to leak is the `ps` output. So, I've removed the original webservice-side censoring code, and since the AWS data protection feature doesn't perform very well in our use case, I recommend that, at this juncture, we don't spend the time to enable it.

I should note that, left to my own devices, I would include the original webservice-side censoring code, but having dropped the
There is a custom data identifier feature: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL-custom-data-identifiers.html
I think it should only be a handful of steps (https://aws.amazon.com/blogs/aws/protect-sensitive-data-with-amazon-cloudwatch-logs/), but you can create a follow-up ticket.
Would prefer to have at least the follow-up ticket for data protection
Description
This PR introduces a new `DiagnosticsHelper` class that the webservice application can wire into itself to log various information:

- "Global" info about running processes, the filesystem, the database, and memory pools, logged via a `Timer` daemon thread.
- Request-level info about the handling thread and Hibernate `Session`, logged at the start and finish of each request (two log entries per request).

In this iteration, I dropped the thread dump from the periodic logging output. At 25kB a pop, it didn't seem worth it.
Diagnostic logging is enabled using the `diagnosticsConfig` configuration file property. Periodic and request logging can be enabled/disabled separately, and the logging period can be set. By default, diagnostic logging is disabled. To enable both periodic and request logging, set the corresponding `diagnosticsConfig` properties in your config file.

To review the output, pipe the webservice stdout into `tee /tmp/output` and view the resulting file. Search for `DiagnosticsHelper` to find the new log entries.

A big concern is that, since we're logging data pulled out of the guts of our runtime environment, we'll accidentally log a secret, especially via the `ps` output, which includes the arguments to commands. We purposefully do not log environment variables, which are a known vector for secrets.

To reduce the chance of a secret spilling, all of the information that is logged - literally the entirety of each log message - passes through censoring code that looks for runs of characters that appear to be hex/base64, does some analysis on the character distributions, and X's them out if they appear to be keys. The crux here is that file paths resemble base64 strings because they tend to contain the same characters, so we need a way to let them through uncensored. The assumption is that a key will be well-encoded and contain a random jumble of characters, whereas a path (or other structured non-key data) will contain longer runs of the same character classes, and/or contain text that resembles English. The resulting censoring scheme works pretty well; see the comments on the `CensorHelper` class for more information, and the rough sketch below.

Another concern is that these changes will introduce instability into the webservice. Keep an eye out for that type of problem. We're purposefully doing the periodic logging via a `Timer`/thread so that if it hangs or throws, the normal functions of the webservice aren't as likely to be impacted.

We can easily add a filter that will exclude certain requests from diagnostic logging (for example, hits to the health check endpoint).
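To make the censoring heuristic concrete, here is a rough sketch of the idea described above: find long hex/base64-looking runs, measure how often adjacent characters switch character class, and X out the runs that look like a random jumble. The thresholds and names are illustrative, not the actual `CensorHelper`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the censoring heuristic: runs of hex/base64-ish characters are
// candidate secrets; runs that look like paths/English (long same-class runs) pass through.
public final class CensorSketch {

    // Candidate runs: 20+ characters drawn from the base64/hex alphabets.
    private static final Pattern CANDIDATE = Pattern.compile("[A-Za-z0-9+/=_-]{20,}");

    private CensorSketch() {
    }

    public static String censor(String message) {
        Matcher matcher = CANDIDATE.matcher(message);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String run = matcher.group();
            String replacement = looksLikeKey(run) ? "X".repeat(run.length()) : run;
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // A key should be a random jumble: frequent switches between character classes.
    // Paths and English text tend to have long runs of the same class (e.g. all lowercase).
    private static boolean looksLikeKey(String run) {
        int switches = 0;
        for (int i = 1; i < run.length(); i++) {
            if (charClass(run.charAt(i)) != charClass(run.charAt(i - 1))) {
                switches++;
            }
        }
        // Illustrative threshold: censor if more than a third of adjacent pairs switch class.
        return switches > run.length() / 3;
    }

    private static int charClass(char c) {
        if (Character.isDigit(c)) {
            return 0;
        }
        if (Character.isLowerCase(c)) {
            return 1;
        }
        if (Character.isUpperCase(c)) {
            return 2;
        }
        return 3;
    }
}
```

The character-class-switch count stands in for the "random jumble" analysis: a path like `/usr/local/bin/dockstore` switches class relatively rarely, so it would pass through uncensored under that threshold, while a well-encoded key mixes cases and digits often enough to be X'd out.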
Review Instructions
View the Cloudwatch logs for qa, and confirm that diagnostic log entries are being created before and after each request, and periodically every so often. Search for "DiagnosticsHelper" to find the entries.
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6148
Security and Privacy
Many security concerns, see description.
Please make sure that you've checked the following before submitting your pull request. Thanks!

- `mvn clean install`
- `@RolesAllowed` annotation