-
Notifications
You must be signed in to change notification settings - Fork 29
further sanitize sourcefiles, strict #6103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release/1.17.0 #6103 +/- ##
====================================================
- Coverage 74.23% 74.18% -0.06%
+ Complexity 5660 5657 -3
====================================================
Files 389 389
Lines 20324 20327 +3
Branches 2098 2098
====================================================
- Hits 15087 15079 -8
- Misses 4236 4247 +11
Partials 1001 1001
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Screenshot attached to jira ticket for reviewers (that can replicate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR appears to sanitize all SourceFile
content when it is serialized to json, both in endpoint responses and when used by the ES code (because the ES code happens to convert an entry to an ES request by converting to a json representation). The former makes me a little nervous: there's lots of different types of SourceFiles, and it'd be undesirable for any non-malicious content to get mangled/reformatted as it emerged from our endpoints (on the way to the UI). Imagine a non-HTML source file that contains some greater-than/less-than signs plus some intervening content, that looks superficially like a bad tag.
Good point, I limited the clean-up to elastic search |
oops, more clean-up |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any HTML that's in the other indexed fields (description, authors, etc) could show up in the highlighted results. Should we sanitize those, too?
Follow with https://ucsc-cgl.atlassian.net/browse/SEAB-7136 |
Description
Looks like they're still working on angular/angular#36650 so took up suggestion to just sanitize inside webservice
Some docs at https://jsoup.org/cookbook/cleaning-html/safelist-sanitizer
Review Instructions
Replicate linked result and compare with staging
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7112
Security and Privacy
Changes how we deal with html, but should be an improvement
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed
annotation