further sanitize sourcefiles, strict by denis-yuen · Pull Request #6103 · dockstore/dockstore · GitHub

further sanitize sourcefiles, strict #6103


Merged
denis-yuen merged 4 commits into release/1.17.0 from further_clean_sourcefile_content on May 2, 2025

Conversation

denis-yuen
Member
@denis-yuen denis-yuen commented Apr 28, 2025

Description
Looks like Angular is still working on angular/angular#36650, so I took up the suggestion to just sanitize inside the webservice.
The jsoup safelist sanitizer is documented at https://jsoup.org/cookbook/cleaning-html/safelist-sanitizer

Review Instructions
Replicate linked result and compare with staging

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-7112

Security and Privacy

Changes how we deal with HTML, but should be an improvement

  • Security and Privacy assessed

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@denis-yuen denis-yuen self-assigned this Apr 28, 2025
codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 74.18%. Comparing base (44f417b) to head (984c287).
Report is 4 commits behind head on release/1.17.0.

Files with missing lines Patch % Lines
.../java/io/dockstore/webservice/core/SourceFile.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##             release/1.17.0    #6103      +/-   ##
====================================================
- Coverage             74.23%   74.18%   -0.06%     
+ Complexity             5660     5657       -3     
====================================================
  Files                   389      389              
  Lines                 20324    20327       +3     
  Branches               2098     2098              
====================================================
- Hits                  15087    15079       -8     
- Misses                 4236     4247      +11     
  Partials               1001     1001              
Flag Coverage Δ
bitbuckettests 25.93% <25.00%> (-0.02%) ⬇️
hoverflytests 27.63% <25.00%> (-0.01%) ⬇️
integrationtests 56.07% <25.00%> (-0.01%) ⬇️
languageparsingtests 10.82% <25.00%> (-0.01%) ⬇️
localstacktests 21.34% <25.00%> (-0.01%) ⬇️
regressionintegrationtests ?
toolintegrationtests 29.92% <25.00%> (-0.01%) ⬇️
unit-tests_and_non-confidential-tests 26.30% <25.00%> (+<0.01%) ⬆️
workflowintegrationtests 37.37% <25.00%> (-0.01%) ⬇️


@denis-yuen denis-yuen marked this pull request as ready for review April 29, 2025 21:19
@denis-yuen
Member Author

Screenshot attached to the Jira ticket for reviewers (who can replicate)

@denis-yuen denis-yuen requested review from a team, kathy-t and svonworl and removed request for a team April 29, 2025 21:45
Contributor
@svonworl svonworl left a comment


This PR appears to sanitize all SourceFile content when it is serialized to JSON, both in endpoint responses and when used by the ES code (because the ES code happens to convert an entry to an ES request by converting it to a JSON representation). The former makes me a little nervous: there are many different types of SourceFiles, and it would be undesirable for any non-malicious content to get mangled or reformatted as it emerges from our endpoints (on the way to the UI). Imagine a non-HTML source file that contains some greater-than and less-than signs plus some intervening content, and so looks superficially like a bad tag.
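The mangling concern can be illustrated with a minimal sketch. It does not use jsoup or any Dockstore code: `NaiveTagStripper` and its regex are hypothetical stand-ins for a tag-removing sanitizer, chosen only to show how a comparison expression in a non-HTML file can be mistaken for a tag.

```java
import java.util.regex.Pattern;

public class NaiveTagStripper {
    // Hypothetical stand-in for an HTML sanitizer (NOT jsoup):
    // removes every span that starts with '<' and ends at the next '>'.
    private static final Pattern TAG_LIKE = Pattern.compile("<[^>]*>");

    public static String strip(String content) {
        return TAG_LIKE.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // A non-HTML line that happens to contain '<' and, later, '>'.
        String line = "Int n = if (a < b) then 1 else 0  # b > a";
        // Everything from '<' through '>' is treated as a tag and dropped,
        // so the comparison and most of the expression disappear.
        System.out.println(strip(line));
    }
}
```

A real sanitizer parses HTML rather than pattern-matching, so its exact output will differ, but the risk is the same in kind: content that was never HTML can come back altered.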

@denis-yuen denis-yuen removed the request for review from kathy-t April 30, 2025 14:51
@denis-yuen
Member Author

Good point. I limited the clean-up to Elasticsearch.
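A rough sketch of that split, under stated assumptions: `IndexOnlySanitizing`, `SimpleSourceFile`, `apiContent`, and `indexContent` are hypothetical names, not the actual Dockstore classes, and the escaping function is a simple stand-in for whatever sanitizer the webservice really applies. The point is only that the raw content is returned unchanged to API callers while the copy sent to the search index is cleaned.

```java
import java.util.function.UnaryOperator;

public class IndexOnlySanitizing {
    // Hypothetical stand-in sanitizer: escapes HTML metacharacters.
    // '&' is replaced first so the entities added afterwards are not re-escaped.
    public static final UnaryOperator<String> ESCAPE_HTML = s -> s
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;");

    // Hypothetical, simplified source file that holds only its content.
    public static final class SimpleSourceFile {
        private final String content;

        public SimpleSourceFile(String content) {
            this.content = content;
        }

        // Content as an API endpoint would return it: untouched.
        public String apiContent() {
            return content;
        }

        // Content as it would be sent to the search index: sanitized.
        public String indexContent(UnaryOperator<String> sanitizer) {
            return sanitizer.apply(content);
        }
    }

    public static void main(String[] args) {
        SimpleSourceFile file = new SimpleSourceFile("x <b>y</b>");
        System.out.println(file.apiContent());
        System.out.println(file.indexContent(ESCAPE_HTML));
    }
}
```

Passing the sanitizer in as a parameter keeps the stored content as the single source of truth and makes the indexing path the only place where it is transformed.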

@denis-yuen
Member Author

oops, more clean-up


Quality Gate failed

Failed conditions
25.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@denis-yuen denis-yuen requested review from a team, kathy-t and svonworl and removed request for a team April 30, 2025 20:08
Contributor
@svonworl svonworl left a comment


Any HTML that's in the other indexed fields (description, authors, etc) could show up in the highlighted results. Should we sanitize those, too?

@denis-yuen
Member Author

> Any HTML that's in the other indexed fields (description, authors, etc) could show up in the highlighted results. Should we sanitize those, too?

Will follow up with https://ucsc-cgl.atlassian.net/browse/SEAB-7136

@denis-yuen denis-yuen requested a review from svonworl May 1, 2025 14:53
@denis-yuen denis-yuen merged commit c62fc23 into release/1.17.0 May 2, 2025
21 of 24 checks passed
@denis-yuen denis-yuen deleted the further_clean_sourcefile_content branch May 2, 2025 14:57