SEAB-6489: Version-level sourcefile size limits by svonworl · Pull Request #5932 · dockstore/dockstore · GitHub

SEAB-6489: Version-level sourcefile size limits #5932

Merged

Contributor
@svonworl svonworl commented Jul 9, 2024

Description
This PR limits the SourceFiles in a newly-registered Version to a total size of no more than 10MB. The new code is on the manual, .dockstore.yml-based, and hosted Workflow (bioworkflows, apptools, services, notebooks) registration paths.
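
A minimal sketch of what a version-level check like this might look like. It is not the code merged in this PR: the class and method names are hypothetical, the limit is assumed to mean 10 * 1024 * 1024, and size is counted in characters via `String.length()` rather than encoded bytes.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a version-level size check; not the code merged in this PR. */
final class VersionSizeLimitSketch {

    // Assumption: "10MB" is read as 10 * 1024 * 1024 characters.
    static final long MAX_TOTAL_SIZE = 10L * 1024 * 1024;

    /** Sums content lengths in characters (String.length()), not encoded bytes. */
    static long totalSize(List<String> contents) {
        return contents.stream().mapToLong(String::length).sum();
    }

    /** Rejects a set of sourcefile contents whose total size exceeds the limit. */
    static void checkTotalSize(List<String> contents) {
        long total = totalSize(contents);
        if (total > MAX_TOTAL_SIZE) {
            throw new IllegalArgumentException(
                "sourcefiles total " + total + " characters, limit is " + MAX_TOTAL_SIZE);
        }
    }

    public static void main(String[] args) {
        // Two small files pass the check without an exception.
        checkTotalSize(List.of("version 1.0", "workflow {}"));

        // Eleven files, each just under 1MB, exceed the total limit.
        String chunk = "x".repeat(1024 * 1024 - 1);
        try {
            checkTotalSize(Collections.nCopies(11, chunk));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second call in `main` mirrors the situation the limit targets: no single file is oversized, but the version as a whole is.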

Review Instructions
Register an entry with a number of sourcefiles that total more than 10MB. Make sure that each sourcefile is less than 1MB to avoid the individual sourcefile size limit. To avoid limits that might be coded into the language handler (for example, the cwl handler limits the size of the "expanded" CWL, in which imports have been inlined, to 4MB), prefer test files.
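
For reviewers who need files of the right shape, something like the following could generate them. This is only a sketch: the file names, the padding character, and the 12 x 900 KiB split are illustrative assumptions, and how the files are wired into the entry being registered is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for producing review fixtures; names and sizes are illustrative. */
final class OversizedFixtureSketch {

    /** Creates a temp directory holding `count` ASCII files of `bytesEach` bytes each. */
    static Path createFixture(int count, int bytesEach) {
        try {
            Path dir = Files.createTempDirectory("oversized-version");
            String content = "#".repeat(bytesEach);   // ASCII: one char is one byte in UTF-8
            for (int i = 0; i < count; i++) {
                Files.writeString(dir.resolve("pad" + i + ".txt"), content);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums the on-disk sizes of the regular files directly inside `dir`. */
    static long totalBytes(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(f -> f.toFile().length())
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 12 files of 900 KiB each: every file stays under 1MB, the total exceeds 10MB.
        Path dir = createFixture(12, 900 * 1024);
        System.out.println("wrote " + totalBytes(dir) + " bytes to " + dir);
    }
}
```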

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6489

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@svonworl svonworl requested review from denis-yuen, coverbeck, david4096, kathy-t and hyunnaye and removed request for denis-yuen July 9, 2024 16:31
Copy link
codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.34%. Comparing base (00baec3) to head (65e9de1).
Report is 1 commits behind head on develop.

Files                                                    Patch %   Lines
...a/io/dockstore/webservice/helpers/LimitHelper.java    88.88%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #5932      +/-   ##
=============================================
+ Coverage      74.33%   74.34%   +0.01%     
- Complexity      5369     5374       +5     
=============================================
  Files            375      376       +1     
  Lines          19432    19444      +12     
  Branches        2031     2032       +1     
=============================================
+ Hits           14444    14455      +11     
  Misses          4015     4015              
- Partials         973      974       +1     
Flag                                     Coverage Δ
bitbuckettests                           26.93% <41.66%> (+<0.01%) ⬆️
hoverflytests                            27.40% <50.00%> (+0.01%) ⬆️
integrationtests                         56.87% <91.66%> (+0.02%) ⬆️
languageparsingtests                     11.08% <0.00%> (-0.01%) ⬇️
localstacktests                          21.61% <41.66%> (+0.01%) ⬆️
toolintegrationtests                     30.32% <50.00%> (+0.01%) ⬆️
unit-tests_and_non-confidential-tests    25.96% <0.00%> (-0.02%) ⬇️
workflowintegrationtests                 38.25% <91.66%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Member
@denis-yuen denis-yuen left a comment

minor comments

}

private static long totalFileSize(Version<?> version) {
    return version.getSourceFiles().stream().mapToLong(file -> file.getContent().length()).sum();
Collaborator

There are over 1K source files with null content in a recent DB dump. Can they cause an NPE here?

Contributor Author

Why yes, they can! Good catch, Charles!
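
One way to guard against the NPE is to skip null content while summing. The sketch below is an assumption about what such a guard could look like, not the change that was actually pushed to this PR. `SourceFile` here is a hypothetical stand-in that models only the `getContent()` accessor, and the method takes a plain list rather than a `Version<?>`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class NullSafeTotalSketch {

    /** Hypothetical stand-in for Dockstore's SourceFile; only the content accessor is modeled. */
    static final class SourceFile {
        private final String content;

        SourceFile(String content) {
            this.content = content;
        }

        String getContent() {
            return content;
        }
    }

    /** Sums content lengths in characters, treating null content as size 0. */
    static long totalFileSize(List<SourceFile> sourceFiles) {
        return sourceFiles.stream()
            .map(SourceFile::getContent)
            .filter(Objects::nonNull)      // skip files whose content is null
            .mapToLong(String::length)
            .sum();
    }

    public static void main(String[] args) {
        List<SourceFile> files = Arrays.asList(
            new SourceFile("abc"), new SourceFile(null), new SourceFile("de"));
        System.out.println(totalFileSize(files));   // prints 5
    }
}
```

Filtering keeps the sum well defined for existing rows with null content; it does not settle whether null content should be allowed at all.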

Contributor Author

Related: I am considering making the "content" field of SourceFile not nullable. Not sure it ever makes sense for a Sourcefile's content to be null. Thoughts?

Collaborator

> Related: I am considering making the "content" field of SourceFile not nullable. Not sure it ever makes sense for a Sourcefile's content to be null. Thoughts?

Do we have SourceFiles corresponding to not-on-GitHub files? It seems like we shouldn't, but if we do have that, it seems like the content should be:

  • null if the file isn't on GitHub
  • empty if the file is on GitHub but has no content
  • the content if the file is on GitHub with content.
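
The three cases in this proposal could be told apart with a small classifier. This is a sketch of the proposal only, with hypothetical names; nothing like it exists in the PR.

```java
/** Hypothetical classifier for the three proposed content states; names are illustrative. */
final class ContentStateSketch {

    enum ContentState { NOT_ON_GITHUB, EMPTY, HAS_CONTENT }

    /** Maps a nullable content string to one of the three proposed states. */
    static ContentState classify(String content) {
        if (content == null) {
            return ContentState.NOT_ON_GITHUB;   // proposal: null means the file is absent
        }
        return content.isEmpty() ? ContentState.EMPTY : ContentState.HAS_CONTENT;
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(""));
        System.out.println(classify("version 1.0"));
    }
}
```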

I guess we need to figure out how the null content got there.

Contributor Author
@svonworl svonworl Jul 9, 2024

> I guess we need to figure out how the null content got there.

At least some of the null content was stored when the file content retrieval code returned null, which happened for large files. Quote from #5893:

> the GitHub API query used to retrieve a file's contents returns null when the file's size is >= 1MB. This null is stored as the corresponding SourceFile's content.

This was a bug and has been fixed. There may be other causes; our file retrieval code doesn't feel super robust...

I will address the "should sourcefile content ever be null" issue in an upcoming PR.

@svonworl svonworl merged commit 7022dd4 into develop Jul 11, 2024
18 of 19 checks passed
@svonworl svonworl deleted the feature/seab-6489/version-level-sourcefile-size-limits branch July 11, 2024 16:10