-
Notifications
You must be signed in to change notification settings - Fork 29
SEAB-6489: Version-level sourcefile size limits #5932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SEAB-6489: Version-level sourcefile size limits #5932
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #5932 +/- ##
=============================================
+ Coverage 74.33% 74.34% +0.01%
- Complexity 5369 5374 +5
=============================================
Files 375 376 +1
Lines 19432 19444 +12
Branches 2031 2032 +1
=============================================
+ Hits 14444 14455 +11
Misses 4015 4015
- Partials 973 974 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
dockstore-integration-testing/src/test/java/io/dockstore/client/cli/GeneralWorkflowIT.java
Show resolved
Hide resolved
dockstore-integration-testing/src/test/java/io/dockstore/webservice/WebhookIT.java
Show resolved
Hide resolved
dockstore-webservice/src/main/java/io/dockstore/webservice/helpers/LimitHelper.java
8000
Show resolved
Hide resolved
} | ||
|
||
private static long totalFileSize(Version<?> version) { | ||
return version.getSourceFiles().stream().mapToLong(file -> file.getContent().length()).sum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are over 1K source files with null
content in a recent DB dump. Can they cause an NPE here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why yes, they can! Good catch, Charles!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: I am considering making the "content" field of SourceFile not nullable. Not sure it ever makes sense for a Sourcefile's content to be null. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: I am considering making the "content" field of SourceFile not nullable. Not sure it ever makes sense for a Sourcefile's content to be null. Thoughts?
Do we have SourceFile
s corresponding to not-on-GitHub files? It seems like we shouldn't, but if we do have that, it seems like the content should be:
null
if the file isn't on GitHub- empty if the file is on GitHub, but no content
- the content if the file is on GitHub with content.
I guess we need to figure out how the null
content got there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to figure out how the
null
content got there.
At least some happened when the file content retrieval code returned null
, which happened for large files. Quote from #5893:
the GitHub API query used to retrieve a file's contents returns null when the file's size is >= 1MB. This null is stored as the corresponding SourceFile's content.
This was a bug, and has been fixed. Possibly, there are other causes, our file retrieval code doesn't feel super robust...
I will address the "should sourcefile content ever be null" issue in an upcoming PR.
|
Description
This PR limits the SourceFiles in a newly-registered Version to a total size of no more than 10MB. The new code is on the manual, .dockstore.yml-based, and hosted Workflow (bioworkflows, apptools, services, notebooks) registration paths.
Review Instructions
Register an entry with a number of sourcefiles that total more than 10MB. Make sure that each sourcefile is less than 1MB to avoid the individual sourcefile size limit. To avoid limits that might be coded into the language handler (for example, the cwl handler limits the size of the "expanded" CWL, in which imports have been inlined, to 4MB), prefer test files.
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6489
Security and Privacy
If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed
annotation