SEAB-6803: Implement "versions needing a retroactive DOI" endpoint by svonworl · Pull Request #6086 · dockstore/dockstore · GitHub

SEAB-6803: Implement "versions needing a retroactive DOI" endpoint #6086

Merged: svonworl merged 10 commits into develop from feature/seab-6803/versions-that-need-dois-endpoint on Mar 18, 2025

svonworl (Contributor) commented Mar 14, 2025

Description
This PR adds a new admin/curator-only endpoint to the webservice that returns a list of versions in need of a retroactive automatic DOI. An external script will query this endpoint periodically and use another endpoint to create a DOI for each returned version, incrementally creating automatic DOIs for versions that:

  1. haven't been updated since we released the auto DOI functionality.
  2. are authorless and didn't previously qualify for a DOI (we are enabling DOIs for authorless versions, going forward).

The endpoint returns a specified number (default 100) of versions that are "most eligible" for an automatic DOI. In the response, each version is paired with its parent workflow, so that the script doesn't have to look it up.

The endpoint determines the list of "most eligible" versions by:

  1. determining the list of the workflows that are "eligible" for at least one automatic DOI, meaning that the workflow is published, hasn't opted out of auto DOI generation, and has at least one version that meets the requirements for an auto DOI (tagged, valid, etc) and currently has zero DOIs.
  2. eliminating the workflows with versions that have GitHub or manual DOIs (under the assumption that the author is managing their DOIs already, or that "automatic" generation is already set up, and we should focus our effort elsewhere).
  3. selecting the N "most eligible" workflows (those with the lowest DOI counts) from the list.
  4. for each "most eligible" workflow, determining the version that is "most eligible" for an auto DOI, based on whether the version is the default version, has metrics, or was modified more recently.
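
The steps above can be sketched roughly as follows. This is a minimal illustration, not the merged code: it assumes the workflow ID sets and per-workflow DOI counts have already been fetched (for example, by queries that return IDs), and the names `RetroactiveDoiSketch`, `Version`, `selectWorkflowIds`, and `mostEligibleVersion` are hypothetical. The priority order among "default", "has metrics", and "modified more recently" is also an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the selection logic; not the code merged in this PR.
final class RetroactiveDoiSketch {

    // Simplified stand-in for a workflow version (fields are assumptions).
    record Version(long id, boolean isDefault, boolean hasMetrics, long lastModified) {}

    // Steps 2 and 3: drop excluded workflows, then keep the `limit` workflows
    // with the lowest DOI counts. Ties are broken by ID so the result is deterministic.
    static List<Long> selectWorkflowIds(Set<Long> eligibleIds, Set<Long> excludedIds,
            Map<Long, Long> doiCounts, int limit) {
        return eligibleIds.stream()
            .filter(id -> !excludedIds.contains(id))
            .sorted(Comparator.comparingLong((Long id) -> doiCounts.getOrDefault(id, 0L))
                .thenComparing(Comparator.<Long>naturalOrder()))
            .limit(limit)
            .collect(Collectors.toList());
    }

    // Step 4: pick one version per workflow. Assumed priority: default version
    // first, then versions with metrics, then the most recently modified.
    static Optional<Version> mostEligibleVersion(List<Version> candidates) {
        return candidates.stream().max(
            Comparator.comparing(Version::isDefault)
                .thenComparing(Version::hasMetrics)
                .thenComparingLong(Version::lastModified));
    }

    public static void main(String[] args) {
        List<Long> ids = selectWorkflowIds(
            Set.of(1L, 2L, 3L, 4L),   // eligible workflows (step 1)
            Set.of(2L),               // workflows with GitHub or manual DOIs
            Map.of(1L, 5L, 3L, 0L),   // DOI counts; missing entries count as zero
            2);
        System.out.println("Selected workflow IDs: " + ids);

        List<Version> versions = List.of(
            new Version(10, false, true, 300),
            new Version(11, true, false, 100),
            new Version(12, false, false, 900));
        System.out.println("Chosen version: " + mostEligibleVersion(versions));
    }
}
```

Keeping the ID-level filtering separate from the per-workflow version choice mirrors the description: only the selected workflows need their versions inspected.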

The first three steps use the results of HQL queries that return workflow IDs, rather than iterating in Java code through the Hibernate entities corresponding to all workflows and versions, because the latter would take a very long time.

The implementation is very Streamey; Charles would be proud.

This endpoint is relatively hard to test automatically, given our design and architecture. Since the functionality isn't critical, I suggest we user-test it via the generation script after it's deployed to QA.

Review Instructions
Hit the endpoint, inspect the results, and confirm that they appear to be reasonable:

  • versions are tagged, valid, not hidden, and don't have a DOI.
  • workflows are published, don't have many DOIs, and have not opted out of automatic DOI creation.

Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6803

Security and Privacy

We are creating a new endpoint that returns information from the database.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

codecov bot commented Mar 14, 2025

Codecov Report

Attention: Patch coverage is 0% with 47 lines in your changes missing coverage. Please review.

Project coverage is 74.27%. Comparing base (9ca6e75) to head (216b619).
Report is 3 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ckstore/webservice/resources/WorkflowResource.java | 0.00% | 37 Missing ⚠️ |
| ...java/io/dockstore/webservice/jdbi/WorkflowDAO.java | 0.00% | 7 Missing ⚠️ |
| ...e/webservice/core/database/WorkflowAndVersion.java | 0.00% | 1 Missing ⚠️ |
| ...re/webservice/core/database/WorkflowIdToCount.java | 0.00% | 1 Missing ⚠️ |
| .../dockstore/webservice/jdbi/WorkflowVersionDAO.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##             develop    #6086      +/-   ##
=============================================
- Coverage      74.45%   74.27%   -0.18%     
  Complexity      5639     5639              
=============================================
  Files            386      388       +2     
  Lines          20231    20278      +47     
  Branches        2088     2093       +5     
=============================================
  Hits           15062    15062              
- Misses          4170     4217      +47     
  Partials         999      999              
| Flag | Coverage Δ |
|---|---|
| bitbuckettests | 26.06% <0.00%> (-0.08%) ⬇️ |
| hoverflytests | 27.66% <0.00%> (-0.07%) ⬇️ |
| integrationtests | 55.86% <0.00%> (-0.13%) ⬇️ |
| languageparsingtests | 10.82% <0.00%> (-0.03%) ⬇️ |
| localstacktests | 21.33% <0.00%> (-0.05%) ⬇️ |
| toolintegrationtests | 29.93% <0.00%> (-0.07%) ⬇️ |
| unit-tests_and_non-confidential-tests | 26.31% <0.00%> (-0.07%) ⬇️ |
| workflowintegrationtests | 37.57% <0.00%> (-0.09%) ⬇️ |

Flags with carried forward coverage won't be shown.

Set<Long> eligibleWorkflowIds = workflowDAO.getWorkflowIdsEligibleForRetroactiveDoi();
Set<Long> gitHubDoiWorkflowIds = workflowDAO.getWorkflowIdsWithGitHubDoi();

// Determine the workflows "most eligible" for a DOI, which are the workflows that don't have a GitHub DOI
Member

question: should we rule out workflows with a user generated DOI as well as those that were issued more directly on GitHub?

Contributor Author

Don't feel super-strongly about this, but the rationale for generating retroactive DOIs for workflows with manual DOIs:

  1. there's no "automatic" DOI scheme already in place (unlike a GitHub DOI setup).
  2. the typical workflow with a manual DOI probably only has a handful (one, for most workflows), so we'd be "filling in the gaps" (if we decide to generate enough retroactive DOIs to get to the workflow).
  3. If a workflow does have lots of manual DOIs, it'll be deprioritized by the endpoint anyways.

Member

I feel somewhat strongly that if a user has gone to the trouble of linking their accounts and persisting through to manual DOIs, maybe they do want to handle/control the process,
i.e. maybe that "one" DOI they generated is really the one they want people to focus on.

In other words, I was a bit surprised because this PR rules out direct GitHub DOIs but not manual DOIs, whereas I would have kinda ruled out manual DOIs but am less sure about direct GitHub DOIs.

Contributor Author

Per our Slack conversation (https://ucsc-gi.slack.com/archives/C05EZH3RVNY/p1742244954907069), I changed the endpoint to exclude workflows that have at least one GitHub OR manual DOI.
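
As a rough illustration of the revised exclusion rule, the sketch below removes any workflow that appears in either DOI set. It assumes the three ID sets are already available; the class and method names (`DoiExclusionSketch`, `withoutExistingDois`) and the `manualDoi` parameter are hypothetical and are not taken from the PR.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the merged code may compute this differently.
final class DoiExclusionSketch {

    // Returns the eligible workflow IDs that have neither a GitHub DOI nor a manual DOI.
    static Set<Long> withoutExistingDois(Set<Long> eligible, Set<Long> gitHubDoi, Set<Long> manualDoi) {
        Set<Long> result = new HashSet<>(eligible); // copy so the inputs are not modified
        result.removeAll(gitHubDoi);
        result.removeAll(manualDoi);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> remaining = withoutExistingDois(Set.of(1L, 2L, 3L, 4L), Set.of(2L), Set.of(3L));
        System.out.println("Remaining workflow IDs: " + remaining);
    }
}
```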

Contributor
kathy-t left a comment

was here, approach looks good but will wait for the implementation of excluding workflows with manual DOIs

svonworl requested review from denis-yuen and kathy-t March 17, 2025 23:02

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

svonworl merged commit 6265080 into develop Mar 18, 2025
21 of 24 checks passed
svonworl deleted the feature/seab-6803/versions-that-need-dois-endpoint branch March 18, 2025 18:16