SEAB-6803: Implement "versions needing a retroactive DOI" endpoint #6086
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             develop    #6086    +/-    ##
=============================================
- Coverage      74.45%   74.27%   -0.18%
  Complexity      5639     5639
=============================================
  Files            386      388      +2
  Lines          20231    20278     +47
  Branches        2088     2093      +5
=============================================
  Hits           15062    15062
- Misses          4170     4217     +47
  Partials         999      999
Set<Long> eligibleWorkflowIds = workflowDAO.getWorkflowIdsEligibleForRetroactiveDoi();
Set<Long> gitHubDoiWorkflowIds = workflowDAO.getWorkflowIdsWithGitHubDoi();

// Determine the workflows "most eligible" for a DOI, which are the workflows that don't have a GitHub DOI
question: should we rule out workflows with a user-generated DOI as well as those whose DOIs were issued more directly on GitHub?
Don't feel super-strongly about this, but the rationale for generating retroactive DOIs for workflows with manual DOIs:
- there's no "automatic" DOI scheme already in place (unlike a GitHub DOI setup).
- the typical workflow with a manual DOI probably only has a handful (one, for most workflows), so we'd be "filling in the gaps" (if we decide to generate enough retroactive DOIs to get to the workflow).
- If a workflow does have lots of manual DOIs, it'll be deprioritized by the endpoint anyways.
I feel somewhat strongly that if a user has gone to the trouble of linking their accounts, persisting through to manual DOIs, maybe they do want to handle/control the process.
i.e. maybe that "one" DOI they generated is really the one they want people to focus on
In other words, I was a bit surprised because this PR rules out direct GitHub but not manual DOIs whereas I would have kinda ruled out manual DOIs but less sure about direct GitHub.
Per our slack convo https://ucsc-gi.slack.com/archives/C05EZH3RVNY/p1742244954907069, I changed the endpoint to exclude workflows that have at least one GitHub OR manual DOI.
was here, approach looks good but will wait for the implementation of excluding workflows with manual DOIs
Description
This PR adds a new admin/curator-only endpoint to the webservice that returns a list of versions in need of a retroactive automatic DOI. An external script will query this endpoint periodically, and use another endpoint to create a DOI for each version, to gradually/incrementally create auto DOIs for versions that:
The endpoint returns a specified number (default 100) of versions that are "most eligible" for an automatic DOI. In the response, each version is paired with its parent workflow, so that the script doesn't have to look it up.
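A minimal sketch of that response shape, assuming nothing about the actual classes in this PR: `WorkflowAndVersion`, `pairAndLimit`, and the use of plain IDs and version names in place of Hibernate entities are all hypothetical. Only the default of 100 comes from the description, and the input order stands in for whatever eligibility ranking the endpoint computes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: these names are illustrative, not the PR's actual classes.
class RetroactiveDoiResponseSketch {

    // Default number of versions returned, per the PR description.
    static final int DEFAULT_LIMIT = 100;

    // Pairs a version with its parent workflow so the caller needs no second lookup.
    record WorkflowAndVersion(long workflowId, String versionName) {}

    // Flattens (workflow ID -> version names) into pairs, keeping at most `limit` of them.
    // The map's iteration order is assumed to already reflect eligibility.
    static List<WorkflowAndVersion> pairAndLimit(Map<Long, List<String>> versionsByWorkflow, int limit) {
        return versionsByWorkflow.entrySet().stream()
            .flatMap(entry -> entry.getValue().stream()
                .map(versionName -> new WorkflowAndVersion(entry.getKey(), versionName)))
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Returning the pair keeps the external script to one request per batch, at the cost of a slightly larger response.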
The endpoint determines the list of "most eligible" versions by:
The first three steps use the results of HQL queries that return workflow IDs, rather than iterating in Java code through the Hibernate entities corresponding to all workflows and versions, because the latter would take a very long time.
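The ID-set approach can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: the sets are passed in directly rather than fetched through `workflowDAO`, the method name `mostEligible` is hypothetical, and the `manualDoiWorkflowIds` parameter reflects the exclusion agreed on in the review thread rather than anything shown in the diff.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: operates on plain ID sets, as HQL queries returning IDs would provide.
class EligibleWorkflowIdsSketch {

    // Keeps the eligible workflow IDs that have neither a GitHub DOI nor a manual DOI.
    // Working on IDs avoids loading every workflow and version entity into memory.
    static Set<Long> mostEligible(Set<Long> eligibleWorkflowIds,
                                  Set<Long> gitHubDoiWorkflowIds,
                                  Set<Long> manualDoiWorkflowIds) {
        return eligibleWorkflowIds.stream()
            .filter(id -> !gitHubDoiWorkflowIds.contains(id))
            .filter(id -> !manualDoiWorkflowIds.contains(id))
            // A sorted set gives a stable, reproducible order for later steps.
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

For example, with eligible IDs {1, 2, 3, 4}, GitHub DOI IDs {2}, and manual DOI IDs {4}, the sketch yields {1, 3}.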
The implementation is very Stream-ey, Charles would be proud.
This endpoint is relatively hard to test automatically, given our design and architecture. Given that the functionality isn't critical, I suggest we user test via the generation script, after it's deployed to qa.
Review Instructions
Hit the endpoint, inspect the results, and confirm that they appear to be reasonable:
Issue
https://ucsc-cgl.atlassian.net/browse/SEAB-6803
Security and Privacy
We are creating a new endpoint that returns information from the database.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed annotation