Implement multi-strategy resource map lookup · Issue #2678 · NCEAS/metacatui · GitHub

Implement multi-strategy resource map lookup #2678


Open
7 tasks
robyngit opened this issue May 7, 2025 · 0 comments
Assignees
Labels
ADC CI-09 Enhanced data submission tools & portals (ADC deliverable) arctic data center ESS-DIVE Issues associated with the ESS-DIVE project submission & error handling Problems surrounding failed metadata submissions in the editor
Milestone

Comments

robyngit commented May 7, 2025

Sometimes, after a user updates a dataset, the new resource map (RM) has not yet been indexed in Solr. If the user tries to edit the dataset again before indexing completes, MetacatUI cannot find the RM and creates a new one (as it would for an EML-only dataset). This breaks the version chain and makes it look to the user as though their data has been lost. It has confused users, created extra work for the data team, and undermined trust in the repository's reliability.

MetacatUI should stop automatically creating new resource maps for existing EML documents. Instead, it should use a multi-step strategy to locate the correct RM using system metadata (sysmeta), localStorage, and other fallback methods. If the RM still cannot be found, the user should be told that the dataset is still being indexed and that they should try again later or contact support.

The proposed multi-strategy lookup process

  1. Search Solr for the RM, given the EML PID.
  2. If not found, look in localStorage for a cached RM PID associated with a recent submission.
  3. If not found, fetch the sysmeta for the EML and walk back through the obsoletes chain to find the most recent previous version whose RM is indexed in Solr.
  4. Once a prior RM is found, walk forward using obsoletedBy (sysmeta) until reaching the RM that corresponds to the current EML PID.
  5. If still not found, guess the PID using the common MetacatUI naming convention: resource_map_${EML_PID}.
  6. Whenever an RM is found by step 2, 4, or 5, verify that it points to the correct EML PID.
  7. If no RM is found, notify the user that the data package is still indexing and suggest trying again later or contacting support.
```mermaid
flowchart TD
    Start["Start with EML PID"] ==> SolrSearch(["Found RM in Solr?"])
    SolrSearch == Yes ==> UseRM["Use RM for editing"]
    SolrSearch == No ==> LocalStorageCheck(["Found verified* RM in localStorage?"])
    LocalStorageCheck == Yes ==> UseRM
    LocalStorageCheck == No ==> WalkBack@{ label: "Fetch sysmeta and walk back 'obsoletes' chain" }
    WalkBack --> FoundOldRM(["Found previous RM in Solr?"])
    FoundOldRM == Yes ==> WalkForward@{ label: "Walk forward with 'obsoletedBy' to current RM" }
    WalkForward ==> ForwardVerify(["Found verified* RM via sysmeta?"])
    ForwardVerify == Yes ==> UseRM
    ForwardVerify == No ==> GuessRM["Guess RM PID using naming convention"]
    FoundOldRM == No ==> GuessRM
    GuessRM ==> VerifyGuess(["Does guessed RM point to EML PID?"])
    VerifyGuess == Yes ==> UseRM
    VerifyGuess == No ==> Notify["Show message: Dataset is indexing. Try later or contact support"]
    n1["\* Verified means that the relationship with the EML PID is shown in the RM"]

    Start@{ shape: hex}
    UseRM@{ shape: hex}
    WalkBack@{ shape: rounded}
    WalkForward@{ shape: rect}
    Notify@{ shape: hex}
    n1@{ shape: text}
    Start:::Aqua
    Start:::Ash
    SolrSearch:::Sky
    UseRM:::Aqua
    LocalStorageCheck:::Sky
    WalkBack:::Class_01
    WalkBack:::Peach
    FoundOldRM:::Sky
    WalkForward:::Class_01
    WalkForward:::Peach
    ForwardVerify:::Sky
    GuessRM:::Peach
    VerifyGuess:::Sky
    Notify:::Rose
    classDef Ash stroke-width:1px, stroke-dasharray:none, stroke:#999999, fill:#EEEEEE, color:#000000
    classDef Aqua stroke-width:1px, stroke-dasharray:none, stroke:#46EDC8, fill:#DEFFF8, color:#378E7A
    classDef Rose stroke-width:1px, stroke-dasharray:none, stroke:#FF5978, fill:#FFDFE5, color:#8E2236
    classDef Class_01 stroke:#AA00FF, fill:#E1BEE7, color:#AA00FF
    classDef Peach stroke-width:1px, stroke-dasharray:none, stroke:#FBB35A, fill:#FFEFDB, color:#8F632D
    classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#E2EBFF, color:#374D7C
    linkStyle 0 stroke:#2962FF,fill:none
    linkStyle 1 stroke:#00C853,fill:none
    linkStyle 2 stroke:#D50000,fill:none
    linkStyle 3 stroke:#00C853,fill:none
    linkStyle 4 stroke:#D50000,fill:none
    linkStyle 6 stroke:#00C853,fill:none
    linkStyle 7 stroke:#2962FF,fill:none
    linkStyle 8 stroke:#00C853,fill:none
    linkStyle 9 stroke:#D50000,fill:none
    linkStyle 10 stroke:#D50000,fill:none
    linkStyle 11 stroke:#2962FF,fill:none
    linkStyle 12 stroke:#00C853,fill:none
    linkStyle 13 stroke:#D50000,fill:none
```
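The ordered steps and flowchart above describe a chain of lookup strategies that stops at the first verified result. The following TypeScript sketch shows only that control flow. `LookupStrategy`, `resolveResourceMap`, and the `rmDocuments` verifier are hypothetical names and are not MetacatUI's API. Every lookup is synchronous for brevity, whereas real Solr and DataONE calls are asynchronous.

```typescript
// One step of the lookup: returns a candidate RM PID for the EML PID, or null.
// Synchronous for brevity; real Solr / Object API lookups are async.
interface LookupStrategy {
  name: string;
  find: (emlPid: string) => string | null;
  // True if a candidate must be checked against the RM's contents before use.
  needsVerification: boolean;
}

type LookupOutcome =
  | { kind: "found"; rmPid: string; via: string }
  | { kind: "indexing" }; // nothing found: show the "still indexing" message

// Try each strategy in order and accept the first candidate that passes
// verification. There is deliberately no "create a new RM" fallback.
function resolveResourceMap(
  emlPid: string,
  strategies: LookupStrategy[],
  rmDocuments: (rmPid: string, emlPid: string) => boolean,
): LookupOutcome {
  for (let i = 0; i < strategies.length; i++) {
    const strategy = strategies[i];
    const candidate = strategy.find(emlPid);
    if (candidate === null) continue;
    if (strategy.needsVerification && !rmDocuments(candidate, emlPid)) continue;
    return { kind: "found", rmPid: candidate, via: strategy.name };
  }
  return { kind: "indexing" };
}

// Usage: Solr has nothing yet, but this browser cached B' for B (Scenario 1).
const cachedRm: Record<string, string> = { B: "B'" };
const documentedEml: Record<string, string> = { "B'": "B" }; // RM PID -> EML PID it describes
const outcome = resolveResourceMap(
  "B",
  [
    { name: "solr", find: () => null, needsVerification: false },
    { name: "localStorage", find: (pid) => cachedRm[pid] ?? null, needsVerification: true },
  ],
  (rm, eml) => documentedEml[rm] === eml,
);
console.log(outcome.kind === "found" ? outcome.via : outcome.kind); // prints localStorage
```

Solr results are marked as not needing verification to mirror the flowchart, where only the localStorage, walk-forward, and guessed results carry the "verified*" requirement.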

Example scenarios that use the multi-strategy lookup process

Scenarios use A, B, and C to represent EML documents within a version chain, and A', B', and C' to represent the corresponding resource maps. C is the most recent version and A is the oldest.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> EML
    state EML {
        direction LR
        [*] --> A
        A --> B: obsoletedBy
        B --> C: obsoletedBy
    }
    [*] --> RM
    state RM {
        direction LR
        [*] --> A'
        A' --> B': obsoletedBy
        B' --> C': obsoletedBy
    }
```

Scenario 1: Most recent version not indexed

Scenario:

  • User submits an edit to a dataset, navigates away, and then quickly returns to edit the dataset again.
  • The most recent resource map (B') is not indexed yet.
  • EML: A → B, RM: A' → B'

Editor behavior:

  • Loads B
  • Queries Solr for B' given B, but finds nothing because B' is not indexed yet.
  • Checks localStorage and finds B' saved from the previous edit.
  • Verifies that B' points to the correct EML PID B.
  • Uses B' as the correct RM.
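Scenario 1 depends on the editor writing each EML/RM pair to browser storage after a submission and reading it back on the next edit. The following is a minimal sketch that assumes a synchronous store with the Web Storage `getItem`/`setItem` shape. The key prefix `rm-for-eml:` and the function names are hypothetical. A commit on this issue mentions passing a localForage instance to the resolver. Because localForage's `getItem`/`setItem` return promises, a version built on it would `await` these calls.

```typescript
// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key prefix chosen for this sketch.
const CACHE_PREFIX = "rm-for-eml:";

// After a successful submission, remember which RM documents the new EML.
// savedAt is recorded so stale entries could be evicted later (not done here).
function cacheResourceMap(store: KeyValueStore, emlPid: string, rmPid: string): void {
  store.setItem(CACHE_PREFIX + emlPid, JSON.stringify({ rmPid: rmPid, savedAt: Date.now() }));
}

// When Solr returns nothing, return the cached RM PID, but only if that RM
// really documents this EML PID (the "verified" check from the flowchart).
function lookupCachedResourceMap(
  store: KeyValueStore,
  emlPid: string,
  rmDocuments: (rmPid: string, emlPid: string) => boolean,
): string | null {
  const raw = store.getItem(CACHE_PREFIX + emlPid);
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // corrupted entry: ignore it rather than trust it
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const rmPid = (parsed as { rmPid?: unknown }).rmPid;
  if (typeof rmPid !== "string") return null;
  return rmDocuments(rmPid, emlPid) ? rmPid : null;
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Scenario 1: B and B' were just submitted from this browser.
const store = new MemoryStore();
cacheResourceMap(store, "B", "B'");
const bDocumentedByBPrime = (rm: string, eml: string) => rm === "B'" && eml === "B";
console.log(lookupCachedResourceMap(store, "B", bDocumentedByBPrime)); // prints B'
```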

Scenario 2: Most recent version not indexed & in a new browser

Scenario:

  • User edits a dataset that already has a version chain.
  • User switches browsers and tries to edit the dataset.
  • The most recent version of the data package (C and C') is not yet indexed in Solr.
  • There are three versions of the data package, and the previous version's RM (B') is already indexed.
  • EML: A → B → C, RM: A' → B' → C'

Editor behavior:

  • Loads C
  • Queries Solr for C' given C, but finds nothing because neither C nor C' is indexed yet.
  • Checks localStorage, but finds nothing because the user is in a new browser and has no PIDs saved.
  • Fetches & parses sysmeta for C, looks at <obsoletes>B</obsoletes>, and finds PID B.
  • Queries Solr for B' given B and finds B', which is indexed.
  • Fetches & parses sysmeta for B', looks at <obsoletedBy>C'</obsoletedBy> and finds PID C'.
  • Verifies that C' points to the correct EML PID C.
  • Uses C' as the correct RM.
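Scenarios 2 and 3 both begin by reading `<obsoletes>` and `<obsoletedBy>` from system metadata. The sketch below extracts those fields with a regular expression so that it runs outside a browser. This is a simplification that assumes the elements contain only text. In the browser, `DOMParser` with `getElementsByTagName` would be the sturdier choice. `parseVersionLinks` and the trimmed sample document are illustrative and are not the sysmeta class mentioned in the commits.

```typescript
// The version links a chain walk needs from DataONE system metadata.
interface VersionLinks {
  identifier: string | null;
  obsoletes: string | null;
  obsoletedBy: string | null;
}

// Decode the five predefined XML entities; PIDs may contain "&", "<", etc.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;", not "<"
}

// Text of the first <name>...</name> element, with or without a namespace prefix.
// Regex simplification: assumes text-only content (no CDATA, no nested elements).
function firstElementText(xml: string, name: string): string | null {
  const re = new RegExp("<(?:\\w+:)?" + name + "(?:\\s[^>]*)?>([^<]*)</(?:\\w+:)?" + name + ">");
  const match = re.exec(xml);
  return match ? decodeXmlEntities(match[1].trim()) : null;
}

function parseVersionLinks(sysmetaXml: string): VersionLinks {
  return {
    identifier: firstElementText(sysmetaXml, "identifier"),
    obsoletes: firstElementText(sysmetaXml, "obsoletes"),
    obsoletedBy: firstElementText(sysmetaXml, "obsoletedBy"),
  };
}

// Trimmed-down sysmeta for C from Scenario 2 (real documents have many more fields).
const sysmetaForC = `<?xml version="1.0" encoding="UTF-8"?>
<d1:systemMetadata xmlns:d1="http://ns.dataone.org/service/types/v2.0">
  <serialVersion>1</serialVersion>
  <identifier>C</identifier>
  <obsoletes>B</obsoletes>
</d1:systemMetadata>`;

const links = parseVersionLinks(sysmetaForC);
console.log(links.obsoletes, links.obsoletedBy); // prints B null
```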

Scenario 3: Two most recent versions are unindexed & in a new browser

Scenario:

  • User edits a dataset that already has a version chain.
  • User switches browsers and tries to edit the dataset.
  • There are three versions of the data package, and the two most recent versions are not yet indexed in Solr (imagine: an indexing backlog).
  • EML: A → B → C, RM: A' → B' → C'

Editor behavior:

  • Loads C
  • Queries Solr for C' given C, but finds nothing because neither C nor C' is indexed yet.
  • Checks localStorage, but finds nothing because the user is in a new browser and has no PIDs saved.
  • Fetches & parses sysmeta for C, looks at <obsoletes>B</obsoletes>, and finds PID B.
  • Queries Solr for B' given B, but finds nothing because neither B nor B' is indexed yet.
  • Fetches & parses sysmeta for B, looks at <obsoletes>A</obsoletes>, and finds PID A.
  • Queries Solr for A' given A and finds RM A' (indexed).
  • Fetches & parses sysmeta for A', looks at <obsoletedBy>B'</obsoletedBy> and finds PID B'.
  • Fetches & parses sysmeta for B', looks at <obsoletedBy>C'</obsoletedBy> and finds PID C'.
  • Verifies that C' points to the correct EML PID C.
  • Uses C' as the correct RM.
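Scenarios 2 and 3 are the same walk at different depths. The walk goes back along the EML `obsoletes` chain until some earlier version has an indexed RM, then forward along that RM's `obsoletedBy` chain until an RM documents the current EML PID. The following is a minimal sketch that assumes hypothetical synchronous `findRmInSolr`, `getSysmeta`, and `rmDocuments` lookups, backed here by in-memory fixtures for Scenario 3. The `maxSteps` bound is an added safeguard against malformed or cyclic chains and is not specified in the issue.

```typescript
// Only the sysmeta fields the walk uses.
interface SysmetaLinks {
  obsoletes: string | null;
  obsoletedBy: string | null;
}

// Hypothetical lookups; real ones would query Solr and the DataONE API asynchronously.
interface ChainLookups {
  findRmInSolr: (emlPid: string) => string | null; // RM documenting an indexed EML
  getSysmeta: (pid: string) => SysmetaLinks | null;
  rmDocuments: (rmPid: string, emlPid: string) => boolean;
}

function walkVersionChain(emlPid: string, api: ChainLookups, maxSteps = 50): string | null {
  // Walk back: follow the EML "obsoletes" links until an earlier version has an indexed RM.
  let eml = emlPid;
  let anchorRm: string | null = null;
  for (let i = 0; i < maxSteps && anchorRm === null; i++) {
    const previous = api.getSysmeta(eml)?.obsoletes ?? null;
    if (previous === null) return null; // reached the first version: fall through to guessing
    anchorRm = api.findRmInSolr(previous);
    eml = previous;
  }
  if (anchorRm === null) return null;

  // Walk forward: follow the RM "obsoletedBy" links until an RM documents emlPid.
  let current: string = anchorRm;
  for (let i = 0; i < maxSteps; i++) {
    const next = api.getSysmeta(current)?.obsoletedBy ?? null;
    if (next === null) return null; // end of the RM chain without a verified match
    if (api.rmDocuments(next, emlPid)) return next; // verified
    current = next;
  }
  return null;
}

// Scenario 3: only A' is indexed; B, B', C, and C' are still waiting for the indexer.
const sysmeta: Record<string, SysmetaLinks> = {
  A: { obsoletes: null, obsoletedBy: "B" },
  B: { obsoletes: "A", obsoletedBy: "C" },
  C: { obsoletes: "B", obsoletedBy: null },
  "A'": { obsoletes: null, obsoletedBy: "B'" },
  "B'": { obsoletes: "A'", obsoletedBy: "C'" },
  "C'": { obsoletes: "B'", obsoletedBy: null },
};
const indexedRm: Record<string, string> = { A: "A'" };
const describes: Record<string, string> = { "A'": "A", "B'": "B", "C'": "C" };

const scenario3: ChainLookups = {
  findRmInSolr: (eml) => indexedRm[eml] ?? null,
  getSysmeta: (pid) => sysmeta[pid] ?? null,
  rmDocuments: (rm, eml) => describes[rm] === eml,
};
console.log(walkVersionChain("C", scenario3)); // prints C'
```

The sketch verifies at each forward step instead of counting how many steps it took backward. As a result, it still stops on an RM that actually documents the current EML if the EML and RM chains ever differ in length.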

Scenario 4: Newly submitted unindexed dataset with unconventional RM PID & empty localStorage

Scenario:

  • User submits a new dataset
  • User switches browsers and tries to edit the dataset
  • The dataset is not yet indexed
  • The original resource map has an unconventional PID
  • The only documents that exist are EML A and RM A', the user is trying to create B and B'

Editor behavior:

  • Loads A
  • Queries Solr for A' given A, but finds nothing because neither A nor A' is indexed yet.
  • Checks localStorage, but finds nothing because the user is in a new browser.
  • Fetches & parses sysmeta for A, looks for <obsoletes>, and finds nothing because A is the first version.
  • Guesses that the RM PID is resource_map_A and tries to fetch it via the Object API, but finds nothing because the RM has an unconventional PID.
  • Cannot find the RM, but knows the dataset is not brand new and therefore does not create a new RM.
  • Shows a message to the user that the dataset is still indexing and they should try again later or contact support.
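Scenario 4 exercises the last two steps of the lookup. The editor tries the conventional `resource_map_${EML_PID}` name, and if that fails, it shows the indexing message instead of creating a new RM. The following is a minimal sketch. `finalDecision`, `GuessLookups`, the `isNewSubmission` flag, and the PID `custom-rm-for-A` are hypothetical. `earlierResult` stands for whatever the Solr, localStorage, and sysmeta steps returned.

```typescript
type EditorAction =
  | { action: "edit"; rmPid: string }
  | { action: "create-rm" } // only for a dataset that has never been submitted
  | { action: "show-indexing-message" };

// Hypothetical lookups; real ones would call the Object API and parse the RM.
interface GuessLookups {
  objectExists: (pid: string) => boolean;
  rmDocuments: (rmPid: string, emlPid: string) => boolean;
}

function finalDecision(
  emlPid: string,
  isNewSubmission: boolean,
  earlierResult: string | null,
  api: GuessLookups,
): EditorAction {
  if (isNewSubmission) return { action: "create-rm" };
  if (earlierResult !== null) return { action: "edit", rmPid: earlierResult };

  // Common MetacatUI naming convention: resource_map_<EML PID>.
  const guess = `resource_map_${emlPid}`;
  if (api.objectExists(guess) && api.rmDocuments(guess, emlPid)) {
    return { action: "edit", rmPid: guess };
  }
  // The EML already exists, so creating a new RM would break the version chain.
  return { action: "show-indexing-message" };
}

// Scenario 4: the only RM for A has an unconventional (hypothetical) PID.
const scenario4: GuessLookups = {
  objectExists: (pid) => pid === "A" || pid === "custom-rm-for-A",
  rmDocuments: (rm, eml) => rm === "custom-rm-for-A" && eml === "A",
};
console.log(finalDecision("A", false, null, scenario4).action); // prints show-indexing-message
```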

Tasks

  • Ensure that MetacatUI uses sysmeta, not Solr, to confirm that the EML/RM is the most recent version (never allow editing an obsoleted dataset, even if the most recent version is not yet indexed)
  • Remove automatic RM creation for existing datasets
  • Cache EML + RM PID pairs in localStorage after submission
  • Implement multi-strategy lookup logic (Solr → localStorage → sysmeta walk → guessed PID)
    • Use resource map retrieval logic in the Editor
  • Add fallback messaging if no RM is found, with a prompt to contact support
  • Add unit tests for each scenario to confirm the correct RM is found
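The first task asks MetacatUI to use sysmeta rather than Solr to decide whether an EML/RM is the most recent version. The following is a minimal sketch that assumes a hypothetical synchronous `getObsoletedBy` lookup. It follows `obsoletedBy` until it reaches a PID that nothing obsoletes. Any other starting PID is an obsoleted version that should not be opened for editing. The cycle check is an added safeguard.

```typescript
// Follow obsoletedBy links in system metadata (not Solr) to the newest version.
// getObsoletedBy is a hypothetical sysmeta lookup; real calls would be async.
function findLatestVersion(pid: string, getObsoletedBy: (pid: string) => string | null): string {
  const seen: Record<string, boolean> = {};
  let current = pid;
  while (true) {
    if (Object.prototype.hasOwnProperty.call(seen, current)) {
      throw new Error(`Cycle in obsoletedBy chain at ${current}`);
    }
    seen[current] = true;
    const next = getObsoletedBy(current);
    if (next === null) return current; // nothing obsoletes this PID: it is the newest
    current = next;
  }
}

// Sysmeta says A -> B -> C, even if Solr has only indexed A so far.
const obsoletedBy: Record<string, string> = { A: "B", B: "C" };
const lookup = (pid: string) => obsoletedBy[pid] ?? null;

const requested = "A";
const latest = findLatestVersion(requested, lookup);
console.log(latest === requested ? "editable" : `obsoleted; newest version is ${latest}`);
// prints obsoleted; newest version is C
```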

Related issues

This issue fixes or is a duplicate of these issues

Generally related to resource map problems in editor

robyngit added this to the 2.34.0 milestone May 7, 2025
robyngit self-assigned this May 7, 2025
robyngit added ESS-DIVE Issues associated with the ESS-DIVE project arctic data center ADC CI-09 Enhanced data submission tools & portals (ADC deliverable) submission & error handling Problems surrounding failed metadata submissions in the editor labels May 7, 2025
robyngit moved this to Ready in MetacatUI May 7, 2025
robyngit moved this from Ready to In Progress in MetacatUI May 20, 2025
robyngit added a commit that referenced this issue May 29, 2025
- Add a working ResourceMap resolver class
- Add a WIP simple sysMeta class

Issue #2678
robyngit added a commit that referenced this issue Jun 2, 2025
And allow passing a localForage instance to the resolver.

Issue #2678
Projects
Status: In Progress
Development

No branches or pull requests

1 participant